Amazon Athena is an interactive query service that you can use to analyze data in Amazon S3 using standard SQL.
Objects extracted
The Metadata Command Center service extracts the following objects from an Amazon Athena source system:
•Database
•Schema
•View
•ViewColumn
•External Table
•External Column
•Calculation
•Resource
Prerequisites for configuring an Amazon Athena catalog source
Use the Amazon Athena connector to connect to the Amazon Athena source system. For information about configuring a connection in Administrator, see Connections in the Cloud Common Services help.
Configure permissions or access to Amazon Athena
Permissions to extract metadata
This section addresses permissions for configuring an Amazon Athena connection. Amazon Athena uses Amazon S3 buckets to store query results.
Grant the following Identity and Access Management (IAM) permissions to the user for the INFORMATION_SCHEMA database and all user-defined databases that you want to scan:
Grant the following IAM permissions to the user to perform operations on Amazon S3 buckets:
- s3:PutObject
- s3:GetObject
- s3:GetBucketLocation
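The S3 actions above can be collected into a single IAM policy statement. The following Python sketch builds such a policy document and prints it as JSON. The bucket name `example-athena-results` and the function name `build_s3_policy` are hypothetical placeholders, and the resource scoping shown here is only one reasonable choice; adapt it to the bucket that stores your Amazon Athena query results.

```python
import json

# The S3 actions listed in this section.
S3_ACTIONS = ["s3:PutObject", "s3:GetObject", "s3:GetBucketLocation"]


def build_s3_policy(bucket: str) -> dict:
    """Return an IAM policy document that allows the listed S3 actions.

    `bucket` is a placeholder for the bucket that stores query results.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": S3_ACTIONS,
                # Bucket-level actions match the bucket ARN;
                # object-level actions match the objects under it.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
            }
        ],
    }


# Hypothetical bucket name, used for illustration only.
policy = build_s3_policy("example-athena-results")
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM policy editor and attached to the user that the connection uses.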
Grant permissions that allow you to perform the following operations:
- SELECT on INFORMATION_SCHEMA.SCHEMATA
- SHOW TABLES
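One way to check that the user has the access listed above is to run the corresponding statements as that user, for example in the Amazon Athena query editor. The following Python sketch only assembles the statement text. The function name and the database name `sales_db` are hypothetical, and the sketch assumes database names that need no quoting.

```python
def metadata_check_statements(databases):
    """Return SQL statements that exercise the two operations above.

    `databases` lists the user-defined databases that you want to scan.
    """
    # Reading the schema list requires SELECT on INFORMATION_SCHEMA.SCHEMATA.
    statements = ["SELECT schema_name FROM information_schema.schemata"]
    for database in databases:
        # SHOW TABLES lists the tables and views in one database.
        statements.append(f"SHOW TABLES IN {database}")
    return statements


# Hypothetical database name, used for illustration only.
for statement in metadata_check_statements(["sales_db"]):
    print(statement)
```

If either statement fails with an access error for the user, the grants are probably incomplete.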
Permissions to run data profiles
You do not need additional permissions to run data profiles.
Data profiling for Amazon Athena
Configure data profiling to run profiles on the metadata extracted from an Amazon Athena source system. You can run data profiles on the following Amazon Athena objects:
•External tables created in the following file formats:
- Avro
- CSV
- Delta
- JSON
- Parquet
•External columns
You can view the profiling statistics in Data Governance and Catalog. The data profiling task runs profiles on the following data types for Amazon Athena objects:
•Bigint
•Boolean
•Char
•Date
•Decimal
•Double
•Float
•Int
•Smallint
•String
•Timestamp
•Tinyint
•Varchar
Sampling type
Determine the sample rows on which you want to run the data profiling task. You can choose one of the following sampling types for an Amazon Athena catalog source:
- All Rows
- Limit N Rows
- Custom Query. Enter a sampling clause that specifies the percentage of rows on which you want to run the data profiling task. For example, TABLESAMPLE BERNOULLI(10) or TABLESAMPLE SYSTEM(10).
Note: You can run data quality only on views and external tables that are created in Amazon Athena.
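To illustrate what the Custom Query sampling clauses do, the following Python sketch builds a clause and shows it inside an ordinary Amazon Athena SELECT statement. BERNOULLI samples each row independently with the given percentage, while SYSTEM samples at the level of data segments, which is usually faster but less uniform. The function names and the table `sales_db.orders` are hypothetical, and the way the data profiling task embeds the clause in its own queries is not described here, so treat the full query only as an illustration.

```python
def sampling_clause(method: str, percent: float) -> str:
    """Return a TABLESAMPLE clause such as 'TABLESAMPLE BERNOULLI(10)'."""
    method = method.upper()
    if method not in ("BERNOULLI", "SYSTEM"):
        raise ValueError("method must be BERNOULLI or SYSTEM")
    if not 0 < percent <= 100:
        raise ValueError("percent must be greater than 0 and at most 100")
    # The 'g' format drops a trailing '.0', so 10.0 is rendered as 10.
    return f"TABLESAMPLE {method}({percent:g})"


def sampled_query(table: str, method: str, percent: float) -> str:
    """Return a SELECT statement that reads a sample of `table`."""
    return f"SELECT * FROM {table} {sampling_clause(method, percent)}"


# Hypothetical table name, used for illustration only.
print(sampling_clause("bernoulli", 10))
print(sampled_query("sales_db.orders", "system", 2.5))
```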
Create a connection to Amazon Athena
When you configure a connection to the Amazon Athena source system in Administrator, you can view the connection properties for that connection on the Registration page in Metadata Command Center.
The following table describes the Amazon Athena connection properties:
Property
Description
Connection Name
Name of the connection.
Each connection name must be unique within the organization. Connection names can contain alphanumeric characters, spaces, and the following special characters: _ . + -,
Maximum length is 255 characters.
Description
Description of the connection. Maximum length is 4000 characters.
Type
Amazon Athena
Use Secret Vault
Stores sensitive credentials for this connection in the secrets manager that is configured for your organization.
This property appears only if secrets manager is set up for your organization.
When you enable the secret vault in the connection, you can select which credentials the Secure Agent retrieves from the secrets manager. If you don't enable this option, the credentials are stored in the repository or on a local Secure Agent, depending on how your organization is configured.
Runtime Environment
The name of the runtime environment where you want to run tasks.
Authentication Type
The authentication mechanism to connect to Amazon Athena. Select Permanent IAM Credentials or EC2 instance profile.
Permanent IAM Credentials is the default authentication mechanism. It requires an access key and a secret key to connect to Amazon Athena.
Use EC2 instance profile when the Secure Agent is installed on an Amazon Elastic Compute Cloud (EC2) system. With this option, you can use AWS Identity and Access Management (IAM) authentication to connect to Amazon Athena.