Create an Open Table connection to securely load data to Open Table formats available in a catalog.
Prerequisites
Before you create an Open Table connection, complete the prerequisites.
If you use an AWS Glue Catalog and Amazon S3 Storage to interact with Apache Iceberg, you need to have access to the following AWS services that manage the tables on AWS:
• AWS Glue Catalog: AWS Glue Catalog manages the metadata associated with the Apache Iceberg tables.
• Amazon S3 Storage: Amazon S3 stores the Apache Iceberg tables containing actual records in columnar format, organized in partitioned directories.
• Amazon Athena: Amazon Athena uses the AWS Glue Data Catalog to store metadata such as table and column names for your data stored in Amazon S3. Open Table Connector uses the Amazon Athena JDBC driver to connect to the AWS Glue Catalog to access Apache Iceberg table metadata.
You need to create separate policies to access these services.
Create minimal IAM policies
You need to create IAM policies with the minimum required permissions to interact with Apache Iceberg tables managed by AWS Glue Catalog. For more information on configuring these policies, refer to the AWS documentation.
Minimum policy for Amazon Athena
The following sample policy shows the minimal Amazon IAM policy to access Amazon Athena:
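As a hedged illustration only, a policy of this kind could be assembled as in the Python sketch below. The action list, the account ID, and the workgroup and data catalog ARNs are assumptions chosen for the sketch, not the connector's verified minimum; confirm the exact actions against the AWS documentation before you use it.

```python
import json

# Assumed set of Athena actions for running queries and reading metadata.
# Illustrative only: this is not the connector's documented minimum.
ASSUMED_ATHENA_ACTIONS = [
    "athena:StartQueryExecution",
    "athena:GetQueryExecution",
    "athena:GetQueryResults",
    "athena:StopQueryExecution",
    "athena:GetWorkGroup",
    "athena:GetDataCatalog",
    "athena:GetDatabase",
    "athena:GetTableMetadata",
    "athena:ListDatabases",
    "athena:ListTableMetadata",
]


def build_athena_policy(resources):
    """Return an IAM policy document (as a dict) that allows the assumed actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(ASSUMED_ATHENA_ACTIONS),
                "Resource": list(resources),
            }
        ],
    }


# Hypothetical ARNs: replace the region, account ID, and names with your own.
policy = build_athena_policy(
    [
        "arn:aws:athena:us-west-1:001234567890:workgroup/primary",
        "arn:aws:athena:us-west-1:001234567890:datacatalog/AwsDataCatalog",
    ]
)
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

Scoping the Resource element to specific workgroups and data catalogs, rather than using a wildcard, keeps the policy closer to the minimum-permission goal described above.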
You can configure an EC2 role to assume an IAM role and generate temporary security credentials to connect to Amazon S3 from the same or different AWS accounts.
When you configure an EC2 role to assume a role, ensure that you have the sts:AssumeRole permission and a trust relationship established within the AWS accounts to use the temporary security credentials. The trust relationship is defined in the trust policy of the IAM role when you create the role. The IAM role adds the EC2 role as a trusted entity, which allows the EC2 role to use the temporary security credentials and access the AWS accounts.
When the trusted EC2 role requests the temporary security credentials, the AWS Security Token Service (AWS STS) dynamically generates temporary security credentials that are valid for a specified period and provides them to the trusted EC2 role.
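The exchange described above can be sketched in Python. This is a minimal illustration, not how the connector itself is implemented: it assumes a client object that exposes an `assume_role` method with the same keyword arguments and response shape as the boto3 STS client, and it uses a hypothetical `FakeStsClient` stand-in so that the example runs without AWS access. The role ARN, session name, and credential values are placeholders.

```python
from datetime import datetime, timedelta, timezone


def fetch_temporary_credentials(sts_client, role_arn, session_name,
                                external_id=None, duration_seconds=3600):
    """Ask STS for temporary credentials for role_arn and return them as a dict."""
    params = {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "DurationSeconds": duration_seconds,
    }
    # The external ID is sent only when the trust policy requires one.
    if external_id is not None:
        params["ExternalId"] = external_id
    credentials = sts_client.assume_role(**params)["Credentials"]
    return {
        "access_key": credentials["AccessKeyId"],
        "secret_key": credentials["SecretAccessKey"],
        "session_token": credentials["SessionToken"],
        "expires_at": credentials["Expiration"],
    }


class FakeStsClient:
    """Hypothetical stand-in for a real STS client (for example, boto3.client("sts"))."""

    def __init__(self):
        self.last_request = None

    def assume_role(self, **kwargs):
        self.last_request = kwargs
        expiry = datetime.now(timezone.utc) + timedelta(seconds=kwargs["DurationSeconds"])
        return {
            "Credentials": {
                "AccessKeyId": "ASIAEXAMPLE",
                "SecretAccessKey": "example-secret",
                "SessionToken": "example-token",
                "Expiration": expiry,
            }
        }


fake = FakeStsClient()
temp = fetch_temporary_credentials(
    fake,
    "arn:aws:iam::001234567890:role/example-assumed-role",
    "open-table-session",
    external_id="aws_externalid",
)
print(temp["access_key"], temp["expires_at"])
```

Because the credentials expire, a caller would normally compare `expires_at` with the current time and request a fresh set shortly before expiry.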
Before you use the EC2 Role to Assume Role authentication, consider the following prerequisites:
• Install the Secure Agent on the AWS EC2 instance.
• The EC2 role attached to the AWS EC2 instance must have permissions to assume another IAM role.
The following is a sample permission policy of the EC2 role that is attached to the AWS EC2 instance:
The Resource value must include the ARN of the IAM role that the EC2 role needs to assume.
• The IAM role that the EC2 role needs to assume must have a permission policy and a trust policy attached to access AWS Glue Catalog, Amazon Athena, and Amazon S3.
You can also specify the external ID of your AWS account for more secure access. The external ID must be a string.
The following sample shows the assumed IAM role's trust policy with the external ID:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::001234567890:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "aws_externalid"
        }
      }
    }
  ]
}
In this sample, the Principal value allows any principal in account 001234567890 to assume the role. You can also limit the principal to a single role.
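The two policies involved in EC2 Role to Assume Role authentication can be sketched as Python helpers that emit IAM policy JSON. This is an illustrative sketch under stated assumptions: the function names are hypothetical, and the role names in the ARNs are placeholders. Only the sts:AssumeRole action, the Resource requirement, and the sts:ExternalId condition come from the description above.

```python
import json


def ec2_role_permission_policy(assumed_role_arn):
    """Permission policy for the EC2 role: allow it to assume one IAM role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                # Resource must be the ARN of the IAM role to assume.
                "Resource": assumed_role_arn,
            }
        ],
    }


def assumed_role_trust_policy(trusted_principal_arn, external_id=None):
    """Trust policy for the assumed IAM role, optionally requiring an external ID."""
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": trusted_principal_arn},
        "Action": "sts:AssumeRole",
    }
    if external_id is not None:
        statement["Condition"] = {"StringEquals": {"sts:ExternalId": external_id}}
    return {"Version": "2012-10-17", "Statement": [statement]}


# Placeholder ARNs: the role names are hypothetical.
permission = ec2_role_permission_policy(
    "arn:aws:iam::001234567890:role/example-assumed-role"
)
# Trust limited to a single EC2 role instead of the whole account.
trust = assumed_role_trust_policy(
    "arn:aws:iam::001234567890:role/example-ec2-role",
    external_id="aws_externalid",
)
print(json.dumps(permission, indent=2))
print(json.dumps(trust, indent=2))
```

Passing a role ARN as the principal, as in this usage, is the narrower alternative to trusting the account root.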
Let's configure the Open Table connection properties to connect to AWS Glue Catalog.
Before you begin
Before you get started, you will need to create the minimal IAM policies to interact with Apache Iceberg tables managed by AWS Glue Catalog. You also need to configure the authentication-specific prerequisites to connect to Amazon S3.
Permanent IAM Credentials authentication for Amazon S3 requires the access key and secret key values of the IAM user. EC2 Role to Assume Role authentication for Amazon S3 requires the ARN of the IAM role that the EC2 role assumes to generate temporary security credentials.
Check out Prerequisites to learn more about how to configure the policies and roles to access Apache Iceberg tables.
Connection details
The following table describes the Open Table connection properties:
Property
Description
Connection Name
Name of the connection.
Each connection name must be unique within the organization. Connection names can contain alphanumeric characters, spaces, and the following special characters: _ . + -,
Maximum length is 255 characters.
Description
Description of the connection. Maximum length is 4000 characters.
Use Secret Vault
Stores sensitive credentials for this connection in the secrets manager that is configured for your organization.
This property appears only if secrets manager is set up for your organization.
This property is not supported by Data Ingestion and Replication and the Data Access Management services.
When you enable the secret vault in the connection, you can select which credentials the Secure Agent retrieves from the secrets manager. If you don't enable this option, the credentials are stored in the repository or on a local Secure Agent, depending on how your organization is configured.
Note: If you’re using this connection to apply data access policies through pushdown or proxy services, you cannot use the Secret Vault configuration option.
For example, jdbc:athena://Region=us-west-1;OutputLocation=s3://working/dir.
Catalog Authentication Type
The authentication method to connect to the catalog.
Select one of the following options:
- None. Connects to AWS Glue Catalog without any authentication credentials.
- OAuth 2.0 Client Credentials. Connects to a REST catalog using a Client ID and Client Secret to obtain an access token from the authorization server.
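The Connection Name and Description limits listed in the table above can be checked before you save a connection. The following Python sketch is one way to do that. It is not part of the product, the function name is hypothetical, and it makes two assumptions: "alphanumeric" means ASCII letters and digits, and the comma that ends the list of special characters is punctuation rather than an allowed character.

```python
import re

# Allowed in a connection name: letters, digits, spaces, and _ . + -
# Assumption: ASCII alphanumerics only, and the comma is not allowed.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9 _.+\-]+")
MAX_NAME_LENGTH = 255
MAX_DESCRIPTION_LENGTH = 4000


def validate_connection(name, description=""):
    """Return a list of problems; an empty list means both values pass."""
    problems = []
    if not name:
        problems.append("name is empty")
    elif len(name) > MAX_NAME_LENGTH:
        problems.append("name is longer than 255 characters")
    elif not _NAME_PATTERN.fullmatch(name):
        problems.append("name contains unsupported characters")
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append("description is longer than 4000 characters")
    return problems


print(validate_connection("Iceberg_Glue-01 prod"))
print(validate_connection("bad/name"))
```

Uniqueness within the organization cannot be checked locally, so this sketch covers only the character and length rules.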
Storage types
You can choose Amazon S3 as the storage type to store the Open Table format tables.
Select the storage type and configure the storage-specific authentication parameters.
Amazon S3
If you use AWS Glue Catalog as the catalog type, configure the properties specific to Amazon S3 storage.
Permanent IAM Credentials authentication
You can use Permanent IAM Credentials authentication for Amazon S3 storage when you connect to an AWS Glue Catalog.
The following table describes the properties to configure Permanent IAM Credentials authentication:
| Property | Description |
| --- | --- |
| Access Key | The key to access the AWS Glue Catalog. |
| Secret Key | The secret key to access the AWS Glue Catalog. The secret key is associated with the access key and uniquely identifies the account. |
EC2 Role to Assume Role authentication
You can use EC2 Role to Assume Role authentication for Amazon S3 storage only when you read Apache Iceberg tables from AWS Glue Catalog.
The following table describes the properties to configure EC2 Role to Assume Role authentication:
| Property | Description |
| --- | --- |
| IAM Role ARN | The ARN of the IAM role assumed by the EC2 role to generate the temporary session credentials. |
| External ID | A unique, user-defined string value that the IAM role requires the EC2 role to provide when calling the sts:AssumeRole API. |
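A malformed IAM Role ARN is an easy mistake to make when you fill in these properties. The following Python sketch checks the general shape of an IAM role ARN before you enter it. The function name is hypothetical, and the pattern is an assumption based on the standard ARN layout (partition, 12-digit account ID, and a role name with an optional path); it does not confirm that the role exists or that it can be assumed.

```python
import re

# arn:<partition>:iam::<12-digit account ID>:role/<optional path/><role name>
ROLE_ARN_PATTERN = re.compile(
    r"arn:(aws|aws-cn|aws-us-gov):iam::(\d{12}):role/([\w+=,.@/-]+)"
)


def parse_role_arn(arn):
    """Split an IAM role ARN into its parts, or raise ValueError if malformed."""
    match = ROLE_ARN_PATTERN.fullmatch(arn)
    if match is None:
        raise ValueError(f"not a valid IAM role ARN: {arn!r}")
    partition, account_id, role_path = match.groups()
    return {
        "partition": partition,
        "account_id": account_id,
        # The role name is the last segment after any path prefix.
        "role_name": role_path.rsplit("/", 1)[-1],
    }


# Placeholder ARN: the role name is hypothetical.
parsed = parse_role_arn("arn:aws:iam::001234567890:role/example-assumed-role")
print(parsed)
```

The account ID is kept as a string so that leading zeros, as in 001234567890, are preserved.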