Before you create an Open Table connection, complete the prerequisites.
Using AWS Glue Catalog and Amazon S3 Storage to interact with Apache Iceberg or Delta Lake tables
If you use an AWS Glue Catalog and Amazon S3 Storage to interact with Apache Iceberg or Delta Lake tables, you need to have access to the following AWS services that manage the tables on AWS:
• AWS Glue Catalog: AWS Glue Catalog manages the metadata associated with the Apache Iceberg or Delta Lake tables.
• Amazon S3 Storage: Amazon S3 stores the Apache Iceberg or Delta Lake tables, which contain the actual records in columnar format, organized in partitioned directories.
• Amazon Athena: Amazon Athena uses the AWS Glue Data Catalog to store metadata such as table and column names for your data stored in Amazon S3. Open Table Connector uses the Amazon Athena JDBC driver to connect to the AWS Glue Catalog and access the metadata of Apache Iceberg or Delta Lake tables.
You need to create separate policies to access these services.
Using Hive Metastore catalog and Microsoft Azure Data Lake Storage Gen2 to interact with Apache Iceberg tables
If you use a Hive Metastore catalog and Microsoft Azure Data Lake Storage Gen2 to interact with Apache Iceberg tables, you need to have access to the following services that manage the tables on Microsoft Azure Data Lake Storage Gen2:
• Hive Metastore catalog: Hive Metastore catalog manages the metadata associated with the Apache Iceberg tables. The metastore must use Hive version 4.0.
• Microsoft Azure Data Lake Storage Gen2: Microsoft Azure Data Lake Storage Gen2 stores the Apache Iceberg tables, which contain the actual records in columnar format, organized in partitioned directories.
• Hive JDBC driver: The Hive JDBC driver connects to the Hive server to access the metadata of Apache Iceberg tables.
Using Hive Metastore catalog and Amazon S3 storage to interact with Apache Iceberg tables
If you use a Hive Metastore catalog and Amazon S3 storage to interact with Apache Iceberg tables, you need to have access to the following services that manage the tables on Amazon S3 storage:
• Hive Metastore catalog: Hive Metastore catalog manages the metadata associated with the Apache Iceberg tables. The metastore must use Hive version 4.0.
• Amazon S3 storage: Amazon S3 stores the Apache Iceberg tables, which contain the actual records in columnar format, organized in partitioned directories.
• Hive JDBC driver: The Hive JDBC driver connects to the Hive server to access the metadata of Apache Iceberg tables.
Using REST catalog and Amazon S3 to interact with Apache Iceberg tables
If you use a REST catalog such as Polaris catalog and Amazon S3 storage to interact with Apache Iceberg tables, you need to have access to the following services that manage the tables on Amazon S3 storage:
• REST catalog: REST catalog manages the metadata associated with the Apache Iceberg tables.
• Amazon S3 storage: Amazon S3 stores the Apache Iceberg tables, which contain the actual records in columnar format, organized in partitioned directories.
Note: For Data Ingestion and Replication, you can use only AWS Glue Catalog with Amazon S3 Storage for Iceberg tables. Delta tables and other catalogs are not applicable.
Create minimal IAM policies
You need to create IAM policies with the minimum required permissions to interact with Apache Iceberg or Delta Lake tables managed by AWS Glue Catalog. For more information on configuring these policies, refer to the AWS documentation.
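As a rough illustration, the permissions for the AWS Glue Catalog metadata and the Amazon S3 data files might resemble the sketch below. This is not the vendor's official sample. The action lists are assumptions about what a typical read and write workload needs, and the Region, account ID, database name, and bucket name (us-east-1, 111122223333, my_database, my-iceberg-bucket) are hypothetical placeholders.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GlueCatalogMetadata",
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetPartitions",
        "glue:CreateTable",
        "glue:UpdateTable"
      ],
      "Resource": [
        "arn:aws:glue:us-east-1:111122223333:catalog",
        "arn:aws:glue:us-east-1:111122223333:database/my_database",
        "arn:aws:glue:us-east-1:111122223333:table/my_database/*"
      ]
    },
    {
      "Sid": "S3BucketLevel",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::my-iceberg-bucket"
    },
    {
      "Sid": "S3ObjectLevel",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::my-iceberg-bucket/*"
    }
  ]
}
```

Each statement can also be saved as its own policy if you keep the Glue and S3 permissions separate. Remove the write actions (glue:CreateTable, glue:UpdateTable, s3:PutObject, s3:DeleteObject) if the connection only reads tables.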
Minimum policy for Amazon Athena
The following sample policy shows the minimal Amazon IAM policy to access Amazon Athena:
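As an illustration only, and not the vendor's official sample, a minimal Athena policy might look like the following sketch. The choice of actions is an assumption based on what a JDBC client generally needs to run queries and fetch results, and the Region, account ID, and workgroup name (us-east-1, 111122223333, primary) are hypothetical placeholders.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AthenaQueryAccess",
      "Effect": "Allow",
      "Action": [
        "athena:StartQueryExecution",
        "athena:GetQueryExecution",
        "athena:GetQueryResults",
        "athena:StopQueryExecution",
        "athena:GetWorkGroup"
      ],
      "Resource": "arn:aws:athena:us-east-1:111122223333:workgroup/primary"
    }
  ]
}
```

Athena writes query results to an Amazon S3 location, so the S3 permissions typically need to cover that location as well.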
Before you use Open Table Connector, you need to copy the Amazon Athena or Hive JDBC driver to the Linux machine where you installed the Secure Agent. Use the Amazon Athena driver for the AWS Glue Catalog and the Hive JDBC driver for the Hive Metastore catalog.
1. Download the latest Hive JDBC driver from the website.
2. Navigate to the following directory on the Secure Agent machine: <Secure Agent installation directory>/ext/connectors/thirdparty/
3. Create the following folder: informatica.opentableformat/common
4. Add the JDBC driver to the folder.
5. Restart the Secure Agent.
Configure EC2 role to assume role
You can configure an EC2 role to assume an IAM role and generate temporary security credentials to connect to Amazon S3 from the same or different AWS accounts.
When you configure the EC2 role to assume a role, ensure that you have the sts:AssumeRole permission and that a trust relationship is established between the AWS accounts so that you can use the temporary security credentials. The trust relationship is defined in the trust policy of the IAM role when you create the role. The IAM role adds the EC2 role as a trusted entity, which allows the EC2 role to use the temporary security credentials and access the AWS accounts.
When the trusted EC2 role requests temporary security credentials, the AWS Security Token Service (AWS STS) dynamically generates credentials that are valid for a specified period and provides them to the trusted EC2 role.
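For orientation, the temporary security credentials that AWS STS issues have roughly the following shape, shown here in the JSON form that the AWS CLI prints for aws sts assume-role. Every value is a placeholder, and the account ID, role name, and session name (111122223333, assumerolename, secure-agent-session) are hypothetical.

```json
{
  "Credentials": {
    "AccessKeyId": "ASIAEXAMPLEKEYID",
    "SecretAccessKey": "<secret-access-key>",
    "SessionToken": "<session-token>",
    "Expiration": "2024-01-01T13:00:00Z"
  },
  "AssumedRoleUser": {
    "AssumedRoleId": "AROAEXAMPLEROLEID:secure-agent-session",
    "Arn": "arn:aws:sts::111122223333:assumed-role/assumerolename/secure-agent-session"
  }
}
```

The Expiration field marks the end of the validity period. After that time, the credentials stop working and a new set must be requested.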
Before you use the EC2 Role to Assume Role authentication, consider the following prerequisites:
• Install the Secure Agent on the AWS EC2 instance.
• The EC2 role attached to the AWS EC2 instance must have permissions to assume another IAM role.
The permission policy of the EC2 role that is attached to the AWS EC2 instance must allow the sts:AssumeRole action.
The Resource value must include the ARN of the IAM role that the EC2 role needs to assume.
• The IAM role that the EC2 role needs to assume must have a permission policy and a trust policy attached to access AWS Glue Catalog, Amazon Athena, and Amazon S3.
You can also specify the external ID of your AWS account for more secure access. The external ID must be a string.
The following sample shows the assumed IAM role's trust policy with the external ID:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::001234567890:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "aws_externalid"
        }
      }
    }
  ]
}
In this sample, the principal arn:aws:iam::001234567890:root allows any identity in account 001234567890 to assume the role. You can also limit the principal to a single role.
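The permission policy for the EC2 role, described in the second prerequisite above, can be sketched as follows. This is a minimal illustration rather than the vendor's sample. <account-id> and <role-name> are placeholders for the account and name of the IAM role that the EC2 role needs to assume.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::<account-id>:role/<role-name>"
    }
  ]
}
```

Listing a specific role ARN in Resource, instead of a wildcard, keeps the EC2 role from assuming any other role.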
You can configure cross-account access when the Secure Agent running on an EC2 instance is in one AWS account, while Apache Iceberg tables, AWS Glue Catalogs, and Athena workgroups reside in a different AWS account.
Consider the following example, which sets up cross-account IAM roles and policies so that a Secure Agent running on an EC2 instance in Account B can access the AWS Glue Catalog, Athena workgroup, and Apache Iceberg tables located in Account A:
• Account A hosts the IAM role to be assumed, the AWS Glue Catalog, the Athena workgroup, and the Iceberg tables.
• Account B hosts the EC2 instance running the Secure Agent that needs to assume the role in Account A.
Perform the following steps to configure cross-account access for an EC2 instance:
1. Create IAM policies in Account A for AWS Glue Catalog, Amazon Athena, and Amazon S3.
2. Create an IAM role in Account A with a trust policy that allows Account B to assume the role using an external ID.
3. Create an IAM policy in Account B that allows the EC2 instance to assume the role in Account A.
4. Create and assign the IAM role to the EC2 instance in Account B.
Create IAM policies in Account A
You need to create IAM policies with the minimum required permissions to interact with AWS Glue Catalog, Amazon Athena, and Amazon S3.
1. Log in to the IAM console.
2. Click Policies > Create policy.
3. On the JSON tab, add the following policies:
Minimum policy for Amazon Athena
The following sample policy shows the minimal Amazon IAM policy to access Amazon Athena:
Create an IAM role in Account A with a trust policy allowing Account B to assume the role using an External ID
1. In the IAM console, click IAM > Roles > Create Role.
2. On the Step 1: Select trusted entities tab, under the Trusted entity type section, select AWS account.
3. In the An AWS account section, select Another AWS account, and enter the Account ID for Account B that hosts the EC2 instance where the Secure Agent is installed.
4. Under Options, select Require External ID and enter a secure external ID.
5. Click Next.
6. On the Step 2: Add permissions tab, select and attach the permission policies that you created to access AWS Glue Catalog, Amazon Athena, and Amazon S3.
7. Click Next.
8. On the Step 3: Name, review, and create tab, enter a Name for the role.
9. Click Create role.
After you create the role, copy the IAM role ARN. You must specify the IAM role ARN in the Open Table connection to use cross-account access for EC2 Role to Assume Role authentication.
10. On the Trust relationships tab, view the trust relationship policy for the IAM role.
The following example shows the trust relationship policy for the IAM role:
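A trust relationship policy of this kind, as produced by selecting Another AWS account with Require External ID, generally has the shape sketched below. It is an illustration rather than the vendor's sample. <AccountBID> and <external-id> are placeholders for the Account B ID and the external ID entered in the steps above.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<AccountBID>:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "<external-id>"
        }
      }
    }
  ]
}
```

To narrow the trust, the principal can be set to the ARN of the EC2 role in Account B instead of the account root.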
Create an IAM policy in Account B allowing the EC2 instance to assume the role in Account A
The value of the Resource field in the policy is the ARN of the IAM role created in Account A. Replace arn:aws:iam::AccountAID:role/assumerolename with the actual IAM role ARN.
3. Click Next.
4. Name the policy, and then click Create Policy.
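The Account B policy described in this procedure can be sketched as follows. It is a minimal illustration under the assumption that the policy grants only sts:AssumeRole. The Resource value is the placeholder ARN from the text above and must be replaced with the ARN of the role created in Account A.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::AccountAID:role/assumerolename"
    }
  ]
}
```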
Create and assign the IAM role to the EC2 instance in Account B
To create an IAM role for the EC2 instance in Account B, perform the following steps:
1. In the IAM console, click IAM > Roles > Create Role.
2. On the Step 1: Select trusted entities tab, under the Trusted entity type section, select AWS service.
3. Under Use case, select EC2.
4. Click Next.
5. On the Step 2: Add permissions tab, select and attach the policy that you created in Account B.
6. Click Next.
7. On the Step 3: Name, review, and create tab, enter the Name and Description for the role.
8. Click Create role.
To assign the IAM Role to the EC2 instance, perform the following steps:
1. Log in to the EC2 Console.
2. Select your instance, and then click Actions > Security > Modify IAM role.
3. Select the role you created, and then click Update IAM role.
Optionally, to whitelist an IP address, perform the following steps:
1. On the Security tab, click the ID of the attached Security Group.
2. Click Edit inbound rules.
3. Click Add rule, and then select the Protocol and Port range.
4. Under Source, select Custom and enter the specific IP address you want to whitelist.
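The inbound rule from the steps above can also be written in the JSON form that the --ip-permissions option of aws ec2 authorize-security-group-ingress accepts. This is a sketch only. The protocol, port, and address (tcp, 443, 203.0.113.10/32) are hypothetical values chosen for illustration, so substitute the ones your environment requires.

```json
[
  {
    "IpProtocol": "tcp",
    "FromPort": 443,
    "ToPort": 443,
    "IpRanges": [
      {
        "CidrIp": "203.0.113.10/32",
        "Description": "Hypothetical whitelisted address"
      }
    ]
  }
]
```

A /32 suffix restricts the rule to a single IPv4 address, which matches the intent of whitelisting one specific IP.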