Create an Open Table connection to securely load data to Open Table formats available in a catalog.
Prerequisites
Before you create an Open Table connection, complete the prerequisites.
Using AWS Glue Catalog and Amazon S3 Storage to interact with Apache Iceberg or Delta Lake tables
If you use an AWS Glue Catalog and Amazon S3 Storage to interact with Apache Iceberg or Delta Lake tables, you need to have access to the following AWS services that manage the tables on AWS:
• AWS Glue Catalog: AWS Glue Catalog manages the metadata associated with the Apache Iceberg or Delta Lake tables.
• Amazon S3 Storage: Amazon S3 stores the Apache Iceberg or Delta Lake tables containing actual records in columnar format, organized in partitioned directories.
• Amazon Athena: Amazon Athena uses the AWS Glue Data Catalog to store metadata such as table and column names for your data stored in Amazon S3. Open Table Connector uses the Amazon Athena JDBC driver to connect to the AWS Glue Catalog and access the metadata of Apache Iceberg or Delta Lake tables.
You need to create separate policies to access these services.
Using Hive Metastore catalog and Microsoft Azure Data Lake Storage Gen2 to interact with Apache Iceberg tables
If you use a Hive Metastore catalog and Microsoft Azure Data Lake Storage Gen2 to interact with Apache Iceberg tables, you need to have access to the following services that manage the tables on Microsoft Azure Data Lake Storage Gen2:
• Hive Metastore catalog: Hive Metastore catalog manages the metadata associated with the Apache Iceberg tables. The metastore must use Hive version 4.0.
• Microsoft Azure Data Lake Storage Gen2: Microsoft Azure Data Lake Storage Gen2 stores the Apache Iceberg tables containing actual records in columnar format, organized in partitioned directories.
• Hive JDBC driver: Hive JDBC driver connects to the Hive server to access the metadata of Apache Iceberg tables.
Using Hive Metastore catalog and Amazon S3 storage to interact with Apache Iceberg tables
If you use a Hive Metastore catalog and Amazon S3 storage to interact with Apache Iceberg tables, you need to have access to the following services that manage the tables on Amazon S3 storage:
• Hive Metastore catalog: Hive Metastore catalog manages the metadata associated with the Apache Iceberg tables. The metastore must use Hive version 4.0.
• Amazon S3 storage: Amazon S3 stores the Apache Iceberg tables containing actual records in columnar format, organized in partitioned directories.
• Hive JDBC driver: Hive JDBC driver connects to the Hive server to access the metadata of Apache Iceberg tables.
Using REST catalog and Amazon S3 to interact with Apache Iceberg tables
If you use a REST catalog such as Polaris catalog and Amazon S3 storage to interact with Apache Iceberg tables, you need to have access to the following services that manage the tables on Amazon S3 storage:
• REST catalog: REST catalog manages the metadata associated with the Apache Iceberg tables.
• Amazon S3 storage: Amazon S3 stores the Apache Iceberg tables containing actual records in columnar format, organized in partitioned directories.
Create minimal IAM policies
You need to create IAM policies with the minimum required permissions to interact with Apache Iceberg or Delta Lake tables managed by AWS Glue Catalog. For more information on configuring these policies, refer to the AWS documentation.
Minimum policy for Amazon Athena
The following sample policy shows the minimal Amazon IAM policy to access Amazon Athena:
Before you use Open Table Connector, you need to copy the Amazon Athena or Hive JDBC driver to the Linux machine where you installed the Secure Agent. Use the Amazon Athena driver for the AWS Glue Catalog and the Hive JDBC driver for the Hive Metastore catalog.
1. Download the latest Hive JDBC driver from the website.
2. Navigate to the following directory on the Secure Agent machine: <Secure Agent installation directory>/ext/connectors/thirdparty/
3. Create the following folder: informatica.opentableformat/common
4. Add the JDBC driver to the folder.
5. Restart the Secure Agent.
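The folder and copy steps above can be scripted. The following Python sketch is one way to do it, not a vendor-supplied tool: the function name install_jdbc_driver, the agent_home argument, and the driver file name used in the usage note are assumptions made for illustration. The script does not download the driver or restart the Secure Agent; do those steps separately.

```python
import shutil
from pathlib import Path

# Relative folder names taken from the installation steps.
THIRDPARTY_DIR = Path("ext/connectors/thirdparty")
DRIVER_DIR = Path("informatica.opentableformat/common")


def install_jdbc_driver(agent_home: Path, driver_jar: Path) -> Path:
    """Copy driver_jar into the connector's driver folder and return the new path.

    agent_home is the Secure Agent installation directory (hypothetical argument).
    """
    thirdparty = agent_home / THIRDPARTY_DIR
    if not thirdparty.is_dir():
        # Fail early instead of creating folders in the wrong place.
        raise FileNotFoundError(f"Directory not found: {thirdparty}")
    target_dir = thirdparty / DRIVER_DIR
    # Create informatica.opentableformat/common if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    # copy2 returns the path of the copied file inside target_dir.
    return Path(shutil.copy2(driver_jar, target_dir))
```

For example, install_jdbc_driver(Path("/opt/secure-agent"), Path("hive-jdbc-standalone.jar")) copies the JAR file into place, where both paths are placeholders for your own values.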
Configure EC2 role to assume role
You can configure an EC2 role to assume an IAM role and generate temporary security credentials to connect to Amazon S3 from the same or different AWS accounts.
When you configure an EC2 role to assume a role, ensure that the EC2 role has the sts:AssumeRole permission and that a trust relationship is established within the AWS accounts to use the temporary security credentials. The trust relationship is defined in the trust policy of the IAM role when you create the role. The IAM role adds the EC2 role as a trusted entity, which allows the EC2 role to use the temporary security credentials and access the AWS accounts.
When the trusted EC2 role requests temporary security credentials, the AWS Security Token Service (AWS STS) dynamically generates credentials that are valid for a specified period and provides them to the trusted EC2 role.
Before you use the EC2 Role to Assume Role authentication, consider the following prerequisites:
• Install the Secure Agent on the AWS EC2 instance.
• The EC2 role attached to the AWS EC2 instance must have permissions to assume another IAM role.
The following is a sample permission policy of EC2 role that is attached to the AWS EC2 instance:
The Resource value must include the ARN of the IAM role that the EC2 role needs to assume.
• The IAM role that the EC2 role needs to assume must have a permission policy and a trust policy attached to access AWS Glue Catalog, Amazon Athena, and Amazon S3.
You can also specify the external ID of your AWS account for more secure access. The external ID must be a string.
The following sample shows the assumed IAM role's trust policy with the external ID:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::001234567890:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "aws_externalid"
        }
      }
    }
  ]
}
In this sample, the root principal allows any identity in account 001234567890 to assume the role. You can also limit the principal to a single role.
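The two policies described in this section can also be generated programmatically. The following Python sketch is illustrative only: the function names are hypothetical, and the permission policy is inferred from the statement that the Resource value must include the ARN of the IAM role to assume, so treat it as an assumption and compare it with the AWS documentation before use.

```python
import json
from typing import Optional


def ec2_role_permission_policy(assumed_role_arn: str) -> dict:
    """Permission policy for the EC2 role: allow sts:AssumeRole on one IAM role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                # Resource carries the ARN of the IAM role to assume.
                "Resource": assumed_role_arn,
            }
        ],
    }


def assumed_role_trust_policy(trusted_principal_arn: str,
                              external_id: Optional[str] = None) -> dict:
    """Trust policy for the assumed IAM role, optionally requiring an external ID."""
    statement = {
        "Effect": "Allow",
        "Principal": {"AWS": trusted_principal_arn},
        "Action": "sts:AssumeRole",
    }
    if external_id is not None:
        # The caller must pass the same external ID when it calls sts:AssumeRole.
        statement["Condition"] = {"StringEquals": {"sts:ExternalId": external_id}}
    return {"Version": "2012-10-17", "Statement": [statement]}


if __name__ == "__main__":
    # Placeholder ARNs; replace them with values from your own accounts.
    print(json.dumps(ec2_role_permission_policy(
        "arn:aws:iam::001234567890:role/open-table-access"), indent=2))
    print(json.dumps(assumed_role_trust_policy(
        "arn:aws:iam::001234567890:root", "aws_externalid"), indent=2))
```

Generating the documents from one place keeps the role ARN and the external ID consistent between the permission policy and the trust policy.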
Let's configure the Open Table connection properties to connect to AWS Glue Catalog, Hive Metastore, or Polaris REST catalog.
Before you begin
Before you get started, create the minimal IAM policies to interact with Apache Iceberg or Delta Lake tables managed by AWS Glue Catalog, and install the Hive JDBC driver for Hive Metastore. You also need to configure the authentication-specific prerequisites to connect to Amazon S3 or Microsoft Azure Data Lake Storage Gen2 storage.
Permanent IAM Credentials authentication for Amazon S3 requires the access key and secret key values of the IAM user. EC2 Role to Assume Role authentication for Amazon S3 requires the ARN of the IAM role that the EC2 role assumes to generate temporary security credentials.
To configure Service Principal authentication for Microsoft Azure Data Lake Storage Gen2, you need the Azure account name, client secret, client ID, and tenant ID for your application registered in the Azure Active Directory.
Check out Prerequisites to learn more about how to configure the policies and roles to access Apache Iceberg or Delta Lake tables.
Open Table formats with associated catalog and storage types
You can choose the Open Table format that you want to use and its associated catalog type and storage type to interact with data.
The following table summarizes the Open Table formats that you can use, their catalog types, storage types, and the authentication options available for each storage type:
| Open Table format | Catalog type | Catalog authentication type | Storage type | Storage authentication type |
| --- | --- | --- | --- | --- |
| Apache Iceberg | AWS Glue Catalog* | None | Amazon S3* | Permanent IAM Credentials authentication; EC2 Role to Assume Role |
| Apache Iceberg | Hive Metastore | None | Amazon S3 | Permanent IAM Credentials authentication |
| Apache Iceberg | Hive Metastore | None | Microsoft Azure Data Lake Storage Gen2 | Service Principal authentication |
| Apache Iceberg | REST Catalog | OAuth 2.0 Credentials | Amazon S3 | Permanent IAM Credentials authentication |
| Delta Lake | AWS Glue Catalog | None | Amazon S3 | Permanent IAM Credentials authentication |
*Apache Iceberg tables managed by the AWS Glue Catalog with Amazon S3 storage apply to both mappings and mappings in advanced mode.
Open Table formats with other catalog and storage types apply only to mappings in advanced mode.
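If you script connection setup, it can help to check a planned combination against the supported ones before you create the connection. The following Python sketch encodes the combinations summarized above as a lookup table. The names SUPPORTED, catalog_auth_type, and storage_auth_types are hypothetical and are not part of the connector.

```python
from typing import Dict, FrozenSet, Tuple

# Key: (Open Table format, catalog type, storage type)
Combination = Tuple[str, str, str]

# Value: (catalog authentication type, storage authentication types)
SUPPORTED: Dict[Combination, Tuple[str, FrozenSet[str]]] = {
    ("Apache Iceberg", "AWS Glue Catalog", "Amazon S3"): (
        "None",
        frozenset({"Permanent IAM Credentials", "EC2 Role to Assume Role"}),
    ),
    ("Apache Iceberg", "Hive Metastore", "Amazon S3"): (
        "None",
        frozenset({"Permanent IAM Credentials"}),
    ),
    ("Apache Iceberg", "Hive Metastore", "Microsoft Azure Data Lake Storage Gen2"): (
        "None",
        frozenset({"Service Principal"}),
    ),
    ("Apache Iceberg", "REST Catalog", "Amazon S3"): (
        "OAuth 2.0 Credentials",
        frozenset({"Permanent IAM Credentials"}),
    ),
    ("Delta Lake", "AWS Glue Catalog", "Amazon S3"): (
        "None",
        frozenset({"Permanent IAM Credentials"}),
    ),
}


def _lookup(fmt: str, catalog: str, storage: str) -> Tuple[str, FrozenSet[str]]:
    try:
        return SUPPORTED[(fmt, catalog, storage)]
    except KeyError:
        raise ValueError(
            f"Unsupported combination: {fmt} / {catalog} / {storage}") from None


def catalog_auth_type(fmt: str, catalog: str, storage: str) -> str:
    """Return the catalog authentication type for a supported combination."""
    return _lookup(fmt, catalog, storage)[0]


def storage_auth_types(fmt: str, catalog: str, storage: str) -> FrozenSet[str]:
    """Return the storage authentication types for a supported combination."""
    return _lookup(fmt, catalog, storage)[1]
```

A lookup for Delta Lake with Hive Metastore raises ValueError, because that pairing is not listed among the supported combinations.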
Connection details
The following table describes the Open Table connection properties:
| Property | Description |
| --- | --- |
| Connection Name | Name of the connection. Each connection name must be unique within the organization. Maximum length is 255 characters. Connection names can contain alphanumeric characters, spaces, and the following special characters: _ . + -, |
| Description | Description of the connection. Maximum length is 4000 characters. |
| Use Secret Vault | Stores sensitive credentials for this connection in the secrets manager that is configured for your organization. This property appears only if secrets manager is set up for your organization. This property is not supported by Data Ingestion and Replication and the Data Access Management services. When you enable the secret vault in the connection, you can select which credentials the Secure Agent retrieves from the secrets manager. If you don't enable this option, the credentials are stored in the repository or on a local Secure Agent, depending on how your organization is configured. Note: If you're using this connection to apply data access policies through pushdown or proxy services, you cannot use the Secret Vault configuration option. |
| Runtime Environment | The name of the runtime environment where you want to run tasks. You cannot run a database ingestion task on a Hosted Agent or in a serverless runtime environment. |
| Open Table Format | The Open Table format that you want to use to read data from or write data to a catalog. Select Apache Iceberg or Delta Lake from the list. |
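The connection name rules can be checked before you submit a name. The following Python sketch is a local pre-check only, under two assumptions: "alphanumeric" is read as ASCII letters and digits, and the trailing comma in the list of special characters is read as an allowed character. The function name is hypothetical, and uniqueness within the organization cannot be verified this way.

```python
import re

MAX_NAME_LENGTH = 255

# Letters, digits, spaces, and the special characters _ . + - ,
# (treating the comma as allowed is an assumption).
_ALLOWED = re.compile(r"[A-Za-z0-9 _.+\-,]+")


def is_valid_connection_name(name: str) -> bool:
    """Return True if name satisfies the length and character rules."""
    return len(name) <= MAX_NAME_LENGTH and _ALLOWED.fullmatch(name) is not None
```

For example, is_valid_connection_name("OpenTable_Glue-01") returns True, and a name that contains a slash or exceeds 255 characters returns False.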
Catalog types
You can select AWS Glue Catalog, Hive Metastore, or REST Catalog as the catalog type to manage the metadata of the Open Table format that you selected.
Select the catalog type that your Open Table format uses and then configure the catalog-specific parameters.
AWS Glue Catalog
If the Apache Iceberg or Delta Lake Open Table format uses AWS Glue Catalog as the catalog type, configure the properties specific to AWS Glue Catalog.
The following table describes the property to configure AWS Glue Catalog:
For example, jdbc:athena://Region=us-west-1;OutputLocation=s3://working/dir.
Catalog Authentication Type
The authentication method to connect to the catalog.
Select one of the following options:
- None. Connects to AWS Glue Catalog or Hive Metastore without any authentication credentials.
- OAuth 2.0 Client Credentials. Connects to a REST catalog using a Client ID and Client Secret to obtain an access token from the authorization server.
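The example JDBC URL for Amazon Athena combines an AWS Region with an Amazon S3 output location. The following Python sketch assembles a URL in that same shape. The function name is hypothetical, and only the two parameters that appear in the example are covered.

```python
def athena_jdbc_url(region: str, output_location: str) -> str:
    """Build an Amazon Athena JDBC URL in the Region/OutputLocation form."""
    if not region:
        raise ValueError("region must not be empty")
    if not output_location.startswith("s3://"):
        # Athena writes query results to an Amazon S3 location.
        raise ValueError("output_location must be an s3:// URI")
    return f"jdbc:athena://Region={region};OutputLocation={output_location}"
```

Calling athena_jdbc_url("us-west-1", "s3://working/dir") returns jdbc:athena://Region=us-west-1;OutputLocation=s3://working/dir.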
Hive Metastore
If the Apache Iceberg Open Table format uses Hive Metastore as the catalog type, configure the properties specific to Hive Metastore.
The following table describes the properties to configure Hive Metastore:
| Property | Description |
| --- | --- |
| Hive Metastore URI | The Hive thrift server URL to connect to Hive Metastore. |
| Hive JDBC URL | The JDBC URL to connect to the Hive 4 server. |
| Hive User Name | The user name of your Hive account to connect to Hive Metastore. |
| Hive Password | The password of your Hive account to connect to Hive Metastore. |
| Catalog Authentication Type | The authentication method to connect to the catalog. Select one of the following options: None, which connects to AWS Glue Catalog or Hive Metastore without any authentication credentials, or OAuth 2.0 Client Credentials, which connects to a REST catalog using a client ID and client secret to obtain an access token from the authorization server. |
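The Hive Metastore URI and the Hive JDBC URL point to two different services and use different schemes. The following Python sketch shows the conventional shape of each value. The function names, host names, and database name are placeholders, and ports 9083 and 10000 are the usual Hive defaults, which your deployment might override.

```python
def hive_metastore_uri(host: str, port: int = 9083) -> str:
    """Thrift URI of the Hive Metastore service (9083 is the usual default port)."""
    return f"thrift://{host}:{port}"


def hive_jdbc_url(host: str, port: int = 10000, database: str = "default") -> str:
    """JDBC URL of HiveServer2 (10000 is the usual default port)."""
    return f"jdbc:hive2://{host}:{port}/{database}"
```

For example, hive_metastore_uri("metastore.example.com") returns thrift://metastore.example.com:9083, and hive_jdbc_url("hive.example.com") returns jdbc:hive2://hive.example.com:10000/default.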
REST Catalog
If the Apache Iceberg Open Table format uses REST catalog as the catalog type, configure the properties specific to REST Catalog.
The following table describes the properties to configure REST catalog:
| Property | Description |
| --- | --- |
| REST Catalog Type | The type of REST catalog that you want to connect to. Select Polaris Catalog. |
| Catalog Endpoint URL | The endpoint URL of the REST catalog. |
| Catalog Authentication Type | The authentication method to connect to the catalog. Select one of the following options: None, which connects to an AWS Glue Catalog or a Hive Metastore without any authentication credentials, or OAuth 2.0 Client Credentials, which connects to a REST catalog using a client ID and client secret to obtain an access token from the OAuth 2.0 authorization server. |
| Access Token URL | The URL provided by the OAuth 2.0 authorization server to obtain an access token. |
| Client ID | The client ID of the REST endpoint registered with the OAuth 2.0 authorization server. |
| Client Secret | The client secret of the REST endpoint registered with the OAuth 2.0 authorization server. |
| Scope | The scope parameters that define the permissions an access token grants to the REST endpoint. |
| Credential Vending | Determines if the storage for the REST catalog requires authentication. If credential vending is enabled, the REST catalog is configured to automatically generate the temporary credentials to access the associated storage, and you do not need to provide the storage credentials separately. If credential vending is disabled, you need to provide the storage credentials separately. When credential vending is disabled, the temporary staging directory is not deleted from the table storage location for the update, upsert, and delete operations. |
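The Access Token URL, Client ID, Client Secret, and Scope properties correspond to the inputs of a standard OAuth 2.0 client credentials request. The following Python sketch builds such a request with the standard library so that you can see how the four values fit together. It does not send the request and does not describe how the connector itself obtains the token. The function name, URL, and credential values are hypothetical.

```python
import urllib.parse
import urllib.request


def build_token_request(token_url: str, client_id: str,
                        client_secret: str, scope: str) -> urllib.request.Request:
    """Build (but do not send) an OAuth 2.0 client credentials token request."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; replace them with the values for your catalog.
    request = build_token_request(
        "https://catalog.example.com/oauth/tokens",
        "my-client-id", "my-client-secret", "PRINCIPAL_ROLE:ALL")
    print(request.get_method(), request.full_url)
```

To test the credentials outside the connector, pass the request to urllib.request.urlopen and read the access token from the JSON response.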
Storage types
You can choose Amazon S3 or Microsoft Azure Data Lake Storage Gen2 as the storage type to store the Open Table format tables.
Select the storage type and configure the storage-specific authentication parameters.
Amazon S3
If you use AWS Glue Catalog, Hive Metastore, or REST Catalog as the catalog type, configure the properties specific to Amazon S3 storage.
Permanent IAM Credentials authentication
You can use Permanent IAM Credentials authentication for Amazon S3 storage when you connect to an AWS Glue Catalog, Hive Metastore, or REST Catalog.
The following table describes the properties to configure Permanent IAM Credentials authentication:
| Property | Description |
| --- | --- |
| Access Key | The key to access the AWS Glue Catalog. |
| Secret Key | The secret key to access the AWS Glue Catalog. The secret key is associated with the access key and uniquely identifies the account. |
EC2 Role to Assume Role authentication
You can use EC2 Role to Assume Role authentication for Amazon S3 storage only when you read Apache Iceberg tables from AWS Glue Catalog.
The following table describes the properties to configure EC2 Role to Assume Role authentication:
| Property | Description |
| --- | --- |
| IAM Role ARN | The ARN of the IAM role assumed by the EC2 role to generate the temporary session credentials. |
| External ID | A unique, user-defined string value that the IAM role requires the EC2 role to provide when calling the sts:AssumeRole API. |
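The IAM Role ARN and External ID properties correspond to the RoleArn and ExternalId parameters of the AWS STS AssumeRole API. The following Python sketch validates the ARN shape and assembles the request parameters. The function name, session name, and default duration are assumptions for illustration, and the sketch does not call AWS.

```python
import re
from typing import Optional

# IAM role ARN in the standard, China, or GovCloud partition.
_ROLE_ARN = re.compile(r"arn:(aws|aws-cn|aws-us-gov):iam::\d{12}:role/.+")


def assume_role_params(role_arn: str,
                       external_id: Optional[str] = None,
                       session_name: str = "open-table-session",
                       duration_seconds: int = 3600) -> dict:
    """Return the parameters for an sts:AssumeRole call."""
    if _ROLE_ARN.fullmatch(role_arn) is None:
        raise ValueError(f"Not an IAM role ARN: {role_arn}")
    # AWS STS accepts session durations from 900 to 43200 seconds.
    if not 900 <= duration_seconds <= 43200:
        raise ValueError("duration_seconds must be between 900 and 43200")
    params = {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "DurationSeconds": duration_seconds,
    }
    if external_id is not None:
        # Must match the sts:ExternalId condition in the role's trust policy.
        params["ExternalId"] = external_id
    return params
```

If you use the AWS SDK for Python, you can pass the result as keyword arguments to the assume_role method of an STS client to confirm that the trust policy and the external ID are set up correctly.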
Microsoft Azure Data Lake Storage Gen2
If you use Hive Metastore as the catalog type, configure the properties specific to Microsoft Azure Data Lake Storage Gen2.
Select Service Principal authentication as the authentication type to access Open Table formats in Microsoft Azure Data Lake Storage Gen2.
Service Principal authentication
The following table describes the properties to configure Service Principal authentication:
| Property | Description |
| --- | --- |
| Azure Account Name | The name of the Microsoft Azure Data Lake Storage Gen2 account to stage the files. |
| Azure Client ID | The client ID of your application. Enter the application ID or client ID for your application registered in Azure Active Directory. |
| Azure Client Secret | The client secret for your application. |
| Azure Tenant ID | The directory ID or tenant ID for your application. |
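The four Service Principal properties map to a client credentials token request against the Microsoft identity platform and to the storage account endpoint. The following Python sketch shows where each value goes. The function names and the sample values are hypothetical, and the sketch does not describe how the connector performs authentication internally.

```python
from typing import Dict, Tuple


def azure_token_request(tenant_id: str, client_id: str,
                        client_secret: str) -> Tuple[str, Dict[str, str]]:
    """Return the token endpoint URL and form fields for a service principal."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    fields = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # Scope for Azure Storage, which includes Data Lake Storage Gen2.
        "scope": "https://storage.azure.com/.default",
    }
    return url, fields


def adls_gen2_endpoint(account_name: str) -> str:
    """Return the Data Lake Storage Gen2 (dfs) endpoint of a storage account."""
    return f"https://{account_name}.dfs.core.windows.net"
```

For example, adls_gen2_endpoint("mystorageaccount") returns https://mystorageaccount.dfs.core.windows.net, where the account name is a placeholder.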