
Open Table connection properties

Create an Open Table connection to securely load data to Open Table formats available in a catalog.

Prerequisites

Before you create an Open Table connection, complete the prerequisites.

Using AWS Glue Catalog and Amazon S3 Storage to interact with Apache Iceberg or Delta Lake tables

If you use an AWS Glue Catalog and Amazon S3 Storage to interact with Apache Iceberg or Delta Lake tables, you need to have access to the following AWS services that manage the tables on AWS:
You need to create separate policies to access these services.

Using Hive Metastore catalog and Microsoft Azure Data Lake Storage Gen2 to interact with Apache Iceberg tables

If you use a Hive Metastore catalog and Microsoft Azure Data Lake Storage Gen2 to interact with Apache Iceberg tables, you need to have access to the following services that manage the tables on Microsoft Azure Data Lake Storage Gen2:

Using Hive Metastore catalog and Amazon S3 storage to interact with Apache Iceberg tables

If you use a Hive Metastore catalog and Amazon S3 storage to interact with Apache Iceberg tables, you need to have access to the following services that manage the tables on Amazon S3 storage:

Using REST catalog and Amazon S3 to interact with Apache Iceberg tables

If you use a REST catalog such as Polaris catalog and Amazon S3 storage to interact with Apache Iceberg tables, you need to have access to the following services that manage the tables on Amazon S3 storage:

Create minimal IAM policies

You need to create IAM policies with the minimum required permissions to interact with Apache Iceberg or Delta Lake tables managed by AWS Glue Catalog. For more information on configuring these policies, refer to the AWS documentation.
Minimum policy for Amazon Athena
The following sample policy shows the minimal Amazon IAM policy to access Amazon Athena:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "athena:CreatePreparedStatement",
        "athena:GetPreparedStatement",
        "athena:GetWorkGroup",
        "athena:GetTableMetadata",
        "athena:StartQueryExecution",
        "athena:GetQueryResultsStream",
        "athena:ListDatabases",
        "athena:GetQueryExecution",
        "athena:GetQueryResults",
        "athena:GetDatabase",
        "athena:ListTableMetadata",
        "athena:GetDataCatalog",
        "athena:DeletePreparedStatement"
      ],
      "Resource": [
        "arn:aws:athena:*:*:workgroup/*",
        "arn:aws:athena:*:*:datacatalog/*"
      ]
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": [
        "athena:ListDataCatalogs",
        "athena:GetQueryExecution",
        "athena:ListWorkGroups",
        "athena:GetPreparedStatement"
      ],
      "Resource": "*"
    }
  ]
}
Minimum policy for AWS Glue
The following sample policy shows the minimal Amazon IAM policy to access AWS Glue Catalog:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "glue:*"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
Minimum policy for Amazon S3
The following sample policy shows the minimal Amazon IAM policy to read from or write data to an Amazon S3 bucket:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:DeleteObject"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListAllMyBuckets",
        "s3:GetBucketAcl"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
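
The sample Amazon S3 policy above grants its actions on every resource ("*"). If your organization prefers to limit access to a single bucket, the same actions can be split by the resource type each one applies to. The following Python sketch is only an illustration: the function name build_scoped_s3_policy and the bucket name are hypothetical, and this document does not confirm that the connector works with the narrower scope, so test it in your environment before relying on it.

```python
import json


def build_scoped_s3_policy(bucket_name: str) -> dict:
    """Return a policy with the same S3 actions as the sample, limited to one bucket.

    Hypothetical helper for illustration; not part of the product.
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level actions apply to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
            {
                # Bucket-level actions apply to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation", "s3:GetBucketAcl"],
                "Resource": bucket_arn,
            },
            {
                # ListAllMyBuckets is not tied to a single bucket, so it keeps "*".
                "Effect": "Allow",
                "Action": ["s3:ListAllMyBuckets"],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    # "example-open-table-bucket" is a placeholder bucket name.
    print(json.dumps(build_scoped_s3_policy("example-open-table-bucket"), indent=2))
```

The output can be pasted into the IAM policy editor in place of the sample policy.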

Install the JDBC driver

Before you use Open Table Connector, you need to copy the Amazon Athena or Hive JDBC driver to the Linux machine where you installed the Secure Agent. Use the Amazon Athena driver for the AWS Glue Catalog and the Hive JDBC driver for the Hive Metastore catalog.
    1. Download the latest Hive JDBC driver from the website.
    2. Navigate to the following directory on the Secure Agent machine: <Secure Agent installation directory>/ext/connectors/thirdparty/
    3. Create the following folder: informatica.opentableformat/common
    4. Add the JDBC driver to the folder.
    5. Restart the Secure Agent.
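
The folder creation and copy steps above can be scripted. The following Python sketch is a minimal illustration, assuming that you have already downloaded the driver JAR and that you know the Secure Agent installation directory. The function name install_jdbc_driver and the example paths are hypothetical. The script does not restart the Secure Agent, so you still need to do that step yourself.

```python
import shutil
from pathlib import Path


def install_jdbc_driver(agent_home: Path, driver_jar: Path) -> Path:
    """Copy a JDBC driver JAR into the Open Table connector folder.

    agent_home is the Secure Agent installation directory.
    Hypothetical helper for illustration; not part of the product.
    """
    target_dir = (
        agent_home
        / "ext"
        / "connectors"
        / "thirdparty"
        / "informatica.opentableformat"
        / "common"
    )
    # Create informatica.opentableformat/common if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    # Copy the JAR and return the path of the copied file.
    return Path(shutil.copy2(driver_jar, target_dir))


if __name__ == "__main__":
    # Both paths are placeholders; replace them with your own locations.
    copied = install_jdbc_driver(
        Path("/opt/secure-agent"), Path("/tmp/jdbc-driver.jar")
    )
    print(f"Driver copied to {copied}. Restart the Secure Agent to load it.")
```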

Configure EC2 role to assume role

You can configure an EC2 role to assume an IAM role and generate temporary security credentials to connect to Amazon S3 from the same or different AWS accounts.
When you configure EC2 role to assume role, ensure that you have the sts:AssumeRole permission and a trust relationship established within the AWS accounts to use the temporary security credentials. The trust relationship is defined in the trust policy of the IAM role when you create the role. The IAM role adds the EC2 role as a trusted entity allowing the EC2 role to use the temporary security credentials and access the AWS accounts.
When the trusted EC2 role requests temporary security credentials, the AWS Security Token Service (AWS STS) dynamically generates credentials that are valid for a specified period and provides them to the trusted EC2 role.
Before you use the EC2 Role to Assume Role authentication, consider the following prerequisites:
For more information about the minimum permission policies, see Create minimal IAM policies.
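
The trust relationship and the sts:AssumeRole permission described in this section are two separate policy documents. The following Python sketch shows one possible shape for each, using the standard IAM policy grammar. The function names and the example ARNs are hypothetical placeholders, and your organization might require additional conditions that are not shown here.

```python
import json


def build_trust_policy(ec2_role_arn: str) -> dict:
    """Trust policy for the IAM role: names the EC2 role as a trusted entity."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": ec2_role_arn},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def build_assume_role_permission(iam_role_arn: str) -> dict:
    """Permission policy for the EC2 role: allows it to assume the IAM role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": iam_role_arn,
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder ARNs; the account IDs and role names are made up.
    ec2_role = "arn:aws:iam::111122223333:role/example-ec2-role"
    iam_role = "arn:aws:iam::444455556666:role/example-s3-access-role"
    print(json.dumps(build_trust_policy(ec2_role), indent=2))
    print(json.dumps(build_assume_role_permission(iam_role), indent=2))
```

The first document is attached to the IAM role as its trust policy. The second is attached to the EC2 role as a permission policy.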

Connect to Open Table

Let's configure the Open Table connection properties to connect to AWS Glue Catalog, Hive Metastore, or Polaris REST catalog.

Before you begin

Before you get started, you will need to create the minimal IAM policies to interact with Apache Iceberg or Delta Lake tables managed by AWS Glue Catalog and install the Hive JDBC driver for Hive Metastore. You also need to configure the authentication-specific prerequisites to connect to Amazon S3 or Microsoft Azure Data Lake Storage Gen2 storage.
Permanent IAM Credentials authentication for Amazon S3 requires the access key and secret key values of the IAM user. EC2 Role to Assume Role authentication for Amazon S3 requires the ARN of the IAM role that the EC2 role assumes to generate temporary security credentials.
To configure Service Principal authentication for Microsoft Azure Data Lake Storage Gen2, you need the Azure account name, client secret, client ID, and tenant ID for your application registered in Azure Active Directory.
Check out Prerequisites to learn more about how to configure policies and roles to access Apache Iceberg or Delta Lake tables.
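
As a quick pre-flight check, you can list the values that each storage authentication type needs and confirm that you have collected them before you open the connection page. The following Python sketch is an illustration only. The field names, such as access_key and iam_role_arn, are hypothetical labels chosen for this example and are not the connection property names.

```python
from typing import Dict, List, Mapping, Tuple

# Values required by each storage authentication type, as described above.
# The field names are illustrative labels, not product property names.
REQUIRED_FIELDS: Dict[str, Tuple[str, ...]] = {
    "Permanent IAM Credentials": ("access_key", "secret_key"),
    "EC2 Role to Assume Role": ("iam_role_arn",),
    "Service Principal": (
        "account_name",
        "client_id",
        "client_secret",
        "tenant_id",
    ),
}


def missing_fields(auth_type: str, values: Mapping[str, str]) -> List[str]:
    """Return the required fields that are absent or empty for auth_type."""
    if auth_type not in REQUIRED_FIELDS:
        raise ValueError(f"Unknown authentication type: {auth_type}")
    return [name for name in REQUIRED_FIELDS[auth_type] if not values.get(name)]


if __name__ == "__main__":
    collected = {"account_name": "examplestorage", "client_id": "example-id"}
    print(missing_fields("Service Principal", collected))
```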

Open Table formats with associated catalog and storage types

You can choose the Open Table format that you want to use and its associated catalog type and storage type to interact with data.
The following table summarizes the Open Table formats that you can use, their catalog types, storage types, and the authentication options available for each storage type:
| Open Table format | Catalog type | Catalog authentication type | Storage type | Storage authentication type |
|---|---|---|---|---|
| Apache Iceberg | AWS Glue Catalog* | None | Amazon S3* | Permanent IAM Credentials authentication; EC2 Role to Assume Role |
| Apache Iceberg | Hive Metastore | None | Amazon S3 | Permanent IAM Credentials authentication |
| Apache Iceberg | Hive Metastore | None | Microsoft Azure Data Lake Storage Gen2 | Service Principal authentication |
| Apache Iceberg | REST Catalog | OAuth 2.0 Credentials | Amazon S3 | Permanent IAM Credentials authentication |
| Delta Lake | AWS Glue Catalog | None | Amazon S3 | Permanent IAM Credentials authentication |

*Apache Iceberg tables managed by the AWS Glue Catalog with Amazon S3 storage apply to both mappings and mappings in advanced mode.
Open Table formats with other catalog and storage types apply only to mappings in advanced mode.
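
If you generate or review connection settings programmatically, the combinations in the table above can be encoded as a lookup. The following Python sketch is a minimal illustration. The names SUPPORTED_COMBINATIONS and storage_auth_options are hypothetical, and the data is transcribed from the table, so update it if the supported combinations change.

```python
from typing import Dict, FrozenSet, Tuple

# (Open Table format, catalog type, storage type) -> storage authentication types.
SUPPORTED_COMBINATIONS: Dict[Tuple[str, str, str], FrozenSet[str]] = {
    ("Apache Iceberg", "AWS Glue Catalog", "Amazon S3"): frozenset(
        {"Permanent IAM Credentials authentication", "EC2 Role to Assume Role"}
    ),
    ("Apache Iceberg", "Hive Metastore", "Amazon S3"): frozenset(
        {"Permanent IAM Credentials authentication"}
    ),
    (
        "Apache Iceberg",
        "Hive Metastore",
        "Microsoft Azure Data Lake Storage Gen2",
    ): frozenset({"Service Principal authentication"}),
    ("Apache Iceberg", "REST Catalog", "Amazon S3"): frozenset(
        {"Permanent IAM Credentials authentication"}
    ),
    ("Delta Lake", "AWS Glue Catalog", "Amazon S3"): frozenset(
        {"Permanent IAM Credentials authentication"}
    ),
}


def storage_auth_options(table_format: str, catalog: str, storage: str) -> FrozenSet[str]:
    """Return the storage authentication types for a combination.

    An empty set means the combination does not appear in the table.
    """
    return SUPPORTED_COMBINATIONS.get((table_format, catalog, storage), frozenset())


if __name__ == "__main__":
    print(sorted(storage_auth_options("Apache Iceberg", "AWS Glue Catalog", "Amazon S3")))
    print(sorted(storage_auth_options("Delta Lake", "Hive Metastore", "Amazon S3")))
```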

Connection details

The following table describes the Open Table connection properties:
| Property | Description |
|---|---|
| Connection Name | Name of the connection. Each connection name must be unique within the organization. Connection names can contain alphanumeric characters, spaces, and the following special characters: _ . + -, Maximum length is 255 characters. |
| Description | Description of the connection. Maximum length is 4000 characters. |
| Use Secret Vault | Stores sensitive credentials for this connection in the secrets manager that is configured for your organization. This property appears only if secrets manager is set up for your organization. This property is not supported by Data Ingestion and Replication and the Data Access Management services. When you enable the secret vault in the connection, you can select which credentials the Secure Agent retrieves from the secrets manager. If you don't enable this option, the credentials are stored in the repository or on a local Secure Agent, depending on how your organization is configured. Note: If you’re using this connection to apply data access policies through pushdown or proxy services, you cannot use the Secret Vault configuration option. For information about how to configure and use a secrets manager, see Secrets manager configuration. |
| Runtime Environment | The name of the runtime environment where you want to run tasks. You cannot run a database ingestion task on a Hosted Agent or in a serverless runtime environment. |
| Open Table Format | The Open Table format that you want to use to read from or write data to a catalog. Select Apache Iceberg or Delta Lake from the list. |
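
The naming rules for the Connection Name property can be checked before you create a connection. The following Python sketch is a rough illustration under two assumptions that this document does not confirm: that "alphanumeric" means ASCII letters and digits, and that the trailing comma in the list of special characters is itself an allowed character. The function name is_valid_connection_name is hypothetical, and the check does not cover uniqueness within the organization.

```python
import re

MAX_NAME_LENGTH = 255

# Assumption: ASCII letters and digits, spaces, and the characters _ . + - ,
_NAME_PATTERN = re.compile(r"[A-Za-z0-9 _.+\-,]+")


def is_valid_connection_name(name: str) -> bool:
    """Return True if name is non-empty, within the length limit, and uses only allowed characters."""
    if len(name) > MAX_NAME_LENGTH:
        return False
    return _NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("OpenTable_Glue-S3", "bad/name", "x" * 256):
        print(repr(candidate[:20]), is_valid_connection_name(candidate))
```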

Catalog types

You can select AWS Glue Catalog, Hive Metastore, or REST Catalog as the catalog type to manage the metadata of the Open Table format that you selected.
Select the catalog type that your Open Table format uses and then configure the catalog-specific parameters.

Storage types

You can choose Amazon S3 or Microsoft Azure Data Lake Storage Gen2 as the storage type to store the Open Table format tables.
Select the storage type and configure the storage-specific authentication parameters.