Let's configure the Open Table connection properties to connect to AWS Glue Catalog or Hive Metastore.
Before you begin
Before you get started, you will need to add the Amazon Athena or Hive JDBC driver to the Secure Agent machine and configure the authentication-specific prerequisites.
Permanent IAM Credentials authentication requires the access key and secret key values of the IAM user. Keep the access key and secret key handy before creating the connection. For more information about creating an access key and secret key, see the AWS documentation.
To configure Service Principal authentication, you need the Azure account name, client secret, client ID, and tenant ID for your application registered in Azure Active Directory. Keep these values handy before creating the connection. For more information about obtaining them, see the Microsoft Azure documentation.
Check out Prerequisites to learn more about how to configure the policies and roles to access Apache Iceberg or Delta Lake tables.
Open Table formats with associated catalog and storage types
You can choose the Open Table format that you want to use and its associated catalog type and storage type to interact with data.
The following table summarizes the Open Table formats that you can use, their catalog types, storage types, and the authentication options available for each storage type:
| Open Table format | Catalog type | Catalog authentication type | Storage type | Storage authentication type |
|---|---|---|---|---|
| Apache Iceberg | AWS Glue Catalog | None | Amazon S3 | Permanent IAM Credentials authentication; EC2 Role to Assume Role |
| Apache Iceberg | Hive Metastore | None | Amazon S3 | Permanent IAM Credentials authentication |
| Apache Iceberg | Hive Metastore | None | Microsoft Azure Data Lake Storage Gen2 | Service Principal authentication |
| Apache Iceberg | REST Catalog | OAuth 2.0 Client Credentials | Amazon S3 | Permanent IAM Credentials authentication |
| Delta Lake | AWS Glue Catalog | None | Amazon S3 | Permanent IAM Credentials authentication |
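The supported combinations can also be expressed as a lookup, which is handy when you script connection checks. The following is a minimal sketch in Python that transcribes the table; the `SUPPORTED` dictionary and the `storage_auth_options` function are hypothetical names for illustration and are not part of the product.

```python
# Supported combinations, transcribed from the summary table.
# Key: (Open Table format, catalog type, storage type)
# Value: (catalog authentication type, storage authentication types)
SUPPORTED = {
    ("Apache Iceberg", "AWS Glue Catalog", "Amazon S3"): (
        "None",
        ["Permanent IAM Credentials", "EC2 Role to Assume Role"],
    ),
    ("Apache Iceberg", "Hive Metastore", "Amazon S3"): (
        "None",
        ["Permanent IAM Credentials"],
    ),
    ("Apache Iceberg", "Hive Metastore", "Microsoft Azure Data Lake Storage Gen2"): (
        "None",
        ["Service Principal"],
    ),
    ("Apache Iceberg", "REST Catalog", "Amazon S3"): (
        "OAuth 2.0 Client Credentials",
        ["Permanent IAM Credentials"],
    ),
    ("Delta Lake", "AWS Glue Catalog", "Amazon S3"): (
        "None",
        ["Permanent IAM Credentials"],
    ),
}


def storage_auth_options(table_format, catalog_type, storage_type):
    """Return the storage authentication types for a combination.

    Returns an empty list when the combination is not in the table.
    """
    entry = SUPPORTED.get((table_format, catalog_type, storage_type))
    return list(entry[1]) if entry else []
```

For example, `storage_auth_options("Delta Lake", "Hive Metastore", "Amazon S3")` returns an empty list because the table does not list that combination.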
Connection details
The following table describes the Open Table connection properties:
Property
Description
Connection Name
Name of the connection.
Each connection name must be unique within the organization. Connection names can contain alphanumeric characters, spaces, and the following special characters: _ . + -,
Maximum length is 255 characters.
Description
Description of the connection. Maximum length is 4000 characters.
Type
Open Table
Runtime Environment
The name of the runtime environment where you want to run tasks.
Select a Secure Agent, Hosted Agent, or serverless runtime environment.
Open Table Format
The Open Table format that you want to use to read data from or write data to a catalog.
Select Apache Iceberg or Delta Lake from the list.
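If you generate connection names programmatically, you might want to check them against the naming rules before you create the connection. The following Python sketch is an illustration only. It assumes that "alphanumeric" means ASCII letters and digits and that the permitted special characters are underscore, period, plus sign, and hyphen. The function name is hypothetical, and the sketch does not check uniqueness within the organization.

```python
import re

# Assumption: ASCII letters and digits, spaces, and the characters _ . + -
_NAME_PATTERN = re.compile(r"[A-Za-z0-9 _.+\-]+")

MAX_NAME_LENGTH = 255          # limit stated for Connection Name
MAX_DESCRIPTION_LENGTH = 4000  # limit stated for Description


def is_valid_connection_name(name: str) -> bool:
    """Check the length and the character set of a connection name."""
    if len(name) > MAX_NAME_LENGTH:
        return False
    # fullmatch requires every character to be in the allowed set
    # and rejects the empty string.
    return _NAME_PATTERN.fullmatch(name) is not None


def is_valid_description(description: str) -> bool:
    """Check only the maximum length of a connection description."""
    return len(description) <= MAX_DESCRIPTION_LENGTH
```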
Catalog types
You can select AWS Glue Catalog, Hive Metastore, or REST Catalog as the catalog type to manage the metadata of the Open Table format that you selected.
Select the catalog type that your Open Table format uses and then configure the catalog-specific parameters.
AWS Glue Catalog
If the Apache Iceberg or Delta Lake Open Table format uses AWS Glue Catalog as the catalog type, configure the properties specific to AWS Glue Catalog.
The following table describes the property to configure AWS Glue Catalog:
For example, jdbc:athena://Region=us-west-1;OutputLocation=s3://working/dir.
Catalog Authentication Type
The authentication method to connect to the catalog.
Select one of the following options:
- None. Connects to AWS Glue Catalog or Hive Metastore without any authentication credentials.
- OAuth 2.0 Client Credentials. Connects to a REST catalog using a Client ID and Client Secret to obtain an access token from the authorization server.
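The JDBC URL example for AWS Glue Catalog consists of a `jdbc:athena://` prefix followed by `Key=Value` pairs separated by semicolons. The following Python sketch assembles and parses a URL of that shape. It mirrors only the two keys shown in the example, and the function names are hypothetical. See the Amazon Athena JDBC driver documentation for the full list of supported properties.

```python
PREFIX = "jdbc:athena://"


def build_jdbc_url(properties: dict) -> str:
    """Join Key=Value pairs with semicolons after the jdbc:athena:// prefix."""
    return PREFIX + ";".join(f"{key}={value}" for key, value in properties.items())


def parse_jdbc_url(url: str) -> dict:
    """Split a URL of the same shape back into its Key=Value pairs."""
    if not url.startswith(PREFIX):
        raise ValueError(f"URL must start with {PREFIX}")
    properties = {}
    for item in url[len(PREFIX):].split(";"):
        if not item:
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only, so values may contain '='.
        key, _, value = item.partition("=")
        properties[key] = value
    return properties


url = build_jdbc_url({"Region": "us-west-1", "OutputLocation": "s3://working/dir"})
```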
Hive Metastore
If the Apache Iceberg Open Table format uses Hive Metastore as the catalog type, configure the properties specific to Hive Metastore.
The following table describes the properties to configure Hive Metastore:
Property
Description
Hive Metastore URI
The Hive thrift server URL to connect to Hive Metastore.
Hive JDBC URL
The JDBC URL to connect to the Hive 4 server.
Hive User Name
The user name of your Hive account to connect to Hive Metastore.
Hive Password
The password of your Hive account to connect to Hive Metastore.
Catalog Authentication Type
The authentication method to connect to the catalog.
Select one of the following options:
- None. Connects to AWS Glue Catalog or Hive Metastore without any authentication credentials.
- OAuth 2.0 Client Credentials. Connects to a REST catalog using a Client ID and Client Secret to obtain an access token from the authorization server.
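A quick sanity check of the two Hive URLs can catch typos before you test the connection. The following Python sketch assumes the commonly used forms `thrift://<host>:<port>` for the metastore URI and `jdbc:hive2://<host>:<port>/<database>` for the HiveServer2 JDBC URL. These forms, the host names, and the function names are assumptions for illustration, so confirm the exact format that your Hive deployment expects.

```python
from urllib.parse import urlparse


def _has_host_and_port(parsed) -> bool:
    """Return True when the parsed URL has a host name and a numeric port."""
    try:
        return bool(parsed.hostname) and parsed.port is not None
    except ValueError:
        # urlparse raises ValueError when the port is not a valid number.
        return False


def is_metastore_uri(uri: str) -> bool:
    """Check for the assumed form thrift://<host>:<port>."""
    parsed = urlparse(uri)
    return parsed.scheme == "thrift" and _has_host_and_port(parsed)


def is_hive_jdbc_url(url: str) -> bool:
    """Check for the assumed form jdbc:hive2://<host>:<port>[/<database>]."""
    if not url.startswith("jdbc:"):
        return False
    # Strip the jdbc: prefix so that urlparse sees hive2:// as the scheme.
    parsed = urlparse(url[len("jdbc:"):])
    return parsed.scheme == "hive2" and _has_host_and_port(parsed)
```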
REST Catalog
If the Apache Iceberg Open Table format uses REST catalog as the catalog type, configure the properties specific to REST Catalog.
The following table describes the properties to configure REST catalog:
Property
Description
REST Catalog Type
The type of REST catalog that you want to connect to.
Select Polaris Catalog.
Catalog Endpoint
The endpoint URL of the REST catalog.
Catalog Authentication Type
The authentication method to connect to the catalog.
Select one of the following options:
- None. Connects to an AWS Glue Catalog or a Hive Metastore without any authentication credentials.
- OAuth 2.0 Client Credentials. Connects to a REST catalog using a client ID and client secret to obtain an access token from the OAuth 2.0 authorization server.
Access Token URL
The URL provided by the OAuth 2.0 authorization server to obtain an access token.
Client ID
The client ID of the REST endpoint registered with the OAuth 2.0 authorization server.
Client Secret
The client secret of the REST endpoint registered with the OAuth 2.0 authorization server.
Scope
The scope parameters that define the permissions an access token grants to the REST endpoint.
Credential Vending
Determines whether you need to provide credentials for the storage associated with the REST catalog.
If credential vending is enabled, the REST catalog automatically generates temporary credentials to access the associated storage. You do not need to provide the storage credentials separately.
If credential vending is disabled, you need to provide the storage credentials separately.
When credential vending is disabled, the temporary staging directory is not deleted from the table storage location for the update, upsert, and delete operations.
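To see how the Access Token URL, Client ID, Client Secret, and Scope properties fit together, it can help to look at the shape of a standard OAuth 2.0 client credentials request. The following Python sketch builds such a request without sending it. It illustrates the generic OAuth 2.0 flow and is not a description of how the connector calls the authorization server internally. The URL, credentials, and scope value are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(access_token_url, client_id, client_secret, scope):
    """Build (but do not send) an OAuth 2.0 client credentials token request."""
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": scope,
        }
    ).encode("ascii")
    return Request(
        access_token_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder values; replace them with the values registered
# with your OAuth 2.0 authorization server.
request = build_token_request(
    "https://auth.example.com/oauth/tokens",
    "my-client-id",
    "my-client-secret",
    "example-scope",
)
```

Passing the request to `urllib.request.urlopen` would send it. A successful response from a standard authorization server is a JSON document that contains an `access_token` field.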
Storage types
You can choose Amazon S3 or Microsoft Azure Data Lake Storage Gen2 as the storage type to store the Open Table format tables.
Select the storage type and configure the storage-specific authentication parameters.
Amazon S3
If you use AWS Glue Catalog, Hive Metastore, or REST Catalog as the catalog type, configure the properties specific to Amazon S3 storage.
Permanent IAM Credentials authentication
You can use Permanent IAM Credentials authentication for Amazon S3 storage when you connect to an AWS Glue Catalog, Hive Metastore, or REST Catalog.
The following table describes the properties to configure Permanent IAM Credentials authentication:
Property
Description
Access Key
The key to access the AWS Glue Catalog.
Secret Key
The secret key to access the AWS Glue Catalog. The secret key is associated with the access key and uniquely identifies the account.
EC2 Role to Assume Role authentication
You can use EC2 Role to Assume Role authentication for Amazon S3 storage only when you read Apache Iceberg tables from AWS Glue Catalog.
The following table describes the properties to configure EC2 Role to Assume Role authentication:
Property
Description
EC2 Role
The ARN of the IAM role assumed by the EC2 role to generate the temporary session credentials.
External ID
A unique, user-defined string value that the IAM role requires the EC2 role to provide when calling the sts:AssumeRole API.
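The EC2 Role and External ID properties correspond to the `RoleArn` and `ExternalId` parameters of the AWS STS AssumeRole call. The following Python sketch assembles those parameters so that you can verify a role outside the connection, for example with boto3. The function name, the session name, the ARN, and the external ID are hypothetical values for illustration.

```python
def assume_role_params(role_arn, external_id=None, session_name="open-table-session"):
    """Assemble the keyword arguments for an STS AssumeRole call."""
    params = {"RoleArn": role_arn, "RoleSessionName": session_name}
    if external_id:
        # Include ExternalId only when the role's trust policy requires it.
        params["ExternalId"] = external_id
    return params


params = assume_role_params(
    "arn:aws:iam::123456789012:role/example-open-table-role",  # placeholder ARN
    external_id="example-external-id",                          # placeholder value
)

# With boto3 installed and run on the EC2 instance, the temporary
# session credentials could be requested like this:
#   import boto3
#   credentials = boto3.client("sts").assume_role(**params)["Credentials"]
```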
Microsoft Azure Data Lake Storage Gen2
If you use Hive Metastore as the catalog type, configure the properties specific to Microsoft Azure Data Lake Storage Gen2.
Select Service Principal authentication as the authentication type to access Open Table formats in Microsoft Azure Data Lake Storage Gen2.
Service Principal authentication
The following table describes the properties to configure Service Principal authentication:
Property
Description
Azure Account Name
The name of the Microsoft Azure Data Lake Storage Gen2 account to stage the files.
Azure Client ID
The client ID of your application.
Enter the application ID or client ID for your application registered in Azure Active Directory.
Azure Client Secret
The client secret for your application.
Azure Tenant ID
The directory ID or tenant ID for your application.
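The four Service Principal properties can be kept together when you prepare or validate them outside the connection. The following Python sketch groups them in a small class and derives two endpoints from them. It assumes the public Azure cloud naming conventions for the token endpoint and the Data Lake Storage Gen2 endpoint, so sovereign or private clouds might differ. The class name and all values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServicePrincipal:
    """The values entered in the Service Principal authentication properties."""

    account_name: str
    client_id: str
    client_secret: str
    tenant_id: str

    def token_endpoint(self) -> str:
        # Assumption: public Azure cloud OAuth 2.0 token endpoint.
        return f"https://login.microsoftonline.com/{self.tenant_id}/oauth2/v2.0/token"

    def storage_endpoint(self) -> str:
        # Assumption: public Azure cloud Data Lake Storage Gen2 (dfs) endpoint.
        return f"https://{self.account_name}.dfs.core.windows.net"


principal = ServicePrincipal(
    account_name="examplestorage",
    client_id="00000000-0000-0000-0000-000000000001",
    client_secret="example-secret",
    tenant_id="00000000-0000-0000-0000-000000000002",
)
```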