Before you create a catalog source, ensure that you have the information required to connect to the source system.
Perform the following tasks:
•Assign the required permissions.
•Configure authentication.
•Configure a connection to the Amazon Redshift source system in Administrator.
•Create endpoint catalog sources for connection assignment.
•Optionally, if you want to identify pairs of similar columns and relationships between tables within a catalog source, import a relationship inference model.
Verify permissions
To extract metadata and to configure other capabilities that a catalog source might include, you need account access and permissions on the source system. The permissions required might vary depending on the capability.
Permissions to extract metadata
Ensure that you have the required permissions to enable metadata extraction.
To connect to an Amazon Redshift catalog source, configure the following permissions:
•Read permission on the Amazon Redshift external source.
•Permissions that allow you to perform the following operations:
•Run the SHOW EXTERNAL TABLE operation on the tables that you want to process.
•Access tables from a specific schema:
- GRANT ALL ON SCHEMA <schema name> TO <user>;
- GRANT ALL ON ALL TABLES IN SCHEMA <schema name> TO <user>;
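As an illustration, the two GRANT statements above can be filled in for a concrete schema and user with a small helper. This is a minimal sketch, not part of the product: the function names `quote_ident` and `build_grants`, the schema name `sales`, and the user name `catalog_svc` are hypothetical, and double-quoting the identifiers is an assumption made so that unusual names stay intact.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Wrap an identifier in double quotes, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_grants(schema: str, user: str) -> List[str]:
    """Return the two schema-level GRANT statements for one schema and user."""
    quoted_schema = quote_ident(schema)
    quoted_user = quote_ident(user)
    return [
        f"GRANT ALL ON SCHEMA {quoted_schema} TO {quoted_user};",
        f"GRANT ALL ON ALL TABLES IN SCHEMA {quoted_schema} TO {quoted_user};",
    ]


if __name__ == "__main__":
    # Hypothetical schema and user names, for illustration only.
    for statement in build_grants("sales", "catalog_svc"):
        print(statement)
```

The helper only builds the statement text. Run the resulting statements in your SQL client with an account that is allowed to grant privileges on the schema.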
Permissions to run data profiles
Ensure that you have the required permissions to run profiles.
To perform data profiling, you need to unload data to the Amazon Redshift source system.
To unload data, configure the following connector permissions:
•ListBucket. Required to view objects from Amazon S3 buckets.
•GetBucketPolicy. Required to get the IAM policy information for access privilege details on Amazon S3 buckets or folders.
•GetObject. Required to read objects from Amazon S3 buckets.
•PutObject. Required to process staging data for Avro and Parquet files.
•DeleteObject. Required to delete staging data of Avro and Parquet files.
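As an illustration, the five permissions above could be expressed as an IAM policy document along the following lines. This is a sketch under stated assumptions, not the policy from the referenced how-to article: the `s3:` prefix is the standard IAM naming for these actions, the bucket name `my-staging-bucket` and the function name `build_staging_policy` are hypothetical, and the split into a bucket-level and an object-level statement follows the usual AWS distinction between bucket and object resources.

```python
import json

# Actions that apply to the bucket itself.
BUCKET_ACTIONS = ["s3:ListBucket", "s3:GetBucketPolicy"]
# Actions that apply to objects stored in the bucket.
OBJECT_ACTIONS = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"]


def build_staging_policy(bucket: str) -> dict:
    """Build an IAM policy document that allows the staging actions on one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": BUCKET_ACTIONS,
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Effect": "Allow",
                "Action": OBJECT_ACTIONS,
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket name, for illustration only.
    print(json.dumps(build_staging_policy("my-staging-bucket"), indent=2))
```

Treat the output as a starting point and compare it with the minimal policy described in the Informatica how-to article before you attach it to a role or user.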
Permissions to perform data classification
You can perform data classification with the permissions required to perform metadata extraction.
Permissions to perform relationship discovery
You can perform relationship discovery with the permissions required to perform metadata extraction.
Permissions to perform glossary association
You can perform glossary association with the permissions required to perform metadata extraction.
Create a connection
Create an Amazon Redshift connection object in Administrator with the connection details of the Amazon Redshift source system.
1. In Administrator, select Connections.
2. Click New Connection.
3. In the Connection Details section, enter the following connection details:
•Connection Name. Name of the connection. Each connection name must be unique within the organization. Connection names can contain alphanumeric characters, spaces, and the following special characters: _ . + -
Maximum length is 255 characters.
•Description. Description of the connection. Maximum length is 4000 characters.
4. Select the Amazon Redshift V2 connection type.
5. Enter properties specific to the Amazon Redshift connection:
•JDBC URL. The JDBC URL to connect to the Amazon Redshift cluster. You can get the JDBC URL from your Amazon Redshift cluster configuration page.
Enter the JDBC URL in the following format:
jdbc:redshift://<cluster_endpoint>:<port_number>/<database_name>
The cluster endpoint includes the Redshift cluster name and region.
For example, jdbc:redshift://infa-rs-cluster.abc.us-west-2.redshift.amazonaws.com:5439/rsdb
In the example:
- infa-rs-cluster is the name of the Redshift cluster.
- us-west-2.redshift.amazonaws.com is the part of the Redshift cluster endpoint that identifies the US West (Oregon) region.
- 5439 is the port number for the Redshift cluster.
- rsdb is the specific database instance in the Redshift cluster to which you want to connect.
6. Select the authentication type.
7. Enter the connection details for the authentication type that you selected:
Default
•Username. User name for your database instance in the Amazon Redshift cluster.
•Password. Password of the Amazon Redshift database user.
•Use EC2 Role to Assume Role. For instructions, see Generate temporary security credentials using AssumeRole for EC2 in the Data Integration Connectors help.
•S3 IAM Role ARN. The Amazon Resource Name (ARN) of the IAM role that the IAM user or EC2 instance assumes to use the dynamically generated temporary security credentials to stage data in Amazon S3.
This property applies when you want to generate temporary security credentials to access the S3 staging buckets by using either the EC2 instance or the IAM user who assumes the S3 IAM role.
Specify the S3 IAM role name to use the temporary security credentials to access the Amazon S3 staging bucket.
For more information about how to get the ARN of the S3 IAM role, see the AWS documentation.
Redshift IAM Authentication via AssumeRole
•Username. User name for your database instance in the Amazon Redshift cluster.
•Cluster Identifier. The unique identifier of the cluster that hosts Amazon Redshift. Specify the Amazon Redshift cluster name.
•Database Name. Name of the Amazon Redshift database where the tables that you want to access are stored.
•Redshift IAM Role ARN. The Amazon Resource Name (ARN) of the IAM role that the EC2 instance assumes to use the dynamically generated temporary security credentials to access Amazon Redshift.
Enter the Redshift IAM role ARN to access the Amazon Redshift cluster.
•Use EC2 Role to Assume Role.
•S3 IAM Role ARN. The Amazon Resource Name (ARN) of the S3 IAM role that the IAM user or EC2 instance assumes to use the dynamically generated temporary security credentials to stage data in Amazon S3.
This property applies when you want to generate temporary security credentials to access the S3 staging buckets by using either the EC2 instance or the IAM user who assumes the S3 IAM role.
Specify the S3 IAM role name to use the temporary security credentials to access the Amazon S3 staging bucket.
For more information about how to get the ARN of the IAM role, see the AWS documentation.
8. Click Test Connection.
9. Click Save.
For more information about the Amazon Redshift connector, see "Create a Minimal Amazon IAM Policy" in the following How-to article: Configuring AWS IAM Authentication for Amazon Redshift and Amazon Redshift V2 Connectors
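The connection name rules and the JDBC URL format described in the steps above can be checked before you enter the values in Administrator. The following is a minimal sketch, not product behavior: the function names `is_valid_connection_name` and `parse_jdbc_url` are hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, an empty name is assumed to be invalid, and only the basic URL form without extra driver options is handled.

```python
import re

# Letters, digits, spaces, and the special characters _ . + -, up to 255 characters.
NAME_PATTERN = re.compile(r"[A-Za-z0-9 _.+\-]{1,255}")

# jdbc:redshift://<cluster_endpoint>:<port_number>/<database_name>
JDBC_PATTERN = re.compile(
    r"jdbc:redshift://(?P<endpoint>[^:/]+):(?P<port>\d+)/(?P<database>[^/?;]+)"
)


def is_valid_connection_name(name: str) -> bool:
    """Return True if the name uses only allowed characters and fits the length limit."""
    return NAME_PATTERN.fullmatch(name) is not None


def parse_jdbc_url(url: str) -> dict:
    """Split a basic Redshift JDBC URL into endpoint, port, and database name."""
    match = JDBC_PATTERN.fullmatch(url)
    if match is None:
        raise ValueError(f"Not a basic Redshift JDBC URL: {url!r}")
    return {
        "endpoint": match["endpoint"],
        "port": int(match["port"]),
        "database": match["database"],
    }


if __name__ == "__main__":
    # The example URL from the JDBC URL property description.
    parts = parse_jdbc_url(
        "jdbc:redshift://infa-rs-cluster.abc.us-west-2.redshift.amazonaws.com:5439/rsdb"
    )
    print(parts)
    print(is_valid_connection_name("Redshift prod_1.0"))
```

A check like this catches typos early, but Test Connection in Administrator remains the authoritative validation.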
Create endpoint catalog sources for connection assignment
An endpoint catalog source represents a source system that the catalog source references. Before you perform connection assignment, create endpoint catalog sources and run the catalog source jobs.
You can then perform connection assignment to reference source systems to view complete lineage with source system objects.
Import a relationship inference model
Import a relationship inference model if you want to configure the relationship discovery capability. You can either import a predefined relationship inference model, or import a model file from your local machine.
1. In Metadata Command Center, click Explore on the navigation panel.
2. Expand the menu and select Relationship Inference Model. The following image shows the Explore page with the Relationship Inference Model menu:
3. Select one of the following options:
- Import Predefined Content. Imports a predefined relationship inference model called Column Similarity Model v1.0.
- Import. Imports the predefined relationship inference model from your local machine. Select this option if you previously imported predefined content into your local machine and the inference model is stored on that machine.
To import a file, click Choose File in the Import Relationship Inference Model window and navigate to the model file on your local machine. You can also drag and drop the file.
The imported models appear in the list of relationship inference models on the Relationship Discovery tab.