Before you create a catalog source, ensure that you have the information required to connect to the source system.
Perform the following tasks:
• Assign the required permissions.
• Configure authentication.
• Configure a connection to the Amazon Redshift source system in Administrator.
• Create endpoint catalog sources for connection assignment.
• Optionally, if you want to identify pairs of similar columns and relationships between tables within a catalog source, import a relationship inference model.
Verify permissions
To extract metadata and to configure other capabilities that a catalog source might include, you need account access and permissions on the source system. The permissions required might vary depending on the capability.
Permissions to extract metadata
Ensure that you have the required permissions to enable metadata extraction.
To configure a connection to an Amazon Redshift catalog source, ensure that you have the following permissions:
• Read permission on the Amazon Redshift external source.
• Permission to run the SHOW EXTERNAL TABLE operation on the tables that you want to process.
• Permissions to access tables in a specific schema:
- GRANT USAGE ON SCHEMA <Schema name> TO <User>;
- GRANT SELECT ON ALL TABLES IN SCHEMA <Schema name> TO <User>;
Optionally, to obtain more detailed results, grant the following permission:
• SELECT on pg_catalog.PG_DATABASE
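The schema grants above can be templated for each schema and user that you want to catalog. The following is a minimal sketch; the schema and user names are hypothetical examples, and you would run the resulting statements against the cluster with your preferred SQL client:

```python
# Generate the GRANT statements required for metadata extraction on one schema.
# Schema and user names passed in here are hypothetical examples.
def metadata_grants(schema: str, user: str, detailed: bool = False) -> list:
    stmts = [
        f"GRANT USAGE ON SCHEMA {schema} TO {user};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {user};",
    ]
    if detailed:
        # Optional grant for more detailed results
        stmts.append(f"GRANT SELECT ON pg_catalog.PG_DATABASE TO {user};")
    return stmts

for stmt in metadata_grants("sales", "catalog_user", detailed=True):
    print(stmt)
```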
Permissions to run data profiles
Ensure that you have the required permissions to run profiles.
To perform data profiling, the connector unloads data from the Amazon Redshift source system to an Amazon S3 staging location.
To unload data, configure the following connector permissions:
• ListBucket. Required to view objects in Amazon S3 buckets.
• GetBucketPolicy. Required to get the IAM policy information for access privilege details on Amazon S3 buckets or folders.
• GetObject. Required to read objects from Amazon S3 buckets.
• PutObject. Required to write staging data for Avro and Parquet files.
• DeleteObject. Required to delete staging data of Avro and Parquet files.
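The S3 actions listed above map onto an IAM policy attached to the identity that performs the unload. The following is a minimal sketch of such a policy; the bucket name my-redshift-staging is a hypothetical example:

```python
import json

# Sketch of an IAM policy granting the S3 actions listed above.
# The bucket name "my-redshift-staging" is a hypothetical example.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            # Bucket-level actions apply to the bucket ARN itself
            "Action": ["s3:ListBucket", "s3:GetBucketPolicy"],
            "Resource": "arn:aws:s3:::my-redshift-staging",
        },
        {
            "Effect": "Allow",
            # Object-level actions apply to keys inside the bucket
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": "arn:aws:s3:::my-redshift-staging/*",
        },
    ],
}

print(json.dumps(policy, indent=2))
```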
Permissions to perform data classification
You can perform data classification with the permissions required to perform metadata extraction.
Permissions to perform relationship discovery
You can perform relationship discovery with the permissions required to perform metadata extraction.
Permissions to perform glossary association
You can perform glossary association with the permissions required to perform metadata extraction.
Create a connection
Create an Amazon Redshift connection object in Administrator with the connection details of the Amazon Redshift source system.
1. In Administrator, select Connections.
2. Click New Connection.
3. In the Connection Details section, enter the following connection details:
- Connection Name. Name of the connection. Each connection name must be unique within the organization. Connection names can contain alphanumeric characters, spaces, and the following special characters: _ . + -
Maximum length is 255 characters.
- Description. Description of the connection. Maximum length is 4000 characters.
4. Select the Amazon Redshift V2 connection type.
5. Enter properties specific to the Amazon Redshift connection:
- JDBC URL. The JDBC URL to connect to the Amazon Redshift cluster. You can get the JDBC URL from your Amazon Redshift cluster configuration page.
Enter the JDBC URL in the following format: jdbc:redshift://<cluster_endpoint>:<port_number>/<database_name>, where the endpoint includes the Redshift cluster name and region.
For example: jdbc:redshift://infa-rs-cluster.abc.us-west-2.redshift.amazonaws.com:5439/rsdb
In the example:
- infa-rs-cluster is the name of the Redshift cluster.
- infa-rs-cluster.abc.us-west-2.redshift.amazonaws.com is the Redshift cluster endpoint, in the us-west-2 (US West, Oregon) region.
- 5439 is the port number for the Redshift cluster.
- rsdb is the database in the Redshift cluster to which you want to connect.
6. Select the Default authentication type.
7. Enter the following connection details:
- Username. User name of the database user in the Amazon Redshift cluster.
- Password. Password of the Amazon Redshift database user.
8. Click Test Connection to verify the connection details.
9. Click Save.
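Before you test the connection, you can sanity-check that a JDBC URL follows the required format. The following is a minimal sketch using only the Python standard library, applied to the example URL from this section:

```python
import re

# Parse a Redshift JDBC URL of the form
# jdbc:redshift://<cluster_endpoint>:<port_number>/<database_name>
JDBC_PATTERN = re.compile(
    r"^jdbc:redshift://(?P<endpoint>[^:/]+):(?P<port>\d+)/(?P<database>[^/?]+)$"
)

def parse_jdbc_url(url: str) -> dict:
    match = JDBC_PATTERN.match(url)
    if not match:
        raise ValueError(f"Not a valid Redshift JDBC URL: {url}")
    return match.groupdict()

# Example URL from this section
parts = parse_jdbc_url(
    "jdbc:redshift://infa-rs-cluster.abc.us-west-2.redshift.amazonaws.com:5439/rsdb"
)
print(parts["endpoint"])  # infa-rs-cluster.abc.us-west-2.redshift.amazonaws.com
print(parts["port"])      # 5439
print(parts["database"])  # rsdb
```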
Create endpoint catalog sources for connection assignment
An endpoint catalog source represents a source system that the catalog source references. Before you perform connection assignment, create endpoint catalog sources and run the catalog source jobs.
You can then perform connection assignment to reference source systems to view complete lineage with source system objects.
Import a relationship inference model
Import a relationship inference model if you want to configure the relationship discovery capability. You can either import a predefined relationship inference model or import a model file from your local machine.
1. In Metadata Command Center, click Explore on the navigation panel.
2. Expand the menu and select Relationship Inference Model.
3. Select one of the following options:
- Import Predefined Content. Imports the predefined relationship inference model called Column Similarity Model v1.0.
- Import. Imports a relationship inference model file from your local machine. Select this option if you previously saved the predefined content to your local machine and the model file is stored there.
To import a file, click Choose File in the Import Relationship Inference Model window and navigate to the model file on your local machine. You can also drag and drop the file.
The imported models appear in the list of relationship inference models on the Relationship Discovery tab.