Introduction to Amazon Redshift catalog sources

You can use Metadata Command Center to extract metadata from a source system.

A source system is any system that contains data or metadata. For example, Amazon Redshift is a source system from which you can extract metadata through an Amazon Redshift catalog source. A catalog source is an object that represents and contains metadata from the source system.

Before you extract metadata from a source system, you first create and register a catalog source that represents the source system. Then you configure capabilities for the catalog source. A capability is a task that Metadata Command Center can perform, such as metadata extraction, lineage discovery, data profiling, data classification, or glossary association.

When Metadata Command Center extracts metadata, Data Governance and Catalog displays the extracted metadata and its attributes as technical assets. You can then perform tasks such as analyzing the assets, viewing lineage, and creating links between those assets and their business context.

The following table describes the capabilities of the catalog source:

Capability	Description
Serverless Runtime Environment	A serverless runtime environment is an advanced serverless deployment solution that doesn't require downloading, installing, configuring, or maintaining a Secure Agent or Secure Agent group. You can use a serverless runtime environment in the same way that you use a Secure Agent when you configure a catalog source.
Advanced Programming Language Parsing	Advanced Programming Language Parsing parses the source system code in addition to extracting objects from the source system.
Lineage Discovery	Builds the complete lineage of a catalog source by recommending endpoint catalog source objects to assign to reference catalog source connections. When you run the catalog source job, Metadata Command Center assigns the reference catalog source connections to the endpoint catalog source objects that CLAIRE recommends. You can then view the list of CLAIRE recommendations and accept or reject them.
Data Profiling and Quality	Data Profiling: Assesses source metadata and analyzes the collected statistics to discover content and structure, such as value distribution, patterns, and data types. Data Quality: Measures the reliability of the data and enables data usage. Data Observability: Identifies anomalies in the characteristics of the data.
Data Classification	Data classification is the process of identifying and organizing data into relevant categories based on the functional meaning of the data. Classifying data can help your organization manage risks, compliance, and data security.
Relationship Discovery	Identifies pairs of similar columns and relationships between tables within a catalog source.
Glossary Association	You can associate terms that are in the glossary with technical assets to provide user-friendly business names to technical assets. Glossary Association automatically associates glossary terms with technical assets or recommends glossary terms that you can manually associate with technical assets in Data Governance and Catalog.

Extraction and view process

To extract metadata from a source system, configure the catalog source and run the extraction job in Metadata Command Center. Then view the results in Data Governance and Catalog.

The following image shows the process to extract metadata from a source system:

[Image: Metadata extraction from an Amazon Redshift source system begins with prerequisites verification, continues with the creation of the catalog source, and ends with viewing the extraction results.]

After you verify prerequisites, perform the following tasks to extract metadata from Amazon Redshift:

1. Register a catalog source. Create a catalog source object, select Amazon Redshift, and then select and test the connection.
2. Configure the catalog source. Specify the runtime environment and configure parameters for metadata extraction. Optionally, add filters to include or exclude source system assets from metadata extraction. You can also configure other capabilities such as data profiling and quality, data classification, or glossary association.
3. Optionally, associate stakeholders. Associate users with technical assets, giving the users permission to perform actions determined by their roles.
4. Run or schedule the catalog source job.
5. Optionally, if the catalog source job generates referenced asset objects, assign a connection to the referenced source system assets.

You can view lineage with object references even if you do not assign connections. After you assign connections, you can view the objects themselves.

After the catalog source job runs, view the results in Data Governance and Catalog.
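
The task sequence above can be sketched as a small data model that separates required tasks from optional ones. This is only an illustration: the `Step` class, the `WORKFLOW` list, and both helper functions are hypothetical names and do not correspond to any Metadata Command Center API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One task in the extraction workflow (hypothetical model, not a product API)."""
    number: int
    name: str
    optional: bool = False


# The five tasks as listed in this topic; tasks 3 and 5 are marked optional there.
WORKFLOW = [
    Step(1, "Register a catalog source"),
    Step(2, "Configure the catalog source"),
    Step(3, "Associate stakeholders", optional=True),
    Step(4, "Run or schedule the catalog source job"),
    Step(5, "Assign connections to referenced assets", optional=True),
]


def required_steps(workflow):
    """Return the names of the tasks that cannot be skipped, in order."""
    ordered = sorted(workflow, key=lambda s: s.number)
    return [s.name for s in ordered if not s.optional]


def plan(workflow, include_optional=()):
    """Return task names to perform: all required tasks plus chosen optional ones."""
    chosen = set(include_optional)
    ordered = sorted(workflow, key=lambda s: s.number)
    return [s.name for s in ordered if not s.optional or s.number in chosen]


if __name__ == "__main__":
    for name in plan(WORKFLOW, include_optional={3}):
        print(name)
```

Calling `plan(WORKFLOW)` with no optional tasks yields the minimal path: register, configure, and run.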

About the Amazon Redshift catalog source

You can use the Amazon Redshift catalog source to extract metadata from an Amazon Redshift source system.

Amazon Redshift is a cloud-based data warehousing service that you can use to analyze and store data.

Extracted metadata

Metadata Command Center extracts the following metadata from an Amazon Redshift source system:

• Database
• Schema
• External Schema
• Table
• External Table
• View
• Materialized View
• Function
• Procedure
• Column

Note: Objects of the Materialized View type appear as View in Data Governance and Catalog.
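
As a rough illustration of how the extracted object types relate to what appears in Data Governance and Catalog, the sketch below maps each extracted type to its displayed type. The names `EXTRACTED_TYPES` and `displayed_type` are hypothetical. The only remapping encoded is the one this topic states (Materialized View appears as View), and the sketch assumes every other type is displayed under its own name.

```python
# Object types that this topic lists as extracted from Amazon Redshift.
EXTRACTED_TYPES = (
    "Database",
    "Schema",
    "External Schema",
    "Table",
    "External Table",
    "View",
    "Materialized View",
    "Function",
    "Procedure",
    "Column",
)


def displayed_type(extracted_type: str) -> str:
    """Return the asset type shown in the catalog for an extracted object type.

    Assumption: only Materialized View is remapped (to View); all other
    types are assumed to be displayed unchanged.
    """
    if extracted_type not in EXTRACTED_TYPES:
        raise ValueError(f"Not an extracted object type: {extracted_type!r}")
    if extracted_type == "Materialized View":
        return "View"
    return extracted_type


if __name__ == "__main__":
    for t in EXTRACTED_TYPES:
        print(f"{t} -> {displayed_type(t)}")
```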

Data profiling for Amazon Redshift objects

Configure data profiling to run profiles on the metadata extracted from an Amazon Redshift source system.

You can run data profiles on the following objects:

• Table
• View
• Spectrum External Table

You can run profiles on Spectrum external tables that are created using the following file formats:

• PARQUET
• AVRO
• TEXT
• RC
• SEQUENCE
• ORC

Note: If the Spectrum external tables include columns with string data types, those columns are skipped and are not considered for data profiling and data quality.

The data profiling task runs profiles on the following data types:

• SMALLINT
• INTEGER
• BIGINT
• DECIMAL
• REAL
• DOUBLE PRECISION
• BOOLEAN
• CHAR
• VARCHAR
• DATE
• TIMESTAMP
• STRING
• SUPER
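
The profiling rules in this section can be combined into a single eligibility check, sketched below. The function name `is_profiled` and the constant names are hypothetical and are not part of the product. The sketch also assumes that "string data types" in the note about Spectrum external tables refers to the STRING type only. If the note also covers CHAR and VARCHAR, the check would need to be adjusted.

```python
# Object types that this topic lists as eligible for data profiling.
PROFILABLE_OBJECTS = {"Table", "View", "Spectrum External Table"}

# File formats listed for Spectrum external tables.
SPECTRUM_FORMATS = {"PARQUET", "AVRO", "TEXT", "RC", "SEQUENCE", "ORC"}

# Data types listed for the data profiling task.
PROFILED_TYPES = {
    "SMALLINT", "INTEGER", "BIGINT", "DECIMAL", "REAL", "DOUBLE PRECISION",
    "BOOLEAN", "CHAR", "VARCHAR", "DATE", "TIMESTAMP", "STRING", "SUPER",
}


def is_profiled(object_type, data_type, file_format=None):
    """Return True if a column would be profiled under the rules in this topic."""
    if object_type not in PROFILABLE_OBJECTS:
        return False
    dtype = data_type.upper()
    if dtype not in PROFILED_TYPES:
        return False
    if object_type == "Spectrum External Table":
        # The external table must use one of the listed file formats.
        if file_format is None or file_format.upper() not in SPECTRUM_FORMATS:
            return False
        # Assumption: the "string data types" that are skipped means STRING only.
        if dtype == "STRING":
            return False
    return True


if __name__ == "__main__":
    print(is_profiled("Table", "VARCHAR"))
    print(is_profiled("Spectrum External Table", "STRING", "PARQUET"))
```

Keeping the three lists as sets makes each rule a simple membership test, so the lists can be updated without touching the logic.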

Compatible connectors

Before you configure an Amazon Redshift catalog source, you must connect to the Amazon Redshift source system.

Use the Amazon Redshift V2 connector to connect to the Amazon Redshift source system.

For information about configuring a connection, see Connections.