Introduction to Amazon S3 catalog sources

You can use Metadata Command Center to extract metadata from a source system.

A source system is any system that contains data or metadata. For example, Amazon S3 is a source system from which you can extract metadata through an Amazon S3 catalog source. A catalog source is an object that represents and contains metadata from the source system.

Before you extract metadata from a source system, you first create and register a catalog source that represents the source system. Then you configure capabilities for the catalog source. A capability is a task that Metadata Command Center can perform, such as metadata extraction, lineage discovery, data profiling, data classification, or glossary association.

When Metadata Command Center extracts metadata, Data Governance and Catalog displays the extracted metadata and its attributes as technical assets. You can then perform tasks such as analyzing the assets, viewing lineage, and creating links between those assets and their business context.

The following table describes the capabilities of the catalog source:

Capability	Description
Incremental metadata extraction	An incremental metadata extraction extracts only the objects that are new or changed since the last catalog source job run. It doesn't remove deleted objects from the catalog and, where applicable, doesn't extract metadata from code-based objects.
Serverless Runtime Environment	A serverless runtime environment is an advanced serverless deployment solution that doesn't require downloading, installing, configuring, or maintaining a Secure Agent or Secure Agent group. You can use a serverless runtime environment in the same way that you use a Secure Agent when you configure a catalog source.
Data Profiling and Quality	Data Profiling assesses source metadata and analyzes the collected statistics to discover content and structure, such as value distribution, patterns, and data types. Data Quality measures the reliability of the data and enables data usage. Data Observability identifies anomalies in the characteristics of the data.
Data Classification	Data classification is the process of identifying and organizing data into relevant categories based on the functional meaning of the data. Classifying data can help your organization manage risks, compliance, and data security.
Relationship Discovery	The relationship discovery capability identifies pairs of similar columns and relationships between tables within a catalog source.
Glossary Association	You can associate terms that are in the glossary with technical assets to provide user-friendly business names to technical assets. Glossary Association automatically associates glossary terms with technical assets or recommends glossary terms that you can manually associate with technical assets in Data Governance and Catalog.
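
The incremental metadata extraction behavior described in the table can be illustrated with a short Python sketch. It is not how Metadata Command Center implements the capability. The names SourceObject, select_for_incremental_run, and apply_incremental_run are hypothetical, and the sketch assumes that each source object carries a last-modified timestamp and that the catalog is a simple dictionary.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SourceObject:
    # Hypothetical stand-in for an object in the source system.
    key: str
    last_modified: datetime


def select_for_incremental_run(objects, last_run):
    """Return only the objects that are new or changed since the last job run."""
    return [obj for obj in objects if obj.last_modified > last_run]


def apply_incremental_run(catalog, objects, last_run):
    """Update the catalog in place with new and changed objects.

    Entries for objects that no longer exist in the source are left
    untouched, mirroring the documented behavior that an incremental
    run doesn't remove deleted objects from the catalog.
    """
    for obj in select_for_incremental_run(objects, last_run):
        catalog[obj.key] = obj.last_modified
    return catalog
```

In this sketch, an entry for an object that was deleted from the source stays in the catalog after the incremental run, so removing it would need a separate step that this sketch does not cover.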

Extraction and view process

About the Amazon S3 catalog source

Extracted metadata

File type	Partition structure
Avro	Single partition, multiple partitions, schema merge
CSV	Single partition, multiple partitions, schema merge
JSON	Single partition, multiple partitions, schema merge
Parquet	Single partition, multiple partitions, schema merge
XML	Single partition, multiple partitions, schema merge

You can extract metadata from XML and XSD file formats as XML file objects. Metadata Command Center extracts only elements and attributes from XML and XSD files. If the size of the XML file exceeds 100 KB, Metadata Command Center extracts metadata from the initial 100 KB of the file. However, for XSD file types, Metadata Command Center extracts complete metadata.
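
To make the 100 KB limit concrete, the following Python sketch reads only the first part of an XML file and collects the element and attribute names it encounters. It is an illustration under stated assumptions, not the Metadata Command Center implementation: the function name extract_xml_names is hypothetical, and the sketch assumes that 100 KB means 100 * 1024 bytes.

```python
import xml.etree.ElementTree as ET

MAX_BYTES = 100 * 1024  # assumption: 100 KB is interpreted as 100 * 1024 bytes


def extract_xml_names(data: bytes, limit: int = MAX_BYTES):
    """Collect element and attribute names from the first `limit` bytes.

    The pull parser accepts an incomplete document, so a file that is
    cut off at the limit still yields the elements seen up to that point.
    """
    parser = ET.XMLPullParser(events=("start",))
    elements, attributes = [], []
    try:
        parser.feed(data[:limit])
        for _event, elem in parser.read_events():
            if elem.tag not in elements:
                elements.append(elem.tag)
            for name in elem.attrib:
                if name not in attributes:
                    attributes.append(name)
    except ET.ParseError:
        # Malformed input: keep whatever was collected before the error.
        pass
    return elements, attributes
```

Anything that appears after the limit is not seen by the parser, which is why elements and attributes that occur only late in a large XML file would be missing from the result.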

File format	Data types
Avro	INT, STRING
CSV	STRING
JSON	ARRAY, DOUBLE, INTEGER, OBJECT, STRING
Parquet	BOOLEAN, INT32, INT64, FLOAT, DOUBLE, DATE, DECIMAL, STRING
