Introduction to SFTP File System catalog sources

You can use Metadata Command Center to extract metadata from a source system.
A source system is any system that contains data or metadata. For example, SFTP File System is a source from which you can extract metadata through an SFTP File System catalog source with Metadata Command Center. A catalog source is an object that represents and contains metadata from the source system.
Before you extract metadata from a source system, you first create and register a catalog source that represents the source system. Then you configure capabilities for the catalog source. A capability is a task that Metadata Command Center can perform, such as metadata extraction, data profiling, data classification, or glossary association.
When Metadata Command Center extracts metadata, Data Governance and Catalog displays the extracted metadata and its attributes as technical assets. You can then perform tasks such as analyzing the assets, viewing lineage, and creating links between those assets and their business context.
The following table describes the capabilities of the catalog source:
Capability | Description
Data Classification | Data classification is the process of identifying and organizing data into relevant categories based on the functional meaning of the data. Classifying data can help your organization manage risks, compliance, and data security.
Glossary Association | You can associate terms that are in the glossary with technical assets to provide user-friendly business names to technical assets. Glossary Association automatically associates glossary terms with technical assets or recommends glossary terms that you can manually associate with technical assets in Data Governance and Catalog.

Extraction and view process

To extract metadata from a source system, configure the catalog source and run the extraction job in Metadata Command Center. Then view the results in Data Governance and Catalog.
The following image shows the process to extract metadata from a source system:
[Image: the steps required to set up a catalog source for metadata extraction.]
After you verify prerequisites, perform the following tasks to extract metadata from SFTP File System:
  1. Register a catalog source. Create a catalog source object, select SFTP File System, and then select and test the connection.
  2. Configure the catalog source. Specify the runtime environment and configure parameters for metadata extraction. Optionally, add filters to include or exclude source system assets from metadata extraction. You can also configure other capabilities such as data profiling and quality, data classification, or glossary association.
  3. Optionally, associate stakeholders. Associate users with technical assets, which gives the users permission to perform actions determined by their roles.
  4. Run or schedule the catalog source job.
  5. Optionally, if the catalog source job generates referenced asset objects, assign a connection to the referenced source system assets. You can view the lineage with object references without performing connection assignment. After connection assignment, you can view the objects.
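As a planning aid, the task sequence above can be sketched as a small checklist that separates required tasks from optional ones. This is only an illustration: the `Step` class, the `STEPS` list, and the `plan` function are hypothetical names and are not part of any Metadata Command Center API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One task in the extraction process (hypothetical helper type)."""
    name: str
    optional: bool = False


# Task order and optional flags follow the numbered list in this section.
STEPS = [
    Step("Register a catalog source"),
    Step("Configure the catalog source"),
    Step("Associate stakeholders", optional=True),
    Step("Run or schedule the catalog source job"),
    Step("Assign connections to referenced source system assets", optional=True),
]


def plan(skip_optional: bool = False) -> list[str]:
    """Return the task names in order, optionally leaving out optional tasks."""
    return [s.name for s in STEPS if not (skip_optional and s.optional)]


if __name__ == "__main__":
    for number, name in enumerate(plan(), start=1):
        print(f"{number}. {name}")
```

Calling `plan(skip_optional=True)` returns only the three tasks that every extraction requires: register, configure, and run or schedule.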
After you run the catalog source job, you can view the results in Data Governance and Catalog.

About the SFTP File System catalog source

You can use the SFTP File System catalog source to extract metadata from the SFTP File System source.
SFTP is a secure file transfer protocol that enables you to transfer files over SSH, also known as Secure Shell protocol.

Extracted metadata

Metadata Command Center extracts metadata from the following objects:
You can extract metadata from the following files:
You can extract workbooks, worksheets, and columns from Microsoft Excel files.
The following table lists the structures associated with the file types that you can extract metadata from:
File Type | Partition structure
AVRO | Single partition, multiple partitions, schema merge
CSV | Single partition, multiple partitions, schema merge
JSON | Single partition
Parquet | Single partition, multiple partitions, schema merge
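The table can also be expressed as a lookup, which may help when you check in advance whether a set of files fits a supported structure. The following is a minimal sketch, not product code: the partition structures are copied from the table, while the extension-to-type mapping and the function names are assumptions made for illustration, because this section does not list file extensions.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Partition structures for each file type, copied from the table above.
PARTITION_STRUCTURES = {
    "AVRO": frozenset({"single partition", "multiple partitions", "schema merge"}),
    "CSV": frozenset({"single partition", "multiple partitions", "schema merge"}),
    "JSON": frozenset({"single partition"}),
    "Parquet": frozenset({"single partition", "multiple partitions", "schema merge"}),
}

# Assumption: a conventional extension for each file type. The documentation
# does not specify which extensions the catalog source recognizes.
EXTENSION_TO_TYPE = {
    ".avro": "AVRO",
    ".csv": "CSV",
    ".json": "JSON",
    ".parquet": "Parquet",
}


def supported_structures(path: str) -> frozenset[str]:
    """Return the partition structures listed for the file type of `path`."""
    suffix = PurePosixPath(path).suffix.lower()
    file_type = EXTENSION_TO_TYPE.get(suffix)
    if file_type is None:
        return frozenset()  # file type is not in the table
    return PARTITION_STRUCTURES[file_type]


def supports(path: str, structure: str) -> bool:
    """Check whether the table lists `structure` for the file type of `path`."""
    return structure.lower() in supported_structures(path)
```

For example, `supports("/data/events.json", "schema merge")` is false because the table lists only a single partition for JSON, while the same check for a `.parquet` path is true.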
You can extract metadata from the following Microsoft Excel file types: