You can use Metadata Command Center to extract metadata from a source system.
A source system is any system that contains data or metadata. For example, Snowflake is a source system from which you can extract metadata through a Snowflake catalog source. A catalog source is an object that represents and contains metadata from the source system.
Before you extract metadata from a source system, you first create and register a catalog source that represents the source system. Then you configure capabilities for the catalog source. A capability is a task that Metadata Command Center can perform, such as metadata extraction, data profiling, data classification, or glossary association.
When Metadata Command Center extracts metadata, Data Governance and Catalog displays the extracted metadata and its attributes as technical assets. You can then perform tasks such as analyzing the assets, viewing lineage, and creating links between those assets and their business context.
The following table describes the capabilities of the catalog source:
| Capability | Description |
| --- | --- |
| Incremental metadata extraction | Extracts only the objects that are new or changed since the last catalog source job run. Incremental metadata extraction doesn't remove deleted objects from the catalog and, where applicable, doesn't extract metadata of code-based objects. |
| Serverless Runtime Environment | A serverless runtime environment is an advanced serverless deployment solution that doesn't require downloading, installing, configuring, or maintaining a Secure Agent or Secure Agent group. You can use a serverless runtime environment in the same way that you use a Secure Agent when you configure a catalog source. |
| Advanced Programming Language Parsing | Parses the source system code in addition to extracting objects from the source system. |
| Data Profiling and Quality | Data Profiling: Assesses source metadata and analyzes the collected statistics to discover content and structure, such as value distribution, patterns, and data types. Data Quality: Measures the reliability of the data and enables data usage. Data Observability: Identifies anomalies in the characteristics of the data. |
| Data Classification | Data classification is the process of identifying and organizing data into relevant categories based on the functional meaning of the data. Classifying data can help your organization manage risks, compliance, and data security. |
| Relationship Discovery | Identifies pairs of similar columns and relationships between tables within a catalog source. |
| Glossary Association | You can associate terms that are in the glossary with technical assets to provide user-friendly business names to technical assets. Glossary Association automatically associates glossary terms with technical assets or recommends glossary terms that you can manually associate with technical assets in Data Governance and Catalog. |
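The behavior of incremental metadata extraction described in the table can be illustrated with a small model. This is only a sketch of the stated semantics, not how Metadata Command Center implements the job: the `SourceObject` class, the `modified_at` counter, and the `incremental_extract` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceObject:
    name: str
    modified_at: int  # hypothetical change counter or timestamp


def incremental_extract(catalog, source_objects, last_run_at):
    """Return an updated copy of the catalog after one incremental run."""
    updated = dict(catalog)
    for obj in source_objects:
        # New objects and objects changed since the last run are extracted.
        if obj.name not in catalog or obj.modified_at > last_run_at:
            updated[obj.name] = obj
    # Objects deleted from the source are deliberately left in the catalog.
    return updated


catalog = {
    "ORDERS": SourceObject("ORDERS", 1),
    "LEGACY": SourceObject("LEGACY", 1),  # no longer exists in the source
}
source = [SourceObject("ORDERS", 5), SourceObject("CUSTOMERS", 4)]

result = incremental_extract(catalog, source, last_run_at=3)
print(sorted(result))  # ['CUSTOMERS', 'LEGACY', 'ORDERS']
```

In this model, `LEGACY` stays in the catalog even though the source no longer contains it, which mirrors the statement that an incremental run doesn't remove deleted objects.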
Extraction and view process
To extract metadata from a source system, configure the catalog source and run the catalog source job in Metadata Command Center. Then view the results in Data Governance and Catalog.
The following image shows the process to extract metadata from a source system:
After you verify prerequisites, perform the following tasks to extract metadata from Snowflake:
1. Register a catalog source. Create a catalog source object, select the source system, and specify values for connection properties.
2. Configure the catalog source. Specify the runtime environment and configure parameters for metadata extraction. Optionally add filters to include or exclude source system assets from metadata extraction. You can also configure other capabilities such as data profiling and quality.
3. Optionally, associate stakeholders. Associate users with technical assets, giving the users permission to perform actions determined by their roles.
4. Run or schedule the catalog source job.
5. If the catalog source job generates referenced asset objects, assign a connection to referenced source system assets.
After you run the catalog source job, you view the results in Data Governance and Catalog.
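The effect of include and exclude filters in the configuration step can be pictured as follows. This is a minimal sketch that assumes glob-style patterns over dotted asset paths. The actual filter syntax in Metadata Command Center may differ, and the `apply_filters` function and the asset names are hypothetical.

```python
from fnmatch import fnmatchcase


def apply_filters(asset_paths, include=("*",), exclude=()):
    """Keep assets that match an include pattern and no exclude pattern."""
    selected = []
    for path in asset_paths:
        included = any(fnmatchcase(path, pattern) for pattern in include)
        excluded = any(fnmatchcase(path, pattern) for pattern in exclude)
        if included and not excluded:
            selected.append(path)
    return selected


# Hypothetical DATABASE.SCHEMA.TABLE paths in a Snowflake source system.
assets = [
    "SALES.PUBLIC.ORDERS",
    "SALES.STAGING.TMP_ORDERS",
    "HR.PUBLIC.EMPLOYEES",
]

selected = apply_filters(assets, include=["SALES.*"], exclude=["*.STAGING.*"])
print(selected)  # ['SALES.PUBLIC.ORDERS']
```

In this sketch an exclude match always wins over an include match, which is one common convention. Check the product documentation for the precedence that the catalog source actually applies.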
About the Snowflake catalog source
You can use the Snowflake catalog source to extract metadata from the Snowflake source system.
Snowflake is an analytic data warehouse provided as Software-as-a-Service (SaaS). The Snowflake data warehouse uses an SQL database engine with a unique architecture designed for cloud services.
Extracted metadata
The metadata extraction service extracts the following objects from a Snowflake source system:
• Database
• Schema
• Table
• Tag
• View
• Materialized View
  Note: Objects of the Materialized View type appear as View in Data Governance and Catalog.
• Function
• Stored Procedure
• Pipe
• Stage
• Column
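The relationship between extracted object types and the type shown in Data Governance and Catalog can be summarized in a few lines. This is an illustrative lookup only: the `displayed_type` function is a hypothetical helper, and the only mapping it encodes is the one stated in the note above.

```python
# Object types that the metadata extraction service extracts from Snowflake.
EXTRACTED_OBJECT_TYPES = (
    "Database", "Schema", "Table", "Tag", "View", "Materialized View",
    "Function", "Stored Procedure", "Pipe", "Stage", "Column",
)


def displayed_type(extracted_type):
    """Return the type under which an extracted object appears."""
    if extracted_type not in EXTRACTED_OBJECT_TYPES:
        raise ValueError(f"Not an extracted object type: {extracted_type}")
    # Materialized views appear as View; other types are shown unchanged.
    return "View" if extracted_type == "Materialized View" else extracted_type


print(displayed_type("Materialized View"))  # View
print(displayed_type("Pipe"))               # Pipe
```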
You can extract metadata from stored procedures that use the following languages:
• JavaScript
• Snowflake SQL scripting
• Snowpark Python
Note: Effective in the 2024.11.S release, extracting metadata from stored procedures that use Snowpark Python is available for preview.
Preview functionality is supported for evaluation purposes but is unwarranted and is not supported in production environments or any environment that you plan to push to production. Informatica intends to include the preview functionality in an upcoming release for production use, but might choose not to in accordance with changing market or technical circumstances. For more information, contact Informatica Global Customer Support.
Secure Data Sharing in Snowflake allows you to share selected database assets with other Snowflake accounts without copying or transferring any actual data. To share data, providers create a share of their database, select specific objects to include, and then add consumer accounts to this share.
You can extract metadata from the following database assets shared through Snowflake Secure Data Sharing:
• Database
• Table
• External table
• Secure view
• Secure materialized view
To view the complete data lineage and all the metadata extracted for the consumer account, including shared assets, perform a connection assignment between catalog sources created for the provider and consumer Snowflake accounts.
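The provider-side steps of Secure Data Sharing described above (create a share, select objects, add consumer accounts) correspond to a short sequence of Snowflake SQL statements. The sketch below only builds the statement text so that the sequence is visible. The `provider_share_statements` function and all object and account names are hypothetical, and identifiers are inserted without quoting or validation, so treat it as an illustration and not as a script to run against production.

```python
def provider_share_statements(share, database, schema, tables, consumer_accounts):
    """Build the Snowflake SQL a provider would run to share selected tables."""
    fq_schema = f"{database}.{schema}"
    statements = [
        # 1. Create the share.
        f"CREATE SHARE {share};",
        # 2. Grant access to the database, schema, and selected tables.
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share};",
        f"GRANT USAGE ON SCHEMA {fq_schema} TO SHARE {share};",
    ]
    statements += [
        f"GRANT SELECT ON TABLE {fq_schema}.{table} TO SHARE {share};"
        for table in tables
    ]
    # 3. Add the consumer accounts to the share.
    accounts = ", ".join(consumer_accounts)
    statements.append(f"ALTER SHARE {share} ADD ACCOUNTS = {accounts};")
    return statements


statements = provider_share_statements(
    share="sales_share",
    database="sales_db",
    schema="public",
    tables=["orders"],
    consumer_accounts=["myorg.consumer_acct"],
)
print("\n".join(statements))
```

No data is copied by these statements. The consumer account creates a database from the share, and a catalog source registered for that account can then be linked to the provider's catalog source through connection assignment.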
Data profiling for Snowflake objects
Configure data profiling to run profiles on the metadata extracted from a Snowflake source system.
You can run data profiles on the following Snowflake objects:
• Table
• View
The data profiling task runs profiles on the following data types for Snowflake objects:
• NUMBER
• FLOAT
• DOUBLE
• VARCHAR
• BOOLEAN
• DATE
• TIME
• TIMESTAMP_LTZ
• TIMESTAMP_NTZ
• TIMESTAMP_TZ
• OBJECT
• ARRAY
• VARIANT
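To check in advance which columns fall within the data types listed above, a simple lookup can help. This is a sketch under stated assumptions: the `is_profiled_type` function is a hypothetical helper, it strips only precision and length arguments such as `(38,0)`, and it does not resolve Snowflake type aliases (for example, `INT` or `STRING`) to the names in the list.

```python
import re

# Data types on which the data profiling task runs profiles.
PROFILED_TYPES = frozenset({
    "NUMBER", "FLOAT", "DOUBLE", "VARCHAR", "BOOLEAN", "DATE", "TIME",
    "TIMESTAMP_LTZ", "TIMESTAMP_NTZ", "TIMESTAMP_TZ",
    "OBJECT", "ARRAY", "VARIANT",
})


def is_profiled_type(declared_type):
    """Return True if the base name of a declared column type is in the list."""
    # Drop anything after the first space or parenthesis, e.g. "(38,0)".
    base = re.split(r"[\s(]", declared_type.strip(), maxsplit=1)[0].upper()
    return base in PROFILED_TYPES


print(is_profiled_type("NUMBER(38,0)"))  # True
print(is_profiled_type("varchar(255)"))  # True
print(is_profiled_type("GEOGRAPHY"))     # False
```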
Compatible connectors
Before you configure a Snowflake catalog source, you must connect to the Snowflake source system.
Use the Snowflake Data Cloud connector to connect to the Snowflake source system. For information about configuring a connection, see Connections in the Administrator help.