You can use Metadata Command Center to extract metadata from a source system.
A source system is any system that contains data or metadata. For example, Apache Atlas is a source system from which you can extract metadata through an Apache Atlas catalog source with Metadata Command Center. A catalog source is an object that represents and contains metadata from the source system.
Before you extract metadata from a source system, create and register a catalog source that represents the source system.
When Metadata Command Center extracts metadata, Data Governance and Catalog displays the extracted metadata and its attributes as technical assets. You can then perform tasks such as analyzing the assets, viewing lineage, and creating links between those assets and their business context.
The Apache Atlas catalog source supports metadata extraction only.
Extraction and view process
To extract metadata from a source system, configure the catalog source and run the extraction job in Metadata Command Center. Then view the results in Data Governance and Catalog.
The following image shows the process to extract metadata from an Apache Atlas source system:
After you verify prerequisites, perform the following tasks to extract metadata from Apache Atlas:
1. Register a catalog source. Create a catalog source object, select the source system, and specify values for connection properties.
2. Configure the catalog source. Specify the runtime environment, optionally configure parameters for the metadata extraction capability, and add filters for metadata extraction.
3. Associate stakeholders. Optionally, associate users with technical assets, giving the users permission to perform actions determined by their roles.
4. Run or schedule the catalog source job.
5. Optionally, assign a connection to referenced source system assets.
After you run the catalog source job, you view the results in Data Governance and Catalog.
About the Apache Atlas catalog source
You can use the Apache Atlas catalog source to extract metadata from an Apache Atlas source system.
Apache Atlas is the governance and metadata framework for Hadoop. Apache Atlas has a scalable and extensible architecture that can be plugged into many Hadoop components to manage their metadata in a central repository.
Extracted metadata
You can extract metadata from an Apache Atlas source system.
Objects extracted
Metadata Command Center extracts the following metadata from an Apache Atlas source system:
•Atlas Server
•Hive Process
•Sqoop Process
•Calculation
•Spark Application
•Spark Process
Note: Calculation objects are extracted when there is column-level lineage from one asset to another in Hive and Sqoop processes.
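Before you run the extraction job, it can help to confirm that the Apache Atlas server actually holds process entities of the kinds listed above. The following is a minimal sketch that builds request URLs for the Apache Atlas v2 basic search endpoint. It does not describe how Metadata Command Center queries Atlas. The mapping from the object names above to Atlas type names, the host name `atlas.example.com`, and the helper `basic_search_url` are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Assumed mapping from the object names listed above to Apache Atlas type names.
# This mapping is an illustration, not something the product documents.
ASSUMED_ATLAS_TYPES = {
    "Hive Process": "hive_process",
    "Sqoop Process": "sqoop_process",
    "Spark Application": "spark_application",
    "Spark Process": "spark_process",
}


def basic_search_url(base_url, type_name, limit=25):
    """Build an Atlas v2 basic-search URL that lists entities of one type."""
    query = urlencode({"typeName": type_name, "limit": limit})
    return f"{base_url.rstrip('/')}/api/atlas/v2/search/basic?{query}"


# Hypothetical host; 21000 is the default Atlas HTTP port.
for label, type_name in ASSUMED_ATLAS_TYPES.items():
    print(label, "->", basic_search_url("http://atlas.example.com:21000", type_name))
```

You can send each URL with any HTTP client, using the credentials that your Atlas server requires. An empty result for a type suggests that the corresponding objects will not appear after extraction.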
The Apache Atlas catalog source extracts data lineage from the following data sources:
•Oracle
•MySQL
•PostgreSQL
•Apache Hive
•Hadoop Distributed File System (HDFS)
•Apache HBase
Note: Metadata Command Center skips extraction of Hive processes and the associated lineage links for the following operation types:
•CREATETABLE
•CREATEVIEW
•CREATE_MATERIALIZED_VIEW
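The skip rule in the note above behaves like a filter on the operation type of each Hive process. The following is a minimal sketch under stated assumptions, not the product's implementation: the `processes` records, the `operationType` key, and the exact-match comparison are hypothetical and are used only to show the effect of the rule.

```python
# Operation types named in the note above; Hive processes of these types are skipped.
SKIPPED_OPERATION_TYPES = {"CREATETABLE", "CREATEVIEW", "CREATE_MATERIALIZED_VIEW"}


def filter_hive_processes(processes):
    """Return only the Hive processes whose operation type is not in the skip list.

    `processes` is a hypothetical list of dicts with "name" and "operationType" keys.
    """
    return [p for p in processes if p.get("operationType") not in SKIPPED_OPERATION_TYPES]


sample = [
    {"name": "create_sales", "operationType": "CREATETABLE"},
    {"name": "load_sales", "operationType": "QUERY"},
    {"name": "sales_view", "operationType": "CREATEVIEW"},
    # Not in the skip list above, so this sketch keeps it (assumes exact matching).
    {"name": "copy_sales", "operationType": "CREATETABLE_AS_SELECT"},
]
kept = filter_hive_processes(sample)
print([p["name"] for p in kept])  # ['load_sales', 'copy_sales']
```

Because the lineage links of a skipped process are skipped with it, tables and views that were only created, and never loaded or queried, might show no lineage from Hive processes in Data Governance and Catalog.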
Metadata Command Center extracts folders as reference objects from Hadoop Distributed File System.
Metadata Command Center extracts the following objects as reference objects from Apache Hive:
•Schema
•Table
•View
•External Table
•Column
Field and column objects are extracted when there is column-level lineage from one asset to another in Apache Atlas.
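The condition above means that a column appears as a reference object only when some lineage link names it directly. The following sketch illustrates that condition with hypothetical data. The dotted naming scheme for link endpoints and the function `columns_with_lineage` are assumptions made for illustration and are not part of the product.

```python
def columns_with_lineage(links):
    """Collect the columns that take part in at least one column-level link.

    `links` is a hypothetical list of (source, target) pairs. An endpoint is
    written as "schema.table" for table-level lineage or as
    "schema.table.column" for column-level lineage.
    """
    columns = set()
    for source, target in links:
        for endpoint in (source, target):
            if endpoint.count(".") == 2:  # three dotted parts identify a column
                columns.add(endpoint)
    return sorted(columns)


links = [
    ("sales.orders", "sales.orders_summary"),                # table-level only
    ("sales.orders.amount", "sales.orders_summary.total"),   # column-level
]
print(columns_with_lineage(links))
# ['sales.orders.amount', 'sales.orders_summary.total']
```

In this example, only `amount` and `total` would be expected as column objects. Other columns of the two tables take part in no column-level link, so the condition above does not cover them.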