Introduction to Apache Atlas catalog sources

You can use Metadata Command Center to extract metadata from a source system.
A source system is any system that contains data or metadata. For example, Apache Atlas is a source system from which you can extract metadata through an Apache Atlas catalog source with Metadata Command Center. A catalog source is an object that represents and contains metadata from the source system.
Before you extract metadata from a source system, you first create and register a catalog source that represents the source system.
When Metadata Command Center extracts metadata, Data Governance and Catalog displays the extracted metadata and its attributes as technical assets. You can then perform tasks such as analyzing the assets, viewing lineage, and creating links between those assets and their business context.
With this catalog source, you can extract metadata only.

Extraction and view process

To extract metadata from a source system, configure the catalog source and run the extraction job in Metadata Command Center. Then view the results in Data Governance and Catalog.
The process to extract metadata from an Apache Atlas source system begins with prerequisites verification, continues with the creation of the catalog source, and ends with viewing the results and lineage.
After you verify prerequisites, perform the following tasks to extract metadata from Apache Atlas:
  1. Register a catalog source. Create a catalog source object, select the source system, and specify values for connection properties.
  2. Configure the catalog source. Specify the runtime environment, optionally configure parameters for the metadata extraction capability, and add filters for metadata extraction.
  3. Associate stakeholders. Optionally, associate users with technical assets, giving the users permission to perform actions determined by their roles.
  4. Run or schedule the catalog source job.
  5. Optionally, assign a connection to referenced source system assets.
After you run the catalog source job, you view the results in Data Governance and Catalog.
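The task order above can be sketched as a small checklist model, which may help when you track rollout progress outside the product. This is an illustrative sketch only: the step labels, the `Step` class, and the `next_required_step` helper are hypothetical names, not part of Metadata Command Center, and the tasks themselves are performed in the user interface.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Step:
    name: str
    optional: bool


# Order and optionality follow the task list above.
# The names are illustrative labels, not product identifiers.
STEPS = [
    Step("register catalog source", optional=False),
    Step("configure catalog source", optional=False),
    Step("associate stakeholders", optional=True),
    Step("run or schedule job", optional=False),
    Step("assign connection to referenced assets", optional=True),
]


def next_required_step(completed: Set[str]) -> Optional[str]:
    """Return the first required step that is not yet completed, or None."""
    for step in STEPS:
        if not step.optional and step.name not in completed:
            return step.name
    return None
```

For example, `next_required_step({"register catalog source"})` returns `"configure catalog source"`, because the optional steps never block progress.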

About the Apache Atlas catalog source

You can use the Apache Atlas catalog source to extract metadata from an Apache Atlas source system.
Apache Atlas is the governance and metadata framework for Hadoop. Apache Atlas has a scalable and extensible architecture that can be plugged into many Hadoop components to manage their metadata in a central repository.

Extracted metadata

You can extract metadata from an Apache Atlas source system.

Objects extracted

Metadata Command Center extracts the following metadata from an Apache Atlas source system:
The Apache Atlas catalog source extracts data lineage from the following data sources:
Note: Metadata Command Center skips extraction of Hive processes and the associated lineage links for the following operation types:
Metadata Command Center extracts folders as reference objects from Hadoop Distributed File System.
Metadata Command Center extracts the following objects as reference objects from Apache Hive:
Field and column objects are extracted when there is column-level lineage from one asset to another in Apache Atlas.
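To check in advance whether a given asset is likely to yield field and column objects, you can inspect the lineage that Apache Atlas holds for it. The following is a minimal sketch, assuming a response shaped like the Apache Atlas v2 lineage REST output (`guidEntityMap` and `relations` with `fromEntityId` and `toEntityId`) and the Hive hook type names `hive_column` and `hive_column_lineage`. The sample data is invented for illustration, and the helper `columns_with_lineage` is a hypothetical name. This document does not describe how Metadata Command Center detects column-level lineage internally, so treat this as a diagnostic aid, not as the product's logic.

```python
from typing import Any, Dict, Set

COLUMN_TYPE = "hive_column"  # assumed Atlas type name for Hive columns


def columns_with_lineage(lineage: Dict[str, Any]) -> Set[str]:
    """Return GUIDs of column entities that appear in at least one lineage relation."""
    entities = lineage.get("guidEntityMap", {})
    found: Set[str] = set()
    for relation in lineage.get("relations", []):
        for key in ("fromEntityId", "toEntityId"):
            guid = relation.get(key)
            if entities.get(guid, {}).get("typeName") == COLUMN_TYPE:
                found.add(guid)
    return found


# Invented sample: one source column feeds one target column
# through a column-lineage process entity.
sample = {
    "guidEntityMap": {
        "g1": {"typeName": "hive_column", "attributes": {"name": "src_id"}},
        "g2": {"typeName": "hive_column_lineage", "attributes": {"name": "copy_id"}},
        "g3": {"typeName": "hive_column", "attributes": {"name": "tgt_id"}},
    },
    "relations": [
        {"fromEntityId": "g1", "toEntityId": "g2"},
        {"fromEntityId": "g2", "toEntityId": "g3"},
    ],
}

# Invented sample with table-level lineage only.
table_only = {
    "guidEntityMap": {
        "t1": {"typeName": "hive_table"},
        "p1": {"typeName": "hive_process"},
        "t2": {"typeName": "hive_table"},
    },
    "relations": [
        {"fromEntityId": "t1", "toEntityId": "p1"},
        {"fromEntityId": "p1", "toEntityId": "t2"},
    ],
}
```

With these samples, `columns_with_lineage(sample)` returns the two column GUIDs, while `columns_with_lineage(table_only)` returns an empty set, which corresponds to the case where only table-level lineage exists.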