Apache Atlas is the governance and metadata framework for Hadoop. Its scalable, extensible architecture plugs into many Hadoop components and manages their metadata in a central repository.
Objects extracted
Metadata Command Center extracts the following metadata from an Apache Atlas source system:
•Atlas Server
•Hive Process
•Sqoop Process
•Calculation
Note: Calculation objects are extracted when there is column-level lineage from one asset to another in Hive and Sqoop processes.
•Spark Application
•Spark Process
The Apache Atlas catalog source extracts data lineage from the following data sources:
•Oracle
•MySQL
•PostgreSQL
•Apache Hive
•Hadoop Distributed File System (HDFS)
•Apache HBase
Note: Metadata Command Center skips extraction of Hive processes and the associated lineage links for the following operation types:
•CREATETABLE
•CREATEVIEW
•CREATE_MATERIALIZED_VIEW
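The skip rule above can be pictured as a simple filter over Hive process records. The following Python sketch is illustrative only and is not how Metadata Command Center is implemented. It assumes each process is a dictionary with an operationType key, and the function name and sample records are hypothetical.

```python
# Operation types listed in the note above as skipped during extraction.
SKIPPED_OPERATION_TYPES = {
    "CREATETABLE",
    "CREATEVIEW",
    "CREATE_MATERIALIZED_VIEW",
}


def filter_hive_processes(processes):
    """Return the process records whose operation type is not skipped.

    Assumption: each record is a dict with an "operationType" key.
    Records without that key are kept, because the value None is not
    in the skip set.
    """
    return [
        process
        for process in processes
        if process.get("operationType") not in SKIPPED_OPERATION_TYPES
    ]


# Hypothetical sample records, for illustration only.
sample_processes = [
    {"name": "q1", "operationType": "CREATETABLE"},
    {"name": "q2", "operationType": "CREATETABLE_AS_SELECT"},
    {"name": "q3", "operationType": "QUERY"},
    {"name": "q4", "operationType": "CREATE_MATERIALIZED_VIEW"},
]

kept = filter_hive_processes(sample_processes)
print([process["name"] for process in kept])  # prints ['q2', 'q3']
```

The match is exact, so an operation type such as CREATETABLE_AS_SELECT in this sketch is not treated as CREATETABLE and is kept.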
Metadata Command Center extracts folders as reference objects from the Hadoop Distributed File System (HDFS).
Metadata Command Center extracts the following objects as reference objects from Apache Hive:
•Schema
•Table
•View
•External Table
•Column
Field and column objects are extracted when there is column-level lineage from one asset to another in Apache Atlas.
Data lineage
The following lineage data is available for Oracle, Hive, and HDFS data sources:
•Data set
•Data element
For more information about data lineage, see the Working with Assets help.