
Apache Atlas

Apache Atlas is the governance and metadata framework for Hadoop. Apache Atlas has a scalable and extensible architecture that can be plugged into many Hadoop components to manage their metadata in a central repository.

Objects Extracted

The Apache Atlas resource extracts data object links from the following data sources:

Prerequisites

If the cluster is Kerberos-enabled, perform the following steps:
  1. Copy the krb5.conf and keytab files from the cluster to the Informatica domain.
  2. Specify the krb5.conf file path in the Informatica domain and on the Informatica Cluster Service nodes.
  3. Add the Service Principal Name (SPN) and the keytab properties to the Data Integration Service properties.
  4. Run the kinit command on the Informatica domain with the required SPN and keytab file (see the sketch after this list).
  5. Run the kinit command on the cluster with the required SPN and keytab file.
  6. Restart the Data Integration Service and recycle the Catalog Service.
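You can confirm that the Kerberos ticket was granted before you restart the services. The following minimal sketch wraps the kinit and klist commands from steps 4 and 5 in Python so that the exit codes can be checked; the keytab path and SPN shown here are hypothetical placeholders, not values from this guide.

    import subprocess

    # Hypothetical values; replace with the keytab and SPN copied from the cluster.
    KEYTAB = "/etc/security/keytabs/atlas.service.keytab"
    PRINCIPAL = "atlas/host.example.com@EXAMPLE.COM"

    def obtain_ticket(keytab: str, principal: str) -> None:
        """Run kinit with a keytab, as in prerequisite steps 4 and 5."""
        subprocess.run(["kinit", "-kt", keytab, principal], check=True)
        # klist prints the credential cache so you can confirm the ticket was granted.
        subprocess.run(["klist"], check=True)

    if __name__ == "__main__":
        obtain_ticket(KEYTAB, PRINCIPAL)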

Basic Information

The General tab includes the following basic information about the resource:
  • Name. The name of the resource.
  • Description. The description of the resource.
  • Resource type. The type of the resource.
  • Execute On. Specifies whether the resource runs on the default catalog server or offline.

Resource Connection Properties

The Atlas resource type has the following connection properties:
  • URL. The URL to access Apache Atlas (see the verification sketch after these properties).
  • Authentication. Select one of the following options to specify the authentication type configured for Apache Atlas:
    • Simple. Specify the following parameters:
      • Login. The user name configured to access Apache Atlas.
      • Password. The password configured to access Apache Atlas.
    • Kerberos. Specify the following parameters:
      • Kerberos configuration file. Click Choose to select and upload the Kerberos configuration file used for authentication.
      • Kerberos Keytab file. Click Choose to select and upload the Kerberos keytab file used for authentication.
      • Principal. The Kerberos principal used for authentication.
  • Entities filter. Filters the set of entities that the Atlas Bridge uses to import metadata and build lineage. The filter uses the Atlas basic or advanced search syntax.
  • Lineage direction. Sets the direction in which lineage is extracted.
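Before you create the resource, you can confirm that the URL, the Simple authentication credentials, and an entities filter behave as expected by calling the Apache Atlas REST API directly. The sketch below is illustrative only: the host, port, and credentials are hypothetical, and the /api/atlas/admin/version and /api/atlas/v2/search/basic endpoints should be verified against your Atlas version.

    import requests

    ATLAS_URL = "http://atlas.example.com:21000"   # hypothetical host and port
    AUTH = ("atlas_user", "atlas_password")        # Simple authentication credentials

    # Confirm that the URL and the credentials are valid.
    version = requests.get(f"{ATLAS_URL}/api/atlas/admin/version", auth=AUTH)
    version.raise_for_status()
    print(version.json())

    # Preview the entities that a basic-search style filter would return,
    # for example hive_table entities whose names start with "sales".
    search = requests.post(
        f"{ATLAS_URL}/api/atlas/v2/search/basic",
        auth=AUTH,
        json={"typeName": "hive_table", "query": "sales*", "limit": 10},
    )
    search.raise_for_status()
    for entity in search.json().get("entities", []):
        print(entity["typeName"], entity.get("attributes", {}).get("qualifiedName"))

If the version call fails with an authentication error, recheck the Login and Password values; if the basic search returns no entities, adjust the entities filter before you run the scan.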
The Metadata Load Settings tab includes the following Additional and Advanced properties for source metadata settings:
  • Enable Source Metadata. Select to extract metadata from the data source.
  • Auto Assign Connections. Select this option to assign connections automatically.
  • Enable Reference Resources. Option to extract metadata about assets that are not included in this resource but are referenced by it. Examples include source and target tables in PowerCenter mappings, and source tables and files from Tableau reports.
  • Retain Unresolved Reference Assets. Option to retain unresolved reference assets in the catalog after you assign connections. Retaining unresolved reference assets helps you view the complete lineage. Unresolved assets include deleted files, temporary tables, and other assets that are not present in the primary resource.
  • Memory. Specify the memory value required to run a scanner job. Specify one of the following values:
    • Low
    • Medium
    • High
    Note: For more information about the memory values, see the Tuning Enterprise Data Catalog Performance article on the How-To Library Articles tab in the Informatica Documentation Portal.
  • Custom Options. JVM parameters that you can set to configure the scanner container. Use the following arguments to configure the parameters:
    • -Dscannerloglevel=<DEBUG/INFO/ERROR>. Changes the scanner log level to DEBUG, ERROR, or INFO. Default value is INFO.
    • -Dscanner.container.core=<number of cores>. Increases the number of cores for the scanner container. The value must be a number.
    • -Dscanner.yarn.app.environment=<key=value>. Key-value pairs that you need to set in the YARN environment. Use a comma to separate multiple key-value pairs.
    • -Dscanner.pmem.enabled.container.memory.jvm.memory.ratio=<1.0/2.0>. Increases the scanner container memory when pmem is enabled. Default value is 1.
  • Agent Options. Specify the Enterprise Data Catalog Agent options to run the scanner job (the sketch after these properties illustrates how the two limits interact). Use the following arguments to configure the parameters:
    • -entities.limit <maximum number of entities>. Specifies the maximum number of entities that the entities filter returns during filtering. For example: -entities.limit 500
    • -request.limit <number of entities per request>. Specifies the number of entities that the entities filter obtains for one offset request. For example: -request.limit 500
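The two agent options act as an overall cap (-entities.limit) and a per-request page size (-request.limit). The following sketch is not the agent's actual implementation; it only illustrates that relationship by paginating an Atlas basic search with a hypothetical helper, reusing the hypothetical host and credentials from the earlier sketch.

    import requests

    ATLAS_URL = "http://atlas.example.com:21000"   # hypothetical host and port
    AUTH = ("atlas_user", "atlas_password")

    def fetch_entities(type_name, entities_limit=500, request_limit=100):
        """Hypothetical helper: collect up to entities_limit entities,
        requesting request_limit entities per offset request."""
        collected, offset = [], 0
        while len(collected) < entities_limit:
            page = requests.post(
                f"{ATLAS_URL}/api/atlas/v2/search/basic",
                auth=AUTH,
                json={"typeName": type_name, "limit": request_limit, "offset": offset},
            ).json().get("entities", [])
            if not page:
                break
            collected.extend(page)
            offset += request_limit
        return collected[:entities_limit]

    print(len(fetch_entities("hive_table")))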