
Cloudera Navigator

Cloudera Navigator is a fully integrated data management and security system for the Hadoop platform.

Objects Extracted

The Cloudera Navigator resource extracts Hive, Impala, and Spark operations.

Permissions to Configure the Resource

Before you create a Cloudera Navigator resource, you must configure the Java heap size for the Cloudera Navigator server and the maximum heap size for Catalog Service. If you do not correctly configure the heap sizes, the metadata load can fail.
Configure the following heap sizes:
If you perform simultaneous resource loads, increase the maximum heap size by at least 1024 MB (1 GB) for each resource load. For example, to load two Cloudera Navigator resources simultaneously, increase the maximum heap size by 2048 MB (2 GB). Starting from a Max Heap Size of 4096 MB, you would therefore set Max Heap Size to at least 6144 MB.
Note: Some Cloudera distributions might require a maximum heap size larger than 4 GB. If the metadata load fails with an out-of-memory error, increase the maximum heap size.
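The sizing rule above is simple arithmetic. The following Python sketch encodes it. The 4096 MB baseline is an assumption inferred from the two-load example (6144 MB minus 2048 MB) and the 4 GB figure in the note, and the function and constant names are hypothetical. The guide gives no figure for a single load, so the sketch returns the baseline in that case.

```python
# Hypothetical helper that estimates the minimum Max Heap Size for Catalog Service.
# Assumption: the baseline is 4096 MB, inferred from the documented example
# (two simultaneous loads -> at least 6144 MB).
BASELINE_MAX_HEAP_MB = 4096
PER_LOAD_INCREMENT_MB = 1024


def min_max_heap_mb(simultaneous_loads: int) -> int:
    """Return the minimum Max Heap Size in MB for the given number of concurrent loads."""
    if simultaneous_loads < 1:
        raise ValueError("simultaneous_loads must be at least 1")
    if simultaneous_loads == 1:
        # Assumption: a single load needs only the baseline.
        return BASELINE_MAX_HEAP_MB
    # Add at least 1024 MB for each resource load that runs simultaneously.
    return BASELINE_MAX_HEAP_MB + PER_LOAD_INCREMENT_MB * simultaneous_loads


print(min_max_heap_mb(2))  # 6144
```

For three simultaneous loads, the same rule gives 4096 + 3 × 1024 = 7168 MB. Treat the result as a lower bound, because some distributions need more.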
On the Cloudera Navigator data source, assign the Navigator Administrator permission to the user account that you use to access the data source. This is the minimum permission required to extract metadata from the Cloudera Navigator data source.

Basic Information

The General tab includes the following basic information about the resource:
  • Name. The name of the resource.
  • Description. The description of the resource.
  • Resource type. The type of the resource.
  • Execute On. You can choose to execute on the default catalog server or offline.

Resource Connection Properties

Configure the connection properties when you create or edit a Cloudera Navigator resource.
The following connection properties are available:
  • Navigator URL. URL of the Cloudera Navigator Server. You can append the following options to the URL to tune the resource:
      ◦ searchLimit. Specifies the Solr search limit. Use this option to restrict the number of objects retrieved from Cloudera Navigator. Default is 10000.
      ◦ maximumPoolSize. Specifies the maximum number of connections to Cloudera Navigator. Default is 20. Minimum is 10.
      ◦ queueSize. Specifies the number of tasks held in the queue before the tasks are run. Default is five times the value of maximumPoolSize. A higher queueSize value requires more memory.
      ◦ disableIncremental. Set this option to true to disable incremental scans by the resource.
    Provide the URL with the tuning options in the following sample format: http://<host name>:<port>/?searchLimit=10000&maximumPoolSize=15&queueSize=1000&disableIncremental=true
  • User. Name of the user account that connects to Cloudera Navigator.
  • Password. Password for the user account that connects to Cloudera Navigator.
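The URL tuning options described above can be assembled programmatically. The following Python sketch applies the documented defaults and the documented minimum for maximumPoolSize. The function name and the host and port in the usage example are hypothetical. Including disableIncremental only when it is true is also a design choice of this sketch, not documented behavior.

```python
from urllib.parse import urlencode


def build_navigator_url(base_url, search_limit=10000, maximum_pool_size=20,
                        queue_size=None, disable_incremental=False):
    """Append the documented tuning options to a Cloudera Navigator URL (sketch)."""
    if maximum_pool_size < 10:
        # Documented minimum for maximumPoolSize.
        raise ValueError("maximumPoolSize must be at least 10")
    if queue_size is None:
        # Documented default: five times the value of maximumPoolSize.
        queue_size = 5 * maximum_pool_size
    params = {
        "searchLimit": search_limit,
        "maximumPoolSize": maximum_pool_size,
        "queueSize": queue_size,
    }
    if disable_incremental:
        params["disableIncremental"] = "true"
    # urlencode keeps the dict's insertion order and joins pairs with "&".
    return base_url.rstrip("/") + "/?" + urlencode(params)


print(build_navigator_url("http://nav-host.example.com:7187",
                          maximum_pool_size=15, queue_size=1000,
                          disable_incremental=True))
```

With these arguments, the sketch reproduces the sample format shown in the Navigator URL description. If you omit queue_size, it defaults to 5 × maximumPoolSize, so the default pool of 20 yields queueSize=100.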
[Image: Sample connection properties for a Cloudera Navigator resource on the General tab]
The following Additional and Advanced properties for source metadata settings are available on the Metadata Load Settings tab:
  • Enable Source Metadata. Select to extract metadata from the data source.
  • Auto Assign Connections. Select this option to assign connections automatically.
  • Enable Reference Resources. Option to extract metadata about assets that are not included in this resource but are referred to in the resource. Examples include source and target tables in PowerCenter mappings, and source tables and files from Tableau reports.
  • Retain Unresolved Reference Assets. Option to retain unresolved reference assets in the catalog after you assign connections. Retaining unresolved reference assets helps you view the complete lineage. The unresolved assets include deleted files, temporary tables, and other assets that are not present in the primary resource.
  • Hive Database. Name of the Hive database or schema from which you want to import a table.
  • Detailed Lineage. Select to extract and ingest metadata related to transformation logic for assets that include transformations. A transformation indicates generation, modification, or passage of data between source and target connections. Transformation logic displays the mappings or data flow relation types between the source assets and target assets related to the asset you select in Enterprise Data Catalog.
  • Memory. Specify the memory value required to run a scanner job. Specify one of the following memory values:
      ◦ Low
      ◦ Medium
      ◦ High
    Note: For more information about the memory values, see the Tuning Enterprise Data Catalog Performance article on the How-To Library Articles tab in the Informatica Doc Portal.
  • Custom Options. JVM parameters that you can set to configure the scanner container. Use the following arguments to configure the parameters:
      ◦ -Dscannerloglevel=<DEBUG/INFO/ERROR>. Changes the log level of the scanner to DEBUG, INFO, or ERROR. Default is INFO.
      ◦ -Dscanner.container.core=<number of cores>. Increases the number of cores for the scanner container. The value must be a number.
      ◦ -Dscanner.yarn.app.environment=<key=value>. Key-value pairs to set in the YARN environment. Use a comma to separate multiple key-value pairs.
      ◦ -Dscanner.pmem.enabled.container.memory.jvm.memory.ratio=<1.0/2.0>. Increases the scanner container memory when pmem is enabled. Default is 1.
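The Custom Options arguments listed above can be composed into a single value. The following Python sketch does this using only the documented -D arguments. The function name and the YARN environment keys in the usage example are hypothetical. It is also an assumption that the Custom Options field accepts multiple arguments separated by spaces.

```python
def build_custom_options(log_level=None, container_cores=None,
                         yarn_env=None, pmem_ratio=None):
    """Assemble a Custom Options string from the documented JVM arguments (sketch)."""
    args = []
    if log_level is not None:
        if log_level not in ("DEBUG", "INFO", "ERROR"):
            raise ValueError("log_level must be DEBUG, INFO, or ERROR")
        args.append(f"-Dscannerloglevel={log_level}")
    if container_cores is not None:
        # The documented value is a plain number of cores.
        args.append(f"-Dscanner.container.core={int(container_cores)}")
    if yarn_env:
        # Multiple key=value pairs are separated with commas, as documented.
        pairs = ",".join(f"{key}={value}" for key, value in yarn_env.items())
        args.append(f"-Dscanner.yarn.app.environment={pairs}")
    if pmem_ratio is not None:
        args.append(
            f"-Dscanner.pmem.enabled.container.memory.jvm.memory.ratio={pmem_ratio}")
    # Assumption: separate arguments with a single space.
    return " ".join(args)


print(build_custom_options(log_level="DEBUG", container_cores=2,
                           yarn_env={"EXAMPLE_KEY": "example_value"}))
```

Arguments you do not pass are omitted, so the scanner keeps its documented defaults (for example, INFO logging and a pmem ratio of 1).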