Cloudera Navigator
Cloudera Navigator is a fully integrated data management and security system for the Hadoop platform.
Objects Extracted
The Cloudera Navigator resource extracts Hive, Impala, and Spark operations.
Permissions to Configure the Resource
Before you create a Cloudera Navigator resource, you must configure the Java heap size for the Cloudera Navigator server and the maximum heap size for Catalog Service. If you do not correctly configure the heap sizes, the metadata load can fail.
Configure the following heap sizes:
- Java heap size for the Navigator server. Before you create a Cloudera Navigator resource, set the Java heap size for the Cloudera Navigator Server to at least 2 GB. If the heap size is not sufficient, the resource load fails with a connection refused error.
- Maximum heap size for the Catalog Service. Before you create a Cloudera Navigator resource, open the Administrator tool and check the value of the Max Heap Size property for the Catalog Service. Set the maximum heap size to at least 4096 MB (4 GB).
If you perform simultaneous resource loads, increase the maximum heap size by at least 1024 MB (1 GB) for each resource load. For example, to load two Cloudera Navigator resources simultaneously, increase the maximum heap size by 2048 MB (2 GB). Therefore, you would set Max Heap Size to at least 6144 MB.
Note: Some Cloudera distributions might require a maximum heap size larger than 4 GB. If the metadata load fails with an out of memory error, increase the maximum heap size.
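The heap size arithmetic above can be sketched as a small helper. This is a minimal illustration, not part of the product: the function and constant names are hypothetical, and it assumes that a single load needs only the 4096 MB baseline while two or more simultaneous loads each add 1024 MB, which matches the 6144 MB example.

```python
# Hypothetical helper that mirrors the sizing guidance in this section.
BASE_MAX_HEAP_MB = 4096           # minimum Max Heap Size for the Catalog Service
PER_SIMULTANEOUS_LOAD_MB = 1024   # minimum increase per simultaneous resource load


def min_catalog_service_heap_mb(resource_loads: int = 1) -> int:
    """Return the minimum Max Heap Size in MB for the given number of loads.

    Assumption: one load is not "simultaneous", so it needs only the baseline.
    """
    if resource_loads < 1:
        raise ValueError("resource_loads must be at least 1")
    if resource_loads == 1:
        return BASE_MAX_HEAP_MB
    return BASE_MAX_HEAP_MB + PER_SIMULTANEOUS_LOAD_MB * resource_loads


if __name__ == "__main__":
    for loads in (1, 2, 3):
        print(loads, "load(s):", min_catalog_service_heap_mb(loads), "MB")
```

The result is a lower bound only. As the note above says, some Cloudera distributions might need more, so treat an out of memory error as a signal to increase the value.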
On the Cloudera Navigator data source, assign the Navigator Administrator permission to the user account that you use to access the data source. This is the minimum permission required to extract metadata from the Cloudera Navigator data source.
Basic Information
The General tab includes the following basic information about the resource:
Information | Description |
---|
Name | The name of the resource. |
Description | The description of the resource. |
Resource type | The type of the resource. |
Execute On | You can choose to execute on the default catalog server or offline. |
Resource Connection Properties
Configure the connection properties when you create or edit a Cloudera Navigator resource.
The following table describes the connection properties:
Property | Description |
---|
Navigator URL | URL of the Cloudera Navigator Server. You can provide the following options in the URL to tune the resource:
- searchLimit. Specifies the Solr search limit. Use this parameter if you want to restrict the number of objects retrieved from Cloudera Navigator. Default is 10000.
- maximumPoolSize. Specifies the maximum number of connections to Cloudera Navigator. Default is 20. Minimum is 10.
- queueSize. Specifies the number of tasks held in the queue before the tasks are run. Default is five times the value of maximumPoolSize. A higher value for the queueSize option requires more memory.
- disableIncremental. Set this option to true to disable incremental scans by the resource.
Provide the URL with the tuning options as shown in the following sample format: http://<host name>:<port>/?searchLimit=10000&maximumPoolSize=15&queueSize=1000&disableIncremental=true |
User | Name of the user account that connects to Cloudera Navigator. |
Password | Password for the user account that connects to Cloudera Navigator. |
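As an illustration of the Navigator URL format described in the table, the following sketch assembles the URL and its tuning options. The function name, the host name, and the port in the usage example are hypothetical placeholders. Only the four option names and the minimum of 10 for maximumPoolSize come from the table.

```python
from urllib.parse import urlencode

MIN_POOL_SIZE = 10  # minimum value for maximumPoolSize stated in the table


def build_navigator_url(host, port, scheme="http", search_limit=None,
                        maximum_pool_size=None, queue_size=None,
                        disable_incremental=None):
    """Build a Navigator URL string with optional tuning options.

    Options left as None are omitted, so the resource uses its defaults.
    """
    if maximum_pool_size is not None and maximum_pool_size < MIN_POOL_SIZE:
        raise ValueError(f"maximumPoolSize must be at least {MIN_POOL_SIZE}")

    options = {}
    if search_limit is not None:
        options["searchLimit"] = search_limit
    if maximum_pool_size is not None:
        options["maximumPoolSize"] = maximum_pool_size
    if queue_size is not None:
        options["queueSize"] = queue_size
    if disable_incremental is not None:
        # The table shows the lowercase literal "true" in the sample URL.
        options["disableIncremental"] = "true" if disable_incremental else "false"

    base = f"{scheme}://{host}:{port}/"
    return f"{base}?{urlencode(options)}" if options else base


if __name__ == "__main__":
    # Host name and port are placeholders, not values from this document.
    print(build_navigator_url("navigator.example.com", 7187,
                              search_limit=10000, maximum_pool_size=15,
                              queue_size=1000, disable_incremental=True))
```

Leaving an option out of the URL keeps its documented default, which is usually the safer starting point. A larger queueSize in particular costs memory.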
The following image shows sample connection properties on the General tab:
The following table describes the Additional and Advanced properties for source metadata settings on the Metadata Load Settings tab:
Property | Description |
---|
Enable Source Metadata | Select to extract metadata from the data source. |
Auto Assign Connections | Select this option to specify that the connection must be assigned automatically. |
Enable Reference Resources | Option to extract metadata about assets that are not included in this resource, but referred to in the resource. Examples include source and target tables in PowerCenter mappings, and source tables and files from Tableau reports. |
Retain Unresolved Reference Assets | Option to retain unresolved reference assets in the catalog after you assign connections. Retaining unresolved reference assets helps you view the complete lineage. The unresolved assets include deleted files, temporary tables, and other assets that are not present in the primary resource. |
Hive Database | Name of the Hive database or a schema from where you want to import a table. |
Detailed Lineage | Select to extract and ingest metadata related to transformation logic for assets that include transformations.
- Transformation. Indicates the generation, modification, or passage of data between source and target connections.
- Transformation logic. Displays the mappings or data flow relation types between source assets and target assets related to the asset you select in Enterprise Data Catalog.
|
Memory | Specify the memory value required to run a scanner job. Specify one of the following memory values: Note: For more information about the memory values, see the Tuning Enterprise Data Catalog Performance article on the How-To Library Articles tab in the Informatica Doc Portal. |
Custom Options | JVM parameters that you can set to configure the scanner container. Use the following arguments to configure the parameters:
- -Dscannerloglevel=<DEBUG/INFO/ERROR>. Changes the log level of the scanner to a value such as DEBUG, ERROR, or INFO. Default value is INFO.
- -Dscanner.container.core=<No. of core>. Increases the number of cores for the scanner container. The value must be a number.
- -Dscanner.yarn.app.environment=<key=value>. Key-value pairs that you need to set in the Yarn environment. Use a comma to separate the key-value pairs.
- -Dscanner.pmem.enabled.container.memory.jvm.memory.ratio=<1.0/2.0>. Increases the scanner container memory when pmem is enabled. The default value is 1.
|
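The Custom Options arguments listed in the table can be assembled with a small helper like the one below. This is a sketch under stated assumptions: the function name is hypothetical, the example values are made up, and joining multiple arguments with a single space is an assumption, because this section does not state the separator that the Custom Options field expects.

```python
ALLOWED_LOG_LEVELS = {"DEBUG", "INFO", "ERROR"}  # values listed in the table


def build_custom_options(log_level=None, cores=None, yarn_env=None,
                         pmem_ratio=None):
    """Return a Custom Options string built from the documented -D arguments.

    Assumption: arguments are separated by a single space.
    """
    args = []
    if log_level is not None:
        if log_level not in ALLOWED_LOG_LEVELS:
            raise ValueError(f"log_level must be one of {sorted(ALLOWED_LOG_LEVELS)}")
        args.append(f"-Dscannerloglevel={log_level}")
    if cores is not None:
        # The table requires a number; int() rejects non-numeric input.
        args.append(f"-Dscanner.container.core={int(cores)}")
    if yarn_env:
        # The table says to separate the key-value pairs with commas.
        pairs = ",".join(f"{key}={value}" for key, value in yarn_env.items())
        args.append(f"-Dscanner.yarn.app.environment={pairs}")
    if pmem_ratio is not None:
        args.append(
            f"-Dscanner.pmem.enabled.container.memory.jvm.memory.ratio={pmem_ratio}"
        )
    return " ".join(args)


if __name__ == "__main__":
    # Example values are illustrative only.
    print(build_custom_options(log_level="DEBUG", cores=2,
                               yarn_env={"JAVA_HOME": "/opt/java"},
                               pmem_ratio=2.0))
```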