Hadoop Multi-Part File Handler

Use the Hadoop Multi-Part File Handler resource to fetch lineage from a combined multi-part file to a relational target.

Prerequisites

Perform the following steps to complete the prerequisites:

Resource Connection Properties

The General tab includes the following properties:

HDFS Resource Name
    Name of the HDFS resource.

Source Relational Resource Name
    Name of the source relational resource.

Target Relational Resource Name
    Name of the target relational resource.

The Metadata Load Settings tab includes the following properties:

Enable Source Metadata
    Extracts metadata from the data source.

Memory
    The memory required to run the scanner job. Select one of the following values based on the size of the imported data set:
      • Low
      • Medium
      • High
    Note: For more information about the memory values, see the Tuning Enterprise Data Catalog Performance article on the How To-Library Articles tab in the Informatica Doc Portal.

JVM Options
    JVM parameters that you can set to configure the scanner container. Use the following arguments to configure the parameters:
      • -Dscannerloglevel=<DEBUG/INFO/ERROR>. Sets the scanner log level to DEBUG, INFO, or ERROR. Default value is INFO.
      • -Dscanner.container.core=<No. of core>. Sets the number of cores for the scanner container. The value must be a number.
      • -Dscanner.yarn.app.environment=<key=value>. Key-value pairs to set in the YARN environment. Use a comma to separate multiple key-value pairs.
      • -Dscanner.pmem.enabled.container.memory.jvm.memory.ratio=<1.0/2.0>. Increases the scanner container memory when pmem is enabled. Default value is 1.
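The JVM Options arguments above can be assembled into a single string before you paste it into the field. The following Python sketch is illustrative only: the helper name build_jvm_options, the environment keys in the usage example, and the use of a space to separate arguments are assumptions and are not taken from the product documentation. Only the argument names and their documented values come from the list above.

```python
from typing import Mapping, Optional

# Values documented for the arguments above.
ALLOWED_LOG_LEVELS = {"DEBUG", "INFO", "ERROR"}
ALLOWED_PMEM_RATIOS = {"1.0", "2.0"}


def build_jvm_options(
    log_level: str = "INFO",
    cores: Optional[int] = None,
    yarn_env: Optional[Mapping[str, str]] = None,
    pmem_ratio: Optional[str] = None,
) -> str:
    """Hypothetical helper: compose a JVM Options string.

    Assumption: arguments are separated by a single space.
    """
    if log_level not in ALLOWED_LOG_LEVELS:
        raise ValueError(f"unsupported log level: {log_level}")
    args = [f"-Dscannerloglevel={log_level}"]

    if cores is not None:
        if cores < 1:
            raise ValueError("cores must be a positive number")
        args.append(f"-Dscanner.container.core={cores}")

    if yarn_env:
        # Multiple key=value pairs are separated by commas.
        pairs = ",".join(f"{key}={value}" for key, value in yarn_env.items())
        args.append(f"-Dscanner.yarn.app.environment={pairs}")

    if pmem_ratio is not None:
        if pmem_ratio not in ALLOWED_PMEM_RATIOS:
            raise ValueError(f"unsupported pmem ratio: {pmem_ratio}")
        args.append(
            "-Dscanner.pmem.enabled.container.memory.jvm.memory.ratio="
            f"{pmem_ratio}"
        )

    return " ".join(args)


if __name__ == "__main__":
    # KEY_A and KEY_B are placeholder environment keys, not real settings.
    print(build_jvm_options("DEBUG", cores=2,
                            yarn_env={"KEY_A": "1", "KEY_B": "2"},
                            pmem_ratio="2.0"))
```

Validating the log level and the pmem ratio against the documented values before composing the string helps catch typos that the scanner job might otherwise surface only at run time.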