Hadoop Multi-Part File Handler

Use the Hadoop Multi-Part File Handler resource to fetch lineage from a combined multi-part file to a relational target.

Prerequisites

Complete the following prerequisites:

• In the Data Integration Service where the data engineering mappings are deployed, set the HDFSRetainOriginialTargetFile custom property to True.

• Configure and run the relational and HDFS metadata sources that are used to create the data engineering mappings.

• Configure and run the Informatica Platform metadata resource.

• For the required relational and HDFS metadata sources, assign the connections using the Connection Assignment option in the Catalog.

Resource Connection Properties

The General tab includes the following properties:

Property	Description
HDFS Resource Name	Name of the HDFS resource.
Source Relational Resource Name	Name of the relational source.
Target Relational Resource Name	Name of the relational target.

The Metadata Load Settings tab includes the following properties:

Property	Description
Enable Source Metadata	Extracts metadata from the data source.
Memory	The memory required to run the scanner job. Select one of the following values based on the size of the imported data set: - Low - Medium - High Note: For more information about the memory values, see the Tuning Enterprise Data Catalog Performance article on the How To-Library Articles tab in the Informatica Doc Portal.
JVM Options	JVM parameters that configure the scanner container. Use the following arguments: - -Dscannerloglevel=<DEBUG/INFO/ERROR>. Sets the scanner log level to DEBUG, INFO, or ERROR. Default value is INFO. - -Dscanner.container.core=<number of cores>. Increases the number of cores for the scanner container. The value must be a number. - -Dscanner.yarn.app.environment=<key=value>. Key-value pairs to set in the YARN environment. Use a comma to separate the key-value pairs. - -Dscanner.pmem.enabled.container.memory.jvm.memory.ratio=<1.0/2.0>. Increases the scanner container memory when pmem is enabled. Default value is 1.
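
As an illustration of the JVM Options arguments, the following Python sketch assembles a value for the field from the four arguments listed in the table. The helper name build_jvm_options, the use of a space to separate arguments, and the sample environment keys are assumptions made for this example. They are not part of the product, so verify the exact syntax that the field accepts before you use the output.

```python
def build_jvm_options(log_level="INFO", cores=None, yarn_env=None, pmem_ratio=None):
    """Assemble a JVM Options string from the arguments described in the table.

    Hypothetical helper for illustration only. Separating the arguments with
    a space is an assumption, not documented behavior.
    """
    # The table lists DEBUG, INFO, and ERROR as log level values.
    allowed_levels = {"DEBUG", "INFO", "ERROR"}
    if log_level not in allowed_levels:
        raise ValueError(f"log_level must be one of {sorted(allowed_levels)}")
    options = [f"-Dscannerloglevel={log_level}"]

    # The core count must be a number; this sketch accepts positive integers.
    if cores is not None:
        if not isinstance(cores, int) or cores < 1:
            raise ValueError("cores must be a positive integer")
        options.append(f"-Dscanner.container.core={cores}")

    # YARN environment entries are key=value pairs separated by commas.
    if yarn_env:
        pairs = ",".join(f"{key}={value}" for key, value in yarn_env.items())
        options.append(f"-Dscanner.yarn.app.environment={pairs}")

    # Ratio applied to the scanner container memory when pmem is enabled.
    if pmem_ratio is not None:
        options.append(
            f"-Dscanner.pmem.enabled.container.memory.jvm.memory.ratio={pmem_ratio}"
        )

    return " ".join(options)


if __name__ == "__main__":
    # KEY1 and KEY2 are placeholder environment keys, not real settings.
    print(build_jvm_options("DEBUG", cores=2,
                            yarn_env={"KEY1": "value1", "KEY2": "value2"},
                            pmem_ratio=2.0))
```

Any argument that you do not pass, other than the log level, is left out of the resulting string, so the scanner would fall back to its default for that setting.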