Configuring Hadoop Connection Properties
When you create a Hadoop connection, default values are assigned to cluster environment variables, cluster path properties, and advanced properties. You can add or edit values for these properties. You can also reset to default values.
You can configure the following Hadoop connection properties based on the cluster environment and functionality that you use:
- Cluster Environment Variables
- Cluster Library Path
- Cluster ClassPath
- Cluster Executable Path
- Common Advanced Properties
- Hive Engine Advanced Properties
- Blaze Engine Advanced Properties
- Spark Engine Advanced Properties
To reset a property to its default value, delete the property value. For example, if you delete the value of an edited Cluster Library Path property, the property resets to the default value $DEFAULT_CLUSTER_LIBRARY_PATH.
Cluster Environment Variables
The Cluster Environment Variables property lists the environment variables that the cluster uses. Each environment variable contains a name and a value. You can add or edit environment variables.
To edit the property in the text box, use the following format with &: to separate each name-value pair:
<name1>=<value1>[&:<name2>=<value2>…&:<nameN>=<valueN>]
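For example, a value that sets the three DB2 environment variables described below might look like the following (the paths and instance name are illustrative):
DB2_HOME=/databases/db2V10.5_64BIT&:DB2INSTANCE=db10inst&:DB2CODEPAGE="1208"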
Configure the following environment variables in the Cluster Environment Variables property:
- HADOOP_NODE_JDK_HOME
Represents the directory from which you run the cluster services and the JDK version that the cluster nodes use. Required to run the Java transformation in the Hadoop environment and Sqoop mappings on the Blaze engine. You must use JDK version 1.7 or later. Default is /usr/java/default. The JDK version that the Data Integration Service uses must be compatible with the JRE version on the cluster.
Set to <cluster JDK home>/jdk<version>.
For example, HADOOP_NODE_JDK_HOME=<cluster JDK home>/jdk<version>.
- DB2_HOME
Specifies the DB2 home directory. Required to run mappings with DB2 sources and targets on the Hive engine.
Set to /databases/db2<version>.
For example, DB2_HOME=/databases/db2V10.5_64BIT.
- DB2INSTANCE
Specifies the DB2 database instance name. Required to run mappings with DB2 sources and targets on the Hive engine.
Set to <DB2 instance name>.
For example, DB2INSTANCE=db10inst.
- DB2CODEPAGE
Specifies the code page configured in the DB2 instance. Required to run mappings with DB2 sources and targets on the Hive engine.
Set to <DB2 instance code page>.
For example, DB2CODEPAGE="1208".
- GPHOME_LOADERS
Represents the directory that contains the Greenplum libraries. Required to run Greenplum mappings on the Hive engine.
Set to <Greenplum libraries directory>.
For example, GPHOME_LOADERS=/opt/thirdparty/.
- PYTHONPATH
Represents the directory that contains the Python path libraries. Required to run Greenplum mappings on the Hive engine.
Set to <Python path libraries directory>.
For example, PYTHONPATH=$GPHOME_LOADERS/bin/ext.
- NZ_HOME
Represents the directory that contains the Netezza client libraries. Required to run Netezza mappings on the Hive or Blaze engine.
Set to <Netezza client library directory>.
For example, NZ_HOME=/opt/thirdparty/netezza.
- NZ_ODBC_INI_PATH
Represents the directory that contains the odbc.ini file. Required to run Netezza mappings on the Hive or Blaze engine.
Set to <odbc.ini file path>.
For example, NZ_ODBC_INI_PATH=/opt/ODBCINI.
- ODBCINI
Represents the path and file name of the odbc.ini file.
  - Required to run Netezza mappings on the Hive or Blaze engine.
Set to <odbc.ini file path>/<file name>.
For example, ODBCINI=/opt/ODBCINI/odbc.ini.
  - Required to run mappings with ODBC sources and targets on the Hive engine.
Set to <odbc.ini file path>/<file name>.
For example, ODBCINI=$HADOOP_NODE_INFA_HOME/ODBC7.1/odbc.ini.
- ODBC_HOME
Specifies the ODBC home directory. Required to run mappings with ODBC sources and targets on the Hive engine.
Set to <odbc home directory>.
For example, ODBC_HOME=$HADOOP_NODE_INFA_HOME/ODBC7.1.
- ORACLE_HOME
Specifies the Oracle home directory. Required to run mappings with Oracle sources and targets on the Hive engine.
Set to <Oracle home directory>.
For example, ORACLE_HOME=/databases/oracle12.1.0_64BIT.
- TNS_ADMIN
Specifies the directory to the Oracle client tnsnames.ora configuration files. Required to run mappings with Oracle sources and targets on the Hive engine.
Set to <tnsnames.ora config files directory>.
For example, TNS_ADMIN=/opt/ora_tns.
- HADOOP_CLASSPATH
Represents the directory that contains the TDCH libraries. Required to run Teradata mappings through TDCH on the Hive engine.
Set to <TDCH libraries directory>.
For example, set to:
/opt/cloudera/parcels/CDH-5.13.0-1.cdh5.13.0.p0.29/lib/hive/conf
/opt/cloudera/parcels/CDH-5.13.0-1.cdh5.13.0.p0.29/lib/hive/lib/*
/usr/lib/tdch/1.5/lib/*
Cluster Library Path
The Cluster Library Path property is a list of path variables for shared libraries on the cluster. You can add or edit library path variables.
To edit the property in the text box, use the following format with : to separate each path variable:
<variable1>[:<variable2>…:<variableN>]
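For example, a value that combines the DB2 and Oracle library paths described below might look like the following:
$DB2_HOME/lib64:$ORACLE_HOME/lib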
Configure the following library path variables in the Cluster Library Path property:
- $DB2_HOME/lib64
Represents the path to the DB2 libraries. Required to run mappings with DB2 sources and targets on the Hive engine.
- $GPHOME_LOADERS/lib
Represents the path to the Greenplum libraries. Required to run Greenplum mappings on the Hive engine.
- $GPHOME_LOADERS/ext/python/lib
Represents the path to the Python libraries. Required to run Greenplum mappings on the Hive engine.
- $NZ_HOME/lib64
Represents the path to the Netezza libraries. Required to run Netezza mappings on the Hive or Blaze engine.
- $ORACLE_HOME/lib
Represents the path to the Oracle libraries. Required to run mappings with Oracle sources and targets on the Hive engine.
- /usr/lib/tdch/1.5/lib/*
The path to the TDCH libraries directory. Required to run Teradata mappings through TDCH on the Hive engine.
Cluster ClassPath
The Cluster ClassPath property is a list of classpath variables to access the Hadoop jar files and the required libraries on the cluster. You can add or edit classpath variables.
To edit the property in the text box, use the following format with : to separate each path variable:
<variable1>[:<variable2>…:<variableN>]
Configure the following classpath variable in the Cluster ClassPath property:
- /usr/lib/tdch/1.5/lib/*
The path to the TDCH libraries directory. Required to run Teradata mappings through TDCH on the Hive engine.
Cluster Executable Path
The Cluster Executable Path property is a list of path variables to access executable files on the cluster. You can add or edit executable path variables.
To edit the property in the text box, use the following format with : to separate each path variable:
<variable1>[:<variable2>…:<variableN>]
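For example, a value that combines the DB2 and Oracle executable paths described below might look like the following:
$DB2_HOME/bin:$ORACLE_HOME/bin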
Configure the following executable path variables in the Cluster Executable Path property:
- $DB2_HOME/bin
Represents the directory to the DB2 binaries. Required to run mappings with DB2 sources and targets on the Hive engine.
- $GPHOME_LOADERS/bin
Represents the path to the Greenplum binaries. Required to run Greenplum mappings on the Hive engine.
- $GPHOME_LOADERS/ext/python/bin
Represents the path to the Python binaries. Required to run Greenplum mappings on the Hive engine.
- $ORACLE_HOME/bin
Represents the path to the Oracle binaries. Required to run mappings with Oracle sources and targets on the Hive engine.
Common Advanced Properties
Common advanced properties are a list of advanced or custom properties that are unique to the Hadoop environment. The properties are common to the Blaze, Spark, and Hive engines. Each property contains a name and a value. You can add or edit advanced properties.
To edit the property in the text box, use the following format with &: to separate each name-value pair:
<name1>=<value1>[&:<name2>=<value2>…&:<nameN>=<valueN>]
Configure the following property in the Advanced Properties of the common properties section:
- infapdo.java.opts
List of Java options to customize the Java run-time environment. The property contains default values.
If mappings in a MapR environment contain a Consolidation transformation or a Match transformation, change the following value:
  - -Xmx512M. Specifies the maximum heap size for the Java virtual machine. Default is 512 MB. Increase the value to at least 700 MB.
For example, infapdo.java.opts=-Xmx700M
Hive Engine Advanced Properties
Hive advanced properties are a list of advanced or custom properties that are unique to the Hive engine. Each property contains a name and a value. You can add or edit advanced properties.
To edit the property in the text box, use the following format with &: to separate each name-value pair:
<name1>=<value1>[&:<name2>=<value2>…&:<nameN>=<valueN>]
Blaze Engine Advanced Properties
Blaze advanced properties are a list of advanced or custom properties that are unique to the Blaze engine. Each property contains a name and a value. You can add or edit advanced properties.
To edit the property in the text box, use the following format with &: to separate each name-value pair:
<name1>=<value1>[&:<name2>=<value2>…&:<nameN>=<valueN>]
Configure the following properties in the Advanced Properties of the Blaze configuration section (a combined example follows the list):
- infagrid.cadi.namespace
Namespace for the Data Integration Service to use. Required to set up multiple Blaze instances.
Set to <unique namespace>.
For example, infagrid.cadi.namespace=TestUser1_namespace
- infagrid.blaze.console.jsfport
JSF port for the Blaze engine console. Use a port number that no other cluster processes use. Required to set up multiple Blaze instances.
Set to <unique JSF port value>.
For example, infagrid.blaze.console.jsfport=9090
- infagrid.blaze.console.httpport
HTTP port for the Blaze engine console. Use a port number that no other cluster processes use. Required to set up multiple Blaze instances.
Set to <unique HTTP port value>.
For example, infagrid.blaze.console.httpport=9091
- infagrid.node.local.root.log.dir
Path for the Blaze service logs. Default is /tmp/infa/logs/blaze. Required to set up multiple Blaze instances.
Set to <local Blaze services log directory>.
For example, infagrid.node.local.root.log.dir=<directory path>
- infacal.hadoop.logs.directory
Path in HDFS for the persistent Blaze logs. Default is /var/log/hadoop-yarn/apps/informatica. Required to set up multiple Blaze instances.
Set to <persistent log directory path>.
For example, infacal.hadoop.logs.directory=<directory path>
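For example, to run a second Blaze instance alongside an existing one, you might set the namespace and console ports together in a single value (the namespace and port numbers are illustrative):
infagrid.cadi.namespace=TestUser1_namespace&:infagrid.blaze.console.jsfport=9090&:infagrid.blaze.console.httpport=9091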
Spark Engine Advanced Properties
Spark advanced properties are a list of advanced or custom properties that are unique to the Spark engine. Each property contains a name and a value. You can add or edit advanced properties.
To edit the property in the text box, use the following format with &: to separate each name-value pair:
<name1>=<value1>[&:<name2>=<value2>…&:<nameN>=<valueN>]
Configure the following properties in the Advanced Properties of the Spark configuration section:
- spark.scheduler.maxRegisteredResourcesWaitingTime
The number of milliseconds to wait for resources to register before scheduling a task. Default is 30000. Decrease the value to reduce delays before starting the Spark job execution. Required to improve performance for mappings on the Spark engine.
Set to 15000.
For example, spark.scheduler.maxRegisteredResourcesWaitingTime=15000
- spark.scheduler.minRegisteredResourcesRatio
The minimum ratio of registered resources to acquire before task scheduling begins. Default is 0.8. Decrease the value to reduce any delay before starting the Spark job execution. Required to improve performance for mappings on the Spark engine.
Set to 0.5.
For example, spark.scheduler.minRegisteredResourcesRatio=0.5
- spark.shuffle.encryption.enabled
Enables encrypted communication when authentication is enabled. Required for Spark encryption. A combined example that sets all four encryption properties follows this list.
Set to TRUE.
For example, spark.shuffle.encryption.enabled=TRUE
- spark.authenticate
Enables authentication for the Spark service on Hadoop. Required for Spark encryption.
Set to TRUE.
For example, spark.authenticate=TRUE
- spark.authenticate.enableSaslEncryption
Enables encrypted communication when SASL authentication is enabled. Required if Spark encryption uses SASL authentication.
Set to TRUE.
For example, spark.authenticate.enableSaslEncryption=TRUE
- spark.authenticate.sasl.encryption.aes.enabled
Enables AES support when SASL authentication is enabled. Required if Spark encryption uses SASL authentication.
Set to TRUE.
For example, spark.authenticate.sasl.encryption.aes.enabled=TRUE
- infaspark.pythontx.executorEnv.LD_PRELOAD
The location of the Python shared library in the Python installation folder on the Data Integration Service machine. Required to run a Python transformation on the Spark engine.
For example, set to:
infaspark.pythontx.executorEnv.LD_PRELOAD=
<Informatica installation directory>/services/shared/spark/python/lib/libpython3.6m.so
- infaspark.pythontx.submit.lib.JEP_HOME
The location of the Jep package in the Python installation folder on the Data Integration Service machine. Required to run a Python transformation on the Spark engine.
For example, set to:
infaspark.pythontx.submit.lib.JEP_HOME=
<Informatica installation directory>/services/shared/spark/python/lib/python3.6/site-packages/jep/
- infaspark.executor.extraJavaOptions
List of extra Java options for the Spark executor. Required for streaming mappings to read from or write to a Kafka cluster that uses Kerberos authentication.
For example, set to:
infaspark.executor.extraJavaOptions=
-Djava.security.egd=file:/dev/./urandom
-XX:MaxMetaspaceSize=256M -Djavax.security.auth.useSubjectCredsOnly=true
-Djava.security.krb5.conf=/<path to krb5.conf file>/krb5.conf
-Djava.security.auth.login.config=/<path to jAAS config>/kafka_client_jaas.config
To configure the property for a specific user, you can include the following lines of code:
infaspark.executor.extraJavaOptions =
-Djava.security.egd=file:/dev/./urandom
-XX:MaxMetaspaceSize=256M -XX:+UseG1GC -XX:MaxGCPauseMillis=500
-Djava.security.krb5.conf=/etc/krb5.conf
- infaspark.driver.cluster.mode.extraJavaOptions
List of extra Java options for the Spark driver that runs inside the cluster. Required for streaming mappings to read from or write to a Kafka cluster that uses Kerberos authentication.
For example, set to:
infaspark.driver.cluster.mode.extraJavaOptions=
-Djava.security.egd=file:/dev/./urandom
-XX:MaxMetaspaceSize=256M -Djavax.security.auth.useSubjectCredsOnly=true
-Djava.security.krb5.conf=/<path to krb5.conf file>/krb5.conf
-Djava.security.auth.login.config=<path to jaas config>/kafka_client_jaas.config
To configure the property for a specific user, you can include the following lines of code:
infaspark.driver.cluster.mode.extraJavaOptions =
-Djava.security.egd=file:/dev/./urandom
-XX:MaxMetaspaceSize=256M -XX:+UseG1GC -XX:MaxGCPauseMillis=500
-Djava.security.krb5.conf=/etc/krb5.conf
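For example, to enable Spark encryption with SASL and AES support, you might combine the four encryption properties into a single value:
spark.authenticate=TRUE&:spark.shuffle.encryption.enabled=TRUE&:spark.authenticate.enableSaslEncryption=TRUE&:spark.authenticate.sasl.encryption.aes.enabled=TRUE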