Property | Description |
---|---|
Name | The name of the connection. The name is not case sensitive and must be unique within the domain. You can change this property after you create the connection. The name cannot exceed 128 characters, contain spaces, or contain the following special characters: ~ ` ! $ % ^ & * ( ) - + = { [ } ] | \ : ; " ' < , > . ? / |
ID | String that the Data Integration Service uses to identify the connection. The ID is not case sensitive. It must be 255 characters or less and must be unique in the domain. You cannot change this property after you create the connection. Default value is the connection name. |
Description | The description of the connection. Enter a string that you can use to identify the connection. The description cannot exceed 4,000 characters. |
Location | The domain where you want to create the connection. Select the domain name. |
Type | The connection type. Select Hadoop. |
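
The Name rules in the table above are easy to violate by accident. The following is a minimal sketch, not part of the product, that checks a proposed connection name against the documented restrictions before you create the connection:

```python
# Hypothetical helper: validates a connection name against the rules in the
# table above (128-character limit, no spaces, no listed special characters).
INVALID_CHARS = set("~`!$%^&*()-+={[}]|\\:;\"'<,>.?/")

def validate_connection_name(name: str) -> list[str]:
    """Return a list of rule violations; an empty list means the name is valid."""
    problems = []
    if len(name) > 128:
        problems.append("name exceeds 128 characters")
    if " " in name:
        problems.append("name contains spaces")
    bad = sorted(set(name) & INVALID_CHARS)
    if bad:
        problems.append("name contains special characters: " + " ".join(bad))
    return problems

print(validate_connection_name("my_hadoop_connection"))  # []
print(validate_connection_name("bad name!"))             # two violations
```
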
Property | Description |
---|---|
Resource Manager Address | The service within Hadoop that submits requests for resources or spawns YARN applications. Use the following format: <hostname>:<port>, where <hostname> is the name or IP address of the machine that runs the Resource Manager and <port> is the port on which the Resource Manager listens. For example, enter: myhostname:8032 You can also get the Resource Manager Address property from yarn-site.xml, located in the following directory on the Hadoop cluster: /etc/hadoop/conf/ The Resource Manager Address appears as the following property in yarn-site.xml: <property> <name>yarn.resourcemanager.address</name> <value>hostname:port</value> <description>The address of the applications manager interface in the Resource Manager.</description> </property> Optionally, if the yarn.resourcemanager.address property is not configured in yarn-site.xml, you can find the host name in the yarn.resourcemanager.hostname or yarn.resourcemanager.scheduler.address properties in yarn-site.xml. You can then configure the Resource Manager Address in the Hadoop connection with the following value: <hostname>:8032 |
Default File System URI | The URI to access the default Hadoop Distributed File System. Use the following connection URI: hdfs://<node name>:<port>, where <node name> is the name or IP address of the NameNode and <port> is the port on which the NameNode listens. For example, enter: hdfs://myhostname:8020/ You can also get the Default File System URI property from core-site.xml, located in the following directory on the Hadoop cluster: /etc/hadoop/conf/ Use the value from the fs.defaultFS property found in core-site.xml. For example, use the following value: <property> <name>fs.defaultFS</name> <value>hdfs://localhost:8020</value> </property> If the Hadoop cluster runs MapR, use the following URI to access the MapR File System: maprfs:///. On Azure HDInsight, the default file system can be Windows Azure Storage Blob (WASB) or Azure Data Lake Store (ADLS). If the cluster uses WASB storage, use the following string to specify the URI: wasb://<container_name>@<account_name>.blob.core.windows.net/<path>, where <container_name> identifies the container within the storage account, <account_name> identifies the storage account, and <path> is the path to the directory. Note: <container_name> is optional. Example: wasb://infabdmoffering1storage.blob.core.windows.net/infabdmoffering1cluster/mr-history If the cluster uses ADLS storage, use the following format to specify the URI: adl://home The fs.defaultFS property appears as follows in hdfs-site.xml: <property> <name>fs.defaultFS</name> <value>adl://home</value> </property> A sketch that extracts the cluster environment values follows this table. |
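
Rather than reading yarn-site.xml and core-site.xml by hand, you can pull both cluster environment values out with a short script. The following is a minimal sketch, assuming the standard /etc/hadoop/conf/ layout described above; the fallback to yarn.resourcemanager.hostname mirrors the guidance in the table:

```python
# Minimal sketch: extract the Resource Manager Address and Default File System
# URI from the Hadoop configuration files described in the table above.
import xml.etree.ElementTree as ET

def read_hadoop_conf(path):
    """Return a dict of <name>/<value> pairs from a Hadoop *-site.xml file."""
    root = ET.parse(path).getroot()
    return {
        prop.findtext("name"): prop.findtext("value")
        for prop in root.iter("property")
    }

conf_dir = "/etc/hadoop/conf"  # standard location on the cluster
yarn = read_hadoop_conf(f"{conf_dir}/yarn-site.xml")
core = read_hadoop_conf(f"{conf_dir}/core-site.xml")

# Prefer yarn.resourcemanager.address; fall back to the host name plus the
# default port 8032, as the table suggests.
rm_address = yarn.get("yarn.resourcemanager.address")
if not rm_address and yarn.get("yarn.resourcemanager.hostname"):
    rm_address = yarn["yarn.resourcemanager.hostname"] + ":8032"

print("Resource Manager Address:", rm_address)
print("Default File System URI:", core.get("fs.defaultFS"))
```
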
Property | Description |
---|---|
Impersonation User Name | User name of the user that the Data Integration Service impersonates to run mappings on a Hadoop cluster. If the Hadoop cluster uses Kerberos authentication, the principal name for the JDBC connection string and the user name must be the same. Note: You must use user impersonation for the Hadoop connection if the Hadoop cluster uses Kerberos authentication. If the Hadoop cluster does not use Kerberos authentication, the user name depends on the behavior of the JDBC driver. If you do not specify a user name, the Hadoop cluster authenticates jobs based on the operating system profile user name of the machine that runs the Data Integration Service. |
Temporary Table Compression Codec | Hadoop compression library for a compression codec class name. |
Codec Class Name | Codec class name that enables data compression and improves performance on temporary staging tables. |
Hadoop Connection Custom Properties | Custom properties that are unique to the Hadoop connection. You can specify multiple properties. Use the following format: <property1>=<value>, where <property1> is the property name and <value> is the property value. To specify multiple properties, use &: as the property separator. Use custom properties only at the request of Informatica Global Customer Support. A sketch that builds this string follows this table. |
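
The <property>=<value> pairs separated by &: are easy to get wrong when typed by hand. A minimal sketch, using hypothetical property names, that builds the custom properties string in the required format:

```python
# Minimal sketch: build a Hadoop connection custom properties string in the
# <property>=<value> format, with &: separating the name-value pairs.
def format_custom_properties(props: dict) -> str:
    return "&:".join(f"{name}={value}" for name, value in props.items())

# The property names below are placeholders; set custom properties only at
# the request of Informatica Global Customer Support.
print(format_custom_properties({
    "some.custom.property": "value1",
    "another.custom.property": "value2",
}))
# some.custom.property=value1&:another.custom.property=value2
```
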
Property | Description |
---|---|
Environment SQL | SQL commands to set the Hadoop environment. The Data Integration Service executes the environment SQL at the beginning of each Hive script generated in a Hive execution plan. For example, the following environment SQL sets the execution engine and the dynamic partition mode: set hive.execution.engine='tez';set hive.exec.dynamic.partition.mode='nonstrict'; |
Database Name | Namespace for tables. Use the name default for tables that do not have a specified database name. |
Hive Warehouse Directory on HDFS | The absolute HDFS file path of the default database for the warehouse that is local to the cluster. For example, the following file path specifies a local warehouse: /user/hive/warehouse For Cloudera CDH, if the Metastore Execution Mode is remote, then the file path must match the file path specified by the Hive Metastore Service on the Hadoop cluster. You can get the value for the Hive Warehouse Directory on HDFS from the hive.metastore.warehouse.dir property in hive-site.xml, located in the following directory on the Hadoop cluster: /etc/hadoop/conf/ For example, use the following value: <property> <name>hive.metastore.warehouse.dir</name> <value>/user/hive/warehouse</value> <description>location of the warehouse directory</description> </property> For MapR, hive-site.xml is located in the following directory: /opt/mapr/hive/<hive version>/conf. A sketch that extracts this value follows this table. |
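
You can extract the warehouse directory with the same kind of script used for the cluster environment properties. A minimal sketch, assuming hive-site.xml is in /etc/hadoop/conf/ (for MapR, substitute /opt/mapr/hive/<hive version>/conf):

```python
# Minimal sketch: read the Hive warehouse directory from hive-site.xml.
import xml.etree.ElementTree as ET

def read_hadoop_conf(path):
    """Return a dict of <name>/<value> pairs from a Hadoop *-site.xml file."""
    root = ET.parse(path).getroot()
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}

hive = read_hadoop_conf("/etc/hadoop/conf/hive-site.xml")
print("Hive Warehouse Directory on HDFS:",
      hive.get("hive.metastore.warehouse.dir"))
```
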
Property | Description |
---|---|
Metastore Execution Mode | Controls whether to connect to a remote metastore or a local metastore. By default, local is selected. For a local metastore, you must specify the Metastore Database URI, Metastore Database Driver, Username, and Password. For a remote metastore, you must specify only the Remote Metastore URI. You can get the value for the Metastore Execution Mode from hive-site.xml. The Metastore Execution Mode appears as the following property in hive-site.xml: <property> <name>hive.metastore.local</name> <value>true</value> </property> Note: The hive.metastore.local property is deprecated in hive-site.xml for Hive server versions 0.9 and above. If the hive.metastore.local property does not exist but the hive.metastore.uris property exists, and you know that the Hive server has started, you can set the connection to a remote metastore. |
Metastore Database URI | The JDBC connection URI used to access the data store in a local metastore setup. Use the following connection URI: jdbc:<datastore type>://<node name>:<port>/<database name>, where <datastore type> is the type of the data store, <node name> is the name or IP address of the machine that hosts the data store, <port> is the port on which the data store listens, and <database name> is the name of the database. For example, the following URI specifies a local metastore that uses MySQL as a data store: jdbc:mysql://hostname23:3306/metastore You can get the value for the Metastore Database URI from hive-site.xml. The Metastore Database URI appears as the following property in hive-site.xml: <property> <name>javax.jdo.option.ConnectionURL</name> <value>jdbc:mysql://MYHOST/metastore</value> </property> |
Metastore Database Driver | Driver class name for the JDBC data store. For example, the following class name specifies a MySQL driver: com.mysql.jdbc.Driver You can get the value for the Metastore Database Driver from hive-site.xml. The Metastore Database Driver appears as the following property in hive-site.xml: <property> <name>javax.jdo.option.ConnectionDriverName</name> <value>com.mysql.jdbc.Driver</value> </property> |
Metastore Database User Name | The metastore database user name. You can get the value for the Metastore Database User Name from hive-site.xml. The Metastore Database User Name appears as the following property in hive-site.xml: <property> <name>javax.jdo.option.ConnectionUserName</name> <value>hiveuser</value> </property> |
Metastore Database Password | The password for the metastore user name. You can get the value for the Metastore Database Password from hive-site.xml. The Metastore Database Password appears as the following property in hive-site.xml: <property> <name>javax.jdo.option.ConnectionPassword</name> <value>password</value> </property> |
Remote Metastore URI | The metastore URI used to access metadata in a remote metastore setup. For a remote metastore, you must specify the Thrift server details. Use the following connection URI: thrift://<hostname>:<port>, where <hostname> is the name or IP address of the Thrift metastore server and <port> is the port on which the Thrift server listens. For example, enter: thrift://myhostname:9083/ You can get the value for the Remote Metastore URI from hive-site.xml. The Remote Metastore URI appears as the following property in hive-site.xml: <property> <name>hive.metastore.uris</name> <value>thrift://<n.n.n.n>:9083</value> <description>IP address or fully-qualified domain name and port of the metastore host</description> </property> |
Engine Type | The engine that the Hadoop environment uses to run a mapping on the Hadoop cluster. Select a value from the drop-down list. For example, select: MRv2 To set the engine type in the Hadoop connection, you must get the value for the mapreduce.framework.name property from mapred-site.xml, located in the following directory on the Hadoop cluster: /etc/hadoop/conf/ If the value for mapreduce.framework.name is classic, select mrv1 as the engine type in the Hadoop connection. If the value for mapreduce.framework.name is yarn, you can select either mrv2 or tez as the engine type in the Hadoop connection. Do not select Tez if Tez is not configured for the Hadoop cluster. You can also set the value for the engine type in hive-site.xml. The engine type appears as the following property in hive-site.xml: <property> <name>hive.execution.engine</name> <value>tez</value> <description>Chooses execution engine. Options are: mr (MapReduce, default) or tez (Hadoop 2 only)</description> </property> A sketch that applies these rules follows this table. |
Job Monitoring URL | The URL for the MapReduce JobHistory server. You can use the URL for the JobTracker URI if you use MapReduce version 1. Use the following format: <hostname>:<port>, where <hostname> is the name or IP address of the JobHistory server and <port> is the port on which the JobHistory server listens. For example, enter: myhostname:8021 You can get the value for the Job Monitoring URL from mapred-site.xml. The Job Monitoring URL appears as the following property in mapred-site.xml: <property> <name>mapred.job.tracker</name> <value>myhostname:8021</value> <description>The host and port that the MapReduce job tracker runs at.</description> </property> |
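
Several of the values above follow simple decision rules: the metastore mode depends on whether hive.metastore.local or hive.metastore.uris is present, and the engine type follows from mapreduce.framework.name. A minimal sketch of those rules, assuming the standard /etc/hadoop/conf/ layout:

```python
# Minimal sketch: derive the Metastore Execution Mode and Engine Type from
# the cluster configuration files, following the rules in the table above.
import xml.etree.ElementTree as ET

def read_hadoop_conf(path):
    """Return a dict of <name>/<value> pairs from a Hadoop *-site.xml file."""
    root = ET.parse(path).getroot()
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}

hive = read_hadoop_conf("/etc/hadoop/conf/hive-site.xml")
mapred = read_hadoop_conf("/etc/hadoop/conf/mapred-site.xml")

# hive.metastore.local is deprecated in Hive 0.9 and later; if it is absent
# but hive.metastore.uris is set, treat the metastore as remote.
if hive.get("hive.metastore.local") == "true":
    mode = "local"
elif hive.get("hive.metastore.uris"):
    mode = "remote"
else:
    mode = "local"  # the default in the connection properties

# classic maps to mrv1; yarn allows mrv2 (or tez, if Tez is configured).
framework = mapred.get("mapreduce.framework.name")
engine = "mrv1" if framework == "classic" else "mrv2"

print("Metastore Execution Mode:", mode)
print("Engine Type:", engine)
```
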
Property | Description |
---|---|
Temporary Working Directory on HDFS | The HDFS file path of the directory that the Blaze engine uses to store temporary files. Verify that the directory exists. The YARN user, Blaze engine user, and mapping impersonation user must have write permission on this directory. For example, enter: /blaze/workdir |
Blaze Service User Name | The operating system profile user name for the Blaze engine. |
Minimum Port | The minimum value for the port number range for the Blaze engine. For example, enter: 12300 |
Maximum Port | The maximum value for the port number range for the Blaze engine. For example, enter: 12600 |
Yarn Queue Name | The YARN scheduler queue name used by the Blaze engine that specifies available resources on a cluster. The name is case sensitive. |
Blaze Service Custom Properties | Custom properties that are unique to the Blaze engine. You can specify multiple properties. Use the following format: <property1>=<value>, where <property1> is the property name and <value> is the property value. To enter multiple properties, separate each name-value pair with the following text: &:. Use custom properties only at the request of Informatica Global Customer Support. |
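
For the Blaze properties, a couple of simple checks can catch misconfiguration early. A minimal sketch, using the example values from the table above, that sanity-checks the port range and the working directory path:

```python
# Minimal sketch: sanity-check the Blaze engine settings described above.
def check_blaze_settings(workdir: str, min_port: int, max_port: int) -> list[str]:
    """Return a list of problems; an empty list means the settings look sane."""
    problems = []
    if not workdir.startswith("/"):
        problems.append("temporary working directory must be an absolute HDFS path")
    if not (0 < min_port <= 65535 and 0 < max_port <= 65535):
        problems.append("ports must be between 1 and 65535")
    if min_port >= max_port:
        problems.append("minimum port must be less than maximum port")
    return problems

# Example values from the table above.
print(check_blaze_settings("/blaze/workdir", 12300, 12600))  # []
```
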
Property | Description |
---|---|
Spark HDFS Staging Directory | The HDFS file path of the directory that the Spark engine uses to store temporary files for running jobs. The YARN user, Spark engine user, and mapping impersonation user must have write permission on this directory. |
Spark Event Log Directory | Optional. The HDFS file path of the directory that the Spark engine uses to log events. The Data Integration Service accesses the Spark event log directory to retrieve final source and target statistics when a mapping completes. These statistics appear on the Summary Statistics tab and the Detailed Statistics tab of the Monitoring tool. If you do not configure the Spark event log directory, the statistics might be incomplete in the Monitoring tool. |
Spark Execution Parameters | An optional list of configuration parameters to apply to the Spark engine. You can change the default Spark configuration property values, such as spark.executor.memory or spark.driver.cores. Use the following format: <property1>=<value>, where <property1> is a Spark configuration property and <value> is the value of the property. For example, you can configure a YARN scheduler queue name that specifies available resources on a cluster: spark.yarn.queue=TestQ To enter multiple properties, separate each name-value pair with the following text: &: A sketch that assembles this list follows this table. |
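
The Spark execution parameters use the same name-value format as the custom properties. A minimal sketch that assembles a parameter list, using the spark.yarn.queue example from the table plus an assumed executor memory value:

```python
# Minimal sketch: build the Spark Execution Parameters string in the
# <property>=<value> format, with &: separating the name-value pairs.
def format_spark_parameters(params: dict) -> str:
    return "&:".join(f"{name}={value}" for name, value in params.items())

print(format_spark_parameters({
    "spark.yarn.queue": "TestQ",     # example from the table above
    "spark.executor.memory": "2g",   # assumed illustrative value
}))
# spark.yarn.queue=TestQ&:spark.executor.memory=2g
```
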