Hadoop connection properties

To use Hadoop Connector in a synchronization task, you must configure the connection properties.
Important: Hadoop Connector is deprecated and has been moved to maintenance mode. Informatica intends to drop support in a future release. Informatica recommends that you use Hive Connector to access Hadoop clusters.
The following table describes the Hadoop connection properties:
Connection property
Description
Username
The user name of the schema of the Hadoop component.
Password
The password of the schema of the Hadoop component.
JDBC Connection URL
The JDBC URL to connect to the Hadoop component. See the JDBC URL section below.
Driver
The JDBC driver class to connect to the Hadoop component.
For more information, see the Setting Hadoop classpath for non-Kerberos clusters topic.
Commit Interval
The batch size, in rows, that is used to load data to Hive.
Hadoop Installation Path
The installation path of the Hadoop component.
Not applicable to a Kerberos cluster.
Hive Installation Path
The installation path of Hive.
Not applicable to a Kerberos cluster.
HDFS Installation Path
The installation path of HDFS.
Not applicable to a Kerberos cluster.
HBase Installation Path
The installation path of HBase.
Not applicable to a Kerberos cluster.
Impala Installation Path
The installation path of Impala.
Not applicable to a Kerberos cluster.
Miscellaneous Library Path
The path to the library that communicates with Hadoop.
Not applicable to a Kerberos cluster.
Enable Logging
Enables log messages for the connector.
Note: The Enable Logging connection parameter is a placeholder for a future release. Its state has no impact on connector functionality.
Hadoop Distribution
The Hadoop distribution to connect to. You can use Kerberos authentication for the Cloudera and HDP distributions.
Authentication Type
The authentication type. You can select Native or Kerberos authentication.
Key Tab File
The file that contains encrypted keys and Kerberos principals to authenticate the machine.
Hive Site XML
The directory that contains the core-site.xml, hive-site.xml, and hdfs-site.xml files. All three XML files must be in the same directory.
Superuser Principal Name
The principal name of the superuser. Users assigned the superuser privilege can perform all the tasks that a user with the administrator privilege can perform.
Impersonation Username
Enables different users to run mappings in a Hadoop cluster that uses Kerberos authentication, or to connect to sources and targets that use Kerberos authentication. To enable different users to run mappings or connect to big data sources and targets, you must configure user impersonation.
Note: Installation paths are the paths where you place the Hadoop JAR files. Hadoop Connector loads the libraries from the installation paths before it sends instructions to Hadoop. When you use the Kerberos authentication type, you do not need to specify the Hadoop, Hive, HDFS, HBase, and Impala installation paths or the miscellaneous library path.
If you do not use Kerberos authentication and do not specify the installation paths, you can instead set the Hadoop classpath for Amazon EMR, Hortonworks, MapR, and Cloudera.
When you perform an insert operation on non-Kerberos clusters, the Secure Agent uses the hadoop fs -put <FS> <HDFS> command to upload the file to HDFS and the hadoop fs -rm -r <HDFS> command to delete the file from HDFS. When you enable Kerberos authentication, the Secure Agent does not use Hadoop commands to write data to or delete data from HDFS.
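For example, on a non-Kerberos cluster the Secure Agent might issue commands such as the following, where the local staging file and the HDFS target path are hypothetical placeholders:
hadoop fs -put /tmp/infa_staging/data.csv /user/infa/staging
hadoop fs -rm -r /user/infa/staging/data.csv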

JDBC URL

The connector connects to different Hadoop components with JDBC. The URL format and parameters differ from component to component.
Hive uses the following JDBC URL format:
jdbc:<hive/hive2>://<server>:<port>/<schema>
In the URL, hive or hive2 selects the driver protocol, <server> and <port> identify the host and port where the Hive Thrift server runs, and <schema> is the Hive schema to connect to.
For example, jdbc:hive2://invrlx63iso7:10000/default connects to the default schema of Hive through the HiveServer2 Thrift server that runs on the host invrlx63iso7 on port 10000.
Hadoop Connector uses the Hive Thrift server to communicate with Hive.
The command to start the Thrift server is hive --service hiveserver2.
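To verify the URL and the Thrift server outside a synchronization task, you can use the Beeline client that ships with Hive; the user name and password below are placeholders:
beeline -u jdbc:hive2://invrlx63iso7:10000/default -n <username> -p <password>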
Cloudera Impala uses a JDBC URL in the following format:
jdbc:hive2://<server>:<port>/;auth=<auth mechanism>
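For example, assuming Impala listens on its default HiveServer2-compatible port 21050 and uses no SASL authentication (both values are assumptions, not taken from this topic), the URL might look like the following:
jdbc:hive2://impalahost:21050/;auth=noSasl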

JDBC Driver Class

The JDBC driver class varies among Hadoop components. For example, use org.apache.hive.jdbc.HiveDriver for both Hive and Impala.
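As a quick check, Beeline accepts an explicit driver class with the -d option, so you can confirm the class outside the connector; the URL values are placeholders:
beeline -d org.apache.hive.jdbc.HiveDriver -u jdbc:hive2://<server>:<port>/<schema>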

Setting Hadoop classpath for non-Kerberos clusters

For non-Kerberos Hadoop clusters, if you do not specify the installation paths in the connection properties, you can still perform connection operations by setting the classpath for the respective distribution.
This section describes how to set the classpath for Amazon EMR, Hortonworks, Pivotal, and MapR.

Setting Hadoop classpath for Amazon EMR, Hortonworks, Pivotal, and MapR

Perform the following steps to generate the setHadoopConnectorClasspath.sh file for Amazon EMR, Hortonworks, Pivotal, and MapR:
    1. Start the Secure Agent.
    2. Create the Hadoop connection using the connector.
    3. Test the connection. This generates the setHadoopConnectorClasspath.sh file in the <Secure Agent installation directory>/apps/Data_Integration_Server/<latest DIS version>/ICS/main/tomcat directory.
    4. Stop the Secure Agent.
    5. From the <Secure Agent installation directory>, run the . ./main/tomcat/setHadoopConnectorClasspath.sh command (see the sketch after this procedure).
    6. Start the Secure Agent and run the synchronization tasks.
    Note: If you want to generate the setHadoopConnectorClasspath.sh file again, delete the existing file and repeat these steps.
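The following is a minimal sketch of steps 4 through 6 on Linux, assuming the Secure Agent is stopped and started with the infaagent script in the Secure Agent installation directory (verify the commands for your installation):
cd <Secure Agent installation directory>
./infaagent shutdown
. ./main/tomcat/setHadoopConnectorClasspath.sh
./infaagent startup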
    In certain cases, the Hadoop classpath might point to incorrect entries. In this case, you must direct the Hadoop classpath to the correct entries:
    1. Enter the hadoop classpath command from the terminal to display the stream of JAR paths.
    2. Copy the stream of JAR paths into a text file.
    3. If the following entries are present in the text file, delete them:
    4. Copy the remaining content and export it to a variable named HADOOP_CLASSPATH.
    5. In the saas-infaagentapp.sh file, make the following entry (an example sketch follows these steps):
    6. Restart the Secure Agent.
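A minimal sketch of steps 1 through 5 on Linux follows; the temporary file name and the exact entry to add to saas-infaagentapp.sh are assumptions, because this topic does not show them:
hadoop classpath > /tmp/hadoop_classpath.txt
# Edit /tmp/hadoop_classpath.txt and delete the incorrect entries, then:
export HADOOP_CLASSPATH=$(cat /tmp/hadoop_classpath.txt)
# Add the same export line to saas-infaagentapp.sh, for example:
# export HADOOP_CLASSPATH=<corrected classpath>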