Perform Sqoop Configuration Tasks
Before you run Sqoop mappings, you must perform the following configuration tasks:
1. Download the JDBC driver JAR files for Sqoop connectivity.
2. Configure the HADOOP_NODE_JDK_HOME property in the hadoopEnv.properties file.
3. Configure the mapred-site.xml file for Cloudera clusters.
4. Configure the yarn-site.xml file for Cloudera Kerberos clusters.
5. Configure the mapred-site.xml file for Cloudera Kerberos non-HA clusters.
6. Configure the core-site.xml file for Ambari-based non-Kerberos clusters.
Download the JDBC Driver JAR Files for Sqoop Connectivity
To configure Sqoop connectivity for relational databases, you must download the relevant JDBC driver JAR files and copy them to the node where the Data Integration Service runs. At run time, the Data Integration Service copies the JAR files to the Hadoop distributed cache so that they are accessible to all nodes in the Hadoop cluster.
You can use any Type 4 JDBC driver that the database vendor recommends for Sqoop connectivity.
Note: The DataDirect JDBC drivers that Informatica ships are not licensed for Sqoop connectivity.
If you use the Cloudera Connector Powered by Teradata or Hortonworks Connector for Teradata, you must download additional JAR files and copy them to the node where the Data Integration Service runs.
1. Download the JDBC driver JAR files for the database that you want to connect to.
2. If you use the Cloudera Connector Powered by Teradata, perform the following steps:
- a. Download the Cloudera Connector Powered by Teradata package from the following URL:
The package is named sqoop-connector-teradata-<version>.tar.gz. Download all the JAR files in the package.
- b. Download the terajdbc4.jar file and tdgssconfig.jar file from the following URL:
3. If you use the Hortonworks Connector for Teradata, perform the following steps:
- a. Download the Hortonworks Connector for Teradata package from the following URL:
The package is named hdp-connector-for-teradata-<version>-distro.tar.gz. Download all the JAR files in the package.
- b. Download the avro-mapred-1.7.4-hadoop2.jar file from the following URL:
4. On the node where the Data Integration Service runs, copy all the JAR files from the previous steps to the following directory:
<Informatica installation directory>/externaljdbcjars
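For example, if you connect to a MySQL database, the copy step might look like the following command. The driver file name and the source directory are illustrative; use the Type 4 driver version that your database vendor recommends:

# Copy a vendor JDBC driver to the external JDBC JARs directory (illustrative names)
cp /tmp/drivers/mysql-connector-java-5.1.40-bin.jar "<Informatica installation directory>/externaljdbcjars"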
Configure the HADOOP_NODE_JDK_HOME property in the hadoopEnv.properties File
Before you run Sqoop mappings, you must configure the HADOOP_NODE_JDK_HOME property in the hadoopEnv.properties file on the Data Integration Service node. Configure the HADOOP_NODE_JDK_HOME property to point to the JDK version that the cluster nodes use. You must use JDK version 1.7 or later.
1. Go to the following location:
<Informatica installation directory>/services/shared/hadoop/<Hadoop_distribution_name>_<version_number>/infaConf
2. Find the file named hadoopEnv.properties.
3. Back up the file before you update it.
4. Use a text editor to open the file.
5. Define the HADOOP_NODE_JDK_HOME property as follows:
infapdo.env.entry.hadoop_node_jdk_home=HADOOP_NODE_JDK_HOME=<cluster_JDK_home>/jdk<version>
For example, infapdo.env.entry.hadoop_node_jdk_home=HADOOP_NODE_JDK_HOME=/usr/java/default
6. Save the properties file with the name hadoopEnv.properties.
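For example, if the cluster JDK is installed under /usr/java/jdk1.8.0_77 (an illustrative path), the entry would read:

infapdo.env.entry.hadoop_node_jdk_home=HADOOP_NODE_JDK_HOME=/usr/java/jdk1.8.0_77

Before you set the property, you can confirm the JDK path and version on a cluster node, for example by running <cluster_JDK_home>/bin/java -version on that node.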
Configure the mapred-site.xml File for Cloudera Clusters
Before you run Sqoop mappings on Cloudera clusters, you must configure MapReduce properties in the mapred-site.xml file on the Hadoop cluster, and restart Hadoop services and the cluster.
1. In Cloudera Manager, open the YARN configuration.
2. Find the property named NodeManager Advanced Configuration Snippet (Safety Valve) for mapred-site.xml.
3. Click + and configure the following properties:
Property | Value
---|---
mapreduce.application.classpath | $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$MR2_CLASSPATH,$CDH_MR2_HOME
mapreduce.jobhistory.intermediate-done-dir | <Directory where MapReduce jobs write history files>
4. Select the Final check box.
5. Redeploy the client configurations.
6. Restart Hadoop services and the cluster.
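Cloudera Manager merges the safety valve entries into the mapred-site.xml file on the cluster nodes. As a sketch, assuming that the Final check box marks each property final and that the intermediate-done-dir value is an illustrative HDFS path, the generated entries are equivalent to the following XML:

<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$MR2_CLASSPATH,$CDH_MR2_HOME</value>
<final>true</final>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/mr-history/intermediate</value>
<final>true</final>
</property>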
Configure the yarn-site.xml File for Cloudera Kerberos Clusters
To run Sqoop mappings on Cloudera clusters that use Kerberos authentication, you must configure properties in the yarn-site.xml file on the Data Integration Service node and restart the Data Integration Service.
Copy the following properties from the mapred-site.xml file on the cluster and add them to the yarn-site.xml file on the Data Integration Service node:
- mapreduce.jobhistory.address
Location of the MapReduce JobHistory Server. The default port is 10020.
<property>
<name>mapreduce.jobhistory.address</name>
<value>hostname:port</value>
<description>MapReduce JobHistory Server IPC host:port</description>
</property>
- mapreduce.jobhistory.principal
SPN for the MapReduce JobHistory server.
<property>
<name>mapreduce.jobhistory.principal</name>
<value>mapred/_HOST@YOUR-REALM</value>
<description>SPN for the MapReduce JobHistory server</description>
</property>
- mapreduce.jobhistory.webapp.address
Web address of the MapReduce JobHistory Server. The default port is 19888.
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hostname:port</value>
<description>MapReduce JobHistory Server Web UI host:port</description>
</property>
- mapreduce.application.classpath
Classpaths for MapReduce applications.
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$MR2_CLASSPATH,$CDH_MR2_HOME</value>
<description>Classpaths for MapReduce applications</description>
</property>
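To locate the values to copy, you can inspect the mapred-site.xml file on a cluster node. The following command assumes that /etc/hadoop/conf is the client configuration directory, which is typical on Cloudera clusters but can vary:

# Print the JobHistory-related entries from the cluster copy of mapred-site.xml
grep -B 1 -A 2 'mapreduce.jobhistory' /etc/hadoop/conf/mapred-site.xml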
Configure the mapred-site.xml File for Cloudera Kerberos non-HA Clusters
Before you run Sqoop mappings on the Spark and Blaze engines on Cloudera Kerberos clusters that do not use NameNode high availability, you must configure the mapreduce.jobhistory.address property in the mapred-site.xml file on the Hadoop cluster, and restart Hadoop services and the cluster.
1. In Cloudera Manager, open the YARN configuration.
2. Find the property named NodeManager Advanced Configuration Snippet (Safety Valve) for mapred-site.xml.
3. Click +.
4. Enter the name as mapreduce.jobhistory.address.
5. Set the value as follows: <MapReduce JobHistory Server hostname>:<port>
6. Select the Final check box.
7. Redeploy the client configurations.
8. Restart Hadoop services and the cluster.
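The safety valve entry is equivalent to the following mapred-site.xml property, where the host name is illustrative and 10020 is the default JobHistory Server port:

<property>
<name>mapreduce.jobhistory.address</name>
<value>jobhistory.example.com:10020</value>
<final>true</final>
</property>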
Configure the core-site.xml File for Ambari-based non-Kerberos Clusters
To run Sqoop mappings on IBM BigInsights, Hortonworks HDP, or Azure HDInsight clusters that do not use Kerberos authentication, you must configure the yarn user as a proxy user so that it can impersonate other users. Configure the impersonation properties in the core-site.xml file on the Hadoop cluster, and restart Hadoop services and the cluster.
Configure the following user impersonation properties in the core-site.xml file:
- hadoop.proxyuser.yarn.groups
<property>
<name>hadoop.proxyuser.yarn.groups</name>
<value><Name_of_the_impersonation_group></value>
<description>Allows the yarn user to impersonate members of the specified group. Set the value to * to allow impersonation from any group.</description>
</property>
- hadoop.proxyuser.yarn.hosts
<property>
<name>hadoop.proxyuser.yarn.hosts</name>
<value>*</value>
<description>Allows impersonation from any host.</description>
</property>
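For example, to allow the yarn user to impersonate members of a group named hadoopusers (an illustrative group name) from any host, the entries would read as follows:

<property>
<name>hadoop.proxyuser.yarn.groups</name>
<value>hadoopusers</value>
</property>
<property>
<name>hadoop.proxyuser.yarn.hosts</name>
<value>*</value>
</property>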