Perform Sqoop Configuration Tasks
Before you run Sqoop mappings, you must perform the following configuration tasks:
1. Download the JDBC driver JAR files for Sqoop connectivity.
2. Configure the HADOOP_NODE_JDK_HOME property in the hadoopEnv.properties file.
3. Configure the mapred-site.xml file for Cloudera clusters.
4. Configure the yarn-site.xml file for Cloudera Kerberos clusters.
5. Configure the mapred-site.xml file for Cloudera Kerberos non-HA clusters.
6. Configure the core-site.xml file for Ambari-based non-Kerberos clusters.
Download the JDBC Driver JAR Files for Sqoop Connectivity
To configure Sqoop connectivity for relational databases, you must download the relevant JDBC driver JAR files and copy them to the node where the Data Integration Service runs. At run time, the Data Integration Service copies the JAR files to the Hadoop distributed cache so that they are accessible to all nodes in the Hadoop cluster.
You can use any Type 4 JDBC driver that the database vendor recommends for Sqoop connectivity.
Note: The DataDirect JDBC drivers that Informatica ships are not licensed for Sqoop connectivity.
If you use the Cloudera Connector Powered by Teradata or Hortonworks Connector for Teradata, you must download additional JAR files and copy them to the node where the Data Integration Service runs.
1. Download the JDBC driver JAR files for the database that you want to connect to.
2. If you use the Cloudera Connector Powered by Teradata, perform the following steps:
- a. Download the Cloudera Connector Powered by Teradata package from the following URL:
The package is named sqoop-connector-teradata-<version>.tar.gz. Download all the JAR files in the package.
- b. Download the terajdbc4.jar file and tdgssconfig.jar file from the following URL:
3. If you use the Hortonworks Connector for Teradata, perform the following steps:
- a. Download the Hortonworks Connector for Teradata package from the following URL:
The package is named hdp-connector-for-teradata-<version>-distro.tar.gz. Download all the JAR files in the package.
- b. Download the avro-mapred-1.7.4-hadoop2.jar file from the following URL:
4. On the node where the Data Integration Service runs, copy all the JAR files from the previous steps to the following directory:
<Informatica installation directory>/externaljdbcjars
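For example, if you connect to a MySQL database, the copy step might look like the following command. The driver file name and the source directory are illustrative; use the Type 4 driver version that your database vendor recommends:

# Copy a vendor JDBC driver to the external JDBC JARs directory (illustrative names)
cp /tmp/drivers/mysql-connector-java-5.1.40-bin.jar "<Informatica installation directory>/externaljdbcjars"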
Configure the HADOOP_NODE_JDK_HOME property in the hadoopEnv.properties File
Before you run Sqoop mappings, you must configure the HADOOP_NODE_JDK_HOME property in the hadoopEnv.properties file on the Data Integration Service node. Configure the HADOOP_NODE_JDK_HOME property to point to the JDK version that the cluster nodes use. You must use JDK version 1.7 or later.
1. Go to the following location:
<Informatica installation directory>/services/shared/hadoop/<Hadoop_distribution_name>_<version_number>/infaConf
2. Find the file named hadoopEnv.properties.
3. Back up the file before you update it.
4. Use a text editor to open the file.
5. Define the HADOOP_NODE_JDK_HOME property as follows:
infapdo.env.entry.hadoop_node_jdk_home=HADOOP_NODE_JDK_HOME=<cluster_JDK_home>/jdk<version>
For example, infapdo.env.entry.hadoop_node_jdk_home=HADOOP_NODE_JDK_HOME=/usr/java/default
6. Save the properties file with the name hadoopEnv.properties.
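For example, if the cluster JDK is installed under /usr/java/jdk1.8.0_77 (an illustrative path), the entry would read:

infapdo.env.entry.hadoop_node_jdk_home=HADOOP_NODE_JDK_HOME=/usr/java/jdk1.8.0_77

Before you set the property, you can confirm the JDK path and version on a cluster node, for example by running <cluster_JDK_home>/bin/java -version on that node.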
Configure the mapred-site.xml File for Cloudera Clusters
Before you run Sqoop mappings on Cloudera clusters, you must configure MapReduce properties in the mapred-site.xml file on the Hadoop cluster, and restart Hadoop services and the cluster.
1. In Cloudera Manager, open the YARN configuration.
2. Find the property named NodeManager Advanced Configuration Snippet (Safety Valve) for mapred-site.xml.
3. Click + and configure the following properties:
Property | Value
---|---
mapreduce.application.classpath | $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$MR2_CLASSPATH,$CDH_MR2_HOME
mapreduce.jobhistory.intermediate-done-dir | <Directory where MapReduce jobs write history files>
4. Select the Final check box.
5. Redeploy the client configurations.
6. Restart Hadoop services and the cluster.
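Cloudera Manager merges the safety valve entries into the mapred-site.xml file on the cluster nodes. As a sketch, assuming that the Final check box marks each property final and that the intermediate-done-dir value is an illustrative HDFS path, the generated entries are equivalent to the following XML:

<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$MR2_CLASSPATH,$CDH_MR2_HOME</value>
<final>true</final>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/mr-history/intermediate</value>
<final>true</final>
</property>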
Configure the yarn-site.xml File for Cloudera Kerberos Clusters
To run Sqoop mappings on Cloudera clusters that use Kerberos authentication, you must configure properties in the yarn-site.xml file on the Data Integration Service node and restart the Data Integration Service.
Copy the following properties from the mapred-site.xml file on the cluster and add them to the yarn-site.xml file on the Data Integration Service node:
- mapreduce.jobhistory.address
Location of the MapReduce JobHistory Server. The default port is 10020.
<property>
<name>mapreduce.jobhistory.address</name>
<value>hostname:port</value>
<description>MapReduce JobHistory Server IPC host:port</description>
</property>
- mapreduce.jobhistory.principal
SPN for the MapReduce JobHistory server.
<property>
<name>mapreduce.jobhistory.principal</name>
<value>mapred/_HOST@YOUR-REALM</value>
<description>SPN for the MapReduce JobHistory server</description>
</property>
- mapreduce.jobhistory.webapp.address
Web address of the MapReduce JobHistory Server. The default port is 19888.
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hostname:port</value>
<description>MapReduce JobHistory Server Web UI host:port</description>
</property>
- mapreduce.application.classpath
Classpaths for MapReduce applications.
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$MR2_CLASSPATH,$CDH_MR2_HOME</value>
<description>Classpaths for MapReduce applications</description>
</property>
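To locate the values to copy, you can inspect the mapred-site.xml file on a cluster node. The following command assumes that /etc/hadoop/conf is the client configuration directory, which is typical on Cloudera clusters but can vary:

# Print the JobHistory-related entries from the cluster copy of mapred-site.xml
grep -B 1 -A 2 'mapreduce.jobhistory' /etc/hadoop/conf/mapred-site.xml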
Configure the mapred-site.xml File for Cloudera Kerberos non-HA Clusters
Before you run Sqoop mappings on the Spark and Blaze engines on Cloudera Kerberos clusters that do not use NameNode high availability, you must configure the mapreduce.jobhistory.address property in the mapred-site.xml file on the Hadoop cluster, and restart Hadoop services and the cluster.
1. In Cloudera Manager, open the YARN configuration.
2. Find the property named NodeManager Advanced Configuration Snippet (Safety Valve) for mapred-site.xml.
3. Click +.
4. Enter the name as mapreduce.jobhistory.address.
5. Set the value as follows: <MapReduce JobHistory Server hostname>:<port>
6. Select the Final check box.
7. Redeploy the client configurations.
8. Restart Hadoop services and the cluster.
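The safety valve entry is equivalent to the following mapred-site.xml property, where the host name is illustrative and 10020 is the default JobHistory Server port:

<property>
<name>mapreduce.jobhistory.address</name>
<value>jobhistory.example.com:10020</value>
<final>true</final>
</property>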
Configure the core-site.xml File for Ambari-based non-Kerberos Clusters
To run Sqoop mappings on IBM BigInsights, Hortonworks HDP, or Azure HDInsight clusters that do not use Kerberos authentication, you must configure the yarn user as a proxy user so that it can impersonate other users. Configure the impersonation properties in the core-site.xml file on the Hadoop cluster, and restart Hadoop services and the cluster.
Configure the following user impersonation properties in the core-site.xml file:
- hadoop.proxyuser.yarn.groups
<property>
<name>hadoop.proxyuser.yarn.groups</name>
<value><Name_of_the_impersonation_group></value>
<description>Allows the yarn user to impersonate members of the specified group. Set the value to * to allow impersonation from any group.</description>
</property>
- hadoop.proxyuser.yarn.hosts
<property>
<name>hadoop.proxyuser.yarn.hosts</name>
<value>*</value>
<description>Allows impersonation from any host.</description>
</property>
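For example, to allow the yarn user to impersonate members of a group named hadoopusers (an illustrative group name) from any host, the entries would read as follows:

<property>
<name>hadoop.proxyuser.yarn.groups</name>
<value>hadoopusers</value>
</property>
<property>
<name>hadoop.proxyuser.yarn.hosts</name>
<value>*</value>
</property>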