Configure Hive Connector to download the distribution-specific Hive libraries
You must configure Hive Connector to download the distribution-specific Hive third-party libraries. The Informatica Hive third-party script and the Informatica Hive third-party properties files are available as part of the Hive Connector package in the Secure Agent installation.
Distributions applicable for Hive mappings
You can use the following distribution versions when you use Hive Connector to run mappings:
• Cloudera CDH 6.1
• Amazon EMR 5.20, 6.3, and 6.4
• Cloudera CDP 7.1 private cloud and Cloudera CDW 7.2 public cloud
• Azure HDInsight 4.0
• Hortonworks HDP 3.1
Distributions applicable for Hive mappings in advanced mode
You can use the following distribution versions when Hive Connector runs on the advanced cluster:
• Cloudera CDH 6.1
• Cloudera CDP 7.1 private cloud and Cloudera CDW 7.2 public cloud
• Azure HDInsight 4.0
• Amazon EMR 6.1, 6.2, and 6.3
Perform the following tasks to download the distribution-specific Hive third-party libraries before you use Hive Connector:
1. Run the script to copy the third-party libraries to the Secure Agent location. Ensure that you have full permissions to the directories where the Hive libraries are copied.
The script is interactive. When prompted, specify the job type and the Hadoop cluster that you want to use.
2. Add the runtime DTM property, INFA_HADOOP_DISTRO_NAME, and set its value to the distribution that you want to use.
3. Restart the Secure Agent.
Run the script on a Linux system
You can run the Hive third-party script either in the Secure Agent installation directory or outside the Secure Agent installation directory. The script and the Informatica Hive third-party properties files are part of the Hive Connector package that is located in the Secure Agent installation directory.
When you run the Hive third-party script, you need to specify the Hadoop distribution that you want to use.
Based on the distribution you select, the Hadoop distribution directory is created in deploy_to_main/distros/Parsers/.
• If you select CDH_6.1, CDP_7.1, or CDW_7.2, the Hadoop distribution directory created is CDH_6.1.
• If you select EMR_5.20, EMR_6.1, EMR_6.2, or EMR_6.3, the Hadoop distribution directory created is EMR_5.20.
• If you select HDInsight_4.0, the Hadoop distribution directory created is HDInsight_4.0.
• If you select HDP_3.1, the Hadoop distribution directory created is HDP_3.1.
Note: In mappings, the CDH_6.1 option applies to Cloudera CDH 6.1, Cloudera CDP 7.1 private cloud, and Cloudera CDW 7.2 public cloud. In mappings in advanced mode, the CDH_6.1 option applies only to Cloudera CDH 6.1. In mappings, the EMR_5.20 option applies only to Amazon EMR 5.20, 6.3, and 6.4. In mappings in advanced mode, the EMR_5.20 option applies to Amazon EMR 6.1, 6.2, and 6.3.
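The selection-to-directory rules above can be read as a lookup table. The following Python sketch is only an illustration and is not part of the Informatica script. The names DISTRO_DIRECTORY and distro_directory are hypothetical, and the table covers only the selections listed in this section.

```python
# Directory created under deploy_to_main/distros/Parsers/ for each
# distribution selected in the script, transcribed from the list above.
# The names in this sketch are hypothetical and not part of the product.
DISTRO_DIRECTORY = {
    "CDH_6.1": "CDH_6.1",
    "CDP_7.1": "CDH_6.1",
    "CDW_7.2": "CDH_6.1",
    "EMR_5.20": "EMR_5.20",
    "EMR_6.1": "EMR_5.20",
    "EMR_6.2": "EMR_5.20",
    "EMR_6.3": "EMR_5.20",
    "HDInsight_4.0": "HDInsight_4.0",
    "HDP_3.1": "HDP_3.1",
}


def distro_directory(selection: str) -> str:
    """Return the distribution directory name for a script selection."""
    try:
        return DISTRO_DIRECTORY[selection]
    except KeyError:
        raise ValueError(f"Selection not listed in this section: {selection}") from None
```

For example, distro_directory("CDW_7.2") returns "CDH_6.1", which matches the first rule in the list.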
Run the script in the Secure Agent installation directory
Run the script in the Secure Agent installation directory when the Secure Agent has Internet access to download the third-party libraries.
1. Go to the following Secure Agent installation directory where the Informatica Hive third-party script is located:
▪ <Secure Agent installation directory>/apps/Data_Integration_Server/ext/deploy_to_main/distros/Parsers/<Hadoop distribution version>/lib
where the value of the Hadoop distribution version is based on the Hadoop distribution you specified. If you specify the HDInsight_4.0 or HDP_3.1 Hadoop distribution, go to step 5.
5. Only for the HDInsight_4.0 or HDP_3.1 Hadoop distribution, perform the following tasks:
a. Navigate to the directory where the Hive third-party libraries are available on the cluster machine.
For example, the /usr/hdp/ directory on the cluster machine.
b. Manually download and copy the Hive third-party libraries to the directory displayed in the prompt.
The prompt also displays a list of the Hive third-party libraries that you need to download.
The script fails to download a few Hive third-party libraries for the HDInsight_4.0 or HDP_3.1 Hadoop distribution because these repositories are private in Cloudera.
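After the script finishes, or after you copy libraries manually, it can help to confirm that the lib directory exists, that the current user has full access to it, and which files it contains. The following Python sketch is one way to do that under stated assumptions. The function names and the agent_home argument are hypothetical. Only the relative path comes from step 1.

```python
import os
from pathlib import Path

# Relative location of the distribution directories, taken from the path in step 1.
PARSERS_SUBPATH = Path("apps/Data_Integration_Server/ext/deploy_to_main/distros/Parsers")


def hive_lib_dir(agent_home: str, distro_dir: str) -> Path:
    """Build <agent_home>/apps/.../Parsers/<distro_dir>/lib (hypothetical helper)."""
    return Path(agent_home) / PARSERS_SUBPATH / distro_dir / "lib"


def check_hive_lib_dir(lib_dir: Path) -> list[str]:
    """Return the sorted file names in lib_dir, or raise if it is unusable."""
    if not lib_dir.is_dir():
        raise FileNotFoundError(f"Library directory not found: {lib_dir}")
    # Read, write, and execute access together approximate "full permissions".
    if not os.access(lib_dir, os.R_OK | os.W_OK | os.X_OK):
        raise PermissionError(f"Insufficient permissions on: {lib_dir}")
    return sorted(p.name for p in lib_dir.iterdir() if p.is_file())
```

You can compare the returned file names with the list of libraries that the prompt displays. The sketch does not know which libraries a distribution requires.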
Run the script outside the Secure Agent installation directory
Run the script outside the Secure Agent installation directory when the Secure Agent cannot download the third-party libraries, either because it does not have Internet access or because of other network restrictions.
1. Go to the following Secure Agent installation directory where the Informatica Hive third-party script is located:
▪ <CurrentDirectory>/deploy_to_main/distros/Parsers/<Hadoop distribution version>/lib
where the value of the Hadoop distribution version is based on the Hadoop distribution you specified. If you specify the HDInsight_4.0 or HDP_3.1 Hadoop distribution, go to step 7.
6. Manually copy the generated directories with the third-party libraries to the Secure Agent installation location.
- For CDI:
Copy the deploy_to_main directory to the following Secure Agent installation location, or replace the directory if it is already present:
7. Only for the HDInsight_4.0 or HDP_3.1 Hadoop distribution, perform the following tasks:
a. Navigate to the directory where the Hive third-party libraries are available on the cluster machine.
For example, the /usr/hdp/ directory on the cluster machine.
b. Manually download and copy the Hive third-party libraries to the directory displayed in the prompt.
The prompt also displays a list of the Hive third-party libraries that you need to download.
The script fails to download a few Hive third-party libraries for the HDInsight_4.0 or HDP_3.1 Hadoop distribution because these repositories are private in Cloudera.
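The manual copy in step 6 can be scripted. The following Python sketch copies a generated deploy_to_main directory into a target location and replaces any copy that already exists there. It is an illustration only. The function name replace_deploy_dir and both arguments are hypothetical, and target_parent stands for the Secure Agent installation location named in step 6, which you must supply yourself.

```python
import shutil
from pathlib import Path


def replace_deploy_dir(source: Path, target_parent: Path) -> Path:
    """Copy a deploy_to_main directory into target_parent, replacing any existing copy.

    Hypothetical helper: target_parent is the Secure Agent location from step 6.
    """
    if source.name != "deploy_to_main" or not source.is_dir():
        raise ValueError(f"Expected a deploy_to_main directory, got: {source}")
    destination = target_parent / source.name
    # Remove the old directory first so that no stale libraries remain.
    if destination.exists():
        shutil.rmtree(destination)
    shutil.copytree(source, destination)
    return destination
```

Removing the existing directory before copying is a design choice in this sketch: it replaces the directory rather than merging old and new files. Stop the Secure Agent or make a backup first if that is a concern in your environment.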
Set the custom property for the Data Integration Service
Set the INFA_HADOOP_DISTRO_NAME property for the DTM in the Secure Agent properties, and set its value to the distribution version that you want to use.
1. Open Administrator and select Runtime Environments.
2. Select the Secure Agent for which you want to configure the DTM property.
3. In the upper-right corner of the page, click Edit.
4. Add the following DTM properties in the Custom Configuration section:
- Service: Data Integration Service
- Type: DTM
- Name: INFA_HADOOP_DISTRO_NAME
- Value: <distribution_version>
where the following values are applicable based on the distribution version you want to access:
- For CDH_6.1, CDP_7.1, and CDW_7.2, set the value to CDH_6.1.
- For EMR_5.20, EMR_6.1, EMR_6.2, EMR_6.3, and EMR_6.4, set the value to EMR_5.20.
- For HDInsight_4.0, set the value to HDInsight_4.0.
- For HDP_3.1, set the value to HDP_3.1.
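The property settings above can be summarized in a small lookup. The following Python sketch returns the four fields to enter in the Custom Configuration section for a given distribution. It is an illustration only and does not call any Informatica API. The names DISTRO_NAME_VALUE and dtm_custom_property are hypothetical.

```python
# Value of INFA_HADOOP_DISTRO_NAME for each distribution version,
# transcribed from the list above. Names in this sketch are hypothetical.
DISTRO_NAME_VALUE = {
    "CDH_6.1": "CDH_6.1",
    "CDP_7.1": "CDH_6.1",
    "CDW_7.2": "CDH_6.1",
    "EMR_5.20": "EMR_5.20",
    "EMR_6.1": "EMR_5.20",
    "EMR_6.2": "EMR_5.20",
    "EMR_6.3": "EMR_5.20",
    "EMR_6.4": "EMR_5.20",
    "HDInsight_4.0": "HDInsight_4.0",
    "HDP_3.1": "HDP_3.1",
}


def dtm_custom_property(distribution: str) -> dict[str, str]:
    """Return the Custom Configuration fields for the given distribution."""
    if distribution not in DISTRO_NAME_VALUE:
        raise ValueError(f"Distribution not listed in this section: {distribution}")
    return {
        "Service": "Data Integration Service",
        "Type": "DTM",
        "Name": "INFA_HADOOP_DISTRO_NAME",
        "Value": DISTRO_NAME_VALUE[distribution],
    }
```

For example, dtm_custom_property("EMR_6.4")["Value"] is "EMR_5.20", which is the value to enter for an Amazon EMR 6.4 cluster.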
Restart the Secure Agent
After you complete the configuration and set the properties, restart the Secure Agent for the changes to take effect.