Start the Configuration Manager in Console Mode

Run the Hadoop Configuration Manager in console mode to configure the Informatica domain for Big Data Management.
    1. On the machine where the Data Integration Service runs, open the command line.
    2. Go to the following directory: <Informatica installation directory>/tools/BDMUtil
    3. Run BDMConfig.sh. See the example after this procedure.
    4. Choose option 5 to configure for Azure HDInsight.
    5. In the Distribution Folder Selection section, choose the directory of the Azure HDInsight distribution that you want to configure.
    Note: The distribution might be stored in a directory that uses a different distribution version number.
    6. In the Connection Type section, select the option to access files on the Hadoop cluster.
    1. Apache Ambari. Select this option to use the Ambari REST API to access files on the Hadoop cluster. Informatica recommends that you use this option.
    2. Secure Shell (SSH). Select this option to use SSH to access files on the Hadoop cluster. This option requires SSH connections from the Informatica domain to the machines that host the NameNode, YARN ResourceManager, and Hive client. If you select this option, Informatica recommends that you use an SSH connection without a password or have sshpass or Expect installed.
    Complete the configuration based on the option that you choose.
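For reference, a minimal console session to start the Hadoop Configuration Manager might look like the following. This is a sketch: /opt/Informatica is a placeholder for your Informatica installation directory.
    cd /opt/Informatica/tools/BDMUtil
    sh BDMConfig.sh
    # When prompted, choose option 5 (Azure HDInsight), select the distribution
    # directory, and then select the connection type (1 = Ambari, 2 = SSH).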

Complete the Configuration through Apache Ambari

To complete the configuration, you can choose to update Data Integration Service properties and create connection objects.
You might want to create connections if you are installing for the first time or if you are upgrading and the changes to the Hadoop environment are significant, such as a different distribution. If you are upgrading and you create connections that you want to use in existing mappings, you need to update mappings to use the new connections after you upgrade.
You might want to skip creating connections if the changes to the Hadoop environment are minimal, such as a change in distribution version. If you are upgrading and you choose not to create connections, you can continue to use the connections in existing mappings. However, you need to update any changed properties after you upgrade. Property changes might include host names, URIs, or port numbers.
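Before you enter the Ambari connection information in step 1, you can verify the host, port, and credentials with a direct call to the Ambari REST API. A minimal sketch; the host name and credentials are placeholders, and 8080 is only the default port for a standalone Ambari server, so your cluster may expose Ambari on a different endpoint:
    # List the clusters that Ambari manages. A JSON response confirms the
    # connection details; an authentication error indicates bad credentials.
    curl -u admin:admin-password http://ambari-host.example.com:8080/api/v1/clusters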
    1. In the Ambari Administration Information section, enter the connection information to connect to the Ambari Manager.
    a. Enter the Ambari Manager host name or IP address.
    b. Enter the Ambari user ID.
    c. Enter the password for the user ID.
    d. Enter the port for Ambari Manager.
    e. Select whether to use MapReduce or Tez as the execution engine type.
    The Hadoop Configuration Manager retrieves the required information from the Hadoop cluster.
    2. In the Hadoop Configuration Manager Output section, select whether you want to update Data Integration Service properties.
    Select from the following options:
    1. No. Select this option to update Data Integration Service properties later.
    2. Yes. Select this option to update Data Integration Service properties now.
    3. Select whether you want to restart the Data Integration Service.
    Select from the following options:
    1. No. Select this option if you do not want the configuration manager to restart the Data Integration Service.
    2. Yes. Select this option if you want the configuration manager to restart the Data Integration Service when the configuration is complete.
    4. Select whether you want to create connections for Big Data Management.
    Select from the following options:
    1. No. Select this option if you do not want to create connections. If you are upgrading and you choose not to create connections, you can continue to use the connections in existing mappings. However, you need to update any changed properties after you upgrade. Property changes might include host names, URIs, or port numbers.
    2. Yes. Select this option if you want to create connections. If you are upgrading and you create connections that you want to use in existing mappings, you need to update mappings to use the new connections after you upgrade.
    5. In the Create Connections section, select the connection type to create Big Data Management connections:
    1. Hive. Create a Hive connection to access Hive as a source or target.
    2. HDFS. Create an HDFS connection to read data from or write data to the HDFS file system on a Hadoop cluster.
    3. Hadoop. Create a Hadoop connection to run mappings in the Hadoop environment.
    4. HBase. Create an HBase connection to access HBase.
    5. Select all. Create all four connection types.
    6. In the Connection Details section, provide the connection properties.
    Based on the type of connection that you choose to create, the Hadoop Configuration Manager requires different properties.
    The Hadoop Configuration Manager creates the connections.
    7. Review the summary that the Hadoop Configuration Manager reports, including whether connection creation succeeded and the location of the log files.
The Hadoop Configuration Manager creates the following file in the <Informatica installation directory>/tools/BDMUtil directory:
ClusterConfig.properties.<timestamp>
Contains details about the properties fetched from the Hadoop cluster, including cluster node names, and provides templates for connection creation commands. To use these templates to create connections to the Hadoop cluster, edit the connection name, domain user name, and password in the generated commands.
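As a rough illustration of the edits you make, a generated template might resemble the following infacmd call. Every value shown is a placeholder, including the connection type name, and the exact flags and option string depend on your Informatica version and on what the utility generates:
    # Replace the connection name (-cn), domain user name (-un), and
    # password (-pd) with your own values before you run the command.
    infacmd.sh isp CreateConnection -dn MyDomain -un Administrator -pd MyPassword \
        -cn HDInsight_Hive -ct HIVE -o "option string generated by the utility"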

Complete the Configuration through SSH

When you complete configuration through SSH, you must provide host names and Hadoop configuration file locations.
Configuration through SSH requires SSH connections from the Informatica domain to the machines that host the NameNode, YARN ResourceManager, and Hive client. Informatica recommends that you use an SSH connection without a password or have sshpass or Expect installed. If you do not use one of these methods, you must enter the password each time the utility downloads a file from the Hadoop cluster.
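For example, you can set up an SSH connection without a password from the Informatica domain machine before you start, and confirm the configuration file locations in advance. A minimal sketch, assuming a cluster user named sshuser and the common /etc/hadoop/conf client configuration path; both are assumptions, and your cluster may use different values:
    # Generate a key pair if you do not have one, then copy the public key
    # to each host that the utility connects to.
    ssh-keygen -t rsa
    ssh-copy-id sshuser@namenode-host.example.com
    # Optionally confirm the configuration file locations before you run the utility.
    ssh sshuser@namenode-host.example.com 'ls /etc/hadoop/conf/hdfs-site.xml /etc/hadoop/conf/core-site.xml'
If you cannot use key-based authentication, installing sshpass lets the utility supply the password non-interactively.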
    1. Enter the NameNode host name.
    2. Enter the SSH user ID.
    3. Enter the password for the SSH user ID, or press enter if you use an SSH connection without a password.
    4. Enter the location for the hdfs-site.xml file on the Hadoop cluster.
    5. Enter the location for the core-site.xml file on the Hadoop cluster.
    The Hadoop Configuration Manager connects to the NameNode and downloads the following files: hdfs-site.xml and core-site.xml.
    6. Enter the YARN ResourceManager host name.
    Note: The YARN ResourceManager performs the role that the JobTracker performed in earlier versions of Hadoop.
    7. Enter the SSH user ID.
    8. Enter the password for the SSH user ID, or press enter if you use an SSH connection without a password.
    9. Enter the directory for the mapred-site.xml file on the Hadoop cluster.
    10. Enter the directory for the yarn-site.xml file on the Hadoop cluster.
    The utility connects to the ResourceManager host and downloads the following files: mapred-site.xml and yarn-site.xml.
    11. Enter the Hive client host name.
    12. Enter the SSH user ID.
    13. Enter the password for the SSH user ID, or press enter if you use an SSH connection without a password.
    14. Enter the directory for the hive-site.xml file on the Hadoop cluster.
    The configuration manager connects to the Hive client and downloads the following file: hive-site.xml.
    15. Choose whether to configure the HBase server.
    1. No. Select this option if you do not want to configure the HBase server.
    2. Yes. Select this option to configure the HBase server.
    If you select Yes, enter the following information to configure the HBase server:
    a. Enter the HBase server host name.
    b. Enter the SSH user ID.
    c. Enter the password for the SSH user ID, or press enter if you use an SSH connection without a password.
    d. Enter the directory for the hbase-site.xml file on the Hadoop cluster.
    16. In the Create Connections section, select the connection type to create Big Data Management connections:
    1. Hive. Create a Hive connection to access Hive as a source or target.
    2. HDFS. Create an HDFS connection to read data from or write data to the HDFS file system on a Hadoop cluster.
    3. Hadoop. Create a Hadoop connection to run mappings in the Hadoop environment.
    4. HBase. Create an HBase connection to access HBase.
    5. Select all. Create all four connection types.
    Press the number that corresponds to your choice.
    17. The Domain Information section displays the domain name and the node name. Enter the following additional information about the Informatica domain:
    a. Enter the domain user name.
    b. Enter the domain password.
    c. Enter the Data Integration Service name.
    d. If the Hadoop cluster uses Kerberos authentication, enter the Kerberos authentication details when prompted.
    Note: After you enter the Data Integration Service name, the utility tests the domain connection, and then recycles the Data Integration Service.
    18. In the Connection Details section, provide the connection properties.
    Based on the type of connection you choose to create, the utility requires different properties.
    The Hadoop Configuration Manager creates the connections.
    19. Review the summary that the Hadoop Configuration Manager reports, including whether connection creation succeeded and the location of the log files.
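After the utility finishes, you can confirm that the connections exist in the domain by listing them with infacmd. A sketch; the domain name and credentials are placeholders:
    # List the connections defined in the domain.
    infacmd.sh isp ListConnections -dn MyDomain -un Administrator -pd MyPassword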
The Hadoop Configuration Manager creates the following file in the <Informatica installation directory>/tools/BDMUtil directory:
ClusterConfig.properties.<timestamp>
Contains details about the properties fetched from the Hadoop cluster, including cluster node names, and provides templates for connection creation commands. To use these templates to create connections to the Hadoop cluster, edit the connection name, domain user name, and password in the generated commands.