
Record Configuration Information

Applies to: Azure HDInsight, Cloudera CDH, Hortonworks HDP, IBM BigInsights, and MapR.
When you run the Hadoop Configuration Manager, you need information about both the domain and the Hadoop environment. Gather the required information in advance so that you can perform the configuration without interruption.

Gather Domain Information

Applies to: Azure HDInsight, Cloudera CDH, Hortonworks HDP, IBM BigInsights, and MapR.
When you run the configuration manager, you must provide information about the Informatica domain. Gather domain information before you run the configuration manager.
You need to know the following information about the Informatica domain (a sketch that checks these values for completeness follows the list):
- Domain user name: The domain user name.
- Domain password: The password of the domain user.
- Data Integration Service name: The name of the Data Integration Service.
- Informatica home directory on Hadoop: The directory where Big Data Management is installed in the Hadoop environment. Get this information after you perform the installation.
- Hadoop Kerberos service principal name: Required if the Hadoop cluster uses Kerberos authentication.
- Hadoop Kerberos keytab location: Required if the Hadoop cluster uses Kerberos authentication. The location of the keytab file on the Data Integration Service machine.
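The domain values in this list lend themselves to a quick pre-flight check before you start the configuration manager. The following Python sketch is illustrative only and is not part of Big Data Management; all names in it are hypothetical. It assumes you simply want to confirm that nothing is missing, and that the Kerberos entries are filled in only when the cluster uses Kerberos authentication.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class DomainInfo:
    """Illustrative record of the domain values listed above; field names are hypothetical."""
    domain_user: str
    domain_password: str
    dis_name: str                        # Data Integration Service name
    infa_home_on_hadoop: str             # Informatica home directory on Hadoop
    uses_kerberos: bool = False
    kerberos_principal: Optional[str] = None   # Hadoop Kerberos service principal name
    kerberos_keytab: Optional[str] = None      # keytab location on the Data Integration Service machine

    def missing_values(self) -> List[str]:
        """Return the names of required values that are still empty."""
        missing = [label for label, value in (
            ("domain user name", self.domain_user),
            ("domain password", self.domain_password),
            ("Data Integration Service name", self.dis_name),
            ("Informatica home directory on Hadoop", self.infa_home_on_hadoop),
        ) if not value]
        if self.uses_kerberos:
            # The principal and keytab are required only when the cluster uses Kerberos.
            if not self.kerberos_principal:
                missing.append("Hadoop Kerberos service principal name")
            if not self.kerberos_keytab:
                missing.append("Hadoop Kerberos keytab location")
        return missing

# Example: a Kerberos-enabled cluster where the keytab location has not been recorded yet.
info = DomainInfo("Administrator", "****", "DIS_BigData", "/opt/Informatica",
                  uses_kerberos=True, kerberos_principal="hive/node1@EXAMPLE.COM")
print(info.missing_values())    # ['Hadoop Kerberos keytab location']
```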

Gather Connection Information

Applies to: Azure HDInsight, Cloudera CDH, Hortonworks HDP, IBM BigInsights, and MapR.
When you complete the configuration on the domain, you can choose to create connections for Hadoop, Hive, HDFS, and HBase.
You need to know the following information to create the connections (a sketch that pre-checks the connection name rules follows the list):
- Hive connection name: Required for the Hive connection. The name is not case sensitive and must be unique within the domain. The name cannot exceed 128 characters, contain spaces, or contain the following special characters: ~ ` ! $ % ^ & * ( ) - + = { [ } ] | \ : ; " ' < , > . ? /
- Hive user name: Required for the Hive connection. User name of the user that the Data Integration Service impersonates to run mappings on a Hadoop cluster. The user name depends on the JDBC connection string that you specify in the Metadata Connection String or Data Access Connection String for the native environment. If the Hadoop cluster runs Hortonworks HDP, you must provide a user name. If you use Tez to run mappings, you must provide the user account for the Data Integration Service. If you do not use Tez to run mappings, you can use an impersonation user account. If the Hadoop cluster uses Kerberos authentication, the principal name for the JDBC connection string and the user name must be the same. If the Hadoop cluster does not use Kerberos authentication, the user name depends on the behavior of the JDBC driver; with the Hive JDBC driver, you can specify a user name in many ways, and the user name can become part of the JDBC URL.
- HDFS connection name: Required for the HDFS connection. The name is not case sensitive and must be unique within the domain. The name cannot exceed 128 characters, contain spaces, or contain the following special characters: ~ ` ! $ % ^ & * ( ) - + = { [ } ] | \ : ; " ' < , > . ? /
- HDFS user name: Required for the HDFS connection. User name to access HDFS.
- Hadoop connection name: Required for the Hadoop connection. The name is not case sensitive and must be unique within the domain. The name cannot exceed 128 characters, contain spaces, or contain the following special characters: ~ ` ! $ % ^ & * ( ) - + = { [ } ] | \ : ; " ' < , > . ? /
- Impersonation user name: Required for the Hadoop connection if the cluster uses Kerberos authentication. User name of the user that the Data Integration Service impersonates to run mappings on a Hadoop cluster. If the Hadoop cluster uses Kerberos authentication, the principal name for the JDBC connection string and the user name must be the same.
- Blaze work directory on HDFS: Required for all connection types. The HDFS file path of the directory that the Blaze engine uses to store temporary files.
- Blaze user name: Required for all connection types. The Blaze user account on the cluster.
- Spark staging directory on HDFS: Required for all connection types. The HDFS file path of the directory that the Spark engine uses to store temporary files for running jobs. The YARN user, the Spark engine user, and the mapping impersonation user must have write permission on this directory.
- Spark event log directory: Optional for all connection types. The HDFS file path of the directory that the Spark engine uses to log events. The Data Integration Service accesses the Spark event log directory to retrieve final source and target statistics when a mapping completes. These statistics appear on the Summary Statistics tab and the Detailed Statistics tab of the Monitoring tool. If you do not configure the Spark event log directory, the statistics might be incomplete in the Monitoring tool.
- Spark execution parameters list: Optional for all connection types.
- HBase connection name: Required for HBase connections. The name is not case sensitive and must be unique within the domain. The name cannot exceed 128 characters, contain spaces, or contain the following special characters: ~ ` ! $ % ^ & * ( ) - + = { [ } ] | \ : ; " ' < , > . ? /
- ZooKeeper hosts: Required for HBase connections. Name of the machine that hosts the ZooKeeper server.
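The connection name rules repeated in this list (no more than 128 characters, no spaces, and none of the listed special characters) are straightforward to pre-check before you run the configuration manager. The following Python sketch is illustrative only; the candidate names are hypothetical, the check covers only the documented syntax rules, and it does not verify that a name is unique within the domain.

```python
# Special characters that a connection name must not contain, per the list above.
FORBIDDEN_CHARS = set("~`!$%^&*()-+={[}]|\\:;\"'<,>.?/")

def is_valid_connection_name(name: str) -> bool:
    """Check a candidate connection name against the documented syntax rules:
    at most 128 characters, no spaces, and none of the forbidden characters.
    Uniqueness within the domain must still be verified in the domain itself."""
    if not name or len(name) > 128:
        return False
    if " " in name:
        return False
    return not any(ch in FORBIDDEN_CHARS for ch in name)

# Example: pre-check candidate names before you create the connections.
for candidate in ("HIVE_conn_cdh", "hadoop conn", "hdfs-prod"):
    status = "ok" if is_valid_connection_name(candidate) else "invalid"
    print(candidate, "->", status)
```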

Gather Information for Configuration Through SSH

Applies to: Azure HDInsight, Cloudera CDH, Hortonworks HDP, IBM BigInsights, and MapR.
When you run the configuration manager, you can choose the tool that you use to configure domain connectivity. You can perform the configuration through a web interface or through SSH. Complete this section if you plan to configure the domain information through SSH.
Note: If you did not use the web user interface to create the cluster, you cannot use the web user interface to run the configuration utility.
Gather the following information that is required to perform configuration through SSH: