
Hive Connection Properties

Use the Hive connection to access Hive data. A Hive connection is a database type connection. You can create and manage a Hive connection in the Administrator tool, Analyst tool, or the Developer tool. Hive connection properties are case sensitive unless otherwise noted.
Note: The order of the connection properties might vary depending on the tool where you view them.
The following table describes Hive connection properties:
Property
Description
Name
The name of the connection. The name is not case sensitive and must be unique within the domain. You can change this property after you create the connection. The name cannot exceed 128 characters, contain spaces, or contain the following special characters:
~ ` ! $ % ^ & * ( ) - + = { [ } ] | \ : ; " ' < , > . ? /
ID
String that the Data Integration Service uses to identify the connection. The ID is not case sensitive. It must be 255 characters or less and must be unique in the domain. You cannot change this property after you create the connection. Default value is the connection name.
Description
The description of the connection. The description cannot exceed 4000 characters.
Location
The domain where you want to create the connection. Not valid for the Analyst tool.
Type
The connection type. Select Hive.
Connection Modes
Hive connection mode. Select at least one of the following options:
  - Access Hive as a source or target. Select this option if you want to use Hive as a source or a target.
  - Use Hive to run mappings in Hadoop cluster. Select this option if you want to use the Hive driver to run mappings in the Hadoop cluster.
User Name
User name of the user that the Data Integration Service impersonates to run mappings on a Hadoop cluster. The user name depends on the JDBC connection string that you specify in the Metadata Connection String or Data Access Connection String for the native environment.
If the Hadoop cluster runs Hortonworks HDP, you must provide a user name. If you use Tez to run mappings, you must provide the user account for the Data Integration Service. If you do not use Tez to run mappings, you can use an impersonation user account.
If the Hadoop cluster uses Kerberos authentication, the principal name for the JDBC connection string and the user name must be the same. Otherwise, the user name depends on the behavior of the JDBC driver. With the Hive JDBC driver, you can specify the user name in several ways, and the user name can become part of the JDBC URL.
If the Hadoop cluster does not use Kerberos authentication, the user name depends on the behavior of the JDBC driver.
If you do not specify a user name, the Hadoop cluster authenticates jobs based on the following criteria:
  - If the Hadoop cluster does not use Kerberos authentication, it authenticates jobs based on the operating system profile user name of the machine that runs the Data Integration Service.
  - If the Hadoop cluster uses Kerberos authentication, it authenticates jobs based on the SPN of the Data Integration Service.
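For example, with the Apache Hive JDBC driver you can pass the user name as a session variable in the connection URL. The following line is an illustration only, and the exact syntax depends on the driver version:
jdbc:hive2://<hostname>:<port>/<db>;user=<user_name>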
Common Attributes to Both the Modes: Environment SQL
SQL commands to set the Hadoop environment. In the native environment, the Data Integration Service executes the environment SQL each time it creates a connection to a Hive metastore. If you use the Hive connection to run profiles in the Hadoop cluster, the Data Integration Service executes the environment SQL at the beginning of each Hive session.
The following rules and guidelines apply to the usage of environment SQL in both connection modes:
  - Use the environment SQL to specify Hive queries.
  - Use the environment SQL to set the classpath for Hive user-defined functions and then use environment SQL or PreSQL to specify the Hive user-defined functions. You cannot use PreSQL in the data object properties to specify the classpath. The path must be the fully qualified path to the JAR files used for user-defined functions. Set the parameter hive.aux.jars.path with all the entries in infapdo.aux.jars.path and the path to the JAR files for user-defined functions, as shown in the sample after this list.
  - You can use environment SQL to define Hadoop or Hive parameters that you want to use in the PreSQL commands or in custom queries.
  - If you use multiple values for the Environment SQL property, ensure that there is no space between the values. The following sample text shows two values that you can use for the Environment SQL property:
set hive.execution.engine='tez';set hive.exec.dynamic.partition.mode='nonstrict';
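For example, to set the classpath for user-defined functions as described in this list, you might use an environment SQL entry similar to the following. The JAR file paths are placeholders for illustration only; replace them with the entries from infapdo.aux.jars.path and the fully qualified paths to your user-defined function JAR files:
set hive.aux.jars.path=file:///<path_to_infapdo_jars>,file:///<path_to_udf_jars>/my_udfs.jar;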
If you use the Hive connection to run profiles in the Hadoop cluster, the Data Integration Service executes only the environment SQL of the Hive connection. If the Hive sources and targets are on different clusters, the Data Integration Service does not execute the different environment SQL commands for the connections of the Hive source or target.

Properties to Access Hive as Source or Target

The following table describes the connection properties that you configure to access Hive as a source or target:
Property
Description
Metadata Connection String
The JDBC connection URI used to access the metadata from the Hadoop server.
You can use PowerExchange for Hive to communicate with a HiveServer service or HiveServer2 service.
To connect to HiveServer2, specify the connection string in the following format:
jdbc:hive2://<hostname>:<port>/<db>;transportMode=<mode>
Where
  - <hostname> is the name or IP address of the machine on which HiveServer2 runs.
  - <port> is the port number on which HiveServer2 listens.
  - <db> is the database to which you want to connect. If you do not provide the database name, the Data Integration Service uses the default database details.
  - <mode> is the value of the hive.server2.transport.mode property in the Hive tab of the Ambari tool.
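For example, the following connection string connects to the default database on a HiveServer2 host. The host name is a placeholder, and binary is one possible transport mode:
jdbc:hive2://myhiveserver.example.com:10000/default;transportMode=binary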
Bypass Hive JDBC Server
JDBC driver mode. Select the check box to use the embedded JDBC driver mode.
To use the JDBC embedded mode, perform the following tasks:
  - Verify that the Hive client and Informatica services are installed on the same machine.
  - Configure the Hive connection properties to run mappings in the Hadoop cluster.
If you choose the non-embedded mode, you must configure the Data Access Connection String.
Informatica recommends that you use the JDBC embedded mode.
Observe Fine Grained SQL Authorization
When you select the option to observe fine-grained SQL authorization in a Hive source, the mapping observes row-level and column-level restrictions on data access. If you do not select the option, the Blaze run-time engine ignores the restrictions, and results include restricted data.
Data Access Connection String
The JDBC connection URI used to access data from the Hadoop server.
You can use PowerExchange for Hive to communicate with a HiveServer service or HiveServer2 service.
To connect to HiveServer2, specify the connection string in the following format:
jdbc:hive2://<hostname>:<port>/<db>;transportMode=<mode>
Where
  - <hostname> is the name or IP address of the machine on which HiveServer2 runs.
  - <port> is the port number on which HiveServer2 listens.
  - <db> is the database to which you want to connect. If you do not provide the database name, the Data Integration Service uses the default database details.
  - <mode> is the value of the hive.server2.transport.mode property in the Hive tab of the Ambari tool.

Properties to Run Mappings in Hadoop Cluster

The following table describes the Hive connection properties that you configure when you want to use the Hive connection to run Informatica mappings in the Hadoop cluster:
Property
Description
Database Name
Namespace for tables. Use the name default for tables that do not have a specified database name.
Default FS URI
The URI to access the Hadoop Distributed File System.
Use the connection URI that matches the storage type. The storage type is configured for the cluster in the fs.defaultFS property.
If the cluster uses HDFS storage, use the following string to specify the URI:
hdfs://<cluster_name>
Example:
hdfs://my-cluster
If the cluster is enabled with Namenode High Availability, you can use the value of the dfs.nameservices property from the hdfs-site.xml file for the Default FS URI property.
Example:
Where the following is the dfs.nameservices property as it appears in hdfs-site.xml:
<property>
<name>dfs.nameservices</name>
<value>infaqaha</value>
</property>
The value to use for Default FS URI is:
hdfs://infaqaha
The Azure HDInsight File System default file system can be Windows Azure Storage Blob (WASB) or Azure Data Lake Store (ADLS).
If the cluster uses WASB storage, use the following string to specify the URI:
wasb://<container_name>@<account_name>.blob.core.windows.net/<path>
where:
  - <container_name> identifies a specific Azure Storage Blob container. Note: <container_name> is optional.
  - <account_name> identifies the Azure Storage Blob object.
Example:
wasb://infabdmoffering1storage.blob.core.windows.net/infabdmoffering1cluster/mr-history
If the cluster uses ADLS storage, use the following format to specify the URI: adl://home
The following is the fs.defaultFS property as it appears in core-site.xml:
<property>
<name>fs.defaultFS</name>
<value>adl://home</value>
</property>
Yarn Resource Manager URI
The service within Hadoop that submits the MapReduce tasks to specific nodes in the cluster.
For HDInsight 3.3 with YARN, use the following format:
<hostname>:<port>
Where
  - <hostname> is the host name or IP address of the JobTracker or Yarn resource manager.
  - <port> is the port on which the JobTracker or Yarn resource manager listens for remote procedure calls (RPC).
Use the value specified by yarn.resourcemanager.address in yarn-site.xml. You can find yarn-site.xml in the following directory on the NameNode: /etc/hive/<version>/0/.
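For example, a resource manager URI might look like the following line. The host name is a placeholder, and port 8050 is only a commonly used value; use the port that yarn.resourcemanager.address specifies on your cluster:
myresourcemanager.example.com:8050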
For HDInsight 3.3 with MapReduce 2, use the following URI:
hdfs://<hostname>:<port>
JobTracker/Yarn Resource Manager URI
The service within Hadoop that submits the MapReduce tasks to specific nodes in the cluster.
Use the following format:
<hostname>:<port>
Where
  - <hostname> is the host name or IP address of the JobTracker or Yarn resource manager.
  - <port> is the port on which the JobTracker or Yarn resource manager listens for remote procedure calls (RPC).
If the cluster uses MapR with YARN, use the value specified in the yarn.resourcemanager.address property in yarn-site.xml. You can find yarn-site.xml in the following directory on the NameNode of the cluster: /opt/mapr/hadoop/hadoop-2.5.1/etc/hadoop.
MapR with MapReduce 1 supports a highly available JobTracker. If you use the MapR distribution, define the JobTracker URI in the following format: maprfs:///
Hive Warehouse Directory on HDFS
The absolute HDFS file path of the default database for the warehouse that is local to the cluster. For example, the following file path specifies a local warehouse:
/user/hive/warehouse
If the Metastore Execution Mode is remote, then the file path must match the file path specified by the Hive Metastore Service on the Hadoop cluster.
Use the value specified for the hive.metastore.warehouse.dir property in hive-site.xml. You can find hive-site.xml in the following directory on the node that runs HiveServer2: /etc/hive/<version>/0/.
Advanced Hive/Hadoop Properties
Configures or overrides Hive or Hadoop cluster properties in hive-site.xml on the machine on which the Data Integration Service runs. You can specify multiple properties.
Use the following format:
<property1>=<value>
Where
  - <property1> is a Hive or Hadoop property in hive-site.xml.
  - <value> is the value of the Hive or Hadoop property.
To specify multiple properties, use &: as the property separator.
The maximum length for the format is 1 MB.
If you enter a required property for a Hive connection, it overrides the property that you configure in the Advanced Hive/Hadoop Properties.
The Data Integration Service adds or sets these properties for each map-reduce job. You can verify these properties in the JobConf of each mapper and reducer job. Access the JobConf of each job from the JobTracker URL under each map-reduce job.
The Data Integration Service writes messages for these properties to the Data Integration Service logs. To view these messages, set the log tracing level to log each row or to verbose initialization tracing.
For example, specify the following properties to control and limit the number of reducers to run a mapping job:
mapred.reduce.tasks=2&:hive.exec.reducers.max=10
Temporary Table Compression Codec
Hadoop compression library for a compression codec class name.
Codec Class Name
Codec class name that enables data compression and improves performance on temporary staging tables.
Metastore Execution Mode
Controls whether to connect to a remote metastore or a local metastore. By default, local is selected. For a local metastore, you must specify the Metastore Database URI, Driver, Username, and Password. For a remote metastore, you must specify only the Remote Metastore URI.
Metastore Database URI
The JDBC connection URI used to access the data store in a local metastore setup. Use the following connection URI:
jdbc:<datastore type>://<node name>:<port>/<database name>
where
  - <datastore type> is the type of the data store.
  - <node name> is the host name or IP address of the data store.
  - <port> is the port on which the data store listens for remote procedure calls (RPC).
  - <database name> is the name of the database.
For example, the following URI specifies a local metastore that uses MySQL as a data store:
jdbc:mysql://hostname23:3306/metastore
Use the value specified for the javax.jdo.option.ConnectionURL property in hive-site.xml. You can find hive-site.xml in the following directory on the node that runs HiveServer2: /etc/hive/<version>/0/hive-site.xml.
Metastore Database Driver
Driver class name for the JDBC data store. For example, the following class name specifies a MySQL driver:
com.mysql.jdbc.Driver
Use the value specified for the javax.jdo.option.ConnectionDriverName property in hive-site.xml. You can find hive-site.xml in the following directory on the node that runs HiveServer2: /etc/hive/<version>/0/hive-site.xml.
Metastore Database Username
The metastore database user name.
Use the value specified for the javax.jdo.option.ConnectionUserName property in hive-site.xml. You can find hive-site.xml in the following directory on the node that runs HiveServer2: /etc/hive/<version>/0/hive-site.xml.
Metastore Database Password
Required if the Metastore Execution Mode is set to local. The password for the metastore user name.
Use the value specified for the javax.jdo.option.ConnectionPassword property in hive-site.xml. You can find hive-site.xml in the following directory on the node that runs HiveServer2: /etc/hive/<version>/0/hive-site.xml.
Remote Metastore URI
The metastore URI used to access metadata in a remote metastore setup. For a remote metastore, you must specify the Thrift server details.
Use the following connection URI:
thrift://<hostname>:<port>
Where
  - <hostname> is the name or IP address of the Thrift metastore server.
  - <port> is the port on which the Thrift server listens.
Use the value specified for the hive.metastore.uris property in hive-site.xml. You can find hive-site.xml in the following directory on the node that runs HiveServer2: /etc/hive/<version>/0/hive-site.xml.
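For example, the following URI points to a remote metastore. The host name is a placeholder; port 9083 is the common default port for the Hive metastore Thrift service:
thrift://mymetastorehost.example.com:9083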
Hive Connection String
The JDBC connection URI used to access the metadata from the Hadoop server.
You can use PowerExchange for Hive to communicate with a HiveServer service or HiveServer2 service.
To connect to HiveServer2, specify the connection string in the following format:
jdbc:hive2://<hostname>:<port>/<db>;transportMode=<mode>
Where
  - <hostname> is the name or IP address of the machine on which HiveServer2 runs.
  - <port> is the port number on which HiveServer2 listens.
  - <db> is the database to which you want to connect. If you do not provide the database name, the Data Integration Service uses the default database details.
  - <mode> is the value of the hive.server2.transport.mode property in the Hive tab of the Ambari tool.