Hadoop connection properties

To use Hadoop Connector in a synchronization task, you must configure the connection properties.
The following table describes the Hadoop connection properties:
| Connection property | Description |
|---|---|
| Username | The user name for the schema of the Hadoop component. |
| Password | The password for the schema of the Hadoop component. |
| JDBC Connection URL | The JDBC URL to connect to the Hadoop component. See JDBC URL. |
| Driver | The JDBC driver class to connect to the Hadoop component. For more information, see the Setting Hadoop Classpath for various Hadoop Distributions topic. |
| Commit Interval | The batch size, in rows, used to load data to Hive. |
| Hadoop Installation Path | The installation path of the Hadoop component. Not applicable to a Kerberos cluster. |
| Hive Installation Path | The Hive installation path. Not applicable to a Kerberos cluster. |
| HDFS Installation Path | The HDFS installation path. Not applicable to a Kerberos cluster. |
| HBase Installation Path | The HBase installation path. Not applicable to a Kerberos cluster. |
| Impala Installation Path | The Impala installation path. Not applicable to a Kerberos cluster. |
| Miscellaneous Library Path | The library that communicates with Hadoop. Not applicable to a Kerberos cluster. |
| Enable Logging | Enables log messages. Note: The Enable Logging connection parameter is a placeholder for a future release, and its state has no impact on connector functionality. |
| Hadoop Distribution | The Hadoop distributions for which you can use Kerberos authentication. You can use Kerberos authentication for the Cloudera and HDP Hadoop distributions. |
| Authentication Type | The authentication type. You can select native or Kerberos authentication. |
| Key Tab File | The file that contains encrypted keys and Kerberos principals to authenticate the machine. |
| Hive Site XML | The directory where the core-site.xml, hive-site.xml, and hdfs-site.xml files are located. All three XML files must be in the same directory. |
| Superuser Principle Name | Users assigned to the superuser privilege can perform all the tasks that a user with the administrator privilege can perform. |
| Impersonation Username | You can enable different users to run mappings in a Hadoop cluster that uses Kerberos authentication or to connect to sources and targets that use Kerberos authentication. To enable different users to run mappings or connect to big data sources and targets, you must configure user impersonation. |
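The Hive Site XML property expects one directory that holds all three configuration files. As a rough illustration, the following Python sketch checks a directory before you enter it in the connection. The helper name missing_site_files is hypothetical and is not part of Hadoop Connector; only the three file names come from the property description.

```python
from pathlib import Path

# File names taken from the Hive Site XML property description.
REQUIRED_SITE_FILES = ("core-site.xml", "hive-site.xml", "hdfs-site.xml")


def missing_site_files(directory):
    """Return the required XML files that are absent from `directory`.

    Hypothetical helper for illustration only. An empty list means that
    all three files are present in the same directory.
    """
    base = Path(directory)
    return [name for name in REQUIRED_SITE_FILES if not (base / name).is_file()]
```

If the function returns an empty list for the directory that you plan to use, the three files are located together as the property requires.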
Note: Installation paths are the paths where you place the Hadoop JAR files. Hadoop Connector loads the libraries from the installation paths before it sends instructions to Hadoop. When you use the Kerberos authentication type, you do not need to specify the Hadoop installation path, Hive installation path, HDFS installation path, HBase installation path, Impala installation path, or Miscellaneous Library path.
If you do not use Kerberos authentication and do not specify the installation paths, you can set the Hadoop classpath for Amazon EMR, Hortonworks, MapR, and Cloudera.
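The rule in this note can be sketched in Python as follows. This is an illustration under assumptions: the connection is modeled as a plain dictionary keyed by the property names from the table, the authentication value is assumed to be the string "Kerberos", and the helper name unset_path_properties is hypothetical. None of this reflects how Hadoop Connector stores or validates a connection.

```python
# Path properties that the table marks as not applicable to a Kerberos cluster.
PATH_PROPERTIES = (
    "Hadoop Installation Path",
    "Hive Installation Path",
    "HDFS Installation Path",
    "HBase Installation Path",
    "Impala Installation Path",
    "Miscellaneous Library Path",
)


def unset_path_properties(properties):
    """Return the path properties that have no value in `properties`.

    Hypothetical helper for illustration only. With Kerberos
    authentication the paths do not apply, so the result is empty.
    Otherwise, each returned name is a path whose libraries would have
    to be found through the Hadoop classpath instead.
    """
    auth = properties.get("Authentication Type", "").strip().lower()
    if auth == "kerberos":
        return []
    return [name for name in PATH_PROPERTIES if not properties.get(name)]
```

For a native connection, a non-empty result suggests which libraries depend on the Hadoop classpath being set for your distribution.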
When you perform an insert operation on a non-Kerberos cluster, the Secure Agent uses the hadoop fs -put <FS> <HDFS> command to upload the file to HDFS and the hadoop fs -rm -r <HDFS> command to delete the file from HDFS. When you enable Kerberos authentication, the Secure Agent does not use Hadoop commands to write data to or delete data from HDFS.
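To make the two commands concrete, the following Python sketch builds the equivalent argument lists for a local file and an HDFS target. It does not show the Secure Agent's internal implementation. The function names and any paths that you pass in are hypothetical, and the code only constructs the commands without running them.

```python
def build_put_command(local_file, hdfs_path):
    """Argument list that uploads a local file to HDFS (hadoop fs -put)."""
    return ["hadoop", "fs", "-put", local_file, hdfs_path]


def build_rm_command(hdfs_path):
    """Argument list that recursively deletes a path from HDFS (hadoop fs -rm -r)."""
    return ["hadoop", "fs", "-rm", "-r", hdfs_path]
```

On a machine where the hadoop command-line client is installed, you could pass either list to subprocess.run with check=True to run the command manually, for example when you troubleshoot a failed insert on a non-Kerberos cluster.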