User Impersonation in the Hadoop Environment

To enable different users to run mapping and workflow jobs on a Hadoop cluster that uses Kerberos authentication, you must configure user impersonation in the Hadoop environment.
For example, the HypoStores administrator wants user Bob to be able to run mappings and workflows on a Hadoop cluster that uses Kerberos authentication.
To enable user impersonation, you must complete the following steps:
  1. Enable the SPN of the Data Integration Service to impersonate another user named Bob to run Hadoop jobs.
  2. Specify Bob as the user name for the Data Integration Service to impersonate in the Hadoop connection or Hive connection.
Note: If you create a Hadoop connection, you must use user impersonation.

Step 1. Enable the SPN of the Data Integration Service to Impersonate Another User

To run mapping and workflow jobs on the Hadoop cluster, enable the SPN of the Data Integration Service to impersonate another user.
Configure user impersonation properties in core-site.xml on the Name Node on the Hadoop cluster.
core-site.xml is located at the following path:
/etc/hadoop/conf/core-site.xml
Configure the following properties in core-site.xml:
hadoop.proxyuser.<superuser>.groups
Enables the superuser to impersonate any member of the specified user groups.
hadoop.proxyuser.<superuser>.hosts
Enables the superuser to connect from specified hosts to impersonate a user.
For example, set the values for the following properties in core-site.xml:
<property>
<name>hadoop.proxyuser.bob.groups</name>
<value>group1,group2</value>
<description>Allow the superuser &lt;DIS_user&gt; to impersonate any member of the groups group1 and group2</description>
</property>

<property>
<name>hadoop.proxyuser.bob.hosts</name>
<value>host1,host2</value>
<description>The superuser can connect only from host1 and host2 to impersonate a user</description>
</property>
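
To sanity-check these settings before running jobs, a short script can read the properties back and apply the two rules described above. The following Python sketch is a simplified model, not Hadoop's own authorization code: the helper names (load_properties, may_impersonate) are hypothetical, the XML is an inline sample that mirrors the example above, and the check assumes plain comma-separated names plus the `*` wildcard. It does not handle other host formats such as IP ranges.

```python
import xml.etree.ElementTree as ET

# Inline sample that mirrors the example properties above.
# In practice you would read the text of core-site.xml instead.
CORE_SITE = """<configuration>
  <property>
    <name>hadoop.proxyuser.bob.groups</name>
    <value>group1,group2</value>
  </property>
  <property>
    <name>hadoop.proxyuser.bob.hosts</name>
    <value>host1,host2</value>
  </property>
</configuration>"""


def load_properties(xml_text):
    """Return a dict of property name -> value from core-site.xml text."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value", default="")
        if name:
            props[name.strip()] = value.strip()
    return props


def _allowed(value, candidates):
    """True if the value is '*' or lists at least one of the candidates."""
    entries = {v.strip() for v in value.split(",") if v.strip()}
    return "*" in entries or bool(entries & set(candidates))


def may_impersonate(props, superuser, target_groups, client_host):
    """Simplified model: the groups rule and the hosts rule must both match."""
    groups = props.get(f"hadoop.proxyuser.{superuser}.groups", "")
    hosts = props.get(f"hadoop.proxyuser.{superuser}.hosts", "")
    return _allowed(groups, target_groups) and _allowed(hosts, [client_host])


props = load_properties(CORE_SITE)
print(may_impersonate(props, "bob", ["group1"], "host1"))  # True
print(may_impersonate(props, "bob", ["group3"], "host1"))  # False
print(may_impersonate(props, "bob", ["group1"], "host9"))  # False
```

A missing property is treated as an empty list, so the sketch denies impersonation unless both the groups entry and the hosts entry are configured for the name used in the property keys.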

Step 2. Specify a User Name in the Hadoop Connection

In the Developer tool, specify a user name in the Hadoop connection for the Data Integration Service to impersonate when it runs jobs on the Hadoop cluster.
If you do not specify a user name, the Hadoop cluster authenticates jobs based on the SPN of the Data Integration Service.
For example, if Bob is the name of the user that you entered in core-site.xml, enter Bob as the user name.
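
The fallback rule above can be summarized in a few lines of Python. This is only an illustrative sketch of the rule as stated in this topic: the function name effective_user and the sample SPN value are hypothetical, and the sketch does not model how the cluster maps an SPN to a local user.

```python
def effective_user(impersonation_user, dis_spn):
    """Return the identity a job runs under, per the rule described above.

    impersonation_user: user name entered in the Hadoop connection (may be empty).
    dis_spn: SPN of the Data Integration Service (hypothetical sample value below).
    """
    name = (impersonation_user or "").strip()
    # With no user name specified, the job falls back to the service SPN.
    return name if name else dis_spn


print(effective_user("Bob", "dis_user/host1@EXAMPLE.COM"))  # Bob
print(effective_user("", "dis_user/host1@EXAMPLE.COM"))     # dis_user/host1@EXAMPLE.COM
```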