Step 5. Set Up User Impersonation
When a user performs tasks in Intelligent Data Lake, the Data Integration Service connects to Hadoop services to access the data lake and perform the operation on behalf of the user. The Data Integration Service uses impersonation to pass the user account to Hadoop.
Configure user impersonation for Intelligent Data Lake based on the user access requirements of your organization.
You can configure user impersonation in the following ways:
- Using the Intelligent Data Lake user account to connect to Hadoop services
- You can configure the Data Integration Service to impersonate the user account logged in to the Intelligent Data Lake application. The user account logged in to Intelligent Data Lake must have user or group authorization to connect to the Hadoop services. To use this option, you must configure the Data Integration Service to use operating system profiles. You must assign the Intelligent Data Lake user account a default operating system profile with the Use the logged in user as Hadoop Impersonation User option selected.
- Using an authorized user account to connect to Hadoop services
- If the Intelligent Data Lake user accounts are not authorized to connect to the Hadoop services, you can configure the Data Integration Service to impersonate a specific authorized user. The impersonated user account must have authorization to connect to the data lake and the Hadoop services. To use this option, you must configure the Data Integration Service to use operating system profiles. The operating system profile must have the Use the specified user as Hadoop Impersonation User option selected and must specify the authorized user account to impersonate.
- Using the Hive connection user account to connect to Hadoop services
- If user access to the data lake does not require authorization or if all users have the same authorization, you do not need to set up operating system profiles to connect to the Hadoop services. The Data Integration Service connects to Hadoop services using the user account specified in the Hive connection object for Intelligent Data Lake.
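The choice among the three options determines which account's authorization Hadoop applies to a request. The following Python function is a conceptual sketch of that decision, not Informatica code: the enum, the OsProfile class, the function name, and the account names used in the example are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ImpersonationMode(Enum):
    # Hypothetical names for the two operating system profile options above.
    LOGGED_IN_USER = "Use the logged in user as Hadoop Impersonation User"
    SPECIFIED_USER = "Use the specified user as Hadoop Impersonation User"


@dataclass
class OsProfile:
    # Simplified stand-in for an operating system profile (hypothetical type).
    mode: ImpersonationMode
    specified_user: Optional[str] = None


def effective_hadoop_user(logged_in_user: str,
                          hive_connection_user: str,
                          profile: Optional[OsProfile] = None) -> str:
    """Return the account whose authorization applies in Hadoop."""
    if profile is None:
        # Third option: no operating system profile, so the Data Integration
        # Service connects as the user named in the Hive connection object.
        return hive_connection_user
    if profile.mode is ImpersonationMode.LOGGED_IN_USER:
        # First option: impersonate the user logged in to Intelligent Data Lake.
        return logged_in_user
    # Second option: impersonate one specific authorized account.
    if not profile.specified_user:
        raise ValueError("the profile must name the user account to impersonate")
    return profile.specified_user
```

For example, with a logged-in user analyst1 and a Hive connection user infa_svc, the call without a profile returns infa_svc, a LOGGED_IN_USER profile returns analyst1, and a SPECIFIED_USER profile that names lake_reader returns lake_reader.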
To configure user impersonation for Intelligent Data Lake, perform the following steps:
- Configure user impersonation in the Hadoop cluster.
- Configure the user account in the Hive connection.
Configuring User Impersonation in the Hadoop Cluster
To enable the Data Integration Service to impersonate a user account on the Hadoop cluster, configure user impersonation in Hadoop. Set up the user account that runs the Data Integration Service as a proxy user in Hadoop.
The Hadoop configuration file core-site.xml defines the proxy user accounts that can impersonate other users. You can set the properties directly in the configuration file or through the management tool for your Hadoop distribution.
In the core-site.xml file, the following properties specify the impersonation settings:
- hadoop.proxyuser.<user account>.groups
- Defines the groups that a user account can impersonate. The user account can impersonate any member of the groups that you specify.
- hadoop.proxyuser.<user account>.hosts
- Defines the host machines from which the user account can connect to impersonate members of the specified groups. The host machine you specify must be a machine where the Data Integration Service runs.
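The two properties use the standard Hadoop configuration layout of name and value pairs inside property elements, with multiple groups or hosts given as a comma-separated list. The following sketch uses Python's standard xml.etree.ElementTree module to produce that fragment, for example to review the values before you add them to core-site.xml or to the management tool. The service account infa_svc, the group idl_users, and the host name are hypothetical placeholders for the account that runs the Data Integration Service and your own groups and hosts.

```python
import xml.etree.ElementTree as ET


def proxyuser_properties(service_user: str, groups: list[str],
                         hosts: list[str]) -> ET.Element:
    """Build a <configuration> element holding the two proxy-user properties."""
    config = ET.Element("configuration")
    for suffix, values in (("groups", groups), ("hosts", hosts)):
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = f"hadoop.proxyuser.{service_user}.{suffix}"
        # Multiple groups or hosts are written as one comma-separated value.
        ET.SubElement(prop, "value").text = ",".join(values)
    return config


if __name__ == "__main__":
    # Hypothetical account, group, and host names.
    conf = proxyuser_properties("infa_svc", ["idl_users"], ["dis-node1.example.com"])
    ET.indent(conf)  # pretty-printing requires Python 3.9 or later
    print(ET.tostring(conf, encoding="unicode"))
```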
For more information about how to enable user impersonation in the Hadoop cluster, see the documentation for your Hadoop distribution.
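Taken together, the two properties mean that the proxy user can impersonate a user only when the request comes from a listed host and the impersonated user belongs to a listed group. The following Python sketch models that rule. It is a simplification for illustration, not Hadoop's implementation: it treats * as a wildcard, matches host names exactly, and ignores other forms Hadoop accepts, such as IP address ranges. The function and class names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class ProxyUserRule:
    # Parsed values of hadoop.proxyuser.<user account>.groups and .hosts.
    groups: set[str] = field(default_factory=set)
    hosts: set[str] = field(default_factory=set)


def parse_rules(props: dict[str, str]) -> dict[str, ProxyUserRule]:
    """Group the hadoop.proxyuser.* properties by proxy user account."""
    prefix = "hadoop.proxyuser."
    rules: dict[str, ProxyUserRule] = {}
    for name, value in props.items():
        if not name.startswith(prefix):
            continue
        # rpartition keeps account names that themselves contain dots intact.
        account, _, kind = name[len(prefix):].rpartition(".")
        if not account or kind not in ("groups", "hosts"):
            continue
        entries = {v.strip() for v in value.split(",") if v.strip()}
        getattr(rules.setdefault(account, ProxyUserRule()), kind).update(entries)
    return rules


def may_impersonate(rules: dict[str, ProxyUserRule], proxy_user: str,
                    source_host: str, target_user_groups: Iterable[str]) -> bool:
    """Both the host check and the group check must pass."""
    rule = rules.get(proxy_user)
    if rule is None:
        return False  # the account is not configured as a proxy user at all
    host_ok = "*" in rule.hosts or source_host in rule.hosts
    group_ok = "*" in rule.groups or bool(rule.groups & set(target_user_groups))
    return host_ok and group_ok
```

Under this model, a request from a host that does not run the Data Integration Service is rejected even when the impersonated user is in an allowed group.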
Configuring the User Account in the Hive Connection
The Hive connection object defines the Data Integration Service user account that can impersonate Intelligent Data Lake users. Set the user name in the Hive connection object you use to connect to the data lake.
1. In the Administrator tool, click Manage > Connections.
2. In the Navigator, select the Hive connection to the data lake.
3. In the properties view, edit the Common Attribute to Both the Modes section.
The Edit Common Attribute to Both the Modes dialog box appears.
4. Set the user name and password for the Data Integration Service user account you want to use to impersonate Intelligent Data Lake users.
The user account must be the same account that you configured as a proxy user for impersonation in the Hadoop cluster.
5. Click OK.
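Because the user name in the Hive connection must match the proxy user configured in core-site.xml, a quick consistency check can catch a mismatch before users run into authorization errors. The following Python sketch reads core-site.xml with the standard xml.etree.ElementTree module and reports which proxy-user properties are missing for a given Hive connection user name. The function names are hypothetical, and the check only confirms that the properties are present, not that their values are correct.

```python
import xml.etree.ElementTree as ET


def properties_from_element(root: ET.Element) -> dict[str, str]:
    """Map each <property> name to its value in a parsed Hadoop config file."""
    props: dict[str, str] = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def read_core_site(path: str) -> dict[str, str]:
    """Parse a core-site.xml file from disk."""
    return properties_from_element(ET.parse(path).getroot())


def missing_proxyuser_properties(props: dict[str, str], hive_user: str) -> list[str]:
    """Return the proxy-user properties that are absent or empty for hive_user."""
    wanted = [f"hadoop.proxyuser.{hive_user}.{kind}" for kind in ("groups", "hosts")]
    return [name for name in wanted if not props.get(name)]
```

For example, missing_proxyuser_properties(read_core_site("/etc/hadoop/conf/core-site.xml"), "infa_svc") returns an empty list when both properties are set for infa_svc. The file path and account name are assumptions; the location of core-site.xml varies by Hadoop distribution.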
For more information about connection objects and the Hive connection properties, see the Informatica Administrator Guide.