Providing and Managing Access to Data

Intelligent Data Lake user accounts must be authorized to access the Hive tables in the Hadoop cluster designated as the data lake. User accounts access these Hive tables when users preview data, upload data, and publish prepared data.
HDFS permissions
Grant each user account the appropriate HDFS permissions in the Hadoop cluster. HDFS permissions determine what a user can do to files and directories stored in HDFS. To access a file or directory, a user must either have permission on it or belong to a group that does.
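For example, you can inspect the owner, group, and permission bits on a warehouse directory with the hdfs dfs -ls command. The path shown assumes the default Hive warehouse location, and the output is illustrative:

    hdfs dfs -ls /user/hive/warehouse

    drwxrwx--x   - hive hadoop          0 2017-03-14 10:15 /user/hive/warehouse/sales.db

In this sample output, the owner (hive) and members of the hadoop group have read, write, and execute permission on the sales.db directory, while all other users have execute permission only.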
A Hive database corresponds to a directory in HDFS. Each Hive table created in the database corresponds to a subdirectory. You grant Intelligent Data Lake user accounts permission on the appropriate directory, based on whether you want to provide permission on a Hive database or on a specific Hive table in the database.
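For example, the following commands sketch how you might grant a user read access to one Hive table by using HDFS ACLs. The warehouse path, database name (sales), table name (orders), and user name (jsmith) are placeholders for values from your environment, and the cluster must have HDFS ACLs enabled (dfs.namenode.acls.enabled set to true in hdfs-site.xml):

    hdfs dfs -setfacl -m user:jsmith:--x /user/hive/warehouse/sales.db
    hdfs dfs -setfacl -R -m user:jsmith:r-x /user/hive/warehouse/sales.db/orders

The first command gives the user execute permission on the database directory so that the table subdirectory can be reached. The second gives read access to the table directory and the data files in it.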
Note: As a best practice, set up private, shared, and public databases (or schemas) in a single Hive resource and grant users the appropriate permissions on the corresponding HDFS directories.
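A minimal sketch of this layout with standard permission bits might look like the following. The database names, warehouse path, owner, and group are hypothetical:

    hdfs dfs -chown jsmith /user/hive/warehouse/private_jsmith.db
    hdfs dfs -chmod 700 /user/hive/warehouse/private_jsmith.db

    hdfs dfs -chgrp marketing /user/hive/warehouse/shared_marketing.db
    hdfs dfs -chmod 770 /user/hive/warehouse/shared_marketing.db

    hdfs dfs -chmod 755 /user/hive/warehouse/public.db

With these bits, only the owner can use the private database, members of the marketing group can read and write the shared database, and every user can read the public database.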
User impersonation
User impersonation allows different user accounts to run mappings in a Hadoop cluster that uses Kerberos authentication. When users upload and publish prepared data in the Intelligent Data Lake application, the Data Integration Service runs mappings in the Hadoop environment and pushes the processing to nodes in the Hadoop cluster. The Data Integration Service uses the credentials that you specify to impersonate the user accounts that upload and publish the data. Create a user account in the Hadoop cluster for each Intelligent Data Lake user account.
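Hadoop clusters typically authorize impersonation through proxy user properties in core-site.xml. The following snippet is a sketch that assumes the Data Integration Service connects to the cluster as a hypothetical service user named ids_user; the group and host values are placeholders that you would restrict to your environment:

    <property>
        <name>hadoop.proxyuser.ids_user.groups</name>
        <value>datalake_users</value>
    </property>
    <property>
        <name>hadoop.proxyuser.ids_user.hosts</name>
        <value>dis-host.example.com</value>
    </property>

These properties allow ids_user to impersonate only members of the datalake_users group, and only from the listed host.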
When the Data Integration Service impersonates a user account to submit a mapping, the mapping can access only the Hadoop resources that the impersonated user has permissions on. Without user impersonation, the Data Integration Service submits mappings with its own credentials, which might allow users to reach restricted Hadoop resources.
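To check what an impersonated account can reach, you can list the ACL entries on the relevant directory. The path and user name below are placeholders:

    hdfs dfs -getfacl /user/hive/warehouse/sales.db/orders

    # file: /user/hive/warehouse/sales.db/orders
    # owner: hive
    # group: hadoop
    user::rwx
    user:jsmith:r-x
    group::r-x
    mask::r-x
    other::--x

The user:jsmith:r-x entry confirms that the impersonated account has read access to the table directory.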