Security
Security for implementations of Data Engineering Integration includes security for the native environment, that is, the Informatica domain, and for the non-native environments, such as the Hadoop and Databricks environments.
Security for the Hadoop Environment
You can configure security for the Informatica domain and the Hadoop cluster to protect against threats from inside and outside the network. Security for the Hadoop cluster includes the following areas:
- Authentication

  When the Informatica implementation includes Data Engineering Integration, user identities must be authenticated in the Informatica domain and in the Hadoop cluster. Authentication for the Informatica domain is separate from authentication for the Hadoop cluster.

  By default, Hadoop does not verify the identity of users. To authenticate user identities, you can configure the following authentication protocols on the cluster:

  - Native authentication
  - Lightweight Directory Access Protocol (LDAP)
  - Kerberos, when the Hadoop distribution supports it
  - Apache Knox Gateway

  Data Engineering Integration also supports Hadoop clusters that use a Microsoft Active Directory (AD) Key Distribution Center (KDC) or an MIT KDC.
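  As an illustrative sketch, not an Informatica-specific procedure, Kerberos authentication on a Hadoop cluster is typically enabled through standard properties in `core-site.xml`:

  ```xml
  <!-- core-site.xml: enable Kerberos authentication on the cluster -->
  <property>
    <name>hadoop.security.authentication</name>
    <!-- default is "simple", which does not verify user identities -->
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
  ```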
- Authorization

  After a user is authenticated, the user must be authorized to perform actions. For example, a user must have the correct permissions to access the directories where specific data is stored to use that data in a mapping.

  You can run mappings on a cluster that uses one of the following security management systems for authorization:

  - Cloudera Navigator Encrypt
  - HDFS permissions
  - User impersonation
  - Apache Ranger
  - Apache Sentry
  - HDFS Transparent Encryption
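  The user impersonation option above can be sketched with the standard Hadoop proxy-user properties in `core-site.xml`; the service account name `infa` and the group name `hadoop-users` are placeholders, not product defaults:

  ```xml
  <!-- core-site.xml: allow the infa service account (placeholder name) to
       impersonate members of the hadoop-users group from any host -->
  <property>
    <name>hadoop.proxyuser.infa.groups</name>
    <value>hadoop-users</value>
  </property>
  <property>
    <name>hadoop.proxyuser.infa.hosts</name>
    <value>*</value>
  </property>
  ```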
- Data and metadata management

  Data and metadata management involves managing data to track and audit data access, update metadata, and perform data lineage. Data Engineering Integration supports Cloudera Navigator and Metadata Manager to manage metadata and perform data lineage.
- Data security

  Data security involves protecting sensitive data from unauthorized access. Data Engineering Integration supports data masking with the Data Masking transformation in the Developer tool, Dynamic Data Masking, and Persistent Data Masking.
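  The Data Masking transformation itself is configured in the Developer tool, but the underlying idea can be sketched in plain Python; `mask_ssn` and `mask_email` are hypothetical helpers for illustration, not product APIs:

  ```python
  import hashlib

  def mask_ssn(ssn: str) -> str:
      """Keep only the last four digits; replace the rest with X."""
      digits = ssn.replace("-", "")
      return "XXX-XX-" + digits[-4:]

  def mask_email(email: str) -> str:
      """Replace the local part with a deterministic token, keep the domain."""
      local, _, domain = email.partition("@")
      token = hashlib.sha256(local.encode()).hexdigest()[:8]
      return token + "@" + domain
  ```

  Deterministic masking (the same input always yields the same token) preserves joins across masked data sets, which is a typical requirement when persistently masking test data.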
- Operating system profiles

  An operating system profile is a security mechanism that the Data Integration Service uses to run mappings. Use operating system profiles to increase security and to isolate the run-time environment for users. Data Engineering Integration supports operating system profiles on all Hadoop distributions. In the Hadoop run-time environment, the Data Integration Service pushes processing to the Hadoop cluster, and the run-time engines run mappings with the operating system profile.
Security for the Databricks Environment
The Data Integration Service uses token-based authentication to provide access to the Databricks environment. Generate tokens within the Databricks environment and use the token ID to connect to Databricks.
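A minimal sketch of token-based access, assuming a placeholder workspace URL and token: the Databricks REST API accepts a personal access token as a Bearer token in the `Authorization` header, and `/api/2.0/clusters/list` is a standard endpoint for listing clusters.

```python
import urllib.request

def databricks_request(host: str, token: str, path: str) -> urllib.request.Request:
    """Build an authenticated request for the Databricks REST API."""
    # The personal access token generated in the Databricks workspace is
    # passed as a Bearer token in the Authorization header.
    return urllib.request.Request(
        url=host.rstrip("/") + path,
        headers={"Authorization": "Bearer " + token},
    )

# Example with placeholder host and token; urllib.request.urlopen(req)
# would return the JSON cluster list for the workspace.
req = databricks_request(
    "https://example.cloud.databricks.com",
    "dapi-example-token",
    "/api/2.0/clusters/list",
)
```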