
Authentication

When the Informatica domain includes Big Data Management, user identities must be authenticated in the Informatica domain and the Hadoop cluster. Authentication for the Informatica domain is separate from authentication for the Hadoop cluster.
The authentication process verifies the identity of a user account.
By default, Hadoop does not authenticate users. Any user can be used in the Hadoop connection. Informatica recommends that you enable authentication for the cluster. If authentication is enabled for the cluster, the cluster authenticates the user account used for the Hadoop connection between Big Data Management and the cluster. For a higher level of security, you can set up Kerberos authentication for the cluster.
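As a rough illustration of how to tell whether authentication is enabled, the following Python sketch reads a cluster's core-site.xml and reports the configured mode. It assumes the standard Hadoop property hadoop.security.authentication, which is set to simple (no authentication) by default and to kerberos when Kerberos is enabled. The function name and the sample XML are hypothetical and are not part of Big Data Management.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of a core-site.xml file; real files contain many more properties.
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
</configuration>"""


def cluster_auth_mode(core_site_xml: str) -> str:
    """Return the configured authentication mode, or 'simple' if the property is unset."""
    root = ET.fromstring(core_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == "hadoop.security.authentication":
            return (prop.findtext("value") or "simple").strip().lower()
    # Hadoop falls back to 'simple' (no authentication) when the property is absent.
    return "simple"


print(cluster_auth_mode(SAMPLE_CORE_SITE))
```

A result of simple means the cluster accepts whatever user name the Hadoop connection presents.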
The Informatica domain uses one of the following authentication protocols:
Native authentication
The Informatica domain stores user credentials and privileges in the domain configuration repository and performs all user authentication within the Informatica domain.
Lightweight Directory Access Protocol (LDAP)
The LDAP directory service stores user accounts and credentials that are accessed over the network.
Kerberos authentication
Kerberos is a network authentication protocol that uses tickets to authenticate users and services in a network. Users are stored in the Kerberos principal database, and tickets are issued by a Key Distribution Center (KDC).
Apache Knox Gateway
The Apache Knox Gateway is a REST API gateway that authenticates users and acts as a single access point for a Hadoop cluster.
For more information about how to configure authentication for the Informatica domain, see the Informatica Security Guide.
For more information about how to enable authentication for the Hadoop cluster, see the documentation for your Hadoop distribution.

Kerberos Authentication

Big Data Management and the Hadoop cluster can use Kerberos authentication to verify user accounts. You can use Kerberos authentication with the Informatica domain, with the Hadoop cluster, or with both.
Kerberos is a network authentication protocol that uses tickets to authenticate access to services and nodes in a network. Kerberos uses a Key Distribution Center (KDC) to validate the identities of users and services and to grant tickets to authenticated user and service accounts. Users and services are known as principals. The KDC has a database of principals and their associated secret keys that are used as proof of identity. Kerberos can use an LDAP directory service as a principal database.
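The relationship between principals, secret keys, and tickets can be sketched with a toy model. The Python example below is not the Kerberos protocol: it replaces Kerberos encryption and session keys with a simple HMAC seal, and the class name, function names, and principal names are all hypothetical. It only illustrates the idea that the KDC checks a principal's proof of identity against its database and then issues a time-limited ticket that the target service can verify on its own.

```python
import hashlib
import hmac
import time


class ToyKDC:
    """Illustrative stand-in for a KDC; NOT the real Kerberos protocol."""

    def __init__(self):
        # Principal database: principal name -> secret key shared with the KDC.
        self._principals = {}

    def add_principal(self, name: str, secret: bytes) -> None:
        self._principals[name] = secret

    def issue_ticket(self, user: str, user_proof: bytes, service: str, lifetime_s: int = 3600):
        """Grant a ticket for `service` if `user` proves knowledge of its secret key."""
        user_key = self._principals.get(user)
        service_key = self._principals.get(service)
        if user_key is None or service_key is None:
            raise PermissionError("unknown principal")
        expected = hmac.new(user_key, user.encode(), hashlib.sha256).digest()
        if not hmac.compare_digest(expected, user_proof):
            raise PermissionError("authentication failed")
        expires = int(time.time()) + lifetime_s
        payload = f"{user}|{service}|{expires}".encode()
        # Seal the ticket with the service's key so that the service can verify it.
        seal = hmac.new(service_key, payload, hashlib.sha256).digest()
        return payload, seal


def service_accepts(service_key: bytes, ticket) -> bool:
    """The service validates the seal and expiry without contacting the KDC."""
    payload, seal = ticket
    expected = hmac.new(service_key, payload, hashlib.sha256).digest()
    expires = int(payload.decode().rsplit("|", 1)[1])
    return hmac.compare_digest(expected, seal) and time.time() < expires


# Hypothetical principals for a user and a Hadoop service.
kdc = ToyKDC()
kdc.add_principal("alice@EXAMPLE.COM", b"alice-secret")
kdc.add_principal("hdfs/node1@EXAMPLE.COM", b"hdfs-secret")

proof = hmac.new(b"alice-secret", b"alice@EXAMPLE.COM", hashlib.sha256).digest()
ticket = kdc.issue_ticket("alice@EXAMPLE.COM", proof, "hdfs/node1@EXAMPLE.COM")
print(service_accepts(b"hdfs-secret", ticket))
```

In this model, as in Kerberos, the user's secret key is never sent to the service. The service trusts the ticket because only the KDC and the service know the key that seals it.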
The requirements for Kerberos authentication differ for the Informatica domain and for the Hadoop cluster:
Kerberos authentication for the Informatica domain
Kerberos authentication for the Informatica domain requires principals stored in a Microsoft Active Directory (AD) LDAP service. Additionally, you must use Microsoft AD for the KDC.
For more information about how to enable Kerberos authentication for the Informatica domain, see the Informatica Security Guide.
Kerberos authentication for the Hadoop cluster
Informatica supports Hadoop clusters that use an AD KDC or an MIT KDC.
When you enable Kerberos for Hadoop, each user and Hadoop service must be authenticated by the KDC. The cluster must authenticate the Data Integration Service user and, optionally, the Blaze user.
For more information about how to configure Kerberos for Hadoop, see the documentation for your Hadoop distribution.
The configuration steps required for Big Data Management to connect to a Hadoop cluster that uses Kerberos authentication depend on whether the Informatica domain uses Kerberos.
For more information about how to configure Big Data Management to connect to a Hadoop cluster that uses Kerberos, see the "Running Mappings with Kerberos Authentication" chapter.

Apache Knox Gateway

The Apache Knox Gateway is a REST API gateway that authenticates users and acts as a single access point for a Hadoop cluster.
Knox creates a perimeter around a Hadoop cluster. Without Knox, users and applications must connect directly to a resource in the cluster, which requires configuration on the client machines. A direct connection to resources exposes host names and ports to all users and applications and decreases the security of the cluster.
If the cluster uses Knox, applications use REST APIs and JDBC/ODBC over HTTP to connect to Knox. Knox authenticates the user and connects to a resource.
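To show what a single access point looks like from a client, the following Python sketch builds a WebHDFS request that goes through Knox instead of directly to a NameNode. It assumes the common Knox URL layout https://{gateway-host}:{gateway-port}/{gateway-path}/{topology}/webhdfs/v1/..., with the default port 8443 and gateway path gateway, and it assumes that the Knox topology is configured for HTTP Basic authentication. The host name, topology name, credentials, and function names are hypothetical. The sketch only constructs the URL and header; it does not send a request.

```python
import base64
from urllib.parse import quote, urlencode


def knox_webhdfs_url(host, path, op, port=8443, gateway_path="gateway", topology="default"):
    """Build a WebHDFS URL that is routed through the Knox gateway."""
    hdfs_path = quote(path.lstrip("/"))
    query = urlencode({"op": op})
    return f"https://{host}:{port}/{gateway_path}/{topology}/webhdfs/v1/{hdfs_path}?{query}"


def basic_auth_header(user, password):
    """Build an HTTP Basic Authorization header that Knox can check."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}


# The client only needs the Knox host and port, not the cluster's internal host names.
url = knox_webhdfs_url("knox.example.com", "/tmp", "LISTSTATUS")
headers = basic_auth_header("guest", "guest-password")
print(url)
```

Only the gateway address appears in the request. The host names and ports of the cluster resources stay behind Knox.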