Running Mappings in a Kerberos-Enabled Hadoop Environment
To run mappings in a Kerberos-enabled Hadoop environment, you must configure the Kerberos configuration file, create user authentication artifacts, and configure Kerberos authentication properties for the Informatica domain.
The Kerberos configuration file krb5.conf contains configuration properties for the Kerberos realm. The one-way cross-realm trust enables the Informatica domain to communicate with the Hadoop cluster.
If the Informatica domain uses Kerberos authentication on a Microsoft Active Directory service and the Hadoop cluster uses Kerberos authentication on an MIT Kerberos service, you set up a one-way cross-realm trust to enable the KDC for the MIT Kerberos service to communicate with the KDC for the Active Directory service. After you set up the cross-realm trust, you must configure the Informatica domain to enable mappings to run in the Hadoop cluster.
To run mappings on a cluster that uses Kerberos authentication, perform the following configuration tasks:
- 1. Set up the Kerberos configuration file.
- 2. When the Informatica domain uses Kerberos authentication, set up the one-way cross-realm trust.
- 3. Create matching operating system profile user names on each Hadoop cluster node.
- 4. Create the Service Principal Name and Keytab File in the Active Directory Server.
- 5. Specify the Kerberos authentication properties for the Data Integration Service.
- 6. Configure Execution Options for the Data Integration Service.
Step 1. Set Up the Kerberos Configuration File
Set the configuration properties for the Kerberos realm that the Hadoop cluster uses in the krb5.conf file on the machine on which the Data Integration Service runs.
If the Informatica domain does not use Kerberos authentication, set the properties for the MIT realm. If the Informatica domain uses Kerberos authentication, set the properties for the Active Directory realm and MIT realm.
krb5.conf is located in the <Informatica Installation Directory>/java/jre/lib/security directory.
1. Back up krb5.conf before you make any changes.
2. Edit krb5.conf.
3. In the libdefaults section, set the following properties:
- - default_realm. Name of the default Kerberos realm. If the Informatica domain does not use Kerberos authentication, set the default realm to the name of the Kerberos realm that the Hadoop cluster uses. If the Informatica domain uses Kerberos authentication, set the default realm to the name of the Active Directory realm that the Informatica domain uses.
- - udp_preference_limit. Determines the protocol that Kerberos uses when it sends a message to the KDC. Set to 1 to use the TCP protocol.
The following example shows the value if the Informatica domain does not use Kerberos authentication:
[libdefaults]
default_realm = HADOOP-AD-REALM
udp_preference_limit=1
The following example shows the value if the Informatica domain uses Kerberos authentication:
[libdefaults]
default_realm = INFA-AD-REALM
udp_preference_limit=1
4. In the realms section, set or add the properties required by Informatica.
The following table lists the values to which you must set properties in the realms section:
Parameter | Value
---|---
kdc | Name of the host running a KDC server for that realm.
admin_server | Name of the Kerberos administration server.
The following example shows the parameters for the Hadoop realm if the Informatica domain does not use Kerberos authentication:
[realms]
HADOOP-AD-REALM = {
kdc = l23abcd134.hadoop-ad-realm.com
admin_server = 123abcd124.hadoop-ad-realm.com
}
The following example shows the parameters for the Hadoop realm if the Informatica domain uses Kerberos authentication:
[realms]
INFA-AD-REALM = {
kdc = abc123.infa-ad-realm.com
admin_server = abc123.infa-ad-realm.com
}
HADOOP-MIT-REALM = {
kdc = def456.hadoop-mit-realm.com
admin_server = def456.hadoop-mit-realm.com
}
5. In the domain_realm section, map the domain name or host name to a Kerberos realm name. The domain name is prefixed by a period (.).
The following example shows the parameters for the Hadoop domain_realm if the Informatica domain does not use Kerberos authentication:
[domain_realm]
.hadoop_ad_realm.com = HADOOP-AD-REALM
hadoop_ad_realm.com = HADOOP-AD-REALM
The following example shows the parameters for the Hadoop domain_realm if the Informatica domain uses Kerberos authentication:
[domain_realm]
.infa_ad_realm.com = INFA-AD-REALM
infa_ad_realm.com = INFA-AD-REALM
.hadoop_mit_realm.com = HADOOP-MIT-REALM
hadoop_mit_realm.com = HADOOP-MIT-REALM
6. Copy the krb5.conf file to the following locations on the machine that hosts the Data Integration Service:
- - <Informatica installation directory>/services/shared/security/
- - <Informatica installation directory>/java/jre/lib/security
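For example, if the INFA_HOME environment variable points to the Informatica installation directory (an assumption for this example) and you edited the copy of krb5.conf in the JRE security directory, a command such as the following copies the file to the other location:
cp $INFA_HOME/java/jre/lib/security/krb5.conf $INFA_HOME/services/shared/security/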
The following example shows the content of krb5.conf with the required properties for an Informatica domain that does not use Kerberos authentication:
[libdefaults]
default_realm = HADOOP-AD-REALM
udp_preference_limit=1
[realms]
HADOOP-AD-REALM = {
kdc = l23abcd134.hadoop-ad-realm.com
admin_server = 123abcd124.hadoop-ad-realm.com
}
[domain_realm]
.hadoop_ad_realm.com = HADOOP-AD-REALM
hadoop_ad_realm.com = HADOOP-AD-REALM
The following example shows the content of krb5.conf with the required properties for an Informatica domain that uses Kerberos authentication:
[libdefaults]
default_realm = INFA-AD-REALM
udp_preference_limit=1
[realms]
INFA-AD-REALM = {
kdc = abc123.infa-ad-realm.com
admin_server = abc123.infa-ad-realm.com
}
HADOOP-MIT-REALM = {
kdc = def456.hadoop-mit-realm.com
admin_server = def456.hadoop-mit-realm.com
}
[domain_realm]
.infa_ad_realm.com = INFA-AD-REALM
infa_ad_realm.com = INFA-AD-REALM
.hadoop_mit_realm.com = HADOOP-MIT-REALM
hadoop_mit_realm.com = HADOOP-MIT-REALM
Step 2. Set Up the Cross-Realm Trust
Perform this step when the Informatica domain uses Kerberos authentication.
Set up a one-way cross-realm trust to enable the KDC for the MIT Kerberos server to communicate with the KDC for the Active Directory server. When you set up the one-way cross-realm trust, the Hadoop cluster can authenticate the Active Directory principals.
To set up the cross-realm trust, you must complete the following steps:
- 1. Configure the Active Directory server to add the local MIT realm trust.
- 2. Configure the MIT server to add the cross-realm principal.
- 3. Translate principal names from the Active Directory realm to the MIT realm.
Configure the Microsoft Active Directory Server
Add the MIT KDC host name and local realm trust to the Active Directory server.
To configure the Active Directory server, complete the following steps:
- 1. Enter the following command to add the MIT KDC host name:
ksetup /addkdc <mit_realm_name> <kdc_hostname>
- For example, enter the following command:
ksetup /addkdc HADOOP-MIT-REALM def456.hadoop-mit-realm.com
- 2. Enter the following command to add the local realm trust to Active Directory:
netdom trust <mit_realm_name> /Domain:<ad_realm_name> /add /realm /passwordt:<TrustPassword>
- For example, enter the following command:
netdom trust HADOOP-MIT-REALM /Domain:INFA-AD-REALM /add /realm /passwordt:trust1234
- 3. Enter the following commands based on your Microsoft Windows environment to set the proper encryption type:
For Microsoft Windows 2008, enter the following command:
ksetup /SetEncTypeAttr <mit_realm_name> <enc_type>
For Microsoft Windows 2003, enter the following command:
ktpass /MITRealmName <mit_realm_name> /TrustEncryp <enc_type>
Note: The enc_type parameter specifies AES, DES, or RC4 encryption. To find the value for enc_type, see the documentation for your version of Windows Active Directory. The encryption type that you specify must be supported by both the Windows version that runs Active Directory and the MIT Kerberos server.
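For example, on Microsoft Windows 2008, a command similar to the following sets AES-256 encryption for the MIT realm trust. The encryption type value shown here is an assumption for illustration; verify that it is supported by both realms before you use it:
ksetup /SetEncTypeAttr HADOOP-MIT-REALM AES256-CTS-HMAC-SHA1-96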
Configure the MIT Server
Configure the MIT server to add the cross-realm krbtgt principal. The cross-realm krbtgt principal is the service principal that the KDCs use to issue and validate cross-realm ticket-granting tickets between the Active Directory realm and the MIT realm.
Enter the following command in the kadmin.local or kadmin shell to add the cross-realm krbtgt principal:
kadmin: addprinc -e "<enc_type_list>" krbtgt/<mit_realm_name>@<MY-AD-REALM.COM>
The enc_type_list parameter specifies the encryption types that the cross-realm krbtgt principal supports. The krbtgt principal can support AES, DES, or RC4 encryption. You can specify multiple encryption types. However, at least one of the encryption types must match the encryption type of the tickets granted by the KDC in the remote realm.
For example, enter the following value:
kadmin: addprinc -e "rc4-hmac:normal des3-hmac-sha1:normal" krbtgt/HADOOP-MIT-REALM@INFA-AD-REALM
Translate Principal Names from the Active Directory Realm to the MIT Realm
To translate principal names from the Active Directory realm into local names within the Hadoop cluster, you must configure the hadoop.security.auth_to_local property in the core-site.xml file and the hadoop.kms.authentication.kerberos.name.rules property in the kms-site.xml file on all the machines in the Hadoop cluster.
For example, set the following property in core-site.xml on all the machines in the Hadoop cluster:
<property>
<name>hadoop.security.auth_to_local</name>
<value>
RULE:[1:$1@$0](^.*@INFA-AD-REALM$)s/^(.*)@INFA-AD-REALM$/$1/g
RULE:[2:$1@$0](^.*@INFA-AD-REALM$)s/^(.*)@INFA-AD-REALM$/$1/g
DEFAULT
</value>
</property>
For example, set the following property in kms-site.xml on all the machines in the Hadoop cluster:
<property>
<name>hadoop.kms.authentication.kerberos.name.rules</name>
<value>
RULE:[1:$1@$0](^.*@INFA-AD-REALM$)s/^(.*)@INFA-AD-REALM$/$1/g
RULE:[2:$1@$0](^.*@INFA-AD-REALM$)s/^(.*)@INFA-AD-REALM$/$1/g
DEFAULT
</value>
</property>
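With these rules, a principal from the Active Directory realm, for example joe@INFA-AD-REALM, resolves to the local cluster user joe.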
Step 3. Create Matching Operating System Profile Names
Create matching operating system profile user names on the machine that runs the Data Integration Service and each Hadoop cluster node to run Informatica mapping jobs.
For example, if user joe runs the Data Integration Service on a machine, you must create the user joe with the same operating system profile on each Hadoop cluster node.
Open a UNIX shell on each node and enter a command to create a user with the user name joe.
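For example, on a Linux node, the following command creates the user. This assumes the useradd utility is available; adjust the command for your operating system:
useradd joe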
Step 4. Create the Principal Name and Keytab Files in the Active Directory Server
Create an SPN in the KDC database for Microsoft Active Directory service that matches the user name of the user that runs the Data Integration Service. Create a keytab file for the SPN on the machine on which the KDC server runs. Then, copy the keytab file to the machine on which the Data Integration Service runs.
You do not need to use the Informatica Kerberos SPN Format Generator to generate a list of SPNs and keytab file names. You can create your own SPN and keytab file name.
To create an SPN and Keytab file in the Active Directory server, complete the following steps:
- 1. Create a user in the Microsoft Active Directory Service. Log in to the machine on which the Microsoft Active Directory Service runs and create a user with the same name as the user you created in Step 3. Create Matching Operating System Profile Names.
- 2. Create an SPN associated with the user. Use the following guidelines when you create the SPN and keytab files:
- - The user principal name (UPN) must be the same as the SPN.
- - Enable delegation in Microsoft Active Directory.
- - Use the ktpass utility to create an SPN associated with the user and generate the keytab file.
For example, enter the following command:
ktpass -out infa_hadoop.keytab -mapuser joe -pass tempBG@2008 -princ joe/domain12345@INFA-AD-REALM -crypto all
Note: The -out parameter specifies the name and path of the keytab file. The -mapuser parameter is the user to which the SPN is associated. The -pass parameter is the password for the SPN in the generated keytab. The -princ parameter is the SPN.
Step 5. Specify the Kerberos Authentication Properties for the Data Integration Service
In the Data Integration Service properties, configure the properties that enable the Data Integration Service to connect to a Hadoop cluster that uses Kerberos authentication. Use the Administrator tool to set the Data Integration Service properties.
Property | Description
---|---
Hadoop Staging Directory | The HDFS directory where the Data Integration Service pushes Informatica Hadoop binaries and stores temporary files during processing. Default is /tmp.
Hadoop Staging User | The HDFS user that performs operations on the Hadoop staging directory. The user requires write permissions on the Hadoop staging directory. Default is the operating system user that starts the Informatica daemon.
Custom Hadoop OS Path | The local path to the Informatica server binaries compatible with the Hadoop operating system. Required when the Hadoop cluster and the Data Integration Service are on different supported operating systems. The Data Integration Service uses the binaries in this directory to integrate the domain with the Hadoop cluster. The Data Integration Service can synchronize the following operating systems: Include the source directory in the path. For example, <Informatica server binaries>/source. Changes take effect after you recycle the Data Integration Service. Note: When you install an Informatica EBF, you must also install it in this directory.
Hadoop Kerberos Service Principal Name | Service Principal Name (SPN) of the Data Integration Service to connect to a Hadoop cluster that uses Kerberos authentication. Not required for the MapR distribution.
Hadoop Kerberos Keytab | The file path to the Kerberos keytab file on the machine on which the Data Integration Service runs. Not required for the MapR distribution.
JDK Home Directory | The JDK installation directory on the machine that runs the Data Integration Service. Changes take effect after you recycle the Data Integration Service. The JDK version that the Data Integration Service uses must be compatible with the JRE version on the cluster. Required if you run Sqoop mappings on the Spark engine or process a Java transformation on the Spark engine. Default is blank.
Custom Properties | Properties that are unique to specific environments. You can configure run-time properties for the Hadoop environment in the Data Integration Service, the Hadoop connection, and in the mapping. You can override a property configured at a high level by setting the value at a lower level. For example, if you configure a property in the Data Integration Service custom properties, you can override it in the Hadoop connection or in the mapping. The Data Integration Service processes property overrides based on the following priorities: 1. Mapping custom properties set using infacmd ms runMapping with the -cp option. 2. Mapping run-time properties for the Hadoop environment. 3. Hadoop connection advanced properties for run-time engines. 4. Hadoop connection advanced general properties, environment variables, and classpaths. 5. Data Integration Service custom properties.
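For example, using the SPN and keytab created in Step 4, the Kerberos properties might be set to values like the following. The keytab location is illustrative; use the path to which you copied the keytab file:
Hadoop Kerberos Service Principal Name: joe/domain12345@INFA-AD-REALM
Hadoop Kerberos Keytab: /export/home/joe/keytabs/infa_hadoop.keytab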
Step 6. Configure the Execution Options for the Data Integration Service
To determine whether the Data Integration Service runs jobs in separate operating system processes or in one operating system process, configure the Launch Job Options property. Use the Administrator tool to configure the execution options for the Data Integration Service.
- 1. In the execution options for the Data Integration Service, click Edit to edit the Launch Job Options property.
- 2. Choose a launch job option.