Running Mappings in a Kerberos-Enabled Hadoop Environment
To run mappings in a Kerberos-enabled Hadoop environment, you must configure the Kerberos configuration file, create user authentication artifacts, and configure Kerberos authentication properties for the Informatica domain.
The Kerberos configuration file krb5.conf contains configuration properties for the Kerberos realm. The one-way cross-realm trust enables the Informatica domain to communicate with the Hadoop cluster.
When the Informatica domain uses Kerberos authentication on a Microsoft Active Directory service and the Hadoop cluster uses Kerberos authentication on an MIT Kerberos service, you set up a one-way cross-realm trust to enable the KDC for the MIT Kerberos service to communicate with the KDC for the Active Directory service. After you set up the cross-realm trust, you must configure the Informatica domain to enable mappings to run in the Hadoop cluster.
To run mappings on a cluster that uses Kerberos authentication, perform the following configuration tasks:
- 1. Set up the Kerberos configuration file.
- 2. When the Informatica domain uses Kerberos authentication, set up the one-way cross-realm trust.
- 3. Create matching operating system profile user names on each Hadoop cluster node.
- 4. Create the Service Principal Name and Keytab File in the Active Directory Server.
- 5. Specify the Kerberos authentication properties for the Data Integration Service.
- 6. Configure Execution Options for the Data Integration Service.
Step 1. Set Up the Kerberos Configuration File on the Domain Host
Set the properties required by Informatica in the Kerberos configuration file, and then copy the file to each node in the Informatica domain.
krb5.conf is located in the <Informatica Installation Directory>/java/jre/lib/security directory.
1. Back up krb5.conf before you make any changes.
2. Open krb5.conf for editing.
3. Configure the following Kerberos library properties in the libdefaults section of the file.
The following table describes the properties to enter:
| Property | Description |
|---|---|
| default_realm | Name of the Kerberos realm to which the Informatica domain services belong. The realm name must be in uppercase. If the domain uses a single Kerberos realm for authentication, the service realm name and the user realm name must be the same. |
| forwardable | Allows a service to delegate client user credentials to another service. The Informatica domain requires application services to authenticate the client user credentials with other services. Set to true. |
| default_tkt_enctypes | Encryption types for the session key included in ticket-granting tickets (TGT). Set this property only if session keys must use specific encryption types. Ensure that the Kerberos Key Distribution Center (KDC) supports the encryption type that you specify. Leave this property unset to allow the Kerberos protocol to select the encryption type to use. If the node hosts or Informatica client hosts use 256-bit encryption, install the Java Cryptography Extension (JCE) unlimited strength policy files on all node hosts and Informatica client hosts to avoid authentication issues. |
| rdns | Determines whether reverse name lookup is used in addition to forward name lookup to canonicalize host names for use in service principal names. Set to false. |
| renew_lifetime | The default renewable lifetime for initial ticket requests. |
| ticket_lifetime | The default lifetime for initial ticket requests. |
| udp_preference_limit | Determines the protocol that Kerberos uses when it sends a message to the KDC. Set to 1 to use the TCP protocol if the domain experiences intermittent Kerberos authentication failures. |
| dns_lookup_kdc | Indicates whether the Kerberos client uses DNS SRV records to locate the KDCs and other servers for a realm, if they are not listed in the information for the realm. DNS uses SRV records to identify computers that host specific services. Required when the domain is Kerberos-enabled. Requires you to set the admin_server realm property. Set to true. |
| dns_lookup_realm | Indicates whether the Kerberos client uses DNS TXT records to determine the Kerberos realm of a host. DNS uses text or TXT records to associate arbitrary text with a host or other name, such as human-readable information about a server, network, data center, or other accounting information. Required when the domain is Kerberos-enabled. Set to true. |
4. In the realms section, set or add the properties required by Informatica.
The following table lists the values to which you must set properties in the realms section:
| Property | Description |
|---|---|
| admin_server | The name or IP address of the Kerberos administration server host. You can include an optional port number, separated from the host name by a colon. Default is 749. |
| kdc | The name or IP address of a host running the Key Distribution Center (KDC) for the realm. You can include an optional port number, separated from the host name by a colon. Default is 88. |
The following example shows the parameters for the Hadoop realm if the Informatica domain does not use Kerberos authentication:
[realms]
HADOOP-REALM = {
kdc = 123abcdl34.hadoop-realm.com
admin_server = def456.hadoop-realm.com
}
The following example shows the parameters for the Hadoop realm if the Informatica domain uses Kerberos authentication:
[realms]
INFA-AD-REALM = {
kdc = 123abcd.infa-realm.com
admin_server = 123abcd.infa-realm.com
}
HADOOP-REALM = {
kdc = 123abcdl34.hadoop-realm.com
admin_server = def456.hadoop-realm.com
}
5. In the domain_realm section, map the domain name or host name to a Kerberos realm name. The domain name is prefixed by a period (.).
The following example shows the parameters for the Hadoop domain_realm if the Informatica domain does not use Kerberos authentication:
[domain_realm]
.hadoop_realm.com = HADOOP-REALM
hadoop_realm.com = HADOOP-REALM
The following example shows the parameters for the Hadoop domain_realm if the Informatica domain uses Kerberos authentication:
[domain_realm]
.infa_ad_realm.com = INFA-AD-REALM
infa_ad_realm.com = INFA-AD-REALM
.hadoop_realm.com = HADOOP-REALM
hadoop_realm.com = HADOOP-REALM
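Kerberos resolves a host to a realm by suffix matching these entries: an entry with a leading dot covers every host under that domain, while the entry without a dot covers the bare domain name itself. The following sketch illustrates the lookup behavior with the realm names from the example above; it is an illustration of the matching logic, not the actual library implementation.

```shell
# Illustration of [domain_realm] suffix matching: a leading dot matches any
# host under the domain; the bare name matches the domain itself.
realm_for_host() {
  case "$1" in
    *.infa_ad_realm.com|infa_ad_realm.com) echo "INFA-AD-REALM" ;;
    *.hadoop_realm.com|hadoop_realm.com)   echo "HADOOP-REALM" ;;
    *) echo "UNKNOWN" ;;
  esac
}

realm_for_host node1.hadoop_realm.com   # prints HADOOP-REALM
realm_for_host infa_ad_realm.com        # prints INFA-AD-REALM
```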
6. Copy the krb5.conf file to the following locations on the machine that hosts the Data Integration Service:
- <Informatica installation directory>/services/shared/security/
- <Informatica installation directory>/java/jre/lib/security
The following example shows the content of a Kerberos configuration file with the required properties for a single Kerberos realm configuration:
[libdefaults]
default_realm = COMPANY.COM
forwardable = true
rdns = false
renew_lifetime = 7d
ticket_lifetime = 24h
udp_preference_limit = 1
dns_lookup_kdc = true
dns_lookup_realm = true
[realms]
COMPANY.COM = {
admin_server = KDC01.COMPANY.COM:749
kdc = KDC01.COMPANY.COM:88
}
[domain_realm]
.company.com = COMPANY.COM
company.com = COMPANY.COM
The following example shows the content of a Kerberos configuration file with the required properties for a Kerberos cross realm configuration:
[libdefaults]
default_realm = COMPANY.COM
forwardable = true
rdns = false
renew_lifetime = 7d
ticket_lifetime = 24h
udp_preference_limit = 1
dns_lookup_kdc = true
dns_lookup_realm = true
[realms]
COMPANY.COM = {
admin_server = KDC01.COMPANY.COM:749
kdc = KDC01.COMPANY.COM:88
}
EAST.COMPANY.COM = {
kdc = 10.75.141.193
admin_server = 10.75.141.193
}
WEST.COMPANY.COM = {
kdc = 10.78.140.111
admin_server = 10.78.140.111
}
[domain_realm]
.company.com = COMPANY.COM
company.com = COMPANY.COM
.east.company.com = EAST.COMPANY.COM
east.company.com = EAST.COMPANY.COM
.west.company.com = WEST.COMPANY.COM
west.company.com = WEST.COMPANY.COM
For more information about the Kerberos configuration file, see the Kerberos network authentication documentation.
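After you copy the file to each location, it can help to confirm that no required property was lost during editing. The following sketch checks a krb5.conf file for the libdefaults properties that this step sets; the installation path in the example invocation is an assumption, so adjust it to your environment.

```shell
# Verify that the libdefaults properties required by Informatica are present
# in a krb5.conf file. Lists any missing property and returns nonzero.
check_krb5_props() {
  conf="$1"
  status=0
  for prop in default_realm forwardable rdns udp_preference_limit dns_lookup_kdc dns_lookup_realm; do
    grep -q "^[[:space:]]*${prop}[[:space:]]*=" "$conf" || { echo "MISSING: $prop"; status=1; }
  done
  return $status
}

# Example invocation (path is an assumption; adjust to your installation):
# check_krb5_props /opt/informatica/java/jre/lib/security/krb5.conf && echo "krb5.conf looks complete"
```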
Step 2. Set Up the Cross-Realm Trust
Perform this step when the Informatica domain uses Kerberos authentication.
Set up a one-way cross-realm trust to enable the KDC for the MIT Kerberos server to communicate with the KDC for the Active Directory server. When you set up the one-way cross-realm trust, the Hadoop cluster can authenticate the Active Directory principals.
To set up the cross-realm trust, you must complete the following steps:
- 1. Configure the Active Directory server to add the local MIT realm trust.
- 2. Configure the MIT server to add the cross-realm principal.
- 3. Translate principal names from the Active Directory realm to the MIT realm.
Configure the Microsoft Active Directory Server
Add the MIT KDC host name and local realm trust to the Active Directory server.
To configure the Active Directory server, complete the following steps:
- 1. Enter the following command to add the MIT KDC host name:
ksetup /addkdc <mit_realm_name> <kdc_hostname>
- For example, enter the command to add the following values:
ksetup /addkdc HADOOP-MIT-REALM def456.hadoop-mit-realm.com
- 2. Enter the following command to add the local realm trust to Active Directory:
netdom trust <mit_realm_name> /Domain:<ad_realm_name> /add /realm /passwordt:<TrustPassword>
- For example, enter the command to add the following values:
netdom trust HADOOP-MIT-REALM /Domain:INFA-AD-REALM /add /realm /passwordt:trust1234
- 3. Enter the following commands based on your Microsoft Windows environment to set the proper encryption type:
For Microsoft Windows 2008, enter the following command:
ksetup /SetEncTypeAttr <mit_realm_name> <enc_type>
For Microsoft Windows 2003, enter the following command:
ktpass /MITRealmName <mit_realm_name> /TrustEncryp <enc_type>
Note: The enc_type parameter specifies AES, DES, or RC4 encryption. To find the value for enc_type, see the documentation for your version of Windows Active Directory. The encryption type that you specify must be supported by both the Windows Active Directory server and the MIT Kerberos server.
Configure the MIT Server
Configure the MIT server to add the cross-realm krbtgt principal. The krbtgt principal is the principal name that a Kerberos KDC uses for a Windows domain.
Enter the following command in the kadmin.local or kadmin shell to add the cross-realm krbtgt principal:
kadmin: addprinc -e "<enc_type_list>" krbtgt/<mit_realm_name>@<MY-AD-REALM.COM>
The enc_type_list parameter specifies the encryption types that this cross-realm krbtgt principal supports. The krbtgt principal can support AES, DES, or RC4 encryption. You can specify multiple encryption types. However, at least one of the encryption types must correspond to the encryption type found in the tickets granted by the KDC in the remote realm.
For example, enter the following value:
kadmin: addprinc -e "rc4-hmac:normal des3-hmac-sha1:normal" krbtgt/HADOOP-MIT-REALM@INFA-AD-REALM
Translate Principal Names from the Active Directory Realm to the MIT Realm
To translate the principal names from the Active Directory realm into local names within the Hadoop cluster, you must configure the hadoop.security.auth_to_local property in the core-site.xml file and the hadoop.kms.authentication.kerberos.name.rules property in the kms-site.xml file on all machines in the Hadoop cluster.
For example, set the following property in core-site.xml on all the machines in the Hadoop cluster:
<property>
<name>hadoop.security.auth_to_local</name>
<value>
RULE:[1:$1@$0](^.*@INFA-AD-REALM$)s/^(.*)@INFA-AD-REALM$/$1/g
RULE:[2:$1@$0](^.*@INFA-AD-REALM$)s/^(.*)@INFA-AD-REALM$/$1/g
DEFAULT
</value>
</property>
For example, set the following property in kms-site.xml on all the machines in the Hadoop cluster:
<property>
<name>hadoop.kms.authentication.kerberos.name.rules</name>
<value>
RULE:[1:$1@$0](^.*@INFA-AD-REALM$)s/^(.*)@INFA-AD-REALM$/$1/g
RULE:[2:$1@$0](^.*@INFA-AD-REALM$)s/^(.*)@INFA-AD-REALM$/$1/g
DEFAULT
</value>
</property>
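Each rule filters principals in the Active Directory realm and strips the realm suffix, so that a principal such as joe@INFA-AD-REALM resolves to the local user joe, matching the operating system profile names created in Step 3. The substitution part of the rule can be sanity-checked with sed; the sketch below emulates only the s/.../.../ portion of the rule, not Hadoop's full rule evaluation.

```shell
# Emulate the substitution in RULE:[1:$1@$0](...)s/^(.*)@INFA-AD-REALM$/$1/g:
# strip the @INFA-AD-REALM suffix to obtain the short local name.
map_principal() {
  echo "$1" | sed -E 's/^(.*)@INFA-AD-REALM$/\1/'
}

map_principal "joe@INFA-AD-REALM"        # prints joe
map_principal "hdfs/node1@INFA-AD-REALM" # prints hdfs/node1
```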
Step 3. Create Matching Operating System Profile Names
Create matching operating system profile user names on the machine that runs the Data Integration Service and each Hadoop cluster node to run Informatica mapping jobs.
For example, if user joe runs the Data Integration Service on a machine, you must create the user joe with the same operating system profile on each Hadoop cluster node.
Open a UNIX shell on each machine and use an operating system command such as useradd to create the user.
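For example, the following sketch creates the same user on each cluster node over ssh. The host names, UID, and use of root access are illustrative assumptions; the loop echoes the commands so that you can review them, and you remove echo to apply them.

```shell
# Create the same operating system user on every Hadoop cluster node.
# Host names and UID are illustrative; keep the UID consistent across nodes.
USER_NAME=joe
USER_ID=1050
for node in node1.hadoop-realm.com node2.hadoop-realm.com; do
  echo ssh root@"$node" useradd -u "$USER_ID" -m "$USER_NAME"
done
```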
Step 4. Create the Principal Name and Keytab Files in the Active Directory Server
Create an SPN in the KDC database for the Microsoft Active Directory service that matches the user name of the user that runs the Data Integration Service. Create a keytab file for the SPN on the machine on which the KDC server runs. Then, copy the keytab file to the machine on which the Data Integration Service runs.
You do not need to use the Informatica Kerberos SPN Format Generator to generate a list of SPNs and keytab file names. You can create your own SPN and keytab file name.
To create an SPN and Keytab file in the Active Directory server, complete the following steps:
- 1. Create a user in the Microsoft Active Directory service. Log in to the machine on which the Microsoft Active Directory service runs and create a user with the same name as the user that you created in Step 3. Create Matching Operating System Profile Names.
- 2. Create an SPN associated with the user. Use the following guidelines when you create the SPN and keytab files:
- - The user principal name (UPN) must be the same as the SPN.
- - Enable delegation in Microsoft Active Directory.
- - Use the ktpass utility to create an SPN associated with the user and generate the keytab file.
For example, enter the following command:
ktpass -out infa_hadoop.keytab -mapuser joe -pass tempBG@2008 -princ joe/domain12345@INFA-AD-REALM -crypto all
Note: The -out parameter specifies the name and path of the keytab file. The -mapuser parameter is the user to which the SPN is associated. The -pass parameter is the password for the SPN in the generated keytab. The -princ parameter is the SPN.
- - Use the ktutil utility to generate the keytab file for an Azure HDInsight cluster that uses Enterprise Security Package and ADLS storage.
For example, enter the following command:
sshuser@hn0-hivesc:/tmp/keytabs$ ktutil
ktutil: addent -password -p alice -k 1 -e RC4-HMAC
Password for alice@SECUREHADOOPRC.ONMICROSOFT.COM:
ktutil: wkt /tmp/keytabs/alice.keytab
ktutil: q
Step 5. Specify the Kerberos Authentication Properties for the Data Integration Service
In the Data Integration Service properties, configure the properties that enable the Data Integration Service to connect to a Hadoop cluster that uses Kerberos authentication. Use the Administrator tool to set the Data Integration Service properties.
The following table describes the properties:
| Property | Description |
|---|---|
| Cluster Staging Directory | The directory on the cluster where the Data Integration Service pushes the binaries to integrate the native and non-native environments and to store temporary files during processing. Default is /tmp. |
| Hadoop Staging User | The HDFS user that performs operations on the Hadoop staging directory. The user requires write permission on the Hadoop staging directory. Default is the operating system user that starts the Informatica daemon. |
| Custom Hadoop OS Path | The local path to the Informatica server binaries compatible with the Hadoop operating system. Required when the Hadoop cluster and the Data Integration Service are on different supported operating systems. The Data Integration Service uses the binaries in this directory to integrate the domain with the Hadoop cluster. Include the source directory in the path. For example, <Informatica server binaries>/source. Changes take effect after you recycle the Data Integration Service. Note: When you install an Informatica EBF, you must also install it in this directory. |
| Hadoop Kerberos Service Principal Name | Service Principal Name (SPN) of the Data Integration Service to connect to a Hadoop cluster that uses Kerberos authentication. Not required for the MapR distribution. |
| Hadoop Kerberos Keytab | The file path to the Kerberos keytab file on the machine on which the Data Integration Service runs. Not required for the MapR distribution. |
| Custom Properties | Properties that are unique to specific environments. You can configure run-time properties for the Hadoop environment in the Data Integration Service, the Hadoop connection, and the mapping. You can override a property configured at a high level by setting the value at a lower level. For example, if you configure a property in the Data Integration Service custom properties, you can override it in the Hadoop connection or in the mapping. The Data Integration Service processes property overrides based on the following priorities: 1. Mapping custom properties set using infacmd ms runMapping with the -cp option. 2. Mapping run-time properties for the Hadoop environment. 3. Hadoop connection advanced properties for run-time engines. 4. Hadoop connection advanced general properties, environment variables, and classpaths. 5. Data Integration Service custom properties. |
Step 6. Configure the Execution Options for the Data Integration Service
To determine whether the Data Integration Service runs jobs in separate operating system processes or in one operating system process, configure the Launch Job Options property. Use the Administrator tool to configure the execution options for the Data Integration Service.
- 1. Click Edit to edit the Launch Job Options property in the execution options for the Data Integration Service properties.
- 2. Choose the launch job option.