Running Mappings in a Kerberos-Enabled Hadoop Environment

To run mappings in a Kerberos-enabled Hadoop environment, you must configure the Kerberos configuration file, create user authentication artifacts, and configure Kerberos authentication properties for the Informatica domain. To view metadata from Hive, HBase and complex file sources in the Developer tool, you must import configuration files and generate a Kerberos credentials file.
The Kerberos configuration file krb5.conf contains configuration properties for the Kerberos realm. The one-way cross-realm trust enables the Informatica domain to communicate with the Hadoop cluster.
The Informatica domain uses Kerberos authentication on a Microsoft Active Directory service. The Hadoop cluster uses Kerberos authentication on an MIT Kerberos service. You set up a one-way cross-realm trust to enable the KDC for the MIT Kerberos service to communicate with the KDC for the Active Directory service. After you set up the cross-realm trust, you must configure the Informatica domain to enable mappings to run in the Hadoop cluster.
To run mappings on a cluster that uses Kerberos authentication, perform the following configuration tasks:
  1. Set up the Kerberos configuration file.
  2. When the Informatica domain uses Kerberos authentication, set up the one-way cross-realm trust.
  3. Create matching operating system profile user names on each Hadoop cluster node.
  4. Create the Service Principal Name and keytab file in the Active Directory server.
  5. Specify the Kerberos authentication properties for the Data Integration Service.
  6. Configure the execution options for the Data Integration Service.
  7. Configure the Developer tool to enable you to view metadata from Hive, HBase, and complex file sources when a cluster is Kerberos-enabled.

Step 1. Set Up the Kerberos Configuration File on the Domain Host

Set the properties required by Informatica in the Kerberos configuration file, and then copy the file to each node in the Informatica domain.
krb5.conf is located in the <Informatica Installation Directory>/java/jre/lib/security directory.
    1. Back up krb5.conf before you make any changes.
    2. Open krb5.conf for editing.
    3. Configure the following Kerberos library properties in the [libdefaults] section of the file.
    The following table describes the properties to enter:
    Property
    Description
    default_realm
    Name of the Kerberos realm to which the Informatica domain services belong. The realm name must be in uppercase.
    If the domain uses a single Kerberos realm for authentication, the service realm name and the user realm name must be the same.
    forwardable
    Allows a service to delegate client user credentials to another service. The Informatica domain requires application services to authenticate client user credentials with other services.
    Set the value to true.
    default_tkt_enctypes
    Encryption types for the session key included in ticket-granting tickets (TGT). Set this property only if session keys must use specific encryption types. Verify that the Kerberos Key Distribution Center (KDC) supports the specified encryption type.
    Do not set this property to allow the Kerberos protocol to select the encryption type to use.
    If the node hosts or the Informatica client hosts use 256-bit encryption, install the Java Cryptography Extension (JCE) Unlimited Strength policy files on all node hosts and Informatica client hosts to avoid authentication issues.
    rdns
    Determines whether reverse name lookup is used in addition to forward name lookup to canonicalize host names for use in service principal names.
    Set the value to false.
    renew_lifetime
    The default renewable lifetime for initial ticket requests.
    ticket_lifetime
    The default lifetime for initial ticket requests.
    udp_preference_limit
    Determines the protocol that Kerberos uses when it sends a message to the KDC.
    Set the value to 1 to use the TCP protocol if the domain repeatedly encounters Kerberos authentication errors.
    dns_lookup_kdc
    Indicates whether the Kerberos client uses DNS SRV records to locate the KDCs and other servers for a realm if they are not listed in the realm information. DNS uses SRV records to identify computers that host specific services. Required when the domain is Kerberos-enabled.
    Requires that the admin_server realm property is set.
    Set the value to true.
    dns_lookup_realm
    Indicates whether the Kerberos client uses DNS TXT records to determine the Kerberos realm of a host. DNS uses TXT records to associate arbitrary text, such as human-readable information about a server, network, data center, or other accounting information, with a host name or other name. Required when the domain is Kerberos-enabled.
    Set the value to true.
    4. In the [realms] section, set or add the properties required by Informatica.
    The following table lists the values to which you must set properties in the realms section:
    Property
    Description
    admin_server
    The name or IP address of the Kerberos administration server host.
    You can include an optional port number, separated from the host name by a colon. Default is 749.
    kdc
    The name or IP address of a host running the Key Distribution Center (KDC) for the realm.
    You can include an optional port number, separated from the host name by a colon. Default is 88.
    When you use a Kerberos-enabled Cloudera CDP Public Cloud cluster, set both admin_server and kdc to the KDC server IP address. To find the KDC server IP address, run the following command on any cluster node: ping kdc.<default realm name>
    The following example shows the parameters for the Hadoop realm if the Informatica domain does not use Kerberos authentication:
    [realms]
    HADOOP-REALM = {
    kdc = 123abcdl34.hadoop-realm.com
    admin_server = def456.hadoop-realm.com
    }
    The following example shows the parameters for the Hadoop realm if the Informatica domain uses Kerberos authentication:
    [realms]
    INFA-AD-REALM = {
    kdc = 123abcd.infa-realm.com
    admin_server = 123abcd.infa-realm.com
    }
    HADOOP-REALM = {
    kdc = 123abcdl34.hadoop-realm.com
    admin_server = def456.hadoop-realm.com
    }
    5. In the [domain_realm] section, map the domain name or host name to a Kerberos realm name. The domain name is prefixed with a period (.).
    The following example shows the parameters for the Hadoop domain_realm if the Informatica domain does not use Kerberos authentication:
    [domain_realm]
    .hadoop_realm.com = HADOOP-REALM
    hadoop_realm.com = HADOOP-REALM
    The following example shows the parameters for the Hadoop domain_realm if the Informatica domain uses Kerberos authentication:
    [domain_realm]
    .infa_ad_realm.com = INFA-AD-REALM
    infa_ad_realm.com = INFA-AD-REALM
    .hadoop_realm.com = HADOOP-REALM
    hadoop_realm.com = HADOOP-REALM
    6. Copy the krb5.conf file to each node in the Informatica domain, including the machine that hosts the Data Integration Service.
The following example shows the contents of a Kerberos configuration file with the properties required for a single-realm Kerberos configuration:
[libdefaults]
default_realm = COMPANY.COM
forwardable = true
rdns = false
renew_lifetime = 7d
ticket_lifetime = 24h
udp_preference_limit = 1
dns_lookup_kdc = true
dns_lookup_realm = true

[realms]
COMPANY.COM = {
admin_server = KDC01.COMPANY.COM:749
kdc = KDC01.COMPANY.COM:88
}

[domain_realm]
.company.com = COMPANY.COM
company.com = COMPANY.COM
The following example shows the contents of a Kerberos configuration file with the properties required for a cross-realm Kerberos configuration:
[libdefaults]
default_realm = COMPANY.COM
forwardable = true
rdns = false
renew_lifetime = 7d
ticket_lifetime = 24h
udp_preference_limit = 1
dns_lookup_kdc = true
dns_lookup_realm = true

[realms]
COMPANY.COM = {
admin_server = KDC01.COMPANY.COM:749
kdc = KDC01.COMPANY.COM:88
}
EAST.COMPANY.COM = {
kdc = 10.75.141.193
admin_server = 10.75.141.193
}
WEST.COMPANY.COM = {
kdc = 10.78.140.111
admin_server = 10.78.140.111
}

[domain_realm]
.company.com = COMPANY.COM
company.com = COMPANY.COM
.east.company.com = EAST.COMPANY.COM
east.company.com = EAST.COMPANY.COM
.west.company.com = WEST.COMPANY.COM
west.company.com = WEST.COMPANY.COM
For more information about the Kerberos configuration file, see the Kerberos network authentication documentation.
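As a sanity check, you can verify that a krb5.conf contains the [libdefaults] values this section requires before copying it to the domain nodes. The following sketch is illustrative only: it assumes the flat key = value layout shown in the examples above and is not a full krb5.conf parser (it ignores the brace-delimited realm subsections entirely).

```python
# Illustrative check: confirm that a krb5.conf sets the [libdefaults]
# values required by the Informatica domain, per the table above.
REQUIRED = {
    "forwardable": "true",
    "rdns": "false",
    "udp_preference_limit": "1",
    "dns_lookup_kdc": "true",
    "dns_lookup_realm": "true",
}

def libdefaults(conf_text):
    """Return the key = value pairs found in the [libdefaults] section."""
    values, in_section = {}, False
    for line in conf_text.splitlines():
        line = line.strip()
        if line.startswith("["):
            # Entering a new section; only [libdefaults] is parsed here.
            in_section = (line == "[libdefaults]")
        elif in_section and "=" in line:
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values

def missing_or_wrong(conf_text):
    """Return the required properties that are absent or set incorrectly."""
    found = libdefaults(conf_text)
    return {k: v for k, v in REQUIRED.items() if found.get(k) != v}
```

Running missing_or_wrong against the single-realm example above returns an empty dict, because every required property is present with the required value.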

Step 2. Set up the Cross-Realm Trust

Perform this step when the Informatica domain uses Kerberos authentication.
Set up a one-way cross-realm trust to enable the KDC for the MIT Kerberos server to communicate with the KDC for the Active Directory server. When you set up the one-way cross-realm trust, the Hadoop cluster can authenticate the Active Directory principals.
To set up the cross-realm trust, you must complete the following steps:
  1. Configure the Active Directory server to add the local MIT realm trust.
  2. Configure the MIT server to add the cross-realm principal.
  3. Translate principal names from the Active Directory realm to the MIT realm.

Configure the Microsoft Active Directory Server

Add the MIT KDC host name and local realm trust to the Active Directory server.
To configure the Active Directory server, complete the following steps:
  1. Enter the following command to add the MIT KDC host name:
     ksetup /addkdc <mit_realm_name> <kdc_hostname>
     For example:
     ksetup /addkdc HADOOP-MIT-REALM def456.hadoop-mit-realm.com
  2. Enter the following command to add the local realm trust to Active Directory:
     netdom trust <mit_realm_name> /Domain:<ad_realm_name> /add /realm /passwordt:<TrustPassword>
     For example:
     netdom trust HADOOP-MIT-REALM /Domain:INFA-AD-REALM /add /realm /passwordt:trust1234
  3. Enter the following command based on your Microsoft Windows environment to set the proper encryption type:
     For Microsoft Windows 2008, enter the following command:
     ksetup /SetEncTypeAttr <mit_realm_name> <enc_type>
     For Microsoft Windows 2003, enter the following command:
     ktpass /MITRealmName <mit_realm_name> /TrustEncryp <enc_type>
    Note: The enc_type parameter specifies AES, DES, or RC4 encryption. To find the value for enc_type, see the documentation for your version of Windows Active Directory. The encryption type you specify must be supported by both the Active Directory server and the MIT Kerberos server.

Configure the MIT Server

Configure the MIT server to add the cross-realm krbtgt principal. The krbtgt principal is the principal name that a Kerberos KDC uses for a Windows domain.
Enter the following command in the kadmin.local or kadmin shell to add the cross-realm krbtgt principal:
kadmin: addprinc -e "<enc_type_list>" krbtgt/<mit_realm_name>@<MY-AD-REALM.COM>
The enc_type_list parameter specifies the encryption types that this cross-realm krbtgt principal supports. The krbtgt principal can support AES, DES, or RC4 encryption. You can specify multiple encryption types, but at least one of them must match the encryption type of the tickets granted by the KDC in the remote realm.
For example, enter the following value:
kadmin: addprinc -e "rc4-hmac:normal des3-hmac-sha1:normal" krbtgt/HADOOP-MIT-REALM@INFA-AD-REALM

Translate Principal Names from the Active Directory Realm to the MIT Realm

To translate the principal names from the Active Directory realm into local names within the Hadoop cluster, you must configure the hadoop.security.auth_to_local property in the core-site.xml file and hadoop.kms.authentication.kerberos.name.rules property in the kms-site.xml file on all the machines in the Hadoop cluster.
For example, set the following property in core-site.xml on all the machines in the Hadoop cluster:
<property>
<name>hadoop.security.auth_to_local</name>
<value>
RULE:[1:$1@$0](^.*@INFA-AD-REALM$)s/^(.*)@INFA-AD-REALM$/$1/g
RULE:[2:$1@$0](^.*@INFA-AD-REALM$)s/^(.*)@INFA-AD-REALM$/$1/g
DEFAULT
</value>
</property>
For example, set the following property in kms-site.xml on all the machines in the Hadoop cluster:
<property>
<name>hadoop.kms.authentication.kerberos.name.rules</name>
<value>
RULE:[1:$1@$0](^.*@INFA-AD-REALM$)s/^(.*)@INFA-AD-REALM$/$1/g
RULE:[2:$1@$0](^.*@INFA-AD-REALM$)s/^(.*)@INFA-AD-REALM$/$1/g
DEFAULT
</value>
</property>
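The two RULE entries above strip the Active Directory realm suffix from one- and two-component principals, and DEFAULT leaves all other principals unchanged. The following sketch emulates only that behavior so you can see what local name the cluster resolves a principal to; it does not implement the full Hadoop auth_to_local rule grammar.

```python
import re

# Illustrative only: emulate the effect of the two RULEs shown above.
# Principals in the AD realm lose the realm suffix; everything else
# falls through to DEFAULT and is returned unchanged.
AD_REALM = "INFA-AD-REALM"

def to_local(principal):
    """Translate user@REALM or user/host@REALM to the local short name."""
    m = re.match(r"^([^/@]+)(?:/[^@]+)?@(.+)$", principal)
    if m and m.group(2) == AD_REALM:
        return m.group(1)   # realm matched: strip it, as the RULEs do
    return principal        # DEFAULT: leave other principals alone
```

For example, to_local("joe@INFA-AD-REALM") resolves to the local name joe, while a principal in any other realm passes through unchanged.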

Step 3. Create Matching Operating System Profile Names

Create matching operating system profile user names on the machine that runs the Data Integration Service and each Hadoop cluster node to run Informatica mapping jobs.
For example, if user joe runs the Data Integration Service on a machine, you must create the user joe with the same operating system profile on each Hadoop cluster node.
Open a UNIX shell and enter the following command to create a user with the user name joe:
useradd joe

Step 4. Create the Principal Name and Keytab Files in the Active Directory Server

Create an SPN in the KDC database for Microsoft Active Directory service that matches the user name of the user that runs the Data Integration Service. Create a keytab file for the SPN on the machine on which the KDC server runs. Then, copy the keytab file to the machine on which the Data Integration Service runs.
You do not need to use the Informatica Kerberos SPN Format Generator to generate a list of SPNs and keytab file names. You can create your own SPN and keytab file name.
To create an SPN and Keytab file in the Active Directory server, complete the following steps:
  1. Create a user in the Microsoft Active Directory Service.
  Log in to the machine on which the Microsoft Active Directory Service runs and create a user with the same name as the user you created in Step 3. Create Matching Operating System Profile Names.
  2. Create an SPN associated with the user.
Use the following guidelines when you create the SPN and keytab files:

Step 5. Specify the Kerberos Authentication Properties for the Data Integration Service

In the Data Integration Service properties, configure the properties that enable the Data Integration Service to connect to a Hadoop cluster that uses Kerberos authentication. Use the Administrator tool to set the Data Integration Service properties.
Property
Description
Cluster Staging Directory
The directory on the cluster where the Data Integration Service pushes the binaries to integrate the native and non-native environments and to store temporary files during processing. Default is /tmp.
Hadoop Staging User
The HDFS user that performs operations on the Hadoop staging directory. The user requires write permissions on Hadoop staging directory. Default is the operating system user that starts the Informatica daemon.
Custom Hadoop OS Path
The local path to the Informatica server binaries compatible with the Hadoop operating system. Required when the Hadoop cluster and the Data Integration Service are on different supported operating systems. The Data Integration Service uses the binaries in this directory to integrate the domain with the Hadoop cluster. The Data Integration Service can synchronize the following operating systems:
  - SUSE and Red Hat
Include the source directory in the path. For example, <Informatica server binaries>/source.
Changes take effect after you recycle the Data Integration Service.
Note: When you install an Informatica EBF, you must also install it in this directory.
Hadoop Kerberos Service Principal Name
Service Principal Name (SPN) of the Data Integration Service to connect to a Hadoop cluster that uses Kerberos authentication.
Not required for the MapR distribution.
Hadoop Kerberos Keytab
The file path to the Kerberos keytab file on the machine on which the Data Integration Service runs.
Not required for the MapR distribution.
Custom Properties
Properties that are unique to specific environments.
You can configure run-time properties for the Hadoop environment in the Data Integration Service, the Hadoop connection, and the mapping. You can override a property configured at a high level by setting the value at a lower level. For example, if you configure a property in the Data Integration Service custom properties, you can override it in the Hadoop connection or in the mapping. The Data Integration Service processes property overrides based on the following priorities:
  1. Mapping custom properties set using the infacmd ms runMapping command with the -cp option
  2. Mapping run-time properties for the Hadoop environment
  3. Hadoop connection advanced properties for run-time engines
  4. Hadoop connection advanced general properties, environment variables, and classpaths
  5. Data Integration Service custom properties
Note: When a mapping uses Hive Server 2 to run a job or parts of a job, you cannot override properties that are configured at the cluster level in preSQL or post-SQL queries or SQL override statements.
Workaround: Instead of using the cluster configuration in the domain to override cluster properties, pass the override settings to the JDBC URL. For example: beeline -u "jdbc:hive2://<domain host>:<port_number>/tpch_text_100" --hiveconf hive.execution.engine=tez
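The precedence list above amounts to "first match wins, from most specific to least specific." The following sketch illustrates that resolution order; the level names are paraphrases introduced here for illustration, not real Informatica configuration identifiers.

```python
# Illustrative only: resolve a Hadoop run-time property across the
# override levels described above. Level names are paraphrased.
PRECEDENCE = [
    "infacmd_cp",          # 1. infacmd ms runMapping -cp custom properties
    "mapping_runtime",     # 2. mapping run-time properties
    "connection_engine",   # 3. connection advanced properties for engines
    "connection_general",  # 4. connection general properties, env, classpaths
    "dis_custom",          # 5. Data Integration Service custom properties
]

def effective_value(prop, levels):
    """Return the value from the highest-priority level that sets prop.

    levels maps a level name to a dict of property -> value.
    """
    for level in PRECEDENCE:
        if prop in levels.get(level, {}):
            return levels[level][prop]
    return None
```

For example, a value set in the Data Integration Service custom properties is overridden by the same property set in the mapping run-time properties, because the mapping level appears earlier in the precedence order.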

Step 6. Configure the Execution Options for the Data Integration Service

To determine whether the Data Integration Service runs jobs in separate operating system processes or in one operating system process, configure the Launch Job Options property. Use the Administrator tool to configure the execution options for the Data Integration Service.
  1. Click Edit to edit the Launch Job Options property in the execution options for the Data Integration Service properties.
  2. Choose the launch job option.

Step 7. Configure the Developer Tool

To import metadata from Hive, HBase, and complex file sources, import configuration files from the Kerberos-enabled cluster, and generate the Kerberos credentials file on the Developer tool machine.

Import configuration files

The Hadoop cluster uses a set of XML files named *-site.xml to store configuration settings. The domain uses the same set of files to create the cluster configuration object.
To enable you to import metadata from the cluster, import the *-site.xml files to each Developer tool machine:
  1. Log in to the Administrator tool and navigate to Connections > Cluster Configuration > CCO. Locate the cluster configuration associated with the Hadoop cluster.
  2. Extract the *-site.xml files in the cluster configuration, including sensitive properties, to the following directory on the Developer tool machine: <Informatica installation directory>\clients\DeveloperClient\hadoop\<Hadoop distribution>\conf
  For more information about sensitive properties, see Active Properties View.
Note: If you refresh the cluster configuration, repeat these steps.

Generate the Kerberos credentials file

  1. Copy the krb5.conf file from <Developer tool installation directory>/services/shared/security to C:/Windows.
  2. Rename krb5.conf to krb5.ini.
  3. In the krb5.ini file, verify the value of the forwardable option to determine how to use the kinit command. If forwardable=true, run the command with the -f option. Otherwise, run the command without the -f option.
  4. To generate the Kerberos credentials file, run the kinit command from the following location: <Developer tool installation directory>/clients/java/bin/kinit.exe
  For example, you might run the following command: kinit joe/domain12345@MY-REALM
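The forwardable check in step 3 above can be sketched in code: read the forwardable setting from krb5.ini and build the kinit command line accordingly. This is illustrative only; the executable path and principal are placeholders taken from the example above, and the parsing assumes the flat key = value layout shown earlier.

```python
# Illustrative sketch: decide whether to pass -f to kinit based on the
# forwardable setting in krb5.ini (simplified flat key = value parsing).
def wants_forwardable(krb5_ini_text):
    """Return True if the file sets forwardable = true."""
    for line in krb5_ini_text.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "forwardable":
            return value.strip().lower() == "true"
    return False

def kinit_args(kinit_path, principal, krb5_ini_text):
    """Build the kinit command line; add -f only for forwardable tickets."""
    args = [kinit_path]
    if wants_forwardable(krb5_ini_text):
        args.append("-f")
    args.append(principal)
    return args
```

With forwardable = true in krb5.ini, kinit_args produces a command line equivalent to kinit -f joe/domain12345@MY-REALM.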