External Hadoop Cluster Deployment
You can deploy Enterprise Data Catalog on a Hadoop cluster that you have set up on Cloudera or Hortonworks. If you have enabled Kerberos authentication in your enterprise to authenticate users and services on a network, you can configure the Informatica domain to use Kerberos network authentication.
You must configure the ZooKeeper, HDFS, and YARN specifications when you install Enterprise Data Catalog on an external Hadoop cluster in your enterprise. The Catalog Service uses these specifications and launches the following services and components on the Hadoop cluster as YARN applications:
- •Solr version 5.2.1
- •HBase version 0.98
- •Scanner components
Prerequisites for the External Cluster
Before you install Enterprise Data Catalog to use an external Hadoop cluster, you must verify that the system environment meets the prerequisites required to deploy Enterprise Data Catalog.
Verify that the external Hadoop distribution meets the following prerequisites:
- •OpenSSL version on the cluster nodes is openssl-1.0.1e-30.el6_6.5.x86_64 or later. Make sure that you do not use versions in the 1.0.2 branch.
- •Ensure that you install JDK 1.8 on all cluster nodes.
- •Verify that the secure_path option in the /etc/sudoers file lists the /usr/bin directory first.
- •On each host machine, verify that you have the zip and unzip utilities available.
- •Verify that owners, groups, and others have Read, Write, and Execute permissions on the HDFS directories.
- •Verify that the maximum number of open file descriptors is 10,000 or more. Use the ulimit command to verify the current value and change the value if required.
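The file descriptor check above can be scripted. The following is a minimal sketch that reports whether the current soft limit meets the 10,000 threshold; the limits.conf lines in the comment assume a hypothetical service account named "infa".

```shell
# Report the current soft limit on open file descriptors.
current=$(ulimit -Sn)
echo "Current open file descriptor limit: $current"

# "unlimited" also satisfies the requirement.
if [ "$current" != "unlimited" ] && [ "$current" -lt 10000 ]; then
    echo "Limit is below 10000; raise it before installing."
    # To persist a higher limit, add lines like the following to
    # /etc/security/limits.conf ("infa" is a hypothetical account name):
    #   infa  soft  nofile  10000
    #   infa  hard  nofile  10000
fi
```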
- •When you create the Catalog Service that connects to an SSL-enabled external cluster, verify that you configure the following properties:
- - A keytab file that contains all the users in LDAP.
- - The Kerberos domain name.
- - The HDFS NameNode and YARN Resource Manager service principals.
- - The path to the Solr keystore file and the keystore password.
- •Import the Hadoop cluster certificates into the Informatica domain truststore.
- •Before you deploy Enterprise Data Catalog on clusters where Apache Ranger is enabled, make sure that the Informatica domain user has the required permission to submit applications to the YARN queue.
- •If the cluster is enabled for SSL, Informatica recommends that you also enable SSL for the Informatica domain and the Catalog Service.
- •Verify that you install the following prerequisite packages before you enable Kerberos:
- - krb5-workstation
- - krb5-libs
- - krb5-auth-dialog
- •Create the service-logs directory under /informatica/ldm/<service cluster name>/ and assign the ownership of the directory to the service cluster user if the cluster is enabled for Kerberos.
Note: If the cluster is not enabled for Kerberos, create the service-logs directory under /informatica/ldm/<domain user name>/ and assign the ownership of the directory to the domain user.
- •If the cluster is not enabled for Kerberos, create the directory <domain user name> under /user and assign the ownership of the directory to the domain user.
Note: If the cluster is enabled for Kerberos, create the directory <service cluster name> under /user and assign the ownership of the directory to the service cluster user.
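For the Kerberos-enabled case, the directory setup described above might look like the following sketch. The placeholders come from the steps above; run the commands as a user with HDFS superuser privileges.

```shell
# Kerberos-enabled cluster: create service-logs under the service
# cluster directory and assign it to the service cluster user.
hdfs dfs -mkdir -p "/informatica/ldm/<service cluster name>/service-logs"
hdfs dfs -chown -R "<service cluster user>" "/informatica/ldm/<service cluster name>/service-logs"

# Kerberos-enabled cluster: create the service cluster directory
# under /user and assign it to the service cluster user.
hdfs dfs -mkdir -p "/user/<service cluster name>"
hdfs dfs -chown "<service cluster user>" "/user/<service cluster name>"
```

For a cluster without Kerberos, substitute the domain user name and domain user in the same commands.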
- •Ensure that you do not create the Informatica domain on a node in the existing Hadoop cluster.
- •If you want to enable Kerberos authentication for Enterprise Data Catalog deployed on a multi-node Informatica domain, make sure that you complete the following prerequisites:
- - Make sure that all the domain nodes include the krb5.conf file in the following directories:
- ▪ $INFA_HOME/services/shared/security/
- ▪ /etc/
- - Make sure that the /etc/hosts file of all cluster nodes and domain nodes include the krb hosts entry and a host entry for other nodes.
- - Install krb5-workstation in all domain nodes.
- - Make sure that the keytab file is present in a common location on all domain nodes.
- •If you want to enable SSL authentication for Enterprise Data Catalog deployed on a multi-node Informatica domain, make sure that you complete the following prerequisites:
- - Export the Default.keystore of each node to the infa_truststore.jks on all nodes.
- - Make sure that the Default.keystore is unique for each host node.
- - Copy the Default.keystore to a unique location of each node.
- - If Informatica Cluster Service and Catalog Service are on different nodes, then export the Apache Ambari server certificate to the infa_truststore.jks on all nodes.
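The keystore export and import steps above can be sketched with the JDK keytool utility. The paths, aliases, and password placeholders below are examples, not values from this guide; adjust them to your environment.

```shell
# On each node, export the public certificate from that node's
# unique Default.keystore (example path and alias).
keytool -exportcert \
    -keystore /opt/infa/Default.keystore \
    -storepass "<keystore password>" \
    -alias "<certificate alias>" \
    -file /tmp/node1.cer

# Copy node1.cer to every node, then import it into
# infa_truststore.jks on each node.
keytool -importcert -noprompt \
    -keystore "$INFA_HOME/services/shared/security/infa_truststore.jks" \
    -storepass "<truststore password>" \
    -alias node1 \
    -file /tmp/node1.cer
```

Repeat the export and import for every node so that each truststore contains the certificates of all nodes.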
Preparing the External Hadoop Cluster Environment
You need to perform multiple validation checks before you install Enterprise Data Catalog on an external Hadoop cluster.
Perform the following steps before you install Enterprise Data Catalog to use an external cluster:
- •Create the following directories in HDFS before you create the Catalog Service:
- - /Informatica/LDM/<service cluster name>
- - /user/<user name>
Where <service cluster name> is the name of the service cluster that you need to enter when you create the Catalog Service and <user name> is the username of the Informatica domain user.
- •Make sure that <user name>, the Informatica domain user, is the owner of the /Informatica/LDM/<service cluster name> and /user/<user name> directories.
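The steps above can be sketched as the following HDFS commands. Replace the placeholders with your values and run the commands as a user with HDFS superuser privileges before you create the Catalog Service.

```shell
# Create the Catalog Service directories in HDFS.
hdfs dfs -mkdir -p "/Informatica/LDM/<service cluster name>"
hdfs dfs -mkdir -p "/user/<user name>"

# Make the Informatica domain user the owner of both directories.
hdfs dfs -chown -R "<user name>" "/Informatica/LDM/<service cluster name>"
hdfs dfs -chown -R "<user name>" "/user/<user name>"
```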
Kerberos and SSL Setup for an External Cluster
You can install Enterprise Data Catalog on an external cluster that uses Kerberos network authentication to authenticate users and services on a network. Enterprise Data Catalog also supports SSL authentication for secure communication in the cluster.
Kerberos is a network authentication protocol that uses tickets to authenticate access to services and nodes in a network. Kerberos uses a Key Distribution Center (KDC) to validate the identities of users and services and to grant tickets to authenticated user and service accounts. In the Kerberos protocol, users and services are known as principals. The KDC has a database of principals and their associated secret keys that are used as proof of identity. Kerberos can use an LDAP directory service as a principal database.
Informatica does not support cross or multi-realm Kerberos authentication. The server host, client machines, and Kerberos authentication server must be in the same realm.
The Informatica domain requires keytab files to authenticate nodes and services in the domain without transmitting passwords over the network. The keytab files contain the service principal names (SPN) and associated encrypted keys. Create the keytab files before you create nodes and services in the Informatica domain.
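Keytab creation depends on your KDC. On an MIT Kerberos KDC it might look like the following sketch; the principal names and paths are examples, not values from this guide.

```shell
# Add the service principal's keys to a keytab file with the
# MIT Kerberos kadmin utility (example principal and path).
kadmin -p admin/admin@EXAMPLE.COM \
    -q "ktadd -k /etc/security/keytabs/infa.keytab infa/node1.example.com@EXAMPLE.COM"

# List the keytab entries to verify the result.
klist -kt /etc/security/keytabs/infa.keytab
```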
Prerequisites for SSL Authentication
Verify that the external cluster meets the following requirements before you can enable SSL authentication in the cluster:
- •The Informatica domain is configured in SSL mode.
- •The cluster and YARN REST endpoints are Kerberos-enabled.
- •Create a keystore file for the Apache Solr application on all nodes in the cluster. Import public certificates of Apache Solr keystore files on all the hosts into all the truststore files configured for HDFS and YARN. This step is required for Apache Spark and scanner jobs to connect to the Apache Solr application.
- •Import the public certificates of Apache Solr and YARN applications into the truststore file of the Informatica domain. This step is required for Catalog Service to connect to YARN and Solr applications.
- •Import the public certificates of Informatica domain and the Catalog Service into the YARN truststore.
- •Import the public certificate of the Catalog Service into the Informatica domain truststore.
- •If you plan to deploy Enterprise Data Catalog on an existing Hortonworks version 2.5 cluster that does not support SSL authentication, perform the following steps:
- 1. Configure the following properties in the /etc/hadoop/conf/ssl-client.xml file: ssl.client.truststore.location and ssl.client.truststore.password.
- 2. Ensure that the ssl.client.truststore.location value points to a location under the /opt directory, not the /etc directory, and that you configure the full path to the truststore file. For example, you can set the value to /opt/truststore/infa_truststore.jks.
- 3. Export the keystore certificate used in the Informatica domain.
- 4. Import the keystore certificate into the Informatica domain truststore file.
- 5. Place the domain truststore file in all the Hadoop nodes in the /opt directory. For example, /opt/truststore/infa_truststore.jks.
- 6. Open the /etc/hadoop/conf/ssl-client.xml file.
- 7. Modify the ssl.client.truststore.location and ssl.client.truststore.password properties.
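The two properties from steps 1 and 2 might look like the following fragment of /etc/hadoop/conf/ssl-client.xml; the path and password are example values.

```xml
<!-- Full path to the domain truststore placed under /opt, not /etc. -->
<property>
  <name>ssl.client.truststore.location</name>
  <value>/opt/truststore/infa_truststore.jks</value>
</property>
<property>
  <name>ssl.client.truststore.password</name>
  <value>truststore_password</value>
</property>
```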
Prerequisites for Kerberos Authentication
Perform the following steps before you enable Kerberos authentication for the external cluster:
Note: Enterprise Data Catalog does not support deployment on a Hortonworks version 2.6 cluster where Kerberos is enabled.