Before you create a catalog source, ensure that you have the information required to connect to the source system.
Perform the following tasks:
• Ensure that the Hadoop Distributed File System administrator created a user account configured to access the Hadoop Distributed File System source system.
• Ensure that you have the required permissions to access the Hadoop Distributed File System source system.
• Configure one of the following authentication methods:
  - Kerberos. Requires the Secure Agent to work with a key distribution center (KDC).
  - Non-Kerberos. Requires a configuration file or an access URI to the Hadoop Distributed File System instance.
• Configure a connection to the Hadoop Distributed File System source system in Administrator.
Verify permissions
To extract metadata and to configure other capabilities that a catalog source might include, you need account access and permissions on the source system. The permissions required might vary depending on the capability.
Permissions for metadata extraction
To extract Hadoop Distributed File System metadata, you need account access and permissions to the Hadoop Distributed File System source system.
Verify that the Hadoop Distributed File System administrator performs the following tasks:
• Creates a user account to access the source system.
• Grants the user read permissions to the directory from which you want to extract metadata.
Verify that the user of the Cloudera CDP, Amazon EMR, Azure HDInsight, or Google Dataproc cluster has read permissions on the source.
Permissions to run data profiles
You can run profiles with the permissions required to perform metadata extraction.
Permissions to run data classification
You can perform data classification with the permissions required to perform metadata extraction.
Permissions to run glossary association
You can perform glossary association with the permissions required to perform metadata extraction.
Configure Kerberos authentication
If you use Kerberos authentication, place the required configuration files on the Secure Agent machine so that the agent can work with the Kerberos key distribution center (KDC).
Ensure that you know the location of the following configuration files on your machine:
• hdfs.keytab
• core-site.xml
• hdfs-site.xml
1. Open the hosts file located in the following directory on the Secure Agent machine: /etc/hosts
2. Add the KDC server IP address to the hosts file in the following format: <KDC Server IP address> <Fully Qualified Name of the KDC server> <Alias Name>
3. Save and close the hosts file.
4. Copy the krb5.conf file to the following directory: <Secure Agent installation directory>/jdk/jre/lib/security
5. Navigate to the directory on the Hadoop cluster node where you store the following files:
  - hdfs.keytab
  - core-site.xml
  - hdfs-site.xml
6. Copy the keytab and XML files from the Hadoop cluster node to a local directory on the Secure Agent machine, for example: /data/Kerberos
You can modify the Kerberos configuration file as needed for your environment.
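For reference, a minimal Kerberos configuration file might look like the following. The realm EXAMPLE.COM and the KDC host kdc01.example.com are placeholders, not values from your environment; substitute your own realm and KDC host name:

```
[libdefaults]
    default_realm = EXAMPLE.COM
    dns_lookup_kdc = false
    dns_lookup_realm = false

[realms]
    EXAMPLE.COM = {
        kdc = kdc01.example.com
        admin_server = kdc01.example.com
    }

[domain_realm]
    .example.com = EXAMPLE.COM
    example.com = EXAMPLE.COM
```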
Note: If the Kerberos encryption algorithms are not compatible with Java Standard Edition version 11, you can add the allow_weak_crypto=true property in the Kerberos configuration file.
7. Restart the Secure Agent machine.
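As an illustration of the hosts file entry in step 2, assuming a hypothetical KDC at IP address 192.0.2.10 with the fully qualified name kdc01.example.com and the alias kdc01, the entry would look like the following:

```
192.0.2.10    kdc01.example.com    kdc01
```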
Configure non-Kerberos authentication
If you don't use Kerberos authentication, you can authenticate with or without configuration files.
If you have the following XML configuration files on your machine, place them in a directory, for example: /data/Non-Kerberos
  - core-site.xml
  - hdfs-site.xml
If you don't have XML configuration files on your machine, continue to Create a connection.
Create a connection
Before you configure the Hadoop Distributed File System catalog source, create a connection object in Administrator.
Ensure that you have the required information to connect to the Hadoop Distributed File System.
Before you create a connection, configure the Hadoop Files V2 connector to download the Hadoop Distributed File System third-party libraries for the Cloudera CDP, Amazon EMR, Azure HDInsight, or Google Dataproc cluster. For more information about the Hadoop Files V2 connector, see the Data Integration Connectors help.
1. In Administrator, select Connections.
2. Click New Connection.
3. Enter the following connection details:
  • Connection Name. Unique name of the Hadoop Distributed File System connection that meets the following criteria:
    - Can contain alphanumeric characters, spaces, and the following special characters: _ . + -
    - Maximum length is 100 characters.
    - Is not case sensitive.
  • Description. Optional description of the connection. The maximum length is 255 characters.
  • Type. Type of connection. Ensure that the type is Hadoop Files V2.
4. If you want to use Kerberos authentication to connect to the Hadoop Distributed File System source system, enter the following properties:
  • Runtime Environment. The runtime environment, either the Informatica Cloud Secure Agent or a serverless runtime environment.
  • NameNode URI. The access URI to the Hadoop Distributed File System instance.
  • Configuration Files Path. The directory that contains the Kerberos Hadoop Distributed File System configuration files.
  • Keytab File. The path and file name of the keytab file that contains the encrypted keys and Kerberos principals for Kerberos login.
  • Principal Name. The principal name that you use to connect to the Hadoop Distributed File System with Kerberos authentication.
5. If you want to use non-Kerberos authentication with configuration files to connect to the Hadoop Distributed File System source system, enter the following properties:
  • Runtime Environment. The runtime environment, either the Informatica Cloud Secure Agent or a serverless runtime environment.
  • User Name. Name of the user that connects to the Hadoop Distributed File System instance.
  • NameNode URI. The access URI to the Hadoop Distributed File System instance in one of the following formats:
    - hdfs://<NameNodeURI>:<port>/
    - hdfs://<NameNodeURI>:<port>/<source directory>
    Note: If you don't enter <source directory>, you can include the directory in Metadata Command Center. In the Filters area, select Folder and include the source directory.
  • Configuration Files Path. The directory that contains the non-Kerberos Hadoop Distributed File System configuration files.
6. If you want to use non-Kerberos authentication without configuration files to connect to the Hadoop Distributed File System source system, enter the following properties:
  • Runtime Environment. The runtime environment, either the Informatica Cloud Secure Agent or a serverless runtime environment.
  • User Name. Name of the user that connects to the Hadoop Distributed File System instance.
  • NameNode URI. The access URI to the Hadoop Distributed File System instance in one of the following formats:
    - hdfs://<NameNodeURI>:<port>/
    - hdfs://<NameNodeURI>:<port>/<source directory>
    Note: If you don't enter <source directory>, you can include the directory in Metadata Command Center. In the Filters area, select Folder and include the source directory.
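The NameNode URI formats above can be sanity-checked before you save the connection. The following Python sketch is illustrative only: the function name and the example host nn01.example.com are assumptions, not part of the product, and the check covers only the URI shape, not whether the NameNode is reachable.

```python
from urllib.parse import urlparse

def parse_namenode_uri(uri):
    """Validate an HDFS NameNode URI and split out the optional source directory.

    Accepts the two formats described above:
      hdfs://<NameNodeURI>:<port>/
      hdfs://<NameNodeURI>:<port>/<source directory>
    Returns (host, port, source_directory), where source_directory is None
    when the URI ends at the root.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "hdfs":
        raise ValueError(f"expected hdfs:// scheme, got {uri!r}")
    if parsed.hostname is None or parsed.port is None:
        raise ValueError(f"expected hdfs://<host>:<port>/..., got {uri!r}")
    source_dir = parsed.path.strip("/") or None
    return parsed.hostname, parsed.port, source_dir

# Examples with a hypothetical NameNode host:
print(parse_namenode_uri("hdfs://nn01.example.com:8020/"))
# → ('nn01.example.com', 8020, None)
print(parse_namenode_uri("hdfs://nn01.example.com:8020/sales/raw"))
# → ('nn01.example.com', 8020, 'sales/raw')
```

If the URI omits the source directory (the first format), remember that you can still scope extraction later by selecting Folder in the Filters area of Metadata Command Center.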