Before you begin

Before you create a catalog source, ensure that you have the information required to connect to the source system.
Perform the following tasks:

Verify permissions

To extract metadata and to configure other capabilities that a catalog source might include, you need account access and permissions on the source system. The permissions required might vary depending on the capability.

Permissions for metadata extraction

To extract Hadoop Distributed File System metadata, you need account access and permissions to the Hadoop Distributed File System source system.
Ask the Hadoop Distributed File System administrator to verify that the user of the Cloudera CDP, Amazon EMR, Azure HDInsight, or Google Dataproc cluster has read permission on the source.

Permissions to run data profiles

You can run profiles with the permissions required to perform metadata extraction.

Permissions to run data classification

You can perform data classification with the permissions required to perform metadata extraction.

Permissions to run glossary association

You can perform glossary association with the permissions required to perform metadata extraction.

Configure Kerberos authentication

If you use Kerberos authentication, configure the configuration files on the Secure Agent machine to work with the Kerberos Key Distribution Center (KDC).
Ensure that you know the location of the Kerberos configuration files on your machine, and then perform the following steps:
    1. Open the hosts file located in the following directory on the Secure Agent machine: /etc/hosts
    2. Add the KDC server IP address to the hosts file in the following format: <KDC Server IP address> <Fully Qualified Name of the KDC server> <Alias Name>
    3. Save and close the hosts file.
    4. Copy the krb5.conf file to the following directory: <Secure Agent installation directory>/jdk/jre/lib/security
    5. Navigate to the directory on the Hadoop cluster node where you store the keytab and XML configuration files.
    6. Copy the KEYTAB and XML files from the Hadoop cluster node to a local Secure Agent directory, for example: /data/Kerberos
    You can modify the Kerberos configuration file to match your environment.
    The following code shows a sample Kerberos configuration file:
    [libdefaults]
    default_realm = *****
    dns_lookup_kdc = false
    dns_lookup_realm = false
    ticket_lifetime = 86400
    renew_lifetime = 604800
    forwardable = true
    default_tgs_enctypes = rc4-hmac
    default_tkt_enctypes = rc4-hmac
    permitted_enctypes = rc4-hmac
    udp_preference_limit = 1
    kdc_timeout = 3000
    allow_weak_crypto=true

    [realms]
    <domain name> = {
        kdc = *****
        admin_server = *****
    }

    [domain_realm]

    Note: If the Kerberos encryption algorithms are not compatible with Java Standard Edition version 11, you can add the allow_weak_crypto=true property in the Kerberos configuration file.
    7. Restart the Secure Agent machine.
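You can sanity-check the hosts-file entry from step 2 before you restart the Secure Agent. The following is a minimal sketch, not part of the product; the IP address, host name, and alias below are hypothetical placeholders for your own KDC details:

```python
def hosts_entry(ip, fqdn, alias):
    """Build a hosts-file line in the format from step 2:
    <KDC Server IP address> <Fully Qualified Name of the KDC server> <Alias Name>"""
    return f"{ip} {fqdn} {alias}"

def kdc_in_hosts(hosts_path, fqdn):
    """Return True if the KDC host name appears as a whole token
    on a non-comment line of the hosts file."""
    with open(hosts_path) as f:
        return any(
            fqdn in line.split()
            for line in f
            if line.strip() and not line.lstrip().startswith("#")
        )
```

For example, after appending hosts_entry("10.0.0.5", "kdc01.example.com", "kdc01") to /etc/hosts, kdc_in_hosts("/etc/hosts", "kdc01.example.com") returns True.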

Configure non-Kerberos authentication

If you don't use Kerberos authentication, you can authenticate with or without configuration files.
    - If you have XML configuration files on your machine, place the files in a directory, for example: /data/Non-Kerberos
    - If you don't have XML configuration files on your machine, continue to Create a connection.
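As a quick check, you can list the XML configuration files present in the chosen directory. This is a minimal sketch, assuming the example path /data/Non-Kerberos from the step above:

```python
import glob
import os

def xml_config_files(conf_dir):
    """Return the sorted base names of the XML configuration files
    found in the given directory, for example /data/Non-Kerberos."""
    return sorted(
        os.path.basename(p)
        for p in glob.glob(os.path.join(conf_dir, "*.xml"))
    )
```

If the returned list is empty, use the non-Kerberos path without configuration files and continue to Create a connection.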

Create a connection

Before you configure the Hadoop Distributed File System catalog source, create a connection object in Administrator.
Ensure that you have the required information to connect to the Hadoop Distributed File System.
Before you create a connection, configure the Hadoop Files V2 connector to download the Hadoop Distributed File System third-party libraries for the Cloudera CDP, Amazon EMR, Azure HDInsight, or Google Dataproc cluster. For more information about the Hadoop Files V2 connector, see the Data Integration Connectors help.
    1. In Administrator, select Connections.
    2. Click New Connection.
    3. Enter the following connection details:
    - Connection Name: Unique name of the Hadoop Distributed File System connection that meets the following criteria:
        - Can contain alphanumeric characters, spaces, and the following special characters: _ . + -
        - Maximum length is 100 characters.
        - Is not case sensitive.
    - Description: Optional description of the connection. The maximum permitted length is 255 characters.
    - Type: Type of connection. Ensure that the type is Hadoop Files V2.
    4. If you want to use Kerberos authentication to connect to the Hadoop Distributed File System source system, enter the following properties:
    - Runtime Environment: The Informatica Cloud Secure Agent or a serverless runtime environment.
    - NameNode URI: The access URI of the Hadoop Distributed File System instance.
    - Configuration Files Path: The directory that contains the Kerberos Hadoop Distributed File System configuration files.
    - Keytab File: The path and file name of the keytab file that contains the encrypted keys and Kerberos principals for Kerberos login.
    - Principal Name: The principal name that you use to connect to the Hadoop Distributed File System with Kerberos authentication.
    5. If you want to use non-Kerberos authentication with configuration files to connect to the Hadoop Distributed File System source system, enter the following properties:
    - Runtime Environment: The Informatica Cloud Secure Agent or a serverless runtime environment.
    - User Name: Name of the user that connects to the Hadoop Distributed File System instance.
    - NameNode URI: The access URI of the Hadoop Distributed File System instance in one of the following formats:
        - hdfs://<NameNodeURI>:<port>/
        - hdfs://<NameNodeURI>:<port>/<source directory>
      Note: If you don't enter <source directory>, you can include the directory in Metadata Command Center. In the Filters area, select Folder and include the source directory.
    - Configuration Files Path: The directory that contains the non-Kerberos Hadoop Distributed File System configuration files.
    6. If you want to use non-Kerberos authentication without configuration files to connect to the Hadoop Distributed File System source system, enter the following properties:
    - Runtime Environment: The Informatica Cloud Secure Agent or a serverless runtime environment.
    - User Name: Name of the user that connects to the Hadoop Distributed File System instance.
    - NameNode URI: The access URI of the Hadoop Distributed File System instance in one of the following formats:
        - hdfs://<NameNodeURI>:<port>/
        - hdfs://<NameNodeURI>:<port>/<source directory>
      Note: If you don't enter <source directory>, you can include the directory in Metadata Command Center. In the Filters area, select Folder and include the source directory.
    7. Click Test Connection.
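Before you test the connection, you can verify that a NameNode URI matches one of the formats described in steps 5 and 6. The following is a minimal sketch using Python's standard urllib.parse module; the host name in the example is a hypothetical placeholder:

```python
from urllib.parse import urlparse

def valid_namenode_uri(uri):
    """Check the shape hdfs://<NameNodeURI>:<port>/ or
    hdfs://<NameNodeURI>:<port>/<source directory>."""
    try:
        parts = urlparse(uri)
        # Accessing .port raises ValueError when the port is malformed.
        return (
            parts.scheme == "hdfs"
            and bool(parts.hostname)
            and parts.port is not None
        )
    except ValueError:
        return False
```

For example, valid_namenode_uri("hdfs://namenode.example.com:8020/sales") passes, while a URI without a port or with a non-hdfs scheme fails.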