Before you begin

Before you create a catalog source, ensure that you have the information required to connect to the source system.
Perform the following tasks:

Verify permissions

To extract metadata and to configure other capabilities that a catalog source might include, you need account access and permissions on the source system. The permissions required might vary depending on the capability.

Permissions for metadata extraction

To extract metadata, you need account access and permissions on the source system.
Verify that the Cloudera CDP, Amazon EMR, or Azure HDInsight cluster user has read permission on the source.
Grant the user permissions that allow read access to the databases and tables from which you want to extract metadata.
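
The following sketch is one way to confirm read access before you configure the catalog source. It is not part of the product; it assumes the Apache Hive JDBC driver is on the classpath, and the host, port, user name, and password values are placeholders for your cluster.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Connects as the cluster user and lists the databases that the user can read.
    public class HiveReadCheck {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            String url = "jdbc:hive2://hive-host.example.com:10000/";  // placeholder host and port
            try (Connection conn = DriverManager.getConnection(url, "catalog_user", "password");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SHOW DATABASES")) {
                while (rs.next()) {
                    // Each row returned is a database that the user is permitted to see.
                    System.out.println(rs.getString(1));
                }
            }
        }
    }

If the user cannot see the databases or tables that you want to catalog, grant read permission on the cluster before you continue.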

Create a connection

Create a Hive Connector connection object in Administrator with the connection details of the Apache Hive source system.
    1. In Administrator, select Connections.
    2. Click New Connection.
    3. In the Connection Details section, enter the following connection details:
        Connection Name
            Name of the connection.
            Each connection name must be unique within the organization. Connection names can contain alphanumeric characters, spaces, and the following special characters: _ . + -
            Maximum length is 255 characters.
        Description
            Description of the connection. Maximum length is 4000 characters.
    4. Select the Hive Connector connection type.
    5. In the Hive Connector Properties section, from the Runtime Environment list, select the runtime environment where you want to run the tasks.
    6. In the Connection section, enter the following connection details:
        Authentication Type
            You can select one of the following authentication types:
            - Kerberos. Select Kerberos for a Kerberos cluster.
            - LDAP. Select LDAP for an LDAP-enabled cluster.
              Note: LDAP is not applicable to mappings in advanced mode.
            - None. Select None for a Hadoop cluster that is not secure or not LDAP-enabled.
        JDBC URL *
            The JDBC URL to connect to Hive. Specify one of the following formats based on your requirement:
            - To view and import tables from a single database, use the following format: jdbc:hive2://<host>:<port>/<database name>
            - To view and import tables from multiple databases, do not enter the database name. Use the following format: jdbc:hive2://<host>:<port>/
              Note: Enter a slash after the port number.
            - To access Hive on a Hadoop cluster enabled for TLS, specify the truststore details in the following format: jdbc:hive2://<host>:<port>/<database name>;ssl=true;sslTrustStore=<TrustStore_path>;trustStorePassword=<TrustStore_password>
              The truststore path is the directory path of the truststore file that contains the TLS certificate on the agent machine.
            For an example that uses these URL formats from a JDBC client, see the sketch after this procedure.
        JDBC Driver *
            The JDBC driver class to connect to Hive.
        Username
            The user name to connect to Hive in LDAP or None mode.
        Password
            The password to connect to Hive in LDAP or None mode.
        Principal Name
            The principal name to connect to Hive through Kerberos authentication.
        Keytab Location
            The path and file name of the keytab file for Kerberos login.
        Configuration Files Path *
            The directory that contains the Hadoop configuration files for the client.
            Copy the site.xml files from the Hadoop cluster to a folder on the agent machine, and specify the path in this field before you use the connection in a mapping to access Hive on a Hadoop cluster:
            - For mappings, you need the core-site.xml, hdfs-site.xml, and hive-site.xml files.
            - For mappings in advanced mode, you need the core-site.xml, hdfs-site.xml, hive-site.xml, mapred-site.xml, and yarn-site.xml files.
    * These fields are mandatory parameters.
    7. Click Test Connection.
    8. Click Save.
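
The following sketch shows how the connection properties in step 6 map to a standalone Hive JDBC client. It is illustrative only; the driver class is the standard Apache Hive JDBC driver, and the host, port, database, truststore path, and credential values are placeholders that you replace with your own.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.Properties;

    public class HiveConnectionSketch {
        public static void main(String[] args) throws Exception {
            // JDBC Driver: the Hive JDBC driver class.
            Class.forName("org.apache.hive.jdbc.HiveDriver");

            // JDBC URL, single-database form. To browse multiple databases, omit the
            // database name and keep the trailing slash: jdbc:hive2://hive-host.example.com:10000/
            String url = "jdbc:hive2://hive-host.example.com:10000/sales";

            // For a TLS-enabled cluster, append the truststore details instead:
            // jdbc:hive2://hive-host.example.com:10000/sales;ssl=true;
            //     sslTrustStore=/opt/certs/truststore.jks;trustStorePassword=changeit

            // Username and Password apply to LDAP or None authentication. For Kerberos,
            // the principal name and keytab file are used instead, and the Kerberos login
            // is performed before the JDBC connection is opened.
            Properties props = new Properties();
            props.setProperty("user", "catalog_user");
            props.setProperty("password", "secret");

            try (Connection conn = DriverManager.getConnection(url, props)) {
                System.out.println("Connection is open: " + !conn.isClosed());
            }
        }
    }

When you click Test Connection, Administrator performs a similar validation with the values that you enter in the connection properties.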

Create endpoint catalog sources for connection assignment

An endpoint catalog source represents a source system that the catalog source references. Before you perform connection assignment, create endpoint catalog sources and run the catalog source jobs.
You can then perform connection assignment to reference the source systems and view complete lineage that includes source system objects.