Before you create a catalog source, ensure that you have the information required to connect to the source system.
Perform the following tasks:
•Assign the required permissions.
•Configure a connection to the Apache Hive source system in Administrator.
•Save the Apache HiveQL script files on the runtime environment from which you want to extract metadata.
•Create endpoint catalog sources for connection assignment.
Verify permissions
To extract metadata and to configure other capabilities that a catalog source might include, you need account access and permissions on the source system. The permissions required might vary depending on the capability.
Permissions for metadata extraction
To extract metadata, you need account access and permissions to the source system.
Verify that the Cloudera CDP, Amazon EMR, or Azure HDInsight cluster user has read permission on the source.
Grant permissions that allow you to perform the following operations:
•show schemas
•show tables
•show views
•show materialized views
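You can verify this read access from any HiveQL client before you run the catalog source. The following is a minimal sketch, assuming a DB-API style cursor such as one obtained from PyHive's `hive.connect(...).cursor()`; the function name and helper are illustrative, not part of the product:

```python
# Sketch: verify that the cluster user can run the metadata statements
# that the extraction job needs. Assumes a DB-API cursor (e.g. from PyHive).

STATEMENTS = [
    "SHOW SCHEMAS",
    "SHOW TABLES",
    "SHOW VIEWS",
    "SHOW MATERIALIZED VIEWS",
]

def check_metadata_permissions(cursor):
    """Return a dict that maps each statement to True if it ran without error."""
    results = {}
    for stmt in STATEMENTS:
        try:
            cursor.execute(stmt)
            cursor.fetchall()
            results[stmt] = True
        except Exception:
            results[stmt] = False
    return results
```

If any value in the result is False, grant the missing permission to the cluster user before you run the catalog source job.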
Create a connection
Create a Hive Connector connection object in Administrator with the connection details of the Apache Hive source system.
1. In Administrator, select Connections.
2. Click New Connection.
3. In the Connection Details section, enter the following connection details:
•Connection Name. Name of the connection. Each connection name must be unique within the organization. Connection names can contain alphanumeric characters, spaces, and the following special characters: _ . + -, Maximum length is 255 characters.
•Description. Description of the connection. Maximum length is 4000 characters.
4. Select the Hive Connector connection type.
5. In the Hive Connector Properties section, in the Runtime Environment list, select the runtime environment where you want to run the tasks.
6. In the Connection section, enter the following connection details:
•Authentication Type. You can select one of the following authentication types:
- Kerberos. Select Kerberos for a Kerberos cluster.
- LDAP. Select LDAP for an LDAP-enabled cluster. Note: LDAP is not applicable to mappings in advanced mode.
- None. Select None for a Hadoop cluster that is not secure or not LDAP-enabled.
•JDBC URL *. The JDBC URL to connect to Hive. Specify one of the following formats based on your requirement:
- To view and import tables from a single database, use the following format: jdbc:hive2://<host>:<port>/<database name>
- To view and import tables from multiple databases, do not enter the database name. Use the following format: jdbc:hive2://<host>:<port>/ Note: After the port number, enter a slash.
- To access Hive on a Hadoop cluster enabled for TLS, specify the details in the JDBC URL in the following format: jdbc:hive2://<host>:<port>/<database name>;ssl=true;sslTrustStore=<TrustStore_path>;trustStorePassword=<TrustStore_password>, where the truststore path is the directory path of the truststore file that contains the TLS certificate on the agent machine.
•JDBC Driver *. The JDBC driver class to connect to Hive.
•Username. The user name to connect to Hive in LDAP or None mode.
•Password. The password to connect to Hive in LDAP or None mode.
•Principal Name. The principal name to connect to Hive through Kerberos authentication.
•Keytab Location. The path and file name of the keytab file for Kerberos login.
•Configuration Files Path *. The directory that contains the Hadoop configuration files for the client. Copy the site.xml files from the Hadoop cluster and add them to a folder on the agent machine. Specify the path in this field before you use the connection in a mapping to access Hive on a Hadoop cluster:
- For mappings, you require the core-site.xml, hdfs-site.xml, and hive-site.xml files.
- For mappings in advanced mode, you require the core-site.xml, hdfs-site.xml, hive-site.xml, mapred-site.xml, and yarn-site.xml files.
* These fields are mandatory parameters.
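The JDBC URL formats above can be assembled programmatically, which is useful when you manage many connections. The following is a minimal sketch; the helper name and its parameters are illustrative, not part of the product:

```python
def build_hive_jdbc_url(host, port, database=None, truststore_path=None,
                        truststore_password=None):
    """Build a Hive JDBC URL in the formats described above.

    - With a database name: single-database access.
    - Without a database name: multi-database access (note the trailing slash).
    - With truststore details: a TLS-enabled cluster.
    """
    url = f"jdbc:hive2://{host}:{port}/{database or ''}"
    if truststore_path:
        url += (f";ssl=true;sslTrustStore={truststore_path}"
                f";trustStorePassword={truststore_password}")
    return url
```

For example, `build_hive_jdbc_url("hive01", 10000, "sales")` produces `jdbc:hive2://hive01:10000/sales`, and omitting the database name produces `jdbc:hive2://hive01:10000/` with the required trailing slash.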
7. Click Test Connection.
8. Click Save.
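Before you test the connection, you can confirm that the Configuration Files Path contains every site.xml file the connection needs. The following is a minimal sketch; the function name and file sets are taken from the requirements above, but the helper itself is illustrative:

```python
import os

# Files required for mappings; advanced mode needs two more.
BASE_FILES = {"core-site.xml", "hdfs-site.xml", "hive-site.xml"}
ADVANCED_FILES = BASE_FILES | {"mapred-site.xml", "yarn-site.xml"}

def missing_site_files(config_dir, advanced=False):
    """Return a sorted list of required site.xml files missing from config_dir."""
    required = ADVANCED_FILES if advanced else BASE_FILES
    present = set(os.listdir(config_dir))
    return sorted(required - present)
```

An empty result means the directory is complete for the selected mode; otherwise, copy the listed files from the Hadoop cluster before you test the connection.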
Create endpoint catalog sources for connection assignment
An endpoint catalog source represents a source system that the catalog source references. Before you perform connection assignment, create endpoint catalog sources and run the catalog source jobs.
You can then perform connection assignment to the referenced source systems and view complete lineage with source system objects.