Before you extract metadata from an Apache Hive catalog source, get the connection and authentication details from the Apache Hive administrator.
Perform the following prerequisite tasks:
• Verify permissions.
• Configure authentication.
• Create a connection.
Verify permissions
To extract metadata and to configure other capabilities that a catalog source might include, you need account access and permissions on the source system. The permissions required might vary depending on the capability.
Permissions to extract metadata
To extract Apache Hive metadata, you need access to the Apache Hive source.
Verify that the Cloudera CDP, Amazon EMR, or Azure HDInsight cluster user has read permission on the source.
Grant permissions that allow you to perform the following operations:
• show schemas
• show tables
• show views
• show materialized views
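As a rough illustration, the four listing operations above correspond to HiveQL SHOW statements. The following Python sketch only builds the statement strings so that an administrator can run them in a Hive client as the cluster user. It does not connect to Hive. The function name permission_check_statements and the sample schema name sales are hypothetical, and the restriction of schema names to letters, digits, and underscores is a conservative assumption rather than a documented rule. SHOW VIEWS and SHOW MATERIALIZED VIEWS might not be available in older Hive releases.

```python
import re
from typing import List


def permission_check_statements(schema: str) -> List[str]:
    """Build HiveQL statements that exercise the four listing operations.

    The schema name is restricted to letters, digits, and underscores.
    This is an assumption made here to keep the generated text safe to paste.
    """
    if not re.fullmatch(r"[A-Za-z0-9_]+", schema):
        raise ValueError(f"unexpected schema name: {schema!r}")
    return [
        "SHOW SCHEMAS",
        f"SHOW TABLES IN {schema}",
        f"SHOW VIEWS IN {schema}",
        f"SHOW MATERIALIZED VIEWS IN {schema}",
    ]


if __name__ == "__main__":
    # "sales" is a placeholder schema name used only for this example.
    for statement in permission_check_statements("sales"):
        print(statement + ";")
```

If any of the statements fails with an authorization error when run as the cluster user, the corresponding permission is probably missing.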
Permissions to run data profiles
Ensure that you have the required permissions to run profiles.
Grant read permissions to the Cloudera CDP, Amazon EMR, or Azure HDInsight cluster user for all objects on which you want to run data profiles.
Also, grant write permission to the connection object used by the intermediate staging connection to write profiling results temporarily.
Permissions to run data classification
You can perform data classification with the permissions required to perform metadata extraction.
Permissions to run glossary association
You can perform glossary association with the permissions required to perform metadata extraction.
Verify authentication
Verify that you have the URL to access Apache Hive and connect to the Hive REST API.
To use Kerberos authentication, provide the Kerberos principal for authentication when you configure the Apache Hive catalog source in Metadata Command Center.
Complete the following prerequisite tasks:
• Add details of the Kerberos server to the hosts file located on the Secure Agent machine in the following format: <ip_address> <hostname>
  On a Windows machine, the hosts file is available in the following path: C:\Windows\System32\drivers\etc\hosts
  On a Linux machine, the hosts file is available in the following path: /etc/hosts
• Download the Keytab file from the Kerberos administrator and copy it to a location on the Secure Agent machine.
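The hosts file format described above can be sketched in Python as follows. The sketch formats an entry and checks whether a set of hosts file lines already maps a given name. The function names, the IP address 10.0.0.5, and the host name kdc.example.com are hypothetical values used only for illustration. The sketch does not write to the hosts file, because editing that file normally requires administrator privileges.

```python
import ipaddress
from typing import Iterable


def format_hosts_entry(ip_address: str, hostname: str, *aliases: str) -> str:
    """Return one hosts file line: <ip_address> <hostname> [<alias> ...]."""
    # ipaddress.ip_address raises ValueError for a malformed address.
    ipaddress.ip_address(ip_address)
    return " ".join([ip_address, hostname, *aliases])


def hosts_has_name(lines: Iterable[str], name: str) -> bool:
    """Return True if any non-comment line maps the given host name or alias."""
    wanted = name.lower()
    for line in lines:
        # Drop trailing comments, then split into address and names.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and wanted in (field.lower() for field in fields[1:]):
            return True
    return False


if __name__ == "__main__":
    # Placeholder address and names; replace them with the real Kerberos server details.
    print(format_hosts_entry("10.0.0.5", "kdc.example.com", "kdc"))
```

To check an existing file, pass its lines to hosts_has_name, for example the result of reading /etc/hosts on a Linux machine.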
Configure Kerberos authentication
If you use Kerberos authentication, provide the Kerberos principal for authentication when you configure the Apache Hive catalog source in Metadata Command Center. Also, configure the Secure Agent machine to work with the Kerberos Key Distribution Center (KDC).
Verify that you have the URL to connect to Apache Hive.
1. Open the hosts file located on the Secure Agent machine.
   On a Windows machine, the hosts file is available in the following path: C:\Windows\System32\drivers\etc\hosts
   On a Linux machine, the hosts file is available in the following path: /etc/hosts
2. Add details of the Kerberos server to the hosts file in the following format: <IP address> <Host name>
3. Add the KDC server IP address to the file in the following format: <KDC server IP address> <Fully qualified name of the KDC server> <Alias name>
4. Save and close the file.
5. Verify that the Kerberos configuration file is available on the Secure Agent machine.
   On a Windows machine, the krb5.ini configuration file is available in the following path: C:\Windows
   On a Linux machine, the krb5.conf configuration file is available in the following path: /etc
6. Copy the hive.keytab, core-site.xml, and hive-site.xml files from the Hadoop cluster node to a directory on the Secure Agent machine.
7. Download the Keytab file from the Kerberos administrator and copy it to a directory on the Secure Agent machine.
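The file checks in the steps above can be sketched as a small Python pre-flight script. The sketch returns the Kerberos configuration file path described for each operating system and lists which of the files named in the copy step are missing from a directory. The function names are hypothetical, and treating every non-Windows system as Linux is an assumption made for brevity.

```python
import platform
from pathlib import Path
from typing import List, Optional, Sequence

# Files that the steps say to copy from the Hadoop cluster node.
CLUSTER_FILES = ("hive.keytab", "core-site.xml", "hive-site.xml")


def default_krb5_config_path(system: Optional[str] = None) -> str:
    """Return the Kerberos configuration file path described for the platform."""
    system = system or platform.system()
    if system == "Windows":
        return r"C:\Windows\krb5.ini"
    # Assumption: any other platform is treated like Linux.
    return "/etc/krb5.conf"


def missing_files(directory: str, required: Sequence[str] = CLUSTER_FILES) -> List[str]:
    """Return the required file names that are not present in the directory."""
    base = Path(directory)
    return [name for name in required if not (base / name).is_file()]


if __name__ == "__main__":
    krb5_path = default_krb5_config_path()
    print(f"Kerberos configuration file expected at: {krb5_path}")
    print(f"Present: {Path(krb5_path).is_file()}")
    # "." is a placeholder; point this at the directory on the Secure Agent machine.
    print(f"Missing cluster files: {missing_files('.')}")
```

An empty list from missing_files means that all three files are in place.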
Create a connection
Create a Hive Connector connection object in Administrator.
Before you create a connection, configure the Hive Connector to download the Hive third-party libraries for the Cloudera CDP, Amazon EMR, or Azure HDInsight cluster. For more information about the Hive Connector, see Hive Connector in the Data Integration Connectors help.
1. In Administrator, select Connections.
2. Click New Connection.
3. In the Connection Details section, enter the following connection details:
   • Connection Name: Name of the connection.
     Each connection name must be unique within the organization. Connection names can contain alphanumeric characters, spaces, and the following special characters: _ . + -,
     Maximum length is 255 characters.
   • Description: Description of the connection. Maximum length is 4000 characters.
4. Select the Hive Connector connection type.
5. In the Hive Connector Properties section, select the runtime environment where you want to run the tasks.
6. In the Connection section, enter the following connection details:
   • Authentication Type: You can select one of the following authentication types:
     - Kerberos. Select Kerberos for a Kerberos cluster.
     - LDAP. Select LDAP for an LDAP-enabled cluster.
       Note: LDAP is not applicable to mappings in advanced mode.
     - None. Select None for a Hadoop cluster that is not secure or not LDAP-enabled.
   • JDBC URL *: The JDBC URL to connect to Hive.
     Specify the following format based on your requirement:
     - To view and import tables from a single database, use the following format: jdbc:hive2://<host>:<port>/<database name>
     - To view and import tables from multiple databases, do not enter the database name. Use the following JDBC URL format: jdbc:hive2://<host>:<port>/
       Note: After the port number, enter a slash.
     - To access Hive on a Hadoop cluster enabled for TLS, specify the details in the JDBC URL in the following format: jdbc:hive2://<host>:<port>/<database name>;ssl=true;sslTrustStore=<TrustStore_path>;trustStorePassword=<TrustStore_password>
       In this format, the truststore path is the directory path of the truststore file that contains the TLS certificate on the agent machine.
   • JDBC Driver *: The JDBC driver class to connect to Hive.
   • Username: The user name to connect to Hive in LDAP or None mode.
   • Password: The password to connect to Hive in LDAP or None mode.
   • Principal Name: The principal name to connect to Hive through Kerberos authentication.
   • Keytab Location: The path and file name of the Keytab file for Kerberos login.
   • Configuration Files Path *: The directory that contains the Hadoop configuration files for the client.
     Copy the site.xml files from the Hadoop cluster and add them to a directory on the Linux machine. Specify the path in this field before you use the connection in a mapping to access Hive on a Hadoop cluster:
     - For mappings, you require the core-site.xml, hdfs-site.xml, and hive-site.xml files.
     - For mappings in advanced mode, you require the core-site.xml, hdfs-site.xml, hive-site.xml, mapred-site.xml, and yarn-site.xml files.
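The JDBC URL formats and the configuration file requirements described above can be sketched in Python as follows. The sketch assembles a URL string in one of the three documented formats and reports which site.xml files are missing from a directory. The function names and all sample values (hive.example.com, port 10000, the sales database, the truststore path) are hypothetical. The sketch only builds strings and checks for files. It does not open a connection to Hive.

```python
from pathlib import Path
from typing import List, Optional

# File sets taken from the Configuration Files Path description.
MAPPING_FILES = ("core-site.xml", "hdfs-site.xml", "hive-site.xml")
ADVANCED_MODE_FILES = MAPPING_FILES + ("mapred-site.xml", "yarn-site.xml")


def build_hive_jdbc_url(
    host: str,
    port: int,
    database: Optional[str] = None,
    truststore_path: Optional[str] = None,
    truststore_password: Optional[str] = None,
) -> str:
    """Assemble a Hive JDBC URL in one of the documented formats."""
    # Omitting the database keeps the trailing slash, as the note requires.
    url = f"jdbc:hive2://{host}:{port}/{database or ''}"
    if truststore_path is not None:
        if truststore_password is None:
            raise ValueError("a truststore password is required with a truststore path")
        url += (
            f";ssl=true;sslTrustStore={truststore_path}"
            f";trustStorePassword={truststore_password}"
        )
    return url


def missing_config_files(directory: str, advanced_mode: bool = False) -> List[str]:
    """Return the required site.xml files that are absent from the directory."""
    required = ADVANCED_MODE_FILES if advanced_mode else MAPPING_FILES
    base = Path(directory)
    return [name for name in required if not (base / name).is_file()]


if __name__ == "__main__":
    # Placeholder host, port, and database; replace them with real cluster values.
    print(build_hive_jdbc_url("hive.example.com", 10000, "sales"))
    print(build_hive_jdbc_url("hive.example.com", 10000))
```

Because the TLS format places the truststore password in the URL itself, avoid printing or logging a URL built with real credentials.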