Enabling Hive Statistics
The Intelligent Data Lake application displays upload and publication statistics on the My Activity page. It reads these values from the Hive statistics stored in the metastore database. By default, Hive uses Derby as the metastore and stores the statistics information there. However, Derby does not support concurrent execution in Hive. To display the Intelligent Data Lake upload and publication statistics, you must therefore store the statistics in a different database and enable Hive statistics.
To change the default metastore to MySQL and enable Hive statistics:
1. Create a MySQL table with the following command, where <db> is the name of the MySQL database:
CREATE TABLE <db>.PARTITION_STATS_V2 (TS TIMESTAMP DEFAULT CURRENT_TIMESTAMP, ID VARCHAR(255) PRIMARY KEY, ROW_COUNT BIGINT, RAW_DATA_SIZE BIGINT);
2. On the machine where the Data Integration Service is configured, edit the hive.xml file as follows:
<property>
<name>hive.stats.dbclass</name>
<value>jdbc:mysql</value>
<description>The default database that stores temporary hive statistics.</description>
</property>
<property>
<name>hive.stats.autogather</name>
<value>true</value>
<description>A flag to gather statistics automatically during the INSERT OVERWRITE command.</description>
</property>
<property>
<name>hive.stats.jdbcdriver</name>
<value>com.mysql.jdbc.Driver</value>
<description>The JDBC driver for the database that stores temporary hive statistics.</description>
</property>
<property>
<name>hive.stats.dbconnectionstring</name>
<value>jdbc:mysql://<host>:3306/<db>?useUnicode=true&amp;characterEncoding=UTF-8&amp;user=<user>&amp;password=<pwd></value>
<description>The default connection string for the database that stores temporary hive statistics.</description>
</property>
3. Copy the MySQL JDBC driver JAR file, for example mysql-connector-java-5.1.23.jar, to the following location: INFA_HOME/services/shared/hadoop/cloudera_cdh5u4/lib
4. In the INFA_HOME/services/shared/hadoop/cloudera_cdh5u4/infaConf/hadoopEnv.properties file, add or update the following line so that it names the JAR file that you copied:
infapdo.aux.jars.path=file://$DIS_HADOOP_DIST/lib/mysql-connector-java-5.1.23.jar
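
After you edit the Hive configuration file in step 2, a quick sanity check can catch typos before you restart any services. The following is a minimal sketch, not part of the product: the function names check_stats_config and read_properties are hypothetical, and the sketch assumes that the file uses the usual Hadoop layout of <property> elements under a single root element. It checks only that the four statistics properties are present and that the file is well-formed XML. A raw & character in the connection string, for example, makes the file fail to parse.

```python
import xml.etree.ElementTree as ET

# Property names and fixed values are taken from step 2 of the procedure.
# None means the value is site-specific, so only its prefix is checked.
EXPECTED = {
    "hive.stats.dbclass": "jdbc:mysql",
    "hive.stats.autogather": "true",
    "hive.stats.jdbcdriver": "com.mysql.jdbc.Driver",
    "hive.stats.dbconnectionstring": None,
}


def read_properties(xml_text):
    """Return {name: value} for every <property> element in the XML text."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_stats_config(xml_text):
    """Return a list of problems found; an empty list means all checks passed."""
    try:
        props = read_properties(xml_text)
    except ET.ParseError as exc:
        return [f"file is not well-formed XML: {exc}"]

    problems = []
    for name, expected in EXPECTED.items():
        if name not in props:
            problems.append(f"{name} is missing")
        elif expected is None:
            if not props[name].startswith("jdbc:mysql://"):
                problems.append(f"{name} does not start with jdbc:mysql://")
        elif props[name] != expected:
            problems.append(f"{name} is {props[name]!r}, expected {expected!r}")
    return problems
```

To use the sketch, read the contents of the edited configuration file into a string and pass it to check_stats_config. The check does not connect to MySQL, so it cannot confirm that the host, credentials, or PARTITION_STATS_V2 table are valid.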