Hive Engine Monitoring

You can monitor statistics and view log events for a Hive engine mapping job in the Monitor tab of the Administrator tool.
The following image shows the Monitor tab in the Administrator tool:
The Monitor tab has the Summary Statistics and Execution Statistics views.

Summary Statistics

Use the Summary Statistics view to view graphical summaries of object states and distribution across the Data Integration Services. You can also view graphs of the memory and CPU that the Data Integration Services used to run the objects.

Execution Statistics

Use the Execution Statistics view to monitor properties, run-time statistics, and run-time reports. In the Navigator, you can expand a Data Integration Service to monitor Ad Hoc Jobs or expand an application to monitor deployed mapping jobs or workflows.
When you select Ad Hoc Jobs, deployed mapping jobs, or workflows from an application in the Navigator of the Execution Statistics view, a list of jobs appears in the contents panel. The contents panel displays jobs that are in the queued, running, completed, failed, aborted, and cancelled states. The Data Integration Service submits jobs in the queued state to the cluster when resources are available.
The contents panel groups related jobs based on the job type. You can expand a job type to view the related jobs under it.
Access the following views on the contents panel under the Execution Statistics view:

Properties

The Properties view on the contents panel shows the general properties of the selected job, such as name, job type, user who started the job, and start time of the job.

Hive Execution Plan

The Hive execution plan displays the Hive script that the Data Integration Service generates based on the mapping logic. The execution plan includes the Hive queries and Hive commands. Each script has a unique identifier.

Summary Statistics

The Summary Statistics view appears in the details panel when you select a mapping job in the contents panel. The Summary Statistics view displays throughput and resource usage statistics for the job.
You can view the following throughput statistics for the job:
You can view the throughput statistics for the job in the details pane in the following image:
The Monitor tab in the Administrator tool shows the mapping, script, and Hive query in the Ad Hoc Jobs pane. In the details pane under Summary Statistics, the throughput appears for the source and target. Under the source, the AllHiveSourceTables row appears with all the source statistics, such as first row accessed and dropped rows. Under the target, a row appears with all the target statistics, such as average bytes and rejected rows.
The Hive summary statistics include a row called "AllHiveSourceTables." This row includes records read from the following sources for the MapReduce engine:
If the LDTM session includes one Tez job, the "AllHiveSourceTables" statistics include only the original Hive sources in the mapping.
Note: The AllHiveSourceTables statistics include only the original Hive sources in a mapping for the Tez job.
When a mapping contains customized data objects or logical data objects, the summary statistics display the original source data instead of the customized data objects or logical data objects in the Administrator tool and in the session log. The Hive driver reads data from the original source data.
You can view the Tez job statistics in the Administrator tool when reading and writing to Hive tables that the Spark engine launches in any of the following scenarios:
Incorrect statistics appear for all the Hive sources and targets, indicating zero rows for average rows for each second, bytes, average bytes for each second, and rejected rows. Only the processed rows contain correct values. The remaining columns contain either 0 or N/A.
When an Update Strategy transformation runs on the Hive engine, the Summary Statistics for the target table instance combines the number of inserted rows processed, deleted rows processed, and twice the number of updated rows processed. The update operations are handled as separate delete and insert operations.
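The row count arithmetic described for the Update Strategy transformation can be illustrated with a short sketch. The function name and the sample counts are hypothetical and are not part of the product. The sketch only restates the rule that each update is counted as one delete plus one insert.

```python
def reported_target_rows(inserted: int, deleted: int, updated: int) -> int:
    """Return the combined row count for a target table instance.

    Hypothetical helper: each updated row is counted twice because the
    update is handled as a separate delete and insert operation.
    """
    return inserted + deleted + 2 * updated


# Sample counts chosen for illustration only.
print(reported_target_rows(inserted=100, deleted=20, updated=30))  # 180
```

With 100 inserted, 20 deleted, and 30 updated rows, the combined count is 100 + 20 + (2 x 30) = 180.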

Detailed Statistics

The Detailed Statistics view appears in the details panel when you select a mapping job in the contents panel. The Detailed Statistics view displays graphs of the throughput and resource usage statistics for the job run.

Monitoring with MapReduce Hive Engine

You can monitor the MapReduce Hive engine.
You can monitor MapReduce engines for Hive mappings. You can also monitor and view Hive tasks that use MapReduce to run Spark jobs.
Note: Effective in version 10.2.1, the MapReduce mode of the Hive run-time engine is deprecated, and Informatica will drop support for it in a future release. The Tez mode remains supported.
The following image shows the MapReduce Hive Query properties on the Monitor tab in the Administrator tool:
On the Monitor tab in the Administrator tool, the contents panel contains the mapping, script, and Hive query that use MapReduce engines. In the Details pane, you can view the MR Job details, such as the Job ID and Map % Complete. The Job ID starts with the prefix job_. Both Map % Complete and Reduce % Complete show as 100%. DAG % Complete appears as N/A.
The following image shows a Hive task that uses MapReduce to run Spark jobs:
On the Monitor tab in the Administrator tool, the contents panel contains the mapping, script, Hive query, and Spark application that use MapReduce engines to run Spark jobs. In the Details pane, you can view the MR Job details, such as the Job ID and Map % Complete. The Job ID starts with the prefix job_. Both Map % Complete and Reduce % Complete show as 100%. DAG % Complete appears as N/A.
You can view the following information under the MR Job details for MapReduce:
| Property | Applicable Values | Description |
|---|---|---|
| Job ID | job_<name> | You can select the link under Job ID to view the application cluster. For example, if the Job ID property contains a value starting with the prefix job_ in the MR Job Details pane, the naming convention indicates that the MapReduce engine is in use. |
| Map % Complete | 0 - 100 | Displays a value from 0 through 100 for MapReduce. |
| Reduce % Complete | 0 - 100 | Displays a value from 0 through 100 for MapReduce. |
| DAG % Complete | N/A | DAG % is not applicable for MapReduce. |

Monitoring with Tez Hive Engine

You can monitor the Tez Hive engine.
Tez uses the YARN timeline as its application history store. Tez stores most of its lifecycle information in the history store, such as all the DAG information. You can monitor the Tez engine information, such as DAG % Complete.
Tez relies on the application timeline server as a backing store for the application data generated during the lifetime of a YARN application. Tez interfaces with the application timeline server and displays both a live and historical view of the Tez application inside a Tez web application.
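As a rough illustration of the history store described above, the following sketch builds a request URL for the YARN application timeline server REST interface and extracts DAG IDs from a response. It assumes a version 1 timeline server that exposes the /ws/v1/timeline/<entity type> path and that Tez publishes DAG entities under the TEZ_DAG_ID entity type. The host, port, and function names are hypothetical, and the sketch makes no network call.

```python
from urllib.parse import urlencode


def tez_dag_history_url(timeline_base: str, limit: int = 10) -> str:
    """Build a timeline server URL that lists Tez DAG entities.

    Assumption: timeline_base points at a version 1 application
    timeline server, for example http://<host>:8188.
    """
    query = urlencode({"limit": limit})
    return f"{timeline_base.rstrip('/')}/ws/v1/timeline/TEZ_DAG_ID?{query}"


def dag_ids(timeline_response: dict) -> list:
    """Return the entity IDs from a parsed timeline server JSON response."""
    return [entity["entity"] for entity in timeline_response.get("entities", [])]


# Hypothetical host and a hand-written response used for illustration only.
print(tez_dag_history_url("http://timeline.example.com:8188"))
sample = {"entities": [{"entity": "dag_1520917602092_9282_1", "entitytype": "TEZ_DAG_ID"}]}
print(dag_ids(sample))
```

The Tez web application reads the same store, so the Administrator tool and the Tez view remain the supported ways to monitor a job. A direct query like this is only a way to see what the history store holds.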
The following image shows the Tez Hive Query properties on the Monitor tab in the Administrator tool:
On the Monitor tab in the Administrator tool, the contents panel contains the mapping, script, and Hive query that use Tez engines. In the Details pane, you can view the MR Job details, such as the Job ID and Map % Complete. The Job ID starts with the prefix application_. Both Map % Complete and Reduce % Complete show as N/A. DAG % Complete appears as 100%.
You can monitor and view Hive tasks that use Tez to run Spark jobs. Or, you can monitor Tez engines for Hive mappings.
The following image shows a Hive task that uses Tez to run Spark jobs:
On the Monitor tab in the Administrator tool, the contents panel contains the mapping, script, Hive query, and Spark application that use Tez engines to run Spark jobs. In the Details pane, you can view the MR Job details, such as the Job ID and Map % Complete. The Job ID starts with the prefix application_. Both Map % Complete and Reduce % Complete show as N/A. DAG % Complete appears as 100%.
You can view the following information under the MR Job details for Tez:
| Property | Applicable Values | Description |
|---|---|---|
| Job ID | application_<name> | You can select the link under Job ID to view the application cluster. For example, if the Job ID property contains a value starting with the prefix application_ in the MR Job Details pane, the naming convention indicates that the Tez engine is in use. If you click the Tracking URL for the Tez job, you are redirected to the Hadoop Resource Manager. If you then click History, you can view the Tez view, which is provided by the Hadoop distribution in Ambari. Each application ID can have information for multiple DAGs. |
| Map % Complete | N/A | Map % is not applicable for Tez. |
| Reduce % Complete | N/A | Reduce % is not applicable for Tez. |
| DAG % Complete | 0 - 100 | Displays a value from 0 through 100 for Tez. |
When you specify a query in Hive, the script launches either a Hadoop job, such as an INSERT or DELETE query, or a Hive query. If the script launches no Hadoop jobs, fields such as Job ID, Reduce % Complete, and DAG % Complete appear blank.
Note: If the active Resource Manager goes down during a mapping run on the Tez engine, the Tez monitoring statistics might become unavailable for Hive jobs or Spark jobs that use HiveServer 2 tasks.
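The Job ID naming conventions in the MR Job details for MapReduce and Tez can be summarized in a short sketch. The function name and the returned labels are hypothetical. The sketch relies only on the prefixes stated in this section (job_ for MapReduce, application_ for Tez) and on the blank Job ID that appears when a script launches no Hadoop job.

```python
def engine_from_job_id(job_id: str) -> str:
    """Infer the Hive run-time engine from the Job ID shown in the MR Job details.

    Hypothetical helper based on the documented prefixes only.
    """
    if not job_id or not job_id.strip():
        # A blank Job ID means the script launched no Hadoop job.
        return "no Hadoop job"
    job_id = job_id.strip()
    if job_id.startswith("job_"):
        return "MapReduce"  # Map % and Reduce % apply, DAG % is N/A
    if job_id.startswith("application_"):
        return "Tez"  # DAG % applies, Map % and Reduce % are N/A
    return "unknown"


# The IDs below are made-up examples that follow the documented prefixes.
for sample_id in ("job_1520917602092_0001", "application_1520917602092_9282", ""):
    print(repr(sample_id), "->", engine_from_job_id(sample_id))
```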

Hive Engine Logs

The Hive engine logs appear in the LDTM log and the Hive session log.
You can find the information about Hive engine log events in the following log files:
LDTM log
The LDTM logs the results of the Hive queries run for the mapping. You can view the LDTM log from the Developer tool or the Administrator tool for a mapping job.
Hive session log
When you have a Hive script in the Hive execution plan of a mapping, the Data Integration Service opens a Hive session to run the Hive queries.
A Hive session updates a log file in the following directory on the Data Integration Service node:
<Informatica installation directory>/tomcat/bin/disTemp/.
The full path to the Hive session log appears in the LDTM log.
You can view information about DAG vertices in the Tez job link and in the session log. The Tez layout and views might differ based on the selected configurations for the Tez-specific properties.
The following image shows the Tez Hive query properties in Tez:
The Tez Hive query properties that appear under Hive Queries are as follows: Query ID, User, DAG ID, Tables Read, Tables Written, App ID, Queue, and Execution Mode.
The following image shows the advanced Tez properties in Tez:
Tez properties appear under several different property categories, such as General, Advanced tez-env, and Advanced tez-site. The Advanced tez-site properties list values for the following properties: tez.am.am-rm.heartbeat.interval-ms.max, tez.am.container.idle.release-timeout-max.millis, tez.am.container.idle.release-timeout-min.millis, tez.am.container.reuse.enabled, tez.am.container.reuse.locality.delay-allocation-millis, tez.am.container.reuse.non-local-fallback.enabled, tez.am.container.reuse.rack-fallback.enabled, tez.am.launch.cluster-default.cmd-opts, tez.am.max.app.attempts, and tez.am.maxtaskfailures.per.node.
The following image shows the advanced Tez properties related to DAG, vertex, and task counts:
The following Tez properties are listed: tez.session.am.dag.submit.timeout.secs, tez.session.client.timeout.secs, tez.shuffle-vertex-manager.max-src-fraction, tez.shuffle-vertex-manager.min-src-fraction, tez.staging-dir, tez.task.am.heartbeat.counter.interval-ms.max, tez.task.generate.counters.per.io, tez.task.get-task.sleep.interval-ms.max, tez.task.launch.cluster-default.cmd-opts, tez.task.max-events-per-heartbeat, tez.tez-ui.history-url.base, and tez.use.cluster.hadoop-libs.
The monitoring properties appear in the Hive session log under Mapping Status Report when enabled for verbose data or verbose initialization for Tez.
To get the DAG tracking URL in the workflow log, set the tez.tez-ui.history-url.base property to the following value in the HDInsight cluster:
<host address>:<port>/#/main/view/TEZ/tez_cluster_instance.
For example, a complete DAG URL is as follows:
https://ivlhdp584.informatica.com:8443/#/main/view/TEZ/tez_cluster_instance?viewPath=%2F%23%2Fdag%2Fdag_1520917602092_9282_1
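The structure of the example DAG URL can be reproduced with a short sketch. It assumes, based only on the example above, that the URL is the tez.tez-ui.history-url.base value followed by a viewPath query parameter that holds the percent-encoded path /#/dag/<DAG ID>. The function name is hypothetical.

```python
from urllib.parse import quote


def dag_tracking_url(history_url_base: str, dag_id: str) -> str:
    """Compose a DAG URL from the history URL base and a DAG ID.

    Assumption: the viewPath parameter is the percent-encoded string
    "/#/dag/<dag_id>", as seen in the example URL in this section.
    """
    view_path = quote(f"/#/dag/{dag_id}", safe="")
    return f"{history_url_base}?viewPath={view_path}"


base = "https://ivlhdp584.informatica.com:8443/#/main/view/TEZ/tez_cluster_instance"
print(dag_tracking_url(base, "dag_1520917602092_9282_1"))
```

Run with the base value and DAG ID from the example, the sketch prints the same URL that appears above.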