Blaze Engine Architecture

To run a mapping on the Informatica Blaze engine, the Data Integration Service submits jobs to the Blaze engine executor. The Blaze engine executor is a software component that enables communication between the Data Integration Service and the Blaze engine components on the Hadoop cluster.

The following Blaze engine components appear on the Hadoop cluster:

• Grid Manager. Manages tasks for batch processing.

• Orchestrator. Schedules and processes parallel data processing tasks on a cluster.

• Blaze Job Monitor. Monitors Blaze engine jobs on a cluster.

• DTM Process Manager. Manages the DTM Processes.

• DTM Processes. Operating system processes started to run DTM instances.

• Data Exchange Framework. Shuffles data between the different processes that process the data on cluster nodes.

The following image shows how a Hadoop cluster processes jobs sent from the Blaze engine executor:

This image shows the Blaze engine architecture diagram.

The following events occur when the Data Integration Service submits jobs to the Blaze engine executor:

1. The Blaze engine executor communicates with the Grid Manager to initialize Blaze engine components on the Hadoop cluster, and queries the Grid Manager for an available Orchestrator.

2. The Grid Manager starts the Blaze Job Monitor.

3. The Grid Manager starts the Orchestrator and sends Orchestrator information back to the Logical Data Transformation Manager (LDTM).

4. The LDTM communicates with the Orchestrator.

5. The Grid Manager communicates with the Resource Manager for available resources for the Orchestrator.

6. The Resource Manager handles resource allocation on the data nodes through the Node Manager.

7. The Orchestrator sends the tasks to the DTM Processes through the DTM Process Manager.

8. The DTM Process Manager continually communicates with the DTM Processes.

9. The DTM Processes continually communicate with the Data Exchange Framework to send and receive data across processing units that run on the cluster nodes.
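The sequence above can be summarized as an ordered list of interactions between components. The following Python sketch is only an illustrative model of the nine steps, not Informatica code. The names Step, STARTUP_FLOW, components, and format_flow are hypothetical and chosen for this example.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    """One interaction in the job submission sequence (illustrative model only)."""
    source: str
    target: str
    action: str


# The nine events described above, in order. The wording of each action is a
# paraphrase of the documented step, not an actual message or API name.
STARTUP_FLOW: List[Step] = [
    Step("Blaze engine executor", "Grid Manager",
         "initialize Blaze engine components; query for an available Orchestrator"),
    Step("Grid Manager", "Blaze Job Monitor", "start the Blaze Job Monitor"),
    Step("Grid Manager", "LDTM",
         "start the Orchestrator; send Orchestrator information back"),
    Step("LDTM", "Orchestrator", "communicate with the Orchestrator"),
    Step("Grid Manager", "Resource Manager",
         "ask for available resources for the Orchestrator"),
    Step("Resource Manager", "Node Manager",
         "handle resource allocation on the data nodes"),
    Step("Orchestrator", "DTM Process Manager",
         "send tasks for the DTM Processes"),
    Step("DTM Process Manager", "DTM Processes", "communicate continually"),
    Step("DTM Processes", "Data Exchange Framework",
         "send and receive data across processing units"),
]


def components(flow: List[Step]) -> List[str]:
    """Return every component in the flow, in order of first appearance."""
    seen: List[str] = []
    for step in flow:
        for name in (step.source, step.target):
            if name not in seen:
                seen.append(name)
    return seen


def format_flow(flow: List[Step]) -> List[str]:
    """Render the flow as numbered 'source -> target: action' lines."""
    return [
        f"{number}. {step.source} -> {step.target}: {step.action}"
        for number, step in enumerate(flow, start=1)
    ]
```

Calling format_flow(STARTUP_FLOW) returns one line per numbered event, which can be useful when comparing log output against the expected order of component startup.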

Application Timeline Server

The Hadoop Application Timeline Server collects basic information about completed application processes. The Timeline Server also provides information about completed and running YARN applications.

The Grid Manager starts the Application Timeline Server in the YARN configuration by default.

The Blaze engine uses the Application Timeline Server to store the Blaze Job Monitor status. On Hadoop distributions where the Timeline Server is not enabled by default, the Grid Manager attempts to start the Application Timeline Server process on the current node.

If you do not enable the Application Timeline Server on secured Kerberos clusters, the Grid Manager attempts to start the Application Timeline Server process in HTTP mode.
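The behavior described in this section can be read as a small decision rule. The following Python sketch restates that rule under the assumption that only two facts matter: whether the Timeline Server is already enabled and whether the cluster is secured with Kerberos. The ClusterState class and the plan_timeline_server function are hypothetical names for illustration and do not correspond to any Grid Manager API.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Hypothetical summary of the cluster facts that the text refers to."""
    timeline_server_enabled: bool
    kerberos_secured: bool


def plan_timeline_server(state: ClusterState) -> str:
    """Describe what the Grid Manager is documented to attempt (illustrative)."""
    if state.timeline_server_enabled:
        # The Timeline Server is already available, so nothing is started.
        return "use the existing Application Timeline Server"
    if state.kerberos_secured:
        # Secured cluster without an enabled Timeline Server: HTTP mode.
        return "attempt to start the Application Timeline Server in HTTP mode"
    # Timeline Server not enabled by default on this distribution.
    return "attempt to start the Application Timeline Server on the current node"
```

For example, plan_timeline_server(ClusterState(timeline_server_enabled=False, kerberos_secured=True)) returns the HTTP mode case.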

Manage Blaze Engines

The Blaze engine remains running after a mapping run. To save resources, you can set a property to stop Blaze engine infrastructure after a specified time period.

To do so, set the infagrid.blaze.service.idle.timeout property or the infagrid.orchestrator.svc.sunset.time property. You can set either property with the infacmd isp createConnection command, or in the Blaze advanced properties of the Hadoop connection in the Administrator tool or the Developer tool.
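As a rough illustration, the following Python sketch collects the two shutdown properties as name-value pairs before you enter them in the connection. It is a sketch under assumptions: the helper names are hypothetical, the values are placeholders, and this section does not state the units or the exact input syntax that infacmd or the tools expect, so check the property descriptions before you use real values.

```python
from typing import Dict, List, Optional

# Property names as given in the text above.
IDLE_TIMEOUT = "infagrid.blaze.service.idle.timeout"
SUNSET_TIME = "infagrid.orchestrator.svc.sunset.time"


def blaze_shutdown_properties(
    idle_timeout: Optional[int] = None,
    sunset_time: Optional[int] = None,
) -> Dict[str, int]:
    """Return the shutdown properties that were given a value.

    Hypothetical helper: it only enforces that at least one of the two
    properties is set. It does not validate units or value ranges.
    """
    props: Dict[str, int] = {}
    if idle_timeout is not None:
        props[IDLE_TIMEOUT] = idle_timeout
    if sunset_time is not None:
        props[SUNSET_TIME] = sunset_time
    if not props:
        raise ValueError("set at least one of the two shutdown properties")
    return props


def render(props: Dict[str, int]) -> List[str]:
    """Render properties as generic name=value strings (assumed display format)."""
    return [f"{name}={value}" for name, value in props.items()]
```

For example, render(blaze_shutdown_properties(idle_timeout=30)) produces a single name=value string for the idle timeout, where 30 is a placeholder value.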

Configure the following Blaze advanced properties in the Hadoop connection: