Grid for Jobs that Run in Local Mode
Configure the Data Integration Service to run jobs in separate Data Transformation Manager (DTM) processes on the local node to increase stability. Use this configuration when the Data Integration Service grid runs mappings, profiles, and workflows. All nodes in the grid must have both the service and compute roles.
When you enable a Data Integration Service that runs on a grid, one service process starts on each node with the service role in the grid. The Data Integration Service designates one service process as the master service process, and designates the remaining service processes as worker service processes. When a worker service process starts, it registers itself with the master service process so that the master is aware of the worker.
The master service process manages application deployments, logging, job requests, and the dispatch of mappings to worker service processes. The worker service processes optimize and compile mapping and preview jobs. The worker service processes create separate DTM processes to run jobs. The master service process also acts as a worker service process and runs jobs.
The Data Integration Service balances the workload across the nodes in the grid based on the following job types:
- Workflows
  When you run a workflow instance, the master service process runs the workflow instance and non-mapping tasks. The master service process uses round robin to dispatch each mapping within a Mapping task to a worker service process. The worker service process optimizes and compiles the mapping. The worker service process then creates a DTM instance within a separate DTM process to run the mapping.
- Deployed mappings
  When you run a deployed mapping, the master service process uses round robin to dispatch each mapping to a worker service process. The worker service process optimizes and compiles the mapping. The worker service process then creates a DTM instance within a separate DTM process to run the mapping.
- Profiles
  When you run a profile, the master service process converts the profiling job into multiple mapping jobs based on the advanced profiling properties of the Data Integration Service. The master service process then uses round robin to dispatch the mappings across the worker service processes. Each worker service process optimizes and compiles the mappings that it receives, and then creates a DTM instance within a separate DTM process to run each mapping.
- Ad hoc jobs, with the exception of profiles
  Ad hoc jobs include mappings run from the Developer tool, and previews, scorecards, or drill downs on profile results run from the Developer tool or Analyst tool. When you run an ad hoc job other than a profile, the Data Integration Service uses round robin to dispatch the first request directly to a worker service process. To ensure faster throughput, the Data Integration Service bypasses the master service process. The worker service process creates a DTM instance within a separate DTM process to run the job. When you run additional ad hoc jobs from the same login, the Data Integration Service dispatches the requests to the same worker service process.
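The dispatch rules above can be illustrated with a small Python sketch. This is a toy model, not Informatica code: the class name `GridDispatchSketch`, its methods, and the login names are hypothetical. It also assumes that mappings and ad hoc jobs rotate through the nodes independently, which the rules above do not specify. Node1 appears in the worker list because the master service process also acts as a worker service process.

```python
from itertools import cycle


class GridDispatchSketch:
    """Toy model of the dispatch rules; not Informatica code."""

    def __init__(self, worker_nodes):
        if not worker_nodes:
            raise ValueError("at least one worker service process is required")
        # Assumption: mappings and ad hoc jobs rotate independently.
        self._mapping_rotation = cycle(worker_nodes)
        self._adhoc_rotation = cycle(worker_nodes)
        # Remembers which worker received the first ad hoc job of each login.
        self._adhoc_worker_by_login = {}

    def dispatch_mapping(self):
        """Workflow, deployed, and profile mappings: plain round robin."""
        return next(self._mapping_rotation)

    def dispatch_adhoc(self, login):
        """Ad hoc jobs: round robin for the first request, then the same worker."""
        if login not in self._adhoc_worker_by_login:
            self._adhoc_worker_by_login[login] = next(self._adhoc_rotation)
        return self._adhoc_worker_by_login[login]


grid = GridDispatchSketch(["Node1", "Node2", "Node3"])
mapping_targets = [grid.dispatch_mapping() for _ in range(4)]
adhoc_targets = [grid.dispatch_adhoc(login) for login in ["ana", "ben", "ana"]]
print(mapping_targets)  # ['Node1', 'Node2', 'Node3', 'Node1']
print(adhoc_targets)    # ['Node1', 'Node2', 'Node1']
```

In the sketch, the fourth mapping wraps around to Node1, and the second ad hoc job from the login `ana` returns to the worker that handled that login's first request.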
Note: Informatica does not recommend running SQL queries or web service requests on a Data Integration Service grid that is configured to run jobs in separate local processes. SQL data service and web service jobs typically achieve better performance when the Data Integration Service runs jobs in the service process. For web service requests, you must configure the external HTTP load balancer to distribute requests to nodes that have both the service and compute roles.
Example Grid that Runs Jobs in Local Mode
In this example, the grid contains three nodes. All nodes have both the service and compute roles. The Data Integration Service is configured to run jobs in separate local processes.
The following image shows an example Data Integration Service grid configured to run mapping, profile, workflow, and ad hoc jobs in separate local processes:
The Data Integration Service manages requests and runs jobs on the following nodes in the grid:
- On Node1, the master service process runs the workflow instance and non-mapping tasks. The master service process dispatches mappings included in Mapping tasks from workflow1 to the worker service processes on Node2 and Node3. The master service process also acts as a worker service process and completes jobs. The Data Integration Service dispatches a preview request directly to the service process on Node1. The service process creates a DTM instance within a separate DTM process to run the preview job. Mapping and profile jobs can also run on Node1.
- On Node2, the worker service process creates a DTM instance within a separate DTM process to run mapping1 from workflow1. Ad hoc jobs can also run on Node2.
- On Node3, the worker service process creates a DTM instance within a separate DTM process to run mapping2 from workflow1. Ad hoc jobs can also run on Node3.
Rules and Guidelines for Grids that Run Jobs in Local Mode
Consider the following rules and guidelines when you configure a Data Integration Service grid to run jobs in separate local processes:
- If the grid contains nodes with the compute role only, the Data Integration Service cannot start.
- If the grid contains nodes with the service role only, jobs that are dispatched to the service process on the node fail to run.
- Configure environment variables for the Data Integration Service processes on the Processes view for the service. The Data Integration Service ignores any environment variables configured on the Compute view.
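The two role rules can be expressed as a pre-flight check before you enable the service. The following Python sketch is illustrative only: the function name `check_grid_roles` and the dictionary layout are hypothetical, and the sketch does not read roles from a real domain.

```python
def check_grid_roles(node_roles):
    """Return a list of problems for a grid that runs jobs in separate local processes.

    node_roles maps a node name to the set of roles on that node,
    for example {"Node1": {"service", "compute"}}.
    """
    problems = []
    for node, roles in sorted(node_roles.items()):
        if roles == {"compute"}:
            problems.append(
                f"{node}: compute role only; the Data Integration Service cannot start"
            )
        elif roles == {"service"}:
            problems.append(
                f"{node}: service role only; jobs dispatched to this node fail to run"
            )
    return problems


problems = check_grid_roles({
    "Node1": {"service", "compute"},
    "Node2": {"compute"},
    "Node3": {"service"},
})
for problem in problems:
    print(problem)
```

An empty list means that every node in the input has both roles, which is the configuration that this grid type requires.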
Configuring a Grid that Runs Jobs in Local Mode
When a Data Integration Service grid runs mappings, profiles, and workflows, you can configure the Data Integration Service to run jobs in separate DTM processes on local nodes.
To configure a Data Integration Service grid to run mappings, profiles, and workflows in separate local processes, perform the following tasks:
1. Create a grid for mappings, profiles, and workflows that run in separate local processes.
2. Assign the Data Integration Service to the grid.
3. Configure the Data Integration Service to run jobs in separate local processes.
4. Configure a shared log directory.
5. Optionally, configure properties for each Data Integration Service process that runs on a node in the grid.
6. Optionally, configure compute properties for each DTM instance that can run on a node in the grid.
7. Recycle the Data Integration Service.
Step 1. Create a Grid
To create a grid, create the grid object and assign nodes to the grid. You can assign a node to more than one grid when the Data Integration Service is configured to run jobs in the service process or in separate local processes.
When a Data Integration Service grid runs mappings, profiles, and workflows in separate local processes, all nodes in the grid must have both the service and compute roles. When you assign nodes to the grid, select nodes that have both roles.
1. In the Administrator tool, click the Manage tab.
2. Click the Services and Nodes view.
3. In the Domain Navigator, select the domain.
4. On the Navigator Actions menu, click New > Grid.
The Create Grid dialog box appears.
5. Enter the following properties:
Property | Description |
---|---|
Name | Name of the grid. The name is not case sensitive and must be unique within the domain. It cannot exceed 128 characters or begin with @. It also cannot contain spaces or the following special characters: ` ~ % ^ * + = { } \ ; : ' " / ? . , < > \| ! ( ) ] [ |
Description | Description of the grid. The description cannot exceed 765 characters. |
Nodes | Select nodes to assign to the grid. |
Path | Location in the Navigator, such as: DomainName/ProductionGrids |
6. Click OK.
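If you generate grid names in a script before you enter them, the naming rules for the Name property can be checked ahead of time. The following Python sketch is a hedged illustration: the function `grid_name_problems` and the `existing_names` parameter are hypothetical, and the Administrator tool remains the authority on whether a name is accepted.

```python
# Special characters that the Name property does not allow.
FORBIDDEN_CHARS = set("`~%^*+={}\\;:'\"/?.,<>|!()][")
MAX_NAME_LENGTH = 128


def grid_name_problems(name, existing_names=()):
    """Return a list of rule violations for a proposed grid name."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"name exceeds {MAX_NAME_LENGTH} characters")
    if name.startswith("@"):
        problems.append("name begins with @")
    if " " in name:
        problems.append("name contains spaces")
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        problems.append("name contains forbidden characters: " + " ".join(bad))
    # Names are not case sensitive, so compare in lowercase.
    if name.lower() in {existing.lower() for existing in existing_names}:
        problems.append("name is already used in the domain")
    return problems


print(grid_name_problems("ProductionGrid"))
print(grid_name_problems("prod grid.1", existing_names=["ProductionGrid"]))
print(grid_name_problems("productiongrid", existing_names=["ProductionGrid"]))
```

The first call returns an empty list. The second reports the space and the period, and the third reports a duplicate because the comparison ignores case.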
Step 2. Assign the Data Integration Service to the Grid
Assign the Data Integration Service to run on the grid.
1. On the Services and Nodes view, select the Data Integration Service in the Domain Navigator.
2. Select the Properties tab.
3. In the General Properties section, click Edit.
The Edit General Properties dialog box appears.
4. Next to Assign, select Grid.
5. Select the grid to assign to the Data Integration Service.
6. Click OK.
Step 3. Run Jobs in Separate Local Processes
Configure the Data Integration Service to run jobs in separate local processes.
1. On the Services and Nodes view, select the Data Integration Service in the Domain Navigator.
2. Select the Properties tab.
3. In the Execution Options section, click Edit.
The Edit Execution Options dialog box appears.
4. For the Launch Job Options property, select In separate local processes.
5. Click OK.
Step 4. Configure a Shared Log Directory
When the Data Integration Service runs on a grid, a Data Integration Service process can run on each node with the service role. Configure each service process to use the same shared directory for log files. When you configure a shared log directory, you ensure that if the master service process fails over to another node, the new master service process can access previous log files.
1. On the Services and Nodes view, select the Data Integration Service in the Domain Navigator.
2. Select the Processes tab.
3. Select a node to configure the shared log directory for that node.
4. In the Logging Options section, click Edit.
The Edit Logging Options dialog box appears.
5. Enter the location of the shared log directory.
6. Click OK.
7. Repeat the steps for each node listed in the Processes tab so that every service process uses the identical absolute path to the shared log directory.
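After you configure every node, you can cross-check the values that you entered. The following Python sketch assumes that you have collected the log directory for each node into a dictionary by hand. The function name `check_shared_log_dirs`, the node names, and the example paths are hypothetical, and the sketch assumes POSIX-style paths.

```python
from pathlib import PurePosixPath


def check_shared_log_dirs(log_dir_by_node):
    """Return problems if the nodes do not all use one absolute log directory."""
    problems = []
    for node, path in sorted(log_dir_by_node.items()):
        # Assumption: POSIX-style paths, so absolute paths start with "/".
        if not PurePosixPath(path).is_absolute():
            problems.append(f"{node}: log directory {path!r} is not an absolute path")
    distinct = set(log_dir_by_node.values())
    if len(distinct) > 1:
        problems.append(
            "nodes use different log directories: " + ", ".join(sorted(distinct))
        )
    return problems


matching = check_shared_log_dirs({
    "Node1": "/shared/dis/logs",
    "Node2": "/shared/dis/logs",
})
mismatched = check_shared_log_dirs({
    "Node1": "/shared/dis/logs",
    "Node2": "logs",
})
print(matching)
print(mismatched)
```

The comparison is a plain string match, so two spellings of the same directory, such as one with a trailing slash, are reported as different.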
Step 5. Optionally Configure Process Properties
Optionally, configure the Data Integration Service process properties for each node with the service role in the grid. You can configure the service process properties differently for each node.
To configure properties for the Data Integration Service processes, click the Processes view. Select a node with the service role to configure properties specific to that node.
Step 6. Optionally Configure Compute Properties
You can configure the compute properties that the execution Data Transformation Manager (DTM) uses when it runs jobs. When the Data Integration Service runs on a grid, DTM processes run jobs on each node with the compute role. You can configure the compute properties differently for each node.
To configure compute properties for the DTM, click the Compute view. Select a node with the compute role to configure properties specific to DTM instances that run on the node. For example, you can configure a different temporary directory for each node.
When a Data Integration Service grid runs jobs in separate local processes, you can configure the execution options on the Compute view. If you configure environment variables on the Compute view, they are ignored.
Step 7. Recycle the Data Integration Service
After you change Data Integration Service properties, you must recycle the service for the changed properties to take effect.
To recycle the service, select the service in the Domain Navigator and click Recycle the Service.