Data Integration Service Grid Overview

When you enable a Data Integration Service assigned to a grid, a Data Integration Service process runs on each node in the grid that has the service role. If a service process shuts down unexpectedly, the Data Integration Service remains available as long as a service process continues to run on another node. Jobs can run on each node in the grid that has the compute role. The Data Integration Service balances the workload among the nodes based on the job type and the grid configuration.

Grid Configuration by Job Type

SQL data services and web services

When a Data Integration Service grid runs SQL queries and web service requests, configure the service to run jobs in the Data Integration Service process. All nodes in the grid must have both the service and compute roles. The Data Integration Service dispatches jobs to available nodes in a round-robin fashion.
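To make round-robin dispatch concrete, the following is a minimal sketch, not the Data Integration Service's actual implementation. It assumes a hypothetical in-memory model in which each node name maps to a set of role strings, and it uses hypothetical names (`NODES`, `make_dispatcher`, `dispatch`). It rejects any node that lacks either role, then hands each job to the next available node in turn and skips nodes that are currently unavailable.

```python
from itertools import cycle

# Hypothetical grid model: node names mapped to the roles assigned to each node.
NODES = {
    "node01": {"service", "compute"},
    "node02": {"service", "compute"},
    "node03": {"service", "compute"},
}

REQUIRED_ROLES = {"service", "compute"}


def make_dispatcher(nodes):
    """Build a round-robin dispatcher over nodes that have both roles."""
    missing = sorted(n for n, roles in nodes.items() if not REQUIRED_ROLES <= roles)
    if missing:
        # In this configuration, every node in the grid must have both roles.
        raise ValueError(f"nodes without both service and compute roles: {missing}")
    order = sorted(nodes)
    ring = cycle(order)

    def dispatch(job_id, available):
        """Return the next available node, in turn, for job_id."""
        # Advance at most one full lap so unavailable nodes are skipped, not retried forever.
        for _ in range(len(order)):
            node = next(ring)
            if node in available:
                return node
        raise RuntimeError(f"no available node for {job_id}")

    return dispatch


dispatch = make_dispatcher(NODES)
up = {"node01", "node03"}  # assume node02 is currently unavailable
for job in ["query-1", "query-2", "query-3"]:
    print(job, "->", dispatch(job, up))
```

With node02 unavailable, the three jobs go to node01, node03, and node01 in turn.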

SQL data service and web service jobs typically achieve better performance when the Data Integration Service runs jobs in the service process.

Mappings, profiles, and workflows that run in local mode

When a Data Integration Service grid runs mappings, profiles, and workflows, you can configure the service to run jobs in separate DTM processes on the local node. All nodes in the grid must have both the service and compute roles. The Data Integration Service dispatches jobs to available nodes in a round-robin fashion.

When the Data Integration Service runs jobs in separate local processes, stability increases because an unexpected interruption to one job does not affect the other jobs.

Mappings, profiles, and workflows that run in remote mode

When a Data Integration Service grid runs mappings, profiles, and workflows, you can configure the service to run jobs in separate DTM processes on remote nodes. The nodes in the grid can have different combinations of roles. The Data Integration Service designates one node with the compute role as the master compute node. The Service Manager on the master compute node communicates with the Resource Manager Service to dispatch jobs to an available worker compute node. The Resource Manager Service matches job requirements with resource availability to identify the best compute node to run the job.
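The matching step can be pictured with a small sketch. This text does not say which resources the Resource Manager Service compares or how it defines the "best" node. The sketch therefore assumes, purely for illustration, that each worker compute node reports free CPU cores and free memory, and that the preferred node is the one left with the most spare cores, then the most spare memory, after placing the job. All class and function names (`ComputeNode`, `JobRequest`, `pick_worker`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ComputeNode:
    """Hypothetical snapshot of a worker compute node's free resources."""
    name: str
    free_cores: int
    free_memory_mb: int


@dataclass
class JobRequest:
    """Hypothetical resource requirements for one job."""
    name: str
    cores: int
    memory_mb: int


def pick_worker(job, workers):
    """Return the worker that fits the job with the most spare capacity, or None."""
    # Keep only workers that can satisfy every requirement of the job.
    candidates = [
        w for w in workers
        if w.free_cores >= job.cores and w.free_memory_mb >= job.memory_mb
    ]
    if not candidates:
        return None  # no worker can run the job right now
    # Assumed ranking rule: most cores left over, then most memory left over.
    return max(
        candidates,
        key=lambda w: (w.free_cores - job.cores, w.free_memory_mb - job.memory_mb),
    )


workers = [
    ComputeNode("worker01", free_cores=2, free_memory_mb=4096),
    ComputeNode("worker02", free_cores=8, free_memory_mb=16384),
    ComputeNode("worker03", free_cores=4, free_memory_mb=32768),
]
print(pick_worker(JobRequest("m_load_orders", cores=4, memory_mb=8192), workers).name)
```

Here worker01 is too small, and worker02 wins over worker03 because it keeps more cores free after placement. A different ranking rule, such as tightest fit, would change the choice without changing the filtering step.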

When the Data Integration Service runs jobs in separate remote processes, stability increases because an unexpected interruption to one job does not affect the other jobs. In addition, you can make better use of the resources available on each node in the grid. A node that has only the compute role does not have to run the service process, so the machine can use all of its available processing power to run mappings.

Note: Ad hoc jobs, with the exception of profiles, can run in the Data Integration Service process or in separate DTM processes on the local node. Ad hoc jobs include mappings that you run from the Developer tool, as well as previews, scorecards, and drill-downs on profile results that you run from the Developer tool or the Analyst tool. If you configure a Data Integration Service grid to run jobs in separate remote processes, the service runs ad hoc jobs in separate local processes.
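The rule in the note reduces to a small lookup. The sketch below uses hypothetical labels for the three ways the service can run jobs, plus a hypothetical function name (`effective_mode`). It encodes only one fact from the note: a grid configured for separate remote processes runs ad hoc jobs in separate local processes. It does not model the exception for profiles.

```python
# Hypothetical labels for the three ways the service can run jobs.
IN_SERVICE = "in the service process"
LOCAL = "in separate local processes"
REMOTE = "in separate remote processes"


def effective_mode(configured_mode, is_ad_hoc):
    """Return where a job runs, given the configured mode and whether the job is ad hoc."""
    if configured_mode not in (IN_SERVICE, LOCAL, REMOTE):
        raise ValueError(f"unknown mode: {configured_mode!r}")
    # Per the note, a grid configured for remote processes runs ad hoc jobs
    # in separate local processes instead.
    if is_ad_hoc and configured_mode == REMOTE:
        return LOCAL
    return configured_mode


print(effective_mode(REMOTE, is_ad_hoc=True))   # in separate local processes
print(effective_mode(REMOTE, is_ad_hoc=False))  # in separate remote processes
```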

If you run SQL queries or web service requests, and you also run other job types for which stability and scalability are important, create multiple Data Integration Services. Configure one Data Integration Service grid to run SQL queries and web service requests in the Data Integration Service process. Configure the other Data Integration Service grid to run mappings, profiles, and workflows in separate local processes or in separate remote processes.