Grid for SQL Data Services and Web Services
When a Data Integration Service grid runs SQL queries and web service requests, configure the service to run jobs in the Data Integration Service process. All nodes in the grid must have both the service and compute roles.
When you enable a Data Integration Service that runs on a grid, one service process starts on each node with the service role in the grid. The Data Integration Service designates one service process as the master service process, and designates the remaining service processes as worker service processes. When a worker service process starts, it registers itself with the master service process so that the master is aware of the worker.
The master service process manages application deployments and logging. The worker service processes run the SQL data service, web service, and preview jobs. The master service process also acts as a worker service process and completes jobs.
The Data Integration Service balances the workload across the nodes in the grid based on the following job types:
- SQL data services
- When you connect to an SQL data service from a third-party client tool to run queries against the service, the Data Integration Service dispatches the connection directly to a worker service process. To ensure faster throughput, the Data Integration Service bypasses the master service process. When you establish multiple connections to SQL data services, the Data Integration Service uses round robin to dispatch each connection to a worker service process. When you run multiple queries against the SQL data service using the same connection, each query runs on the same worker service process.
- Web services
When you submit a web service request, the Data Integration Service uses an external HTTP load balancer to distribute the request to a worker service process. When you submit multiple requests against web services, the Data Integration Service uses round robin to dispatch each query to a worker service process.
To run web service requests on a grid, you must configure the external HTTP load balancer. Specify the logical URL for the load balancer in the web service properties of the Data Integration Service. When you configure the external load balancer, enter the URLs for all nodes in the grid that have both the service and compute roles. If you do not configure an external HTTP load balancer, web service requests are not distributed across the nodes in the grid. Each web service request runs on the node that receives the request from the web service client.
- Previews
- When you preview a stored procedure output or virtual table data, the Data Integration Service uses round robin to dispatch the first preview query directly to a worker service process. To ensure faster throughput, the Data Integration Service bypasses the master service process. When you preview additional objects from the same login, the Data Integration Service dispatches the preview queries to the same worker service process.
Note: You can run mappings, profiles, and workflows on a Data Integration Service grid that is configured to run jobs in the Data Integration Service process. However, you optimize stability for these job types when the Data Integration Service grid is configured to run jobs in separate DTM processes.
Example Grid that Runs Jobs in the Service Process
In this example, the grid contains three nodes. All nodes have both the service and compute roles. The Data Integration Service is configured to run jobs in the service process.
The following image shows an example Data Integration Service grid configured to run SQL data service, web service, and preview jobs in the Data Integration Service process:
The Data Integration Service manages requests and runs jobs on the following nodes in the grid:
- •On Node1, the master service process manages application deployment and logging. The master service process also acts as a worker service process and completes jobs. The Data Integration Service dispatches a preview request directly to the service process on Node1. The service process creates a DTM instance to run the preview job. SQL data service and web service jobs can also run on Node1.
- •On Node2, the Data Integration Service dispatches SQL queries and web service requests directly to the worker service process. The worker service process creates a separate DTM instance to run each job and complete the request. Preview jobs can also run on Node2.
- •On Node3, the Data Integration Service dispatches two preview requests from a different user login than the preview1 request directly to the worker service process. The worker service process creates a separate DTM instance to run each preview job. SQL data service and web service jobs can also run on Node3.
Rules and Guidelines for Grids that Run Jobs in the Service Process
Consider the following rules and guidelines when you configure a Data Integration Service grid to run SQL data service, web service, and preview jobs in the Data Integration Service process:
- •If the grid contains nodes with the compute role only, the Data Integration Service cannot start.
- •If the grid contains nodes with the service role only, jobs that are dispatched to the service process on the node fail to run.
- •Configure environment variables for the Data Integration Service processes on the Processes view for the service. The Data Integration Service ignores any environment variables configured on the Compute view.
Configuring a Grid that Runs Jobs in the Service Process
When a Data Integration Service grid runs SQL queries against an SQL data service or runs web service requests, configure the Data Integration Service to run jobs in the service process.
To configure a Data Integration Service grid to run SQL queries and web service requests, perform the following tasks:
- 1. Create a grid for SQL data service and web service jobs.
- 2. Assign the Data Integration Service to the grid.
- 3. Configure the Data Integration Service to run jobs in the service process.
- 4. Configure load balancing for web services.
- 5. Configure a shared log directory.
- 6. Optionally, configure properties for each Data Integration Service process that runs on a node in the grid.
- 7. Optionally, configure compute properties for each DTM instance that can run on a node in the grid.
- 8. Recycle the Data Integration Service.
Step 1. Create a Grid
To create a grid, create the grid object and assign nodes to the grid. You can assign a node to more than one grid when the Data Integration Service is configured to run jobs in the service process or in separate local processes.
When a Data Integration Service grid runs SQL queries or web service requests, all nodes in the grid must have both the service and compute roles. When you assign nodes to the grid, select nodes that have both roles.
1. In the Administrator tool, click the Manage tab.
2. Click the Services and Nodes view.
3. In the Domain Navigator, select the domain.
4. On the Navigator Actions menu, click New > Grid.
The Create Grid dialog box appears.
5. Enter the following properties:
Property | Description |
---|
Name | Name of the grid. The name is not case sensitive and must be unique within the domain. It cannot exceed 128 characters or begin with @. It also cannot contain spaces or the following special characters: ` ~ % ^ * + = { } \ ; : ' " / ? . , < > | ! ( ) ] [ |
Description | Description of the grid. The description cannot exceed 765 characters. |
Nodes | Select nodes to assign to the grid. |
Path | Location in the Navigator, such as: DomainName/ProductionGrids |
6. Click OK.
Step 2. Assign the Data Integration Service to the Grid
Assign the Data Integration Service to run on the grid.
1. On the Services and Nodes view, select the Data Integration Service in the Domain Navigator.
2. Select the Properties tab.
3. In the General Properties section, click Edit.
The Edit General Properties dialog box appears.
4. Next to Assign, select Grid.
5. Select the grid to assign to the Data Integration Service.
6. Click OK.
Step 3. Run Jobs in the Service Process
Configure the Data Integration Service to run jobs in the service process.
1. On the Services and Nodes view, select the Data Integration Service in the Domain Navigator.
2. Select the Properties tab.
3. In the Execution Options section, click Edit.
The Edit Execution Options dialog box appears.
4. For the Launch Job Options property, select In the service process.
5. Click OK.
Step 4. Configure Load Balancing for Web Services
To run web service requests on a grid, you must configure an external HTTP load balancer. If you do not configure an external HTTP load balancer, the Data Integration Service runs the web service on the node that receives the request.
To configure load balancing, specify the logical URL for the load balancer in the Data Integration Service properties. Then, configure the external load balancer to distribute web service requests to all nodes in the grid that have both the service and compute roles.
1. Complete the following steps in the Administrator tool to configure the Data Integration Service to communicate with the external HTTP load balancer:
- a. On the Services and Nodes view, select the Data Integration Service in the Domain Navigator.
- b. Select the Properties tab.
- c. In the Web Service Properties section, click Edit.
The Edit Web Service Properties window appears.
- d. Enter the logical URL for the external HTTP load balancer, and then click OK.
2. Configure the external load balancer to distribute requests to all nodes in the grid that have both the service and compute roles.
Step 5. Configure a Shared Log Directory
When the Data Integration Service runs on a grid, a Data Integration Service process can run on each node with the service role. Configure each service process to use the same shared directory for log files. When you configure a shared log directory, you ensure that if the master service process fails over to another node, the new master service process can access previous log files.
1. On the Services and Nodes view, select the Data Integration Service in the Domain Navigator.
2. Select the Processes tab.
3. Select a node to configure the shared log directory for that node.
4. In the Logging Options section, click Edit.
The Edit Logging Options dialog box appears.
5. Enter the location to the shared log directory.
6. Click OK.
7. Repeat the steps for each node listed in the Processes tab to configure each service process with identical absolute paths to the shared directories.
Step 6. Optionally Configure Process Properties
Optionally, configure the Data Integration Service process properties for each node with the service role in the grid. You can configure the service process properties differently for each node.
To configure properties for the Data Integration Service processes, click the Processes view. Select a node with the service role to configure properties specific to that node.
Step 7. Optionally Configure Compute Properties
You can configure the compute properties that the execution Data Transformation Manager (DTM) uses when it runs jobs. When the Data Integration Service runs on a grid, DTM processes run jobs on each node with the compute role. You can configure the compute properties differently for each node.
To configure compute properties for the DTM, click the Compute view. Select a node with the compute role to configure properties specific to DTM instances that run on the node. For example, you can configure a different temporary directory for each node.
When a Data Integration Service grid runs jobs in the Data Integration Service process, you can configure the execution options on the Compute view. If you configure environment variables on the Compute view, they are ignored.
Step 8. Recycle the Data Integration Service
After you change Data Integration Service properties, you must recycle the service for the changed properties to take effect.
To recycle the service, select the service in the Domain Navigator and click Recycle the Service.