High Availability for the Data Integration Service
High availability for the Data Integration Service minimizes interruptions to data integration tasks. High availability enables the Service Manager and the Data Integration Service to react to network failures and failures of the Data Integration Service.
Depending on your license, the Data Integration Service provides the following high availability features:
- Restart and Failover
- When a Data Integration Service process becomes unavailable, the Service Manager tries to restart the process or fails the process over to another node based on the service configuration.
- Recovery
- When a Data Integration Service process shuts down unexpectedly, the Data Integration Service can automatically recover canceled workflow instances.
For information about configuring a highly available domain, see the Informatica Administrator Guide.
Data Integration Service Restart and Failover
When a Data Integration Service process becomes unavailable, the Service Manager restarts the Data Integration Service process on the same node or on a backup node.
The restart and failover behavior depends on how you configure the Data Integration Service to run:
- Single node
- When the Data Integration Service runs on a single node and the service process shuts down unexpectedly, the Service Manager tries to restart the service process. If the Service Manager cannot restart the process, the process stops or fails.
- Primary and backup nodes
- When the Data Integration Service runs on primary and backup nodes and the service process shuts down unexpectedly, the Service Manager tries to restart the service process. If the Service Manager cannot restart the process, the Service Manager fails the service process over to a backup node.
A Data Integration Service process fails over to a backup node in the following situations:
  - The Data Integration Service process fails and the primary node is not available.
  - The Data Integration Service process is running on a node that fails.
- Grid
- When the Data Integration Service runs on a grid, the restart and failover behavior depends on whether the master or worker service process becomes unavailable.
If the master service process shuts down unexpectedly, the Service Manager tries to restart the process. If the Service Manager cannot restart the process, the Service Manager elects another node to run the master service process. The remaining worker service processes register themselves with the new master. The master service process then reconfigures the grid to run on one fewer node.
If a worker service process shuts down unexpectedly, the Service Manager tries to restart the process. If the Service Manager cannot restart the process, the master service process reconfigures the grid to run on one fewer node.
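The behavior described for the three configurations can be summarized as a small decision function. The following Python sketch is only a model of the text above, not Informatica code or an Informatica API. The names `Config` and `next_action` are hypothetical, and the returned strings simply paraphrase the outcomes described in this section.

```python
from enum import Enum


class Config(Enum):
    """Hypothetical labels for the three configurations described above."""
    SINGLE_NODE = "single node"
    PRIMARY_BACKUP = "primary and backup nodes"
    GRID = "grid"


def next_action(config, restart_succeeded, is_master=False):
    """Describe what happens after a service process shuts down unexpectedly.

    `restart_succeeded` models whether the Service Manager's restart
    attempt worked. `is_master` only matters for the grid configuration.
    """
    # In every configuration the Service Manager first tries a restart.
    if restart_succeeded:
        return "process restarted"
    if config is Config.SINGLE_NODE:
        return "process stops or fails"
    if config is Config.PRIMARY_BACKUP:
        return "process fails over to a backup node"
    # Grid: the outcome depends on which kind of process was lost.
    if is_master:
        return "new master elected; grid reconfigured to run on one fewer node"
    return "grid reconfigured to run on one fewer node"
```

For example, `next_action(Config.PRIMARY_BACKUP, restart_succeeded=False)` returns the failover outcome, while the same call with `Config.SINGLE_NODE` returns the stopped or failed outcome.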
The Service Manager restarts the Data Integration Service process according to two domain properties: the length of the restart period, and the maximum number of restart attempts allowed within that period.
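One plausible reading of these two limits is a retry loop that stops when either the attempt count or the time window is exhausted. The Python sketch below illustrates that reading only. The function `restart_with_limits` and its parameters `max_attempts` and `restart_period_seconds` are hypothetical names, not the actual domain property names, and the Service Manager's real scheduling of attempts may differ.

```python
import time


def restart_with_limits(try_restart, max_attempts, restart_period_seconds,
                        clock=time.monotonic):
    """Call `try_restart` until it succeeds or a limit is reached.

    `try_restart` is a caller-supplied function (hypothetical) that
    returns True when the process came back up. Returns True on a
    successful restart and False when the limits are exhausted.
    """
    deadline = clock() + restart_period_seconds
    attempts = 0
    # Stop as soon as either limit is hit: attempt count or time window.
    while attempts < max_attempts and clock() < deadline:
        attempts += 1
        if try_restart():
            return True
    return False
```

In this model, a False result corresponds to the point at which the process stops, fails, or fails over, depending on the configuration.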
The Data Integration Service clients are resilient to temporary connection failures during restart and failover of the service.
Data Integration Service Failover Configuration
When you configure the Data Integration Service to run on multiple nodes, verify that each node has access to the source and output files that the Data Integration Service requires to process data integration tasks such as workflows and mappings. For example, a workflow might require parameter files, input files, or output files.
To access logs for completed data integration tasks after a failover occurs, configure a shared directory for the Data Integration Service process Logging Directory property.
Data Integration Service Recovery
The Data Integration Service can recover some workflows that are enabled for recovery. Workflow recovery is the completion of a workflow instance from the point of interruption.
A running workflow instance can be interrupted when an error occurs, when you cancel the workflow instance, when you restart a Data Integration Service, or when a Data Integration Service process shuts down unexpectedly. If you abort the workflow instance, the instance is not recoverable.
The Data Integration Service performs workflow recovery based on the state of the workflow tasks, the values of the workflow variables and parameters during the interrupted workflow instance, and whether the recovery is manual or automatic.
Based on your license, you can configure automatic recovery of workflow instances. If you enable a workflow for automatic recovery, the Data Integration Service automatically recovers the workflow when the Data Integration Service restarts.
If the Data Integration Service runs on a grid and the master service process fails over, all nodes retrieve object state information from the Model repository. The new master automatically recovers workflow instances that were running during the failover and that are configured for automatic recovery.
The Data Integration Service does not automatically recover workflows that are not configured for automatic recovery. You can manually recover these workflows if they are enabled for recovery.
Any SQL data service, web service, mapping, profile, and preview jobs that were running during the failover are not recovered. You must manually restart these jobs.
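The recovery rules in this section can be condensed into a short classification. The Python sketch below is a simplified model under stated assumptions: the `Job` class, its fields, and `recovery_outcome` are hypothetical and not part of any Informatica API. The model ignores licensing and the state of individual workflow tasks, both of which the text says also affect recovery.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical record of a job that was running during a failover."""
    name: str
    kind: str                        # e.g. "workflow", "mapping", "profile"
    recovery_enabled: bool = False   # workflow is enabled for recovery
    automatic_recovery: bool = False # workflow is set for automatic recovery
    aborted: bool = False            # the workflow instance was aborted


def recovery_outcome(job):
    """Classify a job according to the rules described in this section."""
    # Only workflows are recovered; other job types need a manual restart.
    if job.kind != "workflow":
        return "not recovered; restart manually"
    # Aborted instances and workflows not enabled for recovery are out.
    if job.aborted or not job.recovery_enabled:
        return "not recoverable"
    if job.automatic_recovery:
        return "recovered automatically"
    return "can be recovered manually"
```

For example, `recovery_outcome(Job("load_orders", "mapping"))` returns the manual restart outcome, while a workflow with both flags set is classified as recovered automatically.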