Managing application ingestion and replication jobs
After you configure and run application ingestion and replication jobs, you might occasionally need to perform job management tasks such as stopping, resuming, undeploying, or redeploying jobs.
Stopping an application ingestion and replication job
You can stop an application ingestion and replication job of any load type that is in the Up and Running, Running with Warning, or On Hold status.
When you stop an incremental load job, Application Ingestion and Replication records an identifier for the position in the change stream where it has stopped the incremental processing. The identifier is stored in a recovery table named INFORMATICA_CDC_RECOVERY on the target. If you restart the job, Application Ingestion and Replication uses this identifier to identify the last change record that was loaded to the target and starts loading the changes that were made after that point in the change stream.
Note: Before stopping an application ingestion and replication incremental load or combined load job that has a Snowflake target and uses the Superpipe option, ensure that all ingested data in the change stream has been merged into the target. Otherwise, if the job is not restarted for an extended period and the stream expires, any data that remains in the stream is lost and cannot be recovered.
For initial load jobs, the job stops only after the subtasks that are already running complete their processing. Subtasks that are not running remain in their current states.
1. Navigate to the row for the job that you want to stop in any of the following monitoring interfaces:
- My Jobs page that's accessed from the navigation bar of the Home page
- All Jobs page in Monitor
- All Jobs tab on the Data Ingestion and Replication page in Operational Insights
2. In the Actions menu for the row, select Stop, or click the Stop icon next to the menu.
The job state switches to Stopping and then to Stopped.
Tip: If the Stop operation is taking too long, you can abort the job.
Resuming an application ingestion and replication job
You can resume an application ingestion and replication job that is in the Stopped, Aborted, or Failed status.
You can resume an application ingestion and replication job from the My Jobs page in the Data Ingestion and Replication service or from the All Jobs tab on the Data Ingestion and Replication page in Operational Insights.
When you resume an initial load job that has multiple subtasks, Application Ingestion and Replication starts only the subtasks that are in the Failed, Stopped, Aborted, or Queued status.
When you resume an incremental load job or a combined initial and incremental load job, Application Ingestion and Replication resumes replicating source data changes from the last position recorded in the checkpoint file or target recovery table, so that the job picks up where it left off. A checkpoint is not available unless a change record was processed for at least one of the tables during the first job run after deployment. If a checkpoint is not available, the job resumes processing from the configured restart point, which is the latest available position in the change stream by default.
1. Navigate to the row for the job that you want to resume in any of the following monitoring interfaces:
- My Jobs page that's accessed from the navigation bar of the Home page
- All Jobs page in the Monitor service
- All Jobs tab on the Data Ingestion and Replication page in Operational Insights
2. In the Actions menu for the row, click Resume, or click the Resume icon next to the menu.
Note: The Resume command is not available if the job is in the Failed state because the task deployment failed.
A subtask is started for each source table.
If an error occurs, an error message is displayed at the top of the page.
Restart and recovery for incremental load jobs
Application Ingestion and Replication can restart incremental load jobs that stopped because of an error, as well as jobs that users stopped or aborted, without any loss of change data.
After the first job run, Application Ingestion and Replication continually records an identifier for the processing position in the change stream as changes are applied to the target. The identifier is stored in a recovery table named INFORMATICA_CDC_RECOVERY on the target.
When you resume an incremental load job, the job uses the last position recorded in the recovery table to identify the change records that it must load to the target. This process ensures that all changes are ingested to the target.
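As a rough sketch, the restart logic described above can be modeled as follows. The function and argument names are illustrative only and are not part of the Application Ingestion and Replication product: a resumed job prefers the recovery-table checkpoint, falls back to the configured restart point, and defaults to the latest available position in the change stream.

```python
def resume_position(checkpoint, configured_restart_point, latest_position):
    """Pick the change-stream position from which a resumed job continues.

    checkpoint: the last position recorded in the recovery table, or None
    if no change record was processed during the first run after deployment.
    configured_restart_point: the restart point configured for the task,
    or None to use the default (the latest available position).
    """
    if checkpoint is not None:
        # A checkpoint exists: continue from the last change that was
        # applied to the target, so no change data is lost or reloaded.
        return checkpoint
    if configured_restart_point is not None:
        # No checkpoint yet: fall back to the configured restart point.
        return configured_restart_point
    # Default restart point: the latest available position in the stream.
    return latest_position
```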
Overriding schema drift options when resuming an application ingestion and replication job
You can override the schema drift options when you resume an application ingestion and replication job that is in the Stopped, Aborted, or Failed state. The overrides affect only those objects that are currently in the Error state because of the Stop Object, Stop Table, Stop Report, or Stop Job schema drift option. Use the overrides to correct or resolve these errors.
You can override schema drift options and resume an incremental load job or a combined initial and incremental load job either from the My Jobs page in the Data Ingestion and Replication service or from the All Jobs tab on the Data Ingestion and Replication page in Operational Insights.
1. Navigate to the row for the job that you want to resume with an override.
2. In the Actions menu for the row, click Resume With Options.
Note: The Resume With Options command is not available if the job is in the Failed state because the task deployment failed.
The Resume Options dialog box appears.
3. In the Schema Drift Options list, select the schema drift option that will be used to process the DDL operation on the source that caused the application ingestion and replication job to stop.
The following schema drift options are available:
- Ignore. Do not replicate DDL changes that occur on the source to the target.
- Stop Table. Stop processing the source object on which the DDL change occurred.
  Important: The application ingestion and replication job cannot retrieve the data changes that occurred on the source object after the job stopped processing it. Consequently, data loss might occur on the target. To avoid data loss, you will need to resynchronize the source and target objects that the job stopped processing. Use the Resume With Options > Resync option.
- Resync. Resynchronize the target table with the source object. Use this option for objects that the job stopped processing because of the Stop Object, Stop Table, or Stop Report setting for a Schema Drift option.
  Important: This option is available only for combined initial and incremental load jobs.
- Replicate. Allow the application ingestion and replication job to replicate the DDL change to the target.
  Important: If you specify the Replicate option for Rename Column operations on Microsoft Azure Synapse Analytics targets, the job will end with an error.
4. Click Resume With Options.
The resumed job will use the schema drift option that you specified in step 3 to process the schema change that caused the job to stop. Thereafter, the schema drift options that you specified when creating the task take effect again.
Important: Application Ingestion and Replication processes a schema change to a source object only after a DML operation occurs on the object. Therefore, after you resume a job, the object subtask state remains unchanged until the first DML operation occurs on the object.
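The one-time nature of the override can be sketched as follows. This is an illustrative model, not product code: only the first pending DDL event, which is the one that caused the job to stop, is processed with the override, and later DDL events revert to the option configured on the task.

```python
def apply_drift_options(pending_ddl_events, configured_option, override_option):
    """Pair each pending DDL event with the schema drift option that applies.

    Only the first event, which caused the job to stop, uses the override;
    all later events use the option configured on the task again.
    """
    return [
        (event, override_option if i == 0 else configured_option)
        for i, event in enumerate(pending_ddl_events)
    ]
```

For example, resuming with a Replicate override after a job stopped on a Rename Column operation applies Replicate to that operation only, while a later Add Column operation is processed with the task's configured option.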
Redeploying an application ingestion and replication job
Some fields in application ingestion and replication tasks are editable without undeploying the associated application ingestion and replication jobs. If you edit any of these fields in an application ingestion and replication task without undeploying the associated job, you must redeploy the job for the changes to take effect.
The redeploy operation stops each job subtask for a source table, deploys the updated ingestion task, and then starts the subtasks. The subtasks that are started include the subtasks that were previously stopped and any subtasks that are newly created because of the configuration changes in the task.
Note: For incremental load jobs and combined initial and incremental load jobs, the redeployment does not change the source objects that were selected for ingestion during the previous deployment. To update the list of objects, you must edit the object selection rules in the associated task, and then redeploy the job.
1. Navigate to the row for the job that you want to redeploy in any of the following monitoring interfaces:
- My Jobs page that's accessed from the navigation bar of the Home page
- All Jobs page in Monitor
- All Jobs tab on the Data Ingestion and Replication page in Operational Insights
2. In the Actions menu for the row, select Redeploy.
The job instance automatically starts running.
If the job was running when you selected Redeploy, Application Ingestion and Replication stops the job, redeploys the ingestion task, and then restarts the job.
Undeploying an application ingestion and replication job
You can undeploy an application ingestion and replication job that has a status of Aborted, Completed, Deployed, Failed, or Stopped. You might want to undeploy a job if you no longer need to run the job or you need to change a connection or property in the associated task that cannot be edited without first undeploying the job.
Before you attempt to undeploy a job, ensure that it is not running.
After you undeploy a job, you cannot run it again or redeploy it. If you want to run a job that is undeployed, you must deploy the associated task again from the application ingestion and replication task wizard to create a new job instance. For example, if you want to change the target connection for a job, you must undeploy the job, edit the ingestion task to change the connection, deploy the task again, and then run the new job instance.
1. Navigate to the row for the job that you want to undeploy in any of the following monitoring interfaces:
- My Jobs page that's accessed from the navigation bar of the Home page
- All Jobs page in Monitor
- All Jobs tab on the Data Ingestion and Replication page in Operational Insights
2. In the Actions menu for the row, click Undeploy.
If the undeploy operation fails, the job status switches to Failed, even if the job was previously in the Aborted state. A job that was already in the Failed state remains Failed.
Note: After undeploying jobs, do not immediately shut down the Secure Agent. Application Ingestion and Replication requires some time to clean up files for tasks in the /root/infaagent/apps/Database_Ingestion/data/tasks directory.
Aborting an application ingestion and replication job
You can abort an application ingestion and replication job of any load type that is in the Up and Running, Running with Warning, On Hold, or Stopping status.
You can abort an application ingestion and replication job from the My Jobs page in the Data Ingestion and Replication service or from the All Jobs tab on the Data Ingestion and Replication page in Operational Insights.
When you abort an incremental load job, Application Ingestion and Replication records an identifier for the position in the change stream where it has stopped the incremental processing. The identifier is stored in a recovery table named INFORMATICA_CDC_RECOVERY on the target. If you restart the job, Application Ingestion and Replication uses this identifier to identify the last change record that was loaded to the target and starts loading the changes that were made after that point in the change stream.
For initial load jobs, the subtasks that are already running stop immediately, and then the job stops. The subtasks that are not running remain in their current states.
On the Actions menu for the job that you want to abort, select Abort.
The status of the job changes to Aborting and then changes to Aborted.
For initial load jobs, the status of the subtasks that were running changes to Aborted. For incremental load jobs, the status of the subtasks changes to Stopped.
Resynchronizing source and target objects
You can resynchronize source and target objects for a subtask that is part of a running application ingestion and replication job of combined initial and incremental load type. The subtask must be in a state other than Queued or Starting.
Resynchronization loads the latest source data to the target to make sure that the source and target objects are consistent. Usually, the target object contents are truncated before the current source data is applied. However, for data lake targets, a T (truncate) operation is replicated instead of actually truncating the target contents.
For example, you might want to resynchronize the target with the source if initial load or incremental load processing failed or if you want to start the job over again from a specific restart point.
1. Drill down on the application ingestion and replication job that you want to resynchronize from one of the following monitoring interfaces:
- My Jobs page that's accessed from the navigation bar of the Home page
- All Jobs page in Monitor
- All Jobs tab on the Data Ingestion and Replication page in Operational Insights
The job must be in the Up and Running state and be for a combined initial and incremental load operation.
2. Click the Object Detail tab.
3. In the subtask row for the source and target objects that you want to resynchronize, click the Actions menu and select Resync. The resync operation resynchronizes the target table with the latest source table definition, including any DDL changes.
Note: For the Actions menu and Resync option to be available, the subtask must be in a state other than Queued or Starting.
If you resynchronize a subtask for an application ingestion and replication combined load job that has a NetSuite, Salesforce, or SAP Mass Ingestion source, use one of the following resync options instead of the Resync option:
- Resync (refresh). Use this option to resynchronize the target table with the latest source object definition, including any DDL change that schema drift ignored. After the target table is refreshed, the target table structure matches the current source object structure. This option mimics the behavior of the Resync option.
- Resync (retain). Use this option to resynchronize the same fields that have been processed for CDC, retaining the current structure of the source and target tables. No checks for changes in the source or target table definitions are performed. If source DDL changes affected the source object structure, those changes are not processed.
Notes:
- If the source table contains many rows, the resynchronization might take a long time to perform.
- If the source table schema does not match the target table schema, the ingestion subtask drops the target table and creates a new table that matches the source schema. Regardless of whether the target tables are re-created, the subtask truncates the target tables and then reloads source data to the tables.
- When you resync an application ingestion and replication subtask with a Snowflake target and use Audit apply mode, you can retain the audit information. Data Ingestion and Replication re-creates the target table and renames the existing table that contains the audit information by appending a timestamp, in the format <target_table_name>_<current_UTC_timestamp>. If you want the audit information in the actual target table, you need to load it, for example, with a join operation. If adding the timestamp to the existing table name causes the name to exceed the maximum number of characters, the subtask fails with an error. If you enable schema drift and a schema drift change, such as Add Column, occurs, the new column appears in the re-created target table but not in the renamed table. To enable this behavior, set the backupTargetTableBeforeResync custom property to true on the Target page of the task wizard.
Consider the following limitations when you resync a combined load job that has an existing audit table:
- Storing the existing audit table on the target consumes extra database storage.
- To obtain a unified view of audit information, you need to join the multiple versions of the audit tables.
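A minimal sketch of the renaming scheme described in the notes above, assuming a timestamp format of YYYYMMDDHHMMSS and a 255-character identifier limit. Both are illustrative assumptions: the documentation specifies only the <target_table_name>_<current_UTC_timestamp> pattern, and the helper function is not part of the product.

```python
from datetime import datetime, timezone

MAX_IDENTIFIER_LENGTH = 255  # assumed identifier limit, for illustration only

def backup_table_name(target_table_name, now=None):
    """Build the backup name <target_table_name>_<current_UTC_timestamp>.

    Raises ValueError when the result exceeds the assumed identifier
    limit, mirroring the subtask failure described in the notes.
    """
    now = now or datetime.now(timezone.utc)
    name = f"{target_table_name}_{now.strftime('%Y%m%d%H%M%S')}"
    if len(name) > MAX_IDENTIFIER_LENGTH:
        raise ValueError("backup table name exceeds the identifier limit")
    return name
```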