
Managing database ingestion and replication jobs

After you configure and run database ingestion and replication tasks, you might occasionally need to perform some job management tasks, such as stopping, resuming, undeploying, or redeploying jobs.

Stopping a database ingestion and replication job

You can stop a database ingestion and replication job of any load type that is in the Up and Running, Running with Warning, or On Hold state.
For an incremental load job, the job stops after a checkpoint is taken. A checkpoint records the point in the change stream where incremental processing left off for recovery purposes.
For a combined initial and incremental load job, initial load subtasks that are running are allowed to run to completion, and initial load subtasks that are not running remain in their current states. For the incremental load portion of the job, a checkpoint is written to the checkpoint file or target recovery table before the job stops. The database ingestion and replication job cannot record a checkpoint unless a change record has been processed for at least one of the tables in the job during the first job run after deployment. If a checkpoint is not available, the job resumes processing from the configured restart point, which is the latest available position in the change stream by default.
For an initial load job, any running subtasks are allowed to run to completion and then the job stops. Non-running subtasks remain in their current states.
    1. Navigate to the row for the job that you want to stop in any of the following monitoring interfaces: the My Jobs page in Data Integration, the All Jobs page in Monitor, or the Data Ingestion and Replication page in Operational Insights.
    2. Click the Actions menu for the job and select Stop.
    The job state switches to Stopping and then to Stopped.
    Tip: If the Stop operation is taking too long, you can abort the job.

Aborting a database ingestion and replication job

You can abort a database ingestion and replication job that is in the Up and Running, On Hold, Running with Warning, or Stopping state.
For an incremental load job, the job stops immediately after a checkpoint is taken. A checkpoint records the point in the change stream where incremental processing left off for recovery purposes.
For a combined initial and incremental load job, any running initial load subtasks stop immediately. For the incremental load portion of the job, a checkpoint is taken and then the job stops.
For an initial load job, any running subtasks stop immediately and then the job stops. Non-running subtasks remain in their current states.
    1. Navigate to the row for the job that you want to abort in any of the following monitoring interfaces: the My Jobs page in Data Integration, the All Jobs page in Monitor, or the Data Ingestion and Replication page in Operational Insights.
    2. From the Actions menu for the job, select Abort.
    The job state switches to Aborting and then to Aborted.
    For initial load jobs, the state of started and running subtasks switches to Aborted. For incremental load or combined initial and incremental load jobs, the state of subtasks switches to Stopped.

Resuming a database ingestion and replication job

You can resume a database ingestion and replication job that is in the Stopped, Aborted, or Failed state.
When you resume an initial load job that has multiple subtasks, Database Ingestion and Replication starts only the subtasks that are in a Failed, Stopped, Aborted, or Queued state.
When you resume an incremental load job or a combined initial and incremental load job, Database Ingestion and Replication resumes replicating source data changes from the last position recorded in the checkpoint file or target recovery table. A checkpoint will not be available unless a change record was processed for at least one of the tables during the first job run after deployment. If a checkpoint is not available, the job resumes processing from the configured restart point, which is the latest available position in the change stream by default.
Note: For initial load jobs, the Run command might also be available. Click Run if you want the database ingestion and replication job to truncate all of the target tables and then reload the source data to the target tables.
    1. Navigate to the row for the job that you want to resume in any of the following monitoring interfaces: the My Jobs page in Data Integration, the All Jobs page in Monitor, or the Data Ingestion and Replication page in Operational Insights.
    2. In the Actions menu for the row, click Resume.
    Note: The Resume command is not available if the job is in the Failed state because the task deployment failed.
    A subtask is started for each source table.
    If an error occurs, an error message is displayed at the top of the page.
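The state-based rule for which subtasks a Resume starts in an initial load job can be sketched as follows. The function name and the state representation are illustrative, not part of any product API:

```python
# Sketch of the subtask-selection rule when resuming an initial load job:
# only subtasks in a Failed, Stopped, Aborted, or Queued state are started.
# Illustrative names; Database Ingestion and Replication's internals differ.

RESUMABLE_STATES = {"Failed", "Stopped", "Aborted", "Queued"}

def subtasks_to_start(subtasks):
    """Return the names of subtasks that a Resume would start.

    subtasks: list of (table_name, state) pairs.
    """
    return [name for name, state in subtasks if state in RESUMABLE_STATES]

# Completed subtasks are left alone; only the others restart.
subtasks = [("ORDERS", "Completed"), ("ITEMS", "Failed"), ("USERS", "Queued")]
print(subtasks_to_start(subtasks))  # ['ITEMS', 'USERS']
```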

Overriding schema drift options when resuming a database ingestion and replication job

You can override the schema drift options when you resume a database ingestion and replication job that is in the Stopped, Aborted, or Failed state. The overrides affect only those tables that are currently in the Error state because of the Stop Table or Stop Job Schema Drift option. Use the overrides to correct or resolve these errors.
You can override schema drift options and resume an incremental load job or a combined initial and incremental load job from the All Jobs tab on the Data Ingestion and Replication page in Operational Insights.
    1. Navigate to the row for the job that you want to resume with an override.
    2. Click the Actions menu for the row and select Resume With Options.
    Note: The Resume With Options command is not available if the job is in the Failed state because the task deployment failed.
    The Resume Options dialog box appears.
    3. In the Schema Drift Options list, select the schema drift option to use to process the DDL operation on the source that caused the database ingestion and replication job to stop.
    The following table describes the schema drift options:
    Ignore
        Do not replicate DDL changes that occur on the source database to the target.
    Stop Table
        Stop processing the source table on which the DDL change occurred.
        Important: The database ingestion and replication job cannot retrieve the data changes that occurred on the source table after the job stopped processing it. Consequently, data loss might occur on the target. To avoid data loss, you will need to resynchronize the source and target objects that the job stopped processing. Use the Resume With Options > Resync option.
    Resync
        Resynchronize the target tables with the latest source table definitions, including any DDL changes that schema drift ignored. Use this option for tables that the job stopped processing because of the Stop Table setting for a schema drift option.
        Important: This option is available only for combined initial and incremental load jobs.
    Resync (refresh)
        For database ingestion and replication combined load jobs that have an Oracle or SQL Server source, use this option to resynchronize the target tables with the latest source table definitions, including any DDL changes that schema drift ignored. After the target tables are refreshed, the structures of the source and target tables match. This option mimics the behavior of the Resync option.
    Resync (retain)
        For database ingestion and replication combined load jobs that have an Oracle or SQL Server source, use this option to resynchronize the same columns that have been processed for CDC, retaining the current structure of the source and target tables. No checks for changes to the source or target table definitions are performed. If source DDL changes affected the source table structure, those changes are not processed.
    Replicate
        Allow the database ingestion and replication job to replicate the DDL change to the target.
        Important: If you specify the Replicate option for Rename Column operations on Microsoft Azure Synapse Analytics targets, the job will end with an error.
    4. Click Resume With Options.
    The resumed job will use the schema drift option that you specified in step 3 to process the schema change that caused the job to stop. Thereafter, the schema drift options that you specified when creating the task take effect again.
    Important: Database Ingestion and Replication processes a schema change to a source table only after a DML operation occurs on the table. Therefore, after you resume a job, the table subtask state remains unchanged until the first DML operation occurs on the table.
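The one-time nature of the override can be sketched as follows: the selected option applies only to the DDL change that stopped the job, and the options configured in the task apply to all later changes. All names here are illustrative, not a product API:

```python
# Sketch: a Resume With Options override applies only to the pending DDL
# change; later DDL changes are processed with the task's configured
# schema drift options again. Illustrative names only.

def plan_ddl_handling(pending_change, later_changes, override, configured):
    """Return (ddl_change, option_used) pairs in processing order."""
    plan = [(pending_change, override)]                     # one-time override
    plan += [(change, configured) for change in later_changes]  # task settings again
    return plan

plan = plan_ddl_handling(
    pending_change="ADD COLUMN",
    later_changes=["MODIFY COLUMN", "RENAME COLUMN"],
    override="Replicate",   # option chosen in the Resume Options dialog
    configured="Ignore",    # option chosen when the task was created
)
print(plan)
# [('ADD COLUMN', 'Replicate'), ('MODIFY COLUMN', 'Ignore'), ('RENAME COLUMN', 'Ignore')]
```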

Redeploying a database ingestion and replication job

Redeploy a database ingestion and replication job after editing available fields in the associated database ingestion and replication task so that the new settings can take effect.
You can edit some but not all of the fields in an ingestion task definition that has been previously deployed, without first undeploying the job. You can add a source table and change any of the runtime and target options that are available for editing. For example, you might want to reset some target options to test the effects of different settings.
The redeploy operation stops each job subtask for a source table, deploys the updated database ingestion and replication task, and automatically starts the subtasks that were stopped and any new subtasks for added source tables.
    1. Navigate to the row for the job that you want to redeploy in any of the following monitoring interfaces: the My Jobs page in Data Integration, the All Jobs page in Monitor, or the Data Ingestion and Replication page in Operational Insights.
    2. In the Actions menu for the row, select Redeploy.
    The job instance automatically starts running.
    If the job was running when you selected Redeploy, Database Ingestion and Replication stops the job and then redeploys the database ingestion and replication task and restarts the job.

Undeploying a database ingestion and replication job

Undeploy a database ingestion and replication job if you no longer need to run the job, the job is in the Failed state, or you need to change a connection or property in the associated task that cannot be edited without first undeploying the job.
Before you attempt to undeploy a job, ensure that it is not running.
After the job is undeployed, you cannot run it again or redeploy it. If you want to run a job for the associated database ingestion and replication task again, you must deploy the task again from the task wizard to create a new job instance. For example, if you want to change the target connection, undeploy the job, edit the task to change the connection, deploy the task again, and then run the new job instance.
    1. Navigate to the row for the job that you want to undeploy in any of the following monitoring interfaces: the My Jobs page in Data Integration, the All Jobs page in Monitor, or the Data Ingestion and Replication page in Operational Insights.
    2. In the Actions menu for the row, click Undeploy.
    If the undeploy operation fails, the job status switches to Failed, even if the job was in the Aborted state. If the status was already Failed, it remains Failed.
    Note: After undeploying jobs, do not immediately shut down the Secure Agent. Database Ingestion and Replication requires some time to clean up files for tasks in the /root/infaagent/apps/Database_Ingestion/data/tasks directory.

Resynchronizing source and target objects

You can resynchronize source and target objects for a subtask that is part of a running database ingestion and replication combined initial and incremental load job. The subtask must be in a state other than Queued or Starting.
For example, you might want to resynchronize the target with the source if initial load or incremental load processing failed or if you want to start the job over again from a specific restart point.
Important: To resynchronize tables that stopped and are currently in the Error state because of the Schema Drift setting of Stop Table, you must use the Resume With Options > Resync option in the Actions menu. For more information, see Overriding schema drift options when resuming a database ingestion and replication job.
    1. Drill down on the database ingestion and replication job that you want to resynchronize from one of the following monitoring interfaces: the My Jobs page in Data Integration, the All Jobs page in Monitor, or the Data Ingestion and Replication page in Operational Insights.
    The job must be in the Up and Running state and be for a combined initial and incremental load operation.
    2. Click the Object Detail tab.
    3. In the subtask row for the source and target objects that you want to resynchronize, click the Actions menu and select Resync. The resync operation resynchronizes the target table with the latest source table definition, including any DDL changes.
    Note: For the Actions menu and Resync option to be available, the subtask must be in a state other than Queued or Starting.
    If you are resynchronizing a subtask for a database ingestion and replication combined load job that has an Oracle or SQL Server source, use the Resync (refresh) or Resync (retain) option instead of the Resync option.

Restart and recovery for incremental change data processing

Database Ingestion and Replication can restart incremental load and combined initial and incremental load jobs that stopped because of an error or a user stop request without losing change data.
After the first job run, Database Ingestion and Replication continually records an identifier for the processing position in the change stream as changes are applied to the target. For file-based targets such as Amazon S3, Azure Data Lake Storage, Google Cloud Storage, Kafka, and Oracle Cloud Object Storage, the identifier is stored in a checkpoint file. For database targets, the identifier is stored in a generated recovery table, called INFORMATICA_CDC_RECOVERY, on the target.
Note: For the first run of an incremental load job, Database Ingestion and Replication uses the start point that you set in the Initial Start Point for Incremental Load field when defining the database ingestion and replication task.
If incremental change data processing ends abnormally or in response to a user stop or abort request and you then resume the job, the job resumes from the last position saved to the checkpoint file or recovery table. A checkpoint will not be available unless a change record was processed for at least one of the tables during the first job run after deployment. If a checkpoint is not available, the job resumes processing from the configured restart point, which is the latest available position in the change stream by default.
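The fallback described above amounts to a simple decision, sketched here with illustrative names (the checkpoint format and the representation of the restart point are not part of any public API):

```python
# Sketch of the restart decision: resume from the saved checkpoint when one
# exists; otherwise fall back to the configured restart point, which is the
# latest available position in the change stream by default. Illustrative only.

DEFAULT_RESTART_POINT = "latest available position"

def restart_position(checkpoint, configured_restart=DEFAULT_RESTART_POINT):
    """checkpoint stays None until a change record has been processed for at
    least one table during the first job run after deployment."""
    return checkpoint if checkpoint is not None else configured_restart

print(restart_position("scn:10435521"))  # resumes from the recorded checkpoint
print(restart_position(None))            # falls back to the restart point
```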

Running data validation for database ingestion and replication jobs

For initial load jobs that completed successfully, you can run data validation to compare the source and target data. Data validation is available only for initial load jobs that have an Oracle or a SQL Server source and a Snowflake target.
    1. To display the job details, drill down on a job from the My Jobs page in the Data Integration service, the All Jobs page in the Monitor service, or the Data Ingestion and Replication page in the Operational Insights service.
    2. On the Object Detail pane, navigate to the subtask row for which you want to run data validation. In the Actions menu for the row, select Run Data Validation.
    Note: For the Run Data Validation option to be available, the task must have the status of Completed.
    3. Configure how the data should be validated:
    a. Select the Flat file connection. This connection will be used to store the data validation results.
       Note: The Flat file connection and the database ingestion and replication job must be on the same runtime environment.
    b. In the Sample field, select the option for sampling the size of the data for comparison. The default value is Last 1000 Rows.
    4. Click Run.
    The data validation process starts. The Data Validation column in the Object Detail pane shows the data validation status for the selected task.
    If data validation processing completes successfully, you can click the Success status to view the Data Validation Summary. The summary contains the results of the row count validation and the cell-to-cell comparison.
    To download a detailed data validation report, click the Download icon. The report highlights any missing or modified rows and columns based on a comparison of the source and target tables.
    If an error occurred during the data validation processing, click the Download icon next to the Error status to view the error message.
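The two checks that the summary reports, row count validation and cell-to-cell comparison, can be sketched as follows. This is a simplified illustration; the product's validator and report format are not shown here:

```python
# Sketch of the two data validation checks: a row-count comparison and a
# cell-to-cell comparison of sampled rows. Simplified illustration only.

def validate_sample(source_rows, target_rows):
    """Compare two equally ordered row samples and report differences."""
    result = {
        "row_count_match": len(source_rows) == len(target_rows),
        "mismatched_cells": [],   # (row_index, column_index, source, target)
    }
    for row_idx, (src, tgt) in enumerate(zip(source_rows, target_rows)):
        for col_idx, (s, t) in enumerate(zip(src, tgt)):
            if s != t:
                result["mismatched_cells"].append((row_idx, col_idx, s, t))
    return result

source = [(1, "Alice"), (2, "Bob")]
target = [(1, "Alice"), (2, "Bbo")]     # one modified cell
print(validate_sample(source, target))
# {'row_count_match': True, 'mismatched_cells': [(1, 1, 'Bob', 'Bbo')]}
```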