
Managing database ingestion and replication jobs

After you configure and run database ingestion and replication tasks, you might occasionally need to perform some job management tasks, such as stopping, resuming, undeploying, or redeploying jobs.

Stopping a database ingestion and replication job

You can stop a database ingestion and replication job of any load type that is in the Up and Running, Running with Warning, or On Hold state.
For an incremental load job, the job stops after a checkpoint is taken. A checkpoint records the point in the change stream where incremental processing left off for recovery purposes.
For a combined initial and incremental load job, initial load subtasks that are running are allowed to run to completion, and initial load subtasks that are not running remain in their current states. For the incremental load portion of the job, a checkpoint is written to the checkpoint file or target recovery table before the job stops. The database ingestion and replication job cannot record a checkpoint unless a change record has been processed for at least one of the tables in the job during the first job run after deployment. If a checkpoint is not available, the job resumes processing from the configured restart point, which is the latest available position in the change stream by default.
For an initial load job, any running subtasks are allowed to run to completion and then the job stops. Non-running subtasks remain in their current states.
    1. Navigate to the row for the job that you want to stop in any of the following monitoring interfaces: the My Jobs page in Data Integration, the All Jobs page in Monitor, or the Data Ingestion and Replication page in Operational Insights.
    2. Click the Actions menu for the job and select Stop.
    The job state switches to Stopping and then to Stopped.
    Tip: If the Stop operation is taking too long, you can abort the job.

Aborting a database ingestion and replication job

You can abort a database ingestion and replication job that is in the Up and Running, On Hold, Running with Warning, or Stopping state.
For an incremental load job, the job stops immediately after a checkpoint is taken. A checkpoint records the point in the change stream where incremental processing left off for recovery purposes.
For a combined initial and incremental load job, any running initial load subtasks stop immediately. For the incremental load portion of the job, a checkpoint is taken and then the job stops.
For an initial load job, any running subtasks stop immediately and then the job stops. Non-running subtasks remain in their current states.
    1. Navigate to the row for the job that you want to abort in any of the following monitoring interfaces: the My Jobs page in Data Integration, the All Jobs page in Monitor, or the Data Ingestion and Replication page in Operational Insights.
    2. From the Actions menu for the job, select Abort.
    The job state switches to Aborting and then to Aborted.
    For initial load jobs, the state of started and running subtasks switches to Aborted. For incremental load or combined initial and incremental load jobs, the state of subtasks switches to Stopped.

Resuming a database ingestion and replication job

You can resume a database ingestion and replication job that is in the Stopped, Aborted, or Failed state.
When you resume an initial load job that has multiple subtasks, Database Ingestion and Replication starts only the subtasks that are in a Failed, Stopped, Aborted, or Queued state.
When you resume an incremental load job or a combined initial and incremental load job, Database Ingestion and Replication resumes replicating source data changes from the last position recorded in the checkpoint file or target recovery table. A checkpoint will not be available unless a change record was processed for at least one of the tables during the first job run after deployment. If a checkpoint is not available, the job resumes processing from the configured restart point, which is the latest available position in the change stream by default.
Note: For initial load jobs, the Run command might also be available. Click Run if you want the database ingestion and replication job to truncate all of the target tables and then reload the source data to the target tables.
    1. Navigate to the row for the job that you want to resume in any of the following monitoring interfaces: the My Jobs page in Data Integration, the All Jobs page in Monitor, or the Data Ingestion and Replication page in Operational Insights.
    2. In the Actions menu for the row, click Resume.
    Note: The Resume command is not available if the job is in the Failed state because the task deployment failed.
    A subtask is started for each source table.
    If an error occurs, an error message is displayed at the top of the page.
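The state-based rule for which subtasks a Resume starts in an initial load job can be sketched as follows. The function name and the state representation are illustrative, not part of any product API:

```python
# Sketch of the subtask-selection rule when resuming an initial load job:
# only subtasks in a Failed, Stopped, Aborted, or Queued state are started.
# Illustrative names; Database Ingestion and Replication's internals differ.

RESUMABLE_STATES = {"Failed", "Stopped", "Aborted", "Queued"}

def subtasks_to_start(subtasks):
    """Return the names of subtasks that a Resume would start.

    subtasks: list of (table_name, state) pairs.
    """
    return [name for name, state in subtasks if state in RESUMABLE_STATES]

# Completed subtasks are left alone; only the others restart.
subtasks = [("ORDERS", "Completed"), ("ITEMS", "Failed"), ("USERS", "Queued")]
print(subtasks_to_start(subtasks))  # ['ITEMS', 'USERS']
```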

Overriding schema drift options when resuming a database ingestion and replication job

You can override the schema drift options when you resume a database ingestion and replication job that is in the Stopped, Aborted, or Failed state. The overrides affect only those tables that are currently in the Error state because of the Stop Table or Stop Job Schema Drift option. Use the overrides to correct or resolve these errors.
You can override schema drift options and resume an incremental load job or a combined initial and incremental load job from the All Jobs tab on the Data Ingestion and Replication page in Operational Insights.
    1. Navigate to the row for the job that you want to resume with an override.
    2. Click the Actions menu for the row and select Resume With Options.
    Note: The Resume With Options command is not available if the job is in the Failed state because the task deployment failed.
    The Resume Options dialog box appears.
    3. In the Schema Drift Options list, select the schema drift option to use to process the DDL operation on the source that caused the database ingestion and replication job to stop.
    The following table describes the schema drift options:
    Ignore
        Do not replicate DDL changes that occur on the source database to the target.
    Stop Table
        Stop processing the source table on which the DDL change occurred.
        Important: The database ingestion and replication job cannot retrieve the data changes that occurred on the source table after the job stopped processing it. Consequently, data loss might occur on the target. To avoid data loss, you will need to resynchronize the source and target objects that the job stopped processing. Use the Resume With Options > Resync option.
    Resync
        Resynchronize the target tables with the latest source table definitions, including any DDL changes that schema drift ignored. Use this option for tables that the job stopped processing because of the Stop Table setting for a schema drift option.
        Important: This option is available only for combined initial and incremental load jobs.
    Resync (refresh)
        For database ingestion and replication combined load jobs that have an Oracle or SQL Server source, use this option to resynchronize the target tables with the latest source table definitions, including any DDL changes that schema drift ignored. After the target tables are refreshed, the structures of the source and target tables match. This option mimics the behavior of the Resync option.
    Resync (retain)
        For database ingestion and replication combined load jobs that have an Oracle or SQL Server source, use this option to resynchronize the same columns that have been processed for CDC, retaining the current structure of the source and target tables. No checks for changes to the source or target table definitions are performed. If source DDL changes affected the source table structure, those changes are not processed.
    Replicate
        Allow the database ingestion and replication job to replicate the DDL change to the target.
        Important: If you specify the Replicate option for Rename Column operations on Microsoft Azure Synapse Analytics targets, the job will end with an error.
    4. Click Resume With Options.
    The resumed job will use the schema drift option that you specified in step 3 to process the schema change that caused the job to stop. Thereafter, the schema drift options that you specified when creating the task take effect again.
    Important: Database Ingestion and Replication processes a schema change to a source table only after a DML operation occurs on the table. Therefore, after you resume a job, the table subtask state remains unchanged until the first DML operation occurs on the table.
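The one-time nature of the override can be sketched as follows: the selected option applies only to the DDL change that stopped the job, and the options configured in the task apply to all later changes. All names here are illustrative, not a product API:

```python
# Sketch: a Resume With Options override applies only to the pending DDL
# change; later DDL changes are processed with the task's configured
# schema drift options again. Illustrative names only.

def plan_ddl_handling(pending_change, later_changes, override, configured):
    """Return (ddl_change, option_used) pairs in processing order."""
    plan = [(pending_change, override)]                     # one-time override
    plan += [(change, configured) for change in later_changes]  # task settings again
    return plan

plan = plan_ddl_handling(
    pending_change="ADD COLUMN",
    later_changes=["MODIFY COLUMN", "RENAME COLUMN"],
    override="Replicate",   # option chosen in the Resume Options dialog
    configured="Ignore",    # option chosen when the task was created
)
print(plan)
# [('ADD COLUMN', 'Replicate'), ('MODIFY COLUMN', 'Ignore'), ('RENAME COLUMN', 'Ignore')]
```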

Redeploying a database ingestion and replication job

Redeploy a database ingestion and replication job after editing available fields in the associated database ingestion and replication task so that the new settings can take effect.
You can edit some but not all of the fields in an ingestion task definition that has been previously deployed, without first undeploying the job. You can add a source table and change any of the runtime and target options that are available for editing. For example, you might want to reset some target options to test the effects of different settings.
The redeploy operation stops each job subtask for a source table, deploys the updated database ingestion and replication task, and automatically starts the subtasks that were stopped and any new subtasks for added source tables.
    1. Navigate to the row for the job that you want to redeploy in any of the following monitoring interfaces: the My Jobs page in Data Integration, the All Jobs page in Monitor, or the Data Ingestion and Replication page in Operational Insights.
    2. In the Actions menu for the row, select Redeploy.
    The job instance automatically starts running.
    If the job was running when you selected Redeploy, Database Ingestion and Replication stops the job and then redeploys the database ingestion and replication task and restarts the job.

Undeploying a database ingestion and replication job

Undeploy a database ingestion and replication job if you no longer need to run the job, the job is in the Failed state, or you need to change a connection or property in the associated task that cannot be edited without first undeploying the job.
Before you attempt to undeploy a job, ensure that it is not running.
After the job is undeployed, you cannot run it again or redeploy it. If you want to run a job for the associated database ingestion and replication task again, you must deploy the task again from the task wizard to create a new job instance. For example, if you want to change the target connection, undeploy the job, edit the task to change the connection, deploy the task again, and then run the new job instance.
    1. Navigate to the row for the job that you want to undeploy in any of the following monitoring interfaces: the My Jobs page in Data Integration, the All Jobs page in Monitor, or the Data Ingestion and Replication page in Operational Insights.
    2. In the Actions menu for the row, click Undeploy.
    If the undeploy operation fails, the job status switches to Failed, even if the job was in the Aborted state. If the status was already Failed, it remains Failed.
    Note: After undeploying jobs, do not immediately shut down the Secure Agent. Database Ingestion and Replication requires some time to clean up files for tasks in the /root/infaagent/apps/Database_Ingestion/data/tasks directory.

Resynchronizing source and target objects

You can resynchronize source and target objects for a subtask that is part of a running database ingestion and replication combined initial and incremental load job. The subtask must be in a state other than Queued or Starting.
For example, you might want to resynchronize the target with the source if initial load or incremental load processing failed or if you want to start the job over again from a specific restart point.
Important: To resynchronize tables that stopped and are currently in the Error state because of the Schema Drift setting of Stop Table, you must use the Resume With Options > Resync option in the Actions menu. For more information, see Overriding schema drift options when resuming a database ingestion and replication job.
    1. Drill down on the database ingestion and replication job that you want to resynchronize from one of the following monitoring interfaces: the My Jobs page in Data Integration, the All Jobs page in Monitor, or the Data Ingestion and Replication page in Operational Insights.
    The job must be in the Up and Running state and be for a combined initial and incremental load operation.
    2. Click the Object Detail tab.
    3. In the subtask row for the source and target objects that you want to resynchronize, click the Actions menu and select Resync. The resync operation resynchronizes the target table with the latest source table definition, including any DDL changes.
    Note: For the Actions menu and Resync option to be available, the subtask must be in a state other than Queued or Starting.
    If you are resynchronizing a subtask for a database ingestion and replication combined load job that has an Oracle or SQL Server source, use the Resync (refresh) or Resync (retain) option instead of the Resync option.

Restart and recovery for incremental change data processing

Database Ingestion and Replication can restart incremental load and combined initial and incremental load jobs that stopped because of an error or a user stop request without losing change data.
After the first job run, Database Ingestion and Replication continually records an identifier for the processing position in the change stream as changes are applied to the target. For file-based targets such as Amazon S3, Azure Data Lake Storage, Google Cloud Storage, Kafka, and Oracle Cloud Object Storage, the identifier is stored in a checkpoint file. For database targets, the identifier is stored in a generated recovery table, called INFORMATICA_CDC_RECOVERY, on the target.
Note: For the first run of an incremental load job, Database Ingestion and Replication uses the start point that you set in the Initial Start Point for Incremental Load field when defining the database ingestion and replication task.
If incremental change data processing ends abnormally or in response to a user stop or abort request and you then resume the job, the job resumes from the last position saved to the checkpoint file or recovery table. A checkpoint will not be available unless a change record was processed for at least one of the tables during the first job run after deployment. If a checkpoint is not available, the job resumes processing from the configured restart point, which is the latest available position in the change stream by default.
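The fallback described above amounts to a simple decision, sketched here with illustrative names (the checkpoint format and the representation of the restart point are not part of any public API):

```python
# Sketch of the restart decision: resume from the saved checkpoint when one
# exists; otherwise fall back to the configured restart point, which is the
# latest available position in the change stream by default. Illustrative only.

DEFAULT_RESTART_POINT = "latest available position"

def restart_position(checkpoint, configured_restart=DEFAULT_RESTART_POINT):
    """checkpoint stays None until a change record has been processed for at
    least one table during the first job run after deployment."""
    return checkpoint if checkpoint is not None else configured_restart

print(restart_position("scn:10435521"))  # resumes from the recorded checkpoint
print(restart_position(None))            # falls back to the restart point
```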

Running data validation for database ingestion and replication jobs

For initial load jobs that completed successfully, you can run data validation to compare the source and target data. Data validation is available only for initial load jobs that have an Oracle or a SQL Server source and a Snowflake target.
    1. To display the job details, drill down on a job from the My Jobs page in the Data Integration service, the All Jobs page in the Monitor service, or the Data Ingestion and Replication page in the Operational Insights service.
    2. On the Object Detail pane, navigate to the subtask row for which you want to run data validation. In the Actions menu for the row, select Run Data Validation.
    Note: For the Run Data Validation option to be available, the task must have the status of Completed.
    3. Configure how the data should be validated:
    a. Select the Flat file connection. This connection will be used to store the data validation results.
       Note: The Flat file connection and the database ingestion and replication job must be on the same runtime environment.
    b. In the Sample field, select the option for sampling the size of the data for comparison. The default value is Last 1000 Rows.
    4. Click Run.
    The data validation process starts. The Data Validation column in the Object Detail pane shows the data validation status for the selected task.
    If data validation processing completes successfully, you can click the Success status to view the Data Validation Summary. The summary contains the results of the row count validation and the cell-to-cell comparison.
    To download a detailed data validation report, click the Download icon. The report highlights any missing or modified rows and columns based on a comparison of the source and target tables.
    If an error occurred during the data validation processing, click the Download icon next to the Error status to view the error message.
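The two checks that the summary reports, row count validation and cell-to-cell comparison, can be sketched as follows. This is a simplified illustration; the product's validator and report format are not shown here:

```python
# Sketch of the two data validation checks: a row-count comparison and a
# cell-to-cell comparison of sampled rows. Simplified illustration only.

def validate_sample(source_rows, target_rows):
    """Compare two equally ordered row samples and report differences."""
    result = {
        "row_count_match": len(source_rows) == len(target_rows),
        "mismatched_cells": [],   # (row_index, column_index, source, target)
    }
    for row_idx, (src, tgt) in enumerate(zip(source_rows, target_rows)):
        for col_idx, (s, t) in enumerate(zip(src, tgt)):
            if s != t:
                result["mismatched_cells"].append((row_idx, col_idx, s, t))
    return result

source = [(1, "Alice"), (2, "Bob")]
target = [(1, "Alice"), (2, "Bbo")]     # one modified cell
print(validate_sample(source, target))
# {'row_count_match': True, 'mismatched_cells': [(1, 1, 'Bob', 'Bbo')]}
```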