Review the key concepts in this section to understand some configuration options before you configure a database ingestion and replication task.
Schema drift handling
Database Ingestion and Replication can be configured to automatically detect some source schema changes and handle these changes on the target. This process is referred to as schema drift.
Database Ingestion and Replication can detect the following types of source schema changes:
•Add column
•Modify column
•Drop column
•Rename column
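These change types correspond to source DDL operations such as the following. This is only an illustrative sketch: the table and column names are hypothetical, and the exact ALTER TABLE syntax varies by source database.
ALTER TABLE orders ADD COLUMN discount_pct NUMERIC(5,2);            -- Add column
ALTER TABLE orders ALTER COLUMN status SET DATA TYPE VARCHAR(30);   -- Modify column
ALTER TABLE orders DROP COLUMN legacy_flag;                         -- Drop column
ALTER TABLE orders RENAME COLUMN cust_id TO customer_id;            -- Rename column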
When you define a task, on the Let's Go page of the database ingestion and replication task wizard, you can configure how the supported types of schema changes are handled. For example, you can configure schema drift options to ignore the changes, replicate them, or stop the job or subtask when a schema change occurs. For more information, see Finalize the task definition. Note that different types of schema changes might have different default settings, depending on the target type.
Schema drift options are supported for the following source-target combinations and load types:
•Source: Db2 for i
Load types: Incremental; combined initial and incremental
Targets: Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Kafka (incremental loads only), Microsoft Azure Data Lake Storage, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, Oracle Cloud Object Storage, PostgreSQL, Snowflake, and SQL Server
•Source: Db2 for LUW
Load types: Incremental; combined initial and incremental
Targets: Snowflake
•Source: Db2 for z/OS (except Db2 11)
Load types: Incremental; combined initial and incremental
Targets: Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Kafka (incremental loads only), Microsoft Azure Data Lake Storage, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, Oracle Cloud Object Storage, Snowflake, and SQL Server
•Source: Microsoft SQL Server
Load types: Incremental; combined initial and incremental
Targets: Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Kafka (incremental loads only), Microsoft Azure Data Lake Storage, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, Oracle Cloud Object Storage, PostgreSQL, Snowflake, and SQL Server
•Source: Oracle
Load types: Incremental; combined initial and incremental
Targets: Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Kafka (incremental loads only), Microsoft Azure Data Lake Storage, Microsoft Azure SQL Database, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, Oracle Cloud Object Storage, PostgreSQL, Snowflake, and SQL Server
•Source: PostgreSQL
Load types: Incremental; combined initial and incremental
Targets (incremental loads): Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Kafka, Microsoft Azure Data Lake Storage, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, Oracle Cloud Object Storage, PostgreSQL, and Snowflake
Targets (combined initial and incremental loads): Oracle, PostgreSQL, and Snowflake
Applying an audit history of all source table DML changes to target tables
You can configure database ingestion and replication incremental load and combined initial and incremental load tasks that have Databricks, Google BigQuery, Oracle, PostgreSQL, Snowflake, or SQL Server targets to write an audit trail of every DML change operation made on the source tables to the target. A row for each DML change on a source table is written to the generated target table along with the audit columns you select. The audit columns contain metadata about the change, such as the DML operation type, time, owner, transaction ID, generated ascending sequence number, and before image.
When you define the task, select Audit in the Apply Mode field on the Target page. The Apply Mode field is available for new or undeployed tasks.
To specify the audit metadata columns to add, select one or more of the check boxes under Advanced on the Target page:
•Add Last Replicated Time. Adds a column that records the timestamp at which a record was inserted or last updated in the target table. For initial loads, all loaded records have the same timestamp, except for Snowflake targets that use the Superpipe option where the seconds or minutes might vary slightly. For incremental loads and combined initial and incremental loads, the column records the timestamp of the last DML operation that was applied to the target.
Note: This option is available for Google BigQuery, Microsoft Azure Synapse Analytics, Snowflake, and SQL Server targets only.
•Add Operation <metadata_type>. Adds columns that contain metadata for change operations, such as the DML operation type, time, owner, transaction ID, and generated ascending sequence number. The columns are populated when data is loaded to the target tables.
•Add Before Images. Adds _OLD columns that contain before-image data for Updates. You can compare the old and new column values in the tables.
•Prefix for Metadata Columns. Add a prefix to the names of the added audit columns to differentiate them from other table columns. Default is INFA_.
These fields are optional. Only the Add Operation Type check box is selected by default to add a column that shows the DML operation type: I (insert), D (delete), or U (update).
The first time you run a job associated with the task, the job generates the target tables with the selected audit metadata columns. Ensure that no constraints other than indexes exist on the target tables.
For targets that support Soft Deletes mode, for each delete operation processed as a soft delete, the operation type of "D" appears in the INFA_OPERATION_TYPE column and values are written to any other metadata columns that you selected. However, for update and insert operations, the INFA_OPERATION_TYPE column and all other selected metadata columns are NULL.
Example for Audit mode
For example, assume the following DML change operations occur on a source table in the order shown:
Insert into tableA pkey = 1
Update tableA where pkey = 1
Update tableA where pkey = 1
Delete from tableA where pkey = 1
All of the following rows appear in the target table, providing an audit trail of all of the source DML changes:
In this example, the only audit column selected is opType.
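Assuming the default I, U, and D operation type values, the audit trail in the target table would look similar to the following illustrative sketch, where pkey is the source table's primary key column:
pkey   opType
1      I
1      U
1      U
1      D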
Note that when the task's apply mode is Standard, none of these rows appears in the target table because the last DML operation is a Delete, which supersedes the prior changes.
Note: If a combined initial and incremental load job captures an incremental insert change record during the initial unload phase, the job manufactures a delete for the same row to remove any duplicate that might have been obtained from the initial unload. This manufactured activity will be reflected in audit apply mode.
Applying deletes as soft deletes on the target
For database ingestion and replication incremental load and combined initial and incremental load jobs that have any supported source type and a Databricks, Google BigQuery, PostgreSQL, Snowflake, or SQL Server target, you can configure the task to process delete operations on the source as soft deletes on the target.
A soft delete marks a deleted row as deleted without actually removing it from the database. The row is applied to the target with the value of "D" in the generated INFA_OPERATION_TYPE metadata column.
Important: You can use Soft Deletes only if all of the source objects have primary keys and the source does not allow the primary key values to change in any row after the row is first created. If the primary key values change, duplicate rows might be written to the target, causing the target to become corrupted.
Example scenario: Your organization wants to use soft deletes in a data warehouse to mark the rows that were deleted at the source while still retaining the rows for audit purposes.
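For example, a downstream query can exclude the soft-deleted rows. The following statement is only an illustrative sketch that assumes the default INFA_ metadata column prefix and a hypothetical target table named customer_dim:
SELECT *
FROM customer_dim
WHERE INFA_OPERATION_TYPE IS NULL      -- inserts and updates leave the metadata column NULL
   OR INFA_OPERATION_TYPE <> 'D';      -- exclude rows marked as soft deleted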
To enable soft deletes, set the Apply Mode field to Soft Deletes on the Target page of the task wizard when you configure the database ingestion and replication task.
Note: If a combined initial and incremental load job captures an incremental insert change record during the initial unload phase, the job manufactures a delete for the same row to remove any duplicate that might have been obtained from the initial unload. This manufactured activity will be reflected in the soft delete apply mode.
CDC staging groups and tasks
You can define a CDC staging group that includes multiple database ingestion and replication incremental load or combined initial and incremental load tasks that have a Db2 for i, Db2 for z/OS, Oracle, Microsoft SQL Server, or SAP HANA source. SAP HANA sources must use the Log-based CDC method.
If you enable staging groups, a CDC Staging Task reads data from the source database in a single pass and writes the data to common storage. The apply tasks that you associate with the group can then read data from the common storage and apply the data to different targets.
This feature can improve CDC performance and scalability because data is read from the source logs once by the CDC Staging Task instead of being read multiple times by the apply jobs. Also, if you use a Secure Agent group, the apply jobs can run on different agents in the same Secure Agent group.
Note: To get access to this feature, contact Informatica Global Customer Support. They'll set the cdir.cdc.group.enabled flag to true for your organization.
When you create a new task in the task wizard, you can add it to an existing staging group or create a new staging group. You cannot add a pre-existing task to a staging group.
When determining whether to use a CDC staging group, consider the following points:
•For Db2 for i sources, all of the member apply tasks must use the same journal.
•For log-based sources, including Db2 for z/OS, Oracle, and SQL Server, the tasks in the group can process different tables with different schemas in the same source database.
•Oracle sources must use the Log-based change capture method.
•SQL Server sources must use the CDC Tables or Log-based change capture method.
•SAP HANA sources must use the Log-based change capture method and the incremental load type. If you set the capture type to Log Based in the connection to the source database, each apply task that uses the connection has the Log-based method and CDC staging automatically enabled. After you select a deployed staging group or create a new one for the first task, all subsequent apply tasks that use the same connection are added to the same group. You can't use a CDC Staging Task with SAP HANA sources that use the Trigger-based change capture method.
For Db2 for z/OS sources, you might need to perform some prerequisite tasks before using staging groups: 1) create the INFA_DATA table, which stores DBID, OBID, and PSID information for all of the selected source tables, if this information might be greater than 16 KB, and 2) install the Db2 stored procedure dated 11/01/2024 or later, which can use this table to identify the selected tables for processing. For stored procedure installation instructions, go to the Database Ingestion and Replication documentation on the Doc Portal and see Installing and configuring the stored procedure for Db2 for z/OS CDC. For information about creating the INFA_DATA table, see INFA_DATA table for Db2 for z/OS sources in a staging group.
The CDC Staging Task uses the following types of common storage:
•For Db2 for i, Db2 for z/OS, Oracle, and Microsoft SQL Server sources, the CDC Staging Task writes data to cloud Iceberg storage. The cloud storage can be in Amazon S3, Google Cloud Storage, or Microsoft Azure Data Lake Storage Gen2 (with Shared Key Authentication). The staged data is retained in storage for 14 days by default, but you can adjust this retention period up to 365 days.
•For SAP HANA sources, the CDC Staging Task writes data to an SAP HANA or Oracle cache, as specified in the SAP HANA Database Ingestion connection properties. Each CDC Staging Task must have its own cache database and connection. The Secure Agent stages data to the cache database, and when full records are available, the Secure Agent writes the data to the target. By default, after the data is stored in the cache for 14 days, it's compacted but not deleted. You can adjust this period using the Log Clear connection property.
In the monitoring interfaces, you can filter for the asset type of "CDC Staging Task" and view the task's progress in capturing the source change data to cloud storage. In the Task Summary section of the job details, the Target reflects the SAP HANA source connection instead of the cache database type or an ultimate target type.
The following status interactions between the staging task and dependent tasks can occur:
•If the CDC Staging Task fails or is stopped or aborted, all dependent database ingestion and replication apply jobs stop. You should undeploy the stopped apply jobs. You can then undeploy the CDC Staging Task.
Note: If you do not want the dependent apply jobs to stop, you can set the source custom property stopDependentsWhenGroupJobTerminated to false.
•If all jobs in the staging group fail, the CDC Staging Task continues running.
•If all jobs in a staging group are undeployed, the CDC Staging Task stops. You can then undeploy it.
•You can undeploy a CDC Staging Task that has the status of Aborted, Deployed, Failed, or Stopped from any of the monitoring interfaces. In the task row, select Undeploy from the Actions menu. Do not undeploy a CDC Staging Task if any job in the group is not undeployed.
Undeploying a CDC Staging Task cleans up temporary files in the Secure Agent Database Ingestion service area and in cloud cache. It also cleans up staging tables that are in Apache Iceberg format.
•If you undeploy and redeploy a CDC Staging Task, staging data to cloud storage starts over again, without preserving the previous contents in cloud storage.
INFA_DATA table for Db2 for z/OS sources in a staging group
If you expect the information that identifies selected source tables to the CDC staging task to be greater than 16 KB, you must install or manually create the INFA_DATA table. The table stores DBID, OBID and PSID information for each of the source tables selected in the tasks that are associated with the staging group.
If the INFA_DATA table does not exist and the computed length of these values is greater than 16 KB, the CDC staging task will fail.
You can create the INFA_DATA table by using the #INFATBL installation member. The installation member includes the following SQL statements for creating the table:
CREATE TABLE <schema>.INFA_DATA (
  INFA_CODE CHARACTER(2) NOT NULL,
  INFA_GROUP_ID INTEGER NOT NULL,
  INFA_SEQ INTEGER NOT NULL,
  INFA_TS TIMESTAMP(6) WITHOUT TIME ZONE NOT NULL,
  INFA_DATA VARCHAR(4096),
  PRIMARY KEY (INFA_CODE, INFA_GROUP_ID, INFA_SEQ, INFA_TS)
)
APPEND NO
DATA CAPTURE CHANGES
CCSID EBCDIC;
Note: You can use a table name other than the default name of INFA_DATA if you prefer.
Grant the following privileges on the INFA_DATA table to the user specified in the connection properties for the primary source connection:
GRANT INSERT ON <schema>.INFA_DATA TO <stagingtask_connection_user>;
GRANT DELETE ON <schema>.INFA_DATA TO <stagingtask_connection_user>;
GRANT EXECUTE ON <schema>.INFA_DATA TO <stagingtask_connection_user>;
COMMIT;
Where:
•<schema> is specified by the pwx.cdcreader.ZOS.InfaTableSchema custom property. If this custom property is not specified, <schema> defaults to the schema name of the Db2 stored procedure (see the example after this list).
•INFA_DATA is the default table name. If you specified another name when creating the table, use that name.
•The <schema>.<table> value in the grant statements must match the information in the CREATE statement in the #INFATBL installation member.
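For example, if you created the table under a schema named INFACDC (a hypothetical name used only for illustration), you might specify the source custom property as a name-value pair similar to the following:
pwx.cdcreader.ZOS.InfaTableSchema=INFACDC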
Integrating database ingestion and replication tasks with Data Integration taskflows
To perform post-replication transformation, you can configure database ingestion and replication tasks to trigger Data Integration taskflows that process and transform the ingested data. This feature is available for tasks that have any supported load type, source type, and target type.
When you define a database ingestion and replication task, you can select the Execute in Taskflow option to make the task available to add to taskflows in Data Integration. For incremental load and combined load jobs with an Amazon Redshift, Oracle, Snowflake (without Superpipe), or SQL Server target, you can optionally select the Add Cycle ID option to include cycle ID metadata in the target table. The Cycle ID column identifies the cycle in which the row was updated. It's passed as a parameter to the taskflow, where you can use it to filter the rows on which to execute transformation logic.
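For example, a transformation in the taskflow might use the cycle ID parameter to process only the rows that were written during the triggering CDC cycle. The following query is only an illustrative sketch: the target table name, the cycle ID column name, and the :cycleId parameter are hypothetical placeholders rather than names generated by the product.
-- Illustrative only: select the rows that belong to the triggering CDC cycle.
SELECT *
FROM sales_orders_target      -- hypothetical replicated target table
WHERE cycle_id = :cycleId;    -- hypothetical cycle ID column and taskflow parameter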
When you configure the taskflow in Data Integration, you can select the task as an event source and add any appropriate transformation type to transform the ingested data.
Configuration steps:
1. In the Data Ingestion and Replication task configuration wizard, select the following options when defining a database ingestion and replication task:
- (Optional) Add Cycle ID on the Target page. This option adds the generated cycle ID for each CDC cycle to the target table. Applies to Amazon Redshift, Oracle, and SQL Server targets and to Snowflake targets that don't use the Superpipe option.
- Execute in Taskflow on the Schedule and Runtime Options page. This option makes the task available for selection when defining a taskflow in Data Integration.
2. When you finish defining the task, save it.
3. To define a taskflow in Data Integration, click the Orchestrate panel on the Home page.
4. To add the database ingestion and replication task to the taskflow, perform the following steps:
a. Under Task Properties, click Start.
b. In the Binding field, select Event.
c. In the Event Source Name field, click Select. Then, in the Select Event Source dialog box, select the database ingestion and replication task and click Select.
Note: You can filter the list of tasks by task type.
d. Check that the Event Source Name field and the Input Fields display the task name.
The taskflow is automatically triggered when the initial load task successfully completes or after each CDC cycle in an incremental load operation. If a CDC cycle ends while the previous taskflow run is still running, the data is queued and waits for the previous taskflow run to complete.
About LOB truncation
If you select the Include LOBs option for a Db2 for LUW, Oracle, PostgreSQL, or SQL Server source, LOB data might be truncated, primarily depending on the maximum size that the target allows.
Target-side truncation points:
• BLOB, BYTEA, GEOGRAPHY, GEOMETRY, IMAGE, LONG RAW, LONG VARCHAR FOR BIT, or VARBINARY(MAX) columns are truncated before being written to BINARY columns on the target.
- For Amazon S3, Databricks, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, Oracle, Oracle Cloud Object Storage, Microsoft Fabric OneLake, PostgreSQL, and SQL Server targets, the data is truncated to 16777216 bytes.
- For Amazon Redshift targets, the data is truncated to 1024000 bytes.
- For Microsoft Azure Synapse Analytics targets, the data is truncated to 1000000 bytes.
- For Google BigQuery and Snowflake targets, the data is truncated to 8388608 bytes.
• CLOB, DBCLOB, NCLOB, LONG, LONG VARCHAR, LONG VARGRAPHIC, TEXT, NTEXT, NVARCHAR(MAX), RAW, VARCHAR(MAX), or XML columns are truncated before being written to VARCHAR columns on the target.
- For Amazon S3, Databricks, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, Oracle, Microsoft Fabric OneLake, Oracle Cloud Object Storage, PostgreSQL, and Snowflake targets, the data is truncated to 16777216 bytes.
- For Amazon Redshift targets, the data is truncated to 65535 bytes.
- For Google BigQuery targets, the data is truncated to 8388608 bytes.
- For Microsoft Azure Synapse Analytics targets, the data is truncated to 500000 bytes.
- For SQL Server targets, CLOB, TEXT and VARCHAR(MAX) data is truncated to 16777216 bytes, NCLOB, NTEXT and NVARCHAR(MAX) data to 33554432 bytes, and XML data to 33554442 bytes.
- For Azure Event Hubs targets, the overall record maximum size is 1 MB. If the record size exceeds 1 MB, Azure Event Hubs generates an error, and the task fails.
Source-side truncation considerations:
•For Db2 for LUW and Oracle sources, you can specify the dbmiSourceBlobTruncationSize and dbmiSourceClobTruncationSize custom properties for the source to control the number of bytes at which truncation occurs for BLOB and CLOB data, respectively, when you want the data truncated at a point less than the maximum size that the target allows. See the example after this list.
•For PostgreSQL sources in incremental loads and combined loads, if large-object columns contain more than 1 MB of data, the data is truncated to 1 MB.
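For example, you might set the source custom properties to truncate LOB data at a point smaller than the target maximum. The following name-value pairs are only an illustrative sketch, and the byte values are arbitrary examples rather than recommended settings:
dbmiSourceBlobTruncationSize=65536
dbmiSourceClobTruncationSize=131072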
Customizing data type mappings
When you configure a target for a database ingestion and replication task, you can optionally define data-type mapping rules to override the default mappings of source data types to target data types.
This feature is supported for tasks that have the following source and target combinations:
•Oracle sources with Databricks, Google BigQuery, Microsoft Azure Synapse Analytics, Oracle, SQL Server, and Snowflake targets
•SQL Server sources with Snowflake targets
For example, you can create a data-type rule that maps Oracle NUMBER columns that have no precision to Snowflake target NUMBER() columns that also have no precision, instead of using the default mapping to the Snowflake VARCHAR(255) data type.
To create a data-type mapping rule:
1. Expand Data Type Rules.
2. In the Create Rule fields, enter a source data type and the target data type that you want to map it to.
In the Source field only, you can include the percent (%) wildcard to represent the data type precision, scale, or size, for example, NUMBER(%,4), NUMBER(8,%), or NUMBER(%). Use the wildcard to cover all source columns that have the same data type but use different precision, scale, or size values, instead of specifying each one individually. For example, enter FLOAT(%) to cover FLOAT(16), FLOAT(32), and FLOAT(84). You cannot enter the % wildcard in the target data type. A source data type that uses the % wildcard must map to a target data type that uses specific precision, scale, or size value. For example, you could map the source data type FLOAT(%) to a target data type specification such as NUMBER(38,10).
3. Click Add Rule.
The rule appears in the list of rules.
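For example, after you add a few rules for an Oracle source and a Snowflake target, the rule list might look similar to the following illustrative sketch, which combines the mappings described above (the precision and scale values are only examples):
Source data type    Target data type
NUMBER              NUMBER()
NUMBER(%,4)         NUMBER(38,4)
FLOAT(%)            NUMBER(38,10)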
To delete a rule, click the Delete icon at the right end of the rule row.
After you deploy a task with custom mapping rules, you cannot edit the rules until the task is undeployed.
Usage notes:
•In general, binary data types cannot be mapped to character data types.
•If you define multiple data-type rules for the same source data type with the same length or same precision and scale values, you will not be able to save the database ingestion and replication task.
•If you define multiple data-type rules for the same source data type but use the % wildcard to represent the length or precision and scale value in one rule and a specific length or precision and scale value in the second rule, the rule that contains the specific value is processed first, before the rule with the % wildcard. For example, if you map the source data types FLOAT(84) and FLOAT(%), the FLOAT(84) rule is processed first and then the FLOAT(%) rule is processed to cover any other FLOAT source columns with different sizes.
•If a source data type requires a length or precision and scale value, make sure that you set the required attribute by using the % wildcard or a specific value, for example, VARCHAR(%) or VARCHAR(10).
•If you define an invalid mapping, an error message is written to the log. You can then correct the mapping error, with assistance from your DBA if necessary.
•For Oracle sources, you must use the data types that are returned by the following query for the source object:
select dbms_metadata.get_ddl('TABLE', 'YOUR_TABLE_NAME','TABLE_OWNER_NAME') from dual;
•Database Ingestion and Replication does not support the BYTE and CHAR semantics in data-type mapping rules.
•If a source data type has a default precision, you must specify it in your rule. For example, you must use TIMESTAMP(6) instead of TIMESTAMP.