Key concepts

Review the key concepts in this section to understand some configuration options before configuring a database ingestion and replication task.

Schema drift handling

Database Ingestion and Replication can be configured to automatically detect some source schema changes and handle these changes on the target. This process is referred to as schema drift.
Database Ingestion and Replication can detect several types of source schema changes, such as added, modified, dropped, and renamed columns.
When you define a task, on the Let's Go page of the database ingestion and replication task wizard, you can configure how the supported types of schema changes are handled. For example, you can configure schema drift options to ignore the changes, replicate them, or stop the job or subtask when a schema change occurs. For more information, see Finalize the task definition. Note that different types of schema changes might have different default settings, depending on the target type.
Schema drift options are supported for the following source and target combinations and load types:

Db2 for i
- Load types: Incremental, Combined initial and incremental
- Targets: Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Kafka (incremental loads only), Microsoft Azure Data Lake Storage, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, Oracle Cloud Object Storage, PostgreSQL, Snowflake, and SQL Server

Db2 for LUW
- Load types: Incremental, Combined initial and incremental
- Targets: Snowflake

Db2 for z/OS, except Db2 11
- Load types: Incremental, Combined initial and incremental
- Targets: Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Kafka (incremental loads only), Microsoft Azure Data Lake Storage, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, Oracle Cloud Object Storage, Snowflake, and SQL Server

Microsoft SQL Server
- Load types: Incremental, Combined initial and incremental
- Targets: Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Kafka (incremental loads only), Microsoft Azure Data Lake Storage, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, Oracle Cloud Object Storage, PostgreSQL, Snowflake, and SQL Server

Oracle
- Load types: Incremental, Combined initial and incremental
- Targets: Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Kafka (incremental loads only), Microsoft Azure Data Lake Storage, Microsoft Azure SQL Database, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, Oracle Cloud Object Storage, PostgreSQL, Snowflake, and SQL Server

PostgreSQL
- Load types: Incremental, Combined initial and incremental
- Targets for incremental loads: Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Kafka, Microsoft Azure Data Lake Storage, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, Oracle Cloud Object Storage, PostgreSQL, and Snowflake
- Targets for combined initial and incremental loads: Oracle, PostgreSQL, and Snowflake

Applying an audit history of all source table DML changes to target tables

You can configure database ingestion and replication incremental load and combined initial and incremental load tasks that have Databricks, Google BigQuery, Oracle, PostgreSQL, Snowflake, or SQL Server targets to write an audit trail of every DML change operation made on the source tables to the target. A row for each DML change on a source table is written to the generated target table along with the audit columns you select. The audit columns contain metadata about the change, such as the DML operation type, time, owner, transaction ID, generated ascending sequence number, and before image.
When you define the task, select Audit in the Apply Mode field on the Target page. The Apply Mode field is available for new or undeployed tasks.
To specify the audit metadata columns to add, select one or more of the check boxes under Advanced on the Target page:
These fields are optional. Only the Add Operation Type check box is selected by default to add a column that shows the DML operation type: I (insert), D (delete), or U (update).
The first time you run a job associated with the task, the job generates the target tables with the selected audit metadata columns. Ensure that no constraints other than indexes exist on the target tables.
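For illustration, the following is a minimal sketch of what a generated target table might look like in Audit apply mode for a simple two-column source table. The INFA_OPERATION_TYPE column name is the one described in this documentation; the source columns and the other audit column names and data types shown here are hypothetical.
-- Hypothetical audit-mode target table; names other than INFA_OPERATION_TYPE are assumed.
-- Each source DML operation produces one row, so source key values can repeat,
-- and the table has no primary key or other constraints.
CREATE TABLE orders_audit
(pkey INTEGER                      -- source key value
,order_total DECIMAL(10,2)         -- source business column
,INFA_OPERATION_TYPE CHAR(1)       -- I (insert), U (update), or D (delete)
,INFA_OPERATION_TIME TIMESTAMP     -- assumed name: time of the source change
,INFA_SEQUENCE_NUMBER BIGINT       -- assumed name: generated ascending sequence number
);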
For targets that support Soft Deletes mode, for each delete operation processed as a soft delete, the operation type of "D" appears in the INFA_OPERATION_TYPE column and values are written to any other metadata columns that you selected. However, for update and insert operations, the INFA_OPERATION_TYPE column and all other selected metadata columns are NULL.
Example for Audit mode
For example, assume the following DML change operations occur on a source table in the order shown:
INSERT INTO tableA (pkey, ...) VALUES (1, ...);
UPDATE tableA SET ... WHERE pkey = 1;
UPDATE tableA SET ... WHERE pkey = 1;
DELETE FROM tableA WHERE pkey = 1;
All of the following rows appear in the target table, providing an audit trail of all of the source DML changes:
opType=I, pkey=1...
opType=U, pkey=1...
opType=U, pkey=1...
opType=D, pkey=1...
In this example, the only audit column selected is opType.
Note that when the task's apply mode is Standard, none of these rows appears on the target table because the last DML operation is a Delete, which supersedes the prior changes.
Note: If a combined initial and incremental load job captures an incremental insert change record during the initial unload phase, the job manufactures a delete for the same row to remove any duplicate that might have been obtained from the initial unload. This manufactured activity will be reflected in audit apply mode.

Applying deletes as soft deletes on the target

For database ingestion and replication incremental load and combined initial and incremental load jobs that have any supported source type and a Databricks, Google BigQuery, PostgreSQL, Snowflake, or SQL Server target, you can configure the task to process delete operations on the source as soft deletes on the target.
A soft delete marks a deleted row as deleted without actually removing it from the database. The row is applied to the target with the value of "D" in the generated INFA_OPERATION_TYPE metadata column.
Important: You can use Soft Deletes only if all of the source objects have primary keys and the source does not allow the primary key values to change in any row after the row is first created. If the primary key values change, duplicate rows might be written to the target, causing the target to become corrupted.
Example scenario: Your organization wants to use soft deletes in a data warehouse to mark the rows that were deleted at the source while still retaining the rows for audit purposes.
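As a sketch only, with hypothetical table and column names, a query that should ignore soft-deleted rows could filter on the generated metadata column:
-- Hypothetical query: return only rows that have not been soft deleted.
-- Rows deleted at the source carry 'D' in INFA_OPERATION_TYPE; rows written
-- for inserts and updates leave the column NULL.
SELECT pkey, order_total
FROM orders
WHERE INFA_OPERATION_TYPE IS NULL
   OR INFA_OPERATION_TYPE <> 'D';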
To enable soft deletes, set the Apply Mode field to Soft Deletes on the Target page of the task wizard when you configure the database ingestion and replication task.
Note: If a combined initial and incremental load job captures an incremental insert change record during the initial unload phase, the job manufactures a delete for the same row to remove any duplicate that might have been obtained from the initial unload. This manufactured activity will be reflected in the soft delete apply mode.

CDC staging groups and tasks

You can define a CDC staging group that includes multiple database ingestion and replication incremental load or combined initial and incremental load tasks that have a Db2 for i, Db2 for z/OS, Oracle, Microsoft SQL Server, or SAP HANA source. SAP HANA sources must use the Log-based CDC method.
If you enable staging groups, a CDC Staging Task reads data from the source database in a single pass and writes the data to common storage. The apply tasks that you associate with the group can then read data from the common storage and apply the data to different targets.
This feature can improve CDC performance and scalability because data is read from the source logs once by the CDC Staging Task instead of being read multiple times by the apply jobs. Also, if you use a Secure Agent group, the apply jobs can run on different agents in the same Secure Agent group.
Note: To get access to this feature, contact Informatica Global Customer Support to have the cdir.cdc.group.enabled flag set to true for your organization.
When you create a new task in the task wizard, you can add it to an existing staging group or create a new staging group. You cannot add a pre-existing task to a staging group.
When determining whether to use a CDC staging group, consider the following points:
For Db2 for z/OS sources, you might need to perform some prerequisite tasks before using staging groups: 1) create the INFA_DATA table that stores the DBID, OBID, and PSID information for all of the selected source tables, if this information might be greater than 16 KB, and 2) install the Db2 stored procedure dated 11/01/2024 or later that can use this table to identify the selected tables for processing. For stored procedure installation instructions, go to the Database Ingestion and Replication documentation on the Doc Portal and see Installing and configuring the stored procedure for Db2 for z/OS CDC. For information about creating the INFA_DATA table, see INFA_DATA table for Db2 for z/OS sources in a staging group.
The CDC Staging Task uses the following types of common storage:
In the monitoring interfaces, you can filter for the asset type of "CDC Staging Task" and view the task's progress in capturing the source change data to cloud storage. In the Task Summary section of the job details, the Target reflects the SAP HANA source connection instead of the cache database type or an ultimate target type.
The following status interactions between the staging task and dependent tasks can occur:

INFA_DATA table for Db2 for z/OS sources in a staging group

If you expect the information that identifies selected source tables to the CDC staging task to be greater than 16 KB, you must install or manually create the INFA_DATA table. The table stores DBID, OBID and PSID information for each of the source tables selected in the tasks that are associated with the staging group.
If the INFA_DATA table does not exist and the computed length of these values is greater than 16 KB, the CDC staging task will fail.
You can create the INFA_DATA table by using the #INFATBL installation member. The installation member includes the following SQL statements for creating the table:
CREATE TABLE <schema>.INFA_DATA
(INFA_CODE CHARACTER(2) NOT NULL
,INFA_GROUP_ID INTEGER NOT NULL
,INFA_SEQ INTEGER NOT NULL
,INFA_TS TIMESTAMP (6) WITHOUT TIME ZONE NOT NULL
,INFA_DATA VARCHAR(4096)
,PRIMARY KEY(INFA_CODE,INFA_GROUP_ID,INFA_SEQ,INFA_TS)
)
APPEND NO
DATA CAPTURE CHANGES
CCSID EBCDIC;
Note: You can use a table name other than the default name of INFA_DATA if you prefer.
Grant the following privileges on the INFA_DATA table to the user specified in the connection properties for the primary source connection:
GRANT INSERT ON <schema>.INFA_DATA to <stagingtask_connection_user>;
GRANT DELETE ON <schema>.INFA_DATA to <stagingtask_connection_user>;
GRANT SELECT ON <schema>.INFA_DATA to <stagingtask_connection_user>;
COMMIT;
Where:
- <schema> is the schema in which the INFA_DATA table is created.
- <stagingtask_connection_user> is the user that is specified in the connection properties for the primary source connection.

Integrating Database Ingestion and Replication Tasks with Data Integration Taskflows

To perform post-replication transformation, you can configure database ingestion and replication tasks to trigger Data Integration taskflows that process and transform the ingested data. This feature is available for tasks that have any supported load type, source type, and target type.
When you define a database ingestion and replication task, you can select the Execute in Taskflow option to make the task available to add to taskflows in Data Integration. For incremental load and combined initial and incremental load jobs with an Amazon Redshift, Oracle, Snowflake (without Superpipe), or SQL Server target, you can optionally select the Add Cycle ID option to include cycle ID metadata in the target table. The Cycle ID column identifies the cycle in which the row was updated. The cycle ID is passed as a parameter to the taskflow, where you can use it to filter the rows on which to execute transformation logic.
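For illustration only, the transformation logic in the taskflow could restrict processing to the rows written during the triggering cycle with a filter such as the following. The column name INFA_CYCLE_ID and the :cycle_id parameter are assumptions used for this sketch, not confirmed names.
-- Hypothetical filter: process only the rows from the CDC cycle that
-- triggered the taskflow. INFA_CYCLE_ID and :cycle_id are assumed names.
SELECT *
FROM orders
WHERE INFA_CYCLE_ID = :cycle_id;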
When you configure the taskflow in Data Integration, you can select the task as an event source and add any appropriate transformation type to transform the ingested data.
Configuration task flow:
  1. In the Data Ingestion and Replication task configuration wizard, when defining the database ingestion and replication task, select the Execute in Taskflow option and, optionally, the Add Cycle ID option.
  2. When you finish defining the task, save it.
  3. To define a taskflow in Data Integration, click the Orchestrate panel on the Home page.
  4. To add the database ingestion and replication task to the taskflow, perform the following steps:
    a. Under Task Properties, click Start.
    b. In the Binding field, select Event.
    c. In the Event Source Name field, click Select. Then, in the Select Event Source dialog box, select the database ingestion and replication task and click Select.
    Note: You can filter the list of tasks by task type.
    d. Check that the Event Source Name field and the Input Fields display the task name.
The taskflow is automatically triggered to start when either the initial load task successfully completes or after each CDC cycle in an incremental load operation. If a CDC cycle ends but the previous taskflow run is still running, the data is queued and waits for the previous taskflow to complete.

About LOB truncation

If you select the Include LOBs option for a Db2 for LUW, Oracle, PostgreSQL, or SQL Server source, LOB data might be truncated, primarily depending on the maximum size that the target allows.
Target-side truncation points:
Source-side truncation considerations:

Customizing data type mappings

When you configure a target for a database ingestion and replication task, you can optionally define data-type mapping rules to override the default mappings of source data types to target data types.
The default mappings are described in Default data-type mappings.
This feature is supported for tasks that have the following source and target combinations:
For example, you can create a data-type rule that maps Oracle NUMBER columns that have no precision to Snowflake target NUMBER() columns that also have no precision, instead of using the default mapping to the Snowflake VARCHAR(255) data type.
To create a data-type mapping rule:
  1. Expand Data Type Rules.
  2. In the Create Rule fields, enter a source data type and the target data type that you want to map it to.
  In the Source field only, you can include the percent (%) wildcard to represent the data type precision, scale, or size, for example, NUMBER(%,4), NUMBER(8,%), or NUMBER(%). Use the wildcard to cover all source columns that have the same data type but use different precision, scale, or size values, instead of specifying each one individually. For example, enter FLOAT(%) to cover FLOAT(16), FLOAT(32), and FLOAT(84). You cannot enter the % wildcard in the target data type. A source data type that uses the % wildcard must map to a target data type that uses a specific precision, scale, or size value. For example, you could map the source data type FLOAT(%) to a target data type specification such as NUMBER(38,10).
  3. Click Add Rule.
  The rule appears in the list of rules.
To delete a rule, click the Delete icon at the right end of the rule row.
After you deploy a task with custom mapping rules, you cannot edit the rules until the task is undeployed.
Usage notes: