Review the key concepts in this section to understand some configuration options before configuring an application ingestion and replication task.
Integrating Application Ingestion and Replication tasks with Data Integration taskflows
You can configure application ingestion and replication tasks to trigger Data Integration taskflows that process and transform the ingested data.
This feature is available for tasks that use any supported load type and have an Amazon Redshift, Oracle, SQL Server, or Snowflake target. It's also available for tasks that use the initial load type and have an Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Microsoft Azure Data Lake Storage, Microsoft Fabric OneLake, or Oracle Cloud Object Storage target.
When you define an application ingestion and replication task, you can select the Execute in Taskflow option to make the task available to add to taskflows in Data Integration. For incremental load and combined load jobs with an Amazon Redshift, Oracle, Snowflake (without Superpipe), or SQL Server target, you can optionally select the Add Cycle ID option to include cycle ID metadata in the target table. The Cycle ID column identifies the cycle in which the row was updated. The cycle ID is passed as a parameter to the taskflow, where you can use it to filter the rows on which to execute transformation logic.
When you configure the taskflow in Data Integration, you can select the task as an event source and add any appropriate transformation type to transform the ingested data.
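For example, a taskflow step or downstream script might use the cycle ID parameter to restrict transformation logic to the rows from the most recent CDC cycle. The following Python sketch illustrates the idea only; the INFA_CYCLE_ID column name, the row layout, and the transformation are assumptions, not product-defined names.

```python
# Illustrative sketch only: limit transformation logic to rows from the CDC
# cycle whose ID the taskflow received as an input parameter.
# The "INFA_CYCLE_ID" column name, the row layout, and the transformation
# are assumptions for this example, not product-defined names.

def rows_for_cycle(rows, cycle_id, cycle_column="INFA_CYCLE_ID"):
    """Return only the rows that were applied in the given CDC cycle."""
    return [row for row in rows if row.get(cycle_column) == cycle_id]

def transform(row):
    """Placeholder transformation: normalize a name field."""
    row = dict(row)
    row["CUSTOMER_NAME"] = row["CUSTOMER_NAME"].strip().upper()
    return row

if __name__ == "__main__":
    replicated_rows = [
        {"CUSTOMER_ID": 1, "CUSTOMER_NAME": " acme ", "INFA_CYCLE_ID": 41},
        {"CUSTOMER_ID": 2, "CUSTOMER_NAME": "globex", "INFA_CYCLE_ID": 42},
    ]
    latest_cycle = 42  # value that the taskflow passes in as a parameter
    for row in rows_for_cycle(replicated_rows, latest_cycle):
        print(transform(row))  # only the row from cycle 42 is transformed
```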
Configuration task flow:
1. In the Data Ingestion and Replication task configuration wizard, select the following options when defining an application ingestion and replication task:
- (Optional) Add Cycle ID on the Target page. For incremental load jobs, this option adds the generated cycle ID for each CDC cycle to the target table. Applies to Amazon Redshift, Oracle, and SQL Server targets and to Snowflake targets that don't use the Superpipe option.
- Execute in Taskflow on the Schedule and Runtime Options page. This option makes the task available for selection when defining a taskflow in Data Integration.
2. When you are done defining the task, click Save.
3. To define a taskflow in Data Integration, click the Orchestrate panel on the Home page.
4. To add the application ingestion and replication task to the taskflow, perform the following steps:
a. Under Task Properties, click Start.
b. In the Binding field, select Event.
c. In the Event Source Name field, click Select. Then, in the Select Event Source dialog box, select the application ingestion and replication task and click Select.
Note: You can filter the list of tasks by task type.
d. Check that the Event Source Name field and Input Fields display the task name.
The taskflow is automatically triggered when the initial load task successfully completes or after each CDC cycle in an incremental load operation. If a CDC cycle ends while the previous taskflow run is still in progress, the data is queued and waits for the previous taskflow run to complete.
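The triggering behavior can be pictured as a simple queue: each completed CDC cycle produces one taskflow run, and runs execute one at a time. The following Python sketch is a conceptual model of that behavior under those assumptions, not Informatica code.

```python
from collections import deque

# Conceptual model of the triggering behavior described above: each completed
# CDC cycle produces one taskflow run, and runs execute one at a time.
# This is an illustration of the behavior, not Informatica code.

pending_cycles = deque()
taskflow_busy = False

def cdc_cycle_completed(cycle_id):
    """A CDC cycle ended: queue a taskflow run, or start one if none is active."""
    pending_cycles.append(cycle_id)
    if not taskflow_busy:
        start_next_taskflow_run()

def start_next_taskflow_run():
    global taskflow_busy
    if pending_cycles:
        taskflow_busy = True
        print(f"Taskflow run started for cycle {pending_cycles.popleft()}")
    else:
        taskflow_busy = False

def taskflow_run_finished():
    """The active run completed: pick up the next queued cycle, if any."""
    global taskflow_busy
    taskflow_busy = False
    start_next_taskflow_run()

if __name__ == "__main__":
    cdc_cycle_completed(1)   # starts immediately
    cdc_cycle_completed(2)   # queued while the run for cycle 1 is in progress
    taskflow_run_finished()  # the run for cycle 2 starts now
```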
Handling source schema changes
You can configure Application Ingestion and Replication to automatically detect source schema changes, also called schema drift, and handle these changes on the target. This feature is available only for incremental load and combined initial and incremental load tasks.
When you configure an application ingestion and replication task, on the Let's Go page of the application ingestion and replication task wizard, you can specify the types of source schema changes that Application Ingestion and Replication must propagate for the job associated with the task. You can also specify how the job must handle each type of source schema change. For example, you can configure the task to ignore the changes, replicate them, or stop the job when a particular type of schema change occurs on the source. For more information, see Finalize the task definition.
Note: In combined initial and incremental load jobs, Application Ingestion and Replication starts detecting and replicating source schema changes only after the source object reaches the Normal state.
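Conceptually, the schema drift settings form a set of rules that map each change type to an action. The following Python sketch models that mapping; the rule table, action names, and function are assumptions for this sketch, not a product API.

```python
# Illustrative model of schema drift handling: each detected change type maps
# to the action configured in the task wizard. The rule table, action names,
# and function below are assumptions for this sketch, not a product API.

SCHEMA_DRIFT_RULES = {
    "add_field":    "replicate",   # propagate the new field to the target
    "modify_field": "replicate",
    "drop_field":   "ignore",      # do not propagate the change to the target
    "rename_field": "stop_job",    # require manual intervention
}

def handle_schema_change(change_type):
    """Decide what the job does for a detected source schema change."""
    action = SCHEMA_DRIFT_RULES.get(change_type, "stop_job")
    if action == "ignore":
        return "Change ignored; the target schema is unchanged."
    if action == "replicate":
        return "Change replicated to the target schema."
    return "Job stopped; resume it after resolving the schema change."

if __name__ == "__main__":
    for change in ("add_field", "rename_field"):
        print(change, "->", handle_schema_change(change))
```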
The following table describes the types of schema changes that Application Ingestion and Replication can detect for each source type:
| Source type | Supported schema changes |
| --- | --- |
| Adobe Analytics | Add field, Modify field, Drop field, Rename field |
| Google Analytics | Add column, Modify column, Drop column, Rename column |
| Marketo | Not supported |
| Microsoft Dynamics 365 | Add column, Modify column, Drop column |
| NetSuite | Add column, Modify column |
| Oracle Fusion Cloud | Not supported (REST and BICC) |
| Salesforce | Add field, Modify field, Drop field, Rename field |
| SAP | Not supported (SAP ODP Extractor and SAP Mass Ingestion connectors) |
| ServiceNow | Add column, Modify column, Drop column |
| Workday | Not supported (SOAP and RaaS) |
| Zendesk | Not supported |
Application Ingestion and Replication detects a schema change in a source object only after Data Manipulation Language (DML) operations occur on the altered source object. If multiple schema changes occur without intervening DML operations, Application Ingestion and Replication detects all the schema changes together when a DML operation occurs.
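The following Python sketch models this detection behavior as described above; the function names and data structures are assumptions for illustration only.

```python
from collections import defaultdict

# Conceptual model of the detection behavior described above: schema changes
# on a source object surface only when a DML operation arrives for that
# object, and any changes that accumulated in between surface together.
# This is an illustration, not product code.

pending_schema_changes = defaultdict(list)

def record_schema_change(source_object, change):
    """A schema change occurred on the source; nothing is detected yet."""
    pending_schema_changes[source_object].append(change)

def on_dml_operation(source_object):
    """A DML operation arrives; all accumulated changes are detected at once."""
    return pending_schema_changes.pop(source_object, [])

if __name__ == "__main__":
    record_schema_change("ACCOUNT", "add field LOYALTY_TIER")
    record_schema_change("ACCOUNT", "modify field PHONE")
    print(on_dml_operation("ACCOUNT"))  # both changes are detected together
    print(on_dml_operation("ACCOUNT"))  # nothing pending -> []
```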
Ability to apply deletes as soft deletes on the target
For application ingestion and replication incremental load and combined initial and incremental load jobs that have any supported source type and a Databricks, Google BigQuery, or Snowflake target, you can configure the task to process delete operations on the source as soft deletes on the target.
A soft delete marks a deleted row as deleted without actually removing it from the database. The row is applied to the target with the value of "D" in the generated INFA_OPERATION_TYPE metadata column.
Important: You can use Soft Deletes only if all of the source objects have primary keys and the source does not allow the primary key values to change in any row after the row is first created. If the primary key values change, duplicate rows might be written to the target, causing the target to become corrupted.
Example scenario: Your organization wants to use soft deletes in a data warehouse to mark the rows that were deleted at the source while still retaining the rows for audit purposes.
To enable soft deletes, set the Apply Mode field to Soft Deletes on the Target page of the task wizard when you configure the ingestion and replication task.
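Downstream consumers typically exclude soft-deleted rows when they query the target. The following Python sketch shows one way to build such a filter; the table name is hypothetical, and the filter assumes that soft-deleted rows carry "D" in the INFA_OPERATION_TYPE column while other rows leave the column NULL or hold a different value.

```python
# Illustrative sketch: build a query that excludes soft-deleted rows on the
# target. The table name "customer_target" is hypothetical. The filter assumes
# that soft-deleted rows carry "D" in INFA_OPERATION_TYPE and that other rows
# leave the column NULL or hold a different value.

def active_rows_query(table_name, op_type_column="INFA_OPERATION_TYPE"):
    """Return a SQL statement that selects only rows not marked as deleted."""
    return (
        f"SELECT * FROM {table_name} "
        f"WHERE {op_type_column} IS NULL OR {op_type_column} <> 'D'"
    )

if __name__ == "__main__":
    print(active_rows_query("customer_target"))
```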
Apply an audit history of all source table change operations to target tables
You can configure application ingestion and replication incremental load and combined initial and incremental load tasks that have Databricks, Snowflake, Google BigQuery, or Oracle targets to write an audit trail of every DML change operation made on the source tables to the target. A row for each DML change on a source table is written to the generated target table along with the audit columns you select. The audit columns contain metadata about the change, such as the DML operation type, time, owner, transaction ID, generated ascending sequence number, and before image.
When you define the task, select Audit in the Apply Mode field on the Target page. The Apply Mode field is available for new or undeployed tasks.
To specify the audit metadata columns to add, select one or more of the check boxes under Advanced on the Target page:
• Add Last Replicated Time. Adds a column that records the timestamp at which a record was inserted or last updated in the target table. For initial loads, all loaded records have the same timestamp, except for Snowflake targets that use the Superpipe option, where the seconds or minutes might vary slightly. For incremental and combined initial and incremental loads, the column records the timestamp of the last DML operation that was applied to the target.
• Add Operation <metadata_type>. Adds columns that contain metadata for change operations, such as the DML operation type, time, owner, transaction ID, and generated ascending sequence number. The columns are populated when data is loaded to the target tables.
• Add Before Images. Adds _OLD columns that contain before-image data for updates. You can compare the old and new column values in the tables.
• Prefix for Metadata Columns. Adds a prefix to the names of the added audit columns to differentiate them from other table columns. Default is INFA_.
These fields are optional. Only the Add Operation Type check box is selected by default. It adds a column that shows the DML operation type: I (insert), U (update), or D (delete).
The first time you run a job associated with the task, the job generates the target tables with the selected audit metadata columns. Ensure that no constraints other than indexes exist on the target tables.
For targets that support Soft Deletes mode, such as Snowflake and Databricks targets, each delete operation that is processed as a soft delete appears with an operation type of "D" in the INFA_OPERATION_TYPE column, and values are written to any other metadata columns that you selected. However, for update and insert operations, the INFA_OPERATION_TYPE column and all other selected metadata columns are NULL.
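One common use of the audit columns is to reconstruct the current state of each source row from the change history by keeping only the most recent audit row per key. The following Python sketch builds such a query; the table name, key column, and the INFA_SEQUENCE sequence column name are assumptions for illustration, since only the INFA_OPERATION_TYPE column is named in this section.

```python
# Illustrative sketch: derive the current state of each key from an audit-mode
# target table by keeping the most recent audit row per key and dropping keys
# whose latest operation is a delete. The table name, key column, and the
# "INFA_SEQUENCE" sequence column are assumptions; only INFA_OPERATION_TYPE is
# named in this section.

def latest_state_query(table_name, key_column,
                       seq_column="INFA_SEQUENCE",
                       op_column="INFA_OPERATION_TYPE"):
    """Return SQL that reconstructs the latest row version per key."""
    return f"""
        WITH ranked AS (
            SELECT t.*,
                   ROW_NUMBER() OVER (
                       PARTITION BY {key_column}
                       ORDER BY {seq_column} DESC
                   ) AS rn
            FROM {table_name} t
        )
        SELECT * FROM ranked
        WHERE rn = 1 AND {op_column} <> 'D'
    """

if __name__ == "__main__":
    print(latest_state_query("orders_audit", "ORDER_ID"))
```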
Example for Audit mode
For example, assume the following DML change operations occur on a source table in the order shown:
Insert into tableA pkey = 1
Update tableA where pkey = 1
Update tableA where pkey = 1
Delete from tableA where pkey = 1
All of the following rows appear in the target table, providing an audit trail of all of the source DML changes:

pkey = 1, opType = I
pkey = 1, opType = U
pkey = 1, opType = U
pkey = 1, opType = D

In this example, the only audit column selected is opType.
Note that when the task's apply mode is Standard, none of these rows appears in the target table because the last DML operation is a Delete, which supersedes the prior changes.
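The following Python sketch contrasts the two apply modes for this example; the row layout and opType values are assumptions based on the description above.

```python
# Illustrative comparison of the example above: the same four source
# operations yield four rows in Audit mode but no row in Standard mode,
# because the final delete supersedes the earlier changes.
# The row layout and opType values are assumptions for this sketch.

operations = [("I", 1), ("U", 1), ("U", 1), ("D", 1)]  # (opType, pkey)

# Audit mode: every operation is appended as its own row.
audit_rows = [{"pkey": key, "opType": op} for op, key in operations]

# Standard mode: operations are applied to the current state of each key.
standard_rows = {}
for op, key in operations:
    if op == "D":
        standard_rows.pop(key, None)
    else:
        standard_rows[key] = {"pkey": key}

print(len(audit_rows))     # 4 rows retained as an audit trail
print(len(standard_rows))  # 0 rows: the delete removed the key
```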
Note: If a combined initial and incremental load job captures an incremental insert change record during the initial unload phase, the job manufactures a delete for the same row to remove any duplicate that might have been obtained from the initial unload. This manufactured activity will be reflected in audit apply mode.