Database Ingestion and Replication > Database Ingestion and Replication > Configuring a database ingestion and replication task

Configuring a database ingestion and replication task

In Data Integration, use the database ingestion and replication task wizard to configure a database ingestion and replication task.

For a database ingestion and replication initial load or incremental load task that has a Snowflake target, you can enable the task to integrate with Data Integration taskflows. For more information, see Integrating Database Ingestion and Replication Tasks with Data Integration Taskflows.

To configure a task, click the Ingest panel on the Home page and then complete the following configuration steps in the task wizard:

1Define basic task information, such as the task name, project location, runtime environment, and load type.
2Configure the source.
3Configure the target.
4Configure schedule and runtime options.

Click Next or Back to navigate from one page to another. At any point, you can click Save to save the information that you have entered so far.

After you complete all wizard pages, save the information and then click Deploy to make the task available as an executable job to the Secure Agent.

Before you begin

Before you begin, complete the following prerequisite tasks in Administrator:

•Check that your organization has licenses for Database Ingestion and Replication and the DBMI packages.
•Verify that the Secure Agent in your runtime environment is running and that you can access the Data Ingestion and Replication service.
•Define the source and target connections.

Also, if you plan to perform incremental load operations with Oracle sources, ensure that the ORACLE_HOME environment variable is defined on the Secure Agent system.

Integrating Database Ingestion and Replication Tasks with Data Integration Taskflows

To perform post-replication transformation, you can configure database ingestion and replication tasks to trigger Data Integration taskflows that process and transform the ingested data. This feature is available for initial load tasks and incremental load tasks that have any supported source type and a Snowflake target for which the Superpipe option isn't selected.

When you define a database ingestion and replication task, you can select the Execute in Taskflow option to make the task available to add to taskflows in Data Integration. You can also optionally select the Add Cycle ID option to include cycle ID metadata in the target table. The Cycle ID column identifies the cycle in which the row got updated. It's passed as a parameter to the taskflow, where you can use it to filter the rows on which to execute transformation logic.

When you configure the taskflow in Data Integration, you can select the task as an event source and add any appropriate transformation type to transform the ingested data.

Configuration task flow:

1In the Data Ingestion and Replication task configuration wizard, select the following options when defining a database ingestion and replication task:

- (Optional) Add Cycle ID on the Target page. This option adds the generated cycle ID for each CDC cycle to the target table.
- Execute in Taskflow on the Schedule and Runtime Options page. This option makes the task available for selection when defining a taskflow in Data Integration.

2When done defining the task, Save it.
3To define a taskflow in Data Integration, click the Orchestrate panel on the Home page.
4To add the database ingestion and replication task in the taskflow, perform the following steps:

aUnder Task Properties, click Start.
bIn the Binding field, select Event.
cIn the Event Source Name field, click Select. Then in Select Event Source dialog box, select the database ingestion and replication task and click Select.

Note: You can filter the list of tasks by task type.

dCheck that the Event Source Name field and Input Fields display the task name. For example:

The taskflow is automatically triggered to start when either the initial load task successfully completes or after each CDC cycle in an incremental load operation. If a CDC cycle ends but the previous taskflow run is still running, the data is queued and waits for the previous taskflow to complete.

Defining basic task information

To begin defining a database ingestion and replication task, you must first enter some basic information about the task, such as a task name, project or project folder location, and load operation type.

1Start the task wizard in one of the following ways:

- On the Home page, click the Ingest panel and select Database Ingestion and Replication Task.

Click the box for the Database Ingestion and Replication task type.

- In the navigation bar on the Explore page or the Home page, click New to open the New Asset dialog box. Then, select Data Ingestion and Replication > Database Ingestion and ReplicationTask and click Create.

Note: If your organization does not have a custom license for application, database, file, or streaming ingestion and replication, the Data Ingestion and Replication category still appears but you cannot select and configure an ingestion and replication task of the unlicensed task type or types.

The Definition page of the database ingestion and replication task wizard appears.

2Configure the following properties:

Property	Description
Name	A name for the database ingestion and replication task. Task names can contain Latin alphanumeric characters, spaces, periods (.), commas (,), underscores (_), plus signs (+), and hyphens (-). Task names cannot include other special characters. Task names are not case sensitive. Maximum length is 50 characters. Note: If you include spaces in the database ingestion and replication task name, after you deploy the task, the spaces do not appear in the corresponding job name.
Location	The project or project\folder that will contain the task definition. The default is the currently selected project or project subfolder in Explore. If a project or project subfolder is not selected, the default is the Default project.
Runtime Environment	The runtime environment in which you want to run the task. The runtime environment can be a Secure Agent group that consists of one or more Secure Agents. A Secure Agent is a lightweight program that runs tasks and enables secure communication. For selected cloud source types, you can use a serverless runtime environment. For database ingestion and replication tasks, the Cloud Hosted Agent is not supported and does not appear in the Runtime Environment list. Tip: Click the Refresh icon to refresh the list of runtime environments.
Description	An optional description for the task. Maximum length is 4,000 characters.
Load Type	The type of load operation that the database ingestion and replication task performs. Options are: - Initial Load. Loads data read at a specific point in time from source tables to a target in a batch operation. You can perform an initial load to materialize a target to which incremental change data will be sent. - Incremental Load. Propagates source data changes to a target continuously or until the job is stopped or ends. The job replicates the changes that have occurred since the last time the job ran or from a specific start point for the first job run. - Initial and Incremental Loads. Performs an initial load of point-in-time data to the target and then automatically switches to replicating incremental data changes made to the same source tables on a continuous basis. Note: If a change record is captured during the initial unload load phase, it's withheld from apply processing until after the unload phase completes. Any insert rows captured during the unload phase are converted into a pair of delete and insert operations so that only one insert row is applied to the target in the case where the insert occurs in both the unloaded data and the captured change data.

3Click Next.

Configuring the source

Configure the source on the Source page of the database ingestion and replication task wizard.

Note: For MongoDB sources only, the task wizard displays database instead of schema and displays collection instead of table. However, for simplicity, the terms schema and table are used in this documentation to cover all source types.

1In the Connection list, select the connection for the source system. The connection type appears in parentheses after the connection name.

The connection must be predefined in Administrator for a runtime environment that your organization uses.

The list includes only the connection types that are valid for the load type selected on the Definition page. No connections are listed if you did not select a load type.

If you change the load type and the selected connection is no longer valid, a warning message is issued and the Connection field is cleared. You must select another connection that is valid for the updated load type.

Note: After you deploy the database ingestion and replication task, you cannot change the connection without first undeploying the associated job. After you change the connection, you must deploy the task again.

2In the Schema list, select the source schema that includes the source tables. If you specified a schema in the connection properties, it is selected by default but you can change it.

The list includes only the schemas that are available in the database accessed with the specified source connection.

When creating a task that has an Oracle, Microsoft SQL Server, Netezza, or PostgreSQL source, the schema name that is specified in the connection properties is displayed by default.

3If you are defining a Db2 for i source for an incremental load task, in the Journal Name field, select the name of the journal that records the changes made to the source tables.

4If you are defining a PostgreSQL source for an incremental load or combined initial and incremental load task, complete the following fields:

Field	Description
Replication Slot Name	The unique name of a PostgreSQL replication slot. A slot name can contain Latin alphanumeric characters in lowercase and the underscore (_) character. Maximum length is 63 characters. Important: Each database ingestion and replication task must use a different replication slot. If you're using a PostgreSQL replica, enter the name of the replcation slot you defined on the replica.
Replication Plugin	The PostgreSQL replication plug-in. Options are: - pgoutput. You can select this option only for PostgreSQL version 10 and later. - wal2json
Publication	If you selected pgoutput as the replication plug-in, specify the publication name that this plug-in uses. Note: This field is not displayed if you selected wal2json as the replication plug-in.

5If you are defining an incremental load or combined initial and incremental load task that has a Db2 for LUW, Oracle, or SQL Server source, select the capture method that you want to use under Change Data Capture Method.

aIn the CDC Method field, select one of the following options to indicate the method to use for capturing source changes:

Method	Supported Sources	Description
CDC Tables	SQL Server only	Read data changes directly from the SQL Server CDC tables. For SQL Server sources, this method provides the best replication performance and highest reliability of results.
Log-based	Oracle and SQL Server	Capture Inserts, Updates, Deletes, and column DDL changes in near real time by reading the database transaction logs. For Oracle sources, data changes are read from the Oracle redo logs. For SQL Server sources, data changes are read from the SQL Server transaction log and the enabled SQL Server CDC tables. Exception: For Azure SQL Database sources, data changes are read from CDC tables only.
Query-based	Db2 for LUW, Oracle, and SQL Server	Capture Inserts and Updates by using a SQL WHERE clause that points to a CDC query column. The query column is used to identify the rows that contain the changes made to the source tables since the beginning of the CDC interval. For Db2 for LUW sources in incremental load and initial and incremental load jobs, this capture method is the only available option.

bIf you selected the Query-based option, complete the following additional fields:

▪ CDC Query Column Type. The column data type of the CDC query column in the source table. The column data type must combine the date and time. For Db2 for LUW and Oracle, the only available option is Timestamp. For SQL Server, the only available option is Datetime.
▪ CDC Query Column Name. The case-sensitive name of the CDC query column in the source table. The column must be present in the source table. Maximum length is 70 characters.
▪ CDC Interval. The frequency of a query-based change data capture cycle, expressed in days, hours, and minutes. You must enter a positive number in at least one of the interval fields. Otherwise, an error will be issued when you try to save the task. The default value is 5 minutes.

6Under Table Selection, use one of the following methods to select source tables:

- Select Select All to select all tables and columns in the schema for data replication. The Tables Selected field shows the count of all selected tables. You won't be able to edit the selection later under Table View.

Note: Fetching information for all of the tables might take a long time.

- Under Rules, define rules to specify a subset of source tables to replicate. If you use rules, you'll be able to individually select or deselect tables and the columns in selected tables under Table View. You'll also be able to set an option for trimming spaces in character data. To add rules, go to 8.

7If you want to include database views as sources, select the Include Views check box to the right of the Refresh icon. This check box is available for initial load tasks that have a Db2 for i, Db2 for LUW, Microsoft SQL Server, MySQL, Oracle, PostgreSQL, or Teradata source. It’s also available for incremental load and combined initial and incremental load tasks that have a Db2 for LUW source and use the Query-based CDC method.

The views will be fetched and included in the Tables Selected count and in the list of table names.

8To add a table-selection rule, first make sure that the Select All check box is cleared. Then perform the following substeps:

aUnder Rules, click the Add Rule (+) icon above the first table. A row is added to the table.
bIn the Table Rule column, select Include or Exclude to create an inclusion or exclusion rule, respectively.
cIn the Condition column, enter a table name or a table-name mask that includes one or more wildcards to identify the source tables to include in or exclude from table selection. Use the following guidelines:

▪ A mask can contain one or both of the following wildcards: an asterisk (*) wildcard to represent one or more characters and a question mark (?) wildcard to represent a single character. A wildcard can occur multiple times in a mask value and can occur anywhere in the value.
▪ The task wizard is case sensitive. Enter the table names or masks in the case with which the tables were defined.
▪ Do not include delimiters such as quotation marks or brackets, even if the source database uses them. For example, Oracle requires quotation marks around lowercase and mixed-case names to force the names to be stored in lowercase or mixed case. However, in the task wizard, you must enter the lowercase or mixed-case names without quotation marks.
▪ If a table name includes special characters such as a backslash (\), asterisk(*), dollar sign ($), caret (^), or question mark (?), escape each special character with a backslash (\) when you enter the rule.

dDefine additional rules as needed.

If you define multiple Include and Exclude rules, they'll be processed in the order in which they're listed, from top to bottom. Use the arrow icons to change the order. For an example of using multiple rules, see Example of rules for selecting source tables.

eWhen finished, click Apply Rules.

The Total Tables Selected and Table View counts are updated. If you click the Refresh icon, the Tables Affected count for each rule is shown.

The following image shows multiple rules defined on the Source page:

After you apply rules, if you add, delete, or change rules, you must click Apply Rules again. Click the Refresh icon to update the table counts. If you delete all rules without clicking Apply Rules, a validation error will occur at deployment, even if the Table View list still lists tables. If you switch to Select All, the rules are no longer in effect and disappear.

9To perform trim actions on character columns in the source tables selected based on rules, create column action rules in the second "Action" table under Rules.

Note: You cannot create column action rules for MongoDB sources.

aClick the Add Rule (+) icon above the second table.
bIn the Action column, select one of the following options:

▪ LTRIM. Trims spaces to the left of character column values.
▪ RTRIM. Trims spaces to the right of character column values.
▪ TRIM. Trims spaces to the left of and to the right of character column values.

cIn the Condition column, enter a column name or a column name mask that includes one or more asterisk (*) or question mark (?) wildcards. The value is matched against columns in the selected source tables to identify the columns to which the action applies.

Note: You can define multiple rules for different action types or for the same action type with different conditions. The rules are processed in the order in which they're listed, from top to bottom. Use the arrow icons to change the order.

10Under Table View, view or edit the set of selected source tables and columns.

If you selected Select All, the lists of tables and columns are view only.

If you applied rules, you can refine the set of selected tables by clicking the check box next to individual tables. Deselect any tables that you do not want to replicate, or select additional items to replicate. Click the Refresh icon to update the selected tables count.

For Db2 for i, Oracle and SQL Server sources only, you can also individually deselect or reselect the columns in a selected source table. To view or change the columns from which data will be replicated for a selected table, click the highlighted number of columns in the Columns column. The column names and data types are displayed to the right. By default, all columns are selected for a selected source table. To deselect a column or reselect it, click the check box next to the column name. You cannot deselect a primary key column.

The following image shows selected tables and the selected columns for the first table:

Notes:

- To search for tables or columns or both, in the drop-down list above the columns list, select Table Name, Columns, or All and then enter a search string in the Find box and click Search. You can include a single asterisk (*) wildcard at the beginning or end of the string.
- The first time you change a check box setting for a table or column, the rules are no longer in effect. The selections under Table View take precedence. However, if you click the Add Rule (+) icon again, any tables that you deselected or selected individually are reflected as new rules in the Rules list and the rules once again take precedence. If you want to return to the Table View list, click Apply Rules again.
- In the Table View section, the length of character columns is shown in bytes. Since the number of bytes per character differs depending on the character set encoding used by the database, for some sources, the actual column length in number of characters might be different than the number of bytes shown in the Table View.
- If you select columns individually, the resulting set of columns is fixed and will not be updated by any schema change, regardless of schema drift settings. For example, if a source column is added or renamed, that column is silently excluded from CDC processing because it's not in the list of selected columns. However, if a selected column is dropped on the source, the schema drift Drop Column option controls how it's handled. The dropped column operation is not reflected in the list of columns until you apply rules again.
- For combined initial and incremental load jobs, if you add to or modify a column selection, a resync operation is triggered automatically. The resync is required to ensure that the target table has the same values as the source. For incremental load jobs, which do not have a resync option, modifying a column selection results in an error. If you modify the column selection of a deployed job and then redeploy it, the source and target will no longer match.

Note: A resync operation is not triggered, and no error is reported, if you remove a selected column. Removal of a selected column is treated the same as a drop column event, triggering your drop column schema drift settings for the task.

11 If you are defining an incremental load or combined initial and incremental load task that has a Db2 for i, Db2 for z/OS, Microsoft SQL Server, Oracle, PostgreSQL, SAP HANA, or SAP HANA Cloud source and one or more of the selected source tables are not enabled for change data capture, you can generate a script for enabling CDC and then run or download the script.

Note: If you selected Query-based as the CDC method for Db2 for LUW, Oracle, or SQL Server sources, the CDC Script field is not available because it is not applicable for the query-based change capture method.

aIn the CDC Script field, select one of the following options:

▪ Enable CDC for all columns. Enables CDC for all columns in the selected source tables. This option is the only valid option for a Db2 for i, Db2 for z/OS, PostgreSQL, SAP HANA, SAP HANA Cloud, or SQL Server source.

Note: For source tables without a primary key, including any tables with unique indexes, CDC is enabled for all columns by default, regardless of which option is selected.

▪ Enable CDC for primary key columns. Enables CDC only for primary key columns in the selected source tables. Do not use this option for a Db2 for i, Db2 for z/OS, PostgreSQL, or SQL Server source or for any task that has a Google Big Query target.

The script enables CDC in the following ways, depending on the source type:

▪ For Db2 for i sources, the script enables journaling on the source tables.
▪ For Db2 for z/OS sources, the script sets DATA CAPTURE CHANGES for source tables and certain Db2 catalog tables needed for CDC. After DATA CAPTURE CHANGES is set for one job, all other jobs recognize that the attribute is enabled in Db2 for the catalog tables needed because the Db2 catalog tables are a set of tables shared by all users of Db2.
▪ For Microsoft SQL Server sources, the script runs the sys.sp_cdc_enable_db and sys.sp_cdc_enable_table stored procedures to enable CDC on the source database and tables. For RDS for SQL Server, the script runs the msdb.dbo.rds_cdc_enable_db procedure to enable CDC on the source database, and the sys.sp_cdc_enable_table script to track CDC for tables.
▪ For Oracle sources, the script enables supplemental logging for all or primary key columns in the selected source tables to log additional information in the redo logs.
▪ For PostgreSQL sources, the script sets REPLICATION IDENTITY FULL on the selected source tables to write all column values to the WAL file. The script also creates a replication slot of the type of pgoutput or wal2json. If the slot type is pgoutput, the script also creates the publication and adds tables to it. If you have a high availability environment with a primary database and replica, edit the script to remove the statement for creating the replication slot and then run the edited script on the primary database, outside of the wizard. Separately, run the statement for creating the replication slot, extracted from the original script, on the replica.
▪ For SAP HANA and SAP HANA Cloud sources, the script creates the required PKLOG, PROCESSED, and _CDC shadow tables. The script also creates three triggers and a sequence for each selected source table.

bTo run the script, click Execute.

If you do not have a database role or privilege that allows you to run the script, click the Download icon to download the script. The script file name has the following format: cdc_script_taskname_number.txt. Then ask your database administrator to run the script.

Make sure the script runs before you run the database ingestion and replication task.

Note: If you change to the CDC Script option later and run the script again, the script first drops CDC for the original set of columns and then enables CDC for the current set of columns. For SAP HANA sources, if the PROCESSED and PKLOG tables already exist, they are omitted from the new script. If the shadow _CDC table and triggers already exist for any selected table, the SQL statements for creating those objects are commented out in the new script.

12For Microsoft SQL Server sources, complete the following fields:

- In the Capture Filegroup field, enter the name of the filegroup to be used for the change table that is created for the capture. If you leave this field empty, the change table is located in the default filegroup of the database.
- In the Gating Role field, enter the name of the database role that is used to gate access to change data. If you leave this field empty, the database does not use the gating role.

13To create and download a list of the source tables that match the table selection criteria, perform the following substeps:

aIf you used rule-based table selection, in the List Tables by Rule Type list, select the type of selection rules that you want to use. Options are:

▪ Include Rules Only
▪ Exclude Rules Only
▪ Include And Exclude Rules

bTo include columns in the list, regardless of which table selection method you used, select the Include Columns check box.

Note: This option is not available for MongoDB sources.

cClick the Download icon.

A downloaded list that includes columns has the following format:

status,schema_name,table_name,object_type,column_name,comment

The following table describes the information that is displayed in the downloaded list:

Field	Description
status	Indicates whether Database Ingestion and Replication excludes the source table or column from processing because it has an unsupported type. Valid values are: - E. The object is excluded from processing by an Exclude rule. - I. The object is included in processing. - X. The object is excluded from processing because it is an unsupported type of object. For example, unsupported types of objects include columns with unsupported data types and tables that include only unsupported columns. The comment field provides detail on unsupported types.
schema_name	Specifies the name of the source schema.
table_name	Specifies the name of the source table.
object_type	Specifies the type of the source object. Valid values are: - C. Column. - T. Table.
column_name	Specifies the name of the source column. This information appears only if you selected the Columns check box.
comment	Specifies the reason why a source object of an unsupported type is excluded from processing even though it matches the selection rules.

14Under Advanced, set the advanced properties that are available for your source type and load type.

Property	Source and Load Type	Description
Disable Flashback	Oracle sources - Initial loads	Select this check box to disable Database Ingestion and Replication use of Oracle Flashback when fetching data from the database. This check box is selected by default for new initial load tasks. For existing initial load tasks, this check box is cleared by default, which causes Oracle Flashback to remain enabled. For tasks that have partitioning enabled, this check box is automatically selected and unavailable for editing.
Include LOBs	Oracle sources: - All load types to Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen 2, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, Oracle Cloud Object Storage, PostgreSQL, Snowflake, or SQL Server targets. Incremental loads and combined loads can use either the Log-based or Query-based CDC method. Db2 for LUW sources: - All load types to Microsoft Azure Data Lake Storage Gen 2, Microsoft Azure Synapse Analytics, or Snowflake targets. Incremental loads and combined loads must use the Query-based CDC method. PostgreSQL sources: - Initial loads to Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen 2, Microsoft Azure Synapse Analytics, Oracle, Oracle Cloud Object Storage, PostgreSQL, Snowflake, or SQL Server targets. - Incremental loads to Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Kafka-enabled Azure Event Hubs, Microsoft Azure Data Lake Storage Gen 2, Microsoft Azure Synapse Analytics, Oracle, Oracle Cloud Object Storage, PostgreSQL, Snowflake, or SQL Server targets. - Combined initial and incremental loads to Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen 2, Microsoft AzureSynapse Analytics, Oracle, Oracle Cloud Object Storage, PostgreSQL, Snowflake, or SQL Server targets. SQL Server sources: - Initial loads to Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen 2, Microsoft Azure Synapse Analytics, Oracle, Oracle Cloud Object Storage, PostgreSQL, Snowflake, or SQL Server targets. - Incremental loads to Kafka-enabled Azure Event Hubs, Databricks, PostgreSQL, Snowflake, and SQL Server targets. Disabled if you selected the Query-based CDC method. - Combined initial and incremental loads to Databricks, PostgreSQL, Snowflake, and SQL Server targets. Disabled if you selected the Query-based CDC method.	Select this check box if the source contains the large-object (LOB) columns from which you want to replicate data to a target. LOB data types: - For Db2 for LUW: BLOB, CLOB, DBCLOB, LONG VARCHAR, LONG VARCHAR FOR BIT, LONG VARGRAPHIC, and XML - For Oracle: BLOB, CLOB, NCLOB, LONG, LONG RAW, and XML - For PostgreSQL: BYTEA, TEXT, and XML plus some other potentially large types such as JSON, JSONB - For SQL Server: GEOGRAPHY, GEOMETRY, IMAGE, NTEXT, NVARCHAR(MAX), TEXT, VARBINARY(MAX), VARCHAR(MAX), and XML LOB data might be truncated, primarily depending on the maximum size that the target allows. Target-side truncation points: - BLOB, BYTEA, GEOGRAPHY, GEOMETRY, IMAGE, LONG RAW, LONG VARCHAR FOR BIT, or VARBINARY(MAX) columns are truncated before being written to BINARY columns on the target. - For Amazon S3, Databricks, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, Oracle, Oracle Cloud Object Storage, Microsoft Fabric OneLake, PostgreSQL, and SQL Server targets, the data is truncated to 16777216 bytes. - For Amazon Redshift targets, the data is truncated to 1024000 bytes. - For Microsoft Azure Synapse Analytics targets, the data is truncated to 1000000 bytes. - For Google BigQuery and Snowflake targets, the data is truncated to 8388608 bytes. - CLOB, DBCLOB, NCLOB, LONG, LONG VARCHAR, LONG VARGRAPHIC, TEXT, NTEXT, NVARCHAR(MAX), RAW, VARCHAR(MAX), or XML columns are truncated before being written to VARCHAR columns on the target. - For Amazon S3, Databricks, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, Oracle, Microsoft Fabric OneLake, Oracle Cloud Object Storage, PostgreSQL, and Snowflake targets, the data is truncated to 16777216 bytes. - For Amazon Redshift targets, the data is truncated to 65535 bytes. - For Google BigQuery targets, the data is truncated to 8388608 bytes. - For Microsoft Azure Synapse Analytics targets, the data is truncated to 500000 bytes. - For SQL Server targets, CLOB, TEXT and VARCHAR(MAX) data is truncated to 16777216 bytes, NCLOB, NTEXT and NVARCHAR(MAX) data to 33554432 bytes, and XML data to 33554442 bytes. - For Azure Event Hubs targets, the overall record maximum size is 1 MB. If the record size exceeds 1 MB, Azure Event Hubs generates an error, and the task fails. Source-side truncation considerations: - For Db2 for LUW and Oracle sources, you can specify the custom properties dbmiSourceBlobTruncationSize and dbmiSourceClobTruncationSize on the Source page to control the number of bytes at which truncation occurs for blob and clob types of data, respectively, when you want the data truncated at a point less than the maximum size that the target allows. - For PostgreSQL sources in incremental loads and combined loads, if large-object columns contain more than 1 MB of data, the data is truncated to 1 MB.
Enable Persistent Storage	All sources except Db2 for LUW (query-based CDC), MongoDB, Oracle (query-based CDC), PostgreSQL, SAP HANA, SAP HANA Cloud, and SQL Server (query-based CDC) - Incremental loads and combined initial and incremental loads. For Db2 for LUW, Oracle, and SQL Server sources that use the query-based CDC method, this field is not displayed because persistent storage is enabled by default and cannot be changed. For MongoDB, PostgreSQL, SAP HANA, and SAP HANA Cloud change data sources, the field is not displayed because persistent storage is enabled by default and cannot be changed.	Select this check box to enable persistent storage of transaction data in a disk buffer so that the data can be consumed continually, even when the writing of data to the target is slow or delayed. Benefits of using persistent storage are faster consumption of the source transaction logs, less reliance on log archives or backups, and the ability to still access the data persisted in disk storage after restarting a database ingestion job.
Enable Partitioning	Oracle sources - Initial loads and combined initial and incremental loads SAP HANA sources - Initial loads SQL Server sources - Initial loads and combined initial and incremental loads	Select this check box to enable partitioning of source objects. When an object is partitioned, the database ingestion and replication job processes the records read from each partition in parallel. For Oracle and SAP HANA sources, Database Ingestion and Replication determines the range of partitions by using the ROWID as the partition key. For Oracle sources, when you select the Enable Partitioning check box, the Disable Flashback check box is also automatically selected. For SQL Server sources, partitioning is based on the primary key. Note: In combined initial and incremental loads, the partitioning of source objects occurs only in the initial load phase.
Number of Partitions	Oracle sources - Initial loads and combined initial and incremental loads SAP HANA sources - Initial loads SQL Server sources - Initial loads and combined initial and incremental loads	If you enable partitioning of source objects, enter the number of partitions you want to create. The default number is 5. The minimum value is 2.
Initial Start Point for Incremental Load	All sources - Incremental loads	Set this field if you want to customize the position in the source logs from which the database ingestion and replication job starts reading change records the first time it runs. Options are: - Earliest Available. The earliest available position in the database log or structure where changes are stored. - For Db2 for i, the start of the current journal. - For Db2 for LUW, this option is not available. - For Db2 for z/OS, if Earliest Available is selected, Latest Available is used. - For MongoDB, this option is not available. - For MySQL, the earliest available record in the first binlog file. - For Oracle, this option is not available. - For PostgreSQL, the earliest available record in the replication slot. - For SAP HANA and SAP HANA Cloud, the earliest available record in the PKLOG table. - For SQL Server, the earliest available record in the active transaction log. This option is not available if the CDC Method is set to Query-based. - Latest Available. The latest available position in the database log or structure. - Position. A position in the change stream from which you want the database ingestion job to start retrieving change records. The position value is a Db2 for i sequence, Db2 for z/OS LRSN, Oracle SCN, PostgreSQL LSN, SAP HANA sequence value, or SQL Server LSN. This value must be equal to or less than the current position value. An invalid value will cause the job to fail. The value of 0 is displayed by default, which results in the following start positions: - For Db2 for i, do not use the default value of 0. - For Db2 for LUW, this option is not available. - For Db2 for z/OS, the value of 0 causes the latest available point to be used. - For MongoDB and MySQL, this option is not available. - For Oracle, the value of 0 causes the latest available point to be used. This option is not available if the CDC Method is set to Query-based. - For PostgreSQL, the value of 0 causes the earliest available point to be used. - For SAP HANA, the value of 0 causes the earliest available point to be used. - For SQL Server, the value of 0 causes the earliest available point to be used. A non-zero LSN that predates the beginning of the active transaction log causes data to be read from the CDC tables instead of from the transaction log. This option is not available if the CDC Method is set to Query-based. - Specific Date and Time. A date and time, in the format MM/DD/YYYY hh:mm AM\|PM, that Database Ingestion and Replication uses to determine the position in the change stream from which to start retrieving change records. Database Ingestion and Replication retrieves only the changes that were started after this date and time. If you enter a date and time earlier than the earliest date and time in the available archived logs, the job will fail. For MySQL sources, this option is not available. The default is Latest Available.
Fetch Size	MongoDB - Initial loads and incremental loads	For a MongoDB source, the number of records that a database ingestion and replication job must read at a single time from the source. Valid values are 1 to 2147483647. The default is 5000.

15Under Custom Properties, you can specify custom properties that Informatica provides to meet your special requirements. To add a property, in the Create Property fields, enter the property name and value. Then click Add Property.

Specify these properties only at the direction of Informatica Global Customer Support. Usually, these properties address unique environments or special processing needs. You can specify multiple properties, if necessary. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).

Tip: To delete a property, click the Delete icon at the right end of the property row in the list.

16Click Next.

Example of rules for selecting source tables

When you define a source for a database ingestion and replication task, you can optionally define table selection rules to select a subset of the source tables in the specified schema. This simple example demonstrates how to use selection rules to select the tables you want.

Assume that 2984 tables are in the source schema. You want to exclude the tables from which you do not need to replicate data.

Define the following rules in the order shown:

The rules are processed from top to bottom.

•Rule 1 includes all source tables in the schema.
•Rule 2 excludes the source tables that have names beginning with "NS1".
•Rule 3 excludes the source tables that have names beginning with "REP_".
•Rule 4 excludes the source tables that end with "MAP_1".

After clicking the Refresh icon, the Tables Selected field shows 2289 tables, which indicates you filtered out 695 tables.

Configuring the target

Configure the target on the Target page of the database ingestion and replication task wizard.

1In the Connection list, select a connection for the target type. The connection type appears in parentheses after the connection name.

You must have previously defined the connection in Administrator for the runtime environment.

The list includes only the connection types that are valid for the load type selected on the Definition page. No connections are listed if you did not select a load type.

Note: After you deploy the database ingestion and replication task, you cannot change the connection without first undeploying the associated job. You must then deploy the task again.

2In the Target section, configure the properties that pertain to your target type.

For descriptions of these properties, see the following topics:

3If you want to rename the target objects that are associated with the selected source tables, define table renaming rules.

For example, you can add a prefix such as TGT_. For more information, see Renaming tables on the target.

4If you want to override the default mappings of source data types to target data types, define data type rules.

This feature is supported only for tasks that have an Oracle (any load type) source and an SQL-based target type. For more information, see Customizing data type mappings.

5Under Custom Properties, you can specify custom properties that Informatica provides to meet your special requirements. To add a property, in the Create Property fields, enter the property name and value. Then click Add Property.

Tip: To delete a property, click the Delete icon button at the right end of the property row in the list.

6Click Next if available, or click Save.

Renaming tables on the target

When you configure a target with an existing schema, you can optionally define rules for renaming the target tables that correspond to the selected source tables.

For target messaging systems, such as Apache Kafka, the rule renames the table name in the output messages.

To create a rule for renaming tables:

1Under Table Renaming Rules, in the Create Rule fields, enter a source table name or a table name mask that includes one or more wildcards. Then enter the corresponding target table name or table name mask.
2Click Add Rule.

The rule appears in the rules list.

You can define multiple table rules. The order of the rules does not matter with regard to how they are processed unless a table matches multiple rules. In this case, the last matching rule determines the name of the table.

To delete a rule, click the Delete icon at the right end of the rule row.

Example:

Assume that you want to add the prefix "PROD_" to the names of target tables that correspond to all selected source tables. Enter the following values:

•For the source, enter only the asterisk (*) wildcard character to specify all of the selected source tables.
•For the target, enter PROD_* to add this prefix to the names of all target tables that match the source tables by name.

Customizing data type mappings

When you configure a target for a database ingestion and replication task, you can optionally define data-type mapping rules to override the default mappings of source data types to target data types.

The default mappings are described in Default Data Type Mappings.

This feature is supported for tasks that have the following source and target combinations:

•Oracle sources with Databricks, Google BigQuery, Microsoft Azure Synapse Analytics, Oracle, SQL Server, and Snowflake targets
•SQL Server sources with Snowflake targets

For example, you can create a data-type rule that maps Oracle NUMBER columns that have no precision to Snowflake target NUMBER() columns that also have no precision, instead of using the default mapping to the Snowflake VARCHAR(255) data type.

To create a data-type mapping rule:

1Expand Data Type Rules.
2In the Create Rule fields, enter a source data type and the target data type that you want to map it to.

In the Source field only, you can include the percent (%) wildcard to represent the data type precision, scale, or size, for example, NUMBER(%,4), NUMBER(8,%), or NUMBER(%). Use the wildcard to cover all source columns that have the same data type but use different precision, scale, or size values, instead of specifying each one individually. For example, enter FLOAT(%) to cover FLOAT(16), FLOAT(32), and FLOAT(84). You cannot enter the % wildcard in the target data type. A source data type that uses the % wildcard must map to a target data type that uses specific precision, scale, or size value. For example, you could map the source data type FLOAT(%) to a target data type specification such as NUMBER(38,10).

3Click Add Rule.

The rule appears in the list of rules.

To delete a rule, click the Delete icon at the right end of the rule row.

After you deploy a task with custom mapping rules, you cannot edit the rules until the task is undeployed.

Usage notes:

•In general, binary data types cannot be mapped to character data types.
•If you define multiple data-type rules for the same source data type with the same length or same precision and scale values, you will not be able to save the database ingestion and replication task.
•If you define multiple data-type rules for the same source data type but use the % wildcard to represent the length or precision and scale value in one rule and a specific length or precision and scale value in the second rule, the rule that contains the specific value is processed first, before the rule with the % wildcard. For example, if you map the source data types FLOAT(84) and FLOAT(%), the FLOAT(84) rule is processed first and then the FLOAT(%) rule is processed to cover any other FLOAT source columns with different sizes.
•If a source data type requires a length or precision and scale value, make sure that you set the required attribute by using the % wildcard or a specific value, for example, VARCHAR(%) or VARCHAR(10).
•If you define an invalid mapping, an error message is written to the log. You can then correct the mapping error, with assistance from your DBA if necessary.
•For Oracle sources, you must use the data types that are returned by the following query for the source object:

select dbms_metadata.get_ddl('TABLE', 'YOUR_TABLE_NAME','TABLE_OWNER_NAME') from dual;

•Database Ingestion and Replication does not support the BYTE and CHAR semantics in data-type mappings rules.
•If a source data type has a default precision, you must specify it in your rule. For example, you must use TIMESTAMP(6) instead of TIMESTAMP.

Amazon Redshift target properties

When you define a database ingestion and replication task that has an Amazon Redshift target, you must enter some target properties on the Target tab of the task wizard.

The following table describes the Amazon Redshift target properties that appear under Target:

Property	Description
Target Creation	The only available option is Create Target Tables, which generates the target tables based on the source tables. Note: After the target table is created, Database Ingestion and Replication intelligently handles the target tables on subsequent job runs. Database Ingestion and Replication might truncate or re-create the target tables depending on the specific circumstances.
Schema	Select the target schema in which Database Ingestion and Replication creates the target tables.
Bucket	Specifies the name of the Amazon S3 bucket that stores, organizes, and controls access to the data objects that you load to Amazon Redshift.
Data Directory or Task Target Directory	Specifies the subdirectory where Database Ingestion and Replication stores output files for jobs associated with the task. This field is called Data Directory for an initial load job or Task Target Directory for an incremental load or combined initial and incremental load job.

The following table describes advanced target properties that appear under Advanced:

Property	Description
Add Cycle ID	Select this check box to add a metadata column that includes the cycle ID of each CDC cycle in each target table. A cycle ID is a number that's generated by the CDC engine for each successful CDC cycle. If you integrate the job with Data Integration taskflows, the job can pass the minimum and maximum cycle IDs in output fields to the taskflow so that the taskflow can determine the range of cycles that contain new CDC data. This capability is useful if data from multiple cycles accumulates before the previous taskflow run completes. By default, this check box is not selected.
Prefix for Metadata Columns	Add a prefix to the names of the added metadata columns to easily identify them and to prevent conflicts with the names of existing columns. The default value is INFA_.
Enable Case Transformation	By default, target table names and column names are generated in the same case as the corresponding source names, unless cluster-level or session-level properties on the target override this case-sensitive behavior. If you want to control the case of letters in the target names, select this check box. Then select a Case Transformation Strategy option.
Case Transformation Strategy	If you selected Enable Case Transformation, select one of the following options to specify how to handle the case of letters in generated target table (or object) names and column (or field) names: - Same as source. Use the same case as the source table (or object) names and column (or field) names. - UPPERCASE. Use all uppercase. - lowercase. Use all lowercase. The default value is Same as source. Note: The selected strategy will override any cluster-level or session-level properties on the target for controlling case.

Amazon S3 target properties

When you define a database ingestion and replication task that has an Amazon S3 target, you must enter some target properties on the Target tab of the task wizard.

Under Target, you can enter the following Amazon S3 target properties:

Property	Description
Output Format	Select the format of the output file. Options are: - CSV - AVRO - PARQUET The default value is CSV. Note: Output files in CSV format use double-quotation marks ("") as the delimiter for each field.
Add Headers to CSV File	If CSV is selected as the output format, select this check box to add a header with source column names to the output CSV file.
Avro Format	If you selected AVRO as the output format, select the format of the Avro schema that will be created for each source table. Options are: - Avro-Flat. This Avro schema format lists all Avro fields in one record. - Avro-Generic. This Avro schema format lists all columns from a source table in a single array of Avro fields. - Avro-Nested. This Avro schema format organizes each type of information in a separate record. The default value is Avro-Flat.
Avro Serialization Format	If AVRO is selected as the output format, select the serialization format of the Avro output file. Options are: - None - Binary - JSON The default value is Binary.
Avro Schema Directory	If AVRO is selected as the output format, specify the local directory where Database Ingestion and Replication stores Avro schema definitions for each source table. Schema definition files have the following naming pattern: schemaname_tablename.txt Note: If this directory is not specified, no Avro schema definition file is produced.
File Compression Type	Select a file compression type for output files in CSV or AVRO output format. Options are: - None - Deflate - Gzip - Snappy The default value is None, which means no compression is used.
Encryption type	Select the encryption type for the Amazon S3 files when you write the files to the target. Options are: - None - Client Side Encryption - Client Side Encryption with KMS - Server Side Encryption - Server Side Encryption with KMS The default is None, which means no encryption is used.
Avro Compression Type	If AVRO is selected as the output format, select an Avro compression type. Options are: - None - Bzip2 - Deflate - Snappy The default value is None, which means no compression is used.
Parquet Compression Type	If the PARQUET output format is selected, you can select a compression type that is supported by Parquet. Options are: - None - Gzip - Snappy The default value is None, which means no compression is used.
Deflate Compression Level	If Deflate is selected in the Avro Compression Type field, specify a compression level from 0 to 9. The default value is 0.
Add Directory Tags	For incremental load and combined initial and incremental load tasks, select this check box to add the "dt=" prefix to the names of apply cycle directories to be compatible with the naming convention for Hive partitioning. This check box is cleared by default.
Task Target Directory	For incremental load and combined initial and incremental load tasks, the root directory for the other directories that hold output data files, schema files, and CDC cycle contents and completed files. You can use it to specify a custom root directory for the task. If you enable the Connection Directory as Parent option, you can still optionally specify a task target directory to use with the parent directory specified in the connection properties. This field is required if the {TaskTargetDirectory} placeholder is specified in patterns for any of the following directory fields.
Connection Directory as Parent	Select this check box to use the directory value that is specified in the target connection properties as the parent directory for the custom directory paths specified in the task target properties. For initial load tasks, the parent directory is used in the Data Directory and Schema Directory. For incremental load and combined initial and incremental load tasks, the parent directory is used in the Data Directory, Schema Directory, Cycle Completion Directory, and Cycle Contents Directory. This check box is selected by default. If you clear it, for initial loads, define the full path to the output files in the Data Directory field. For incremental loads, optionally specify a root directory for the task in the Task Target Directory.
Data Directory	For initial load tasks, define a directory structure for the directories where Database Ingestion and Replication stores output data files and optionally stores the schema. To define directory pattern, you can use the following types of entries: - The placeholders {SchemaName}, {TableName), {Timestamp}, {YY}, {YYYY}, {MM}, and {DD}, where {YY}, {YYYY}, {MM}, and {DD} are for date elements. The {Timestamp} values are in the format yyyymmdd_hhmissms. The generated dates and times in the directory paths indicate when the initial load job starts to transfer data to the target. - Specific directory names. - The toUpper() and toLower() functions, which force the values for an associated (placeholder) to uppercase or lowercase. Note: Placeholder values are not case sensitive. Examples: myDir1/{SchemaName}/{TableName} myDir1/myDir2/{SchemaName}/{YYYY}/{MM}/{TableName}_{Timestamp} myDir1/{toLower(SchemaName)}/{TableName}_{Timestamp} The default directory pattern is {TableName)_{Timestamp}. For incremental load and combined initial and incremental load tasks, define a custom path to the subdirectory that contains the cdc-data data files. To define the directory pattern, you can use the following types of entries: - The placeholders {TaskTargetDirectory}, {SchemaName}, {TableName), {Timestamp}, {YY}, {YYYY}, {MM}, and {DD}, where {YY}, {YYYY}, {MM}, and {DD} are for date elements. The {Timestamp} values are in the format yyyymmdd_hhmissms. The generated dates and times in the directory paths indicate when the CDC cycle started. If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose the both the function and placeholder in curly brackets, as shown in the preceding example. - Specific directory names. The default directory pattern is {TaskTargetDirectory}/data/{TableName}/data Note: For Amazon S3, Flat File, Microsoft Azure Data Lake Storage Gen2, and Oracle Cloud Object Store targets, Database Ingestion and Replication uses the directory specified in the target connection properties as the root for the data directory path when Connection Directory as Parent is selected. For Google Cloud Storage targets, Database Ingestion and Replication uses the Bucket name that you specify in the target properties for the ingestion task. For Microsoft Fabric OneLake targets, the parent directory is the path specified in the Lakehouse Path field in the Microsoft Fabric OneLake connection properties.
Schema Directory	Specify a custom directory in which to store the schema file if you want to store it in a directory other than the default directory. For initial loads, previously used values if available are shown in a drop-down list for your convenience. This field is optional. For initial loads, the schema is stored in the data directory by default. For incremental loads and combined initial and incremental loads, the default directory for the schema file is {TaskTargetDirectory}/data/{TableName}/schema You can use the same placeholders as for the Data Directory field. Ensure that you enclose placeholders with curly brackets { }. If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose the both the function and placeholder in curly brackets, for example: {toLower(SchemaName)} Note: Schema is written only to output data files in CSV format. Data files in Parquet and Avro formats contain their own embedded schema.
Cycle Completion Directory	For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle completed file. Default is {TaskTargetDirectory}/cycle/completed.
Cycle Contents Directory	For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle contents files. Default is {TaskTargetDirectory}/cycle/contents.
Use Cycle Partitioning for Data Directory	For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under each data directory. If this option is not selected, individual data files are written to the same directory without a timestamp, unless you define an alternative directory structure.
Use Cycle Partitioning for Summary Directories	For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under the summary contents and completed subdirectories.
List Individual Files in Contents	For incremental load and combined initial and incremental load tasks, lists individual data files under the contents subdirectory. If Use Cycle Partitioning for Summary Directories is cleared, this option is selected by default. All of the individual files are listed in the contents subdirectory unless you can configure custom subdirectories by using the placeholders, such as for timestamp or date. If Use Cycle Partitioning for Data Directory is selected, you can still optionally select this check box to list individual files and group them by CDC cycle.

Under Advanced, you can enter the following Amazon S3 advanced target properties, which primarily apply to incremental loads:

Property	Description
Add Operation Type	Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target. For incremental loads, the job writes "I" for insert, "U" for update, or "D" for delete. For initial loads, the job always writes "I" for insert. By default, this check box is selected for incremental load and initial and incremental load jobs, and cleared for initial load jobs.
Add Operation Time	Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target. For initial loads, the job always writes the current date and time. By default, this check box is not selected.
Add Operation Owner	Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target. For initial loads, the job always writes "INFA" as the owner. By default, this check box is not selected. This property is not available for jobs that have a MongoDB or PostgreSQL source. Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method.
Add Operation Transaction Id	Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations. For initial loads, the job always writes "1" as the ID. By default, this check box is not selected.
Add Before Images	Select this check box to include UNDO data in the output that a job writes to the target. For initial loads, the job writes nulls. By default, this check box is not selected.

Databricks target properties

When you define a database ingestion and replication task that has a Databricks target, you must enter some target properties on the Target tab of the task wizard.

The following table describes the Databricks target properties that appear under Target:

Property	Description
Target Creation	The only available option is Create Target Tables, which generates the target tables based on the source tables. Note: After the target table is created, Database Ingestion and Replication intelligently handles the target tables on subsequent job runs. Database Ingestion and Replication might truncate or re-create the target tables depending on the specific circumstances.
Schema	Select the target schema in which Database Ingestion and Replication creates the target tables.
Apply Mode	For incremental load and combined initial and incremental load jobs, indicates how source DML changes, including inserts, updates, and deletes, are applied to the target. Options are: - Standard. Accumulate the changes in a single apply cycle and intelligently merge them into fewer SQL statements before applying them to the target. For example, if an update followed by a delete occurs on the source row, no row is applied to the target. If multiple updates occur on the same column or field, only the last update is applied to the target. If multiple updates occur on different columns or fields, the updates are merged into a single update record before being applied to the target. - Soft Deletes. Apply source delete operations to the target as soft deletes. A soft delete marks the deleted row as deleted without actually removing it from the database. For example, a delete on the source results in a change record on the target with "D" displayed in the INFA_OPERATION_TYPE column. Consider using soft deletes if you have a long-running business process that needs the soft-deleted data to finish processing, to restore data after an accidental delete operation, or to track deleted values for audit purposes. Note: If you use Soft Deletes mode, you must not perform an update on the primary key in a source table. Otherwise, data corruption can occur on the target. - Audit. Apply an audit trail of every DML operation made on the source tables to the target. A row for each DML change on a source table is written to the generated target table along with the audit columns you select under the Advanced section. The audit columns contain metadata about the change, such as the DML operation type, time, owner, transaction ID, generated ascending sequence number, and before image. Consider using Audit apply mode when you want to use the audit history to perform downstream computations or processing on the data before writing it to the target database or when you want to examine metadata about the captured changes. The default value is Standard. Note: This field does not appear if you selected Query-based as the CDC method on the Source page of the task wizard.
Data Directory or Task Target Directory	Specifies the subdirectory where Database Ingestion and Replication stores output files for jobs associated with the task. This field is called Data Directory for an initial load job or Task Target Directory for an incremental load or combined initial and incremental load job.

The following table describes advanced target properties that appear under Advanced:

Property	Description
Add Operation Type	Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target database or inserts into the target table. This field is available only when the Apply Mode option is set to Audit or Soft Deletes. In Audit mode, the job writes "I" for insert, "U" for update, or "D" for delete. In Soft Deletes mode, the job writes "D" for deletes or NULL for inserts and updates. When the operation type is NULL, the other "Add Operation..." metadata columns are also NULL. Only when the operation type is "D" will the other metadata columns contain non-null values. By default, this check box is selected. You cannot deselect it if you are using soft deletes.
Add Operation Time	Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target database or inserts into the audit table on the target system. This field is available only when Apply Mode is set to Audit or Soft Deletes. By default, this check box is not selected.
Add Operation Owner	Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target database or inserts into the audit table on the target system. This field is available only when Apply Mode is set to Audit or Soft Deletes. By default, this check box is not selected. This property is not available for jobs that have a MongoDB or PostgreSQL source. Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method.
Add Operation Transaction Id	Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations. This field is available only when Apply Mode is set to Audit or Soft Deletes. By default, this check box is not selected.
Add Operation Sequence	Select this check box to add a metadata column that records a generated, ascending sequence number for each change operation that the job inserts into the audit table on the target system. The sequence number reflects the change stream position of the operation. This field is available only when Apply Mode is set to Audit. By default, this check box is not selected.
Add Before Images	Select this check box to add _OLD columns with UNDO "before image" data in the output that the job inserts into the target tables. You can then compare the old and current values for each data column. For a delete operation, the current value will be null. This field is available only when Apply Mode is set to Audit. By default, this check box is not selected.
Prefix for Metadata Columns	Add a prefix to the names of the added metadata columns to easily identify them and to prevent conflicts with the names of existing columns. The default value is INFA_.
Create Unmanaged Tables	Select this check box if you want the task to create Databricks target tables as unmanaged tables. After you deploy the task, you cannot edit this field to switch to managed tables. By default, this option is cleared and managed tables are created. If you selected either Personal Staging Location or Volume in the Staging Environment field in the selected Databricks target connection, this check box is disabled. You cannot use unmanaged tables in this situation. For more information about Databricks managed and unmanaged tables, see the Databricks documentation.
Unmanaged Tables Parent Directory	If you choose to create Databricks unmanaged tables, you must specify a parent directory in Amazon S3 or Microsoft Azure Data Lake Storage to hold the Parquet files that are generated for each target table when captured DML records are processed. Note: To use Unity Catalog, you must provide an existing external directory.
Staging File Format	Select the format of the staging files in the staging environment specified in the Databricks connection. The files hold data before it's loaded into Databricks tables. Format options are: - CSV - Parquet Default is CSV.

Flat File target properties

When you define a database ingestion and replication task, you must enter some properties for your Flat File target on the Target page of the task wizard.

Note: For flat file targets, these properties apply to initial load jobs only.

Under Target, you can enter the following Flat File target properties:

Property	Description
Output Format	Select the format of the output file. Options are: - CSV - AVRO The default value is CSV. Note: Output files in CSV format use double-quotation marks ("") as the delimiter for each field.
Add Headers to CSV File	If CSV is selected as the output format, select this check box to add a header with source column names to the output CSV file.
Avro Format	If you selected AVRO as the output format, select the format of the Avro schema that will be created for each source table. Options are: - Avro-Flat. This Avro schema format lists all Avro fields in one record. - Avro-Generic. This Avro schema format lists all columns from a source table in a single array of Avro fields. - Avro-Nested. This Avro schema format organizes each type of information in a separate record. The default value is Avro-Flat.
Avro Serialization Format	If AVRO is selected as the output format, select the serialization format of the Avro output file. Options are: - None - Binary - JSON The default value is Binary.
Avro Schema Directory	If AVRO is selected as the output format, specify the local directory where Database Ingestion and Replication stores Avro schema definitions for each source table. Schema definition files have the following naming pattern: schemaname_tablename.txt Note: If this directory is not specified, no Avro schema definition file is produced.
File Compression Type	Select a file compression type for output files in CSV or AVRO output format. Options are: - None - Deflate - Gzip - Snappy The default value is None, which means no compression is used.
Avro Compression Type	If AVRO is selected as the output format, select an Avro compression type. Options are: - None - Bzip2 - Deflate - Snappy The default value is None, which means no compression is used.
Deflate Compression Level	If Deflate is selected in the Avro Compression Type field, specify a compression level from 0 to 9. The default value is 0.
Data Directory	For initial load tasks, define a directory structure for the directories where Database Ingestion and Replication stores output data files and optionally stores the schema. To define directory pattern, you can use the following types of entries: - The placeholders {SchemaName}, {TableName), {Timestamp}, {YY}, {YYYY}, {MM}, and {DD}, where {YY}, {YYYY}, {MM}, and {DD} are for date elements. The {Timestamp} values are in the format yyyymmdd_hhmissms. The generated dates and times in the directory paths indicate when the initial load job starts to transfer data to the target. - Specific directory names. - The toUpper() and toLower() functions, which force the values for an associated (placeholder) to uppercase or lowercase. Note: Placeholder values are not case sensitive. Examples: myDir1/{SchemaName}/{TableName} myDir1/myDir2/{SchemaName}/{YYYY}/{MM}/{TableName}_{Timestamp} myDir1/{toLower(SchemaName)}/{TableName}_{Timestamp} The default directory pattern is {TableName)_{Timestamp}. Note: For Flat File targets, Database Ingestion and Replication uses the directory specified in the target connection properties as the root for the data directory path when Connection Directory as Parent is selected.
Connection Directory as Parent	For initial load tasks, select this check box to use the directory value that is specified in the target connection properties as the parent directory for the custom directory paths specified in the task target properties. The parent directory is used in the Data Directory and Schema Directory.
Schema Directory	For initial load tasks, you can specify a custom directory in which to store the schema file if you want to store it in a directory other than the default directory. This field is optional. The schema is stored in the data directory by default. For incremental loads, the default directory for the schema file is {TaskTargetDirectory}/data/{TableName}/schema. You can use the same placeholders as for the Data Directory field. Ensure the placeholders are enclosed in curly brackets { }.

Under Advanced, you can enter the following advanced target properties:

Property	Description
Add Operation Type	Select this check box to add a metadata column that includes the source SQL operation type in the output that the job propagates to the target. For initial loads, the job always writes "I" for insert. By default, this check box is cleared.
Add Operation Time	Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target. For initial loads, the job always writes the current date and time. By default, this check box is not selected.
Add Operation Owner	Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target. For initial loads, the job always writes "INFA" as the owner. By default, this check box is not selected. This property is not available for jobs that have a MongoDB or PostgreSQL source. Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method.
Add Operation Transaction Id	Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations. For initial loads, the job always writes "1" as the ID. By default, this check box is not selected.
Add Before Images	Select this check box to include UNDO data in the output that a job writes to the target. For initial loads, the job writes nulls. By default, this check box is not selected.

Google BigQuery target properties

When you define a database ingestion and replication task that has a Google BigQuery target, you must enter some target properties on the Target tab of the task wizard.

The following table describes the Google BigQuery target properties that appear under Target:

Property	Description
Target Creation	The only available option is Create Target Tables, which generates the target tables based on the source tables. Note: After the target table is created, Database Ingestion and Replication intelligently handles the target tables on subsequent job runs. Database Ingestion and Replication might truncate or re-create the target tables depending on the specific circumstances.
Schema	Select the target schema in which Database Ingestion and Replication creates the target tables.
Apply Mode	For incremental load and combined initial and incremental load jobs, indicates how source DML changes, including inserts, updates, and deletes, are applied to the target. Options are: - Standard. Accumulate the changes in a single apply cycle and intelligently merge them into fewer SQL statements before applying them to the target. For example, if an update followed by a delete occurs on the source row, no row is applied to the target. If multiple updates occur on the same column or field, only the last update is applied to the target. If multiple updates occur on different columns or fields, the updates are merged into a single update record before being applied to the target. - Audit. Apply an audit trail of every DML operation made on the source tables to the target. A row for each DML change on a source table is written to the generated target table along with the audit columns you select under the Advanced section. The audit columns contain metadata about the change, such as the DML operation type, time, owner, transaction ID, generated ascending sequence number, and before image. Consider using Audit apply mode when you want to use the audit history to perform downstream computations or processing on the data before writing it to the target database or when you want to examine metadata about the captured changes. - Soft Deletes. Apply source delete operations to the target as soft deletes. A soft delete marks the deleted row as deleted without actually removing it from the database. For example, a delete on the source results in a change record on the target with "D" displayed in the INFA_OPERATION_TYPE column. Consider using soft deletes if you have a long-running business process that needs the soft-deleted data to finish processing, to restore data after an accidental delete operation, or to track deleted values for audit purposes. Note: If you use Soft Deletes mode, you must not perform an update on the primary key in a source table. Otherwise, data corruption can occur on the target. The default value is Standard. Note: This field does not appear if you selected Query-based as the CDC method on the Source page of the task wizard.
Bucket	Specifies the name of an existing bucket container that stores, organizes, and controls access to the data objects that you load to Google Cloud Storage.
Data Directory or Task Target Directory	Specifies the subdirectory where Database Ingestion and Replication stores output files for jobs associated with the task. This field is called Data Directory for an initial load job or Task Target Directory for an incremental load or combined initial and incremental load job.

The following table describes advanced target properties that appear under Advanced:

Property	Description
Add Last Replicated Time	Select this check box to add a metadata column that records the timestamp at which a record was inserted or last updated in the target table. For initial loads, all loaded records have the same timestamp. For incremental and combined initial and incremental loads, the column records the timestamp of the last DML operation that was applied to the target. By default, this check box is not selected.
Add Operation Type	Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target database or inserts into the audit table on the target system. This field is available only when the Apply Mode option is set to Audit or Soft Deletes. In Audit mode, the job writes "I" for inserts, "U" for updates, "E" for upserts, or "D" for deletes to this metadata column. In Soft Deletes mode, the job writes "D" for deletes or NULL for inserts, updates, and upserts. When the operation type is NULL, the other "Add Operation..." metadata columns are also NULL. Only when the operation type is "D" will the other metadata columns contain non-null values. By default, this check box is selected. You cannot deselect it if you are using soft deletes.
Add Operation Time	Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target tables. This field is available only when Apply Mode is set to Audit or Soft Deletes. By default, this check box is not selected.
Add Operation Owner	Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target tables. This field is available only when Apply Mode is set to Audit or Soft Deletes. By default, this check box is not selected. This property is not available for jobs that have a MongoDB or PostgreSQL source. Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method.
Add Operation Transaction Id	Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations. This field is available only when Apply Mode is set to Audit or Soft Deletes. By default, this check box is not selected.
Add Operation Sequence	Select this check box to add a metadata column that records a generated, ascending sequence number for each change operation that the job inserts into the target table. The sequence number reflects the change stream position of the operation. This field is available only when Apply Mode is set to Audit. By default, this check box is not selected.
Add Before Images	Select this check box to add _OLD columns with UNDO "before image" data in the output that the job inserts into the target table. You can then compare the old and current values for each data column. For a delete operation, the current value will be null. This field is available only when Apply Mode is set to Audit. By default, this check box is not selected.
Prefix for Metadata Columns	Add a prefix to the names of the added metadata columns to easily identify them and to prevent conflicts with the names of existing columns. Do not include special characters in the prefix. Otherwise, task deployment will fail. The default value is INFA_.
Enable Case Transformation	By default, target table names and column names are generated in the same case as the corresponding source names, unless cluster-level or session-level properties on the target override this case-sensitive behavior. If you want to control the case of letters in the target names, select this check box. Then select a Case Transformation Strategy option.
Case Transformation Strategy	If you selected Enable Case Transformation, select one of the following options to specify how to handle the case of letters in generated target table (or object) names and column (or field) names: - Same as source. Use the same case as the source table (or object) names and column (or field) names. - UPPERCASE. Use all uppercase. - lowercase. Use all lowercase. The default value is Same as source. Note: The selected strategy will override any cluster-level or session-level properties on the target for controlling case.

Google Cloud Storage target properties

When you define a database ingestion and replication task that has a Google Cloud Storage target, you must enter some target properties on the Target tab of the task wizard.

Under Target, you can enter the following Google Cloud Storage target properties:

Property	Description
Output Format	Select the format of the output file. Options are: - CSV - AVRO - PARQUET The default value is CSV. Note: Output files in CSV format use double-quotation marks ("") as the delimiter for each field.
Add Headers to CSV File	If CSV is selected as the output format, select this check box to add a header with source column names to the output CSV file.
Avro Format	If you selected AVRO as the output format, select the format of the Avro schema that will be created for each source table. Options are: - Avro-Flat. This Avro schema format lists all Avro fields in one record. - Avro-Generic. This Avro schema format lists all columns from a source table in a single array of Avro fields. - Avro-Nested. This Avro schema format organizes each type of information in a separate record. The default value is Avro-Flat.
Avro Serialization Format	If AVRO is selected as the output format, select the serialization format of the Avro output file. Options are: - None - Binary - JSON The default value is Binary.
Avro Schema Directory	If AVRO is selected as the output format, specify the local directory where Database Ingestion and Replication stores Avro schema definitions for each source table. Schema definition files have the following naming pattern: schemaname_tablename.txt Note: If this directory is not specified, no Avro schema definition file is produced.
File Compression Type	Select a file compression type for output files in CSV or AVRO output format. Options are: - None - Deflate - Gzip - Snappy The default value is None, which means no compression is used.
Avro Compression Type	If AVRO is selected as the output format, select an Avro compression type. Options are: - None - Bzip2 - Deflate - Snappy The default value is None, which means no compression is used.
Parquet Compression Type	If the PARQUET output format is selected, you can select a compression type that is supported by Parquet. Options are: - None - Gzip - Snappy The default value is None, which means no compression is used.
Deflate Compression Level	If Deflate is selected in the Avro Compression Type field, specify a compression level from 0 to 9. The default value is 0.
Add Directory Tags	For incremental load and combined initial and incremental load tasks, select this check box to add the "dt=" prefix to the names of apply cycle directories to be compatible with the naming convention for Hive partitioning. This check box is cleared by default.
Bucket	Specifies the name of an existing bucket container that stores, organizes, and controls access to the data objects that you load to Google Cloud Storage.
Task Target Directory	For incremental load and combined initial and incremental load tasks, the root directory for the other directories that hold output data files, schema files, and CDC cycle contents and completed files. You can use it to specify a custom root directory for the task. If you enable the Connection Directory as Parent option, you can still optionally specify a task target directory to use with the parent directory specified in the connection properties. This field is required if the {TaskTargetDirectory} placeholder is specified in patterns for any of the following directory fields.
Data Directory	For initial load tasks, define a directory structure for the directories where Database Ingestion and Replication stores output data files and optionally stores the schema. To define directory pattern, you can use the following types of entries: - The placeholders {SchemaName}, {TableName), {Timestamp}, {YY}, {YYYY}, {MM}, and {DD}, where {YY}, {YYYY}, {MM}, and {DD} are for date elements. The {Timestamp} values are in the format yyyymmdd_hhmissms. The generated dates and times in the directory paths indicate when the initial load job starts to transfer data to the target. - Specific directory names. - The toUpper() and toLower() functions, which force the values for an associated (placeholder) to uppercase or lowercase. Note: Placeholder values are not case sensitive. Examples: myDir1/{SchemaName}/{TableName} myDir1/myDir2/{SchemaName}/{YYYY}/{MM}/{TableName}_{Timestamp} myDir1/{toLower(SchemaName)}/{TableName}_{Timestamp} The default directory pattern is {TableName)_{Timestamp}. For incremental load and combined initial and incremental load tasks, define a custom path to the subdirectory that contains the cdc-data data files. To define the directory pattern, you can use the following types of entries: - The placeholders {TaskTargetDirectory}, {SchemaName}, {TableName), {Timestamp}, {YY}, {YYYY}, {MM}, and {DD}, where {YY}, {YYYY}, {MM}, and {DD} are for date elements. The {Timestamp} values are in the format yyyymmdd_hhmissms. The generated dates and times in the directory paths indicate when the CDC cycle started. If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose the both the function and placeholder in curly brackets, as shown in the preceding example. - Specific directory names. The default directory pattern is {TaskTargetDirectory}/data/{TableName}/data Note: For Amazon S3, Flat File, Microsoft Azure Data Lake Storage Gen2, and Oracle Cloud Object Store targets, Database Ingestion and Replication uses the directory specified in the target connection properties as the root for the data directory path when Connection Directory as Parent is selected. For Google Cloud Storage targets, Database Ingestion and Replication uses the Bucket name that you specify in the target properties for the ingestion task. For Microsoft Fabric OneLake targets, the parent directory is the path specified in the Lakehouse Path field in the Microsoft Fabric OneLake connection properties.
Schema Directory	Specify a custom directory in which to store the schema file if you want to store it in a directory other than the default directory. For initial loads, previously used values if available are shown in a drop-down list for your convenience. This field is optional. For initial loads, the schema is stored in the data directory by default. For incremental loads and combined initial and incremental loads, the default directory for the schema file is {TaskTargetDirectory}/data/{TableName}/schema You can use the same placeholders as for the Data Directory field. Ensure that you enclose placeholders with curly brackets { }. If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose the both the function and placeholder in curly brackets, for example: {toLower(SchemaName)} Note: Schema is written only to output data files in CSV format. Data files in Parquet and Avro formats contain their own embedded schema.
Cycle Completion Directory	For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle completed file. Default is {TaskTargetDirectory}/cycle/completed.
Cycle Contents Directory	For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle contents files. Default is {TaskTargetDirectory}/cycle/contents.
Use Cycle Partitioning for Data Directory	For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under each data directory. If this option is not selected, individual data files are written to the same directory without a timestamp, unless you define an alternative directory structure.
Use Cycle Partitioning for Summary Directories	For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under the summary contents and completed subdirectories.
List Individual Files in Contents	For incremental load and combined initial and incremental load tasks, lists individual data files under the contents subdirectory. If Use Cycle Partitioning for Summary Directories is cleared, this option is selected by default. All of the individual files are listed in the contents subdirectory unless you can configure custom subdirectories by using the placeholders, such as for timestamp or date. If Use Cycle Partitioning for Data Directory is selected, you can still optionally select this check box to list individual files and group them by CDC cycle.

Under Advanced, you can enter the following Google Cloud Storage advanced target properties, which are primarily for incremental load jobs:

Field	Description
Add Operation Type	Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target. For incremental loads, the job writes "I" for insert, "U" for update, or "D" for delete. For initial loads, the job always writes "I" for insert. By default, this check box is selected for incremental load and initial and incremental load jobs, and cleared for initial load jobs.
Add Operation Time	Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target. For initial loads, the job always writes the current date and time. By default, this check box is not selected.
Add Operation Owner	Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target. For initial loads, the job always writes "INFA" as the owner. By default, this check box is not selected. This property is not available for jobs that have a MongoDB or PostgreSQL source. Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method.
Add Operation Transaction Id	Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations. For initial loads, the job always writes "1" as the ID. By default, this check box is not selected.
Add Before Images	Select this check box to include UNDO data in the output that a job writes to the target. For initial loads, the job writes nulls. By default, this check box is not selected.

Kafka target properties

When you define a database ingestion and replication task, you must enter some properties for your Kafka target on the Target page of the task wizard.

These properties apply to incremental load operations only.

The following table describes the Kafka target properties that appear under Target:

Property	Description
Use Table Name as Topic Name	Indicates whether Database Ingestion and Replication writes messages that contain source data to separate topics, one for each source table, or writes all messages to a single topic. Select this check box to write messages to separate table-specific topics. The topic names match the source table names, unless you add the source schema name, a prefix, or a suffix in the Include Schema Name, Table Prefix, or Table Suffix properties. By default, this check box is cleared. With the default setting, you must specify the name of the single topic to which all messages are written in the Topic Name property.
Include Schema Name	When Use Table Name as Topic Name is selected, this check box appears and is selected by default. This setting adds the source schema name in the table-specific topic names. The topic names then have the format schemaname_tablename. If you do not want to include the schema name, clear this check box.
Table Prefix	When Use Table Name as Topic Name is selected, this property appears so that you can optionally enter a prefix to add to the table-specific topic names. For example, if you specify myprefix_, the topic names have the format myprefix_tablename. If you omit the underscore (_) after the prefix, the prefix is prepended to the table name.
Table Suffix	When Use Table Name as Topic Name is selected, this property appears so that you can optionally enter a suffix to add to the table-specific topic names. For example, if you specify _mysuffix, the topic names have the format tablename_mysuffix. If you omit the underscore (_) before the suffix, the suffix is appended to the table name.
Topic Name	If you do not select Use table name as topic name, you must enter the name of the single Kafka topic to which all messages that contain source data will be written.
Output Format	Select the format of the output file. Options are: - CSV - AVRO - JSON The default value is CSV. Note: Output files in CSV format use double-quotation marks ("") as the delimiter for each field. If your Kafka target uses Confluent Schema Registry to store schemas for incremental load jobs, you must select AVRO as the format.
JSON Format	If JSON is selected as the output format, select the level of detail of the output. Options are: - Concise. This format records only the most relevant data in the output, such as the operation type and the column names and values. - Verbose. This format records detailed information, such as the table name and column types.
Avro Format	If you selected AVRO as the output format, select the format of the Avro schema that will be created for each source table. Options are: - Avro-Flat. This Avro schema format lists all Avro fields in one record. - Avro-Generic. This Avro schema format lists all columns from a source table in a single array of Avro fields. - Avro-Nested. This Avro schema format organizes each type of information in a separate record. The default value is Avro-Flat.
Avro Serialization Format	If AVRO is selected as the output format, select the serialization format of the Avro output file. Options are: - Binary - JSON - None The default value is Binary. If you have a Confluent Kafka target that uses Confluent Schema Registry to store schemas, select None. Otherwise, Confluent Schema Registry does not register the schema. Do not select None if you are not using Confluent Scheme Registry.
Avro Schema Directory	If AVRO is selected as the output format, specify the local directory where Database Ingestion and Replication stores Avro schema definitions for each source table. Schema definition files have the following naming pattern: schemaname_tablename.txt Note: If this directory is not specified, no Avro schema definition file is produced. If a source schema change is expected to alter the target, the Avro schema definition file is regenerated with a unique name that includes a timestamp, in the following format: schemaname_tablename_YYYYMMDDhhmmss.txt This unique naming pattern ensures that older schema definition files are preserved for audit purposes.
Avro Compression Type	If AVRO is selected as the output format, select an Avro compression type. Options are: - None - Bzip2 - Deflate - Snappy The default value is None, which means no compression is used.
Deflate Compression Level	If Deflate is selected in the Avro Compression Type field, specify a compression level from 0 to 9. The default value is 0.

Under Advanced, you can enter the following advanced target properties:

Property	Description
Add Operation Type	Select this check box to add a metadata column that includes the source SQL operation type in the output that the job propagates to the target. The job writes "I" for insert, "U" for update, or "D" for delete. By default, this check box is selected.
Add Operation Time	Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target. By default, this check box is not selected.
Add Operation Owner	Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target. By default, this check box is not selected. This property is not available for jobs that have a MongoDB or PostgreSQL source. Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method.
Add Operation Transaction Id	Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations. By default, this check box is not selected.
Add Before Images	Select this check box to include UNDO data in the output that a job writes to the target. By default, this check box is not selected.
Async Write	Controls whether to use synchronous delivery of messages to Kafka. - Clear this check box to use synchronous delivery. Kafka must acknowledge each message as received before Database Ingestion and Replication sends the next message. In this mode, Kafka is unlikely to receive duplicate messages. However, performance might be slower. - Select this check box to use asynchronous delivery. Database Ingestion and Replication sends messages as soon as possible, without regard for the order in which the changes were retrieved from the source. By default, this check box is selected.
Producer Configuration Properties	Specify a comma-separated list of key=value pairs to enter Kafka producer properties for Apache Kafka, Confluent Kafka, Amazon Managed Streaming for Apache Kafka (MSK), or Kafka-enabled Azure Event Hubs targets. If you have a Confluent target that uses Confluent Schema Registry to store schemas, you must specify the following properties: schema.registry.url=url, key.serializer=org.apache.kafka.common.serialization.StringSerializer, value.serializer=io.confluent.kafka.serializers.KafkaAvroSerializer You can specify Kafka producer properties in either this field or in the Additional Connection Properties field in the Kafka connection. If you enter the producer properties in this field, the properties pertain to the database ingestion jobs associated with this task only. If you enter the producer properties for the connection, the properties pertain to jobs for all tasks that use the connection definition, unless you override the connection-level properties for specific tasks by also specifying properties in the Producer Configuration Properties field. For information about Kafka producer properties, see the Apache Kafka, Confluent Kafka, Amazon MSK, or Azure Event Hubs documentation.

Microsoft Azure Data Lake Storage target properties

When you define a database ingestion and replication task that has a Microsoft Azure Data Lake Storage target, you must enter some target properties on the Target page of the task wizard.

Under Target, you can enter the following Microsoft Azure Data Lake Storage target properties:

Property	Description
Output Format	Select the format of the output file. Options are: - CSV - AVRO - PARQUET The default value is CSV. Note: Output files in CSV format use double-quotation marks ("") as the delimiter for each field.
Add Headers to CSV File	If CSV is selected as the output format, select this check box to add a header with source column names to the output CSV file.
Avro Format	If you selected AVRO as the output format, select the format of the Avro schema that will be created for each source table. Options are: - Avro-Flat. This Avro schema format lists all Avro fields in one record. - Avro-Generic. This Avro schema format lists all columns from a source table in a single array of Avro fields. - Avro-Nested. This Avro schema format organizes each type of information in a separate record. The default value is Avro-Flat.
Avro Serialization Format	If AVRO is selected as the output format, select the serialization format of the Avro output file. Options are: - None - Binary - JSON The default value is Binary.
Avro Schema Directory	If AVRO is selected as the output format, specify the local directory where Database Ingestion and Replication stores Avro schema definitions for each source table. Schema definition files have the following naming pattern: schemaname_tablename.txt Note: If this directory is not specified, no Avro schema definition file is produced.
File Compression Type	Select a file compression type for output files in CSV or AVRO output format. Options are: - None - Deflate - Gzip - Snappy The default value is None, which means no compression is used.
Avro Compression Type	If AVRO is selected as the output format, select an Avro compression type. Options are: - None - Bzip2 - Deflate - Snappy The default value is None, which means no compression is used.
Parquet Compression Type	If the PARQUET output format is selected, you can select a compression type that is supported by Parquet. Options are: - None - Gzip - Snappy The default value is None, which means no compression is used.
Deflate Compression Level	If Deflate is selected in the Avro Compression Type field, specify a compression level from 0 to 9. The default value is 0.
Add Directory Tags	For incremental load and combined initial and incremental load tasks, select this check box to add the "dt=" prefix to the names of apply cycle directories to be compatible with the naming convention for Hive partitioning. This check box is cleared by default.
Task Target Directory	For incremental load and combined initial and incremental load tasks, the root directory for the other directories that hold output data files, schema files, and CDC cycle contents and completed files. You can use it to specify a custom root directory for the task. If you enable the Connection Directory as Parent option, you can still optionally specify a task target directory to use with the parent directory specified in the connection properties. This field is required if the {TaskTargetDirectory} placeholder is specified in patterns for any of the following directory fields.
Connection Directory as Parent	Select this check box to use the directory value that is specified in the target connection properties as the parent directory for the custom directory paths specified in the task target properties. For initial load tasks, the parent directory is used in the Data Directory and Schema Directory. For incremental load and combined initial and incremental load tasks, the parent directory is used in the Data Directory, Schema Directory, Cycle Completion Directory, and Cycle Contents Directory. This check box is selected by default. If you clear it, for initial loads, define the full path to the output files in the Data Directory field. For incremental loads, optionally specify a root directory for the task in the Task Target Directory.
Data Directory	For initial load tasks, define a directory structure for the directories where Database Ingestion and Replication stores output data files and optionally stores the schema. To define directory pattern, you can use the following types of entries: - The placeholders {SchemaName}, {TableName), {Timestamp}, {YY}, {YYYY}, {MM}, and {DD}, where {YY}, {YYYY}, {MM}, and {DD} are for date elements. The {Timestamp} values are in the format yyyymmdd_hhmissms. The generated dates and times in the directory paths indicate when the initial load job starts to transfer data to the target. - Specific directory names. - The toUpper() and toLower() functions, which force the values for an associated (placeholder) to uppercase or lowercase. Note: Placeholder values are not case sensitive. Examples: myDir1/{SchemaName}/{TableName} myDir1/myDir2/{SchemaName}/{YYYY}/{MM}/{TableName}_{Timestamp} myDir1/{toLower(SchemaName)}/{TableName}_{Timestamp} The default directory pattern is {TableName)_{Timestamp}. For incremental load and combined initial and incremental load tasks, define a custom path to the subdirectory that contains the cdc-data data files. To define the directory pattern, you can use the following types of entries: - The placeholders {TaskTargetDirectory}, {SchemaName}, {TableName), {Timestamp}, {YY}, {YYYY}, {MM}, and {DD}, where {YY}, {YYYY}, {MM}, and {DD} are for date elements. The {Timestamp} values are in the format yyyymmdd_hhmissms. The generated dates and times in the directory paths indicate when the CDC cycle started. If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose the both the function and placeholder in curly brackets, as shown in the preceding example. - Specific directory names. The default directory pattern is {TaskTargetDirectory}/data/{TableName}/data Note: For Amazon S3, Flat File, Microsoft Azure Data Lake Storage Gen2, and Oracle Cloud Object Store targets, Database Ingestion and Replication uses the directory specified in the target connection properties as the root for the data directory path when Connection Directory as Parent is selected. For Google Cloud Storage targets, Database Ingestion and Replication uses the Bucket name that you specify in the target properties for the ingestion task. For Microsoft Fabric OneLake targets, the parent directory is the path specified in the Lakehouse Path field in the Microsoft Fabric OneLake connection properties.
Schema Directory	Specify a custom directory in which to store the schema file if you want to store it in a directory other than the default directory. For initial loads, previously used values if available are shown in a drop-down list for your convenience. This field is optional. For initial loads, the schema is stored in the data directory by default. For incremental loads and combined initial and incremental loads, the default directory for the schema file is {TaskTargetDirectory}/data/{TableName}/schema You can use the same placeholders as for the Data Directory field. Ensure that you enclose placeholders with curly brackets { }. If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose the both the function and placeholder in curly brackets, for example: {toLower(SchemaName)} Note: Schema is written only to output data files in CSV format. Data files in Parquet and Avro formats contain their own embedded schema.
Cycle Completion Directory	For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle completed file. Default is {TaskTargetDirectory}/cycle/completed.
Cycle Contents Directory	For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle contents files. Default is {TaskTargetDirectory}/cycle/contents.
Use Cycle Partitioning for Data Directory	For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under each data directory. If this option is not selected, individual data files are written to the same directory without a timestamp, unless you define an alternative directory structure.
Use Cycle Partitioning for Summary Directories	For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under the summary contents and completed subdirectories.
List Individual Files in Contents	For incremental load and combined initial and incremental load tasks, lists individual data files under the contents subdirectory. If Use Cycle Partitioning for Summary Directories is cleared, this option is selected by default. All of the individual files are listed in the contents subdirectory unless you can configure custom subdirectories by using the placeholders, such as for timestamp or date. If Use Cycle Partitioning for Data Directory is selected, you can still optionally select this check box to list individual files and group them by CDC cycle.

Under Advanced, you can enter the following advanced target properties, which primarily pertain to incremental load and combined load jobs:

Field	Description
Add Operation Type	Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target. For incremental loads, the job writes "I" for insert, "U" for update, or "D" for delete. For initial loads, the job always writes "I" for insert. By default, this check box is selected for incremental load and initial and incremental load jobs, and cleared for initial load jobs.
Add Operation Time	Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target. For initial loads, the job always writes the current date and time. By default, this check box is not selected.
Add Operation Owner	Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target. For initial loads, the job always writes "INFA" as the owner. By default, this check box is not selected. This property is not available for jobs that have a MongoDB or PostgreSQL source. Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method.
Add Operation Transaction Id	Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations. For initial loads, the job always writes "1" as the ID. By default, this check box is not selected.
Add Before Images	Select this check box to include UNDO data in the output that a job writes to the target. For initial loads, the job writes nulls. By default, this check box is not selected.

Microsoft Azure Synapse Analytics target properties

When you define a database ingestion and replication task, you must enter some properties for your Microsoft Azure Synapse Analytics target on the Target page of the task wizard.

These properties apply to initial load, incremental load, and combined initial and incremental load operations.

The following table describes the target properties that appear under Target:

Property	Description
Target Creation	The only available option is Create Target Tables, which generates the target tables based on the source tables. Note: After the target table is created, Database Ingestion and Replication intelligently handles the target tables on subsequent job runs. Database Ingestion and Replication might truncate or re-create the target tables depending on the specific circumstances.
Schema	Select the target schema in which Database Ingestion and Replication creates the target tables. The schema name that is specified in the connection properties is displayed by default. Because this field is case sensitive, ensure that you entered the schema name in the connection properties in the correct case.

The following table describes advanced target properties that appear under Advanced:

Property	Description
Add Last Replicated Time	Select this check box to add a metadata column that records the timestamp at which a record was inserted or last updated in the target table. For initial loads, all loaded records have the same timestamp. For incremental and combined initial and incremental loads, the column records the timestamp of the last DML operation that was applied to the target. By default, this check box is not selected.
Prefix for Metadata Columns	Add a prefix to the names of the added metadata columns to easily identify them and to prevent conflicts with the names of existing columns. Do not include special characters in the prefix. Otherwise, task deployment will fail. The default value is INFA_.

Microsoft Fabric OneLake target properties

When you define a database ingestion and replication task that has a Microsoft Fabric OneLake target, you must enter some target properties on the Target page of the task wizard.

Under Target, you can enter the following Microsoft Fabric OneLake target properties:

Property	Description
Output Format	Select the format of the output file. Options are: - CSV - AVRO - PARQUET The default value is CSV. Note: Output files in CSV format use double-quotation marks ("") as the delimiter for each field.
Add Headers to CSV File	If CSV is selected as the output format, select this check box to add a header with source column names to the output CSV file.
Avro Format	If you selected AVRO as the output format, select the format of the Avro schema that will be created for each source table. Options are: - Avro-Flat. This Avro schema format lists all Avro fields in one record. - Avro-Generic. This Avro schema format lists all columns from a source table in a single array of Avro fields. - Avro-Nested. This Avro schema format organizes each type of information in a separate record. The default value is Avro-Flat.
Avro Serialization Format	If AVRO is selected as the output format, select the serialization format of the Avro output file. Options are: - None - Binary - JSON The default value is Binary.
Avro Schema Directory	If AVRO is selected as the output format, specify the local directory where Database Ingestion and Replication stores Avro schema definitions for each source table. Schema definition files have the following naming pattern: schemaname_tablename.txt Note: If this directory is not specified, no Avro schema definition file is produced.
File Compression Type	Select a file compression type for output files in CSV or AVRO output format. Options are: - None - Deflate - Gzip - Snappy The default value is None, which means no compression is used.
Avro Compression Type	If AVRO is selected as the output format, select an Avro compression type. Options are: - None - Bzip2 - Deflate - Snappy The default value is None, which means no compression is used.
Parquet Compression Type	If the PARQUET output format is selected, you can select a compression type that is supported by Parquet. Options are: - None - Gzip - Snappy The default value is None, which means no compression is used.
Deflate Compression Level	If Deflate is selected in the Avro Compression Type field, specify a compression level from 0 to 9. The default value is 0.
Add Directory Tags	For incremental load and combined initial and incremental load tasks, select this check box to add the "dt=" prefix to the names of apply cycle directories to be compatible with the naming convention for Hive partitioning. This check box is cleared by default.
Task Target Directory	For incremental load and combined initial and incremental load tasks, the root directory for the other directories that hold output data files, schema files, and CDC cycle contents and completed files. You can use it to specify a custom root directory for the task. This field is required if the {TaskTargetDirectory} placeholder is specified in patterns for any of the following directory fields.
Data Directory	For initial load tasks, define a directory structure for the directories where Database Ingestion and Replication stores output data files and optionally stores the schema. To define directory pattern, you can use the following types of entries: - The placeholders {SchemaName}, {TableName), {Timestamp}, {YY}, {YYYY}, {MM}, and {DD}, where {YY}, {YYYY}, {MM}, and {DD} are for date elements. The {Timestamp} values are in the format yyyymmdd_hhmissms. The generated dates and times in the directory paths indicate when the initial load job starts to transfer data to the target. - Specific directory names. - The toUpper() and toLower() functions, which force the values for an associated (placeholder) to uppercase or lowercase. Note: Placeholder values are not case sensitive. Examples: myDir1/{SchemaName}/{TableName} myDir1/myDir2/{SchemaName}/{YYYY}/{MM}/{TableName}_{Timestamp} myDir1/{toLower(SchemaName)}/{TableName}_{Timestamp} The default directory pattern is {TableName)_{Timestamp}. For incremental load and combined initial and incremental load tasks, define a custom path to the subdirectory that contains the cdc-data data files. To define the directory pattern, you can use the following types of entries: - The placeholders {TaskTargetDirectory}, {SchemaName}, {TableName), {Timestamp}, {YY}, {YYYY}, {MM}, and {DD}, where {YY}, {YYYY}, {MM}, and {DD} are for date elements. The {Timestamp} values are in the format yyyymmdd_hhmissms. The generated dates and times in the directory paths indicate when the CDC cycle started. If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose the both the function and placeholder in curly brackets, as shown in the preceding example. - Specific directory names. The default directory pattern is {TaskTargetDirectory}/data/{TableName}/data Note: For Amazon S3, Flat File, Microsoft Azure Data Lake Storage Gen2, and Oracle Cloud Object Store targets, Database Ingestion and Replication uses the directory specified in the target connection properties as the root for the data directory path when Connection Directory as Parent is selected. For Google Cloud Storage targets, Database Ingestion and Replication uses the Bucket name that you specify in the target properties for the ingestion task. For Microsoft Fabric OneLake targets, the parent directory is the path specified in the Lakehouse Path field in the Microsoft Fabric OneLake connection properties.
Schema Directory	Specify a custom directory in which to store the schema file if you want to store it in a directory other than the default directory. For initial loads, previously used values if available are shown in a drop-down list for your convenience. This field is optional. For initial loads, the schema is stored in the data directory by default. For incremental loads and combined initial and incremental loads, the default directory for the schema file is {TaskTargetDirectory}/data/{TableName}/schema You can use the same placeholders as for the Data Directory field. Ensure that you enclose placeholders with curly brackets { }. If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose the both the function and placeholder in curly brackets, for example: {toLower(SchemaName)} Note: Schema is written only to output data files in CSV format. Data files in Parquet and Avro formats contain their own embedded schema.
Cycle Completion Directory	For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle completed file. Default is {TaskTargetDirectory}/cycle/completed.
Cycle Contents Directory	For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle contents files. Default is {TaskTargetDirectory}/cycle/contents.
Use Cycle Partitioning for Data Directory	For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under each data directory. If this option is not selected, individual data files are written to the same directory without a timestamp, unless you define an alternative directory structure.
Use Cycle Partitioning for Summary Directories	For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under the summary contents and completed subdirectories.
List Individual Files in Contents	For incremental load and combined initial and incremental load tasks, lists individual data files under the contents subdirectory. If Use Cycle Partitioning for Summary Directories is cleared, this option is selected by default. All of the individual files are listed in the contents subdirectory unless you can configure custom subdirectories by using the placeholders, such as for timestamp or date. If Use Cycle Partitioning for Data Directory is selected, you can still optionally select this check box to list individual files and group them by CDC cycle.

Microsoft SQL Server target properties

When you define a database ingestion and replication task, you must enter some properties for your Microsoft SQL Server target on the Target page of the task wizard.

The following table describes the target properties that appear under Target:

Property	Description
Target Creation	The only available option is Create Target Tables, which generates the target tables based on the source tables. Note: After the target table is created, Database Ingestion and Replication intelligently handles the target tables on subsequent job runs. Database Ingestion and Replication might truncate or re-create the target tables depending on the specific circumstances.
Schema	Select the target schema in which Database Ingestion and Replication creates the target tables.
Apply Mode	For incremental load and combined initial and incremental load jobs, indicates how source DML changes, including inserts, updates, and deletes, are applied to the target. Options are: - Standard. Accumulate the changes in a single apply cycle and intelligently merge them into fewer SQL statements before applying them to the target. For example, if an update followed by a delete occurs on the source row, no row is applied to the target. If multiple updates occur on the same column or field, only the last update is applied to the target. If multiple updates occur on different columns or fields, the updates are merged into a single update record before being applied to the target. - Audit. Apply an audit trail of every DML operation made on the source tables to the target. A row for each DML change on a source table is written to the generated target table along with the audit columns you select under the Advanced section. The audit columns contain metadata about the change, such as the DML operation type, time, owner, transaction ID, generated ascending sequence number, and before image. Consider using Audit apply mode when you want to use the audit history to perform downstream computations or processing on the data before writing it to the target database or when you want to examine metadata about the captured changes. - Soft Deletes. Apply source delete operations to the target as soft deletes. A soft delete marks the deleted row as deleted without actually removing it from the database. For example, a delete on the source results in a change record on the target with "D" displayed in the INFA_OPERATION_TYPE column. Consider using soft deletes if you have a long-running business process that needs the soft-deleted data to finish processing, to restore data after an accidental delete operation, or to track deleted values for audit purposes. Note: If you use Soft Deletes mode, you must not perform an update on the primary key in a source table. Otherwise, data corruption can occur on the target. The default value is Standard. Note: This field does not appear if you selected Query-based as the CDC method on the Source page of the task wizard.

The following table describes advanced target properties that appear under Advanced:

Property	Description
Add Last Replicated Time	Select this check box to add a metadata column that records the timestamp at which a record was inserted or last updated in the target table. For initial loads, all loaded records have the same timestamp. For incremental and combined initial and incremental loads, the column records the timestamp of the last DML operation that was applied to the target. By default, this check box is not selected.
Add Operation Type	Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target database or inserts into the audit table on the target system. This field is available only when the Apply Mode option is set to Audit or Soft Deletes. In Audit mode, the job writes "I" for insert, "U" for update, or "D" for delete. In Soft Deletes mode, the job writes "D" for deletes or NULL for inserts and updates. When the operation type is NULL, the other "Add Operation..." metadata columns are also NULL. Only when the operation type is "D" will the other metadata columns contain non-null values. By default, this check box is selected. You cannot deselect it if you are using soft deletes.
Add Operation Time	Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target database or inserts into the audit table on the target system. This field is available only when Apply Mode is set to Audit or Soft Deletes. By default, this check box is not selected.
Add Operation Owner	Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target database or inserts into the audit table on the target system. This field is available only when Apply Mode is set to Audit or Soft Deletes. By default, this check box is not selected. This property is not available for jobs that have a MongoDB or PostgreSQL source. Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method.
Add Operation Transaction Id	Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations. This field is available only when Apply Mode is set to Audit or Soft Deletes. By default, this check box is not selected.
Add Operation Sequence	Select this check box to add a metadata column that records a generated, ascending sequence number for each change operation that the job inserts into the audit table on the target system. The sequence number reflects the change stream position of the operation. This field is available only when Apply Mode is set to Audit. By default, this check box is not selected.
Add Before Images	Select this check box to add _OLD columns with UNDO "before image" data in the output that the job inserts into the target tables. You can then compare the old and current values for each data column. For a delete operation, the current value will be null. This field is available only when Apply Mode is set to Audit. By default, this check box is not selected.
Add Cycle ID	Select this check box to add a metadata column that includes the cycle ID of each CDC cycle in each target table. A cycle ID is a number that's generated by the CDC engine for each successful CDC cycle. If you integrate the job with Data Integration taskflows, the job can pass the minimum and maximum cycle IDs in output fields to the taskflow so that the taskflow can determine the range of cycles that contain new CDC data. This capability is useful if data from multiple cycles accumulates before the previous taskflow run completes. By default, this check box is not selected.
Prefix for Metadata Columns	Add a prefix to the names of the added metadata columns to easily identify them and to prevent conflicts with the names of existing columns. Do not include special characters in the prefix. Otherwise, task deployment will fail. The default value is INFA_.

Oracle target properties

When you define a database ingestion and replication task, you must enter some properties for your Oracle target on the Target page of the task wizard. The properties vary slightly by load type.

The following table describes the Oracle target properties that appear under Target:

Property	Description
Target Creation	The only available option is Create Target Tables, which generates the target tables based on the source tables. Note: After the target table is created, Database Ingestion and Replication intelligently handles the target tables on subsequent job runs. Database Ingestion and Replication might truncate or re-create the target tables depending on the specific circumstances.
Schema	Select the target schema in which Database Ingestion and Replication creates the target tables.
Apply Mode	For incremental load and combined initial and incremental load jobs, indicates how source DML changes, including inserts, updates, and deletes, are applied to the target. Options are: - Standard. Accumulate the changes in a single apply cycle and intelligently merge them into fewer SQL statements before applying them to the target. For example, if an update followed by a delete occurs on the source row, no row is applied to the target. If multiple updates occur on the same column or field, only the last update is applied to the target. If multiple updates occur on different columns or fields, the updates are merged into a single update record before being applied to the target. - Audit. Apply an audit trail of every DML operation made on the source tables to the target. A row for each DML change on a source table is written to the generated target table along with the audit columns you select under the Advanced section. The audit columns contain metadata about the change, such as the DML operation type, time, owner, transaction ID, and generated ascending sequence number. Consider using Audit apply mode when you want to use the audit history to perform downstream computations or processing on the data before writing it to the target database or when you want to examine metadata about the captured changes. The default value is Standard. Note: This field does not appear if you selected Query-based as the CDC method on the Source page of the task wizard.

The following table describes the advanced target properties that you can set under Advanced. All properties except Add Last Replicated Time and Prefix for Metadata Columns are available only if you set Apply Mode to Audit.

Field	Description
Add Last Replicated Time	Select this check box to add a metadata column that records the timestamp in UTC format at which a record was inserted or last updated in the target table. For initial loads, all loaded records have the same timestamp. For incremental and combined initial and incremental loads, the column records the timestamp of the last DML operation that was applied to the target. By default, this check box is not selected.
Add Operation Type	Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target database or inserts into the target table. The job writes "I" for insert, "U" for update, or "D" for delete. By default, this check box is selected.
Add Operation Time	Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target table. By default, this check box is not selected.
Add Operation Owner	Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target. By default, this check box is not selected. This property is not available for jobs that have a MongoDB or PostgreSQL source. Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method.
Add Operation Transaction Id	Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations. By default, this check box is not selected.
Add Operation Sequence	Select this check box to add a metadata column that records a generated, ascending sequence number for each change operation that the job inserts into the target tables. The sequence number reflects the change stream position of the operation. By default, this check box is not selected.
Add Before Images	Select this check box to add _OLD columns with UNDO "before image" data in the output that the job inserts into the target tables. You can then compare the old and current values for each data column. For a delete operation, the current value will be null. By default, this check box is not selected.
Add Cycle ID	Select this check box to add a metadata column that includes the cycle ID of each CDC cycle in each target table. A cycle ID is a number that's generated by the CDC engine for each successful CDC cycle. If you integrate the job with Data Integration taskflows, the job can pass the minimum and maximum cycle IDs in output fields to the taskflow so that the taskflow can determine the range of cycles that contain new CDC data. This capability is useful if data from multiple cycles accumulates before the previous taskflow run completes. By default, this check box is not selected.
Prefix for Metadata Columns	Add a prefix to the names of the added metadata columns to easily identify them and to prevent conflicts with the names of existing columns. The default value is INFA_.
Enable Case Transformation	By default, target table names and column names are generated in the same case as the corresponding source names. If you want to control the case of letters in the target names, select this check box. Then select a Case Transformation Strategy option.
Case Transformation Strategy	If you selected Enable Case Transformation, select one of the following options to specify how to handle the case of letters in generated target table (or object) names and column (or field) names: - Same as source. Use the same case as the source table (or object) names and column (or field) names. - UPPERCASE. Use all uppercase. - lowercase. Use all lowercase. The default value is Same as source.

Oracle Cloud Object Storage target properties

When you define a database ingestion and replication task that has an Oracle Cloud Object Storage target, you must enter some target properties on the Target tab of the task wizard.

Under Target, you can enter the following Oracle Cloud Object Storage target properties:

Property	Description
Output Format	Select the format of the output file. Options are: - CSV - AVRO - PARQUET The default value is CSV. Note: Output files in CSV format use double-quotation marks ("") as the delimiter for each field.
Add Headers to CSV File	If CSV is selected as the output format, select this check box to add a header with source column names to the output CSV file.
Avro Format	If you selected AVRO as the output format, select the format of the Avro schema that will be created for each source table. Options are: - Avro-Flat. This Avro schema format lists all Avro fields in one record. - Avro-Generic. This Avro schema format lists all columns from a source table in a single array of Avro fields. - Avro-Nested. This Avro schema format organizes each type of information in a separate record. The default value is Avro-Flat.
Avro Serialization Format	If AVRO is selected as the output format, select the serialization format of the Avro output file. Options are: - None - Binary - JSON The default value is Binary.
Avro Schema Directory	If AVRO is selected as the output format, specify the local directory where Database Ingestion and Replication stores Avro schema definitions for each source table. Schema definition files have the following naming pattern: schemaname_tablename.txt Note: If this directory is not specified, no Avro schema definition file is produced.
File Compression Type	Select a file compression type for output files in CSV or AVRO output format. Options are: - None - Deflate - Gzip - Snappy The default value is None, which means no compression is used.
Avro Compression Type	If AVRO is selected as the output format, select an Avro compression type. Options are: - None - Bzip2 - Deflate - Snappy The default value is None, which means no compression is used.
Parquet Compression Type	If the PARQUET output format is selected, you can select a compression type that is supported by Parquet. Options are: - None - Gzip - Snappy The default value is None, which means no compression is used.
Deflate Compression Level	If Deflate is selected in the Avro Compression Type field, specify a compression level from 0 to 9. The default value is 0.
Task Target Directory	For incremental load and combined initial and incremental load tasks, the root directory for the other directories that hold output data files, schema files, and CDC cycle contents and completed files. You can use it to specify a custom root directory for the task. If you enable the Connection Directory as Parent option, you can still optionally specify a task target directory to use with the parent directory specified in the connection properties. This field is required if the {TaskTargetDirectory} placeholder is specified in patterns for any of the following directory fields.
Add Directory Tags	For incremental load and combined initial and incremental load tasks, select this check box to add the "dt=" prefix to the names of apply cycle directories to be compatible with the naming convention for Hive partitioning. This check box is cleared by default.
Connection Directory as Parent	Select this check box to use the directory value that is specified in the target connection properties as the parent directory for the custom directory paths specified in the task target properties. For initial load tasks, the parent directory is used in the Data Directory and Schema Directory. For incremental load and combined initial and incremental load tasks, the parent directory is used in the Data Directory, Schema Directory, Cycle Completion Directory, and Cycle Contents Directory. This check box is selected by default. If you clear it, for initial loads, define the full path to the output files in the Data Directory field. For incremental loads, optionally specify a root directory for the task in the Task Target Directory.
Data Directory	For initial load tasks, define a directory structure for the directories where Database Ingestion and Replication stores output data files and optionally stores the schema. To define directory pattern, you can use the following types of entries: - The placeholders {SchemaName}, {TableName), {Timestamp}, {YY}, {YYYY}, {MM}, and {DD}, where {YY}, {YYYY}, {MM}, and {DD} are for date elements. The {Timestamp} values are in the format yyyymmdd_hhmissms. The generated dates and times in the directory paths indicate when the initial load job starts to transfer data to the target. - Specific directory names. - The toUpper() and toLower() functions, which force the values for an associated (placeholder) to uppercase or lowercase. Note: Placeholder values are not case sensitive. Examples: myDir1/{SchemaName}/{TableName} myDir1/myDir2/{SchemaName}/{YYYY}/{MM}/{TableName}_{Timestamp} myDir1/{toLower(SchemaName)}/{TableName}_{Timestamp} The default directory pattern is {TableName)_{Timestamp}. For incremental load and combined initial and incremental load tasks, define a custom path to the subdirectory that contains the cdc-data data files. To define the directory pattern, you can use the following types of entries: - The placeholders {TaskTargetDirectory}, {SchemaName}, {TableName), {Timestamp}, {YY}, {YYYY}, {MM}, and {DD}, where {YY}, {YYYY}, {MM}, and {DD} are for date elements. The {Timestamp} values are in the format yyyymmdd_hhmissms. The generated dates and times in the directory paths indicate when the CDC cycle started. If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose the both the function and placeholder in curly brackets, as shown in the preceding example. - Specific directory names. The default directory pattern is {TaskTargetDirectory}/data/{TableName}/data Note: For Amazon S3, Flat File, Microsoft Azure Data Lake Storage Gen2, and Oracle Cloud Object Store targets, Database Ingestion and Replication uses the directory specified in the target connection properties as the root for the data directory path when Connection Directory as Parent is selected. For Google Cloud Storage targets, Database Ingestion and Replication uses the Bucket name that you specify in the target properties for the ingestion task. For Microsoft Fabric OneLake targets, the parent directory is the path specified in the Lakehouse Path field in the Microsoft Fabric OneLake connection properties.
Schema Directory	Specify a custom directory in which to store the schema file if you want to store it in a directory other than the default directory. For initial loads, previously used values if available are shown in a drop-down list for your convenience. This field is optional. For initial loads, the schema is stored in the data directory by default. For incremental loads and combined initial and incremental loads, the default directory for the schema file is {TaskTargetDirectory}/data/{TableName}/schema You can use the same placeholders as for the Data Directory field. Ensure that you enclose placeholders with curly brackets { }. If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose the both the function and placeholder in curly brackets, for example: {toLower(SchemaName)} Note: Schema is written only to output data files in CSV format. Data files in Parquet and Avro formats contain their own embedded schema.
Cycle Completion Directory	For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle completed file. Default is {TaskTargetDirectory}/cycle/completed.
Cycle Contents Directory	For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle contents files. Default is {TaskTargetDirectory}/cycle/contents.
Use Cycle Partitioning for Data Directory	For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under each data directory. If this option is not selected, individual data files are written to the same directory without a timestamp, unless you define an alternative directory structure.
Use Cycle Partitioning for Summary Directories	For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under the summary contents and completed subdirectories.
List Individual Files in Contents	For incremental load and combined initial and incremental load tasks, lists individual data files under the contents subdirectory. If Use Cycle Partitioning for Summary Directories is cleared, this option is selected by default. All of the individual files are listed in the contents subdirectory unless you can configure custom subdirectories by using the placeholders, such as for timestamp or date. If Use Cycle Partitioning for Data Directory is selected, you can still optionally select this check box to list individual files and group them by CDC cycle.

Under Advanced, you can enter the following advanced target properties to add metadata columns for each delete operation or each DML change recorded in the audit table.

Field	Description
Add Operation Type	Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target. For incremental loads, the job writes "I" for insert, "U" for update, or "D" for delete. For initial loads, the job always writes "I" for insert. By default, this check box is selected for incremental load and initial and incremental load jobs, and cleared for initial load jobs.
Add Operation Time	Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target. For initial loads, the job always writes the current date and time. By default, this check box is not selected.
Add Operation Owner	Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target. For initial loads, the job always writes "INFA" as the owner. By default, this check box is not selected. This property is not available for jobs that have a MongoDB or PostgreSQL source. Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method.
Add Operation Transaction Id	Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations. For initial loads, the job always writes "1" as the ID. By default, this check box is not selected.
Add Before Images	Select this check box to include UNDO data in the output that a job writes to the target. For initial loads, the job writes nulls. By default, this check box is not selected.

PostgreSQL target properties

When you define a database ingestion and replication task, you must enter some properties for your PostgreSQL target on the Target page of the task wizard.

The following table describes the target properties that appear under Target:

Property	Description
Target Creation	The only available option is Create Target Tables, which generates the target tables based on the source tables. Note: After the target table is created, Database Ingestion and Replication intelligently handles the target tables on subsequent job runs. Database Ingestion and Replication might truncate or re-create the target tables depending on the specific circumstances.
Schema	Select the target schema in which Database Ingestion and Replication creates the target tables.
Apply Mode	For incremental load and combined initial and incremental load jobs, indicates how source DML changes, including inserts, updates, and deletes, are applied to the target. Options are: - Standard. Accumulate the changes in a single apply cycle and intelligently merge them into fewer SQL statements before applying them to the target. For example, if an update followed by a delete occurs on the source row, no row is applied to the target. If multiple updates occur on the same column or field, only the last update is applied to the target. If multiple updates occur on different columns or fields, the updates are merged into a single update record before being applied to the target. - Audit. Apply an audit trail of every DML operation made on the source tables to the target. A row for each DML change on a source table is written to the generated target table along with the audit columns you select under the Advanced section. The audit columns contain metadata about the change, such as the DML operation type, time, owner, transaction ID, generated ascending sequence number, and before image. Consider using Audit apply mode when you want to use the audit history to perform downstream computations or processing on the data before writing it to the target database or when you want to examine metadata about the captured changes. - Soft Deletes. Apply source delete operations to the target as soft deletes. A soft delete marks the deleted row as deleted without actually removing it from the database. For example, a delete on the source results in a change record on the target with "D" displayed in the INFA_OPERATION_TYPE column. Consider using soft deletes if you have a long-running business process that needs the soft-deleted data to finish processing, to restore data after an accidental delete operation, or to track deleted values for audit purposes. Note: If you use Soft Deletes mode, you must not perform an update on the primary key in a source table. Otherwise, data corruption can occur on the target. The default value is Standard. Note: The Audit and Soft Deletes apply modes are supported for jobs that have an Oracle source. Note: This field does not appear if you selected Query-based as the CDC method on the Source page of the task wizard.

The following table describes advanced target properties that appear under Advanced:

Property	Description
Add Operation Type	Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target database or inserts into the target table. This field is available only when the Apply Mode option is set to Audit or Soft Deletes. In Audit mode, the job writes "I" for insert, "U" for update, or "D" for delete. In Soft Deletes mode, the job writes "D" for deletes or NULL for inserts and updates. When the operation type is NULL, the other "Add Operation..." metadata columns are also NULL. Only when the operation type is "D" will the other metadata columns contain non-null values. By default, this check box is selected. You cannot deselect it if you are using soft deletes.
Add Operation Time	Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target database or inserts into the audit table on the target system. This field is available only when Apply Mode is set to Audit or Soft Deletes. By default, this check box is not selected.
Add Operation Owner	Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target database or inserts into the audit table on the target system. This field is available only when Apply Mode is set to Audit or Soft Deletes. By default, this check box is not selected.
Add Operation Transaction Id	Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations. This field is available only when Apply Mode is set to Audit or Soft Deletes. By default, this check box is not selected.
Add Operation Sequence	Select this check box to add a metadata column that records a generated, ascending sequence number for each change operation that the job inserts into the audit table on the target system. The sequence number reflects the change stream position of the operation. This field is available only when Apply Mode is set to Audit. By default, this check box is not selected.
Add Before Images	Select this check box to add _OLD columns with UNDO "before image" data in the output that the job inserts into the target tables. You can then compare the old and current values for each data column. For a delete operation, the current value will be null. This field is available only when Apply Mode is set to Audit. By default, this check box is not selected.
Prefix for Metadata Columns	Add a prefix to the names of the added metadata columns to easily identify them and to prevent conflicts with the names of existing columns. The default value is INFA_.

Snowflake Data Cloud target properties

When you define a database ingestion and replication task, you must enter some properties for your Snowflake Data Cloud target on the Target page of the task wizard. The properties vary slightly by load type.

The following table describes the Snowflake target properties that appear under Target:

Property	Description
Target Creation	The only available option is Create Target Tables, which generates the target tables based on the source tables. Note: After the target table is created, Database Ingestion and Replication intelligently handles the target tables on subsequent job runs. Database Ingestion and Replication might truncate or re-create the target tables depending on the specific circumstances.
Schema	Select the target schema in which Database Ingestion and Replication creates the target tables.
Stage	The name of internal staging area that holds the data read from the source before the data is written to the target tables. This name must not include spaces. If the staging area does not exist, it will be automatically created. Note: This field is not available if you selected the Superpipe option in the Advanced Target Properties.
Apply Mode	For incremental load and combined initial and incremental load jobs, indicates how source DML changes, including inserts, updates, and deletes, are applied to the target. Options are: - Standard. Accumulate the changes in a single apply cycle and intelligently merge them into fewer SQL statements before applying them to the target. For example, if an update followed by a delete occurs on the source row, no row is applied to the target. If multiple updates occur on the same column or field, only the last update is applied to the target. If multiple updates occur on different columns or fields, the updates are merged into a single update record before being applied to the target. - Soft Deletes. Apply source delete operations to the target as soft deletes. A soft delete marks the deleted row as deleted without actually removing it from the database. For example, a delete on the source results in a change record on the target with "D" displayed in the INFA_OPERATION_TYPE column. Consider using soft deletes if you have a long-running business process that needs the soft-deleted data to finish processing, to restore data after an accidental delete operation, or to track deleted values for audit purposes. Note: If you use Soft Deletes mode, you must not perform an update on the primary key in a source table. Otherwise, data corruption can occur on the target. - Audit. Apply an audit trail of every DML operation made on the source tables to the target. A row for each DML change on a source table is written to the generated target table along with the audit columns you select under the Advanced section. The audit columns contain metadata about the change, such as the DML operation type, time, owner, transaction ID, generated ascending sequence number, and before image. Consider using Audit apply mode when you want to use the audit history to perform downstream computations or processing on the data before writing it to the target database or when you want to examine metadata about the captured changes. The default value is Standard. Note: This field does not appear if you selected Query-based as the CDC method on the Source page of the task wizard.

The following table describes advanced target properties that appear under Advanced:

Property	Description
Add Last Replicated Time	Select this check box to add a metadata column that records the timestamp at which a record was inserted or last updated in the target table. For initial loads, all loaded records have the same timestamp, except for Snowflake targets that use the Superpipe option where minutes and seconds might vary slightly. For incremental and combined initial and incremental loads, the column records the timestamp of the last DML operation that was applied to the target. By default, this check box is not selected.
Add Operation Type	Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target database or inserts into the audit table on the target system. This field is available only when the Apply Mode option is set to Audit or Soft Deletes. In Audit mode, the job writes "I" for inserts, "U" for updates, "E" for upserts, or "D" for deletes to this metadata column. In Soft Deletes mode, the job writes "D" for deletes or NULL for inserts, updates, and upserts. When the operation type is NULL, the other "Add Operation..." metadata columns are also NULL. Only when the operation type is "D" will the other metadata columns contain non-null values. By default, this check box is selected. You cannot deselect it if you are using soft deletes.
Add Operation Time	Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target database or inserts into the audit table on the target system. This field is available only when Apply Mode is set to Audit or Soft Deletes. By default, this check box is not selected.
Add Operation Owner	Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target database or inserts into the audit table on the target system. This field is available only when Apply Mode is set to Audit or Soft Deletes. By default, this check box is not selected. This property is not available for jobs that have a MongoDB or PostgreSQL source. Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method.
Add Operation Transaction Id	Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations. This field is available only when Apply Mode is set to Audit or Soft Deletes. By default, this check box is not selected.
Add Operation Sequence	Select this check box to add a metadata column that records a generated, ascending sequence number for each change operation that the job inserts into the audit table on the target system. The sequence number reflects the change stream position of the operation. This field is available only when Apply Mode is set to Audit. By default, this check box is not selected.
Add Before Images	Select this check box to add _OLD columns with UNDO "before image" data in the output that the job inserts into the target tables. You can then compare the old and current values for each data column. For a delete operation, the current value will be null. This field is available only when Apply Mode is set to Audit. By default, this check box is not selected.
Add Cycle ID	Select this check box to add a metadata column that includes the cycle ID of each CDC cycle in each target table. A cycle ID is a number that's generated by the CDC engine for each successful CDC cycle. If you integrate the job with Data Integration taskflows, the job can pass the minimum and maximum cycle IDs in output fields to the taskflow so that the taskflow can determine the range of cycles that contain new CDC data. This capability is useful if data from multiple cycles accumulates before the previous taskflow run completes. By default, this check box is not selected. Note: If you select this option, you can't also select the Superpipe option for the Snowflake target.
Prefix for Metadata Columns	Add a prefix to the names of the added metadata columns to easily identify them and to prevent conflicts with the names of existing columns. The default value is INFA_.
Superpipe	Select this check box to use the Snowpipe Streaming API to quickly stream rows of data directly to Snowflake Data Cloud target tables with low latency instead of first writing the data to stage files. This option is available for all load types. When you configure the target connection, select KeyPair authentication. By default, this check box is selected. Deselect it if you want to write data to intermediate stage files.
Merge Frequency	When Superpipe is selected, you can optionally set the frequency, in seconds, at which change data rows are merged and applied to the Snowflake target tables. The merge frequency affects how often the stream change data is merged to the Snowflake base table. A Snowflake view joins the stream change data with the base table. Set this value to balance the costs of merging data to the base table with the performance of view join processing. This field applies to incremental load and combined initial and incremental load tasks. Valid values are 60 through 604800 seconds. Default is 3600 seconds.
Enable Case Transformation	By default, target table names and column names are generated in the same case as the corresponding source names, unless cluster-level or session-level properties on the target override this case-sensitive behavior. If you want to control the case of letters in the target names, select this check box. Then select a Case Transformation Strategy option.
Case Transformation Strategy	If you selected Enable Case Transformation, select one of the following options to specify how to handle the case of letters in generated target table (or object) names and column (or field) names: - Same as source. Use the same case as the source table (or object) names and column (or field) names. - UPPERCASE. Use all uppercase. - lowercase. Use all lowercase. The default value is Same as source. Note: The selected strategy will override any cluster-level or session-level properties on the target for controlling case.

Configuring schedule and runtime options

On the Schedule and Runtime Options page of the database ingestion and replication task wizard, you can specify a schedule for running initial load jobs periodically and configure runtime options for jobs of any load type.

1Under Advanced, optionally edit the Number of Rows in Output File value to specify the maximum number of rows that the database ingestion and replication task writes to an output data file.

Note: Advanced options are not displayed for jobs that have an Apache Kafka target.

For incremental load operations and combined initial and incremental load operations, change data is flushed to the target either when this number of rows is reached or when the flush latency period expires and the job is not in the middle of processing a transaction. The flush latency period is the time that the job waits for more change data before flushing data to the target. The latency period is internally set to 10 seconds and cannot be changed.

Valid values are 1 through 100000000. The default value for Amazon S3, Microsoft Azure Data Lake Storage Gen2, and Oracle Cloud Infrastructure (OCI) Object Storage targets is 1000 rows. For the other targets, the default value is 100000 rows.

Note: For Microsoft Azure Synapse Analytics targets, the data is first sent to a Microsoft Azure Data Lake Storage staging file before being written to the target tables. After data is written to the target, the entire contents of the table-specific directory that includes the staging files are emptied. For Snowflake targets, the data is first stored in an internal stage area before being written to the target tables.

2For initial load jobs only, optionally clear the File Extension Based on File Type check box if you want the output data files for Flat File, Amazon S3, Google Cloud Storage, Microsoft Azure Data Lake Storage, Microsoft Fabric OneLake, or Oracle Cloud Object Storage targets to have the .dat extension. This check box is selected by default, which causes the output files to have file-name extensions based on their file types.

Note: For incremental load jobs with these target types, this option is not available. Database Ingestion and Replication always uses output file-name extensions based on file type.

3For database ingestion and replication incremental load tasks that have Amazon S3, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, Microsoft Fabric OneLake, or Oracle Cloud Object Storage targets, configure the following apply cycle options:

Option	Description
Apply Cycle Interval	Specifies the amount of time that must elapse before a database ingestion and replication job ends an apply cycle. You can specify days, hours, minutes, and seconds or specify values for a subset of these time fields leaving the other fields blank. The default value is 15 minutes.
Apply Cycle Change Limit	Specifies the total number of records in all tables of a database ingestion and replication job that must be processed before the job ends an apply cycle. When this record limit is reached, the database ingestion and replication job ends the apply cycle and writes the change data to the target. The default value is 10000 records. Note: During startup, jobs might reach this limit more frequently than the apply cycle interval if they need to catch up on processing a backlog of older data.
Low Activity Flush Interval	Specifies the amount of time, in hours, minutes, or both, that must elapse during a period of no change activity on the source before a database ingestion and replication job ends an apply cycle. When this time limit is reached, the database ingestion and replication job ends the apply cycle and writes the change data to the target. If you do not specify a value for this option, a database ingestion and replication job ends apply cycles only after either the Apply Cycle Change Limit or Apply Cycle Interval limit is reached. No default value is provided.

4Under Schema Drift Options, if the detection of schema drift is supported for your source and target combination, specify the schema drift option to use for each of the supported types of DDL operations.

Schema drift options are supported for the following source - target combinations and load types:

Source	Load Type	Target
Db2 for i	Incremental Combined initial and incremental	Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Kafka (incremental loads only), Microsoft Azure Data Lake Storage, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, Oracle Cloud Object Storage, PostgreSQL, Snowflake, and SQL Server
Db2 for LUW	Incremental Combined initial and incremental	Snowflake
Db2 for z/OS, except Db2 11	Incremental Combined initial and incremental	Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Kafka (incremental loads only), Microsoft Azure Data Lake Storage, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, Oracle Cloud Object Storage, Snowflake, and SQL Server
Microsoft SQL Server	Incremental Combined initial and incremental	Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Kafka (incremental loads only), Microsoft Azure Data Lake Storage, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, Oracle Cloud Object Storage, PostgreSQL, Snowflake, and SQL Server
Oracle	Incremental Combined initial and incremental	Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Kafka (incremental loads only), Microsoft Azure Data Lake Storage, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, Oracle Cloud Object Storage, PostgreSQL, Snowflake, and SQL Server
PostgreSQL	Incremental Combined initial and incremental	Incremental loads: Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Kafka (incremental loads only), Microsoft Azure Data Lake Storage, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, Oracle Cloud Object Storage, PostgreSQL, and Snowflake Combined initial and incremental loads: Oracle, PostgreSQL, and Snowflake

The types of supported DDL operations are:

- Add Column
- Modify Column
- Drop Column
- Rename Column

Note: The Modify Column and Rename Column options are not supported and not displayed for database ingestion jobs that have Google BigQuery targets.

The following table describes the schema drift options that you can set for a DDL operation type:

Option	Description
Ignore	Do not replicate DDL changes that occur on the source database to the target. For Amazon Redshift, Kafka, Microsoft Azure Synapse Analytics, PostgreSQL, Snowflake and SQL Server targets, this option is the default option for the Drop Column and Rename Column operation types. For Amazon S3, Google Cloud Storage, Microsoft Azure Data Lake Storage, and Oracle Cloud Object Storage targets that use the CSV output format, the Ignore option is disabled. For the AVRO output format, this option is enabled.
Replicate	Replicate the DDL operation to the target. For Amazon S3, Google Cloud Storage, Microsoft Azure Data Lake Storage, Microsoft Fabric OneLake, and Oracle Cloud Object Storage targets, this option is the default option for all operation types. For other targets, this option is the default option for the Add Column and Modify Column operation types.
Stop Job	Stop the entire database ingestion job.
Stop Table	Stop processing the source table on which the DDL change occurred. When one or more of the tables are excluded from replication because of the Stop Table schema drift option, the job state changes to Running with Warning. Important: The database ingestion and replication job cannot retrieve the data changes that occurred on the source table after the job stopped processing it. Consequently, data loss might occur on the target. To avoid data loss, you need to stop the job and resynchronize the source and target objects that the job stopped processing. After the job is stopped, you can use the Resume With Options > Resync option to resynchronize tables that are in the Error state because of the Stop Table operation. For more information, see Overriding schema drift options when resuming a database ingestion and replication job.

5For incremental load jobs that have an Apache Kafka target, configure the following checkpointing options:

Option	Description
Checkpoint All Rows	Indicates whether a database ingestion and replication job performs checkpoint processing for every message that is sent to the Kafka target. Note: If this check box is selected, the Checkpoint Every Commit, Checkpoint Row Count, and Checkpoint Frequency (secs) options are ignored.
Checkpoint Every Commit	Indicates whether a database ingestion and replication job performs checkpoint processing for every commit that occurs on the source.
Checkpoint Row Count	Specifies the maximum number of messages that a database ingestion and replication job sends to the target before adding a checkpoint. If you set this option to 0, a database ingestion and replication job does not perform checkpoint processing based on the number of messages. If you set this option to 1, a database ingestion and replication jobs add a checkpoint for each message.
Checkpoint Frequency (secs)	Specifies the maximum number of seconds that must elapse before a database ingestion and replication job adds a checkpoint. If you set this option to 0, a database ingestion and replication job does not perform checkpoint processing based on elapsed time.

6Under Schedule, if you want to run job instances for an initial load task based on an existing schedule instead of manually starting the job after it is deployed from one of the monitoring interfaces, select Run this task based on a schedule and then select a predefined schedule. The default option is Do not run this task based on a schedule.

This field is unavailable for incremental load and combined initial and incremental load tasks.

You can view and edit the schedule options in Administrator. If you edit the schedule, the changes will apply to all jobs that use the schedule. If you edit the schedule after deploying the task, you do not need to redeploy the task.

If the schedule criteria for running the job is met but the previous job run is still active, Database Ingestion and Replication skips the new job run.

7Under Taskflow, if you want to make the database ingestion and replication task available in Data Integration to add to a taskflow, select Execute in Taskflow. You can then include the task as an event source in a taskflow and add transformations to transform the ingested data. Available for initial load and incremental load tasks with Snowflake targets that don't use the Superpipe option

8Under Custom Properties, you can specify custom properties that Informatica provides to meet your special requirements. To add a property, in the Create Property fields, enter the property name and value. Then click Add Property.

Tip: To delete a property, click the Delete icon button at the right end of the property row in the list.

9Click Save.