Property | Description |
---|---|
Name | A name for the database ingestion and replication task. Task names can contain Latin alphanumeric characters, spaces, periods (.), commas (,), underscores (_), plus signs (+), and hyphens (-). Task names cannot include other special characters. Task names are not case sensitive. Maximum length is 50 characters. Note: If you include spaces in the database ingestion and replication task name, after you deploy the task, the spaces do not appear in the corresponding job name. |
Location | The project or project\folder that will contain the task definition. The default is the currently selected project or project subfolder in Explore. If a project or project subfolder is not selected, the default is the Default project. |
Runtime Environment | The runtime environment in which you want to run the task. The runtime environment must be a Secure Agent group that consists of one or more Secure Agents. A Secure Agent is a lightweight program that runs tasks and enables secure communication. For database ingestion and replication tasks, the Cloud Hosted Agent is not supported and does not appear in the Runtime Environment list. Serverless runtime environments are also not supported. Tip: Click the Refresh icon to refresh the list of runtime environments. |
Description | An optional description for the task. Maximum length is 4,000 characters. |
Load Type | The type of load operation that the database ingestion and replication task performs. Options are:
Note: If a change record is captured during the initial unload phase, it is withheld from apply processing until after the unload phase completes. Any insert rows captured during the unload phase are converted into a pair of delete and insert operations so that only one insert row is applied to the target when the insert occurs in both the unloaded data and the captured change data. |
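
The naming rules for the Name property can be checked before a task is created. The following is a minimal sketch, not part of the product: the function names are hypothetical, and the one-character minimum length is an assumption because the table states only the maximum.

```python
import re

# Allowed by the Name property: Latin letters, digits, spaces, periods,
# commas, underscores, plus signs, and hyphens. Maximum length is 50.
# Assumption: a name must contain at least one character.
_TASK_NAME_RE = re.compile(r"[A-Za-z0-9 .,_+\-]{1,50}")


def is_valid_task_name(name: str) -> bool:
    """Return True if the name uses only allowed characters and fits in 50 characters."""
    return _TASK_NAME_RE.fullmatch(name) is not None


def same_task_name(first: str, second: str) -> bool:
    """Task names are not case sensitive, so compare them case-insensitively."""
    return first.lower() == second.lower()


if __name__ == "__main__":
    for candidate in ("orders_load-1.v2", "bad*name", "x" * 51):
        print(repr(candidate[:20]), is_valid_task_name(candidate))
```

A check like this catches names such as `Orders Load` and `orders load`, which differ only by case and therefore refer to the same task name.
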
Field | Description |
---|---|
Replication Slot Name | The unique name of a PostgreSQL replication slot. A slot name can contain Latin alphanumeric characters in lowercase and the underscore (_) character. Maximum length is 63 characters. Important: Each database ingestion and replication task must use a different replication slot. |
Replication Plugin | The PostgreSQL replication plug-in. Options are:
|
Publication | If you selected pgoutput as the replication plug-in, specify the publication name that this plug-in uses. Note: This field is not displayed if you selected wal2json as the replication plug-in. |
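
The replication slot rules above lend themselves to a quick pre-deployment check. This is an illustrative sketch only: the helper names and the task-to-slot mapping are hypothetical, and the one-character minimum length is an assumption.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Lowercase Latin letters, digits, and underscores; at most 63 characters.
# Assumption: a slot name must contain at least one character.
_SLOT_NAME_RE = re.compile(r"[a-z0-9_]{1,63}")


def is_valid_slot_name(name: str) -> bool:
    """Return True if the slot name satisfies the character and length rules."""
    return _SLOT_NAME_RE.fullmatch(name) is not None


def find_shared_slots(task_slots: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each slot name used by more than one task to the tasks that share it."""
    tasks_by_slot: Dict[str, List[str]] = defaultdict(list)
    for task, slot in task_slots.items():
        tasks_by_slot[slot].append(task)
    return {slot: tasks for slot, tasks in tasks_by_slot.items() if len(tasks) > 1}


if __name__ == "__main__":
    # Hypothetical task names and slot names, for illustration only.
    planned = {"task_a": "slot_orders", "task_b": "slot_orders", "task_c": "slot_items"}
    print(find_shared_slots(planned))
```

Any slot that `find_shared_slots` reports violates the requirement that each task use a different replication slot.
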
Method | Supported Sources | Description |
---|---|---|
CDC Tables | SQL Server only | Read data changes directly from the SQL Server CDC tables. For SQL Server sources, this method provides the best replication performance and highest reliability of results. |
Log-based | Oracle and SQL Server | Capture Inserts, Updates, Deletes, and column DDL changes in near real time by reading the database transaction logs. For Oracle sources, data changes are read from the Oracle redo logs. For SQL Server sources, data changes are read from the SQL Server transaction log and the enabled SQL Server CDC tables. Exception: For Azure SQL Database sources, data changes are read from CDC tables only. |
Query-based | Db2 for LUW, Oracle, and SQL Server | Capture Inserts and Updates by using a SQL WHERE clause that points to a CDC query column. The query column is used to identify the rows that contain the changes made to the source tables since the beginning of the CDC interval. For Db2 for LUW sources in incremental load and initial and incremental load jobs, this capture method is the only available option. |
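
To illustrate the idea behind the Query-based method, the sketch below selects rows whose CDC query column is later than the start of the CDC interval. It is not the product's implementation: it uses an in-memory SQLite database, and the `orders` table, the `last_updated` column, and the strict `>` comparison are all assumptions made for the example.

```python
import sqlite3


def changed_rows(conn, table, query_column, interval_start):
    """Return rows whose CDC query column is later than the interval start."""
    # Identifiers cannot be bound as parameters, so this sketch assumes the
    # table and column names come from a trusted source.
    sql = (
        f'SELECT * FROM "{table}" '
        f'WHERE "{query_column}" > ? '
        f'ORDER BY "{query_column}"'
    )
    return conn.execute(sql, (interval_start,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, last_updated TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "shipped", "2024-01-01T09:00:00"),
        (2, "new", "2024-01-01T10:30:00"),
        (3, "paid", "2024-01-01T11:15:00"),
    ],
)

# ISO 8601 timestamps stored as text compare correctly as strings.
rows = changed_rows(conn, "orders", "last_updated", "2024-01-01T10:00:00")
print(rows)
```

A row that was deleted from the source no longer exists to be selected, which is consistent with this method capturing only Inserts and Updates.
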
Field | Description |
---|---|
status | Indicates whether Database Ingestion and Replication excludes the source table or column from processing because it has an unsupported type. Valid values are:
|
schema_name | Specifies the name of the source schema. |
table_name | Specifies the name of the source table. |
object_type | Specifies the type of the source object. Valid values are:
|
column_name | Specifies the name of the source column. This information appears only if you selected the Columns check box. |
comment | Specifies the reason why a source object of an unsupported type is excluded from processing even though it matches the selection rules. |
Property | Source and Load Type | Description |
---|---|---|
Disable Flashback | Oracle sources - Initial loads | Select this check box to prevent Database Ingestion and Replication from using Oracle Flashback when fetching data from the database. The use of Oracle Flashback requires users to be granted the EXECUTE ON DBMS_FLASHBACK privilege, which is not necessary for initial loads. This check box is selected by default for new initial load tasks. For existing initial load tasks, this check box is cleared by default, which causes Oracle Flashback to remain enabled. For tasks that have partitioning enabled, this check box is automatically selected and unavailable for editing. |
Include LOBs | Oracle sources:
Incremental loads and combined loads can use either the Log-based or Query-based CDC method. However, jobs that use the Log-based CDC method do not replicate data from LONG, LONG RAW, and XML columns to the generated target columns. Db2 for LUW sources:
PostgreSQL sources:
SQL Server sources:
| Select this check box if the source contains the large-object (LOB) columns from which you want to replicate data to a target. LOB data types:
LOB data might be truncated, primarily depending on the maximum size that the target allows. Target-side truncation points:
Source-side truncation considerations:
|
Enable Persistent Storage | All sources except Db2 for LUW (query-based CDC), MongoDB, Oracle (query-based CDC), PostgreSQL, SAP HANA, SAP HANA Cloud, and SQL Server (query-based CDC) - Incremental loads and combined initial and incremental loads. For Db2 for LUW, Oracle, and SQL Server sources that use the query-based CDC method, this field is not displayed because persistent storage is enabled by default and cannot be changed. For MongoDB, PostgreSQL, SAP HANA, and SAP HANA Cloud change data sources, the field is not displayed because persistent storage is enabled by default and cannot be changed. | Select this check box to enable persistent storage of transaction data in a disk buffer so that the data can be consumed continually, even when the writing of data to the target is slow or delayed. Benefits of using persistent storage are faster consumption of the source transaction logs, less reliance on log archives or backups, and the ability to still access the data persisted in disk storage after restarting a database ingestion job. |
Enable Partitioning | Oracle sources - Initial loads and combined initial and incremental loads; SQL Server sources - Initial loads and combined initial and incremental loads | Select this check box to enable partitioning of source objects. When an object is partitioned, the database ingestion and replication job processes the records read from each partition in parallel. For Oracle sources, Database Ingestion and Replication determines the range of partitions by using the ROWID as the partition key. Also, when you select the Enable Partitioning check box, the Disable Flashback check box is automatically selected. For SQL Server sources, partitioning is based on the primary key. Note: In combined initial and incremental loads, the partitioning of source objects occurs only in the initial load phase. |
Number of Partitions | Oracle sources - Initial loads and combined initial and incremental loads; SQL Server sources - Initial loads and combined initial and incremental loads | If you enable partitioning of source objects, enter the number of partitions you want to create. The default number is 5. The minimum value is 2. |
Initial Start Point for Incremental Load | All sources - Incremental loads | Set this field if you want to customize the position in the source logs from which the database ingestion and replication job starts reading change records the first time it runs. Options are:
For MySQL sources, this option is not available. The default is Latest Available. |
Fetch Size | MongoDB - Initial loads and incremental loads | For a MongoDB source, the number of records that a database ingestion and replication job must read at a single time from the source. Valid values are 1 to 2147483647. The default is 5000. |
Property | Description |
---|---|
Target Creation | The only available option is Create Target Tables, which generates the target tables based on the source tables. Note: After the target table is created, Database Ingestion and Replication intelligently handles the target tables on subsequent job runs. Database Ingestion and Replication might truncate or re-create the target tables depending on the specific circumstances. |
Schema | Select the target schema in which Database Ingestion and Replication creates the target tables. |
Bucket | Specifies the name of the Amazon S3 bucket that stores, organizes, and controls access to the data objects that you load to Amazon Redshift. |
Data Directory or Task Target Directory | Specifies the subdirectory where Database Ingestion and Replication stores output files for jobs associated with the task. This field is called Data Directory for an initial load job or Task Target Directory for an incremental load or combined initial and incremental load job. |
Property | Description |
---|---|
Enable Case Transformation | By default, target table names and column names are generated in the same case as the corresponding source names, unless cluster-level or session-level properties on the target override this case-sensitive behavior. If you want to control the case of letters in the target names, select this check box. Then select a Case Transformation Strategy option. |
Case Transformation Strategy | If you selected Enable Case Transformation, select one of the following options to specify how to handle the case of letters in generated target table (or object) names and column (or field) names:
The default value is Same as source. Note: The selected strategy will override any cluster-level or session-level properties on the target for controlling case. |
Property | Description |
---|---|
Output Format | Select the format of the output file. Options are:
The default value is CSV. Note: Output files in CSV format use double-quotation marks ("") as the delimiter for each field. |
Add Headers to CSV File | If CSV is selected as the output format, select this check box to add a header with source column names to the output CSV file. |
Avro Format | If you selected AVRO as the output format, select the format of the Avro schema that will be created for each source table. Options are:
The default value is Avro-Flat. |
Avro Serialization Format | If AVRO is selected as the output format, select the serialization format of the Avro output file. Options are:
The default value is Binary. |
Avro Schema Directory | If AVRO is selected as the output format, specify the local directory where Database Ingestion and Replication stores Avro schema definitions for each source table. Schema definition files have the following naming pattern: schemaname_tablename.txt Note: If this directory is not specified, no Avro schema definition file is produced. |
File Compression Type | Select a file compression type for output files in CSV or AVRO output format. Options are:
The default value is None, which means no compression is used. |
Encryption type | Select the encryption type for the Amazon S3 files when you write the files to the target. Options are:
The default is None, which means no encryption is used. |
Avro Compression Type | If AVRO is selected as the output format, select an Avro compression type. Options are:
The default value is None, which means no compression is used. |
Parquet Compression Type | If the PARQUET output format is selected, you can select a compression type that is supported by Parquet. Options are:
The default value is None, which means no compression is used. |
Deflate Compression Level | If Deflate is selected in the Avro Compression Type field, specify a compression level from 0 to 9. The default value is 0. |
Add Directory Tags | For incremental load and combined initial and incremental load tasks, select this check box to add the "dt=" prefix to the names of apply cycle directories to be compatible with the naming convention for Hive partitioning. This check box is cleared by default. |
Task Target Directory | For incremental load and combined initial and incremental load tasks, the root directory for the other directories that hold output data files, schema files, and CDC cycle contents and completed files. You can use it to specify a custom root directory for the task. If you enable the Connection Directory as Parent option, you can still optionally specify a task target directory to use with the parent directory specified in the connection properties. This field is required if the {TaskTargetDirectory} placeholder is specified in patterns for any of the following directory fields. |
Connection Directory as Parent | Select this check box to use the directory value that is specified in the target connection properties as the parent directory for the custom directory paths specified in the task target properties. For initial load tasks, the parent directory is used in the Data Directory and Schema Directory. For incremental load and combined initial and incremental load tasks, the parent directory is used in the Data Directory, Schema Directory, Cycle Completion Directory, and Cycle Contents Directory. This check box is selected by default. If you clear it, for initial loads, define the full path to the output files in the Data Directory field. For incremental loads, optionally specify a root directory for the task in the Task Target Directory. |
Data Directory | For initial load tasks, define a directory structure for the directories where Database Ingestion and Replication stores output data files and optionally stores the schema. To define a directory pattern, you can use the following types of entries:
Note: Placeholder values are not case sensitive. Examples: myDir1/{SchemaName}/{TableName} myDir1/myDir2/{SchemaName}/{YYYY}/{MM}/{TableName}_{Timestamp} myDir1/{toLower(SchemaName)}/{TableName}_{Timestamp} The default directory pattern is {TableName}_{Timestamp}. For incremental load and combined initial and incremental load tasks, define a custom path to the subdirectory that contains the cdc-data data files. To define the directory pattern, you can use the following types of entries:
If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose both the function and placeholder in curly brackets, as shown in the preceding example. The default directory pattern is {TaskTargetDirectory}/data/{TableName}/data. Note: For Amazon S3, Flat File, Microsoft Azure Data Lake Storage Gen2, and Oracle Cloud Object Store targets, Database Ingestion and Replication uses the directory specified in the target connection properties as the root for the data directory path when Connection Directory as Parent is selected. For Google Cloud Storage targets, Database Ingestion and Replication uses the Bucket name that you specify in the target properties for the ingestion task. For Microsoft Fabric OneLake targets, the parent directory is the path specified in the Lakehouse Path field in the Microsoft Fabric OneLake connection properties. |
Schema Directory | Specify a custom directory in which to store the schema file if you want to store it in a directory other than the default directory. For initial loads, previously used values, if available, are shown in a drop-down list for your convenience. This field is optional. For initial loads, the schema is stored in the data directory by default. For incremental loads and combined initial and incremental loads, the default directory for the schema file is {TaskTargetDirectory}/data/{TableName}/schema. You can use the same placeholders as for the Data Directory field. Ensure that you enclose placeholders with curly brackets { }. If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose both the function and placeholder in curly brackets, for example: {toLower(SchemaName)}. Note: Schema is written only to output data files in CSV format. Data files in Parquet and Avro formats contain their own embedded schema. |
Cycle Completion Directory | For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle completed file. Default is {TaskTargetDirectory}/cycle/completed. |
Cycle Contents Directory | For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle contents files. Default is {TaskTargetDirectory}/cycle/contents. |
Use Cycle Partitioning for Data Directory | For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under each data directory. If this option is not selected, individual data files are written to the same directory without a timestamp, unless you define an alternative directory structure. |
Use Cycle Partitioning for Summary Directories | For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under the summary contents and completed subdirectories. |
List Individual Files in Contents | For incremental load and combined initial and incremental load tasks, lists individual data files under the contents subdirectory. If Use Cycle Partitioning for Summary Directories is cleared, this option is selected by default. All of the individual files are listed in the contents subdirectory unless you configure custom subdirectories by using the placeholders, such as for timestamp or date. If Use Cycle Partitioning for Data Directory is selected, you can still optionally select this check box to list individual files and group them by CDC cycle. |
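
To make the directory patterns used in the Data Directory and Schema Directory fields more concrete, the sketch below expands placeholders such as {SchemaName} and {toLower(SchemaName)} into a path. It only illustrates the pattern syntax described above and is not how Database Ingestion and Replication resolves paths internally. The `expand_pattern` function is hypothetical, and the caller supplies every value, including the timestamp, because the product's timestamp format is not stated here.

```python
import re

# Matches {Name}, {toUpper(Name)}, or {toLower(Name)}.
# Placeholder and function names are matched without regard to case.
_PLACEHOLDER_RE = re.compile(
    r"\{(?:(toUpper|toLower)\((\w+)\)|(\w+))\}", re.IGNORECASE
)


def expand_pattern(pattern: str, values: dict) -> str:
    """Replace each placeholder in the pattern with its supplied value."""
    lookup = {key.lower(): value for key, value in values.items()}

    def replace(match):
        func, wrapped_name, plain_name = match.groups()
        name = wrapped_name or plain_name
        value = str(lookup[name.lower()])  # KeyError for an unknown placeholder
        if func is None:
            return value
        return value.upper() if func.lower() == "toupper" else value.lower()

    return _PLACEHOLDER_RE.sub(replace, pattern)


if __name__ == "__main__":
    # Hypothetical values; the timestamp format is an assumption.
    sample = {
        "SchemaName": "SALES",
        "TableName": "ORDERS",
        "Timestamp": "20240101_120000",
    }
    print(expand_pattern("myDir1/{toLower(SchemaName)}/{TableName}_{Timestamp}", sample))
```

Expanding a pattern locally in this way can help confirm that a custom directory layout produces the paths you expect before you deploy the task.
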
Property | Description |
---|---|
Add Operation Type | Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target. For incremental loads, the job writes "I" for insert, "U" for update, or "D" for delete. For initial loads, the job always writes "I" for insert. By default, this check box is selected for incremental load and initial and incremental load jobs, and cleared for initial load jobs. |
Add Operation Time | Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target. For initial loads, the job always writes the current date and time. By default, this check box is not selected. |
Add Operation Owner | Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target. For initial loads, the job always writes "INFA" as the owner. By default, this check box is not selected. This property is not available for jobs that have a MongoDB or PostgreSQL source. Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method. |
Add Operation Transaction Id | Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations. For initial loads, the job always writes "1" as the ID. By default, this check box is not selected. |
Add Before Images | Select this check box to include UNDO data in the output that a job writes to the target. For initial loads, the job writes nulls. By default, this check box is not selected. |
Property | Description |
---|---|
Target Creation | The only available option is Create Target Tables, which generates the target tables based on the source tables. Note: After the target table is created, Database Ingestion and Replication intelligently handles the target tables on subsequent job runs. Database Ingestion and Replication might truncate or re-create the target tables depending on the specific circumstances. |
Schema | Select the target schema in which Database Ingestion and Replication creates the target tables. |
Apply Mode | For incremental load and combined initial and incremental load jobs, indicates how source DML changes, including inserts, updates, and deletes, are applied to the target. Options are:
Consider using soft deletes if you have a long-running business process that needs the soft-deleted data to finish processing, to restore data after an accidental delete operation, or to track deleted values for audit purposes. Note: If you use Soft Deletes mode, you must not perform an update on the primary key in a source table. Otherwise, data corruption can occur on the target. The default value is Standard. Note: This field does not appear if you selected Query-based as the CDC method on the Source page of the task wizard. |
Data Directory or Task Target Directory | Specifies the subdirectory where Database Ingestion and Replication stores output files for jobs associated with the task. This field is called Data Directory for an initial load job or Task Target Directory for an incremental load or combined initial and incremental load job. |
Property | Description |
---|---|
Add Operation Type | Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target database or inserts into the target table. This field is available only when the Apply Mode option is set to Audit or Soft Deletes. In Audit mode, the job writes "I" for insert, "U" for update, or "D" for delete. In Soft Deletes mode, the job writes "D" for deletes or NULL for inserts and updates. When the operation type is NULL, the other "Add Operation..." metadata columns are also NULL. Only when the operation type is "D" will the other metadata columns contain non-null values. By default, this check box is selected. You cannot deselect it if you are using soft deletes. |
Add Operation Time | Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target database or inserts into the audit table on the target system. This field is available only when Apply Mode is set to Audit or Soft Deletes. By default, this check box is not selected. |
Add Operation Owner | Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target database or inserts into the audit table on the target system. This field is available only when Apply Mode is set to Audit or Soft Deletes. By default, this check box is not selected. This property is not available for jobs that have a MongoDB or PostgreSQL source. Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method. |
Add Operation Transaction Id | Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations. This field is available only when Apply Mode is set to Audit or Soft Deletes. By default, this check box is not selected. |
Add Operation Sequence | Select this check box to add a metadata column that records a generated, ascending sequence number for each change operation that the job inserts into the audit table on the target system. The sequence number reflects the change stream position of the operation. This field is available only when Apply Mode is set to Audit. By default, this check box is not selected. |
Add Before Images | Select this check box to add _OLD columns with UNDO "before image" data in the output that the job inserts into the target tables. You can then compare the old and current values for each data column. For a delete operation, the current value will be null. This field is available only when Apply Mode is set to Audit. By default, this check box is not selected. |
Prefix for Metadata Columns | Add a prefix to the names of the added metadata columns to easily identify them and to prevent conflicts with the names of existing columns. The default value is INFA_. |
Create Unmanaged Tables | Select this check box if you want the task to create Databricks target tables as unmanaged tables. After you deploy the task, you cannot edit this field to switch to managed tables. By default, this option is cleared and managed tables are created. For more information about Databricks managed and unmanaged tables, see the Databricks documentation. |
Unmanaged Tables Parent Directory | If you choose to create Databricks unmanaged tables, you must specify a parent directory in Amazon S3 or Microsoft Azure Data Lake Storage to hold the Parquet files that are generated for each target table when captured DML records are processed. Note: To use Unity Catalog, you must provide an existing external directory. |
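
The values that the Add Operation Type column receives in each apply mode can be summarized in a few lines of code. This is a sketch of the rules stated in the Add Operation Type row above, not product code. The function name and the lowercase operation labels are hypothetical.

```python
from typing import Optional

# Codes written in Audit mode, as described for the Add Operation Type property.
_AUDIT_CODES = {"insert": "I", "update": "U", "delete": "D"}


def operation_type_value(apply_mode: str, operation: str) -> Optional[str]:
    """Return the operation type written to the metadata column, or None for NULL."""
    if operation not in _AUDIT_CODES:
        raise ValueError(f"unknown operation: {operation!r}")
    if apply_mode == "Audit":
        return _AUDIT_CODES[operation]
    if apply_mode == "Soft Deletes":
        # Only deletes are marked; inserts and updates leave the column NULL.
        return "D" if operation == "delete" else None
    # The metadata column is available only in Audit or Soft Deletes mode.
    raise ValueError(f"operation type is not written in apply mode {apply_mode!r}")


if __name__ == "__main__":
    for mode in ("Audit", "Soft Deletes"):
        for op in ("insert", "update", "delete"):
            print(mode, op, operation_type_value(mode, op))
```

In Soft Deletes mode, a consumer of the target table can therefore treat a NULL operation type as an active row and "D" as a soft-deleted row.
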
Property | Description |
---|---|
Output Format | Select the format of the output file. Options are:
The default value is CSV. Note: Output files in CSV format use double-quotation marks ("") as the delimiter for each field. |
Add Headers to CSV File | If CSV is selected as the output format, select this check box to add a header with source column names to the output CSV file. |
Avro Format | If you selected AVRO as the output format, select the format of the Avro schema that will be created for each source table. Options are:
The default value is Avro-Flat. |
Avro Serialization Format | If AVRO is selected as the output format, select the serialization format of the Avro output file. Options are:
The default value is Binary. |
Avro Schema Directory | If AVRO is selected as the output format, specify the local directory where Database Ingestion and Replication stores Avro schema definitions for each source table. Schema definition files have the following naming pattern: schemaname_tablename.txt Note: If this directory is not specified, no Avro schema definition file is produced. |
File Compression Type | Select a file compression type for output files in CSV or AVRO output format. Options are:
The default value is None, which means no compression is used. |
Avro Compression Type | If AVRO is selected as the output format, select an Avro compression type. Options are:
The default value is None, which means no compression is used. |
Deflate Compression Level | If Deflate is selected in the Avro Compression Type field, specify a compression level from 0 to 9. The default value is 0. |
Data Directory | For initial load tasks, define a directory structure for the directories where Database Ingestion and Replication stores output data files and optionally stores the schema. To define a directory pattern, you can use the following types of entries:
Note: Placeholder values are not case sensitive. Examples: myDir1/{SchemaName}/{TableName} myDir1/myDir2/{SchemaName}/{YYYY}/{MM}/{TableName}_{Timestamp} myDir1/{toLower(SchemaName)}/{TableName}_{Timestamp} The default directory pattern is {TableName}_{Timestamp}. Note: For Flat File targets, Database Ingestion and Replication uses the directory specified in the target connection properties as the root for the data directory path when Connection Directory as Parent is selected. |
Connection Directory as Parent | For initial load tasks, select this check box to use the directory value that is specified in the target connection properties as the parent directory for the custom directory paths specified in the task target properties. The parent directory is used in the Data Directory and Schema Directory. |
Schema Directory | For initial load tasks, you can specify a custom directory in which to store the schema file if you want to store it in a directory other than the default directory. This field is optional. The schema is stored in the data directory by default. For incremental loads, the default directory for the schema file is {TaskTargetDirectory}/data/{TableName}/schema. You can use the same placeholders as for the Data Directory field. Ensure the placeholders are enclosed in curly brackets { }. |
Property | Description |
---|---|
Add Operation Type | Select this check box to add a metadata column that includes the source SQL operation type in the output that the job propagates to the target. For initial loads, the job always writes "I" for insert. By default, this check box is cleared. |
Add Operation Time | Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target. For initial loads, the job always writes the current date and time. By default, this check box is not selected. |
Add Operation Owner | Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target. For initial loads, the job always writes "INFA" as the owner. By default, this check box is not selected. This property is not available for jobs that have a MongoDB or PostgreSQL source. Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method. |
Add Operation Transaction Id | Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations. For initial loads, the job always writes "1" as the ID. By default, this check box is not selected. |
Add Before Images | Select this check box to include UNDO data in the output that a job writes to the target. For initial loads, the job writes nulls. By default, this check box is not selected. |
Property | Description |
---|---|
Target Creation | The only available option is Create Target Tables, which generates the target tables based on the source tables. Note: After the target table is created, Database Ingestion and Replication intelligently handles the target tables on subsequent job runs. Database Ingestion and Replication might truncate or re-create the target tables depending on the specific circumstances. |
Schema | Select the target schema in which Database Ingestion and Replication creates the target tables. |
Apply Mode | For incremental load and combined initial and incremental load jobs, indicates how source DML changes, including inserts, updates, and deletes, are applied to the target. Options are:
Consider using soft deletes if you have a long-running business process that needs the soft-deleted data to finish processing, to restore data after an accidental delete operation, or to track deleted values for audit purposes. Note: If you use Soft Deletes mode, you must not perform an update on the primary key in a source table. Otherwise, data corruption can occur on the target. The default value is Standard. Note: This field does not appear if you selected Query-based as the CDC method on the Source page of the task wizard. |
Bucket | Specifies the name of an existing bucket container that stores, organizes, and controls access to the data objects that you load to Google Cloud Storage. |
Data Directory or Task Target Directory | Specifies the subdirectory where Database Ingestion and Replication stores output files for jobs associated with the task. This field is called Data Directory for an initial load job or Task Target Directory for an incremental load or combined initial and incremental load job. |
Property | Description |
---|---|
Add Last Replicated Time | Select this check box to add a metadata column that records the timestamp at which a record was inserted or last updated in the target table. For initial loads, all loaded records have the same timestamp. For incremental and combined initial and incremental loads, the column records the timestamp of the last DML operation that was applied to the target. By default, this check box is not selected. |
Add Operation Type | Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target database or inserts into the audit table on the target system. This field is available only when the Apply Mode option is set to Audit or Soft Deletes. In Audit mode, the job writes "I" for inserts, "U" for updates, "E" for upserts, or "D" for deletes to this metadata column. In Soft Deletes mode, the job writes "D" for deletes or NULL for inserts, updates, and upserts. When the operation type is NULL, the other "Add Operation..." metadata columns are also NULL. Only when the operation type is "D" will the other metadata columns contain non-null values. By default, this check box is selected. You cannot deselect it if you are using soft deletes. |
Add Operation Time | Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target tables. This field is available only when Apply Mode is set to Audit or Soft Deletes. By default, this check box is not selected. |
Add Operation Owner | Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target tables. This field is available only when Apply Mode is set to Audit or Soft Deletes. By default, this check box is not selected. This property is not available for jobs that have a MongoDB or PostgreSQL source. Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method. |
Add Operation Transaction Id | Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations. This field is available only when Apply Mode is set to Audit or Soft Deletes. By default, this check box is not selected. |
Add Operation Sequence | Select this check box to add a metadata column that records a generated, ascending sequence number for each change operation that the job inserts into the target table. The sequence number reflects the change stream position of the operation. This field is available only when Apply Mode is set to Audit. By default, this check box is not selected. |
Add Before Images | Select this check box to add _OLD columns with UNDO "before image" data in the output that the job inserts into the target table. You can then compare the old and current values for each data column. For a delete operation, the current value will be null. This field is available only when Apply Mode is set to Audit. By default, this check box is not selected. |
Prefix for Metadata Columns | Add a prefix to the names of the added metadata columns to easily identify them and to prevent conflicts with the names of existing columns. Do not include special characters in the prefix. Otherwise, task deployment will fail. The default value is INFA_. |
Enable Case Transformation | By default, target table names and column names are generated in the same case as the corresponding source names, unless cluster-level or session-level properties on the target override this case-sensitive behavior. If you want to control the case of letters in the target names, select this check box. Then select a Case Transformation Strategy option. |
Case Transformation Strategy | If you selected Enable Case Transformation, select one of the following options to specify how to handle the case of letters in generated target table (or object) names and column (or field) names:
The default value is Same as source. Note: The selected strategy will override any cluster-level or session-level properties on the target for controlling case. |
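
The Add Operation Type rules described above for Audit and Soft Deletes apply modes can be summarized in a small lookup. This is only an illustrative sketch: the function name `operation_type_value` and the lowercase operation labels are hypothetical and not part of the product.

```python
from typing import Optional

# Codes that the Audit apply mode writes to the operation type column,
# as listed in the Add Operation Type description.
AUDIT_CODES = {"insert": "I", "update": "U", "upsert": "E", "delete": "D"}


def operation_type_value(apply_mode: str, operation: str) -> Optional[str]:
    """Return the value written to the operation type metadata column.

    Hypothetical helper: apply_mode is "Audit" or "Soft Deletes";
    operation is one of "insert", "update", "upsert", "delete".
    """
    if operation not in AUDIT_CODES:
        raise ValueError(f"unknown operation: {operation}")
    if apply_mode == "Audit":
        return AUDIT_CODES[operation]
    if apply_mode == "Soft Deletes":
        # Only deletes are marked; all other operations leave the column NULL.
        return "D" if operation == "delete" else None
    raise ValueError("the column is available only in Audit or Soft Deletes mode")


print(operation_type_value("Audit", "upsert"))
print(operation_type_value("Soft Deletes", "update"))
```

In Soft Deletes mode, a `None` result corresponds to the NULL value that the table describes for inserts, updates, and upserts.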
Property | Description |
---|---|
Output Format | Select the format of the output file. Options are:
The default value is CSV. Note: Output files in CSV format use double-quotation marks ("") as the delimiter for each field. |
Add Headers to CSV File | If CSV is selected as the output format, select this check box to add a header with source column names to the output CSV file. |
Avro Format | If you selected AVRO as the output format, select the format of the Avro schema that will be created for each source table. Options are:
The default value is Avro-Flat. |
Avro Serialization Format | If AVRO is selected as the output format, select the serialization format of the Avro output file. Options are:
The default value is Binary. |
Avro Schema Directory | If AVRO is selected as the output format, specify the local directory where Database Ingestion and Replication stores Avro schema definitions for each source table. Schema definition files have the following naming pattern: schemaname_tablename.txt Note: If this directory is not specified, no Avro schema definition file is produced. |
File Compression Type | Select a file compression type for output files in CSV or AVRO output format. Options are:
The default value is None, which means no compression is used. |
Avro Compression Type | If AVRO is selected as the output format, select an Avro compression type. Options are:
The default value is None, which means no compression is used. |
Parquet Compression Type | If the PARQUET output format is selected, you can select a compression type that is supported by Parquet. Options are:
The default value is None, which means no compression is used. |
Deflate Compression Level | If Deflate is selected in the Avro Compression Type field, specify a compression level from 0 to 9. The default value is 0. |
Add Directory Tags | For incremental load and combined initial and incremental load tasks, select this check box to add the "dt=" prefix to the names of apply cycle directories to be compatible with the naming convention for Hive partitioning. This check box is cleared by default. |
Bucket | Specifies the name of an existing bucket container that stores, organizes, and controls access to the data objects that you load to Google Cloud Storage. |
Task Target Directory | For incremental load and combined initial and incremental load tasks, the root directory for the other directories that hold output data files, schema files, and CDC cycle contents and completed files. You can use it to specify a custom root directory for the task. If you enable the Connection Directory as Parent option, you can still optionally specify a task target directory to use with the parent directory specified in the connection properties. This field is required if the {TaskTargetDirectory} placeholder is specified in patterns for any of the following directory fields. |
Data Directory | For initial load tasks, define a directory structure for the directories where Database Ingestion and Replication stores output data files and optionally stores the schema. To define the directory pattern, you can use the following types of entries:
Note: Placeholder values are not case sensitive. Examples: myDir1/{SchemaName}/{TableName} myDir1/myDir2/{SchemaName}/{YYYY}/{MM}/{TableName}_{Timestamp} myDir1/{toLower(SchemaName)}/{TableName}_{Timestamp} The default directory pattern is {TableName}_{Timestamp}. For incremental load and combined initial and incremental load tasks, define a custom path to the subdirectory that contains the cdc-data data files. To define the directory pattern, you can use the following types of entries:
If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose both the function and placeholder in curly brackets, as shown in the preceding example. The default directory pattern is {TaskTargetDirectory}/data/{TableName}/data. Note: For Amazon S3, Flat File, Microsoft Azure Data Lake Storage Gen2, and Oracle Cloud Object Store targets, Database Ingestion and Replication uses the directory specified in the target connection properties as the root for the data directory path when Connection Directory as Parent is selected. For Google Cloud Storage targets, Database Ingestion and Replication uses the Bucket name that you specify in the target properties for the ingestion task. For Microsoft Fabric OneLake targets, the parent directory is the path specified in the Lakehouse Path field in the Microsoft Fabric OneLake connection properties. |
Schema Directory | Specify a custom directory in which to store the schema file if you want to store it in a directory other than the default directory. For initial loads, previously used values, if available, are shown in a drop-down list for your convenience. This field is optional. For initial loads, the schema is stored in the data directory by default. For incremental loads and combined initial and incremental loads, the default directory for the schema file is {TaskTargetDirectory}/data/{TableName}/schema. You can use the same placeholders as for the Data Directory field. Ensure that you enclose placeholders in curly brackets { }. If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose both the function and placeholder in curly brackets, for example: {toLower(SchemaName)} Note: Schema is written only to output data files in CSV format. Data files in Parquet and Avro formats contain their own embedded schema. |
Cycle Completion Directory | For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle completed file. Default is {TaskTargetDirectory}/cycle/completed. |
Cycle Contents Directory | For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle contents files. Default is {TaskTargetDirectory}/cycle/contents. |
Use Cycle Partitioning for Data Directory | For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under each data directory. If this option is not selected, individual data files are written to the same directory without a timestamp, unless you define an alternative directory structure. |
Use Cycle Partitioning for Summary Directories | For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under the summary contents and completed subdirectories. |
List Individual Files in Contents | For incremental load and combined initial and incremental load tasks, lists individual data files under the contents subdirectory. If Use Cycle Partitioning for Summary Directories is cleared, this option is selected by default. All of the individual files are listed in the contents subdirectory unless you configure custom subdirectories by using placeholders, such as for timestamp or date. If Use Cycle Partitioning for Data Directory is selected, you can still optionally select this check box to list individual files and group them by CDC cycle. |
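
To make the Data Directory pattern examples above more concrete, the following sketch expands a pattern for one table. It is not the product's implementation: the function name `expand_pattern` is hypothetical, only the placeholders that appear in the examples are handled, and the layout used for {Timestamp} is an assumption.

```python
import re
from datetime import datetime

# Matches {Name} or {toLower(Name)} / {toUpper(Name)}, ignoring case.
_PLACEHOLDER = re.compile(
    r"\{(?:(?P<func>toLower|toUpper)\((?P<inner>\w+)\)|(?P<plain>\w+))\}",
    re.IGNORECASE,
)


def expand_pattern(pattern: str, schema_name: str, table_name: str, now: datetime) -> str:
    """Expand a directory pattern (hypothetical helper, for illustration only)."""
    values = {
        "schemaname": schema_name,
        "tablename": table_name,
        "yyyy": now.strftime("%Y"),
        "mm": now.strftime("%m"),
        # Assumed timestamp layout; the actual format may differ.
        "timestamp": now.strftime("%Y%m%d_%H%M%S"),
    }

    def replace(match: re.Match) -> str:
        name = match.group("inner") or match.group("plain")
        key = name.lower()  # placeholder names are treated as case insensitive
        if key not in values:
            return match.group(0)  # leave unknown placeholders untouched
        value = values[key]
        func = match.group("func")
        if func:
            value = value.lower() if func.lower() == "tolower" else value.upper()
        return value

    return _PLACEHOLDER.sub(replace, pattern)


when = datetime(2024, 3, 5, 14, 30, 0)
print(expand_pattern("myDir1/{toLower(SchemaName)}/{TableName}_{Timestamp}", "SALES", "ORDERS", when))
```

With the assumed timestamp layout, the call above yields `myDir1/sales/ORDERS_20240305_143000`.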
Field | Description |
---|---|
Add Operation Type | Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target. For incremental loads, the job writes "I" for insert, "U" for update, or "D" for delete. For initial loads, the job always writes "I" for insert. By default, this check box is selected for incremental load and combined initial and incremental load jobs, and cleared for initial load jobs. |
Add Operation Time | Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target. For initial loads, the job always writes the current date and time. By default, this check box is not selected. |
Add Operation Owner | Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target. For initial loads, the job always writes "INFA" as the owner. By default, this check box is not selected. This property is not available for jobs that have a MongoDB or PostgreSQL source. Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method. |
Add Operation Transaction Id | Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations. For initial loads, the job always writes "1" as the ID. By default, this check box is not selected. |
Add Before Images | Select this check box to include UNDO data in the output that a job writes to the target. For initial loads, the job writes nulls. By default, this check box is not selected. |
Property | Description |
---|---|
Use Table Name as Topic Name | Indicates whether Database Ingestion and Replication writes messages that contain source data to separate topics, one for each source table, or writes all messages to a single topic. Select this check box to write messages to separate table-specific topics. The topic names match the source table names, unless you add the source schema name, a prefix, or a suffix in the Include Schema Name, Table Prefix, or Table Suffix properties. By default, this check box is cleared. With the default setting, you must specify the name of the single topic to which all messages are written in the Topic Name property. |
Include Schema Name | When Use Table Name as Topic Name is selected, this check box appears and is selected by default. This setting adds the source schema name in the table-specific topic names. The topic names then have the format schemaname_tablename. If you do not want to include the schema name, clear this check box. |
Table Prefix | When Use Table Name as Topic Name is selected, this property appears so that you can optionally enter a prefix to add to the table-specific topic names. For example, if you specify myprefix_, the topic names have the format myprefix_tablename. If you omit the underscore (_) after the prefix, the prefix is prepended to the table name. |
Table Suffix | When Use Table Name as Topic Name is selected, this property appears so that you can optionally enter a suffix to add to the table-specific topic names. For example, if you specify _mysuffix, the topic names have the format tablename_mysuffix. If you omit the underscore (_) before the suffix, the suffix is appended to the table name. |
Topic Name | If you do not select Use Table Name as Topic Name, you must enter the name of the single Kafka topic to which all messages that contain source data will be written. |
Output Format | Select the format of the output file. Options are:
The default value is CSV. Note: Output files in CSV format use double-quotation marks ("") as the delimiter for each field. If your Kafka target uses Confluent Schema Registry to store schemas for incremental load jobs, you must select AVRO as the format. |
JSON Format | If JSON is selected as the output format, select the level of detail of the output. Options are:
|
Avro Format | If you selected AVRO as the output format, select the format of the Avro schema that will be created for each source table. Options are:
The default value is Avro-Flat. |
Avro Serialization Format | If AVRO is selected as the output format, select the serialization format of the Avro output file. Options are:
The default value is Binary. If you have a Confluent Kafka target that uses Confluent Schema Registry to store schemas, select None. Otherwise, Confluent Schema Registry does not register the schema. Do not select None if you are not using Confluent Schema Registry. |
Avro Schema Directory | If AVRO is selected as the output format, specify the local directory where Database Ingestion and Replication stores Avro schema definitions for each source table. Schema definition files have the following naming pattern: schemaname_tablename.txt Note: If this directory is not specified, no Avro schema definition file is produced. If a source schema change is expected to alter the target, the Avro schema definition file is regenerated with a unique name that includes a timestamp, in the following format: schemaname_tablename_YYYYMMDDhhmmss.txt This unique naming pattern ensures that older schema definition files are preserved for audit purposes. |
Avro Compression Type | If AVRO is selected as the output format, select an Avro compression type. Options are:
The default value is None, which means no compression is used. |
Deflate Compression Level | If Deflate is selected in the Avro Compression Type field, specify a compression level from 0 to 9. The default value is 0. |
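
The topic naming rules described above for Use Table Name as Topic Name, Include Schema Name, Table Prefix, and Table Suffix can be sketched as follows. The function name `resolve_topic` and its parameters are hypothetical. The table does not state how a prefix combines with a schema name, so placing the prefix before the schema name is an assumption.

```python
from typing import Optional


def resolve_topic(
    table: str,
    *,
    use_table_name: bool = False,
    single_topic: Optional[str] = None,
    schema: Optional[str] = None,
    include_schema: bool = True,
    prefix: str = "",
    suffix: str = "",
) -> str:
    """Return the Kafka topic for a source table (illustrative sketch only)."""
    if not use_table_name:
        # Default behavior: every message goes to the one topic in Topic Name.
        if not single_topic:
            raise ValueError("Topic Name is required when table-specific topics are not used")
        return single_topic
    # Table-specific topics: schemaname_tablename when the schema is included.
    name = f"{schema}_{table}" if include_schema and schema else table
    # Prefix and suffix are added verbatim; include the underscore yourself.
    # Assumption: the prefix precedes the schema name when both are used.
    return f"{prefix}{name}{suffix}"


print(resolve_topic("orders", use_table_name=True, schema="sales"))
print(resolve_topic("orders", use_table_name=True, include_schema=False, prefix="myprefix_"))
```

Because the prefix and suffix are concatenated as entered, omitting the underscore produces names such as `myprefixorders`.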
Property | Description |
---|---|
Add Operation Type | Select this check box to add a metadata column that includes the source SQL operation type in the output that the job propagates to the target. The job writes "I" for insert, "U" for update, or "D" for delete. By default, this check box is selected. |
Add Operation Time | Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target. By default, this check box is not selected. |
Add Operation Owner | Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target. By default, this check box is not selected. This property is not available for jobs that have a MongoDB or PostgreSQL source. Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method. |
Add Operation Transaction Id | Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations. By default, this check box is not selected. |
Add Before Images | Select this check box to include UNDO data in the output that a job writes to the target. By default, this check box is not selected. |
Async Write | Controls whether to use synchronous delivery of messages to Kafka.
By default, this check box is selected. |
Producer Configuration Properties | Specify Kafka producer properties as a comma-separated list of key=value pairs for Apache Kafka, Confluent Kafka, Amazon Managed Streaming for Apache Kafka (MSK), or Kafka-enabled Azure Event Hubs targets. If you have a Confluent target that uses Confluent Schema Registry to store schemas, you must specify the following properties: schema.registry.url=url, key.serializer=org.apache.kafka.common.serialization.StringSerializer, value.serializer=io.confluent.kafka.serializers.KafkaAvroSerializer You can specify Kafka producer properties either in this field or in the Additional Connection Properties field in the Kafka connection. If you enter the producer properties in this field, the properties apply only to the database ingestion and replication jobs associated with this task. If you enter the producer properties for the connection, the properties apply to jobs for all tasks that use the connection definition, unless you override the connection-level properties for specific tasks by also specifying properties in the Producer Configuration Properties field. For information about Kafka producer properties, see the Apache Kafka, Confluent Kafka, Amazon MSK, or Azure Event Hubs documentation. |
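
As a rough illustration of the Producer Configuration Properties format and the task-over-connection precedence described above, the sketch below parses a comma-separated key=value list and merges two levels of settings. The helper names and the registry URL are hypothetical. The sketch also assumes that no property value contains a comma.

```python
from typing import Dict


def parse_producer_properties(text: str) -> Dict[str, str]:
    """Parse a comma-separated list of key=value pairs (illustrative only)."""
    props: Dict[str, str] = {}
    for pair in text.split(","):
        pair = pair.strip()
        if not pair:
            continue
        # Split on the first "=" only, so values such as URLs stay intact.
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got: {pair!r}")
        props[key.strip()] = value.strip()
    return props


def effective_properties(connection_props: Dict[str, str], task_props: Dict[str, str]) -> Dict[str, str]:
    """Task-level properties override connection-level properties."""
    merged = dict(connection_props)
    merged.update(task_props)
    return merged


# Hypothetical registry URL, used only for the example.
task = parse_producer_properties(
    "schema.registry.url=http://registry.example:8081, "
    "key.serializer=org.apache.kafka.common.serialization.StringSerializer, "
    "value.serializer=io.confluent.kafka.serializers.KafkaAvroSerializer"
)
connection = {"acks": "all", "schema.registry.url": "http://old.example:8081"}
print(effective_properties(connection, task))
```

Properties set only at the connection level, such as `acks` here, still apply to the task; properties set at both levels take the task-level value.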
Property | Description |
---|---|
Output Format | Select the format of the output file. Options are:
The default value is CSV. Note: Output files in CSV format use double-quotation marks ("") as the delimiter for each field. |
Add Headers to CSV File | If CSV is selected as the output format, select this check box to add a header with source column names to the output CSV file. |
Avro Format | If you selected AVRO as the output format, select the format of the Avro schema that will be created for each source table. Options are:
The default value is Avro-Flat. |
Avro Serialization Format | If AVRO is selected as the output format, select the serialization format of the Avro output file. Options are:
The default value is Binary. |
Avro Schema Directory | If AVRO is selected as the output format, specify the local directory where Database Ingestion and Replication stores Avro schema definitions for each source table. Schema definition files have the following naming pattern: schemaname_tablename.txt Note: If this directory is not specified, no Avro schema definition file is produced. |
File Compression Type | Select a file compression type for output files in CSV or AVRO output format. Options are:
The default value is None, which means no compression is used. |
Avro Compression Type | If AVRO is selected as the output format, select an Avro compression type. Options are:
The default value is None, which means no compression is used. |
Parquet Compression Type | If the PARQUET output format is selected, you can select a compression type that is supported by Parquet. Options are:
The default value is None, which means no compression is used. |
Deflate Compression Level | If Deflate is selected in the Avro Compression Type field, specify a compression level from 0 to 9. The default value is 0. |
Add Directory Tags | For incremental load and combined initial and incremental load tasks, select this check box to add the "dt=" prefix to the names of apply cycle directories to be compatible with the naming convention for Hive partitioning. This check box is cleared by default. |
Task Target Directory | For incremental load and combined initial and incremental load tasks, the root directory for the other directories that hold output data files, schema files, and CDC cycle contents and completed files. You can use it to specify a custom root directory for the task. If you enable the Connection Directory as Parent option, you can still optionally specify a task target directory to use with the parent directory specified in the connection properties. This field is required if the {TaskTargetDirectory} placeholder is specified in patterns for any of the following directory fields. |
Connection Directory as Parent | Select this check box to use the directory value that is specified in the target connection properties as the parent directory for the custom directory paths specified in the task target properties. For initial load tasks, the parent directory is used in the Data Directory and Schema Directory. For incremental load and combined initial and incremental load tasks, the parent directory is used in the Data Directory, Schema Directory, Cycle Completion Directory, and Cycle Contents Directory. This check box is selected by default. If you clear it, for initial loads, define the full path to the output files in the Data Directory field. For incremental loads, optionally specify a root directory for the task in the Task Target Directory. |
Data Directory | For initial load tasks, define a directory structure for the directories where Database Ingestion and Replication stores output data files and optionally stores the schema. To define the directory pattern, you can use the following types of entries:
Note: Placeholder values are not case sensitive. Examples: myDir1/{SchemaName}/{TableName} myDir1/myDir2/{SchemaName}/{YYYY}/{MM}/{TableName}_{Timestamp} myDir1/{toLower(SchemaName)}/{TableName}_{Timestamp} The default directory pattern is {TableName}_{Timestamp}. For incremental load and combined initial and incremental load tasks, define a custom path to the subdirectory that contains the cdc-data data files. To define the directory pattern, you can use the following types of entries:
If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose both the function and placeholder in curly brackets, as shown in the preceding example. The default directory pattern is {TaskTargetDirectory}/data/{TableName}/data. Note: For Amazon S3, Flat File, Microsoft Azure Data Lake Storage Gen2, and Oracle Cloud Object Store targets, Database Ingestion and Replication uses the directory specified in the target connection properties as the root for the data directory path when Connection Directory as Parent is selected. For Google Cloud Storage targets, Database Ingestion and Replication uses the Bucket name that you specify in the target properties for the ingestion task. For Microsoft Fabric OneLake targets, the parent directory is the path specified in the Lakehouse Path field in the Microsoft Fabric OneLake connection properties. |
Schema Directory | Specify a custom directory in which to store the schema file if you want to store it in a directory other than the default directory. For initial loads, previously used values, if available, are shown in a drop-down list for your convenience. This field is optional. For initial loads, the schema is stored in the data directory by default. For incremental loads and combined initial and incremental loads, the default directory for the schema file is {TaskTargetDirectory}/data/{TableName}/schema. You can use the same placeholders as for the Data Directory field. Ensure that you enclose placeholders in curly brackets { }. If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose both the function and placeholder in curly brackets, for example: {toLower(SchemaName)} Note: Schema is written only to output data files in CSV format. Data files in Parquet and Avro formats contain their own embedded schema. |
Cycle Completion Directory | For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle completed file. Default is {TaskTargetDirectory}/cycle/completed. |
Cycle Contents Directory | For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle contents files. Default is {TaskTargetDirectory}/cycle/contents. |
Use Cycle Partitioning for Data Directory | For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under each data directory. If this option is not selected, individual data files are written to the same directory without a timestamp, unless you define an alternative directory structure. |
Use Cycle Partitioning for Summary Directories | For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under the summary contents and completed subdirectories. |
List Individual Files in Contents | For incremental load and combined initial and incremental load tasks, lists individual data files under the contents subdirectory. If Use Cycle Partitioning for Summary Directories is cleared, this option is selected by default. All of the individual files are listed in the contents subdirectory unless you configure custom subdirectories by using placeholders, such as for timestamp or date. If Use Cycle Partitioning for Data Directory is selected, you can still optionally select this check box to list individual files and group them by CDC cycle. |
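
The default directory layout for incremental load and combined load tasks, as given in the table above, can be sketched for a single table as follows. The function name `default_incremental_dirs` and the sample directory names are hypothetical. Joining the connection directory in front of the task target directory is an assumption about how Connection Directory as Parent composes the paths.

```python
import posixpath
from typing import Dict, Optional


def default_incremental_dirs(
    task_target_dir: str,
    table_name: str,
    connection_dir: Optional[str] = None,
) -> Dict[str, str]:
    """Build the default directories from the documented patterns (sketch only)."""
    # Assumed composition: the connection directory, when used as the parent,
    # is placed in front of the task target directory.
    root = posixpath.join(connection_dir, task_target_dir) if connection_dir else task_target_dir
    return {
        # {TaskTargetDirectory}/data/{TableName}/data
        "data": posixpath.join(root, "data", table_name, "data"),
        # {TaskTargetDirectory}/data/{TableName}/schema
        "schema": posixpath.join(root, "data", table_name, "schema"),
        # {TaskTargetDirectory}/cycle/completed
        "cycle_completed": posixpath.join(root, "cycle", "completed"),
        # {TaskTargetDirectory}/cycle/contents
        "cycle_contents": posixpath.join(root, "cycle", "contents"),
    }


for name, path in default_incremental_dirs("task1", "ORDERS", connection_dir="landing").items():
    print(f"{name}: {path}")
```

Cycle partitioning and directory tags would add further timestamp subdirectories under these paths; their exact names are not specified here, so the sketch omits them.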
Field | Description |
---|---|
Add Operation Type | Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target. For incremental loads, the job writes "I" for insert, "U" for update, or "D" for delete. For initial loads, the job always writes "I" for insert. By default, this check box is selected for incremental load and combined initial and incremental load jobs, and cleared for initial load jobs. |
Add Operation Time | Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target. For initial loads, the job always writes the current date and time. By default, this check box is not selected. |
Add Operation Owner | Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target. For initial loads, the job always writes "INFA" as the owner. By default, this check box is not selected. This property is not available for jobs that have a MongoDB or PostgreSQL source. Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method. |
Add Operation Transaction Id | Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations. For initial loads, the job always writes "1" as the ID. By default, this check box is not selected. |
Add Before Images | Select this check box to include UNDO data in the output that a job writes to the target. For initial loads, the job writes nulls. By default, this check box is not selected. |
Property | Description |
---|---|
Target Creation | The only available option is Create Target Tables, which generates the target tables based on the source tables. Note: After the target table is created, Database Ingestion and Replication intelligently handles the target tables on subsequent job runs. Database Ingestion and Replication might truncate or re-create the target tables depending on the specific circumstances. |
Schema | Select the target schema in which Database Ingestion and Replication creates the target tables. The schema name that is specified in the connection properties is displayed by default. Because this field is case sensitive, ensure that you entered the schema name in the connection properties in the correct case. |
Property | Description |
---|---|
Add Last Replicated Time | Select this check box to add a metadata column that records the timestamp at which a record was inserted or last updated in the target table. For initial loads, all loaded records have the same timestamp. For incremental and combined initial and incremental loads, the column records the timestamp of the last DML operation that was applied to the target. By default, this check box is not selected. |
Prefix for Metadata Columns | Add a prefix to the names of the added metadata columns to easily identify them and to prevent conflicts with the names of existing columns. Do not include special characters in the prefix. Otherwise, task deployment will fail. The default value is INFA_. |
Property | Description |
---|---|
Output Format | Select the format of the output file. Options are:
The default value is CSV. Note: Output files in CSV format use double-quotation marks ("") as the delimiter for each field. |
Add Headers to CSV File | If CSV is selected as the output format, select this check box to add a header with source column names to the output CSV file. |
Avro Format | If you selected AVRO as the output format, select the format of the Avro schema that will be created for each source table. Options are:
The default value is Avro-Flat. |
Avro Serialization Format | If AVRO is selected as the output format, select the serialization format of the Avro output file. Options are:
The default value is Binary. |
Avro Schema Directory | If AVRO is selected as the output format, specify the local directory where Database Ingestion and Replication stores Avro schema definitions for each source table. Schema definition files have the following naming pattern: schemaname_tablename.txt Note: If this directory is not specified, no Avro schema definition file is produced. |
File Compression Type | Select a file compression type for output files in CSV or AVRO output format. Options are:
The default value is None, which means no compression is used. |
Avro Compression Type | If AVRO is selected as the output format, select an Avro compression type. Options are:
The default value is None, which means no compression is used. |
Parquet Compression Type | If the PARQUET output format is selected, you can select a compression type that is supported by Parquet. Options are:
The default value is None, which means no compression is used. |
Deflate Compression Level | If Deflate is selected in the Avro Compression Type field, specify a compression level from 0 to 9. The default value is 0. |
Add Directory Tags | For incremental load and combined initial and incremental load tasks, select this check box to add the "dt=" prefix to the names of apply cycle directories to be compatible with the naming convention for Hive partitioning. This check box is cleared by default. |
Task Target Directory | For incremental load and combined initial and incremental load tasks, the root directory for the other directories that hold output data files, schema files, and CDC cycle contents and completed files. You can use it to specify a custom root directory for the task. This field is required if the {TaskTargetDirectory} placeholder is specified in patterns for any of the following directory fields. |
Data Directory | For initial load tasks, define a directory structure for the directories where Database Ingestion and Replication stores output data files and optionally stores the schema. To define the directory pattern, you can use the following types of entries:
Note: Placeholder values are not case sensitive. Examples: myDir1/{SchemaName}/{TableName} myDir1/myDir2/{SchemaName}/{YYYY}/{MM}/{TableName}_{Timestamp} myDir1/{toLower(SchemaName)}/{TableName}_{Timestamp} The default directory pattern is {TableName}_{Timestamp}. For incremental load and combined initial and incremental load tasks, define a custom path to the subdirectory that contains the cdc-data data files. To define the directory pattern, you can use the following types of entries:
If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose both the function and placeholder in curly brackets, as shown in the preceding example. The default directory pattern is {TaskTargetDirectory}/data/{TableName}/data. Note: For Amazon S3, Flat File, Microsoft Azure Data Lake Storage Gen2, and Oracle Cloud Object Store targets, Database Ingestion and Replication uses the directory specified in the target connection properties as the root for the data directory path when Connection Directory as Parent is selected. For Google Cloud Storage targets, Database Ingestion and Replication uses the Bucket name that you specify in the target properties for the ingestion task. For Microsoft Fabric OneLake targets, the parent directory is the path specified in the Lakehouse Path field in the Microsoft Fabric OneLake connection properties. |
Schema Directory | Specify a custom directory in which to store the schema file if you want to store it in a directory other than the default directory. For initial loads, previously used values, if available, are shown in a drop-down list. This field is optional. For initial loads, the schema is stored in the data directory by default. For incremental loads and combined initial and incremental loads, the default directory for the schema file is {TaskTargetDirectory}/data/{TableName}/schema. You can use the same placeholders as for the Data Directory field. Ensure that you enclose placeholders with curly brackets { }. If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose both the function and placeholder in curly brackets, for example: {toLower(SchemaName)}. Note: Schema is written only to output data files in CSV format. Data files in Parquet and Avro formats contain their own embedded schema. |
Cycle Completion Directory | For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle completed file. Default is {TaskTargetDirectory}/cycle/completed. |
Cycle Contents Directory | For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle contents files. Default is {TaskTargetDirectory}/cycle/contents. |
Use Cycle Partitioning for Data Directory | For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under each data directory. If this option is not selected, individual data files are written to the same directory without a timestamp, unless you define an alternative directory structure. |
Use Cycle Partitioning for Summary Directories | For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under the summary contents and completed subdirectories. |
List Individual Files in Contents | For incremental load and combined initial and incremental load tasks, lists individual data files under the contents subdirectory. If Use Cycle Partitioning for Summary Directories is cleared, this option is selected by default. All of the individual files are listed in the contents subdirectory unless you configure custom subdirectories by using placeholders, such as for timestamp or date. If Use Cycle Partitioning for Data Directory is selected, you can still optionally select this check box to list individual files and group them by CDC cycle. |
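
The directory patterns described for the Data Directory and Schema Directory fields can be illustrated with a small sketch. This is not the product's implementation: the `expand_pattern` helper, the sample values, and the timestamp format are all assumptions made for illustration, and the case-insensitive lookup is one reading of the note that placeholder values are not case sensitive.

```python
import re
from datetime import datetime
from typing import Dict

# Hypothetical helper, not part of the product. Matches {Name},
# {toUpper(Name)}, and {toLower(Name)} entries in a directory pattern.
_ENTRY = re.compile(r"\{(?:(toUpper|toLower)\((\w+)\)|(\w+))\}", re.IGNORECASE)


def expand_pattern(pattern: str, values: Dict[str, str]) -> str:
    """Replace each placeholder entry in the pattern with its value."""
    # Assumption: placeholder names are looked up case-insensitively.
    lookup = {name.lower(): value for name, value in values.items()}

    def substitute(match):
        func, wrapped, plain = match.groups()
        name = (wrapped or plain).lower()
        if name not in lookup:
            raise KeyError(f"unknown placeholder: {match.group(0)}")
        value = lookup[name]
        if func is None:
            return value
        return value.upper() if func.lower() == "toupper" else value.lower()

    return _ENTRY.sub(substitute, pattern)


now = datetime(2024, 3, 9, 14, 30, 0)
values = {
    "SchemaName": "Sales",
    "TableName": "ORDERS",
    "YYYY": f"{now:%Y}",
    "MM": f"{now:%m}",
    "Timestamp": f"{now:%Y%m%d_%H%M%S}",  # assumed format, for illustration only
}

print(expand_pattern("myDir1/{toLower(SchemaName)}/{TableName}_{Timestamp}", values))
# prints: myDir1/sales/ORDERS_20240309_143000
```

Unknown placeholders raise an error in this sketch so that a mistyped entry is caught early; how the product handles an unrecognized entry is not stated here.
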
Property | Description |
---|---|
Target Creation | The only available option is Create Target Tables, which generates the target tables based on the source tables. Note: After the target table is created, Database Ingestion and Replication intelligently handles the target tables on subsequent job runs. Database Ingestion and Replication might truncate or re-create the target tables depending on the specific circumstances. |
Schema | Select the target schema in which Database Ingestion and Replication creates the target tables. |
Apply Mode | For incremental load and combined initial and incremental load jobs, indicates how source DML changes, including inserts, updates, and deletes, are applied to the target. Options are:
Consider using soft deletes if you have a long-running business process that needs the soft-deleted data to finish processing, to restore data after an accidental delete operation, or to track deleted values for audit purposes. Note: If you use Soft Deletes mode, you must not perform an update on the primary key in a source table. Otherwise, data corruption can occur on the target. The default value is Standard. Note: This field does not appear if you selected Query-based as the CDC method on the Source page of the task wizard. |
Property | Description |
---|---|
Add Last Replicated Time | Select this check box to add a metadata column that records the timestamp at which a record was inserted or last updated in the target table. For initial loads, all loaded records have the same timestamp. For incremental and combined initial and incremental loads, the column records the timestamp of the last DML operation that was applied to the target. By default, this check box is not selected. |
Add Operation Type | Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target database or inserts into the audit table on the target system. This field is available only when the Apply Mode option is set to Audit or Soft Deletes. In Audit mode, the job writes "I" for insert, "U" for update, or "D" for delete. In Soft Deletes mode, the job writes "D" for deletes or NULL for inserts and updates. When the operation type is NULL, the other "Add Operation..." metadata columns are also NULL. Only when the operation type is "D" will the other metadata columns contain non-null values. By default, this check box is selected. You cannot deselect it if you are using soft deletes. |
Add Operation Time | Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target database or inserts into the audit table on the target system. This field is available only when Apply Mode is set to Audit or Soft Deletes. By default, this check box is not selected. |
Add Operation Owner | Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target database or inserts into the audit table on the target system. This field is available only when Apply Mode is set to Audit or Soft Deletes. By default, this check box is not selected. This property is not available for jobs that have a MongoDB or PostgreSQL source. Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method. |
Add Operation Transaction Id | Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations. This field is available only when Apply Mode is set to Audit or Soft Deletes. By default, this check box is not selected. |
Add Operation Sequence | Select this check box to add a metadata column that records a generated, ascending sequence number for each change operation that the job inserts into the audit table on the target system. The sequence number reflects the change stream position of the operation. This field is available only when Apply Mode is set to Audit. By default, this check box is not selected. |
Add Before Images | Select this check box to add _OLD columns with UNDO "before image" data in the output that the job inserts into the target tables. You can then compare the old and current values for each data column. For a delete operation, the current value will be null. This field is available only when Apply Mode is set to Audit. By default, this check box is not selected. |
Prefix for Metadata Columns | Add a prefix to the names of the added metadata columns to easily identify them and to prevent conflicts with the names of existing columns. Do not include special characters in the prefix. Otherwise, task deployment will fail. The default value is INFA_. |
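
As a rough illustration of the Add Operation Type behavior described above, the following sketch maps an apply mode and a source operation to the value written to the metadata column. The function name and the string labels for operations are hypothetical; only the "I", "U", "D", and NULL values come from the table.

```python
from typing import Optional

# Operation codes written in Audit mode, as listed in the table above.
_AUDIT_CODES = {"insert": "I", "update": "U", "delete": "D"}


def operation_type_value(apply_mode: str, operation: str) -> Optional[str]:
    """Return the operation-type value for one change, or None for NULL."""
    if operation not in _AUDIT_CODES:
        raise ValueError(f"unsupported operation: {operation}")
    if apply_mode == "Audit":
        return _AUDIT_CODES[operation]
    if apply_mode == "Soft Deletes":
        # Only deletes are marked; inserts and updates leave the column NULL.
        return "D" if operation == "delete" else None
    raise ValueError("Add Operation Type applies only to Audit or Soft Deletes mode")


print(operation_type_value("Audit", "update"))          # prints: U
print(operation_type_value("Soft Deletes", "insert"))   # prints: None
```
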
Property | Description |
---|---|
Target Creation | The only available option is Create Target Tables, which generates the target tables based on the source tables. Note: After the target table is created, Database Ingestion and Replication intelligently handles the target tables on subsequent job runs. Database Ingestion and Replication might truncate or re-create the target tables depending on the specific circumstances. |
Schema | Select the target schema in which Database Ingestion and Replication creates the target tables. |
Apply Mode | For incremental load and combined initial and incremental load jobs, indicates how source DML changes, including inserts, updates, and deletes, are applied to the target. Options are:
The default value is Standard. Note: This field does not appear if you selected Query-based as the CDC method on the Source page of the task wizard. |
Field | Description |
---|---|
Add Operation Type | Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target database or inserts into the target table. The job writes "I" for insert, "U" for update, or "D" for delete. By default, this check box is selected. |
Add Operation Time | Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target table. By default, this check box is not selected. |
Add Operation Owner | Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target. By default, this check box is not selected. This property is not available for jobs that have a MongoDB or PostgreSQL source. Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method. |
Add Operation Transaction Id | Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations. By default, this check box is not selected. |
Add Operation Sequence | Select this check box to add a metadata column that records a generated, ascending sequence number for each change operation that the job inserts into the target tables. The sequence number reflects the change stream position of the operation. By default, this check box is not selected. |
Add Before Images | Select this check box to add _OLD columns with UNDO "before image" data in the output that the job inserts into the target tables. You can then compare the old and current values for each data column. For a delete operation, the current value will be null. By default, this check box is not selected. |
Prefix for Metadata Columns | Add a prefix to the names of the added metadata columns to easily identify them and to prevent conflicts with the names of existing columns. The default value is INFA_. |
Property | Description |
---|---|
Output Format | Select the format of the output file. Options are:
The default value is CSV. Note: Output files in CSV format use double-quotation marks ("") as the delimiter for each field. |
Add Headers to CSV File | If CSV is selected as the output format, select this check box to add a header with source column names to the output CSV file. |
Avro Format | If you selected AVRO as the output format, select the format of the Avro schema that will be created for each source table. Options are:
The default value is Avro-Flat. |
Avro Serialization Format | If AVRO is selected as the output format, select the serialization format of the Avro output file. Options are:
The default value is Binary. |
Avro Schema Directory | If AVRO is selected as the output format, specify the local directory where Database Ingestion and Replication stores Avro schema definitions for each source table. Schema definition files have the following naming pattern: schemaname_tablename.txt Note: If this directory is not specified, no Avro schema definition file is produced. |
File Compression Type | Select a file compression type for output files in CSV or AVRO output format. Options are:
The default value is None, which means no compression is used. |
Avro Compression Type | If AVRO is selected as the output format, select an Avro compression type. Options are:
The default value is None, which means no compression is used. |
Parquet Compression Type | If the PARQUET output format is selected, you can select a compression type that is supported by Parquet. Options are:
The default value is None, which means no compression is used. |
Deflate Compression Level | If Deflate is selected in the Avro Compression Type field, specify a compression level from 0 to 9. The default value is 0. |
Task Target Directory | For incremental load and combined initial and incremental load tasks, the root directory for the other directories that hold output data files, schema files, and CDC cycle contents and completed files. You can use it to specify a custom root directory for the task. If you enable the Connection Directory as Parent option, you can still optionally specify a task target directory to use with the parent directory specified in the connection properties. This field is required if the {TaskTargetDirectory} placeholder is specified in patterns for any of the following directory fields. |
Add Directory Tags | For incremental load and combined initial and incremental load tasks, select this check box to add the "dt=" prefix to the names of apply cycle directories to be compatible with the naming convention for Hive partitioning. This check box is cleared by default. |
Connection Directory as Parent | Select this check box to use the directory value that is specified in the target connection properties as the parent directory for the custom directory paths specified in the task target properties. For initial load tasks, the parent directory is used in the Data Directory and Schema Directory. For incremental load and combined initial and incremental load tasks, the parent directory is used in the Data Directory, Schema Directory, Cycle Completion Directory, and Cycle Contents Directory. This check box is selected by default. If you clear it, for initial loads, define the full path to the output files in the Data Directory field. For incremental loads, optionally specify a root directory for the task in the Task Target Directory. |
Data Directory | For initial load tasks, define a directory structure for the directories where Database Ingestion and Replication stores output data files and optionally stores the schema. To define the directory pattern, you can use the following types of entries:
Note: Placeholder values are not case sensitive. Examples: myDir1/{SchemaName}/{TableName} myDir1/myDir2/{SchemaName}/{YYYY}/{MM}/{TableName}_{Timestamp} myDir1/{toLower(SchemaName)}/{TableName}_{Timestamp} The default directory pattern is {TableName}_{Timestamp}. For incremental load and combined initial and incremental load tasks, define a custom path to the subdirectory that contains the cdc-data data files. To define the directory pattern, you can use the following types of entries:
If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose both the function and placeholder in curly brackets, as shown in the preceding example. The default directory pattern is {TaskTargetDirectory}/data/{TableName}/data. Note: For Amazon S3, Flat File, Microsoft Azure Data Lake Storage Gen2, and Oracle Cloud Object Store targets, Database Ingestion and Replication uses the directory specified in the target connection properties as the root for the data directory path when Connection Directory as Parent is selected. For Google Cloud Storage targets, Database Ingestion and Replication uses the Bucket name that you specify in the target properties for the ingestion task. For Microsoft Fabric OneLake targets, the parent directory is the path specified in the Lakehouse Path field in the Microsoft Fabric OneLake connection properties. |
Schema Directory | Specify a custom directory in which to store the schema file if you want to store it in a directory other than the default directory. For initial loads, previously used values, if available, are shown in a drop-down list. This field is optional. For initial loads, the schema is stored in the data directory by default. For incremental loads and combined initial and incremental loads, the default directory for the schema file is {TaskTargetDirectory}/data/{TableName}/schema. You can use the same placeholders as for the Data Directory field. Ensure that you enclose placeholders with curly brackets { }. If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose both the function and placeholder in curly brackets, for example: {toLower(SchemaName)}. Note: Schema is written only to output data files in CSV format. Data files in Parquet and Avro formats contain their own embedded schema. |
Cycle Completion Directory | For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle completed file. Default is {TaskTargetDirectory}/cycle/completed. |
Cycle Contents Directory | For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle contents files. Default is {TaskTargetDirectory}/cycle/contents. |
Use Cycle Partitioning for Data Directory | For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under each data directory. If this option is not selected, individual data files are written to the same directory without a timestamp, unless you define an alternative directory structure. |
Use Cycle Partitioning for Summary Directories | For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under the summary contents and completed subdirectories. |
List Individual Files in Contents | For incremental load and combined initial and incremental load tasks, lists individual data files under the contents subdirectory. If Use Cycle Partitioning for Summary Directories is cleared, this option is selected by default. All of the individual files are listed in the contents subdirectory unless you configure custom subdirectories by using placeholders, such as for timestamp or date. If Use Cycle Partitioning for Data Directory is selected, you can still optionally select this check box to list individual files and group them by CDC cycle. |
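
To make the default layout for incremental load and combined load tasks easier to picture, the sketch below assembles the four default directories named in the table from a task target directory. The `default_directories` function is hypothetical, and the way the connection directory is joined in front of the task target directory is an assumption about what Connection Directory as Parent does, not a documented rule. Paths are assumed to be relative and to use forward slashes.

```python
from posixpath import join
from typing import Dict, Optional


def default_directories(task_target_dir: str,
                        table_name: str,
                        connection_dir: Optional[str] = None) -> Dict[str, str]:
    """Build the default data, schema, and cycle directories for one table."""
    # Assumption: with Connection Directory as Parent selected, the connection
    # directory is simply prepended to the task target directory.
    root = join(connection_dir, task_target_dir) if connection_dir else task_target_dir
    return {
        "data": join(root, "data", table_name, "data"),
        "schema": join(root, "data", table_name, "schema"),
        "cycle_completed": join(root, "cycle", "completed"),
        "cycle_contents": join(root, "cycle", "contents"),
    }


for name, path in default_directories("task1", "ORDERS", "landing").items():
    print(f"{name}: {path}")
```
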
Field | Description |
---|---|
Add Operation Type | Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target. For incremental loads, the job writes "I" for insert, "U" for update, or "D" for delete. For initial loads, the job always writes "I" for insert. By default, this check box is selected for incremental load and initial and incremental load jobs, and cleared for initial load jobs. |
Add Operation Time | Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target. For initial loads, the job always writes the current date and time. By default, this check box is not selected. |
Add Operation Owner | Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target. For initial loads, the job always writes "INFA" as the owner. By default, this check box is not selected. This property is not available for jobs that have a MongoDB or PostgreSQL source. Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method. |
Add Operation Transaction Id | Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations. For initial loads, the job always writes "1" as the ID. By default, this check box is not selected. |
Add Before Images | Select this check box to include UNDO data in the output that a job writes to the target. For initial loads, the job writes nulls. By default, this check box is not selected. |
Property | Description |
---|---|
Target Creation | The only available option is Create Target Tables, which generates the target tables based on the source tables. Note: After the target table is created, Database Ingestion and Replication intelligently handles the target tables on subsequent job runs. Database Ingestion and Replication might truncate or re-create the target tables depending on the specific circumstances. |
Schema | Select the target schema in which Database Ingestion and Replication creates the target tables. |
Apply Mode | For incremental load and combined initial and incremental load jobs, indicates how source DML changes, including inserts, updates, and deletes, are applied to the target. Options are:
Consider using soft deletes if you have a long-running business process that needs the soft-deleted data to finish processing, to restore data after an accidental delete operation, or to track deleted values for audit purposes. Note: If you use Soft Deletes mode, you must not perform an update on the primary key in a source table. Otherwise, data corruption can occur on the target. The default value is Standard. Note: The Audit and Soft Deletes apply modes are supported for jobs that have an Oracle source. Note: This field does not appear if you selected Query-based as the CDC method on the Source page of the task wizard. |
Property | Description |
---|---|
Add Operation Type | Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target database or inserts into the target table. This field is available only when the Apply Mode option is set to Audit or Soft Deletes. In Audit mode, the job writes "I" for insert, "U" for update, or "D" for delete. In Soft Deletes mode, the job writes "D" for deletes or NULL for inserts and updates. When the operation type is NULL, the other "Add Operation..." metadata columns are also NULL. Only when the operation type is "D" will the other metadata columns contain non-null values. By default, this check box is selected. You cannot deselect it if you are using soft deletes. |
Add Operation Time | Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target database or inserts into the audit table on the target system. This field is available only when Apply Mode is set to Audit or Soft Deletes. By default, this check box is not selected. |
Add Operation Owner | Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target database or inserts into the audit table on the target system. This field is available only when Apply Mode is set to Audit or Soft Deletes. By default, this check box is not selected. |
Add Operation Transaction Id | Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations. This field is available only when Apply Mode is set to Audit or Soft Deletes. By default, this check box is not selected. |
Add Operation Sequence | Select this check box to add a metadata column that records a generated, ascending sequence number for each change operation that the job inserts into the audit table on the target system. The sequence number reflects the change stream position of the operation. This field is available only when Apply Mode is set to Audit. By default, this check box is not selected. |
Add Before Images | Select this check box to add _OLD columns with UNDO "before image" data in the output that the job inserts into the target tables. You can then compare the old and current values for each data column. For a delete operation, the current value will be null. This field is available only when Apply Mode is set to Audit. By default, this check box is not selected. |
Prefix for Metadata Columns | Add a prefix to the names of the added metadata columns to easily identify them and to prevent conflicts with the names of existing columns. The default value is INFA_. |
Property | Description |
---|---|
Target Creation | The only available option is Create Target Tables, which generates the target tables based on the source tables. Note: After the target table is created, Database Ingestion and Replication intelligently handles the target tables on subsequent job runs. Database Ingestion and Replication might truncate or re-create the target tables depending on the specific circumstances. |
Schema | Select the target schema in which Database Ingestion and Replication creates the target tables. |
Stage | The name of the internal staging area that holds the data read from the source before the data is written to the target tables. This name must not include spaces. If the staging area does not exist, it is created automatically. Note: This field is not available if you selected the Superpipe option in the Advanced Target Properties. |
Apply Mode | For incremental load and combined initial and incremental load jobs, indicates how source DML changes, including inserts, updates, and deletes, are applied to the target. Options are:
Consider using soft deletes if you have a long-running business process that needs the soft-deleted data to finish processing, to restore data after an accidental delete operation, or to track deleted values for audit purposes. Note: If you use Soft Deletes mode, you must not perform an update on the primary key in a source table. Otherwise, data corruption can occur on the target. The default value is Standard. Note: This field does not appear if you selected Query-based as the CDC method on the Source page of the task wizard. |
Property | Description |
---|---|
Add Last Replicated Time | Select this check box to add a metadata column that records the timestamp at which a record was inserted or last updated in the target table. For initial loads, all loaded records have the same timestamp, except for Snowflake targets that use the Superpipe option where minutes and seconds might vary slightly. For incremental and combined initial and incremental loads, the column records the timestamp of the last DML operation that was applied to the target. By default, this check box is not selected. |
Add Operation Type | Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target database or inserts into the audit table on the target system. This field is available only when the Apply Mode option is set to Audit or Soft Deletes. In Audit mode, the job writes "I" for inserts, "U" for updates, "E" for upserts, or "D" for deletes to this metadata column. In Soft Deletes mode, the job writes "D" for deletes or NULL for inserts, updates, and upserts. When the operation type is NULL, the other "Add Operation..." metadata columns are also NULL. Only when the operation type is "D" will the other metadata columns contain non-null values. By default, this check box is selected. You cannot deselect it if you are using soft deletes. |
Add Operation Time | Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target database or inserts into the audit table on the target system. This field is available only when Apply Mode is set to Audit or Soft Deletes. By default, this check box is not selected. |
Add Operation Owner | Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target database or inserts into the audit table on the target system. This field is available only when Apply Mode is set to Audit or Soft Deletes. By default, this check box is not selected. This property is not available for jobs that have a MongoDB or PostgreSQL source. Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method. |
Add Operation Transaction Id | Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations. This field is available only when Apply Mode is set to Audit or Soft Deletes. By default, this check box is not selected. |
Add Operation Sequence | Select this check box to add a metadata column that records a generated, ascending sequence number for each change operation that the job inserts into the audit table on the target system. The sequence number reflects the change stream position of the operation. This field is available only when Apply Mode is set to Audit. By default, this check box is not selected. |
Add Before Images | Select this check box to add _OLD columns with UNDO "before image" data in the output that the job inserts into the target tables. You can then compare the old and current values for each data column. For a delete operation, the current value will be null. This field is available only when Apply Mode is set to Audit. By default, this check box is not selected. |
Prefix for Metadata Columns | Add a prefix to the names of the added metadata columns to easily identify them and to prevent conflicts with the names of existing columns. The default value is INFA_. |
Superpipe | Select this check box to use the Snowpipe Streaming API to quickly stream rows of data directly to Snowflake Data Cloud target tables with low latency instead of first writing the data to stage files. This option is available for all load types. When you configure the target connection, select KeyPair authentication. By default, this check box is selected. Deselect it if you want to write data to intermediate stage files. |
Merge Frequency | When Superpipe is selected, you can optionally set the frequency, in seconds, at which change data rows are merged and applied to the Snowflake target tables. This field applies to incremental load and combined initial and incremental load tasks. Valid values are 60 through 604800. Default is 3600 seconds. |
Enable Case Transformation | By default, target table names and column names are generated in the same case as the corresponding source names, unless cluster-level or session-level properties on the target override this case-sensitive behavior. If you want to control the case of letters in the target names, select this check box. Then select a Case Transformation Strategy option. Note: This check box is not available if you selected the Superpipe option for a Snowflake target. |
Case Transformation Strategy | If you selected Enable Case Transformation, select one of the following options to specify how to handle the case of letters in generated target table (or object) names and column (or field) names:
The default value is Same as source. Note: The selected strategy will override any cluster-level or session-level properties on the target for controlling case. |
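
The Merge Frequency range described in the table above (60 through 604800 seconds, default 3600) can be checked before a value is entered. The following is a minimal sketch only; the function and constant names are hypothetical and are not part of the product.

```python
from typing import Optional

# Limits taken from the Merge Frequency description; names are hypothetical.
MERGE_FREQUENCY_MIN_SECS = 60
MERGE_FREQUENCY_MAX_SECS = 604800
MERGE_FREQUENCY_DEFAULT_SECS = 3600


def resolve_merge_frequency(value: Optional[int] = None) -> int:
    """Return a usable merge frequency in seconds, or raise ValueError."""
    if value is None:
        # No value supplied: fall back to the documented default.
        return MERGE_FREQUENCY_DEFAULT_SECS
    if not MERGE_FREQUENCY_MIN_SECS <= value <= MERGE_FREQUENCY_MAX_SECS:
        raise ValueError(
            f"Merge Frequency must be between {MERGE_FREQUENCY_MIN_SECS} "
            f"and {MERGE_FREQUENCY_MAX_SECS} seconds, got {value}"
        )
    return value
```
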
Option | Description |
---|---|
Apply Cycle Interval | Specifies the amount of time that must elapse before a database ingestion and replication job ends an apply cycle. You can specify days, hours, minutes, and seconds, or specify values for a subset of these time fields and leave the other fields blank. The default value is 15 minutes. |
Apply Cycle Change Limit | Specifies the total number of records in all tables of a database ingestion and replication job that must be processed before the job ends an apply cycle. When this record limit is reached, the database ingestion and replication job ends the apply cycle and writes the change data to the target. The default value is 10000 records. Note: During startup, jobs might reach this limit more frequently than the apply cycle interval if they need to catch up on processing a backlog of older data. |
Low Activity Flush Interval | Specifies the amount of time, in hours, minutes, or both, that must elapse during a period of no change activity on the source before a database ingestion and replication job ends an apply cycle. When this time limit is reached, the database ingestion and replication job ends the apply cycle and writes the change data to the target. If you do not specify a value for this option, a database ingestion and replication job ends apply cycles only after either the Apply Cycle Change Limit or Apply Cycle Interval limit is reached. No default value is provided. |
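
The way the three apply cycle options interact can be sketched as a single decision: an apply cycle ends when any configured limit is reached. This is an illustrative model only, not the product's implementation. The class, function, and parameter names are hypothetical, and the order in which the limits are checked is an assumption.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class ApplyCycleSettings:
    # Defaults follow the option descriptions above.
    interval: timedelta = timedelta(minutes=15)       # Apply Cycle Interval
    change_limit: int = 10000                         # Apply Cycle Change Limit
    low_activity_flush: Optional[timedelta] = None    # no default value


def apply_cycle_end_reason(
    settings: ApplyCycleSettings,
    elapsed: timedelta,        # time since the current apply cycle began
    records_processed: int,    # records processed across all tables in the job
    idle_time: timedelta,      # time with no change activity on the source
) -> Optional[str]:
    """Return the name of the limit that ends the cycle, or None to continue."""
    if records_processed >= settings.change_limit:
        return "Apply Cycle Change Limit"
    if elapsed >= settings.interval:
        return "Apply Cycle Interval"
    # The low activity check applies only when a value was specified.
    if (settings.low_activity_flush is not None
            and idle_time >= settings.low_activity_flush):
        return "Low Activity Flush Interval"
    return None
```

With the default settings, a job that has processed 10000 records after two minutes ends the cycle on the change limit, which matches the note about jobs reaching the limit more often than the interval while catching up on a backlog.
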
Source | Load Type | Target |
---|---|---|
Db2 for i | Incremental; Combined initial and incremental | Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Kafka (incremental loads only), Microsoft Azure Data Lake Storage, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, Oracle Cloud Object Storage, PostgreSQL, Snowflake, and SQL Server |
Db2 for LUW | Incremental; Combined initial and incremental | Snowflake |
Db2 for z/OS, except Db2 11 | Incremental; Combined initial and incremental | Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Kafka (incremental loads only), Microsoft Azure Data Lake Storage, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, Oracle Cloud Object Storage, Snowflake, and SQL Server |
Microsoft SQL Server | Incremental; Combined initial and incremental | Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Kafka (incremental loads only), Microsoft Azure Data Lake Storage, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, Oracle Cloud Object Storage, PostgreSQL, Snowflake, and SQL Server |
Oracle | Incremental; Combined initial and incremental | Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Kafka (incremental loads only), Microsoft Azure Data Lake Storage, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, Oracle Cloud Object Storage, PostgreSQL, Snowflake, and SQL Server |
PostgreSQL | Incremental; Combined initial and incremental | Incremental loads: Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Kafka (incremental loads only), Microsoft Azure Data Lake Storage, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, Oracle Cloud Object Storage, PostgreSQL, and Snowflake. Combined initial and incremental loads: Oracle, PostgreSQL, and Snowflake. |
Option | Description |
---|---|
Ignore | Do not replicate DDL changes that occur on the source database to the target. For Amazon Redshift, Kafka, Microsoft Azure Synapse Analytics, PostgreSQL, Snowflake, and SQL Server targets, this option is the default option for the Drop Column and Rename Column operation types. For Amazon S3, Google Cloud Storage, Microsoft Azure Data Lake Storage, and Oracle Cloud Object Storage targets that use the CSV output format, the Ignore option is disabled. For the AVRO output format, this option is enabled. |
Replicate | Replicate the DDL operation to the target. For Amazon S3, Google Cloud Storage, Microsoft Azure Data Lake Storage, Microsoft Fabric OneLake, and Oracle Cloud Object Storage targets, this option is the default option for all operation types. For other targets, this option is the default option for the Add Column and Modify Column operation types. |
Stop Job | Stop the entire database ingestion and replication job. |
Stop Table | Stop processing the source table on which the DDL change occurred. When one or more of the tables are excluded from replication because of the Stop Table schema drift option, the job state changes to Running with Warning. Important: The database ingestion and replication job cannot retrieve the data changes that occurred on the source table after the job stopped processing it. Consequently, data loss might occur on the target. To avoid data loss, you will need to resynchronize the source and target objects that the job stopped processing. Use the Resume With Options > Resync option. For more information, see Overriding schema drift options when resuming a database ingestion and replication job. |
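
The default schema drift options stated in the table above can be summarized as a lookup. This is a sketch of the documented defaults only. The function name and the set names are hypothetical, and the function returns None for any target and operation combination whose default the table does not state.

```python
from typing import Optional

# Targets for which Replicate is the default for all DDL operation types.
REPLICATE_ALL_TARGETS = {
    "Amazon S3",
    "Google Cloud Storage",
    "Microsoft Azure Data Lake Storage",
    "Microsoft Fabric OneLake",
    "Oracle Cloud Object Storage",
}

# Targets for which Ignore is the default for Drop Column and Rename Column.
IGNORE_DROP_RENAME_TARGETS = {
    "Amazon Redshift",
    "Kafka",
    "Microsoft Azure Synapse Analytics",
    "PostgreSQL",
    "Snowflake",
    "SQL Server",
}


def default_drift_option(target: str, operation: str) -> Optional[str]:
    """Return the documented default option, or None if it is not stated."""
    if target in REPLICATE_ALL_TARGETS:
        return "Replicate"
    if operation in ("Add Column", "Modify Column"):
        return "Replicate"
    if (operation in ("Drop Column", "Rename Column")
            and target in IGNORE_DROP_RENAME_TARGETS):
        return "Ignore"
    return None  # default not stated in the table above
```

For example, `default_drift_option("Snowflake", "Drop Column")` yields `"Ignore"`, while the same operation on an Amazon S3 target yields `"Replicate"`.
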
Option | Description |
---|---|
Checkpoint All Rows | Indicates whether a database ingestion and replication job performs checkpoint processing for every message that is sent to the Kafka target. Note: If this check box is selected, the Checkpoint Every Commit, Checkpoint Row Count, and Checkpoint Frequency (secs) options are ignored. |
Checkpoint Every Commit | Indicates whether a database ingestion and replication job performs checkpoint processing for every commit that occurs on the source. |
Checkpoint Row Count | Specifies the maximum number of messages that a database ingestion and replication job sends to the target before adding a checkpoint. If you set this option to 0, a database ingestion and replication job does not perform checkpoint processing based on the number of messages. If you set this option to 1, a database ingestion and replication job adds a checkpoint for each message. |
Checkpoint Frequency (secs) | Specifies the maximum number of seconds that must elapse before a database ingestion and replication job adds a checkpoint. If you set this option to 0, a database ingestion and replication job does not perform checkpoint processing based on elapsed time. |
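
The Kafka checkpoint options above can be modeled as one decision function. This is a minimal sketch for illustration, not the product's implementation. All names are hypothetical. The table states that Checkpoint All Rows overrides the other three options; treating those remaining options as independent triggers, any one of which causes a checkpoint, is an assumption.

```python
def should_checkpoint(
    checkpoint_all_rows: bool,
    checkpoint_every_commit: bool,
    checkpoint_row_count: int,       # 0 disables the message-count trigger
    checkpoint_frequency_secs: int,  # 0 disables the elapsed-time trigger
    messages_since_checkpoint: int,
    seconds_since_checkpoint: float,
    at_source_commit: bool,
) -> bool:
    """Decide whether to add a checkpoint after sending a message."""
    if checkpoint_all_rows:
        # Checkpoint for every message; the other options are ignored.
        return True
    if checkpoint_every_commit and at_source_commit:
        return True
    if (checkpoint_row_count > 0
            and messages_since_checkpoint >= checkpoint_row_count):
        return True
    if (checkpoint_frequency_secs > 0
            and seconds_since_checkpoint >= checkpoint_frequency_secs):
        return True
    return False
```

Under this model, setting Checkpoint Row Count to 1 produces a checkpoint after each message, which matches the behavior described for that value.
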