
Configuring a database ingestion and replication task

In Data Integration, use the database ingestion and replication task wizard to configure a database ingestion and replication task.
For a database ingestion and replication initial load or incremental load task that has a Snowflake target, you can enable the task to integrate with Data Integration taskflows. For more information, see Integrating Database Ingestion and Replication Tasks with Data Integration Taskflows.
To configure a task, click the Ingest panel on the Home page and then complete the following configuration steps in the task wizard:
  1. Choose a runtime environment if you haven't set up a default runtime environment.
  2. Select a destination connection, or configure a new connection.
  3. Select a source connection, or configure a new connection.
  4. Specify task details for the source and target.
  5. Optionally, configure trim transformations or row-level filters.
  6. Finalize the task definition by entering a task name, task definition location, runtime environment, and some optional properties. Then save the task.
Click Next or Back to navigate from one page to another. At any point, you can click Save to save the information that you have entered so far.
After you complete all wizard pages, save the information and then click Deploy to make the task available as an executable job to the Secure Agent.

Before you begin

Before you begin, complete the following prerequisite tasks in Administrator:
Also, if you plan to perform incremental load operations with Oracle sources, ensure that the ORACLE_HOME environment variable is defined on the Secure Agent system.
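For example, if you want to verify this variable before you deploy a task, you can run a quick check on the Secure Agent machine. The following Python sketch is only an illustration and is not part of the product:

```python
# Hypothetical pre-flight check for the Secure Agent machine; not part of the product.
import os
import sys

def check_oracle_home() -> bool:
    """Verify that ORACLE_HOME is defined and points to an existing directory."""
    oracle_home = os.environ.get("ORACLE_HOME")
    if not oracle_home:
        print("ORACLE_HOME is not defined. Oracle incremental load jobs require it.")
        return False
    if not os.path.isdir(oracle_home):
        print(f"ORACLE_HOME is set to {oracle_home}, but that directory does not exist.")
        return False
    print(f"ORACLE_HOME is set to {oracle_home}.")
    return True

if __name__ == "__main__":
    sys.exit(0 if check_oracle_home() else 1)
```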

Starting the ingestion and replication task wizard

If the latest task wizard is enabled for your organization, you can start the wizard from the Home page.
Start the wizard in one of the following ways:
Note: File Ingestion and Replication and Streaming Ingestion and Replication use the pre-existing wizard.

Primary cloud data warehouse setup

From the Data Integration Home page, you can configure the primary cloud data warehouse destination where you normally load data.
When you do this, the application ingestion and replication tasks and database ingestion and replication tasks that you create in the new wizard are automatically configured to load data to this destination. You can still change the destination if you need to.
The cloud data warehouse that you choose applies to the organization that you're currently logged into. If you have access to multiple organizations, you can configure a different primary cloud data warehouse for each organization and sub-organization.
The setup steps vary based on whether you've already configured a primary cloud data warehouse. If you've already configured one, you can change or deselect it.

Configuring a primary cloud data warehouse

Configure a primary cloud data warehouse from the Home page.
    1. On the Home page, click Yes, let's go in the Do you use a cloud data warehouse as your primary destination? panel.
    2. On the Destination page, select your cloud data warehouse type, for example, Snowflake Data Cloud or Databricks Delta, and click Next.
    3. On the Connect page, select a connection, or click New and enter the connection properties.
    4. Click Connect.

Changing or unselecting a primary cloud data warehouse

If you’ve already configured a primary cloud data warehouse, you can change or unselect it. Change or unselect a primary cloud data warehouse from the Home page.
    1. On the Home page, click the cloud data warehouse type in the upper right corner and select Change primary cloud data warehouse.
    2. If you want to change your primary cloud data warehouse, select I have a primary cloud data warehouse.
    3. To change the cloud data warehouse type, complete the following steps:
      a. Click Change next to Type.
      b. On the Destination page, select the data warehouse type, and then click Next.
      c. On the Connect page, select a connection, or click New and enter the connection properties.
      d. Click Connect.
    4. To change the connection, complete the following steps:
      a. Click Change next to Connection.
      b. On the Connect page, select a connection, or click New and enter the connection properties.
      c. Click Connect.
    5. If you no longer wish to use a primary cloud data warehouse, select I don’t have a primary cloud data warehouse, and click Save.

Choose a runtime environment

The first thing you must do after starting the task wizard is to select the runtime environment to use for retrieving the source and target metadata required to define the task.
Note: A runtime environment must have previously been configured with one or more Secure Agents in Administrator.
    1. In the Choose Runtime Environment dialog box, select the runtime environment you want to use.
    Select Set as default if you want to use this runtime environment as the default environment for all tasks you create. Otherwise, leave the check box cleared.
    2. Click OK.
    Note: When you finalize the task definition on the Let's Go page, you'll be prompted to enter the runtime environment for running the task. You can use this same runtime environment or select another one.

Configure the destination connection

On the Destination page, select an existing destination connection or add a new one.
This page displays boxes for destination connections that you previously defined from the task wizard or from Administrator.
Note: To add a new connection from the Destination page of the new wizard, you must have previously created at least one connection in Administrator.
Perform one of the following actions:
To manage your connections, go to Administrator.
Tip: As you proceed through the wizard, you can click Save to save your task entries under the generated task name at the top of the page to the Default location. On the last page of the wizard, you'll be able to enter a custom name and location for the task.

Configure the source connection

On the Source page, select an existing source connection or add a new one.
Note: To add a new connection from the Source page, you must have previously created at least one connection in Administrator.
Perform one of the following actions:

Task Details: Configure how to replicate data from the source

In Step 1 of Task Details, configure the data source.
Under Source Properties, set the required basic source properties. Under Source Tables, select the source tables and columns from which to replicate data. Then, under Advanced Source Properties, set optional advanced source properties as needed. See the property descriptions for your source type:

Configure a Db2 for i source

Define source properties for the source that you selected on the Source page.
    1. Under Source Properties, configure the basic properties:
    Property
    Load type
    Description
    Load Type
    All
    The type of load operation that the database ingestion and replication task performs. Options are:
    • - Initial Load. Loads data read at a specific point in time from source tables to a target in a batch operation. You can perform an initial load to materialize a target to which incremental change data will be sent.
    • - Incremental Load. Propagates source data changes to a target continuously or until the job is stopped or ends. The job replicates the changes that have occurred since the last time the job ran or from a specific start point for the first job run.
    • - Initial and Incremental Loads. Performs an initial load of point-in-time data to the target and then automatically switches to replicating incremental data changes made to the same source tables on a continuous basis.
    Note: If a change record is captured during the initial unload phase, it's withheld from apply processing until after the unload phase completes. Any insert rows captured during the unload phase are converted into a pair of delete and insert operations so that only one insert row is applied to the target when the insert occurs in both the unloaded data and the captured change data.
    Schema
    All
    The source schema that includes the source tables.
    The list includes only the schemas that are available in the database accessed with the specified source connection.
    After you specify a schema, the Source Tables section appears and lists the tables in the schema.
    Journal Name
    Incremental and initial and incremental loads
    The name of the Db2 for i journal that records the changes made to the source tables.
    2. Under Source Tables, select the source tables and columns that you want to replicate data from. Use one or both of the following methods:
    You can both manually select source tables and columns and define selection rules. If you first manually select tables and columns on the Selected Tables tab, rules are generated and displayed for those selections on the Selection Rules tab. Similarly, if you first define rules, any tables and columns selected by those rules are displayed as selected on the Selected Tables tab. Details appear when you expand the table in the Selection Rules tab.
    3. To configure advanced source properties, toggle on Show Advanced Options at the top of the page. Advanced source properties are optional or have default values. Complete the following optional advanced properties as needed:
    Property
    Load Type
    Description
    CDC Script
    Incremental and initial and incremental loads
    Generate a script for enabling CDC on source tables and then run or download the script. The only available option is Enable CDC for all columns.
    For Db2 for i sources, the script enables journaling on the source tables.
    Click Execute to run the script if you have the required privileges. Or click the Download icon to download the script so that you can give it to your DBA to run.
    List Tables by Rule Type
    All
    Generate and download a list of the source tables that match the table selection criteria.
    If you used rule-based table selection, you can select the type of selection rules to use. Options are:
    • - Include Rules Only
    • - Exclude Rules Only
    • - Include And Exclude Rules
    Select the Include Columns check box to include columns in the list, regardless of which table selection method you used.
    Click the Download icon to download the list.
    Enable Persistent Storage
    Incremental and initial and incremental loads
    Note: The property is not available if you are using a serverless runtime environment.
    Select this check box to enable persistent storage of transaction data in a disk buffer so that the data can be consumed continually, even when the writing of data to the target is slow or delayed.
    Benefits of using persistent storage are faster consumption of the source transaction logs, less reliance on log archives or backups, and the ability to still access the data persisted in disk storage after restarting a database ingestion job.
    Persisted data is stored on the Secure Agent. It is not encrypted. The Secure Agent's files and directories are expected to be secured against unwanted access by using native file system permissions or the file system's native encryption support.
    If you select this check box and also select Stage CDC Data, this check box is cleared and becomes unavailable.
    Job Exit Token
    Incremental and initial and incremental loads
    A unique identifier that's sent to the journal receiver exit on the IBM i machine to prevent the journal receivers from being deleted while change data capture is processing them. The exit program locks the journal receivers for the duration of time that they're in use for CDC.
    Initial Start Point for Incremental Load
    Incremental loads
    If you want to customize the position in the source logs from which the database ingestion and replication job starts reading change records the first time it runs, select one of the following options:
    • - Earliest Available. The earliest available position in the database log or structure where changes are stored. For Db2 for i, the start of the current journal.
    • - Latest Available. The latest available position in the database log or structure.
    • - Position. The Db2 for i sequence in the change stream from which you want the database ingestion job to start retrieving change records. This value must be equal to or less than the current position value. An invalid value will cause the job to fail.
    • For Db2 for i, do not use the default value of 0.
      If you select the Stage CDC Data option, this option is not available.
    • - Specific Date and Time. A date and time, in the format MM/DD/YYYY hh:mm AM|PM, that Database Ingestion and Replication uses to determine the position in the change stream from which to start retrieving change records. Database Ingestion and Replication retrieves only the changes that were started after this date and time. If you enter a date and time earlier than the earliest date and time in the available archived logs, the job will fail.
    • If you select the Stage CDC Data option, this option is not available.
    The default is Latest Available.
    Stage CDC Data
    Incremental and initial and incremental loads
    If you want to read data from the source database in a single pass and then write the data to common storage so that it can be read by multiple tasks that process the same database, select this check box. The staged data can then be read by the tasks that are in the staging group. For a log-based source, the tasks can process different tables with different schemas.
    Staging Group
    Incremental and initial and incremental loads
    If you want to use a CDC staging group to stage CDC data for multiple tasks, either select a previously defined group from the list or click New to create one and proceed to step 4.
    4. If you chose to create a new staging group, complete the following fields on the Overview tab of the New Staging Group dialog box:
    Property
    Description
    Group Name
    Accept the generated group name, which has the format Log_group_cdc_yyyymmdd<number>, or enter a custom name.
    Location
    The project in which you want to store the staging group definition, if you don't want to use the Default location.
    Staging Location Connection
    Select a connection to the cloud staging location, which can be in Amazon S3, Google Cloud Storage, or Microsoft Azure Data Lake Storage Gen2.
    Runtime Environment
    Select the runtime environment to use for running the CDC staging task.
    For a log-based source, ensure that the Secure Agent on which the task will run can access all of the source logs.
    Enable Alternate Connection for Reading Logs
    Do not select this option for Db2 for i sources even though it's available.
    Journal Name
    Enter the name of the journal that records the changes. You must use the same journal name for all tasks in the staging group.
    Job Exit Token
    A unique identifier for each CDC job in the group that's sent to the journal receiver exit on the IBM i machine to prevent the journal from being deleted while the reader is still reading it. The exit program locks the journal for the duration of time that it's in use by the reader. You must use the same job exit token value for all tasks in the staging group.
    Row Flush Threshold
    The maximum number of rows that can be written to the files that store data temporarily before the data is transferred to cloud storage. The data is flushed to cloud storage either when this number of rows is reached or when the flush interval expires.
    Default is 50000 rows.
    Flush Interval
    During periods of low change activity on the source, the number of minutes and seconds that a job waits for more change data before flushing the data in the temporary files to cloud storage. The data is flushed either when this interval expires or when the row flush threshold is met.
    Default is 30 seconds.
    Log Start Point
    The Db2 for i journal from which the CDC staging job starts reading change records the first time it runs. Options are:
    • - Latest Available
    • - Earliest Available
    • - Position. Then enter an appropriate position in the change stream for your source type.
    • - Specific Date and Time. Then specify the date and time.
    Note: These options are similar to those described for Initial Start Point for Incremental Load in the source properties.
    Default is Latest Available.
    Staging Data Retention Period
    The number of days to retain data in cloud storage. Valid values are 0-365. Default is 14 days.
    After the retention period expires, the data is purged and is no longer available for subsequent restarts.
    If you want to add source and target custom properties for the CDC staging jobs in the group, click the Custom Properties tab. Then enter each property name and value. Typically, you use custom properties only at the direction of Informatica Global Customer Support or Informatica technical staff.
    When done, click Save to save the staging group and return to the Task Details, Step 1 page.
    Note: If you edit the staging group later from a deployed task in the group, you can change only the Row Flush Threshold, Flush Interval, Staging Data Retention Period, and custom property values.
    5. Under Custom Properties, you can specify one or more custom properties that Informatica provides to improve performance or to meet your special requirements. To add a property, click the + icon to add a row. In the Property Name field, select a property and enter the value, or select the Custom option and manually enter both the property name and value.
    The following table describes the properties that are available, depending on the load type:
    Property
    Description
    Read Event Batch Size
    The number of payload events written in batch to the internal event queue during CDC processing.
    When the event queue is implemented as an internal ring buffer, this value is the number of payload events that the reader writes to a single internal buffer slot.
    Note: A batch size that's too small might increase contention between threads. A larger batch size can provide for more parallelism but consume more memory.
    Reader Helper Thread Count
    The number of reader helper threads used during CDC processing to convert change data into a canonical format that can be passed to the target.
    Default value is 3. You can enter a larger value to allow more threads to be available for performing conversion processing in parallel.
    Unload Helper Thread Count
    The number of unload helper threads allocated to an initial load job or the unload phase of a combined job to convert the unloaded data rows into a canonical format that can be passed to the writer.
    Default value is 2. If two threads can’t keep up with the incoming volume, you can increase the number of threads. Consider increasing the number of threads in the following situations: 1) parallel SQL result sets are open, which usually occurs when source partitioning is enabled, or 2) some rows are very large or wide, which increases conversion time.
    Custom
    Select this option to manually enter the name of a property and its value. Use this option to enter properties that Informatica Global Customer Support or a technical staff member has provided to you for a special case. Available for any supported load type.
    Custom properties are intended to address performance or special processing needs. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).
    If you configured a CDC staging group, any custom properties that you enter here for the source are ignored. Instead, enter custom properties for the staging group.
    Tip: To delete a custom property after you've entered it, click the Delete icon at the right end of the property row.
    6. Click Next to proceed to Step 2 of Task Details.
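The Row Flush Threshold and Flush Interval properties in the staging group settings (step 4) work together: data in the temporary files is flushed to cloud storage as soon as either limit is reached. The following Python sketch illustrates that either/or logic conceptually. It is not the product's implementation, and the names in it are illustrative only:

```python
# Conceptual illustration of the staging flush policy described above; not product code.
import time

ROW_FLUSH_THRESHOLD = 50000   # default Row Flush Threshold: 50000 rows
FLUSH_INTERVAL_SECONDS = 30   # default Flush Interval: 30 seconds

def should_flush(buffered_rows: int, last_flush_time: float) -> bool:
    """Flush when either the row threshold is reached or the flush interval expires."""
    rows_reached = buffered_rows >= ROW_FLUSH_THRESHOLD
    interval_elapsed = (time.monotonic() - last_flush_time) >= FLUSH_INTERVAL_SECONDS
    return rows_reached or interval_elapsed

# Example: only 120 rows are buffered, but 45 seconds have passed since the last flush,
# so the data is flushed because the interval expired.
print(should_flush(120, time.monotonic() - 45))   # True
```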

Configure a Db2 for LUW source

Define source properties for the source that you selected on the Source page.
    1. Under Source Properties, configure the basic properties:
    Property
    Description
    Load Type
    The type of load operation that the database ingestion and replication task performs. Options are:
    • - Initial Load. Loads data read at a specific point in time from source tables to a target in a batch operation. You can perform an initial load to materialize a target to which incremental change data will be sent.
    • - Incremental Load. Propagates source data changes to a target continuously or until the job is stopped or ends. The job replicates the changes that have occurred since the last time the job ran or from a specific start point for the first job run.
    • - Initial and Incremental Loads. Performs an initial load of point-in-time data to the target and then automatically switches to replicating incremental data changes made to the same source tables on a continuous basis.
    Note: If a change record is captured during the initial unload phase, it's withheld from apply processing until after the unload phase completes. Any insert rows captured during the unload phase are converted into a pair of delete and insert operations so that only one insert row is applied to the target when the insert occurs in both the unloaded data and the captured change data.
    Schema
    The source schema that includes the source tables.
    The list includes only the schemas that are available in the database accessed with the specified source connection.
    After you specify a schema, the Source Tables section appears and lists the tables in the schema.
    2. Under Source Tables, select the source tables that you want to replicate data from. Use one or both of the following methods:
    You can both manually select source tables and define selection rules. If you first manually select tables on the Selected Tables tab, rules are generated and displayed for those selections on the Selection Rules tab. Similarly, if you first define rules, any tables selected by those rules are displayed as selected on the Selected Tables tab.
    3. To configure advanced source properties, toggle on Show Advanced Options at the top of the page. Advanced source properties are optional or have default values. Complete the following optional advanced properties as needed:
      a. Under Change Data Capture Method, complete the following fields if you're defining an incremental load or combined initial and incremental load task:
      b. Complete the following optional advanced properties as needed for your load type and environment:
      Property
      Load Type
      Description
      List Tables by Rule Type
      All
      Generate and download a list of the source tables that match the table selection criteria.
      If you used rule-based table selection, you can select the type of selection rules to use. Options are:
      • - Include Rules Only
      • - Exclude Rules Only
      • - Include And Exclude Rules
      Select the Include Columns check box to include columns in the list, regardless of which table selection method you used.
      Click the Download icon to download the list.
      Include LOBs
      All load types to Microsoft Azure Data Lake Storage Gen 2, Microsoft Azure Synapse Analytics, or Snowflake targets.
      Select this check box if the source contains the large-object (LOB) columns from which you want to replicate data to a target.
      LOB data types for Db2 for LUW source: BLOB, CLOB, DBCLOB, LONG VARCHAR, LONG VARCHAR FOR BIT, LONG VARGRAPHIC, and XML
      LOB data might be truncated on the target. For more information, see About LOB truncation.
      Initial Start Point for Incremental Load
      Incremental loads
      If you want to customize the position in the source logs from which the database ingestion and replication job starts reading change records the first time it runs, select one of the following options:
      • - Latest Available. The latest available position in the database log or structure.
      • - Specific Date and Time. A date and time, in the format MM/DD/YYYY hh:mm AM|PM, that Database Ingestion and Replication uses to determine the position in the change stream from which to start retrieving change records. Database Ingestion and Replication retrieves only the changes that were started after this date and time. If you enter a date and time earlier than the earliest date and time in the available archived logs, the job will fail.
      The default is Latest Available.
    4. Under Custom Properties, you can specify one or more custom properties that Informatica provides to improve performance or to meet your special requirements. To add a property, click the + icon to add a row. In the Property Name field, select a property and enter the value, or select the Custom option and manually enter both the property name and value.
    The following table describes the properties that are available, depending on the load type:
    Property
    Description
    Read Event Batch Size
    The number of payload events written in batch to the internal event queue during CDC processing.
    When the event queue is implemented as an internal ring buffer, this value is the number of payload events that the reader writes to a single internal buffer slot.
    Note: A batch size that's too small might increase contention between threads. A larger batch size can provide for more parallelism but consume more memory.
    Reader Helper Thread Count
    The number of reader helper threads used during CDC processing to convert change data into a canonical format that can be passed to the target.
    Default value is 3. You can enter a larger value to allow more threads to be available for performing conversion processing in parallel.
    Unload Helper Thread Count
    The number of unload helper threads allocated to an initial load job or the unload phase of a combined job to convert the unloaded data rows into a canonical format that can be passed to the writer.
    Default value is 2. If two threads can’t keep up with the incoming volume, you can increase the number of threads. Consider increasing the number of threads in the following situations: 1) parallel SQL result sets are open, which usually occurs when source partitioning is enabled, or 2) some rows are very large or wide, which increases conversion time.
    Custom
    Select this option to manually enter the name of a property and its value. Use this option to enter properties that Informatica Global Customer Support or a technical staff member has provided to you for a special case. Available for any supported load type.
    Custom properties are intended to address performance or special processing needs. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).
    Tip: To delete a custom property after you've entered it, click the Delete icon at the right end of the property row.
    5. Click Next to proceed to Step 2 of Task Details.
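The Specific Date and Time option in the advanced source properties expects the format MM/DD/YYYY hh:mm AM|PM. If you build this value in a script, the following Python sketch shows one way to produce and validate a timestamp in that pattern. The helper names and the sample date are illustrative only:

```python
# Illustrative only: build and validate a timestamp in the MM/DD/YYYY hh:mm AM|PM pattern.
from datetime import datetime

START_POINT_FORMAT = "%m/%d/%Y %I:%M %p"   # equivalent to MM/DD/YYYY hh:mm AM|PM

def format_start_point(value: datetime) -> str:
    """Render a datetime in the format expected by the Specific Date and Time option."""
    return value.strftime(START_POINT_FORMAT)

def parse_start_point(text: str) -> datetime:
    """Raise ValueError if the string does not match the expected pattern."""
    return datetime.strptime(text, START_POINT_FORMAT)

print(format_start_point(datetime(2024, 6, 15, 9, 30)))   # 06/15/2024 09:30 AM
print(parse_start_point("06/15/2024 09:30 AM"))
```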

Configure a Db2 for z/OS source

Define source properties for the source that you selected on the Source page.
    1. Under Source Properties, configure the basic properties:
    Property
    Description
    Load Type
    The type of load operation that the database ingestion and replication task performs. Options are:
    • - Initial Load. Loads data read at a specific point in time from source tables to a target in a batch operation. You can perform an initial load to materialize a target to which incremental change data will be sent.
    • - Incremental Load. Propagates source data changes to a target continuously or until the job is stopped or ends. The job replicates the changes that have occurred since the last time the job ran or from a specific start point for the first job run.
    • - Initial and Incremental Loads. Performs an initial load of point-in-time data to the target and then automatically switches to replicating incremental data changes made to the same source tables on a continuous basis.
    Note: If a change record is captured during the initial unload phase, it's withheld from apply processing until after the unload phase completes. Any insert rows captured during the unload phase are converted into a pair of delete and insert operations so that only one insert row is applied to the target when the insert occurs in both the unloaded data and the captured change data.
    Schema
    The source schema that includes the source tables.
    The list includes only the schemas that are available in the database accessed with the specified source connection.
    After you specify a schema, the Source Tables section appears and lists the tables in the schema.
    2. Under Source Tables, select the source tables that you want to replicate data from. Use one or both of the following methods:
    You can both manually select source tables and define selection rules. If you first manually select tables on the Selected Tables tab, rules are generated and displayed for those selections on the Selection Rules tab. Similarly, if you first define rules, any tables selected by those rules are displayed as selected on the Selected Tables tab.
    3. To configure advanced source properties, toggle on Show Advanced Options at the top of the page. Advanced source properties are optional or have default values. Complete the following optional advanced properties as needed:
    Property
    Load Type
    Description
    CDC Script
    Incremental and Initial and incremental loads
    Generate a script for enabling CDC on source tables and then run or download the script. The only available option is Enable CDC for all columns.
    For Db2 for z/OS sources, the script sets DATA CAPTURE CHANGES for source tables and certain Db2 catalog tables needed for CDC. After DATA CAPTURE CHANGES is set for one job, all other jobs recognize that the attribute is enabled in Db2 for the catalog tables needed because the Db2 catalog tables are a set of tables shared by all users of Db2.
    Click Execute to run the script if you have the required privileges. Or click the Download icon to download the script so that you can give it to your DBA to run.
    List Tables by Rule Type
    All
    Generate and download a list of the source tables that match the table selection criteria.
    If you used rule-based table selection, you can select the type of selection rules to use. Options are:
    • - Include Rules Only
    • - Exclude Rules Only
    • - Include And Exclude Rules
    Select the Include Columns check box to include columns in the list, regardless of which table selection method you used.
    Click the Download icon to download the list.
    Enable Persistent Storage
    Incremental and Initial and incremental loads
    Note: The property is not available if you are using a serverless runtime environment.
    Select this check box to enable persistent storage of transaction data in a disk buffer so that the data can be consumed continually, even when the writing of data to the target is slow or delayed.
    Benefits of using persistent storage are faster consumption of the source transaction logs, less reliance on log archives or backups, and the ability to still access the data persisted in disk storage after restarting a database ingestion job.
    Persisted data is stored on the Secure Agent. It is not encrypted. The Secure Agent's files and directories are expected to be secured against unwanted access by using native file system permissions or the file system's native encryption support.
    If you select this check box and also select Stage CDC Data, this check box is cleared and becomes unavailable.
    Initial Start Point for Incremental Load
    Incremental loads
    If you want to customize the position in the source logs from which the database ingestion and replication job starts reading change records the first time it runs, select one of the following options:
    • - Latest Available. The latest available position in the database log or structure.
    • - Position. The Db2 for z/OS LRSN in the change stream from which you want the database ingestion job to start retrieving change records. This value must be equal to or less than the current position value. An invalid value will cause the job to fail. The value of 0 is displayed by default, which causes the latest available point to be used.
    • If you select the Stage CDC Data option, this option is not available.
    • - Specific Date and Time. A date and time, in the format MM/DD/YYYY hh:mm AM|PM, that Database Ingestion and Replication uses to determine the position in the change stream from which to start retrieving change records. Database Ingestion and Replication retrieves only the changes that were started after this date and time. If you enter a date and time earlier than the earliest date and time in the available archived logs, the job will fail.
    • If you select the Stage CDC Data option, this option is not available.
    The default is Latest Available.
    Stage CDC Data
    Incremental and Initial and incremental loads
    If you want to read data from the source database in a single pass and then write the data to common storage so that it can be read by multiple tasks that process the same database, select this check box. The staged data can then be read by the tasks that are in the staging group. For a log-based source, the tasks can process different tables with different schemas.
    Staging Group
    Incremental and Initial and incremental loads
    If you want to use a CDC staging group to stage CDC data for multiple tasks, either select a previously defined group from the list or click New to create one and proceed to step 4.
    4. If you chose to create a new staging group, complete the following fields on the Overview tab of the New Staging Group dialog box:
    Property
    Description
    Group Name
    Accept the generated group name, which has the format Log_group_cdc_yyyymmdd<number>, or enter a custom name.
    Location
    The project in which you want to store the staging group definition, if you don't want to use the Default location.
    Staging Location Connection
    Select a connection to the cloud staging location, which can be in Amazon S3, Google Cloud Storage, or Microsoft Azure Data Lake Storage Gen2.
    Runtime Environment
    Select the runtime environment to use for running the CDC staging task.
    For a log-based source, ensure that the Secure Agent on which the task will run can access all of the source logs.
    Enable Alternate Connection for Reading Logs
    Do not select this option for Db2 for z/OS sources even though it's available.
    Row Flush Threshold
    The maximum number of rows that can be written to the files that store data temporarily before the data is transferred to cloud storage. The data is flushed to cloud storage either when this number of rows is reached or when the flush interval expires.
    Default is 50000 rows.
    Flush Interval
    During periods of low change activity on the source, the number of minutes and seconds that a job waits for more change data before flushing the data in the temporary files to cloud storage. The data is flushed either when this interval expires or when the row flush threshold is met.
    Default is 30 seconds.
    Log Start Point
    The position in the source logs from which the CDC staging job starts reading change records the first time it runs. Options are:
    • - Latest Available
    • - Position. Then enter an appropriate position in the change stream for your source type.
    • - Specific Date and Time. Then specify the date and time.
    Note: These options are similar to those described for Initial Start Point for Incremental Load in the source properties.
    Default is Latest Available.
    Staging Data Retention Period
    The number of days to retain data in cloud storage. Valid values are 0-365. Default is 14 days.
    After the retention period expires, the data is purged and is no longer available for subsequent restarts.
    If you want to add source and target custom properties for the CDC staging jobs in the group, click the Custom Properties tab. Then enter each property name and value. Typically, you use custom properties only at the direction of Informatica Global Customer Support or Informatica technical staff.
    When done, click Save to save the staging group and return to the Task Details, Step 1 page.
    Note: If you edit the staging group later from a deployed task in the group, you can change only the Row Flush Threshold, Flush Interval, Staging Data Retention Period, and custom property values.
    5. Under Custom Properties, you can specify one or more custom properties that Informatica provides to improve performance or to meet your special requirements. To add a property, click the + icon to add a row. In the Property Name field, select a property and enter the value, or select the Custom option and manually enter both the property name and value.
    The following table describes the properties that are available, depending on the load type:
    Property
    Description
    Read Event Batch Size
    The number of payload events written in batch to the internal event queue during CDC processing.
    When the event queue is implemented as an internal ring buffer, this value is the number of payload events that the reader writes to a single internal buffer slot.
    Note: A batch size that's too small might increase contention between threads. A larger batch size can provide for more parallelism but consume more memory.
    Reader Helper Thread Count
    The number of reader helper threads used during CDC processing to convert change data into a canonical format that can be passed to the target.
    Default value is 3. You can enter a larger value to allow more threads to be available for performing conversion processing in parallel.
    Unload Helper Thread Count
    The number of unload helper threads allocated to an initial load job or the unload phase of a combined job to convert the unloaded data rows into a canonical format that can be passed to the writer.
    Default value is 2. If two threads can’t keep up with the incoming volume, you can increase the number of threads. Consider increasing the number of threads in the following situations: 1) parallel SQL result sets are open, which usually occurs when source partitioning is enabled, or 2) some rows are very large or wide, which increases conversion time.
    Custom
    Select this option to manually enter the name of a property and its value. Use this option to enter properties that Informatica Global Customer Support or a technical staff member has provided to you for a special case. Available for any supported load type.
    Custom properties are intended to address performance or special processing needs. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).
    If you configured a CDC staging group, any custom properties that you enter here for the source are ignored. Instead, enter custom properties for the staging group.
    Tip: To delete a custom property after you've entered it, click the Delete icon at the right end of the property row.
    6. Click Next to proceed to Step 2 of Task Details.
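Custom property names can contain only alphanumeric characters, periods, hyphens, and underscores. If you keep custom properties in your own scripts or configuration files, a simple check such as the following Python sketch can catch invalid names before you enter them in the wizard. The sketch is illustrative only, and the sample names are placeholders; the wizard performs its own validation:

```python
# Sketch of a name check based on the documented character rules; not product code.
import re

# Alphanumeric characters plus periods (.), hyphens (-), and underscores (_).
VALID_PROPERTY_NAME = re.compile(r"^[A-Za-z0-9._-]+$")

def is_valid_property_name(name: str) -> bool:
    return bool(VALID_PROPERTY_NAME.match(name))

print(is_valid_property_name("my.custom_property-1"))   # True
print(is_valid_property_name("bad name!"))              # False: space and '!' are not allowed
```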

Configure a Microsoft SQL Server source

Define source properties for the source that you selected on the Source page.
    1. Under Source Properties, configure the basic properties:
    Property
    Load type
    Description
    Load Type
    All
    The type of load operation that the database ingestion and replication task performs. Options are:
    • - Initial Load. Loads data read at a specific point in time from source tables to a target in a batch operation. You can perform an initial load to materialize a target to which incremental change data will be sent.
    • - Incremental Load. Propagates source data changes to a target continuously or until the job is stopped or ends. The job replicates the changes that have occurred since the last time the job ran or from a specific start point for the first job run.
    • - Initial and Incremental Loads. Performs an initial load of point-in-time data to the target and then automatically switches to replicating incremental data changes made to the same source tables on a continuous basis.
    Note: If a change record is captured during the initial unload phase, it's withheld from apply processing until after the unload phase completes. Any insert rows captured during the unload phase are converted into a pair of delete and insert operations so that only one insert row is applied to the target when the insert occurs in both the unloaded data and the captured change data.
    Schema
    All
    The source schema that includes the source tables.
    The list includes only the schemas that are available in the database accessed with the specified source connection.
    The schema name that is specified in the connection properties is displayed by default.
    After you specify a schema, the Source Tables section appears and lists the tables in the schema.
    2. Under Source Tables, select the source tables and columns that you want to replicate data from. Use one or both of the following methods:
    You can both manually select source tables and columns and define selection rules. If you first manually select tables and columns on the Selected Tables tab, rules are generated and displayed for those selections on the Selection Rules tab. Similarly, if you first define rules, any tables and columns selected by those rules are displayed as selected on the Selected Tables tab. Details appear when you expand the table in the Selection Rules tab.
    3. To configure advanced source properties, toggle on Show Advanced Options at the top of the page. Advanced source properties are optional or have default values. Complete the following optional advanced properties as needed:
      a. Under Change Data Capture Method, select a capture method if you're defining an incremental load or combined initial and incremental load task:
        a. In the CDC Method field, select one of the following options to indicate the method to use for capturing source changes:
        Method
        Description
        CDC Tables
        Read data changes directly from the SQL Server CDC tables.
        This method is the default option for SQL Server because it provides the best replication performance and highest reliability of results.
        Log-based
        Capture Inserts, Updates, and Deletes in near real time by reading the database transaction logs. Also capture DDL changes based on schema drift settings.
        For SQL Server sources, data changes are read from the SQL Server transaction log and the enabled SQL Server CDC tables. Exception: For Azure SQL Database sources, data changes are read from CDC tables only.
        Query-based
        Capture Inserts and Updates by using a SQL WHERE clause that points to a CDC query column. The query column is used to identify the rows that contain the changes made to the source tables since the beginning of the CDC interval.
        Note: This option is not available if you are using a serverless runtime environment.
        Note: If you plan to use a CDC staging group to stage change data for multiple tasks in cloud storage, you must use the CDC Tables or Log-based method. You can override the CDC method when you define the staging group.
        b. If you selected the Query-based option, also complete the following fields:
        • CDC Query Column Type. The column data type of the CDC query column in the source table. The column data type must combine the date and time. For SQL Server, the only available option is Datetime.
        • CDC Query Column Name. The case-sensitive name of the CDC query column in the source table. The column must be present in the source table. Maximum length is 70 characters.
        • CDC Interval. The frequency of a query-based change data capture cycle, expressed in days, hours, and minutes. You must enter a positive number in at least one of the interval fields. Otherwise, an error will be issued when you try to save the task. The default value is 5 minutes.
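        Conceptually, the query-based method polls the source once per CDC interval and selects only the rows whose query column value is later than the high-water mark from the previous cycle. The following Python sketch illustrates that pattern with a generic query. The table, column, and timestamps are placeholders, and the product builds its own queries:

```python
# Conceptual sketch of query-based change capture; the table, column, and timestamps are placeholders.
from datetime import datetime, timedelta

CDC_INTERVAL = timedelta(minutes=5)      # default CDC Interval: 5 minutes
CDC_QUERY_COLUMN = "LAST_UPDATED"        # hypothetical Datetime query column

def build_cdc_query(table: str, high_water_mark: datetime) -> str:
    """Select only the rows changed after the previous capture cycle."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE {CDC_QUERY_COLUMN} > '{high_water_mark:%Y-%m-%d %H:%M:%S}'"
    )

# Example: the previous cycle ended at 9:30, so the next cycle runs at 9:35
# and captures rows whose query column value is later than 9:30.
previous_cycle_end = datetime(2024, 6, 15, 9, 30)
print(build_cdc_query("dbo.ORDERS", previous_cycle_end))
```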
      b. Complete the following optional advanced properties as needed for your load type and environment:
      Property
      Load Type
      Description
      CDC Script
      Incremental and initial and incremental loads
      Generate a script for enabling CDC on source tables and then run or download the script. Options are:
      • - Enable CDC for all columns. Enables CDC for all columns in the selected source tables.
      • You must use this option for a SQL Server source.
        Note: For source tables without a primary key, including any tables with unique indexes, CDC is enabled for all columns by default, regardless of which option is selected.
      • - Enable CDC for primary key columns. Enables CDC only for primary key columns in the selected source tables.
      For Microsoft SQL Server sources, the script runs the sys.sp_cdc_enable_db and sys.sp_cdc_enable_table stored procedures to enable CDC on the source database and tables. For RDS for SQL Server, the script runs the msdb.dbo.rds_cdc_enable_db procedure to enable CDC on the source database, and runs the sys.sp_cdc_enable_table script to track CDC for tables.
      Click Execute to run the script if you have the required privileges. Or click the Download icon to download the script so that you can give it to your DBA to run.
      Note: This field is not displayed if you selected the Query-based CDC method.
      Capture Filegroup
      Incremental and initial and incremental loads
      The name of the filegroup to be used for the change table that is created for the capture. If you leave this field empty, the change table is located in the default filegroup of the database.
      Gating Role
      Incremental and initial and incremental loads
      The name of the database role that is used to gate access to change data. If you leave this field empty, the database does not use the gating role.
      List Tables by Rule Type
      All
      Generate and download a list of the source tables that match the table selection criteria.
      If you used rule-based table selection, you can select the type of selection rules to use. Options are:
      • - Include Rules Only
      • - Exclude Rules Only
      • - Include And Exclude Rules
      Select the Include Columns check box to include columns in the list, regardless of which table selection method you used.
      Click the Download icon to download the list.
      Include LOBs
      Initial loads to Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen 2, Microsoft Azure Synapse Analytics, Oracle, Oracle Cloud Object Storage, PostgreSQL, Snowflake, or SQL Server targets.
      Incremental loads to Kafka-enabled Azure Event Hubs, Databricks, PostgreSQL, Snowflake, and SQL Server targets. Disabled if you selected the Query-based CDC method.
      Combined initial and incremental loads to Databricks, PostgreSQL, Snowflake, and SQL Server targets. Disabled if you selected the Query-based CDC method.
      Select this check box if the source contains the large-object (LOB) columns from which you want to replicate data to a target.
      LOB data types for SQL Server sources: GEOGRAPHY, GEOMETRY, IMAGE, NTEXT, NVARCHAR(MAX), TEXT, VARBINARY(MAX), VARCHAR(MAX), and XML
      LOB data might be truncated on the target. For more information, see About LOB truncation.
      Enable Persistent Storage
      Incremental and initial and incremental loads.
      Note: The property is not available if you are using a serverless runtime environment.
      For SQL Server sources that use the query-based CDC method, this field is not displayed because persistent storage is enabled by default and cannot be changed.
      Select this check box to enable persistent storage of transaction data in a disk buffer so that the data can be consumed continually, even when the writing of data to the target is slow or delayed.
      Benefits of using persistent storage are faster consumption of the source transaction logs, less reliance on log archives or backups, and the ability to still access the data persisted in disk storage after restarting a database ingestion job.
      Persisted data is stored on the Secure Agent. It is not encrypted. The Secure Agent's files and directories are expected to be secured against unwanted access by using native file system permissions or the file system's native encryption support.
      If you select this check box and also select Stage CDC Data, this check box is cleared and becomes unavailable.
      Enable Partitioning
      Initial and initial and incremental loads
      Select this check box to enable partitioning of source objects. When an object is partitioned, the database ingestion and replication job processes the records read from each partition in parallel.
      For SQL Server sources, partitioning is based on the primary key.
      Note: In combined initial and incremental loads, the partitioning of source objects occurs only in the initial load phase.
      Number of Partitions
      Initial and initial and incremental loads
      If you enable partitioning of source objects, enter the number of partitions you want to create. The default number is 5. The minimum value is 2.
      Initial Start Point for Incremental Load
      Incremental loads
      If you want to customize the position in the source logs from which the database ingestion and replication job starts reading change records the first time it runs, select one of the following options:
      • - Earliest Available. The earliest available position in the database log or structure where changes are stored. For SQL Server, the earliest available record in the active transaction log. This option is not available if the CDC Method is set to Query-based.
      • - Latest Available. The latest available position in the database log or structure.
      • - Position. The SQL Server LSN in the change stream from which you want the database ingestion job to start retrieving change records. This value must be equal to or less than the current position value. An invalid value will cause the job to fail. The value of 0 is displayed by default, which causes the earliest available point to be used. A non-zero LSN that predates the beginning of the active transaction log causes data to be read from the CDC tables instead of from the transaction log.
      • This option is not available if the CDC Method is set to Query-based.
        If you select the Stage CDC Data option, this option is not available.
      • - Specific Date and Time. A date and time, in the format MM/DD/YYYY hh:mm AM|PM, that Database Ingestion and Replication uses to determine the position in the change stream from which to start retrieving change records. Database Ingestion and Replication retrieves only the changes that were started after this date and time. If you enter a date and time earlier than the earliest date and time in the available archived logs, the job will fail.
      • If you select the Stage CDC Data option, this option is not available.
      The default is Latest Available.
      Stage CDC Data
      Incremental and initial and incremental loads
      If you want to read data from the source database in a single pass and then write the data to common storage so that it can be read by multiple tasks that process the same database, select this check box. The staged data can then be read by the tasks that are in the staging group. For a log-based source, the tasks can process different tables with different schemas.
      Staging Group
      Incremental and initial and incremental loads
      If you want to use a CDC staging group to stage CDC data for multiple tasks, either select a previously defined group from the list or click New to create one and proceed to step 4.
    4. If you chose to create a new staging group, complete the following fields on the Overview tab of the New Staging Group dialog box:
    Property
    Description
    Group Name
    Accept the generated group name, which has the format Log_group_cdc_yyyymmdd<number>, or enter a custom name.
    Location
    The project in which you want to store the staging group definition, if you don't want to use the Default location.
    Staging Location Connection
    Select a connection to the cloud staging location, which can be in Amazon S3, Google Cloud Storage, or Microsoft Azure Data Lake Storage Gen2.
    Runtime Environment
    Select the runtime environment to use for running the CDC staging task.
    For a log-based source, ensure that the Secure Agent on which the task will run can access all of the source logs.
    Enable Alternate Connection for Reading Logs
    Select this check box if you want to use a source connection other than the one selected on the Source page to read data from source logs. For example, if you perform initial loads and CDC, you might want to select a connection with additional privileges required for CDC.
    Alternate Connection for Reading Logs
    Select the alternate connection for reading logs.
    CDC Method
    For the tasks in the staging group, you can use either the CDC Tables or the Log-based method for a SQL Server source. You cannot use the Query-based method.
    If you previously set a CDC Method under source properties, you can use this field to override it for the CDC staging job.
    Include LOBs
    Select this check box if you want all CDC tasks in the staging group to capture and stage LOB data for their tables that are in the source database.
    Note: If you previously selected this option under Advanced source properties, initial load jobs will read the LOB data too.
    If you want to enable LOB data capture after running the CDC staging task, either Redeploy the task or create another staging group with this option enabled.
    Row Flush Threshold
    The maximum number of rows that can be written to the files that store data temporarily before the data is transferred to cloud storage. The data is flushed to cloud storage either when this number of rows is reached or when the flush interval expires.
    Default is 50000 rows.
    Flush Interval
    During periods of low change activity on the source, the number of minutes and seconds that a job waits for more change data before flushing the data in the temporary files to cloud storage. The data is flushed either when this interval expires or when the row flush threshold is met.
    Default is 30 seconds.
    Log Start Point
    The position in the source logs from which the CDC staging job starts reading change records the first time it runs. Options are:
    • - Latest Available
    • - Earliest Available.
    • - Position. Then enter an appropriate position in the change stream for your source type.
    • - Specific Date and Time. Then specify the date and time.
    Note: These options are similar to those described for Initial Start Point for Incremental Load in the source properties.
    Default is Latest Available.
    Staging Data Retention Period
    The number of days to retain data in cloud storage. Valid values are 0-365. Default is 14 days.
    After the retention period expires, the data is purged and is no longer available for subsequent restarts.
    If you want to add source and target custom properties for the CDC staging jobs in the group, click the Custom Properties tab. Then enter each property name and value. Typically, you use custom properties only at the direction of Informatica Global Customer Support or Informatica technical staff.
    When done, click Save to save the staging group and return to the Task Details, Step 1 page.
    Note: If you edit the staging group later from a deployed task in the group, you can change only the Row Flush Threshold, Flush Interval, Staging Data Retention Period, and custom property values.
    5. Under Custom Properties, you can specify one or more custom properties that Informatica provides to improve performance or to meet your special requirements. To add a property, click the + icon to add a row. In the Property Name field, select a property and then enter a property value, or select the Custom option and manually enter both the property name and value.
    The following table describes the properties that are available, depending on the load type:
    Property
    Description
    Read Event Batch Size
    The number of payload events written in batch to the internal event queue during CDC processing.
    When the event queue is implemented as an internal ring buffer, this value is the number of payload events that the reader writes to a single internal buffer slot.
    Note: A batch size that's too small might increase contention between threads. A larger batch size can provide for more parallelism but consume more memory.
    Reader Helper Thread Count
    The number of reader helper threads used during CDC processing to convert change data into a canonical format that can be passed to the target.
    Default value is 3. You can enter a larger value to allow more threads to be available for performing conversion processing in parallel.
    Unload Helper Thread Count
    The number of unload helper threads allocated to an initial load job or the unload phase of a combined job to convert the unloaded data rows into a canonical format that can be passed to the writer.
    Default value is 2. If two threads can’t keep up with the incoming volume, you can increase the number of threads. Consider increasing the number of threads in the following situations: 1) parallel SQL result sets are open, which usually occurs when source partitioning is enabled, or 2) some rows are very large or wide, which increases conversion time.
    Unload JDBC Partitioning Technique
    The technique to use for partitioning source table rows for initial load processing or the unload phase of a combined load job.
    Valid values are:
    • - uniform. Select all of the primary key values from the source table and distribute those keys across the partitions evenly, producing partitions of equal or near equal size.
    • - heuristic. Estimate the primary key ranges to assign to each partition by calculating the difference between the low and high primary key values and using the number of partitions requested. With this technique, ranges are calculated more quickly because the query to select primary key values is avoided. However, the sizes of the partitions might vary widely if primary keys are not distributed evenly between high and low values.
    • For example, if 26 partitions are requested and the low primary key starts with A and the high primary key starts with Z, you might expect one partition for each set of rows with a primary key from A through Z. However, if 90% of the primary keys start with Q, R, or S, 90% of the rows are assigned to just three partitions.
    Default value is uniform.
    Unload Source Max Parallel Partition
    The maximum number of partition threads that can be used to query the source for data in parallel during initial load processing or the unload phase of combined jobs.
    Use this property to control the number of source partition queries that can be executed against the source at the same time. For example, if a table contains data in 100 partitions, all 100 partitions are queried at the same time by default. However, you can use this property to reduce the number of concurrent queries.
    Default value is equal to the total number of partitions.
    Custom
    Select this option to manually enter the name of a property and its value. Use this option to enter properties that Informatica Global Customer Support or a technical staff member has provided to you for a special case. Available for any supported load type.
    Custom properties are intended to address performance or special processing needs. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).
    If you configured a CDC staging group, any custom properties that you enter here for the source are ignored. Instead, enter custom properties for the staging group.
    Tip: To delete a custom property after you've entered it, click the Delete icon at the right end of the property row.
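    To picture the trade-off between the uniform and heuristic techniques described above, the following sketch splits a low-to-high key range into equal-width character ranges, the way a heuristic split does, and then assigns a skewed set of keys to those ranges. It is a simplified illustration, not the actual partitioning algorithm: with evenly distributed keys the partitions come out similar in size, but when 90% of the keys start with Q, R, or S, three partitions receive most of the rows.
        from collections import Counter

        def heuristic_partitions(low, high, count):
            """Split the key range [low, high] into equal-width single-character ranges."""
            lo, hi = ord(low), ord(high)
            width = (hi - lo + 1) / count
            return [(chr(int(lo + i * width)), chr(int(lo + (i + 1) * width) - 1))
                    for i in range(count)]

        def assign(key, partitions):
            """Find the partition whose range covers the key's leading character."""
            return next(i for i, (lo, hi) in enumerate(partitions) if lo <= key[0] <= hi)

        parts = heuristic_partitions("A", "Z", 26)                      # one letter per partition
        skewed_keys = ["Q1", "R7", "S3"] * 300 + ["A1", "M2", "Z9"] * 33
        counts = Counter(assign(k, parts) for k in skewed_keys)
        print(counts.most_common(3))   # three partitions hold roughly 90% of the rows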
    6Click Next to proceed to Step 2 of Task Details.

Configure a MongoDB source

Define source properties for the source that you selected on the Source page.
    1Under Source Properties, configure the basic properties:
    Property
    Description
    Load Type
    The type of load operation that the database ingestion and replication task performs. Options are:
    • - Initial Load. Loads data read at a specific point in time from source collections to a target in a batch operation. You can perform an initial load to materialize a target to which incremental change data will be sent.
    • - Incremental Load. Propagates source data changes to a target continuously or until the job is stopped or ends. The job replicates the changes that have occurred since the last time the job ran or from a specific start point for the first job run.
    • - Initial and Incremental Loads. Performs an initial load of point-in-time data to the target and then automatically switches to replicating incremental data changes made to the same source collections on a continuous basis.
    • Note: If a change record is captured during the initial unload phase, it's withheld from apply processing until after the unload phase completes. Any insert rows captured during the unload phase are converted into a pair of delete and insert operations so that only one insert row is applied to the target when the insert occurs in both the unloaded data and the captured change data.
    Database
    The source database that includes the source collections.
    The list includes only the databases that are available in the database accessed with the specified source connection.
    After you specify a database, the Source Collections section appears and lists the collections in the database.
    2Under Source Collections, select the source collections that you want to replicate data from. You can select collections manually, define selection rules, or do both.
    If you first manually select collections on the Selected Collections tab, rules are generated and displayed for those selections on the Selection Rules tab. Similarly, if you first define rules, any collections selected by those rules are displayed as selected on the Selected Collections tab.
    3To configure advanced source properties, toggle on Show Advanced Options at the top of the page. Advanced source properties are optional or have default values. Complete the following optional advanced properties as needed:
    Property
    Load Type
    Description
    List Collections by Rule Type
    All
    Generate and download a list of the source collections that match the collection selection criteria.
    If you used rule-based collection selection, you can select the type of selection rules to use. Options are:
    • - Include Rules Only
    • - Exclude Rules Only
    • - Include And Exclude Rules
    Click the Download icon to download the list.
    Fetch Size
    All
    The number of records that a database ingestion and replication job reads at one time from the MongoDB source. Valid values are 1 to 2147483647. The default is 5000.
    Initial Start Point for Incremental Load
    Incremental loads
    If you want to customize the position in the source logs from which the database ingestion and replication job starts reading change records the first time it runs, select one of the following options:
    • - Latest Available. The latest available position in the database log or structure.
    • - Specific Date and Time. A date and time, in the format MM/DD/YYYY hh:mm AM|PM, that Database Ingestion and Replication uses to determine the position in the change stream from which to start retrieving change records. Database Ingestion and Replication retrieves only the changes that were started after this date and time. If you enter a date and time earlier than the earliest date and time in the available archived logs, the job will fail.
    The default is Latest Available.
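    As a quick sanity check of a Specific Date and Time value before you enter it, you can parse it with a format string that matches the MM/DD/YYYY hh:mm AM|PM pattern described above. This is only an illustration; the wizard validates the field itself.
        from datetime import datetime

        def parse_start_point(value):
            """Validate a Specific Date and Time value in MM/DD/YYYY hh:mm AM|PM format."""
            return datetime.strptime(value, "%m/%d/%Y %I:%M %p")

        print(parse_start_point("07/15/2025 09:30 AM"))   # 2025-07-15 09:30:00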
    4Under Custom Properties, you can specify one or more custom properties that Informatica provides to improve performance or to meet your special requirements. To add a property, click the + icon to add a row. In the Property Name field, select a property and enter the value, or select the Custom option and manually enter both the property name and value.
    The following table describes the properties that are available, depending on the load type:
    Property
    Description
    Read Event Batch Size
    The number of payload events written in batch to the internal event queue during CDC processing.
    When the event queue is implemented as an internal ring buffer, this value is the number of payload events that the reader writes to a single internal buffer slot.
    Note: A batch size that's too small might increase contention between threads. A larger batch size can provide for more parallelism but consume more memory.
    Reader Helper Thread Count
    The number of reader helper threads used during CDC processing to convert change data into a canonical format that can be passed to the target.
    Default value is 3. You can enter a larger value to allow more threads to be available for performing conversion processing in parallel.
    Unload Helper Thread Count
    The number of unload helper threads allocated to an initial load job or the unload phase of a combined job to convert the unloaded data rows into a canonical format that can be passed to the writer.
    Default value is 2. If two threads can’t keep up with the incoming volume, you can increase the number of threads. Consider increasing the number of threads in the following situations: 1) parallel SQL result sets are open, which usually occurs when source partitioning is enabled, or 2) some rows are very large or wide, which increases conversion time.
    Custom
    Select this option to manually enter the name of a property and its value. Use this option to enter properties that Informatica Global Customer Support or a technical staff member has provided to you for a special case. Available for any supported load type.
    Custom properties are intended to address performance or special processing needs. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).
    Tip: To delete a custom property after you've entered it, click the Delete icon at the right end of the property row.
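    The Read Event Batch Size trade-off described above can be pictured with a bounded queue: the reader enqueues payload events in batches, so a larger batch means fewer queue operations (less contention between threads) but more memory held per slot. The following is a conceptual sketch only, not the internal event queue itself.
        from queue import Queue

        def enqueue_in_batches(events, batch_size, queue_slots=8):
            """Group events into batches and count how many queue operations are needed."""
            q = Queue(maxsize=queue_slots)
            operations = 0
            for i in range(0, len(events), batch_size):
                q.put(events[i:i + batch_size])   # one slot holds one batch of payload events
                q.get()                           # consumer side, kept trivial for the sketch
                operations += 1
            return operations

        events = list(range(10_000))
        print(enqueue_in_batches(events, batch_size=100))    # 100 queue operations
        print(enqueue_in_batches(events, batch_size=1000))   # 10 operations, larger slots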
    5Click Next to proceed to Step 2 of Task Details.

Configure a MySQL source

Define source properties for the source that you selected on the Source page.
    1Under Source Properties, configure the basic properties:
    Property
    Description
    Load Type
    The type of load operation that the database ingestion and replication task performs. Options are:
    • - Initial Load. Loads data read at a specific point in time from source tables to a target in a batch operation. You can perform an initial load to materialize a target to which incremental change data will be sent.
    • - Incremental Load. Propagates source data changes to a target continuously or until the job is stopped or ends. The job replicates the changes that have occurred since the last time the job ran or from a specific start point for the first job run.
    • - Initial and Incremental Loads. Performs an initial load of point-in-time data to the target and then automatically switches to replicating incremental data changes made to the same source tables on a continuous basis.
    • Note: If a change record is captured during the initial unload phase, it's withheld from apply processing until after the unload phase completes. Any insert rows captured during the unload phase are converted into a pair of delete and insert operations so that only one insert row is applied to the target when the insert occurs in both the unloaded data and the captured change data.
    Schema
    The source schema that includes the source tables.
    The list includes only the schemas that are available in the database accessed with the specified source connection.
    After you specify a schema, the Source Tables section appears and lists the tables in the schema.
    2Under Source Tables, select the source tables that you want to replicate data from. You can select tables manually, define selection rules, or do both.
    If you first manually select tables on the Selected Tables tab, rules are generated and displayed for those selections on the Selection Rules tab. Similarly, if you first define rules, any tables selected by those rules are displayed as selected on the Selected Tables tab.
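    As a rough illustration of how include and exclude selection rules can combine, the following sketch keeps a table if it matches an include pattern and no exclude pattern. The wildcard syntax here is only an assumption for the sketch, not the wizard's actual rule syntax.
        import fnmatch

        def select_tables(tables, include_rules, exclude_rules):
            """Keep tables that match any include pattern and no exclude pattern."""
            included = [t for t in tables
                        if any(fnmatch.fnmatch(t, rule) for rule in include_rules)]
            return [t for t in included
                    if not any(fnmatch.fnmatch(t, rule) for rule in exclude_rules)]

        tables = ["ORDERS", "ORDER_ITEMS", "ORDERS_ARCHIVE", "CUSTOMERS"]
        print(select_tables(tables, include_rules=["ORDER*"], exclude_rules=["*_ARCHIVE"]))
        # ['ORDERS', 'ORDER_ITEMS']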
    3To configure advanced source properties, toggle on Show Advanced Options at the top of the page. Advanced source properties are optional or have default values. Complete the following optional advanced properties as needed:
    Property
    Load Type
    Description
    List Tables by Rule Type
    All
    Generate and download a list of the source tables that match the table selection criteria.
    If you used rule-based table selection, you can select the type of selection rules to use. Options are:
    • - Include Rules Only
    • - Exclude Rules Only
    • - Include And Exclude Rules
    Select the Include Columns check box to include columns in the list, regardless of which table selection method you used.
    Click the Download icon to download the list.
    Enable Persistent Storage
    Incremental and Initial and incremental loads
    Note: The property is not available if you are using a serverless runtime environment.
    Select this check box to enable persistent storage of transaction data in a disk buffer so that the data can be consumed continually, even when the writing of data to the target is slow or delayed.
    Benefits of using persistent storage are faster consumption of the source transaction logs, less reliance on log archives or backups, and the ability to still access the data persisted in disk storage after restarting a database ingestion job.
    Persisted data is stored on the Secure Agent. It is not encrypted. Ensure that the Secure Agent's files and directories are secured against unwanted access by using native file system access permissions or file system encryption.
    Initial Start Point for Incremental Load
    Incremental loads
    If you want to customize the position in the source logs from which the database ingestion and replication job starts reading change records the first time it runs, select one of the following options:
    • - Earliest Available. The earliest available position in the database log or structure where changes are stored. For MySQL, the earliest available record in the first binlog file.
    • - Latest Available. The latest available position in the database log or structure.
    The default is Latest Available.
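    Persistent storage, described above, decouples reading the source logs from writing to the target by spooling captured changes to disk on the Secure Agent so that a slow writer never blocks the reader. The following sketch shows only the general idea of such a disk buffer, under that assumption; it is not the product's storage format.
        import json
        import tempfile
        from pathlib import Path

        spool = Path(tempfile.mkdtemp()) / "changes.jsonl"

        def persist(change):
            """Append a captured change to the disk buffer so the reader never waits on the writer."""
            with spool.open("a") as f:
                f.write(json.dumps(change) + "\n")

        def drain():
            """Consume buffered changes later, at the writer's own pace, even after a restart."""
            with spool.open() as f:
                return [json.loads(line) for line in f]

        persist({"op": "INSERT", "table": "ORDERS", "id": 1})
        persist({"op": "UPDATE", "table": "ORDERS", "id": 1})
        print(len(drain()))   # 2 changes are available for the target writer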
    4Under Custom Properties, you can specify one or more custom properties that Informatica provides to improve performance or to meet your special requirements. To add a property, click the + icon to add a row. In the Property Name field, select a property and enter the value, or select the Custom option and manually enter both the property name and value.
    The following table describes the properties that are available, depending on the load type:
    Property
    Description
    Read Event Batch Size
    The number of payload events written in batch to the internal event queue during CDC processing.
    When the event queue is implemented as an internal ring buffer, this value is the number of payload events that the reader writes to a single internal buffer slot.
    Note: A batch size that's too small might increase contention between threads. A larger batch size can provide for more parallelism but consume more memory.
    Reader Helper Thread Count
    The number of reader helper threads used during CDC processing to convert change data into a canonical format that can be passed to the target.
    Default value is 3. You can enter a larger value to allow more threads to be available for performing conversion processing in parallel.
    Unload Helper Thread Count
    The number of unload helper threads allocated to an initial load job or the unload phase of a combined job to convert the unloaded data rows into a canonical format that can be passed to the writer.
    Default value is 2. If two threads can’t keep up with the incoming volume, you can increase the number of threads. Consider increasing the number of threads in the following situations: 1) parallel SQL result sets are open, which usually occurs when source partitioning is enabled, or 2) some rows are very large or wide, which increases conversion time.
    Custom
    Select this option to manually enter the name of a property and its value. Use this option to enter properties that Informatica Global Customer Support or a technical staff member has provided to you for a special case. Available for any supported load type.
    Custom properties are intended to address performance or special processing needs. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).
    Tip: To delete a custom property after you've entered it, click the Delete icon at the right end of the property row.
    5Click Next to proceed to Step 2 of Task Details.

Configure a Netezza source

Define source properties for the source that you selected on the Source page.
    1Under Source Properties, configure the basic properties:
    Property
    Description
    Load Type
    The type of load operation that the database ingestion and replication task performs. The only available option is:
    • - Initial Load. Loads data read at a specific point in time from source tables to a target in a batch operation. You can perform an initial load to materialize a target to which incremental change data will be sent.
    Schema
    The source schema that includes the source tables.
    The list includes only the schemas that are available in the database accessed with the specified source connection.
    After you specify a schema, the Source Tables section appears and lists the tables in the schema.
    2Under Source Tables, select the source tables that you want to replicate data from. You can select tables manually, define selection rules, or do both.
    If you first manually select tables on the Selected Tables tab, rules are generated and displayed for those selections on the Selection Rules tab. Similarly, if you first define rules, any tables selected by those rules are displayed as selected on the Selected Tables tab.
    3To configure advanced source properties, toggle on Show Advanced Options at the top of the page. Advanced source properties are optional or have default values. Complete the following optional advanced properties as needed:
    Property
    Load Type
    Description
    List Tables by Rule Type
    All
    Generate and download a list of the source tables that match the table selection criteria.
    If you used rule-based table selection, you can select the type of selection rules to use. Options are:
    • - Include Rules Only
    • - Exclude Rules Only
    • - Include And Exclude Rules
    Select the Include Columns check box to include columns in the list, regardless of which table selection method you used.
    Click the Download icon to download the list.
    4Under Custom Properties, you can enter one or more custom properties that Informatica provides to improve performance or to meet your special requirements. To add a property, click the + icon to add a row. In the Property Name field, select a property and then enter a property value, or select the Custom option and manually enter both the property name and value.
    The following table describes the properties that are available for initial load jobs with this source:
    Property
    Description
    Unload Helper Thread Count
    The number of unload helper threads allocated to an initial load job or the unload phase of a combined job to convert the unloaded data rows into a canonical format that can be passed to the writer.
    Default value is 2. If two threads can’t keep up with the incoming volume, you can increase the number of threads. Consider increasing the number of threads in the following situations: 1) parallel SQL result sets are open, which usually occurs when source partitioning is enabled, or 2) some rows are very large or wide, which increases conversion time.
    Custom
    Select this option to manually enter the name of a property and its value. Use this option to enter properties that Informatica Global Customer Support or a technical staff member has provided to you for a special case. Available for any supported load type.
    Custom properties are intended to address performance or special processing needs. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).
    Tip: To delete a custom property after you've entered it, click the Delete icon at the right end of the property row.
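    Unload helper threads convert unloaded rows into the canonical format in parallel with the unload itself. The following minimal sketch shows that pattern with the default of 2 helper threads; to_canonical is only a stand-in for the real conversion.
        from concurrent.futures import ThreadPoolExecutor

        UNLOAD_HELPER_THREAD_COUNT = 2   # default value

        def to_canonical(row):
            """Stand-in for converting one unloaded row into the writer's canonical format."""
            return {f"col{i}": value for i, value in enumerate(row)}

        rows = [(n, f"name-{n}", n * 1.5) for n in range(1000)]
        with ThreadPoolExecutor(max_workers=UNLOAD_HELPER_THREAD_COUNT) as pool:
            canonical_rows = list(pool.map(to_canonical, rows, chunksize=100))
        print(len(canonical_rows))   # 1000 rows ready to pass to the writer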
    5Click Next to proceed to Step 2 of Task Details.

Configure an Oracle source

Define source properties for the source that you selected on the Source page.
    1Under Source Properties, configure the basic properties:
    Property
    Load type
    Description
    Load Type
    All
    The type of load operation that the database ingestion and replication task performs. Options are:
    • - Initial Load. Loads data read at a specific point in time from source tables to a target in a batch operation. You can perform an initial load to materialize a target to which incremental change data will be sent.
    • - Incremental Load. Propagates source data changes to a target continuously or until the job is stopped or ends. The job replicates the changes that have occurred since the last time the job ran or from a specific start point for the first job run.
    • - Initial and Incremental Loads. Performs an initial load of point-in-time data to the target and then automatically switches to replicating incremental data changes made to the same source tables on a continuous basis.
    • Note: If a change record is captured during the initial unload phase, it's withheld from apply processing until after the unload phase completes. Any insert rows captured during the unload phase are converted into a pair of delete and insert operations so that only one insert row is applied to the target when the insert occurs in both the unloaded data and the captured change data.
    Schema
    All
    The source schema that includes the source tables.
    The list includes only the schemas that are available in the database accessed with the specified source connection.
    After you specify a schema, the Source Tables section appears and lists the tables in the schema.
    2Under Source Tables, select the source tables and columns that you want to replicate data from. You can select tables and columns manually, define selection rules, or do both.
    If you first manually select tables and columns on the Selected Tables tab, rules are generated and displayed for those selections on the Selection Rules tab. Similarly, if you first define rules, any tables and columns selected by those rules are displayed as selected on the Selected Tables tab. Details appear when you expand the table in the Selection Rules tab.
    3To configure advanced source properties, toggle on Show Advanced Options at the top of the page. Advanced source properties are optional or have default values. Complete the following optional advanced properties as needed:
    1. aUnder Change Data Capture Method, select a capture method if you're defining an incremental load or combined initial and incremental load task:
      1. aIn the CDC Method field, select one of the following options to indicate the method to use for capturing source changes:
      2. Method
        Description
        Log-based
        Capture Inserts, Updates, and Deletes in near real time by reading the database transaction logs. Also capture DDL changes based on schema drift settings.
        For Oracle sources, data changes are read from the Oracle redo logs.
        Query-based
        Capture Inserts and Updates by using a SQL WHERE clause that points to a CDC query column. The query column is used to identify the rows that contain the changes made to the source tables since the beginning of the CDC interval.
        Note: This option is not available if you are using a serverless runtime environment.
        Note: If you plan to use a CDC staging group to stage change data for multiple tasks in cloud storage, you must use the Log-based method.
      3. bIf you selected the Query-based option, also complete the following fields:
        • CDC Query Column Type. The column data type of the CDC query column in the source table. The column data type must combine the date and time. For Oracle, the only available option is Timestamp.
        • CDC Query Column Name. The case-sensitive name of the CDC query column in the source table. The column must be present in the source table. Maximum length is 70 characters.
        • CDC Interval. The frequency of a query-based change data capture cycle, expressed in days, hours, and minutes. You must enter a positive number in at least one of the interval fields. Otherwise, an error will be issued when you try to save the task. The default value is 5 minutes.
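        Conceptually, the query-based method described above re-queries each table once per CDC interval for rows whose query column is newer than the previous cycle. The following sketch illustrates that cycle under the assumptions that TXN is a source table, UPDATED_TS is its CDC query column, and the interval is the 5-minute default; it is not the exact query that the job issues.
            from datetime import datetime, timedelta

            CDC_INTERVAL = timedelta(minutes=5)   # default CDC interval

            def build_cycle_query(table, query_column, last_cycle_end):
                """Build a query that picks up rows changed since the previous CDC cycle."""
                return (f"SELECT * FROM {table} "
                        f"WHERE {query_column} > TIMESTAMP '{last_cycle_end:%Y-%m-%d %H:%M:%S}'")

            last_cycle_end = datetime(2025, 7, 15, 9, 0, 0)
            print(build_cycle_query("TXN", "UPDATED_TS", last_cycle_end))
            next_cycle_start = last_cycle_end + CDC_INTERVAL   # when the next capture cycle runs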
    1. aComplete the following optional advanced properties as needed for your load type and environment:
    2. Property
      Load Type
      Description
      CDC Script
      Incremental and initial and incremental loads
      Generate a script for enabling CDC on source tables and then run or download the script. Options are:
      • - Enable CDC for all columns. Enables CDC for all columns in the selected source tables.
      • Note: For source tables without a primary key, including any tables with unique indexes, CDC is enabled for all columns by default, regardless of which option is selected.
      • - Enable CDC for primary key columns. Enables CDC only for primary key columns in the selected source tables.
      • Do not use this option for any task that has a Google BigQuery target.
      For Oracle sources, the script enables supplemental logging for all or primary key columns in the selected source tables to log additional information in the redo logs.
      Click Execute to run the script if you have the required privileges. Or click the Download icon to download the script so that you can give it to your DBA to run.
      Note: This field is not displayed if you selected the Query-based CDC method.
      List Tables by Rule Type
      All
      Generate and download a list of the source tables that match the table selection criteria.
      If you used rule-based table selection, you can select the type of selection rules to use. Options are:
      • - Include Rules Only
      • - Exclude Rules Only
      • - Include And Exclude Rules
      Select the Include Columns check box to include columns in the list, regardless of which table selection method you used.
      Click the Download icon to download the list.
      Disable Flashback
      Initial loads
      Select this check box to disable Database Ingestion and Replication use of Oracle Flashback when fetching data from the database.
      This check box is selected by default for new initial load tasks. For existing initial load tasks, this check box is cleared by default, which causes Oracle Flashback to remain enabled. For tasks that have partitioning enabled, this check box is automatically selected and unavailable for editing.
      Include LOBs
      All load types to Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen 2, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, Oracle Cloud Object Storage, PostgreSQL, Snowflake, or SQL Server targets.
      Incremental loads and combined loads can use either the Log-based or Query-based CDC method. However, jobs that use the Log-based CDC method do not replicate data from LONG, LONG RAW, and XML columns to the generated target columns.
      Select this check box if the source contains the large-object (LOB) columns from which you want to replicate data to a target.
      LOB data types for Oracle sources: BLOB, CLOB, NCLOB, LONG, LONG RAW, and XML
      LOB data might be truncated on the target. For more information, see About LOB truncation.
      Enable Persistent Storage
      Incremental and initial and incremental loads.
      Note: The property is not available if you are using a serverless runtime environment.
      For sources that use the query-based CDC method, this field is not displayed because persistent storage is enabled by default and cannot be changed.
      Select this check box to enable persistent storage of transaction data in a disk buffer so that the data can be consumed continually, even when the writing of data to the target is slow or delayed.
      Benefits of using persistent storage are faster consumption of the source transaction logs, less reliance on log archives or backups, and the ability to still access the data persisted in disk storage after restarting a database ingestion job.
      Persisted data is stored on the Secure Agent. It is not encrypted. Ensure that the Secure Agent's files and directories are secured against unwanted access by using native file system access permissions or file system encryption.
      If you also select Stage CDC Data, this check box is cleared and becomes unavailable.
      Enable Partitioning
      Initial and initial and incremental loads
      Select this check box to enable partitioning of source objects. When an object is partitioned, the database ingestion and replication job processes the records read from each partition in parallel.
      For Oracle sources, Database Ingestion and Replication determines the range of partitions by using the ROWID as the partition key. When you select the Enable Partitioning check box, the Disable Flashback check box is also automatically selected.
      Note: In combined initial and incremental loads, the partitioning of source objects occurs only in the initial load phase.
      Number of Partitions
      Initial and initial and incremental loads
      If you enable partitioning of source objects, enter the number of partitions you want to create. The default number is 5. The minimum value is 2.
      Initial Start Point for Incremental Load
      Incremental loads
      If you want to customize the position in the source logs from which the database ingestion and replication job starts reading change records the first time it runs, select one of the following options:
      • - Earliest Available. The earliest available position in the database log or structure where changes are stored. For Oracle, this option is not available unless you select Stage CDC Data.
      • - Latest Available. The latest available position in the database log or structure.
      • - Position. An Oracle SCN in the change stream from which you want the database ingestion job to start retrieving change records. This value must be equal to or less than the current position value. An invalid value will cause the job to fail. The value of 0 is displayed by default, which causes the latest available point to be used.
      • This option is not available if the CDC Method is set to Query-based.
        If you select the Stage CDC Data option, this option is not available.
      • - Specific Date and Time. A date and time, in the format MM/DD/YYYY hh:mm AM|PM, that Database Ingestion and Replication uses to determine the position in the change stream from which to start retrieving change records. Database Ingestion and Replication retrieves only the changes that were started after this date and time. If you enter a date and time earlier than the earliest date and time in the available archived logs, the job will fail.
      • If you select the Stage CDC Data option, this option is not available.
      The default is Latest Available.
      Stage CDC Data
      Incremental and initial and incremental loads
      If you want to read data from the source database in a single pass and then write the data to common storage so that it can be read by multiple tasks that process the same database, select this check box. The staged data can then be read by the tasks that are in the staging group. For a log-based source, the tasks can process different tables with different schemas.
      Staging Group
      Incremental and initial and incremental loads
      If you want to use a CDC staging group to stage CDC data for multiple tasks, either select a previously defined group from the list or click New to create one and proceed to step 4.
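    The Position option described above expects an Oracle SCN. If you need to find the database's current SCN to choose a sensible value, you can query V$DATABASE, as in the following sketch. It assumes the python-oracledb driver is installed; the connection details are placeholders for your own source database.
        import oracledb   # python-oracledb, assumed to be installed

        # Placeholder connection details; replace them with values for your source database.
        with oracledb.connect(user="dbmi_user", password="***", dsn="dbhost:1521/ORCLPDB1") as conn:
            with conn.cursor() as cursor:
                cursor.execute("SELECT CURRENT_SCN FROM V$DATABASE")
                current_scn, = cursor.fetchone()
                print(current_scn)   # use an SCN at or below this value as the Position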
    4If you chose to create a new staging group, complete the following fields on the Overview tab of the New Staging Group dialog box:
    Property
    Description
    Group Name
    Accept the generated group name, which has the format Log_group_cdc_yyyymmdd<number>, or enter a custom name.
    Location
    The project in which you want to store the staging group definition, if you don't want to use the Default location.
    Staging Location Connection
    Select a connection to the cloud staging location, which can be in Amazon S3, Google Cloud Storage, or Microsoft Azure Data Lake Storage Gen2.
    Runtime Environment
    Select the runtime environment to use for running the CDC staging task.
    For a log-based source, ensure that the Secure Agent on which the task will run can access all of the source logs.
    Enable Alternate Connection for Reading Logs
    Select this check box if you want to use a source connection other than the one selected on the Source page to read data from source logs. For example, if you perform initial loads and CDC, you might want to select a connection with additional privileges required for CDC.
    Alternate Connection for Reading Logs
    Select the alternate connection for reading logs.
    Include LOBs
    Select this check box if you want all CDC tasks in the staging group to capture and stage LOB data for their tables in the source database.
    Note: If you previously selected this option under Advanced source properties, initial load jobs will read the LOB data too.
    If you want to enable LOB data capture after running the CDC staging task, either Redeploy the task or create another staging group with this option enabled.
    Row Flush Threshold
    The maximum number of rows that can be written to the files that store data temporarily before the data is transferred to cloud storage. The data is flushed to cloud storage either when this number of rows is reached or when the flush interval expires.
    Default is 50000 rows.
    Flush Interval
    During periods of low change activity on the source, the number of minutes and seconds that a job waits for more change data before flushing the data in the temporary files to cloud storage. The data is flushed either when this interval expires or when the row flush threshold is met.
    Default is 30 seconds.
    Log Start Point
    The position in the source logs from which the CDC staging job starts reading change records the first time it runs. Options are:
    • - Latest Available
    • - Position. Then enter an appropriate position in the change stream for your source type.
    • - Specific Date and Time. Then specify the date and time.
    Note: These options are similar to those described for Initial Start Point for Incremental Load in the source properties.
    Default is Latest Available.
    Staging Data Retention Period
    The number of days to retain data in cloud storage. Valid values are 0-365. Default is 14 days.
    After the retention period expires, the data is purged and is no longer available for subsequent restarts.
    If you want to add source and target custom properties for the CDC staging jobs in the group, click the Custom Properties tab. Then enter each property name and value. Typically, you use custom properties only at the direction of Informatica Global Customer Support or Informatica technical staff.
    When done, click Save to save the staging group and return to the Task Details, Step 1 page.
    Note: If you edit the staging group later from a deployed task in the group, you can change only the Row Flush Threshold, Flush Interval, Staging Data Retention Period, and custom property values.
    5Under Custom Properties, you can specify one or more custom properties that Informatica provides to improve performance or to meet your special requirements. To add a property, click the + icon to add a row. In the Property Name field, select a property and then enter a property value, or select the Custom option and manually enter both the property name and value.
    The following table describes the properties that are available, depending on the load type:
    Property
    Description
    Read Event Batch Size
    The number of payload events written in batch to the internal event queue during CDC processing.
    When the event queue is implemented as an internal ring buffer, this value is the number of payload events that the reader writes to a single internal buffer slot.
    Note: A batch size that's too small might increase contention between threads. A larger batch size can provide for more parallelism but consume more memory.
    Reader Helper Thread Count
    The number of reader helper threads used during CDC processing to convert change data into a canonical format that can be passed to the target.
    Default value is 3. You can enter a larger value to allow more threads to be available for performing conversion processing in parallel.
    Unload Helper Thread Count
    The number of unload helper threads allocated to an initial load job or the unload phase of a combined job to convert the unloaded data rows into a canonical format that can be passed to the writer.
    Default value is 2. If two threads can’t keep up with the incoming volume, you can increase the number of threads. Consider increasing the number of threads in the following situations: 1) parallel SQL result sets are open, which usually occurs when source partitioning is enabled, or 2) some rows are very large or wide, which increases conversion time.
    Unload JDBC Partitioning Technique
    The technique to use for partitioning source table rows for initial load processing or the unload phase of a combined load job.
    Valid values are:
    • - uniform. Select all of the primary key values from the source table and distribute those keys across the partitions evenly, producing partitions of equal or near equal size.
    • - heuristic. Estimate the primary key ranges to assign to each partition by calculating the difference between the low and high primary key values and using the number of partitions requested. With this technique, ranges are calculated more quickly because the query to select primary key values is avoided. However, the sizes of the partitions might vary widely if primary keys are not distributed evenly between high and low values.
    • For example, if 26 partitions are requested and the low primary key starts with A and the high primary key starts with Z, you might expect one partition for each set of rows with a primary key from A through Z. However, if 90% of the primary keys start with Q, R, or S, 90% of the rows are assigned to just three partitions.
    Default value is uniform.
    Unload Source Max Parallel Partition
    The maximum number of partition threads that can be used to query the source for data in parallel during initial load processing or the unload phase of combined jobs.
    Use this property to control the number of source partition queries that can be executed against the source at the same time. For example, if a table contains data in 100 partitions, all 100 partitions are queried at the same time by default. However, you can use this property to reduce the number of concurrent queries.
    Default value is equal to the total number of partitions.
    Custom
    Select this option to manually enter the name of a property and its value. Use this option to enter properties that Informatica Global Customer Support or a technical staff member has provided to you for a special case. Available for any supported load type.
    Custom properties are intended to address performance or special processing needs. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).
    If you configured a CDC staging group, any custom properties that you enter here for the source are ignored. Instead, enter custom properties for the staging group.
    Tip: To delete a custom property after you've entered it, click the Delete icon at the right end of the property row.
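    The Unload Source Max Parallel Partition property caps how many partition queries run against the source at the same time. The following sketch shows the same idea in miniature with a semaphore; unload_partition is only a stand-in for the real partition query.
        import threading

        MAX_PARALLEL_PARTITIONS = 10                     # cap on concurrent partition queries
        query_slots = threading.BoundedSemaphore(MAX_PARALLEL_PARTITIONS)

        def unload_partition(partition_id, results):
            """Stand-in for querying one source partition; the semaphore caps concurrency."""
            with query_slots:
                results.append(f"rows from partition {partition_id}")

        results, threads = [], []
        for partition_id in range(100):                  # a table with 100 partitions
            t = threading.Thread(target=unload_partition, args=(partition_id, results))
            threads.append(t)
            t.start()
        for t in threads:
            t.join()
        print(len(results))                              # all 100 partitions unloaded, at most 10 at a time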
    6Click Next to proceed to Step 2 of Task Details.

Configure a PostgreSQL source

Define source properties for the source that you selected on the Source page.
    1Under Source Properties, configure the basic properties:
    Property
    Load Type
    Description
    Load Type
    All
    The type of load operation that the database ingestion and replication task performs. Options are:
    • - Initial Load. Loads data read at a specific point in time from source tables to a target in a batch operation. You can perform an initial load to materialize a target to which incremental change data will be sent.
    • - Incremental Load. Propagates source data changes to a target continuously or until the job is stopped or ends. The job replicates the changes that have occurred since the last time the job ran or from a specific start point for the first job run.
    • - Initial and Incremental Loads. Performs an initial load of point-in-time data to the target and then automatically switches to replicating incremental data changes made to the same source tables on a continuous basis.
    • Note: If a change record is captured during the initial unload phase, it's withheld from apply processing until after the unload phase completes. Any insert rows captured during the unload phase are converted into a pair of delete and insert operations so that only one insert row is applied to the target when the insert occurs in both the unloaded data and the captured change data.
    Schema
    All
    The source schema that includes the source tables.
    The list includes only the schemas that are available in the database accessed with the specified source connection.
    After you specify a schema, the Source Tables section appears and lists the tables in the schema.
    Replication Slot Name
    Incremental and Initial and incremental loads
    The unique name of a PostgreSQL replication slot.
    A slot name can contain Latin alphanumeric characters in lowercase and the underscore (_) character. Maximum length is 63 characters.
    Important: Each database ingestion and replication task must use a different replication slot.
    Replication Plugin
    Incremental and Initial and incremental loads
    The PostgreSQL replication plug-in. Options are:
    • - pgoutput. You can select this option only for PostgreSQL version 10 and later.
    • - wal2json
    Publication
    Incremental and Initial and incremental loads
    If you selected pgoutput as the replication plug-in, specify the publication name that this plug-in uses.
    Note: This field is not displayed if you selected wal2json as the replication plug-in.
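    If the replication slot and publication don't exist yet, a DBA typically creates them before the task is deployed; the CDC Script property described later in this procedure can generate a script for you. The following is only a rough sketch of the kind of statements involved, using psycopg2; the slot, publication, table, and connection values are placeholders.
        import psycopg2   # assumed to be installed; connection values are placeholders

        conn = psycopg2.connect("dbname=sourcedb user=dbmi_user password=*** host=dbhost")
        conn.autocommit = True
        with conn.cursor() as cur:
            # Logical replication slot that the ingestion job reads change data from.
            cur.execute("SELECT pg_create_logical_replication_slot(%s, %s)",
                        ("dbmi_slot", "pgoutput"))
            # Publication used by the pgoutput plug-in; add each selected source table to it.
            cur.execute("CREATE PUBLICATION dbmi_pub FOR TABLE myschema.orders")
        conn.close()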
    2Under Source Tables, select the source tables that you want to replicate data from. You can select tables manually, define selection rules, or do both.
    If you first manually select tables on the Selected Tables tab, rules are generated and displayed for those selections on the Selection Rules tab. Similarly, if you first define rules, any tables selected by those rules are displayed as selected on the Selected Tables tab.
    3To configure advanced source properties, toggle on Show Advanced Options at the top of the page. Advanced source properties are optional or have default values. Complete the following optional advanced properties as needed:
    Property
    Load Type
    Description
    CDC Script
    Incremental and Initial and incremental loads
    Generate a script for enabling CDC on source tables and then run or download the script. The only available option is Enable CDC for all columns.
    For PostgreSQL sources, the script sets REPLICA IDENTITY FULL on the selected source tables to write all column values to the WAL file. The script also creates a replication slot of the pgoutput or wal2json type. If the slot type is pgoutput, the script also creates the publication and adds tables to it.
    Click Execute to run the script if you have the required privileges. Or click the Download icon to download the script so that you can give it to your DBA to run.
    List Tables by Rule Type
    All
    Generate and download a list of the source tables that match the table selection criteria.
    If you used rule-based table selection, you can select the type of selection rules to use. Options are:
    • - Include Rules Only
    • - Exclude Rules Only
    • - Include And Exclude Rules
    Select the Include Columns check box to include columns in the list, regardless of which table selection method you used.
    Click the Download icon to download the list.
    Include LOBs
    Initial loads to Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen 2, Microsoft Azure Synapse Analytics, Oracle, Oracle Cloud Object Storage, PostgreSQL, Snowflake, or SQL Server targets.
    Incremental loads to Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Kafka-enabled Azure Event Hubs, Microsoft Azure Data Lake Storage Gen 2, Microsoft Azure Synapse Analytics, Oracle, Oracle Cloud Object Storage, PostgreSQL, Snowflake, or SQL Server targets.
    Combined initial and incremental loads to Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen 2, Microsoft Azure Synapse Analytics, Oracle, Oracle Cloud Object Storage, PostgreSQL, Snowflake, or SQL Server targets.
    Select this check box if the source contains the large-object (LOB) columns from which you want to replicate data to a target.
    LOB data types for PostgreSQL sources: BYTEA, TEXT, and XML, plus other potentially large types such as JSON and JSONB
    LOB data might be truncated on the target. For more information, see About LOB truncation.
    Initial Start Point for Incremental Load
    Incremental loads
    If you want to customize the position in the source logs from which the database ingestion and replication job starts reading change records the first time it runs, select one of the following options:
    • - Earliest Available. The earliest available position in the database log or structure where changes are stored. For PostgreSQL, the earliest available record in the replication slot.
    • - Latest Available. The latest available position in the database log or structure.
    • - Position. A PostgreSQL LSN in the change stream from which you want the database ingestion job to start retrieving change records. This value must be equal to or less than the current position value. An invalid value will cause the job to fail. The value of 0 is displayed by default, which causes the earliest available point to be used.
    • If you select Stage CDC Data, this option is not available unless you have a SAP HANA source and use the Log-based CDC method.
    • - Specific Date and Time. A date and time, in the format MM/DD/YYYY hh:mm AM|PM, that Database Ingestion and Replication uses to determine the position in the change stream from which to start retrieving change records. Database Ingestion and Replication retrieves only the changes that were started after this date and time. If you enter a date and time earlier than the earliest date and time in the available archived logs, the job will fail.
    The default is Latest Available.
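    The Position option described above expects a PostgreSQL LSN. To see the database's current write-ahead log position when choosing a value, you can query pg_current_wal_lsn() (available in PostgreSQL 10 and later); the connection values in this sketch are placeholders.
        import psycopg2   # assumed to be installed; connection values are placeholders

        with psycopg2.connect("dbname=sourcedb user=dbmi_user password=*** host=dbhost") as conn:
            with conn.cursor() as cur:
                cur.execute("SELECT pg_current_wal_lsn()")
                current_lsn, = cur.fetchone()
                print(current_lsn)   # choose a Position at or below this LSN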
    4Under Custom Properties, you can specify one or more custom properties that Informatica provides to improve performance or to meet your special requirements. To add a property, click the + icon to add a row. In the Property Name field, select a property and enter the value, or select the Custom option and manually enter both the property name and value.
    The following table describes the properties that are available, depending on the load type:
    Property
    Description
    Read Event Batch Size
    The number of payload events written in batch to the internal event queue during CDC processing.
    When the event queue is implemented as an internal ring buffer, this value is the number of payload events that the reader writes to a single internal buffer slot.
    Note: A batch size that's too small might increase contention between threads. A larger batch size can provide for more parallelism but consume more memory.
    Reader Helper Thread Count
    The number of reader helper threads used during CDC processing to convert change data into a canonical format that can be passed to the target.
    Default value is 3. You can enter a larger value to allow more threads to be available for performing conversion processing in parallel.
    Unload Helper Thread Count
    The number of unload helper threads allocated to an initial load job or the unload phase of a combined job to convert the unloaded data rows into a canonical format that can be passed to the writer.
    Default value is 2. If two threads can’t keep up with the incoming volume, you can increase the number of threads. Consider increasing the number of threads in the following situations: 1) parallel SQL result sets are open, which usually occurs when source partitioning is enabled, or 2) some rows are very large or wide, which increases conversion time.
    Custom
    Select this option to manually enter the name of a property and its value. Use this option to enter properties that Informatica Global Customer Support or a technical staff member has provided to you for a special case. Available for any supported load type.
    Custom properties are intended to address performance or special processing needs. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).
    Tip: To delete a custom property after you've entered it, click the Delete icon at the right end of the property row.
    5Click Next to proceed to Step 2 of Task Details.

Configure a SAP HANA source

Define source properties for the source that you selected on the Source page.
    1Under Source Properties, configure the basic properties:
    Property
    Load type
    Description
    Load Type
    All
    The type of load operation that the database ingestion and replication task performs. Options are:
    • - Initial Load. Loads data read at a specific point in time from source tables to a target in a batch operation. You can perform an initial load to materialize a target to which incremental change data will be sent.
    • - Incremental Load. Propagates source data changes to a target continuously or until the job is stopped or ends. The job replicates the changes that have occurred since the last time the job ran or from a specific start point for the first job run.
    • - Initial and Incremental Loads. Performs an initial load of point-in-time data to the target and then automatically switches to replicating incremental data changes made to the same source tables on a continuous basis.
    • Note: If a change record is captured during the initial unload phase, it's withheld from apply processing until after the unload phase completes. Any insert rows captured during the unload phase are converted into a pair of delete and insert operations so that only one insert row is applied to the target when the insert occurs in both the unloaded data and the captured change data.
    Schema
    All
    The source schema that includes the source tables.
    The list includes only the schemas that are available in the database accessed with the specified source connection.
    The schema name that is specified in the connection properties is displayed by default.
    After you specify a schema, the Source Tables section appears and lists the tables in the schema.
    2Under Source Tables, select the source tables that you want to replicate data from. You can select tables manually, define selection rules, or do both.
    If you first manually select tables on the Selected Tables tab, rules are generated and displayed for those selections on the Selection Rules tab. Similarly, if you first define rules, any tables selected by those rules are displayed as selected on the Selected Tables tab.
    3To configure advanced source properties, toggle on Show Advanced Options at the top of the page. Advanced source properties are optional or have default values.
    If you're defining an incremental load or combined initial and incremental load task, the Change Data Capture Method field shows the selected CDC method, which is either Trigger-based or Log-based. For SAP HANA sources, this field is read-only because the capture type is set in the SAP HANA Database Ingestion connection properties.
    Complete the following advanced properties as needed:
    Property
    Load Type
    Description
    CDC Script
    Incremental loads and initial and incremental loads
    Generate a script for enabling CDC on source tables and then run or download the script. The only available option is Enable CDC for all columns.
    For SAP HANA and SAP HANA Cloud sources that use Trigger-based CDC, the script creates the required PKLOG, PROCESSED, and _CDC shadow tables, and also creates three triggers and a sequence for each selected source table.
    For SAP HANA sources that use Log-based CDC, the script creates the TRANSACTIONS and ROWCACHE tables in the database cache that stages the change data captured by the CDC Staging Task.
    Click Execute to run the script if you have the required privileges. Or click the Download icon to download the script so that you can give it to your DBA to run.
    List Tables by Rule Type
    All
    Generate and download a list of the source tables that match the table selection criteria.
    If you used rule-based table selection, you can select the type of selection rules to use. Options are:
    • - Include Rules Only
    • - Exclude Rules Only
    • - Include And Exclude Rules
    Select the Include Columns check box to include columns in the list, regardless of which table selection method you used.
    Click the Download icon to download the list.
    Enable Persistent Storage
    Incremental loads and initial and incremental loads.
    Note: The property is not available if you are using a serverless runtime environment.
    For SAP HANA and SAP HANA Cloud sources that use trigger-based CDC, this check box is selected by default and cannot be changed. For SAP HANA sources that use log-based CDC, it's deselected by default, but you can select it.
    Select this check box to enable persistent storage of transaction data in a disk buffer so that the data can be consumed continually, even when the writing of data to the target is slow or delayed.
    Benefits of using persistent storage are faster consumption of the source transaction logs, less reliance on log archives or backups, and the ability to still access the data persisted in disk storage after restarting a database ingestion job.
    Persisted data is stored on the Secure Agent. It is not encrypted. Ensure that the Secure Agent's files and directories are secured against unwanted access by using native file system access permissions or file system encryption.
    If you also select Stage CDC Data, this check box is cleared and becomes unavailable.
    Enable Partitioning
    Initial loads and initial and incremental loads (Log-based and trigger-based CDC)
    Select this check box to enable partitioning of source objects. When an object is partitioned, the database ingestion and replication job processes the records read from each partition in parallel.
    For SAP HANA sources, Database Ingestion and Replication determines the range of partitions by using the ROWID as the partition key.
    Note: In combined initial and incremental loads, the partitioning of source objects occurs only in the initial load phase.
    Number of Partitions
    Initial loads and initial and incremental loads (Log-based and trigger-based CDC)
    If you enable partitioning of source objects, enter the number of partitions you want to create. The default number is 5. The minimum value is 2.
    Initial Start Point for Incremental Load
    Incremental loads (Log-based and trigger-based CDC)
    If you want to customize the position in the source logs from which the database ingestion and replication job starts reading change records the first time it runs, select one of the following options:
    • - Earliest Available. The earliest available position in the database log or structure where changes are stored. For SAP HANA and SAP HANA Cloud, the earliest available record in the PKLOG table.
    • - Latest Available. The latest available position in the database log or structure.
    • - Position. An SAP HANA sequence value in the change stream from which you want the database ingestion job to start retrieving change records. This value must be equal to or less than the current position value. An invalid value will cause the job to fail. The value of 0 is displayed by default, which causes the earliest available point to be used.
    • - Specific Date and Time. A date and time, in the format MM/DD/YYYY hh:mm AM|PM, that Database Ingestion and Replication uses to determine the position in the change stream from which to start retrieving change records. Database Ingestion and Replication retrieves only the changes that were started after this date and time. If you enter a date and time earlier than the earliest date and time in the available archived logs, the job will fail.
    • If you have an SAP HANA source and use the Log-based CDC method, enter this value in UTC. Log-based change data capture stores all timestamps in UTC. For a sketch of the date-time format and UTC interpretation, see the example after these property descriptions.
    The default is Latest Available.
    Stage CDC Data
    Incremental loads and initial and incremental loads (Log-based CDC only)
    If you want to read data from the source database in a single pass and then write the data to common storage so that it can be read by multiple tasks that process the same database, select this check box. The staged data can then be read by the tasks that are in the staging group. For a log-based source, the tasks can process different tables with different schemas.
    For tasks that have an SAP HANA source and use the log-based CDC method, this check box is selected and unavailable for editing.
    Staging Group
    Incremental loads and initial and incremental loads (Log-based CDC only)
    If you want to use a CDC staging group to stage CDC data for multiple tasks, either select a previously defined group from the list or click New to create one and proceed to step 4.
    For tasks that have an SAP HANA source and use the log-based CDC method, you must create a new group if the source connection isn't already associated with a CDC Staging Task that's been deployed. If another log-based task that uses the same connection is already associated with a deployed staging group, this field displays that group name in read-only mode. You can edit the group properties if necessary.
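    For reference, the following minimal Python sketch (not part of the product) shows how a value in the MM/DD/YYYY hh:mm AM|PM format can be parsed and treated as UTC, which is the form expected for SAP HANA Log-based CDC. The timestamp is a placeholder value for the example.
    from datetime import datetime, timezone

    # Timestamp entered in the wizard, in MM/DD/YYYY hh:mm AM|PM format.
    entered = "06/15/2024 09:30 PM"                      # example value only
    parsed = datetime.strptime(entered, "%m/%d/%Y %I:%M %p")

    # For SAP HANA Log-based CDC, the entered value must already be in UTC.
    as_utc = parsed.replace(tzinfo=timezone.utc)
    print(as_utc.isoformat())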
    4If you chose to create a new staging group, complete the following fields on the Overview tab of the New Staging Group dialog box:
    Property
    Description
    Group Name
    Accept the generated group name, which has the format Log_group_cdc_yyyymmdd<number>, or enter a custom name.
    Location
    The project in which you want to store the staging group definition, if you don't want to use the Default location.
    Runtime Environment
    Select the runtime environment to use for running the CDC staging task.
    For a log-based source, ensure that the Secure Agent on which the task will run can access all of the source logs.
    Staging Data Retention Period
    The number of days to retain data in cloud storage.
    For SAP HANA (log-based) sources, this field is read only. It displays the number of days set in the Log Clear property of the SAP HANA Database Ingestion connection. After this period elapses, the data staged in the cache database is compacted.
    If you want to add source and target custom properties for the CDC staging jobs in the group, click the Custom Properties tab. Then under Source Properties or Target Properties, click the Add new row + icon and enter a property name and value. Typically, you use custom properties at the direction of Informatica Global Customer Support or Informatica technical staff.
    When done, click Save to save the staging group and return to the Task Details, Step 1 page.
    Note: If you edit the staging group later from a deployed task in the group, you can change only the Row Flush Threshold, Flush Interval, Staging Data Retention Period, and custom property values.
    5Under Custom Properties, you can specify one or more custom properties that Informatica provides to improve performance or to meet your special requirements. To add a property, click the + icon to add a row. In the Property Name field, select a property and then enter a property value, or select the Custom option and manually enter both the property name and value.
    The following table describes the properties that are available, depending on the load type:
    Property
    Description
    Read Event Batch Size
    The number of payload events written in batch to the internal event queue during CDC processing.
    When the event queue is implemented as an internal ring buffer, this value is the number of payload events that the reader writes to a single internal buffer slot.
    Note: A batch size that's too small might increase contention between threads. A larger batch size can provide for more parallelism but consume more memory.
    Reader Helper Thread Count
    The number of reader helper threads used during CDC processing to convert change data into a canonical format that can be passed to the target.
    Default value is 3. You can enter a larger value to allow more threads to be available for performing conversion processing in parallel.
    Unload Helper Thread Count
    The number of unload helper threads allocated to an initial load job or the unload phase of a combined job to convert the unloaded data rows into a canonical format that can be passed to the writer.
    Default value is 2. If two threads can’t keep up with the incoming volume, you can increase the number of threads. Consider increasing the number of threads in the following situations: 1) parallel SQL result sets are open, which usually occurs when source partitioning is enabled, or 2) some rows are very large or wide, which increases conversion time.
    Unload JDBC Partitioning Technique
    The technique to use for partitioning source table rows for initial load processing or the unload phase of a combined load job.
    Valid values are:
    • - uniform. Select all of the primary key values from the source table and distribute those keys across the partitions evenly, producing partitions of equal or near equal size.
    • - heuristic. Estimate the primary key ranges to assign to each partition by calculating the difference between the low and high primary key values and using the number of partitions requested. With this technique, ranges are calculated more quickly because the query to select primary key values is avoided. However, the sizes of the partitions might vary widely if primary keys are not distributed evenly between high and low values.
    • For example, if 26 partitions are requested and the low primary key starts with A and the high primary key starts with Z, you might expect one partition for each set of rows with a primary key from A through Z. However, if 90% of the primary keys start with Q, R, or S, 90% of the rows are assigned to just three partitions.
    Default value is uniform. For a sketch that contrasts the two techniques, see the example after these property descriptions.
    Unload Source Max Parallel Partition
    The maximum number of partition threads that can be used to query the source for data in parallel during initial load processing or the unload phase of combined jobs.
    Use this property to control the number of source partition queries that can be executed against the source at the same time. For example, if a table contains data in 100 partitions, all 100 partitions are queried at the same time by default. However, you can use this property to reduce the number of concurrent queries.
    Default value is equal to the total number of partitions.
    Custom
    Select this option to manually enter the name of a property and its value. Use this option to enter properties that Informatica Global Customer Support or a technical staff member has provided to you for a special case. Available for any supported load type.
    Custom properties are intended to address performance or special processing needs. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).
    If you configured a CDC staging group, any custom properties that you enter here for the source are ignored. Instead, enter custom properties for the staging group.
    Tip: To delete a custom property after you've entered it, click the Delete icon at the right end of the property row.
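    The difference between the uniform and heuristic techniques described for Unload JDBC Partitioning Technique can be illustrated with a minimal Python sketch. The key values and partition count are hypothetical, and the sketch is not the product's partitioning logic:
    # Hypothetical numeric primary keys: a few low values and a cluster of high values.
    keys = sorted(list(range(0, 10)) + list(range(900, 990)))
    num_partitions = 4

    # uniform: read the actual key values and split them into near-equal slices.
    slice_size = -(-len(keys) // num_partitions)            # ceiling division
    uniform = [keys[i:i + slice_size] for i in range(0, len(keys), slice_size)]

    # heuristic: divide the low-to-high key range into equal-width bands without
    # selecting every key, so partition sizes can vary widely when keys cluster.
    low, high = keys[0], keys[-1]
    width = (high - low) / num_partitions
    heuristic = [[k for k in keys
                  if low + p * width <= k < low + (p + 1) * width
                  or (p == num_partitions - 1 and k == high)]
                 for p in range(num_partitions)]

    print([len(p) for p in uniform])     # [25, 25, 25, 25] - evenly sized
    print([len(p) for p in heuristic])   # [10, 0, 0, 90]   - skewed when keys cluster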
    6Click Next to proceed to Step 2 of Task Details.

Configure a Teradata source

Define source properties for the source that you selected on the Source page.
    1Under Source Properties, configure the basic properties:
    Property
    Description
    Load Type
    The type of load operation that the database ingestion and replication task performs. The only available option is:
    • - Initial Load. Loads data read at a specific point in time from source tables to a target in a batch operation. You can perform an initial load to materialize a target to which incremental change data will be sent.
    Schema
    The source schema that includes the source tables.
    The list includes only the schemas that are available in the database accessed with the specified source connection.
    After you specify a schema, the Source Tables section appears and lists the tables in the schema.
    2Under Source Tables, select the source tables that you want to replicate data from. Use one or both of the following methods:
    You can both manually select source tables and define selection rules. If you first manually select tables on the Selected Tables tab, rules are generated and displayed for those selections on the Selection Rules tab. Similarly, if you first define rules, any tables selected by those rules are displayed as selected on the Selected Tables tab.
    3To configure advanced source properties, toggle on Show Advanced Options at the top of the page. Advanced source properties are optional or have default values. Complete the following optional advanced properties as needed:
    Property
    Load Type
    Description
    List Tables by Rule Type
    All
    Generate and download a list of the source tables that match the table selection criteria.
    If you used rule-based table selection, you can select the type of selection rules to use. Options are:
    • - Include Rules Only
    • - Exclude Rules Only
    • - Include And Exclude Rules
    Select the Include Columns check box to include columns in the list, regardless of which table selection method you used.
    Click the Download icon to download the list.
    4Under Custom Properties, you can enter one or more custom properties that Informatica provides to improve performance or to meet your special requirements. To add a property, click the + icon to add a row. In the Property Name field, select a property and then enter a property value, or select the Custom option and manually enter both the property name and value.
    The following table describes the properties that are available for initial load jobs with this source:
    Property
    Description
    Unload Helper Thread Count
    The number of unload helper threads allocated to an initial load job or the unload phase of a combined job to convert the unloaded data rows into a canonical format that can be passed to the writer.
    Default value is 2. If two threads can’t keep up with the incoming volume, you can increase the number of threads. Consider increasing the number of threads in the following situations: 1) parallel SQL result sets are open, which usually occurs when source partitioning is enabled, or 2) some rows are very large or wide, which increases conversion time.
    Custom
    Select this option to manually enter the name of a property and its value. Use this option to enter properties that Informatica Global Customer Support or a technical staff member has provided to you for a special case. Available for any supported load type.
    Custom properties are intended to address performance or special processing needs. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).
    Tip: To delete a custom property after you've entered it, click the Delete icon at the right end of the property row.
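    A minimal sketch of the custom property naming rule stated above, assuming a simple regular-expression check (this is an illustration, not the product's validation logic):
    import re

    # Property names may contain only alphanumerics, periods, hyphens, and underscores.
    VALID_PROPERTY_NAME = re.compile(r"^[A-Za-z0-9._-]+$")

    def is_valid_property_name(name: str) -> bool:
        return bool(VALID_PROPERTY_NAME.match(name))

    print(is_valid_property_name("my.custom_property-1"))    # True
    print(is_valid_property_name("bad name!"))               # False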
    5Click Next to proceed to Step 2 of Task Details.

Task Details: Configure how to replicate data to the target

Configure the data target in Step 2 of Task Details.
    • Under Target Properties, set the required basic target properties. Then toggle on Show Advanced Options at the top of the page to set optional advanced target properties as needed. See the property descriptions for your target type:

Configure an Amazon Redshift target

Define target properties for the destination that you selected on the Destination page.
    1Under Target Properties, define the following required Amazon Redshift target properties:
    Property
    Description
    Target Creation
    The only available option is Create Target Tables, which generates the target tables based on the source tables.
    Note: After the target table is created, Database Ingestion and Replication intelligently handles the target tables on subsequent job runs. Database Ingestion and Replication might truncate or re-create the target tables depending on the specific circumstances.
    Schema
    Select the target schema in which Database Ingestion and Replication creates the target tables.
    Bucket
    Specifies the name of the Amazon S3 bucket that stores, organizes, and controls access to the data objects that you load to Amazon Redshift.
    Data Directory or Task Target Directory
    Specifies the subdirectory where Database Ingestion and Replication stores output files for jobs associated with the task. This field is called Data Directory for an initial load job or Task Target Directory for an incremental load or combined initial and incremental load job.
    2To view advanced properties, toggle on Show Advanced Options. Then under Advanced Target Properties, define any of the following optional advanced target properties that you want to use:
    Property
    Description
    Add Cycle ID
    Select this check box to add a metadata column that includes the cycle ID of each CDC cycle in each target table. A cycle ID is a number that's generated by the CDC engine for each successful CDC cycle. If you integrate the job with Data Integration taskflows, the job can pass the minimum and maximum cycle IDs in output fields to the taskflow so that the taskflow can determine the range of cycles that contain new CDC data. This capability is useful if data from multiple cycles accumulates before the previous taskflow run completes. By default, this check box is not selected.
    Prefix for Metadata Columns
    Add a prefix to the names of the added metadata columns to easily identify them and to prevent conflicts with the names of existing columns.
    The default value is INFA_.
    Enable Case Transformation
    By default, target table names and column names are generated in the same case as the corresponding source names, unless cluster-level or session-level properties on the target override this case-sensitive behavior. If you want to control the case of letters in the target names, select this check box. Then select a Case Transformation Strategy option.
    Case Transformation Strategy
    If you selected Enable Case Transformation, select one of the following options to specify how to handle the case of letters in generated target table (or object) names and column (or field) names:
    • - Same as source. Use the same case as the source table (or object) names and column (or field) names.
    • - UPPERCASE. Use all uppercase.
    • - lowercase. Use all lowercase.
    The default value is Same as source.
    Note: The selected strategy will override any cluster-level or session-level properties on the target for controlling case.
    3Under Table Renaming Rules, if you want to rename the target objects that are associated with the selected source tables, define renaming rules. Click the + (Add new row) icon and enter a source table name or name mask and enter a corresponding target table name or name mask. To define a mask, include one or more asterisk (*) wildcards. Then press Enter.
    For example, to add the prefix "PROD_" to the names of target tables that correspond to all selected source tables, enter the * wildcard for the source table and enter PROD_* for the target table.
    You can enter multiple rules. A sketch of how a renaming mask is applied appears after this step.
    Notes:
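    To make the mask behavior concrete, the following Python sketch applies a source mask of * and a target mask of PROD_* in the way described above. The table names are hypothetical, the sketch covers only the simple case of an all-wildcard source mask, and it is an illustration rather than the product's renaming logic:
    import fnmatch

    def rename(table, source_mask, target_mask):
        # If the table name matches the source mask, substitute the table name
        # into the * position of the target mask; otherwise keep the name as is.
        if not fnmatch.fnmatch(table, source_mask):
            return table
        return target_mask.replace("*", table, 1)

    for table in ["CUSTOMERS", "ORDERS"]:            # hypothetical source tables
        print(rename(table, "*", "PROD_*"))          # PROD_CUSTOMERS, PROD_ORDERS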
    4Under Data Type Rules, if you want to override the default mappings of source data types to target data types, define data type rules. Click the + (Add new row) icon and enter a source data type and corresponding target data type. Then press Enter.
    This feature is supported for tasks that have the following source and target combinations:
    For example, you can create a rule that maps Oracle NUMBER columns that have no precision to Snowflake target NUMBER() columns that also have no precision.
    Also, in the Source Data Type value, you can include the percent (%) wildcard to represent the data type precision, scale, or size, for example, NUMBER(%,4), NUMBER(8,%), or NUMBER(%). Use the wildcard to cover all source columns that have the same data type but use different precision, scale, or size values, instead of specifying each one individually. For example, enter FLOAT(%) to cover FLOAT(16), FLOAT(32), and FLOAT(84). You cannot enter the % wildcard in the target data type. A source data type that uses the % wildcard must map to a target data type that uses a specific precision, scale, or size value. For example, you could map the source data type FLOAT(%) to a target data type specification such as NUMBER(38,10). A sketch of how the % wildcard matches source data types appears after this step.
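    The % wildcard matching can be sketched as follows. The regular-expression translation is an assumption made for illustration and is not the product's parser:
    import re

    def matches_rule(source_type, rule_pattern):
        # A rule pattern such as FLOAT(%) matches any precision, scale, or size,
        # for example FLOAT(16), FLOAT(32), or FLOAT(84).
        regex = r"[^)]*".join(re.escape(part) for part in rule_pattern.split("%"))
        return re.fullmatch(regex, source_type) is not None

    rule = ("FLOAT(%)", "NUMBER(38,10)")             # source pattern -> target type
    for column_type in ["FLOAT(16)", "FLOAT(32)", "FLOAT(84)", "NUMBER(8,2)"]:
        if matches_rule(column_type, rule[0]):
            print(column_type, "->", rule[1])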
    5Under Custom Properties, you can enter one or more custom properties that Informatica provides to improve performance or to meet your special requirements. To add a property, click the + icon to add a row. In the Property Name field, select a property and then enter a property value, or select the Custom option and manually enter both the property name and value.
    The following table describes the properties that are available for this target:
    Property
    Description
    Writer Helper Thread Count
    The number of writer helper threads that are used to convert incoming change data rows or initial unload rows to the output format configured for the target, such as Avro, CSV, or Parquet.
    Default value is 2. If two threads can’t keep up with the incoming volume of data, you can increase the number of threads. Consider increasing the number of threads in the following situations: 1) the incoming volume is high, 2) multiple writer distributors are in use, or 3) some rows are very large or wide, which increases conversion time.
    Custom
    Select this option to manually enter the name of a property and its value. Use this option to enter properties that Informatica Global Customer Support or a technical staff member has provided to you for a special case. Available for any supported load type.
    Custom properties are intended to address performance or special processing needs. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).
    Tip: To delete a custom property after you've entered it, click the Delete icon at the right end of the property row.
    6Click Next to proceed, or click Save.

Configure an Amazon S3 target

Define target properties for the destination that you selected on the Destination page.
    1Under Target Properties, define the following required Amazon S3 target properties:
    Property
    Description
    Output Format
    Select the format of the output file. Options are:
    • - CSV
    • - AVRO
    • - PARQUET
    The default value is CSV.
    Note: Output files in CSV format use double-quotation marks ("") as the delimiter for each field.
    Add Headers to CSV File
    If CSV is selected as the output format, select this check box to add a header with source column names to the output CSV file.
    Avro Format
    If you selected AVRO as the output format, select the format of the Avro schema that will be created for each source table. Options are:
    • - Avro-Flat. This Avro schema format lists all Avro fields in one record.
    • - Avro-Generic. This Avro schema format lists all columns from a source table in a single array of Avro fields.
    • - Avro-Nested. This Avro schema format organizes each type of information in a separate record.
    The default value is Avro-Flat.
    Avro Serialization Format
    If AVRO is selected as the output format, select the serialization format of the Avro output file. Options are:
    • - None
    • - Binary
    • - JSON
    The default value is Binary.
    Avro Schema Directory
    If AVRO is selected as the output format, specify the local directory where Database Ingestion and Replication stores Avro schema definitions for each source table. Schema definition files have the following naming pattern:
    schemaname_tablename.txt
    Note: If this directory is not specified, no Avro schema definition file is produced.
    File Compression Type
    Select a file compression type for output files in CSV or AVRO output format. Options are:
    • - None
    • - Deflate
    • - Gzip
    • - Snappy
    The default value is None, which means no compression is used.
    Encryption Type
    Select the encryption type for the Amazon S3 files when you write the files to the target. Options are:
    • - None
    • - Client Side Encryption
    • - Client Side Encryption with KMS
    • - Server Side Encryption
    • - Server Side Encryption with KMS
    The default is None, which means no encryption is used.
    Avro Compression Type
    If AVRO is selected as the output format, select an Avro compression type. Options are:
    • - None
    • - Bzip2
    • - Deflate
    • - Snappy
    The default value is None, which means no compression is used.
    Parquet Compression Type
    If the PARQUET output format is selected, you can select a compression type that is supported by Parquet. Options are:
    • - None
    • - Gzip
    • - Snappy
    The default value is None, which means no compression is used.
    Deflate Compression Level
    If Deflate is selected in the Avro Compression Type field, specify a compression level from 0 to 9. The default value is 0.
    Add Directory Tags
    For incremental load and combined initial and incremental load tasks, select this check box to add the "dt=" prefix to the names of apply cycle directories to be compatible with the naming convention for Hive partitioning. This check box is cleared by default.
    Task Target Directory
    For incremental load and combined initial and incremental load tasks, the root directory for the other directories that hold output data files, schema files, and CDC cycle contents and completed files. You can use it to specify a custom root directory for the task. If you enable the Connection Directory as Parent option, you can still optionally specify a task target directory to use with the parent directory specified in the connection properties.
    This field is required if the {TaskTargetDirectory} placeholder is specified in patterns for any of the following directory fields.
    Connection Directory as Parent
    Select this check box to use the directory value that is specified in the target connection properties as the parent directory for the custom directory paths specified in the task target properties. For initial load tasks, the parent directory is used in the Data Directory and Schema Directory. For incremental load and combined initial and incremental load tasks, the parent directory is used in the Data Directory, Schema Directory, Cycle Completion Directory, and Cycle Contents Directory.
    This check box is selected by default. If you clear it, for initial loads, define the full path to the output files in the Data Directory field. For incremental loads, optionally specify a root directory for the task in the Task Target Directory.
    Data Directory
    For initial load tasks, define a directory structure for the directories where Database Ingestion and Replication stores output data files and optionally stores the schema. To define the directory pattern, you can use the following types of entries:
    • - The placeholders {SchemaName}, {TableName}, {Timestamp}, {YY}, {YYYY}, {MM}, and {DD}, where {YY}, {YYYY}, {MM}, and {DD} are for date elements. The {Timestamp} values are in the format yyyymmdd_hhmissms. The generated dates and times in the directory paths indicate when the initial load job starts to transfer data to the target.
    • - Specific directory names.
    • - The toUpper() and toLower() functions, which force the value of the associated placeholder to uppercase or lowercase.
    Note: Placeholder values are not case sensitive.
    Examples:
    myDir1/{SchemaName}/{TableName}
    myDir1/myDir2/{SchemaName}/{YYYY}/{MM}/{TableName}_{Timestamp}
    myDir1/{toLower(SchemaName)}/{TableName}_{Timestamp}
    The default directory pattern is {TableName}_{Timestamp}.
    For incremental load and combined initial and incremental load tasks, define a custom path to the subdirectory that contains the cdc-data data files. To define the directory pattern, you can use the following types of entries:
    • - The placeholders {TaskTargetDirectory}, {SchemaName}, {TableName}, {Timestamp}, {YY}, {YYYY}, {MM}, and {DD}, where {YY}, {YYYY}, {MM}, and {DD} are for date elements. The {Timestamp} values are in the format yyyymmdd_hhmissms. The generated dates and times in the directory paths indicate when the CDC cycle started.
    • If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose both the function and the placeholder in curly brackets, as shown in the preceding example.
    • - Specific directory names.
    The default directory pattern is {TaskTargetDirectory}/data/{TableName}/data. For a sketch of how these placeholders resolve, see the example after these property descriptions.
    Note: For Amazon S3, Flat File, Microsoft Azure Data Lake Storage Gen2, and Oracle Cloud Object Store targets, Database Ingestion and Replication uses the directory specified in the target connection properties as the root for the data directory path when Connection Directory as Parent is selected. For Google Cloud Storage targets, Database Ingestion and Replication uses the Bucket name that you specify in the target properties for the ingestion task. For Microsoft Fabric OneLake targets, the parent directory is the path specified in the Lakehouse Path field in the Microsoft Fabric OneLake connection properties.
    Schema Directory
    Specify a custom directory in which to store the schema file if you want to store it in a directory other than the default directory. For initial loads, previously used values, if available, appear in a drop-down list for your convenience. This field is optional.
    For initial loads, the schema is stored in the data directory by default. For incremental loads and combined initial and incremental loads, the default directory for the schema file is {TaskTargetDirectory}/data/{TableName}/schema.
    You can use the same placeholders as for the Data Directory field. Ensure that you enclose placeholders with curly brackets { }.
    If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose both the function and the placeholder in curly brackets, for example: {toLower(SchemaName)}
    Note: Schema is written only to output data files in CSV format. Data files in Parquet and Avro formats contain their own embedded schema.
    Cycle Completion Directory
    For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle completed file. Default is {TaskTargetDirectory}/cycle/completed.
    Cycle Contents Directory
    For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle contents files. Default is {TaskTargetDirectory}/cycle/contents.
    Use Cycle Partitioning for Data Directory
    For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under each data directory.
    If this option is not selected, individual data files are written to the same directory without a timestamp, unless you define an alternative directory structure.
    Use Cycle Partitioning for Summary Directories
    For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under the summary contents and completed subdirectories.
    List Individual Files in Contents
    For incremental load and combined initial and incremental load tasks, lists individual data files under the contents subdirectory.
    If Use Cycle Partitioning for Summary Directories is cleared, this option is selected by default. All of the individual files are listed in the contents subdirectory unless you configure custom subdirectories by using placeholders, such as those for the timestamp or date.
    If Use Cycle Partitioning for Data Directory is selected, you can still optionally select this check box to list individual files and group them by CDC cycle.
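    The placeholder substitution used in the directory fields above can be sketched in Python. The resolver below handles only a simplified subset of the placeholders and is an illustration, not the product's path generator; the schema and table names are hypothetical:
    from datetime import datetime
    import re

    def resolve(pattern, schema, table, now=None):
        # Resolve a simplified subset of the directory placeholders:
        # {SchemaName}, {TableName}, {Timestamp}, {YY}, {YYYY}, {MM}, and {DD},
        # plus the toUpper()/toLower() functions wrapped around a placeholder.
        now = now or datetime.now()
        values = {
            "SchemaName": schema,
            "TableName": table,
            "Timestamp": now.strftime("%Y%m%d_%H%M%S%f")[:-3],   # approximates yyyymmdd_hhmissms
            "YYYY": now.strftime("%Y"),
            "YY": now.strftime("%y"),
            "MM": now.strftime("%m"),
            "DD": now.strftime("%d"),
        }

        def substitute(match):
            func, name = match.group("func"), match.group("name")
            value = values.get(name, match.group(0))
            if func == "toUpper":
                return value.upper()
            if func == "toLower":
                return value.lower()
            return value

        return re.sub(r"\{(?:(?P<func>toUpper|toLower)\()?(?P<name>\w+)\)?\}", substitute, pattern)

    print(resolve("myDir1/{toLower(SchemaName)}/{TableName}_{Timestamp}", "SALES", "ORDERS"))
    # Example output: myDir1/sales/ORDERS_20250101_093000123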
    2To view advanced properties, toggle on Show Advanced Options. Then under Advanced Target Properties, define any of the following optional advanced target properties that you want to use:
    Property
    Description
    Add Operation Type
    Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target.
    For incremental loads, the job writes "I" for insert, "U" for update, or "D" for delete. For initial loads, the job always writes "I" for insert.
    By default, this check box is selected for incremental load and initial and incremental load jobs, and cleared for initial load jobs.
    Add Operation Time
    Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target.
    For initial loads, the job always writes the current date and time.
    By default, this check box is not selected.
    Add Operation Owner
    Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target.
    For initial loads, the job always writes "INFA" as the owner.
    By default, this check box is not selected.
    This property is not available for jobs that have a MongoDB or PostgreSQL source.
    Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method.
    Add Operation Transaction Id
    Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations.
    For initial loads, the job always writes "1" as the ID.
    By default, this check box is not selected.
    Add Orderable Sequence
    Select this check box to add a metadata column that records a combined epoch value and an incremental numeric value for each change operation that the job inserts into the target tables. The sequence value always ascends, but it is not guaranteed to be sequential, and gaps might exist. The sequence value is used to identify the order of activity in the target records. A sketch of such a sequence value appears after these property descriptions.
    By default, this check box is not selected.
    Add Before Images
    Select this check box to include UNDO data in the output that a job writes to the target.
    For initial loads, the job writes nulls.
    By default, this check box is not selected.
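    One way such an orderable sequence value could be formed is sketched below, purely as an illustration of the concept of an epoch component combined with an incrementing counter. The exact composition that Database Ingestion and Replication uses is not shown here:
    import itertools
    import time

    _counter = itertools.count()

    def next_sequence_value():
        # Combine the current epoch time with an incrementing counter so that
        # values always ascend, although they are not gap-free or contiguous.
        return int(time.time()) * 1_000_000 + next(_counter)

    print(next_sequence_value())
    print(next_sequence_value())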
    3Under Table Renaming Rules, if you want to rename the target objects that are associated with the selected source tables, define renaming rules. Click the + (Add new row) icon and enter a source table name or name mask and enter a corresponding target table name or name mask. To define a mask, include one or more asterisk (*) wildcards. Then press Enter.
    For example, to add the prefix "PROD_" to the names of target tables that correspond to all selected source tables, enter the * wildcard for the source table and enter PROD_* for the target table.
    You can enter multiple rules.
    Notes:
    4Under Data Type Rules, if you want to override the default mappings of source data types to target data types, define data type rules. Click the + (Add new row) icon and enter a source data type and corresponding target data type. Then press Enter.
    This feature is supported for tasks that have the following source and target combinations:
    For example, you can create a rule that maps Oracle NUMBER columns that have no precision to Snowflake target NUMBER() columns that also have no precision.
    Also, in the Source Data Type value, you can include the percent (%) wildcard to represent the data type precision, scale, or size, for example, NUMBER(%,4), NUMBER(8,%), or NUMBER(%). Use the wildcard to cover all source columns that have the same data type but use different precision, scale, or size values, instead of specifying each one individually. For example, enter FLOAT(%) to cover FLOAT(16), FLOAT(32), and FLOAT(84). You cannot enter the % wildcard in the target data type. A source data type that uses the % wildcard must map to a target data type that uses a specific precision, scale, or size value. For example, you could map the source data type FLOAT(%) to a target data type specification such as NUMBER(38,10).
    5Under Custom Properties, you can enter one or more custom properties that Informatica provides to improve performance or to meet your special requirements. To add a property, click the + icon to add a row. In the Property Name field, select a property and then enter a property value, or select the Custom option and manually enter both the property name and value.
    The following table describes the properties that are available for this target:
    Property
    Description
    Writer Helper Thread Count
    The number of writer helper threads that are used to convert incoming change data rows or initial unload rows to the output format configured for the target, such as Avro, CSV, or Parquet.
    Default value is 2. If two threads can’t keep up with the incoming volume of data, you can increase the number of threads. Consider increasing the number of threads in the following situations: 1) the incoming volume is high, 2) multiple writer distributors are in use, or 3) some rows are very large or wide, which increases conversion time.
    Custom
    Select this option to manually enter the name of a property and its value. Use this option to enter properties that Informatica Global Customer Support or a technical staff member has provided to you for a special case. Available for any supported load type.
    Custom properties are intended to address performance or special processing needs. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).
    Tip: To delete a custom property after you've entered it, click the Delete icon at the right end of the property row.
    6Click Next to proceed, or click Save.

Configure a Databricks target

Define target properties for the destination that you selected on the Destination page.
    1Under Target Properties, define the following required Databricks target properties:
    Property
    Description
    Target Creation
    The only available option is Create Target Tables, which generates the target tables based on the source tables.
    Note: After the target table is created, Database Ingestion and Replication intelligently handles the target tables on subsequent job runs. Database Ingestion and Replication might truncate or re-create the target tables depending on the specific circumstances.
    Schema
    Select the target schema in which Database Ingestion and Replication creates the target tables.
    Apply Mode
    For incremental load and combined initial and incremental load jobs, indicates how source DML changes, including inserts, updates, and deletes, are applied to the target. Options are:
    • - Standard. Accumulate the changes in a single apply cycle and intelligently merge them into fewer SQL statements before applying them to the target. For example, if an update followed by a delete occurs on the source row, no row is applied to the target. If multiple updates occur on the same column or field, only the last update is applied to the target. If multiple updates occur on different columns or fields, the updates are merged into a single update record before being applied to the target.
    • - Soft Deletes. Apply source delete operations to the target as soft deletes. A soft delete marks the deleted row as deleted without actually removing it from the database. For example, a delete on the source results in a change record on the target with "D" displayed in the INFA_OPERATION_TYPE column.
    • Consider using soft deletes if you have a long-running business process that needs the soft-deleted data to finish processing, to restore data after an accidental delete operation, or to track deleted values for audit purposes.
      Note: If you use Soft Deletes mode, you must not perform an update on the primary key in a source table. Otherwise, data corruption can occur on the target.
    • - Audit. Apply an audit trail of every DML operation made on the source tables to the target. A row for each DML change on a source table is written to the generated target table along with the audit columns you select under the Advanced section. The audit columns contain metadata about the change, such as the DML operation type, time, owner, transaction ID, generated ascending sequence number, and before image. Consider using Audit apply mode when you want to use the audit history to perform downstream computations or processing on the data before writing it to the target database or when you want to examine metadata about the captured changes.
    The default value is Standard. A sketch of the Standard-mode merge behavior appears after these property descriptions.
    Note: This field does not appear if you selected Query-based as the CDC method on the Source page of the task wizard.
    Data Directory or Task Target Directory
    Specifies the subdirectory where Database Ingestion and Replication stores output files for jobs associated with the task. This field is called Data Directory for an initial load job or Task Target Directory for an incremental load or combined initial and incremental load job.
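    As a concrete illustration of the Standard-mode merging rules described above, here is a simplified Python sketch. The change records are hypothetical, and the sketch is not the product's apply engine:
    def merge_cycle(changes):
        # Merge the changes captured during one apply cycle, following the
        # Standard apply-mode rules described above:
        #   - an update followed by a delete results in no row being applied,
        #   - repeated updates to the same column keep only the last value,
        #   - updates to different columns are merged into one update.
        net = {}                                    # pending change per primary key
        for op, key, cols in changes:
            if op == "DELETE":
                if key in net:
                    del net[key]                    # update followed by delete: apply nothing
                else:
                    net[key] = ("DELETE", {})
            elif op == "UPDATE" and key in net:
                net[key][1].update(cols)            # fold into the pending change
            else:
                net[key] = (op, dict(cols))
        return net

    # Hypothetical change stream for two source rows.
    cycle = [
        ("UPDATE", 1, {"STATUS": "NEW"}),
        ("DELETE", 1, {}),                          # row 1: no row is applied
        ("UPDATE", 2, {"STATUS": "OPEN"}),
        ("UPDATE", 2, {"OWNER": "KIM"}),            # row 2: one merged update
    ]
    print(merge_cycle(cycle))
    # {2: ('UPDATE', {'STATUS': 'OPEN', 'OWNER': 'KIM'})}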
    2To view advanced properties, toggle on Show Advanced Options. Then under Advanced Target Properties, define any of the following optional advanced target properties that you want to use:
    Property
    Description
    Add Operation Type
    Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target database or inserts into the target table.
    This field is available only when the Apply Mode option is set to Audit or Soft Deletes.
    In Audit mode, the job writes "I" for insert, "U" for update, or "D" for delete.
    In Soft Deletes mode, the job writes "D" for deletes or NULL for inserts and updates. When the operation type is NULL, the other "Add Operation..." metadata columns are also NULL. Only when the operation type is "D" will the other metadata columns contain non-null values.
    By default, this check box is selected. You cannot deselect it if you are using soft deletes.
    Add Operation Time
    Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target database or inserts into the audit table on the target system.
    This field is available only when Apply Mode is set to Audit or Soft Deletes.
    By default, this check box is not selected.
    Add Operation Owner
    Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target database or inserts into the audit table on the target system.
    This field is available only when Apply Mode is set to Audit or Soft Deletes.
    By default, this check box is not selected.
    This property is not available for jobs that have a MongoDB or PostgreSQL source.
    Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method.
    Add Operation Transaction Id
    Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations.
    This field is available only when Apply Mode is set to Audit or Soft Deletes.
    By default, this check box is not selected.
    Add Operation Sequence
    Select this check box to add a metadata column that records a generated, ascending sequence number for each change operation that the job inserts into the audit table on the target system. The sequence number reflects the change stream position of the operation.
    This field is available only when Apply Mode is set to Audit.
    By default, this check box is not selected.
    Add Before Images
    Select this check box to add _OLD columns with UNDO "before image" data in the output that the job inserts into the target tables. You can then compare the old and current values for each data column. For a delete operation, the current value will be null.
    This field is available only when Apply Mode is set to Audit.
    By default, this check box is not selected.
    Prefix for Metadata Columns
    Add a prefix to the names of the added metadata columns to easily identify them and to prevent conflicts with the names of existing columns.
    The default value is INFA_.
    Create Unmanaged Tables
    Select this check box if you want the task to create Databricks target tables as unmanaged tables. After you deploy the task, you cannot edit this field to switch to managed tables.
    By default, this option is cleared and managed tables are created.
    If you selected Personal Staging Location in the Staging Environment field in the selected Databricks target connection, this check box is disabled. You cannot use unmanaged tables in this situation.
    For more information about Databricks managed and unmanaged tables, see the Databricks documentation.
    Unmanaged Tables Parent Directory
    If you choose to create Databricks unmanaged tables, you must specify a parent directory in Amazon S3 or Microsoft Azure Data Lake Storage to hold the Parquet files that are generated for each target table when captured DML records are processed.
    Note: To use Unity Catalog, you must provide an existing external directory.
    Note: For volume staging, provide the complete parent directory path.
    Staging File Format
    Select the format of the staging files in the staging environment specified in the Databricks connection. The files hold data before it's loaded into Databricks tables.
    Format options are:
    • - CSV
    • - Parquet
    Default is CSV.
    3Under Table Renaming Rules, if you want to rename the target objects that are associated with the selected source tables, define renaming rules. Click the + (Add new row) icon and enter a source table name or name mask and enter a corresponding target table name or name mask. To define a mask, include one or more asterisk (*) wildcards. Then press Enter.
    For example, to add the prefix "PROD_" to the names of target tables that correspond to all selected source tables, enter the * wildcard for the source table and enter PROD_* for the target table.
    You can enter multiple rules.
    Notes:
    4Under Data Type Rules, if you want to override the default mappings of source data types to target data types, define data type rules. Click the + (Add new row) icon and enter a source data type and corresponding target data type. Then press Enter.
    This feature is supported for tasks that have the following source and target combinations:
    For example, you can create a rule that maps Oracle NUMBER columns that have no precision to Snowflake target NUMBER() columns that also have no precision.
    Also, in the Source Data Type value, you can include the percent (%) wildcard to represent the data type precision, scale, or size, for example, NUMBER(%,4), NUMBER(8,%), or NUMBER(%). Use the wildcard to cover all source columns that have the same data type but use different precision, scale, or size values, instead of specifying each one individually. For example, enter FLOAT(%) to cover FLOAT(16), FLOAT(32), and FLOAT(84). You cannot enter the % wildcard in the target data type. A source data type that uses the % wildcard must map to a target data type that uses a specific precision, scale, or size value. For example, you could map the source data type FLOAT(%) to a target data type specification such as NUMBER(38,10).
    5Under Custom Properties, you can enter one or more custom properties that Informatica provides to improve performance or to meet your special requirements. To add a property, click the + icon to add a row. In the Property Name field, select a property and then enter a property value, or select the Custom option and manually enter both the property name and value.
    The following table describes the properties that are available for this target:
    Property
    Description
    Writer Distributor Count
    The number of distributors that can run on separate threads in parallel to process data during an initial load job or the unload phase of a combined load job when the Writer Unload Multiple Distributors custom property is set to true. Using parallel distributor threads can improve job performance, particularly for high-volume data transfers.
    Default value is 3. If your system has ample resources, Informatica recommends that you set this parameter to 8.
    Writer Helper Thread Count
    The number of writer helper threads that are used to convert incoming change data rows or initial unload rows to the output format configured for the target, such as Avro, CSV, or Parquet.
    Default value is 2. If two threads can’t keep up with the incoming volume of data, you can increase the number of threads. Consider increasing the number of threads in the following situations: 1) the incoming volume is high, 2) multiple writer distributors are in use, or 3) some rows are very large or wide, which increases conversion time.
    Writer Unload Multiple Distributors
    Indicates whether multiple distributor threads can be used to process data in parallel during initial load jobs and the unload phase of combined load jobs. The distributors perform work such as uploading data files to staging areas and flushing data to the target. Set this property to true to use multiple distributor threads.
    Default value is false.
    Custom
    Select this option to manually enter the name of a property and its value. Use this option to enter properties that Informatica Global Customer Support or a technical staff member has provided to you for a special case. Available for any supported load type.
    Custom properties are intended to address performance or special processing needs. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).
    Tip: To delete a custom property after you've entered it, click the Delete icon at the right end of the property row.
    6Click Next to proceed, or click Save.

Configure a Flat File target

Define target properties for the destination that you selected on the Destination page.
    1Under Target Properties, define the following required Flat File target properties:
    Note: For flat file targets, these properties apply to initial load jobs only.
    Property
    Description
    Output Format
    Select the format of the output file. Options are:
    • - CSV
    • - AVRO
    The default value is CSV.
    Note: Output files in CSV format use double-quotation marks ("") as the delimiter for each field.
    Add Headers to CSV File
    If CSV is selected as the output format, select this check box to add a header with source column names to the output CSV file.
    Avro Format
    If you selected AVRO as the output format, select the format of the Avro schema that will be created for each source table. Options are:
    • - Avro-Flat. This Avro schema format lists all Avro fields in one record.
    • - Avro-Generic. This Avro schema format lists all columns from a source table in a single array of Avro fields.
    • - Avro-Nested. This Avro schema format organizes each type of information in a separate record.
    The default value is Avro-Flat.
    Avro Serialization Format
    If AVRO is selected as the output format, select the serialization format of the Avro output file. Options are:
    • - None
    • - Binary
    • - JSON
    The default value is Binary.
    Avro Schema Directory
    If AVRO is selected as the output format, specify the local directory where Database Ingestion and Replication stores Avro schema definitions for each source table. Schema definition files have the following naming pattern:
    schemaname_tablename.txt
    Note: If this directory is not specified, no Avro schema definition file is produced.
    File Compression Type
    Select a file compression type for output files in CSV or AVRO output format. Options are:
    • - None
    • - Deflate
    • - Gzip
    • - Snappy
    The default value is None, which means no compression is used.
    Avro Compression Type
    If AVRO is selected as the output format, select an Avro compression type. Options are:
    • - None
    • - Bzip2
    • - Deflate
    • - Snappy
    The default value is None, which means no compression is used.
    Deflate Compression Level
    If Deflate is selected in the Avro Compression Type field, specify a compression level from 0 to 9. The default value is 0.
    Data Directory
    For initial load tasks, define a directory structure for the directories where Database Ingestion and Replication stores output data files and optionally stores the schema. To define the directory pattern, you can use the following types of entries:
    • - The placeholders {SchemaName}, {TableName}, {Timestamp}, {YY}, {YYYY}, {MM}, and {DD}, where {YY}, {YYYY}, {MM}, and {DD} are for date elements. The {Timestamp} values are in the format yyyymmdd_hhmissms. The generated dates and times in the directory paths indicate when the initial load job starts to transfer data to the target.
    • - Specific directory names.
    • - The toUpper() and toLower() functions, which force the value of the associated placeholder to uppercase or lowercase.
    Note: Placeholder values are not case sensitive.
    Examples:
    myDir1/{SchemaName}/{TableName}
    myDir1/myDir2/{SchemaName}/{YYYY}/{MM}/{TableName}_{Timestamp}
    myDir1/{toLower(SchemaName)}/{TableName}_{Timestamp}
    The default directory pattern is {TableName}_{Timestamp}.
    Note: For Flat File targets, Database Ingestion and Replication uses the directory specified in the target connection properties as the root for the data directory path when Connection Directory as Parent is selected.
    Connection Directory as Parent
    For initial load tasks, select this check box to use the directory value that is specified in the target connection properties as the parent directory for the custom directory paths specified in the task target properties. The parent directory is used in the Data Directory and Schema Directory.
    Schema Directory
    For initial load tasks, you can specify a custom directory in which to store the schema file if you want to store it in a directory other than the default directory. This field is optional.
    The schema is stored in the data directory by default. For incremental loads, the default directory for the schema file is {TaskTargetDirectory}/data/{TableName}/schema.
    You can use the same placeholders as for the Data Directory field. Ensure the placeholders are enclosed in curly brackets { }.
    2To view advanced properties, toggle on Show Advanced Options. Then under Advanced Target Properties, define any of the following optional advanced target properties that you want to use:
    Property
    Description
    Add Operation Type
    Select this check box to add a metadata column that includes the source SQL operation type in the output that the job propagates to the target.
    For initial loads, the job always writes "I" for insert.
    By default, this check box is cleared.
    Add Operation Time
    Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target.
    For initial loads, the job always writes the current date and time.
    By default, this check box is not selected.
    Add Operation Owner
    Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target.
    For initial loads, the job always writes "INFA" as the owner.
    By default, this check box is not selected.
    This property is not available for jobs that have a MongoDB or PostgreSQL source.
    Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method.
    Add Operation Transaction Id
    Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations.
    For initial loads, the job always writes "1" as the ID.
    By default, this check box is not selected.
    Add Orderable Sequence
    Select this check box to add a metadata column that records a combined epoch value and an incremental numeric value for each change operation that the job inserts into the target tables. The sequence value always ascends, but it is not guaranteed to be sequential, and gaps might exist. The sequence value is used to identify the order of activity in the target records.
    By default, this check box is not selected.
    Add Before Images
    Select this check box to include UNDO data in the output that a job writes to the target.
    For initial loads, the job writes nulls.
    By default, this check box is not selected.
    3Under Table Renaming Rules, if you want to rename the target objects that are associated with the selected source tables, define renaming rules. Click the + (Add new row) icon and enter a source table name or name mask and enter a corresponding target table name or name mask. To define a mask, include one or more asterisk (*) wildcards. Then press Enter.
    For example, to add the prefix "PROD_" to the names of target tables that correspond to all selected source tables, enter the * wildcard for the source table and enter PROD_* for the target table.
    You can enter multiple rules.
    Notes:
    4Under Custom Properties, you can enter one or more custom properties that Informatica provides to improve performance or to meet your special requirements. To add a property, click the + icon to add a row. In the Property Name field, select a property and then enter a property value, or select the Custom option and manually enter both the property name and value.
    The following table describes the properties that are available for this target:
    Property
    Description
    Writer Distributor Count
    The number of distributors that can run on separate threads in parallel to process data during an initial load job or the unload phase of a combined load job when the Writer Unload Multiple Distributors custom property is set to true. Using parallel distributor threads can improve job performance, particularly for high-volume data transfers.
    Default value is 3. If your system has ample resources, Informatica recommends that you set this parameter to 8.
    Writer Helper Thread Count
    The number of writer helper threads that are used to convert incoming change data rows or initial unload rows to the output format configured for the target, such as Avro, CSV, or Parquet.
    Default value is 2. If two threads can’t keep up with the incoming volume of data, you can increase the number of threads. Consider increasing the number of threads in the following situations: 1) the incoming volume is high, 2) multiple writer distributors are in use, or 3) some rows are very large or wide, which increases conversion time.
    Writer Unload Multiple Distributors
    Indicates whether multiple distributor threads can be used to process data in parallel during initial load jobs and the unload phase of combined load jobs. The distributors perform work such as uploading data files to staging areas and flushing data to the target. Set this property to true to use multiple distributor threads.
    Default value is false.
    Custom
    Select this option to manually enter the name of a property and its value. Use this option to enter properties that Informatica Global Customer Support or a technical staff member has provided to you for a special case. Available for any supported load type.
    Custom properties are intended to address performance or special processing needs. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).
    Tip: To delete a custom property after you've entered it, click the Delete icon at the right end of the property row.
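    For example, to speed up a high-volume initial load on a system with ample resources, you might set the following custom properties (illustrative values based on the guidance above):
    Writer Unload Multiple Distributors = true
    Writer Distributor Count = 8
    Writer Helper Thread Count = 4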
    5Click Next to proceed, or click Save.

Configure a Google BigQuery target

When you define a database ingestion and replication task that has a Google BigQuery target, you must enter some target properties on the Target tab of the task wizard.
    1Under Target Properties, define the following required Google BigQuery target properties:
    Property
    Description
    Target Creation
    The only available option is Create Target Tables, which generates the target tables based on the source tables.
    Note: After the target table is created, Database Ingestion and Replication intelligently handles the target tables on subsequent job runs. Database Ingestion and Replication might truncate or re-create the target tables depending on the specific circumstances.
    Schema
    Select the target schema in which Database Ingestion and Replication creates the target tables.
    Apply Mode
    For incremental load and combined initial and incremental load jobs, indicates how source DML changes, including inserts, updates, and deletes, are applied to the target. Options are:
    • - Standard. Accumulate the changes in a single apply cycle and intelligently merge them into fewer SQL statements before applying them to the target. For example, if an update followed by a delete occurs on the source row, no row is applied to the target. If multiple updates occur on the same column or field, only the last update is applied to the target. If multiple updates occur on different columns or fields, the updates are merged into a single update record before being applied to the target.
    • - Audit. Apply an audit trail of every DML operation made on the source tables to the target. A row for each DML change on a source table is written to the generated target table along with the audit columns you select under the Advanced section. The audit columns contain metadata about the change, such as the DML operation type, time, owner, transaction ID, generated ascending sequence number, and before image. Consider using Audit apply mode when you want to use the audit history to perform downstream computations or processing on the data before writing it to the target database or when you want to examine metadata about the captured changes.
    • - Soft Deletes. Apply source delete operations to the target as soft deletes. A soft delete marks the deleted row as deleted without actually removing it from the database. For example, a delete on the source results in a change record on the target with "D" displayed in the INFA_OPERATION_TYPE column.
    • Consider using soft deletes if you have a long-running business process that needs the soft-deleted data to finish processing, to restore data after an accidental delete operation, or to track deleted values for audit purposes.
      Note: If you use Soft Deletes mode, you must not perform an update on the primary key in a source table. Otherwise, data corruption can occur on the target.
    The default value is Standard.
    Note: This field does not appear if you selected Query-based as the CDC method on the Source page of the task wizard.
    Bucket
    Specifies the name of an existing bucket container that stores, organizes, and controls access to the data objects that you load to Google Cloud Storage.
    Data Directory or Task Target Directory
    Specifies the subdirectory where Database Ingestion and Replication stores output files for jobs associated with the task. This field is called Data Directory for an initial load job or Task Target Directory for an incremental load or combined initial and incremental load job.
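    To illustrate the apply modes, suppose that a source row is inserted and then updated within a single apply cycle. In Standard mode, the job merges the two operations and applies one insert that contains the final column values. In Audit mode, the job writes one target row per operation, for example (illustrative values, assuming the default INFA_ metadata column prefix, the Add Operation Type and Add Operation Time columns, and a hypothetical STATUS source column):
    INFA_OPERATION_TYPE = I, INFA_OPERATION_TIME = 2024-06-01 10:02:11, STATUS = NEW
    INFA_OPERATION_TYPE = U, INFA_OPERATION_TIME = 2024-06-01 10:05:37, STATUS = SHIPPED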
    2To view advanced properties, toggle on Show Advanced Options. Then under Advanced Target Properties, define any of the following optional advanced target properties that you want to use:
    Property
    Description
    Add Last Replicated Time
    Select this check box to add a metadata column that records the timestamp at which a record was inserted or last updated in the target table. For initial loads, all loaded records have the same timestamp. For incremental and combined initial and incremental loads, the column records the timestamp of the last DML operation that was applied to the target.
    By default, this check box is not selected.
    Add Operation Type
    Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target database or inserts into the audit table on the target system.
    This field is available only when the Apply Mode option is set to Audit or Soft Deletes.
    In Audit mode, the job writes "I" for inserts, "U" for updates, "E" for upserts, or "D" for deletes to this metadata column.
    In Soft Deletes mode, the job writes "D" for deletes or NULL for inserts, updates, and upserts. When the operation type is NULL, the other "Add Operation..." metadata columns are also NULL. Only when the operation type is "D" will the other metadata columns contain non-null values.
    By default, this check box is selected. You cannot deselect it if you are using soft deletes.
    Add Operation Time
    Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target tables.
    This field is available only when Apply Mode is set to Audit or Soft Deletes.
    By default, this check box is not selected.
    Add Operation Owner
    Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target tables.
    This field is available only when Apply Mode is set to Audit or Soft Deletes.
    By default, this check box is not selected.
    This property is not available for jobs that have a MongoDB or PostgreSQL source.
    Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method.
    Add Operation Transaction Id
    Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations.
    This field is available only when Apply Mode is set to Audit or Soft Deletes.
    By default, this check box is not selected.
    Add Operation Sequence
    Select this check box to add a metadata column that records a generated, ascending sequence number for each change operation that the job inserts into the target table. The sequence number reflects the change stream position of the operation.
    This field is available only when Apply Mode is set to Audit.
    By default, this check box is not selected.
    Add Before Images
    Select this check box to add _OLD columns with UNDO "before image" data in the output that the job inserts into the target table. You can then compare the old and current values for each data column. For a delete operation, the current value will be null.
    This field is available only when Apply Mode is set to Audit.
    By default, this check box is not selected.
    Prefix for Metadata Columns
    Add a prefix to the names of the added metadata columns to easily identify them and to prevent conflicts with the names of existing columns.
    Do not include special characters in the prefix. Otherwise, task deployment will fail.
    The default value is INFA_.
    Enable Case Transformation
    By default, target table names and column names are generated in the same case as the corresponding source names, unless cluster-level or session-level properties on the target override this case-sensitive behavior. If you want to control the case of letters in the target names, select this check box. Then select a Case Transformation Strategy option.
    Case Transformation Strategy
    If you selected Enable Case Transformation, select one of the following options to specify how to handle the case of letters in generated target table (or object) names and column (or field) names:
    • - Same as source. Use the same case as the source table (or object) names and column (or field) names.
    • - UPPERCASE. Use all uppercase.
    • - lowercase. Use all lowercase.
    The default value is Same as source.
    Note: The selected strategy will override any cluster-level or session-level properties on the target for controlling case.
    3Under Table Renaming Rules, if you want to rename the target objects that are associated with the selected source tables, define renaming rules. Click the + (Add new row) icon and enter a source table name or name mask and a corresponding target table name or name mask. To define a mask, include one or more asterisk (*) wildcards. Then press Enter.
    For example, to add the prefix "PROD_" to the names of target tables that correspond to all selected source tables, enter the * wildcard for the source table and enter PROD_* for the target table.
    You can enter multiple rules.
    Notes:
    4Under Data Type Rules, if you want to override the default mappings of source data types to target data types, define data type rules. Click the + (Add new row) icon and enter a source data type and corresponding target data type. Then press Enter.
    This feature is supported for tasks that have the following source and target combinations:
    For example, you can create a rule that maps Oracle NUMBER columns that have no precision to Snowflake target NUMBER() columns that also have no precision.
    Also, in the Source Data Type value, you can include the percent (%) wildcard to represent the data type precision, scale, or size, for example, NUMBER(%,4), NUMBER(8,%), or NUMBER(%). Use the wildcard to cover all source columns that have the same data type but use different precision, scale, or size values, instead of specifying each one individually. For example, enter FLOAT(%) to cover FLOAT(16), FLOAT(32), and FLOAT(84). You cannot enter the % wildcard in the target data type. A source data type that uses the % wildcard must map to a target data type that uses a specific precision, scale, or size value. For example, you could map the source data type FLOAT(%) to a target data type specification such as NUMBER(38,10).
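    For example, the following illustrative rules use the % wildcard to cover all sizes of a source data type and map them to fixed target specifications (the target specifications are hypothetical):
    Source Data Type: FLOAT(%)    Target Data Type: NUMBER(38,10)
    Source Data Type: NUMBER(%,4)    Target Data Type: NUMBER(38,4)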
    5Under Custom Properties, you can enter one or more custom properties that Informatica provides to improve performance or to meet your special requirements. To add a property, click the + icon to add a row. In the Property Name field, select a property and then enter a property value, or select the Custom option and manually enter both the property name and value.
    The following table describes the properties that are available for this target:
    Property
    Description
    Writer Distributor Count
    The number of distributors that can run on separate threads in parallel to process data during an initial load job or the unload phase of a combined load job when the Writer Unload Multiple Distributors custom property is set to true. Using parallel distributor threads can improve job performance, particularly for high-volume data transfers.
    Default value is 3. If your system has ample resources, Informatica recommends that you set this parameter to 8.
    Writer Helper Thread Count
    The number of writer helper threads that are used to convert incoming change data rows or initial unload rows to the output format configured for the target, such as Avro, CSV, or Parquet.
    Default value is 2. If two threads can’t keep up with the incoming volume of data, you can increase the number of threads. Consider increasing the number of threads in the following situations: 1) the incoming volume is high, 2) multiple writer distributors are in use, or 3) some rows are very large or wide, which increases conversion time.
    Writer Unload Multiple Distributors
    Indicates whether multiple distributor threads can be used to process data in parallel during initial load jobs and the unload phase of combined load jobs. The distributors perform work such as uploading data files to staging areas and flushing data to the target. Set this property to true to use multiple distributor threads.
    Default value is false.
    Custom
    Select this option to manually enter the name of a property and its value. Use this option to enter properties that Informatica Global Customer Support or a technical staff member has provided to you for a special case. Available for any supported load type.
    Custom properties are intended to address performance or special processing needs. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).
    Tip: To delete a custom property after you've entered it, click the Delete icon at the right end of the property row.
    6Click Next to proceed, or click Save.

Configure a Google Cloud Storage target

Define target properties for the destination that you selected on the Destination page.
    1Under Target Properties, define the following required Google Cloud Storage target properties:
    Property
    Description
    Output Format
    Select the format of the output file. Options are:
    • - CSV
    • - AVRO
    • - PARQUET
    The default value is CSV.
    Note: Output files in CSV format use double-quotation marks ("") as the delimiter for each field.
    Add Headers to CSV File
    If CSV is selected as the output format, select this check box to add a header with source column names to the output CSV file.
    Avro Format
    If you selected AVRO as the output format, select the format of the Avro schema that will be created for each source table. Options are:
    • - Avro-Flat. This Avro schema format lists all Avro fields in one record.
    • - Avro-Generic. This Avro schema format lists all columns from a source table in a single array of Avro fields.
    • - Avro-Nested. This Avro schema format organizes each type of information in a separate record.
    The default value is Avro-Flat.
    Avro Serialization Format
    If AVRO is selected as the output format, select the serialization format of the Avro output file. Options are:
    • - None
    • - Binary
    • - JSON
    The default value is Binary.
    Avro Schema Directory
    If AVRO is selected as the output format, specify the local directory where Database Ingestion and Replication stores Avro schema definitions for each source table. Schema definition files have the following naming pattern:
    schemaname_tablename.txt
    Note: If this directory is not specified, no Avro schema definition file is produced.
    File Compression Type
    Select a file compression type for output files in CSV or AVRO output format. Options are:
    • - None
    • - Deflate
    • - Gzip
    • - Snappy
    The default value is None, which means no compression is used.
    Avro Compression Type
    If AVRO is selected as the output format, select an Avro compression type. Options are:
    • - None
    • - Bzip2
    • - Deflate
    • - Snappy
    The default value is None, which means no compression is used.
    Parquet Compression Type
    If the PARQUET output format is selected, you can select a compression type that is supported by Parquet. Options are:
    • - None
    • - Gzip
    • - Snappy
    The default value is None, which means no compression is used.
    Deflate Compression Level
    If Deflate is selected in the Avro Compression Type field, specify a compression level from 0 to 9. The default value is 0.
    Add Directory Tags
    For incremental load and combined initial and incremental load tasks, select this check box to add the "dt=" prefix to the names of apply cycle directories to be compatible with the naming convention for Hive partitioning. This check box is cleared by default.
    Bucket
    Specifies the name of an existing bucket container that stores, organizes, and controls access to the data objects that you load to Google Cloud Storage.
    Task Target Directory
    For incremental load and combined initial and incremental load tasks, the root directory for the other directories that hold output data files, schema files, and CDC cycle contents and completed files. You can use it to specify a custom root directory for the task. If you enable the Connection Directory as Parent option, you can still optionally specify a task target directory to use with the parent directory specified in the connection properties.
    This field is required if the {TaskTargetDirectory} placeholder is specified in patterns for any of the following directory fields.
    Data Directory
    For initial load tasks, define a directory structure for the directories where Database Ingestion and Replication stores output data files and optionally stores the schema. To define the directory pattern, you can use the following types of entries:
    • - The placeholders {SchemaName}, {TableName}, {Timestamp}, {YY}, {YYYY}, {MM}, and {DD}, where {YY}, {YYYY}, {MM}, and {DD} are for date elements. The {Timestamp} values are in the format yyyymmdd_hhmissms. The generated dates and times in the directory paths indicate when the initial load job starts to transfer data to the target.
    • - Specific directory names.
    • - The toUpper() and toLower() functions, which force the value of the associated placeholder to uppercase or lowercase.
    Note: Placeholder values are not case sensitive.
    Examples:
    myDir1/{SchemaName}/{TableName}
    myDir1/myDir2/{SchemaName}/{YYYY}/{MM}/{TableName}_{Timestamp}
    myDir1/{toLower(SchemaName)}/{TableName}_{Timestamp}
    The default directory pattern is {TableName}_{Timestamp}.
    For incremental load and combined initial and incremental load tasks, define a custom path to the subdirectory that contains the cdc-data data files. To define the directory pattern, you can use the following types of entries:
    • - The placeholders {TaskTargetDirectory}, {SchemaName}, {TableName}, {Timestamp}, {YY}, {YYYY}, {MM}, and {DD}, where {YY}, {YYYY}, {MM}, and {DD} are for date elements. The {Timestamp} values are in the format yyyymmdd_hhmissms. The generated dates and times in the directory paths indicate when the CDC cycle started.
    • If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose both the function and the placeholder in curly brackets, as shown in the preceding example.
    • - Specific directory names.
    The default directory pattern is {TaskTargetDirectory}/data/{TableName}/data.
    Note: For Amazon S3, Flat File, Microsoft Azure Data Lake Storage Gen2, and Oracle Cloud Object Store targets, Database Ingestion and Replication uses the directory specified in the target connection properties as the root for the data directory path when Connection Directory as Parent is selected. For Google Cloud Storage targets, Database Ingestion and Replication uses the Bucket name that you specify in the target properties for the ingestion task. For Microsoft Fabric OneLake targets, the parent directory is the path specified in the Lakehouse Path field in the Microsoft Fabric OneLake connection properties.
    Schema Directory
    Specify a custom directory in which to store the schema file if you want to store it in a directory other than the default directory. For initial loads, previously used values, if available, are shown in a drop-down list for your convenience. This field is optional.
    For initial loads, the schema is stored in the data directory by default. For incremental loads and combined initial and incremental loads, the default directory for the schema file is {TaskTargetDirectory}/data/{TableName}/schema.
    You can use the same placeholders as for the Data Directory field. Ensure that you enclose placeholders in curly brackets { }.
    If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose both the function and the placeholder in curly brackets, for example: {toLower(SchemaName)}
    Note: Schema is written only to output data files in CSV format. Data files in Parquet and Avro formats contain their own embedded schema.
    Cycle Completion Directory
    For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle completed file. Default is {TaskTargetDirectory}/cycle/completed.
    Cycle Contents Directory
    For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle contents files. Default is {TaskTargetDirectory}/cycle/contents.
    Use Cycle Partitioning for Data Directory
    For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under each data directory.
    If this option is not selected, individual data files are written to the same directory without a timestamp, unless you define an alternative directory structure.
    Use Cycle Partitioning for Summary Directories
    For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under the summary contents and completed subdirectories.
    List Individual Files in Contents
    For incremental load and combined initial and incremental load tasks, lists individual data files under the contents subdirectory.
    If Use Cycle Partitioning for Summary Directories is cleared, this option is selected by default. All of the individual files are listed in the contents subdirectory unless you configure custom subdirectories by using placeholders, such as for timestamp or date.
    If Use Cycle Partitioning for Data Directory is selected, you can still optionally select this check box to list individual files and group them by CDC cycle.
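    For example, for an initial load of a hypothetical source table SALES.ORDERS, the data directory pattern myDir1/{toLower(SchemaName)}/{TableName}_{Timestamp} shown above might resolve to a path such as the following (the timestamp is illustrative):
    myDir1/sales/ORDERS_20240601_102311456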
    2To view advanced properties, toggle on Show Advanced Options. Then under Advanced Target Properties, define any of the following optional advanced target properties that you want to use:
    Field
    Description
    Add Operation Type
    Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target.
    For incremental loads, the job writes "I" for insert, "U" for update, or "D" for delete. For initial loads, the job always writes "I" for insert.
    By default, this check box is selected for incremental load and initial and incremental load jobs, and cleared for initial load jobs.
    Add Operation Time
    Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target.
    For initial loads, the job always writes the current date and time.
    By default, this check box is not selected.
    Add Operation Owner
    Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target.
    For initial loads, the job always writes "INFA" as the owner.
    By default, this check box is not selected.
    This property is not available for jobs that have a MongoDB or PostgreSQL source.
    Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method.
    Add Operation Transaction Id
    Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations.
    For initial loads, the job always writes "1" as the ID.
    By default, this check box is not selected.
    Add Orderable Sequence
    Select this check box to add a metadata column that records a combined epoch value and an incremental numeric value for each change operation that the job inserts into the target tables. The sequence value always ascends but is not guaranteed to be sequential, and gaps may exist. The sequence value identifies the order of activity in the target records.
    By default, this check box is not selected.
    Add Before Images
    Select this check box to include UNDO data in the output that a job writes to the target.
    For initial loads, the job writes nulls.
    By default, this check box is not selected.
    3Under Table Renaming Rules, if you want to rename the target objects that are associated with the selected source tables, define renaming rules. Click the + (Add new row) icon and enter a source table name or name mask and a corresponding target table name or name mask. To define a mask, include one or more asterisk (*) wildcards. Then press Enter.
    For example, to add the prefix "PROD_" to the names of target tables that correspond to all selected source tables, enter the * wildcard for the source table and enter PROD_* for the target table.
    You can enter multiple rules.
    Notes:
    4Under Custom Properties, you can enter one or more custom properties that Informatica provides to improve performance or to meet your special requirements. To add a property, click the + icon to add a row. In the Property Name field, select a property and then enter a property value, or select the Custom option and manually enter both the property name and value.
    The following table describes the properties that are available for this target:
    Property
    Description
    Writer Helper Thread Count
    The number of writer helper threads that are used to convert incoming change data rows or initial unload rows to the output format configured for the target, such as Avro, CSV, or Parquet.
    Default value is 2. If two threads can’t keep up with the incoming volume of data, you can increase the number of threads. Consider increasing the number of threads in the following situations: 1) the incoming volume is high, 2) multiple writer distributors are in use, or 3) some rows are very large or wide, which increases conversion time.
    Custom
    Select this option to manually enter the name of a property and its value. Use this option to enter properties that Informatica Global Customer Support or a technical staff member has provided to you for a special case. Available for any supported load type.
    Custom properties are intended to address performance or special processing needs. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).
    Tip: To delete a custom property after you've entered it, click the Delete icon at the right end of the property row.
    5Click Next to proceed, or click Save.

Configure a Kafka target

Define target properties for the destination that you selected on the Destination page.
    1Under Target Properties, define the following required Kafka target properties:
    Note: These properties apply to incremental load operations only.
    Property
    Description
    Use Table Name as Topic Name
    Indicates whether Database Ingestion and Replication writes messages that contain source data to separate topics, one for each source table, or writes all messages to a single topic.
    Select this check box to write messages to separate table-specific topics. The topic names match the source table names, unless you add the source schema name, a prefix, or a suffix in the Include Schema Name, Table Prefix, or Table Suffix properties.
    By default, this check box is cleared. With the default setting, you must specify, in the Topic Name property, the name of the single topic to which all messages are written.
    Include Schema Name
    When Use Table Name as Topic Name is selected, this check box appears and is selected by default. This setting adds the source schema name to the table-specific topic names. The topic names then have the format schemaname_tablename.
    If you do not want to include the schema name, clear this check box.
    Table Prefix
    When Use Table Name as Topic Name is selected, this property appears so that you can optionally enter a prefix to add to the table-specific topic names. For example, if you specify myprefix_, the topic names have the format myprefix_tablename. If you omit the underscore (_) after the prefix, the prefix is prepended directly to the table name without a separator.
    Table Suffix
    When Use Table Name as Topic Name is selected, this property appears so that you can optionally enter a suffix to add to the table-specific topic names. For example, if you specify _mysuffix, the topic names have the format tablename_mysuffix. If you omit the underscore (_) before the suffix, the suffix is appended directly to the table name without a separator.
    Topic Name
    If you do not select Use Table Name as Topic Name, you must enter the name of the single Kafka topic to which all messages that contain source data will be written.
    Output Format
    Select the format of the output file. Options are:
    • - CSV
    • - AVRO
    • - JSON
    The default value is CSV.
    Note: Output files in CSV format use double-quotation marks ("") as the delimiter for each field.
    If your Kafka target uses Confluent Schema Registry to store schemas for incremental load jobs, you must select AVRO as the format.
    JSON Format
    If JSON is selected as the output format, select the level of detail of the output. Options are:
    • - Concise. This format records only the most relevant data in the output, such as the operation type and the column names and values.
    • - Verbose. This format records detailed information, such as the table name and column types.
    Avro Format
    If you selected AVRO as the output format, select the format of the Avro schema that will be created for each source table. Options are:
    • - Avro-Flat. This Avro schema format lists all Avro fields in one record.
    • - Avro-Generic. This Avro schema format lists all columns from a source table in a single array of Avro fields.
    • - Avro-Nested. This Avro schema format organizes each type of information in a separate record.
    The default value is Avro-Flat.
    Avro Serialization Format
    If AVRO is selected as the output format, select the serialization format of the Avro output file. Options are:
    • - Binary
    • - JSON
    • - None
    The default value is Binary.
    If you have a Confluent Kafka target that uses Confluent Schema Registry to store schemas, select None. Otherwise, Confluent Schema Registry does not register the schema. Do not select None if you are not using Confluent Schema Registry.
    Avro Schema Directory
    If AVRO is selected as the output format, specify the local directory where Database Ingestion and Replication stores Avro schema definitions for each source table. Schema definition files have the following naming pattern:
    schemaname_tablename.txt
    Note: If this directory is not specified, no Avro schema definition file is produced.
    If a source schema change is expected to alter the target, the Avro schema definition file is regenerated with a unique name that includes a timestamp, in the following format:
    schemaname_tablename_YYYYMMDDhhmmss.txt
    This unique naming pattern ensures that older schema definition files are preserved for audit purposes.
    Avro Compression Type
    If AVRO is selected as the output format, select an Avro compression type. Options are:
    • - None
    • - Bzip2
    • - Deflate
    • - Snappy
    The default value is None, which means no compression is used.
    Deflate Compression Level
    If Deflate is selected in the Avro Compression Type field, specify a compression level from 0 to 9. The default value is 0.
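    For example, for a hypothetical source table SALES.ORDERS, the table-specific topic name resolves as follows, depending on the options described above (illustrative values):
    Include Schema Name selected: SALES_ORDERS
    Include Schema Name cleared, Table Prefix myprefix_: myprefix_ORDERS
    Include Schema Name cleared, Table Suffix _mysuffix: ORDERS_mysuffix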
    2To view advanced properties, toggle on Show Advanced Options. Then under Advanced Target Properties, define any of the following optional advanced target properties that you want to use:
    Property
    Description
    Add Operation Type
    Select this check box to add a metadata column that includes the source SQL operation type in the output that the job propagates to the target.
    The job writes "I" for insert, "U" for update, or "D" for delete.
    By default, this check box is selected.
    Add Operation Time
    Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target.
    By default, this check box is not selected.
    Add Operation Owner
    Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target.
    By default, this check box is not selected.
    This property is not available for jobs that have a MongoDB or PostgreSQL source.
    Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method.
    Add Operation Transaction Id
    Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations.
    By default, this check box is not selected.
    Add Orderable Sequence
    Select this check box to add a metadata column that records a combined epoch value and an incremental numeric value for each change operation that the job inserts into the target tables. The sequence value always ascends but is not guaranteed to be sequential, and gaps may exist. The sequence value identifies the order of activity in the target records.
    By default, this check box is not selected.
    Add Before Images
    Select this check box to include UNDO data in the output that a job writes to the target.
    By default, this check box is not selected.
    Async Write
    Controls whether to use synchronous delivery of messages to Kafka.
    • - Clear this check box to use synchronous delivery. Kafka must acknowledge each message as received before Database Ingestion and Replication sends the next message. In this mode, Kafka is unlikely to receive duplicate messages. However, performance might be slower.
    • - Select this check box to use asynchronous delivery. Database Ingestion and Replication sends messages as soon as possible, without regard for the order in which the changes were retrieved from the source.
    By default, this check box is selected.
    Producer Configuration Properties
    Specify a comma-separated list of key=value pairs to enter Kafka producer properties for Apache Kafka, Confluent Kafka, Amazon Managed Streaming for Apache Kafka (MSK), or Kafka-enabled Azure Event Hubs targets.
    If you have a Confluent target that uses Confluent Schema Registry to store schemas, you must specify the following properties:
    schema.registry.url=url,
    key.serializer=org.apache.kafka.common.serialization.StringSerializer,
    value.serializer=io.confluent.kafka.serializers.KafkaAvroSerializer
    You can specify Kafka producer properties in either this field or in the Additional Connection Properties field in the Kafka connection.
    If you enter the producer properties in this field, the properties pertain to the database ingestion jobs associated with this task only. If you enter the producer properties for the connection, the properties pertain to jobs for all tasks that use the connection definition, unless you override the connection-level properties for specific tasks by also specifying properties in the Producer Configuration Properties field.
    For information about Kafka producer properties, see the Apache Kafka, Confluent Kafka, Amazon MSK, or Azure Event Hubs documentation.
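    As a concrete illustration, the Producer Configuration Properties value for a Confluent target that uses Confluent Schema Registry is entered as a single comma-separated list, for example (the registry URL is a placeholder):
    schema.registry.url=http://myregistry.example.com:8081,key.serializer=org.apache.kafka.common.serialization.StringSerializer,value.serializer=io.confluent.kafka.serializers.KafkaAvroSerializer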
    3Under Table Renaming Rules, if you want to rename the target objects that are associated with the selected source tables, define renaming rules. Click the + (Add new row) icon and enter a source table name or name mask and a corresponding target table name or name mask. To define a mask, include one or more asterisk (*) wildcards. Then press Enter.
    For example, to add the prefix "PROD_" to the names of target tables that correspond to all selected source tables, enter the * wildcard for the source table and enter PROD_* for the target table.
    You can enter multiple rules.
    Notes:
    4Under Custom Properties, you can enter one or more custom properties that Informatica provides to improve performance or to meet your special requirements. To add a property, click the + icon to add a row. In the Property Name field, select a property and then enter a property value, or select the Custom option and manually enter both the property name and value.
    The following table describes the properties that are available for this target:
    Property
    Description
    Writer Helper Thread Count
    The number of writer helper threads that are used to convert incoming change data rows or initial unload rows to the output format configured for the target, such as Avro, CSV, or Parquet.
    Default value is 2. If two threads can’t keep up with the incoming volume of data, you can increase the number of threads. Consider increasing the number of threads in the following situations: 1) the incoming volume is high, 2) multiple writer distributors are in use, or 3) some rows are very large or wide, which increases conversion time.
    Custom
    Select this option to manually enter the name of a property and its value. Use this option to enter properties that Informatica Global Customer Support or a technical staff member has provided to you for a special case. Available for any supported load type.
    Custom properties are intended to address performance or special processing needs. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).
    Tip: To delete a custom property after you've entered it, click the Delete icon at the right end of the property row.
    5Click Next to proceed, or click Save.

Configure a Microsoft Azure Data Lake Storage target

Define target properties for the destination that you selected on the Destination page.
    1Under Target Properties, define the following required Microsoft Azure Data Lake Storage target properties:
    Property
    Description
    Output Format
    Select the format of the output file. Options are:
    • - CSV
    • - AVRO
    • - PARQUET
    The default value is CSV.
    Note: Output files in CSV format use double-quotation marks ("") as the delimiter for each field.
    Add Headers to CSV File
    If CSV is selected as the output format, select this check box to add a header with source column names to the output CSV file.
    Avro Format
    If you selected AVRO as the output format, select the format of the Avro schema that will be created for each source table. Options are:
    • - Avro-Flat. This Avro schema format lists all Avro fields in one record.
    • - Avro-Generic. This Avro schema format lists all columns from a source table in a single array of Avro fields.
    • - Avro-Nested. This Avro schema format organizes each type of information in a separate record.
    The default value is Avro-Flat.
    Avro Serialization Format
    If AVRO is selected as the output format, select the serialization format of the Avro output file. Options are:
    • - None
    • - Binary
    • - JSON
    The default value is Binary.
    Avro Schema Directory
    If AVRO is selected as the output format, specify the local directory where Database Ingestion and Replication stores Avro schema definitions for each source table. Schema definition files have the following naming pattern:
    schemaname_tablename.txt
    Note: If this directory is not specified, no Avro schema definition file is produced.
    File Compression Type
    Select a file compression type for output files in CSV or AVRO output format. Options are:
    • - None
    • - Deflate
    • - Gzip
    • - Snappy
    The default value is None, which means no compression is used.
    Avro Compression Type
    If AVRO is selected as the output format, select an Avro compression type. Options are:
    • - None
    • - Bzip2
    • - Deflate
    • - Snappy
    The default value is None, which means no compression is used.
    Parquet Compression Type
    If the PARQUET output format is selected, you can select a compression type that is supported by Parquet. Options are:
    • - None
    • - Gzip
    • - Snappy
    The default value is None, which means no compression is used.
    Deflate Compression Level
    If Deflate is selected in the Avro Compression Type field, specify a compression level from 0 to 9. The default value is 0.
    Add Directory Tags
    For incremental load and combined initial and incremental load tasks, select this check box to add the "dt=" prefix to the names of apply cycle directories to be compatible with the naming convention for Hive partitioning. This check box is cleared by default.
    Task Target Directory
    For incremental load and combined initial and incremental load tasks, the root directory for the other directories that hold output data files, schema files, and CDC cycle contents and completed files. You can use it to specify a custom root directory for the task. If you enable the Connection Directory as Parent option, you can still optionally specify a task target directory to use with the parent directory specified in the connection properties.
    This field is required if the {TaskTargetDirectory} placeholder is specified in patterns for any of the following directory fields.
    Connection Directory as Parent
    Select this check box to use the directory value that is specified in the target connection properties as the parent directory for the custom directory paths specified in the task target properties. For initial load tasks, the parent directory is used in the Data Directory and Schema Directory. For incremental load and combined initial and incremental load tasks, the parent directory is used in the Data Directory, Schema Directory, Cycle Completion Directory, and Cycle Contents Directory.
    This check box is selected by default. If you clear it, for initial loads, define the full path to the output files in the Data Directory field. For incremental loads, optionally specify a root directory for the task in the Task Target Directory.
    Data Directory
    For initial load tasks, define a directory structure for the directories where Database Ingestion and Replication stores output data files and optionally stores the schema. To define the directory pattern, you can use the following types of entries:
    • - The placeholders {SchemaName}, {TableName}, {Timestamp}, {YY}, {YYYY}, {MM}, and {DD}, where {YY}, {YYYY}, {MM}, and {DD} are for date elements. The {Timestamp} values are in the format yyyymmdd_hhmissms. The generated dates and times in the directory paths indicate when the initial load job starts to transfer data to the target.
    • - Specific directory names.
    • - The toUpper() and toLower() functions, which force the value of the associated placeholder to uppercase or lowercase.
    Note: Placeholder values are not case sensitive.
    Examples:
    myDir1/{SchemaName}/{TableName}
    myDir1/myDir2/{SchemaName}/{YYYY}/{MM}/{TableName}_{Timestamp}
    myDir1/{toLower(SchemaName)}/{TableName}_{Timestamp}
    The default directory pattern is {TableName}_{Timestamp}.
    For incremental load and combined initial and incremental load tasks, define a custom path to the subdirectory that contains the cdc-data data files. To define the directory pattern, you can use the following types of entries:
    • - The placeholders {TaskTargetDirectory}, {SchemaName}, {TableName}, {Timestamp}, {YY}, {YYYY}, {MM}, and {DD}, where {YY}, {YYYY}, {MM}, and {DD} are for date elements. The {Timestamp} values are in the format yyyymmdd_hhmissms. The generated dates and times in the directory paths indicate when the CDC cycle started.
    • If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose both the function and the placeholder in curly brackets, as shown in the preceding example.
    • - Specific directory names.
    The default directory pattern is {TaskTargetDirectory}/data/{TableName}/data.
    Note: For Amazon S3, Flat File, Microsoft Azure Data Lake Storage Gen2, and Oracle Cloud Object Store targets, Database Ingestion and Replication uses the directory specified in the target connection properties as the root for the data directory path when Connection Directory as Parent is selected. For Google Cloud Storage targets, Database Ingestion and Replication uses the Bucket name that you specify in the target properties for the ingestion task. For Microsoft Fabric OneLake targets, the parent directory is the path specified in the Lakehouse Path field in the Microsoft Fabric OneLake connection properties.
    Schema Directory
    Specify a custom directory in which to store the schema file if you want to store it in a directory other than the default directory. For initial loads, previously used values, if available, are shown in a drop-down list for your convenience. This field is optional.
    For initial loads, the schema is stored in the data directory by default. For incremental loads and combined initial and incremental loads, the default directory for the schema file is {TaskTargetDirectory}/data/{TableName}/schema.
    You can use the same placeholders as for the Data Directory field. Ensure that you enclose placeholders in curly brackets { }.
    If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose both the function and the placeholder in curly brackets, for example: {toLower(SchemaName)}
    Note: Schema is written only to output data files in CSV format. Data files in Parquet and Avro formats contain their own embedded schema.
    Cycle Completion Directory
    For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle completed file. Default is {TaskTargetDirectory}/cycle/completed.
    Cycle Contents Directory
    For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle contents files. Default is {TaskTargetDirectory}/cycle/contents.
    Use Cycle Partitioning for Data Directory
    For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under each data directory.
    If this option is not selected, individual data files are written to the same directory without a timestamp, unless you define an alternative directory structure.
    Use Cycle Partitioning for Summary Directories
    For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under the summary contents and completed subdirectories.
    List Individual Files in Contents
    For incremental load and combined initial and incremental load tasks, lists individual data files under the contents subdirectory.
    If Use Cycle Partitioning for Summary Directories is cleared, this option is selected by default. All of the individual files are listed in the contents subdirectory unless you configure custom subdirectories by using placeholders, such as for timestamp or date.
    If Use Cycle Partitioning for Data Directory is selected, you can still optionally select this check box to list individual files and group them by CDC cycle.
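    For example, with the default directory settings and cycle partitioning enabled, an incremental load job might write output for a hypothetical ORDERS table to directories such as the following, where <cycle timestamp> stands for the timestamp subdirectory that is created for each CDC cycle:
    {TaskTargetDirectory}/data/ORDERS/data/<cycle timestamp>/
    {TaskTargetDirectory}/cycle/contents/<cycle timestamp>/
    {TaskTargetDirectory}/cycle/completed/<cycle timestamp>/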
    2To view advanced properties, toggle on Show Advanced Options. Then under Advanced Target Properties, define any of the following optional advanced target properties that you want to use:
    Field
    Description
    Add Operation Type
    Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target.
    For incremental loads, the job writes "I" for insert, "U" for update, or "D" for delete. For initial loads, the job always writes "I" for insert.
    By default, this check box is selected for incremental load and initial and incremental load jobs, and cleared for initial load jobs.
    Add Operation Time
    Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target.
    For initial loads, the job always writes the current date and time.
    By default, this check box is not selected.
    Add Operation Owner
    Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target.
    For initial loads, the job always writes "INFA" as the owner.
    By default, this check box is not selected.
    This property is not available for jobs that have a MongoDB or PostgreSQL source.
    Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method.
    Add Operation Transaction Id
    Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations.
    For initial loads, the job always writes "1" as the ID.
    By default, this check box is not selected.
    Add Orderable Sequence
    Select this check box to add a metadata column that records a combined epoch value and an incremental numeric value for each change operation that the job inserts into the target tables. The sequence value always ascends but is not guaranteed to be sequential, and gaps may exist. The sequence value identifies the order of activity in the target records.
    By default, this check box is not selected.
    Add Before Images
    Select this check box to include UNDO data in the output that a job writes to the target.
    For initial loads, the job writes nulls.
    By default, this check box is not selected.
    3Under Table Renaming Rules, if you want to rename the target objects that are associated with the selected source tables, define renaming rules. Click the + (Add new row) icon and enter a source table name or name mask and a corresponding target table name or name mask. To define a mask, include one or more asterisk (*) wildcards. Then press Enter.
    For example, to add the prefix "PROD_" to the names of target tables that correspond to all selected source tables, enter the * wildcard for the source table and enter PROD_* for the target table.
    You can enter multiple rules.
    Notes:
    4Under Custom Properties, you can enter one or more custom properties that Informatica provides to improve performance or to meet your special requirements. To add a property, click the + icon to add a row. In the Property Name field, select a property and then enter a property value, or select the Custom option and manually enter both the property name and value.
    The following table describes the properties that are available for this target:
    Property
    Description
    Writer Helper Thread Count
    The number of writer helper threads that are used to convert incoming change data rows or initial unload rows to the output format configured for the target, such as Avro, CSV, or Parquet.
    Default value is 2. If two threads can’t keep up with the incoming volume of data, you can increase the number of threads. Consider increasing the number of threads in the following situations: 1) the incoming volume is high, 2) multiple writer distributors are in use, or 3) some rows are very large or wide, which increases conversion time.
    Custom
    Select this option to manually enter the name of a property and its value. Use this option to enter properties that Informatica Global Customer Support or a technical staff member has provided to you for a special case. Available for any supported load type.
    Custom properties are intended to address performance or special processing needs. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).
    Tip: To delete a custom property after you've entered it, click the Delete icon at the right end of the property row.
    5Click Next to proceed, or click Save.

Configure a Microsoft Azure Synapse Analytics target

Define target properties for the destination that you selected on the Destination page.
    1Under Target Properties, define the following required Microsoft Azure Synapse Analytics target properties:
    Property
    Description
    Target Creation
    The only available option is Create Target Tables, which generates the target tables based on the source tables.
    Note: After the target table is created, Database Ingestion and Replication intelligently handles the target tables on subsequent job runs. Database Ingestion and Replication might truncate or re-create the target tables depending on the specific circumstances.
    Schema
    Select the target schema in which Database Ingestion and Replication creates the target tables. The schema name that is specified in the connection properties is displayed by default. Because this field is case sensitive, ensure that you entered the schema name in the connection properties in the correct case.
    2To view advanced properties, toggle on Show Advanced Options. Then under Advanced Target Properties, define any of the following optional advanced target properties that you want to use:
    Property
    Description
    Add Last Replicated Time
    Select this check box to add a metadata column that records the timestamp at which a record was inserted or last updated in the target table. For initial loads, all loaded records have the same timestamp. For incremental and combined initial and incremental loads, the column records the timestamp of the last DML operation that was applied to the target.
    By default, this check box is not selected.
    Prefix for Metadata Columns
    Add a prefix to the names of the added metadata columns to easily identify them and to prevent conflicts with the names of existing columns.
    Do not include special characters in the prefix. Otherwise, task deployment will fail.
    The default value is INFA_.
    3Under Table Renaming Rules, if you want to rename the target objects that are associated with the selected source tables, define renaming rules. Click the + (Add new row) icon and enter a source table name or name mask and a corresponding target table name or name mask. To define a mask, include one or more asterisk (*) wildcards. Then press Enter.
    For example, to add the prefix "PROD_" to the names of target tables that correspond to all selected source tables, enter the * wildcard for the source table and enter PROD_* for the target table.
    You can enter multiple rules.
    Notes:
    4Under Data Type Rules, if you want to override the default mappings of source data types to target data types, define data type rules. Click the + (Add new row) icon and enter a source data type and corresponding target data type. Then press Enter.
    This feature is supported for tasks that have the following source and target combinations:
    For example, you can create a rule that maps Oracle NUMBER columns that have no precision to Snowflake target NUMBER() columns that also have no precision.
    Also, in the Source Data Type value, you can include the percent (%) wildcard to represent the data type precision, scale, or size, for example, NUMBER(%,4), NUMBER(8,%), or NUMBER(%). Use the wildcard to cover all source columns that have the same data type but use different precision, scale, or size values, instead of specifying each one individually. For example, enter FLOAT(%) to cover FLOAT(16), FLOAT(32), and FLOAT(84). You cannot enter the % wildcard in the target data type. A source data type that uses the % wildcard must map to a target data type that uses a specific precision, scale, or size value. For example, you could map the source data type FLOAT(%) to a target data type specification such as NUMBER(38,10).
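    As a rough illustration of how the % wildcard in a source data type can be interpreted, the following Python sketch matches rule values such as FLOAT(%) or NUMBER(%,4) against concrete column types. The matching helper is hypothetical and is not how the product evaluates data type rules internally.
```python
import re

def wildcard_to_regex(source_type_rule: str) -> re.Pattern:
    """Convert a rule such as FLOAT(%) or NUMBER(%,4) into a regular expression.
    The % wildcard stands for any precision, scale, or size value."""
    escaped = re.escape(source_type_rule.strip().upper())
    # '%' is not a regex metacharacter, so re.escape leaves it untouched;
    # replace it with a digits pattern.
    pattern = escaped.replace("%", r"\d+")
    return re.compile(rf"^{pattern}$")

def rule_matches(source_type_rule: str, column_type: str) -> bool:
    """Return True if the source data type rule covers the concrete column type."""
    return bool(wildcard_to_regex(source_type_rule).match(column_type.strip().upper()))

# FLOAT(%) covers FLOAT(16), FLOAT(32), and FLOAT(84), as described above.
assert rule_matches("FLOAT(%)", "FLOAT(32)")
assert rule_matches("NUMBER(%,4)", "NUMBER(10,4)")
assert not rule_matches("NUMBER(8,%)", "NUMBER(10,2)")
```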
    5Under Custom Properties, you can enter one or more custom properties that Informatica provides to improve performance or to meet your special requirements. To add a property, click the + icon to add a row. In the Property Name field, select a property and then enter a property value, or select the Custom option and manually enter both the property name and value.
    The following table describes the properties that are available for this target:
    Property
    Description
    Writer Helper Thread Count
    The number of writer helper threads that are used to convert incoming change data rows or initial unload rows to the output format configured for the target, such as Avro, CSV, or Parquet.
    Default value is 2. If two threads can’t keep up with the incoming volume of data, you can increase the number of threads. Consider increasing the number of threads in the following situations: 1) the incoming volume is high, 2) multiple writer distributors are in use, or 3) some rows are very large or wide, which increases conversion time.
    Custom
    Select this option to manually enter the name of a property and its value. Use this option to enter properties that Informatica Global Customer Support or a technical staff member has provided to you for a special case. Available for any supported load type.
    Custom properties are intended to address performance or special processing needs. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).
    Tip: To delete a custom property after you've entered it, click the Delete icon at the right end of the property row.
    6Click Next to proceed, or click Save.

Configure a Microsoft Fabric OneLake target

Define target properties for the destination that you selected on the Destination page.
    1Under Target Properties, define the following required Microsoft Fabric OneLake target properties:
    Property
    Description
    Output Format
    Select the format of the output file. Options are:
    • - CSV
    • - AVRO
    • - PARQUET
    The default value is CSV.
    Note: Output files in CSV format use double-quotation marks ("") as the delimiter for each field.
    Add Headers to CSV File
    If CSV is selected as the output format, select this check box to add a header with source column names to the output CSV file.
    Avro Format
    If you selected AVRO as the output format, select the format of the Avro schema that will be created for each source table. Options are:
    • - Avro-Flat. This Avro schema format lists all Avro fields in one record.
    • - Avro-Generic. This Avro schema format lists all columns from a source table in a single array of Avro fields.
    • - Avro-Nested. This Avro schema format organizes each type of information in a separate record.
    The default value is Avro-Flat.
    Avro Serialization Format
    If AVRO is selected as the output format, select the serialization format of the Avro output file. Options are:
    • - None
    • - Binary
    • - JSON
    The default value is Binary.
    Avro Schema Directory
    If AVRO is selected as the output format, specify the local directory where Database Ingestion and Replication stores Avro schema definitions for each source table. Schema definition files have the following naming pattern:
    schemaname_tablename.txt
    Note: If this directory is not specified, no Avro schema definition file is produced.
    File Compression Type
    Select a file compression type for output files in CSV or AVRO output format. Options are:
    • - None
    • - Deflate
    • - Gzip
    • - Snappy
    The default value is None, which means no compression is used.
    Avro Compression Type
    If AVRO is selected as the output format, select an Avro compression type. Options are:
    • - None
    • - Bzip2
    • - Deflate
    • - Snappy
    The default value is None, which means no compression is used.
    Parquet Compression Type
    If the PARQUET output format is selected, you can select a compression type that is supported by Parquet. Options are:
    • - None
    • - Gzip
    • - Snappy
    The default value is None, which means no compression is used.
    Deflate Compression Level
    If Deflate is selected in the Avro Compression Type field, specify a compression level from 0 to 9. The default value is 0.
    Add Directory Tags
    For incremental load and combined initial and incremental load tasks, select this check box to add the "dt=" prefix to the names of apply cycle directories to be compatible with the naming convention for Hive partitioning. This check box is cleared by default.
    Task Target Directory
    For incremental load and combined initial and incremental load tasks, the root directory for the other directories that hold output data files, schema files, and CDC cycle contents and completed files. You can use it to specify a custom root directory for the task.
    This field is required if the {TaskTargetDirectory} placeholder is specified in patterns for any of the following directory fields.
    Data Directory
    For initial load tasks, define a directory structure for the directories where Database Ingestion and Replication stores output data files and optionally stores the schema. To define the directory pattern, you can use the following types of entries:
    • - The placeholders {SchemaName}, {TableName}, {Timestamp}, {YY}, {YYYY}, {MM}, and {DD}, where {YY}, {YYYY}, {MM}, and {DD} are for date elements. The {Timestamp} values are in the format yyyymmdd_hhmissms. The generated dates and times in the directory paths indicate when the initial load job starts to transfer data to the target.
    • - Specific directory names.
    • - The toUpper() and toLower() functions, which force the value of the associated placeholder to uppercase or lowercase.
    Note: Placeholder values are not case sensitive.
    Examples:
    myDir1/{SchemaName}/{TableName}
    myDir1/myDir2/{SchemaName}/{YYYY}/{MM}/{TableName}_{Timestamp}
    myDir1/{toLower(SchemaName)}/{TableName}_{Timestamp}
    The default directory pattern is {TableName}_{Timestamp}.
    For incremental load and combined initial and incremental load tasks, define a custom path to the subdirectory that contains the cdc-data data files. To define the directory pattern, you can use the following types of entries:
    • - The placeholders {TaskTargetDirectory}, {SchemaName}, {TableName}, {Timestamp}, {YY}, {YYYY}, {MM}, and {DD}, where {YY}, {YYYY}, {MM}, and {DD} are for date elements. The {Timestamp} values are in the format yyyymmdd_hhmissms. The generated dates and times in the directory paths indicate when the CDC cycle started.
    • If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose both the function and placeholder in curly brackets, as shown in the preceding example.
    • - Specific directory names.
    The default directory pattern is {TaskTargetDirectory}/data/{TableName}/data. For an example of how these placeholders resolve to a concrete path, see the sketch after this property list.
    Note: For Amazon S3, Flat File, Microsoft Azure Data Lake Storage Gen2, and Oracle Cloud Object Store targets, Database Ingestion and Replication uses the directory specified in the target connection properties as the root for the data directory path when Connection Directory as Parent is selected. For Google Cloud Storage targets, Database Ingestion and Replication uses the Bucket name that you specify in the target properties for the ingestion task. For Microsoft Fabric OneLake targets, the parent directory is the path specified in the Lakehouse Path field in the Microsoft Fabric OneLake connection properties.
    Schema Directory
    Specify a custom directory in which to store the schema file if you want to store it in a directory other than the default directory. For initial loads, previously used values, if available, are shown in a drop-down list for your convenience. This field is optional.
    For initial loads, the schema is stored in the data directory by default. For incremental loads and combined initial and incremental loads, the default directory for the schema file is {TaskTargetDirectory}/data/{TableName}/schema.
    You can use the same placeholders as for the Data Directory field. Ensure that you enclose placeholders in curly brackets { }.
    If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose both the function and placeholder in curly brackets, for example: {toLower(SchemaName)}
    Note: A separate schema file is written only when the output data files are in CSV format. Data files in Parquet and Avro formats contain their own embedded schema.
    Cycle Completion Directory
    For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle completed file. Default is {TaskTargetDirectory}/cycle/completed.
    Cycle Contents Directory
    For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle contents files. Default is {TaskTargetDirectory}/cycle/contents.
    Use Cycle Partitioning for Data Directory
    For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under each data directory.
    If this option is not selected, individual data files are written to the same directory without a timestamp, unless you define an alternative directory structure.
    Use Cycle Partitioning for Summary Directories
    For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under the summary contents and completed subdirectories.
    List Individual Files in Contents
    For incremental load and combined initial and incremental load tasks, lists individual data files under the contents subdirectory.
    If Use Cycle Partitioning for Summary Directories is cleared, this option is selected by default. All of the individual files are listed in the contents subdirectory unless you configure custom subdirectories by using placeholders, such as the timestamp or date placeholders.
    If Use Cycle Partitioning for Data Directory is selected, you can still optionally select this check box to list individual files and group them by CDC cycle.
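    For an idea of how the directory pattern placeholders expand, the following Python sketch resolves a pattern by simple string substitution. The helper function, the sample values, and the timestamp handling are assumptions for illustration only and do not reflect the product's internal path resolution.
```python
from datetime import datetime, timezone

def resolve_directory_pattern(pattern, task_target_dir, schema, table, when=None):
    """Expand the documented directory placeholders in a simplified way."""
    when = when or datetime.now(timezone.utc)
    # Approximation of the yyyymmdd_hhmissms timestamp format described above.
    timestamp = when.strftime("%Y%m%d_%H%M%S") + f"{when.microsecond // 1000:03d}"
    values = {
        "{TaskTargetDirectory}": task_target_dir,
        "{SchemaName}": schema,
        "{toUpper(SchemaName)}": schema.upper(),
        "{toLower(SchemaName)}": schema.lower(),
        "{TableName}": table,
        "{toUpper(TableName)}": table.upper(),
        "{toLower(TableName)}": table.lower(),
        "{Timestamp}": timestamp,
        "{YYYY}": when.strftime("%Y"),
        "{YY}": when.strftime("%y"),
        "{MM}": when.strftime("%m"),
        "{DD}": when.strftime("%d"),
    }
    for placeholder, value in values.items():
        pattern = pattern.replace(placeholder, value)
    return pattern

print(resolve_directory_pattern(
    "{TaskTargetDirectory}/data/{toLower(SchemaName)}/{YYYY}/{MM}/{TableName}_{Timestamp}",
    task_target_dir="cdc_task1", schema="SALES", table="ORDERS"))
# e.g. cdc_task1/data/sales/2024/05/ORDERS_20240521_103015012
```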
    2To view advanced properties, toggle on Show Advanced Options. Then under Advanced Target Properties, define any optional advanced target properties that you want to use.
    3Under Table Renaming Rules, if you want to rename the target objects that are associated with the selected source tables, define renaming rules. Click the + (Add new row) icon, and then enter a source table name or name mask and a corresponding target table name or name mask. To define a mask, include one or more asterisk (*) wildcards. Then press Enter.
    For example, to add the prefix "PROD_" to the names of target tables that correspond to all selected source tables, enter the * wildcard for the source table and enter PROD_* for the target table.
    You can enter multiple rules.
    Notes:
    4Under Custom Properties, you can enter one or more custom properties that Informatica provides to improve performance or to meet your special requirements. To add a property, click the + icon to add a row. In the Property Name field, select a property and then enter a property value, or select the Custom option and manually enter both the property name and value.
    The following table describes the properties that are available for this target:
    Property
    Description
    Writer Helper Thread Count
    The number of writer helper threads that are used to convert incoming change data rows or initial unload rows to the output format configured for the target, such as Avro, CSV, or Parquet.
    Default value is 2. If two threads can’t keep up with the incoming volume of data, you can increase the number of threads. Consider increasing the number of threads in the following situations: 1) the incoming volume is high, 2) multiple writer distributors are in use, or 3) some rows are very large or wide, which increases conversion time.
    Custom
    Select this option to manually enter the name of a property and its value. Use this option to enter properties that Informatica Global Customer Support or a technical staff member has provided to you for a special case. Available for any supported load type.
    Custom properties are intended to address performance or special processing needs. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).
    Tip: To delete a custom property after you've entered it, click the Delete icon at the right end of the property row.
    5Click Next to proceed, or click Save.

Configure a Microsoft SQL Server target

Define target properties for the destination that you selected on the Destination page.
    1Under Target Properties, define the following required Microsoft SQL Server target properties:
    Property
    Description
    Target Creation
    The only available option is Create Target Tables, which generates the target tables based on the source tables.
    Note: After the target table is created, Database Ingestion and Replication intelligently handles the target tables on subsequent job runs. Database Ingestion and Replication might truncate or re-create the target tables depending on the specific circumstances.
    Schema
    Select the target schema in which Database Ingestion and Replication creates the target tables.
    Apply Mode
    For incremental load and combined initial and incremental load jobs, indicates how source DML changes, including inserts, updates, and deletes, are applied to the target. Options are:
    • - Standard. Accumulate the changes in a single apply cycle and intelligently merge them into fewer SQL statements before applying them to the target. For example, if an update followed by a delete occurs on the source row, no row is applied to the target. If multiple updates occur on the same column or field, only the last update is applied to the target. If multiple updates occur on different columns or fields, the updates are merged into a single update record before being applied to the target.
    • - Audit. Apply an audit trail of every DML operation made on the source tables to the target. A row for each DML change on a source table is written to the generated target table along with the audit columns you select under the Advanced section. The audit columns contain metadata about the change, such as the DML operation type, time, owner, transaction ID, generated ascending sequence number, and before image. Consider using Audit apply mode when you want to use the audit history to perform downstream computations or processing on the data before writing it to the target database or when you want to examine metadata about the captured changes.
    • - Soft Deletes. Apply source delete operations to the target as soft deletes. A soft delete marks the deleted row as deleted without actually removing it from the database. For example, a delete on the source results in a change record on the target with "D" displayed in the INFA_OPERATION_TYPE column.
    • Consider using soft deletes if you have a long-running business process that needs the soft-deleted data to finish processing, if you need to restore data after an accidental delete operation, or if you want to track deleted values for audit purposes. For an example of filtering out soft-deleted rows downstream, see the sketch after this property list.
      Note: If you use Soft Deletes mode, you must not perform an update on the primary key in a source table. Otherwise, data corruption can occur on the target.
    The default value is Standard.
    Note: This field does not appear if you selected Query-based as the CDC method on the Source page of the task wizard.
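    If you use Soft Deletes mode, downstream consumers typically exclude rows that are marked as deleted. The following sketch assumes the default INFA_ prefix for metadata columns (so the operation type column is named INFA_OPERATION_TYPE), a hypothetical dbo.ORDERS target table, and placeholder connection details; it is illustrative only and not part of the product.
```python
import pyodbc  # assumes the Microsoft ODBC Driver for SQL Server is installed

# Hypothetical connection details -- replace with your own server, database, and credentials.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=myserver;DATABASE=mydb;"
    "UID=myuser;PWD=mypassword;TrustServerCertificate=yes"
)

# In Soft Deletes mode, deleted rows remain in the target with 'D' in the
# operation type metadata column, while current rows have NULL in that column.
sql = """
    SELECT *
    FROM dbo.ORDERS
    WHERE INFA_OPERATION_TYPE IS NULL OR INFA_OPERATION_TYPE <> 'D'
"""
for row in conn.cursor().execute(sql):
    print(row)
conn.close()
```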
    2To view advanced properties, toggle on Show Advanced Options. Then under Advanced Target Properties, define any of the following optional advanced target properties that you want to use:
    Property
    Description
    Add Last Replicated Time
    Select this check box to add a metadata column that records the timestamp at which a record was inserted or last updated in the target table. For initial loads, all loaded records have the same timestamp. For incremental and combined initial and incremental loads, the column records the timestamp of the last DML operation that was applied to the target.
    By default, this check box is not selected.
    Add Operation Type
    Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target database or inserts into the audit table on the target system.
    This field is available only when the Apply Mode option is set to Audit or Soft Deletes.
    In Audit mode, the job writes "I" for insert, "U" for update, or "D" for delete.
    In Soft Deletes mode, the job writes "D" for deletes or NULL for inserts and updates. When the operation type is NULL, the other "Add Operation..." metadata columns are also NULL. Only when the operation type is "D" will the other metadata columns contain non-null values.
    By default, this check box is selected. You cannot deselect it if you are using soft deletes.
    Add Operation Time
    Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target database or inserts into the audit table on the target system.
    This field is available only when Apply Mode is set to Audit or Soft Deletes.
    By default, this check box is not selected.
    Add Operation Owner
    Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target database or inserts into the audit table on the target system.
    This field is available only when Apply Mode is set to Audit or Soft Deletes.
    By default, this check box is not selected.
    This property is not available for jobs that have a MongoDB or PostgreSQL source.
    Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method.
    Add Operation Transaction Id
    Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations.
    This field is available only when Apply Mode is set to Audit or Soft Deletes.
    By default, this check box is not selected.
    Add Operation Sequence
    Select this check box to add a metadata column that records a generated, ascending sequence number for each change operation that the job inserts into the audit table on the target system. The sequence number reflects the change stream position of the operation.
    This field is available only when Apply Mode is set to Audit.
    By default, this check box is not selected.
    Add Before Images
    Select this check box to add _OLD columns with UNDO "before image" data in the output that the job inserts into the target tables. You can then compare the old and current values for each data column. For a delete operation, the current value will be null.
    This field is available only when Apply Mode is set to Audit.
    By default, this check box is not selected.
    Add Cycle ID
    Select this check box to add a metadata column that includes the cycle ID of each CDC cycle in each target table. A cycle ID is a number that's generated by the CDC engine for each successful CDC cycle. If you integrate the job with Data Integration taskflows, the job can pass the minimum and maximum cycle IDs in output fields to the taskflow so that the taskflow can determine the range of cycles that contain new CDC data. This capability is useful if data from multiple cycles accumulates before the previous taskflow run completes. By default, this check box is not selected.
    Prefix for Metadata Columns
    Add a prefix to the names of the added metadata columns to easily identify them and to prevent conflicts with the names of existing columns.
    Do not include special characters in the prefix. Otherwise, task deployment will fail.
    The default value is INFA_.
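    If you select Add Cycle ID and pass the minimum and maximum cycle IDs from the job to a taskflow, downstream processing can limit its work to that cycle range. The following sketch only illustrates the idea; the metadata column name (INFA_CYCLE_ID, based on the default prefix) and the table name are assumptions.
```python
def new_rows_query(table, min_cycle_id, max_cycle_id, cycle_id_column="INFA_CYCLE_ID"):
    """Build a query that reads only the rows written by the CDC cycles
    in the range that the ingestion job reported to the taskflow."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE {cycle_id_column} BETWEEN {int(min_cycle_id)} AND {int(max_cycle_id)}"
    )

# Example: the taskflow receives the minimum and maximum cycle IDs as input fields.
print(new_rows_query("dbo.ORDERS_AUDIT", 101, 105))
```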
    3Under Table Renaming Rules, if you want to rename the target objects that are associated with the selected source tables, define renaming rules. Click the + (Add new row) icon, and then enter a source table name or name mask and a corresponding target table name or name mask. To define a mask, include one or more asterisk (*) wildcards. Then press Enter.
    For example, to add the prefix "PROD_" to the names of target tables that correspond to all selected source tables, enter the * wildcard for the source table and enter PROD_* for the target table.
    You can enter multiple rules.
    Notes:
    4Under Data Type Rules, if you want to override the default mappings of source data types to target data types, define data type rules. Click the + (Add new row) icon and enter a source data type and corresponding target data type. Then press Enter.
    This feature is supported for tasks that have the following source and target combinations:
    For example, you can create a rule that maps Oracle NUMBER columns that have no precision to Snowflake target NUMBER() columns that also have no precision.
    Also, in the Source Data Type value, you can include the percent (%) wildcard to represent the data type precision, scale, or size, for example, NUMBER(%,4), NUMBER(8,%), or NUMBER(%). Use the wildcard to cover all source columns that have the same data type but use different precision, scale, or size values, instead of specifying each one individually. For example, enter FLOAT(%) to cover FLOAT(16), FLOAT(32), and FLOAT(84). You cannot enter the % wildcard in the target data type. A source data type that uses the % wildcard must map to a target data type that uses a specific precision, scale, or size value. For example, you could map the source data type FLOAT(%) to a target data type specification such as NUMBER(38,10).
    5Under Custom Properties, you can enter one or more custom properties that Informatica provides to improve performance or to meet your special requirements. To add a property, click the + icon to add a row. In the Property Name field, select a property and then enter a property value, or select the Custom option and manually enter both the property name and value.
    The following table describes the properties that are available for this target:
    Property
    Description
    Writer Distributor Count
    The number of distributors that can run on separate threads in parallel to process data during an initial load job or the unload phase of a combined load job when the Writer Unload Multiple Distributors custom property is set to true. Using parallel distributor threads can improve job performance, particularly for high-volume data transfers.
    Default value is 3. If your system has ample resources, Informatica recommends that you set this parameter to 8.
    Writer Helper Thread Count
    The number of writer helper threads that are used to convert incoming change data rows or initial unload rows to the output format configured for the target, such as Avro, CSV, or Parquet.
    Default value is 2. If two threads can’t keep up with the incoming volume of data, you can increase the number of threads. Consider increasing the number of threads in the following situations: 1) the incoming volume is high, 2) multiple writer distributors are in use, or 3) some rows are very large or wide, which increases conversion time.
    Writer Unload Multiple Distributors
    Indicates whether multiple distributor threads can be used to process data in parallel during initial load jobs and the unload phase of combined load jobs. The distributors perform work such as uploading data files to staging areas and flushing data to the target. Set this property to true to use multiple distributor threads.
    Default value is false.
    Custom
    Select this option to manually enter the name of a property and its value. Use this option to enter properties that Informatica Global Customer Support or a technical staff member has provided to you for a special case. Available for any supported load type.
    Custom properties are intended to address performance or special processing needs. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).
    Tip: To delete a custom property after you've entered it, click the Delete icon at the right end of the property row.
    6Click Next to proceed, or click Save.

Configure an Oracle target

Define target properties for the destination that you selected on the Destination page.
    1Under Target Properties, define the following required Oracle target properties:
    Property
    Description
    Target Creation
    The only available option is Create Target Tables, which generates the target tables based on the source tables.
    Note: After the target table is created, Database Ingestion and Replication intelligently handles the target tables on subsequent job runs. Database Ingestion and Replication might truncate or re-create the target tables depending on the specific circumstances.
    Schema
    Select the target schema in which Database Ingestion and Replication creates the target tables.
    Apply Mode
    For incremental load and combined initial and incremental load jobs, indicates how source DML changes, including inserts, updates, and deletes, are applied to the target. Options are:
    • - Standard. Accumulate the changes in a single apply cycle and intelligently merge them into fewer SQL statements before applying them to the target. For example, if an update followed by a delete occurs on the source row, no row is applied to the target. If multiple updates occur on the same column or field, only the last update is applied to the target. If multiple updates occur on different columns or fields, the updates are merged into a single update record before being applied to the target.
    • - Audit. Apply an audit trail of every DML operation made on the source tables to the target. A row for each DML change on a source table is written to the generated target table along with the audit columns you select under the Advanced section. The audit columns contain metadata about the change, such as the DML operation type, time, owner, transaction ID, and generated ascending sequence number. Consider using Audit apply mode when you want to use the audit history to perform downstream computations or processing on the data before writing it to the target database or when you want to examine metadata about the captured changes.
    The default value is Standard.
    Note: This field does not appear if you selected Query-based as the CDC method on the Source page of the task wizard.
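    In Audit apply mode, described in the preceding property list, each source DML change becomes a row in the generated target table along with the selected audit columns. The following sketch reads such an audit trail in change-stream order. The column names are assumptions based on the default INFA_ prefix and on the Add Operation Type, Add Operation Time, and Add Operation Sequence options being selected; the connection details and table name are placeholders, and python-oracledb is only one possible client.
```python
import oracledb  # python-oracledb thin mode, no Oracle Client installation required

# Hypothetical connection details -- replace with your own.
conn = oracledb.connect(user="report_user", password="secret",
                        dsn="dbhost.example.com/orclpdb1")

# Assumed audit metadata column names, built from the default INFA_ prefix.
sql = """
    SELECT INFA_OPERATION_TYPE, INFA_OPERATION_TIME, ORDER_ID, STATUS
    FROM SALES.ORDERS_AUDIT
    ORDER BY INFA_OPERATION_SEQUENCE
"""
with conn.cursor() as cur:
    for op_type, op_time, order_id, status in cur.execute(sql):
        print(op_type, op_time, order_id, status)
conn.close()
```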
    2To view advanced properties, toggle on Show Advanced Options. Then under Advanced Target Properties, define any of the following optional advanced target properties that you want to use:
    Field
    Description
    Add Last Replicated Time
    Select this check box to add a metadata column that records the timestamp in UTC format at which a record was inserted or last updated in the target table. For initial loads, all loaded records have the same timestamp. For incremental and combined initial and incremental loads, the column records the timestamp of the last DML operation that was applied to the target.
    By default, this check box is not selected.
    Add Operation Type
    Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target database or inserts into the target table.
    The job writes "I" for insert, "U" for update, or "D" for delete.
    By default, this check box is selected.
    Add Operation Time
    Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target table.
    By default, this check box is not selected.
    Add Operation Owner
    Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target.
    By default, this check box is not selected.
    This property is not available for jobs that have a MongoDB or PostgreSQL source.
    Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method.
    Add Operation Transaction Id
    Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations.
    By default, this check box is not selected.
    Add Operation Sequence
    Select this check box to add a metadata column that records a generated, ascending sequence number for each change operation that the job inserts into the target tables. The sequence number reflects the change stream position of the operation.
    By default, this check box is not selected.
    Add Before Images
    Select this check box to add _OLD columns with UNDO "before image" data in the output that the job inserts into the target tables. You can then compare the old and current values for each data column. For a delete operation, the current value will be null.
    By default, this check box is not selected.
    Add Cycle ID
    Select this check box to add a metadata column that includes the cycle ID of each CDC cycle in each target table. A cycle ID is a number that's generated by the CDC engine for each successful CDC cycle. If you integrate the job with Data Integration taskflows, the job can pass the minimum and maximum cycle IDs in output fields to the taskflow so that the taskflow can determine the range of cycles that contain new CDC data. This capability is useful if data from multiple cycles accumulates before the previous taskflow run completes. By default, this check box is not selected.
    Prefix for Metadata Columns
    Add a prefix to the names of the added metadata columns to easily identify them and to prevent conflicts with the names of existing columns.
    The default value is INFA_.
    Enable Case Transformation
    By default, target table names and column names are generated in the same case as the corresponding source names. If you want to control the case of letters in the target names, select this check box. Then select a Case Transformation Strategy option.
    Case Transformation Strategy
    If you selected Enable Case Transformation, select one of the following options to specify how to handle the case of letters in generated target table (or object) names and column (or field) names:
    • - Same as source. Use the same case as the source table (or object) names and column (or field) names.
    • - UPPERCASE. Use all uppercase.
    • - lowercase. Use all lowercase.
    The default value is Same as source.
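    The following minimal sketch approximates what the case transformation strategies do to generated target names. It is a conceptual illustration, not the product's name-generation logic.
```python
def transform_name(source_name: str, strategy: str) -> str:
    """Apply a case transformation strategy to a generated target name."""
    if strategy == "UPPERCASE":
        return source_name.upper()
    if strategy == "lowercase":
        return source_name.lower()
    return source_name  # "Same as source" (the default)

print(transform_name("Customer_Orders", "UPPERCASE"))        # CUSTOMER_ORDERS
print(transform_name("Customer_Orders", "lowercase"))        # customer_orders
print(transform_name("Customer_Orders", "Same as source"))   # Customer_Orders
```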
    3Under Table Renaming Rules, if you want to rename the target objects that are associated with the selected source tables, define renaming rules. Click the + (Add new row) icon, and then enter a source table name or name mask and a corresponding target table name or name mask. To define a mask, include one or more asterisk (*) wildcards. Then press Enter.
    For example, to add the prefix "PROD_" to the names of target tables that correspond to all selected source tables, enter the * wildcard for the source table and enter PROD_* for the target table.
    You can enter multiple rules.
    Notes:
    4Under Data Type Rules, if you want to override the default mappings of source data types to target data types, define data type rules. Click the + (Add new row) icon and enter a source data type and corresponding target data type. Then press Enter.
    This feature is supported for tasks that have the following source and target combinations:
    For example, you can create a rule that maps Oracle NUMBER columns that have no precision to Snowflake target NUMBER() columns that also have no precision.
    Also, in the Source Data Type value, you can include the percent (%) wildcard to represent the data type precision, scale, or size, for example, NUMBER(%,4), NUMBER(8,%), or NUMBER(%). Use the wildcard to cover all source columns that have the same data type but use different precision, scale, or size values, instead of specifying each one individually. For example, enter FLOAT(%) to cover FLOAT(16), FLOAT(32), and FLOAT(84). You cannot enter the % wildcard in the target data type. A source data type that uses the % wildcard must map to a target data type that uses a specific precision, scale, or size value. For example, you could map the source data type FLOAT(%) to a target data type specification such as NUMBER(38,10).
    5Under Custom Properties, you can enter one or more custom properties that Informatica provides to improve performance or to meet your special requirements. To add a property, click the + icon to add a row. In the Property Name field, select a property and then enter a property value, or select the Custom option and manually enter both the property name and value.
    The following table describes the properties that are available for this target:
    Property
    Description
    Writer Helper Thread Count
    The number of writer helper threads that are used to convert incoming change data rows or initial unload rows to the output format configured for the target, such as Avro, CSV, or Parquet.
    Default value is 2. If two threads can’t keep up with the incoming volume of data, you can increase the number of threads. Consider increasing the number of threads in the following situations: 1) the incoming volume is high, 2) multiple writer distributors are in use, or 3) some rows are very large or wide, which increases conversion time.
    Custom
    Select this option to manually enter the name of a property and its value. Use this option to enter properties that Informatica Global Customer Support or a technical staff member has provided to you for a special case. Available for any supported load type.
    Custom properties are intended to address performance or special processing needs. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).
    Tip: To delete a custom property after you've entered it, click the Delete icon at the right end of the property row.
    6Click Next to proceed, or click Save.

Configure an Oracle Cloud Object Storage target

Define target properties for the destination that you selected on the Destination page.
    1Under Target Properties, define the following required Oracle Cloud Object Storage target properties:
    Property
    Description
    Output Format
    Select the format of the output file. Options are:
    • - CSV
    • - AVRO
    • - PARQUET
    The default value is CSV.
    Note: Output files in CSV format use double-quotation marks ("") as the delimiter for each field.
    Add Headers to CSV File
    If CSV is selected as the output format, select this check box to add a header with source column names to the output CSV file.
    Avro Format
    If you selected AVRO as the output format, select the format of the Avro schema that will be created for each source table. Options are:
    • - Avro-Flat. This Avro schema format lists all Avro fields in one record.
    • - Avro-Generic. This Avro schema format lists all columns from a source table in a single array of Avro fields.
    • - Avro-Nested. This Avro schema format organizes each type of information in a separate record.
    The default value is Avro-Flat.
    Avro Serialization Format
    If AVRO is selected as the output format, select the serialization format of the Avro output file. Options are:
    • - None
    • - Binary
    • - JSON
    The default value is Binary.
    Avro Schema Directory
    If AVRO is selected as the output format, specify the local directory where Database Ingestion and Replication stores Avro schema definitions for each source table. Schema definition files have the following naming pattern:
    schemaname_tablename.txt
    Note: If this directory is not specified, no Avro schema definition file is produced.
    File Compression Type
    Select a file compression type for output files in CSV or AVRO output format. Options are:
    • - None
    • - Deflate
    • - Gzip
    • - Snappy
    The default value is None, which means no compression is used.
    Avro Compression Type
    If AVRO is selected as the output format, select an Avro compression type. Options are:
    • - None
    • - Bzip2
    • - Deflate
    • - Snappy
    The default value is None, which means no compression is used.
    Parquet Compression Type
    If the PARQUET output format is selected, you can select a compression type that is supported by Parquet. Options are:
    • - None
    • - Gzip
    • - Snappy
    The default value is None, which means no compression is used.
    Deflate Compression Level
    If Deflate is selected in the Avro Compression Type field, specify a compression level from 0 to 9. The default value is 0.
    Add Directory Tags
    For incremental load and combined initial and incremental load tasks, select this check box to add the "dt=" prefix to the names of apply cycle directories to be compatible with the naming convention for Hive partitioning. This check box is cleared by default.
    Task Target Directory
    For incremental load and combined initial and incremental load tasks, the root directory for the other directories that hold output data files, schema files, and CDC cycle contents and completed files. You can use it to specify a custom root directory for the task. If you enable the Connection Directory as Parent option, you can still optionally specify a task target directory to use with the parent directory specified in the connection properties.
    This field is required if the {TaskTargetDirectory} placeholder is specified in patterns for any of the following directory fields.
    Connection Directory as Parent
    Select this check box to use the directory value that is specified in the target connection properties as the parent directory for the custom directory paths specified in the task target properties. For initial load tasks, the parent directory is used in the Data Directory and Schema Directory. For incremental load and combined initial and incremental load tasks, the parent directory is used in the Data Directory, Schema Directory, Cycle Completion Directory, and Cycle Contents Directory.
    This check box is selected by default. If you clear it, for initial loads, define the full path to the output files in the Data Directory field. For incremental loads, optionally specify a root directory for the task in the Task Target Directory.
    Data Directory
    For initial load tasks, define a directory structure for the directories where Database Ingestion and Replication stores output data files and optionally stores the schema. To define the directory pattern, you can use the following types of entries:
    • - The placeholders {SchemaName}, {TableName}, {Timestamp}, {YY}, {YYYY}, {MM}, and {DD}, where {YY}, {YYYY}, {MM}, and {DD} are for date elements. The {Timestamp} values are in the format yyyymmdd_hhmissms. The generated dates and times in the directory paths indicate when the initial load job starts to transfer data to the target.
    • - Specific directory names.
    • - The toUpper() and toLower() functions, which force the value of the associated placeholder to uppercase or lowercase.
    Note: Placeholder values are not case sensitive.
    Examples:
    myDir1/{SchemaName}/{TableName}
    myDir1/myDir2/{SchemaName}/{YYYY}/{MM}/{TableName}_{Timestamp}
    myDir1/{toLower(SchemaName)}/{TableName}_{Timestamp}
    The default directory pattern is {TableName}_{Timestamp}.
    For incremental load and combined initial and incremental load tasks, define a custom path to the subdirectory that contains the cdc-data data files. To define the directory pattern, you can use the following types of entries:
    • - The placeholders {TaskTargetDirectory}, {SchemaName}, {TableName}, {Timestamp}, {YY}, {YYYY}, {MM}, and {DD}, where {YY}, {YYYY}, {MM}, and {DD} are for date elements. The {Timestamp} values are in the format yyyymmdd_hhmissms. The generated dates and times in the directory paths indicate when the CDC cycle started.
    • If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose both the function and placeholder in curly brackets, as shown in the preceding example.
    • - Specific directory names.
    The default directory pattern is {TaskTargetDirectory}/data/{TableName}/data.
    Note: For Amazon S3, Flat File, Microsoft Azure Data Lake Storage Gen2, and Oracle Cloud Object Store targets, Database Ingestion and Replication uses the directory specified in the target connection properties as the root for the data directory path when Connection Directory as Parent is selected. For Google Cloud Storage targets, Database Ingestion and Replication uses the Bucket name that you specify in the target properties for the ingestion task. For Microsoft Fabric OneLake targets, the parent directory is the path specified in the Lakehouse Path field in the Microsoft Fabric OneLake connection properties.
    Schema Directory
    Specify a custom directory in which to store the schema file if you want to store it in a directory other than the default directory. For initial loads, previously used values, if available, are shown in a drop-down list for your convenience. This field is optional.
    For initial loads, the schema is stored in the data directory by default. For incremental loads and combined initial and incremental loads, the default directory for the schema file is {TaskTargetDirectory}/data/{TableName}/schema.
    You can use the same placeholders as for the Data Directory field. Ensure that you enclose placeholders in curly brackets { }.
    If you include the toUpper or toLower function, put the placeholder name in parentheses and enclose both the function and placeholder in curly brackets, for example: {toLower(SchemaName)}
    Note: A separate schema file is written only when the output data files are in CSV format. Data files in Parquet and Avro formats contain their own embedded schema.
    Cycle Completion Directory
    For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle completed file. Default is {TaskTargetDirectory}/cycle/completed.
    Cycle Contents Directory
    For incremental load and combined initial and incremental load tasks, the path to the directory that contains the cycle contents files. Default is {TaskTargetDirectory}/cycle/contents.
    Use Cycle Partitioning for Data Directory
    For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under each data directory.
    If this option is not selected, individual data files are written to the same directory without a timestamp, unless you define an alternative directory structure.
    Use Cycle Partitioning for Summary Directories
    For incremental load and combined initial and incremental load tasks, causes a timestamp subdirectory to be created for each CDC cycle, under the summary contents and completed subdirectories.
    List Individual Files in Contents
    For incremental load and combined initial and incremental load tasks, lists individual data files under the contents subdirectory.
    If Use Cycle Partitioning for Summary Directories is cleared, this option is selected by default. All of the individual files are listed in the contents subdirectory unless you configure custom subdirectories by using placeholders, such as the timestamp or date placeholders.
    If Use Cycle Partitioning for Data Directory is selected, you can still optionally select this check box to list individual files and group them by CDC cycle.
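    Downstream processes often wait for a cycle completed file and then read the data files that belong to that cycle. The following sketch, which assumes the default directory layout described above, cycle partitioning enabled, a hypothetical bucket name, and the OCI Python SDK, lists the objects under one cycle's completed and data prefixes; it is illustrative only.
```python
import oci

config = oci.config.from_file()  # reads ~/.oci/config by default
client = oci.object_storage.ObjectStorageClient(config)
namespace = client.get_namespace().data

bucket = "cdc-bucket"                 # hypothetical bucket name
task_target_dir = "cdc_task1"         # value of the Task Target Directory field
cycle_timestamp = "20240521_103015"   # a cycle subdirectory created by cycle partitioning

# With cycle partitioning enabled, each CDC cycle gets its own timestamp
# subdirectory under the completed, contents, and data directories.
for prefix in (f"{task_target_dir}/cycle/completed/{cycle_timestamp}",
               f"{task_target_dir}/data/ORDERS/data/{cycle_timestamp}"):
    listing = client.list_objects(namespace, bucket, prefix=prefix)
    for obj in listing.data.objects:
        print(obj.name)
```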
    2To view advanced properties, toggle on Show Advanced Options. Then under Advanced Target Properties, define any of the following optional advanced target properties that you want to use:
    Field
    Description
    Add Operation Type
    Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target.
    For incremental loads, the job writes "I" for insert, "U" for update, or "D" for delete. For initial loads, the job always writes "I" for insert.
    By default, this check box is selected for incremental load and combined initial and incremental load jobs, and cleared for initial load jobs.
    Add Operation Time
    Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target.
    For initial loads, the job always writes the current date and time.
    By default, this check box is not selected.
    Add Operation Owner
    Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target.
    For initial loads, the job always writes "INFA" as the owner.
    By default, this check box is not selected.
    This property is not available for jobs that have a MongoDB or PostgreSQL source.
    Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method.
    Add Operation Transaction Id
    Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations.
    For initial loads, the job always writes "1" as the ID.
    By default, this check box is not selected.
    Add Orderable Sequence
    Select this check box to add a metadata column that records a combined epoch value and an incremental numeric value for each change operation that the job inserts into the target tables. The sequence value is always ascending, but it is not guaranteed to be sequential, and gaps might exist. Use the sequence value to identify the order of activity in the target records. For an example of ordering output rows by this column, see the sketch after this property list.
    By default, this check box is not selected.
    Add Before Images
    Select this check box to include UNDO data in the output that a job writes to the target.
    For initial loads, the job writes nulls.
    By default, this check box is not selected.
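    If you select Add Orderable Sequence, downstream readers can use that column to re-establish the order of activity across output files. The following sketch assumes CSV output with headers enabled and a metadata column named INFA_ORDERABLE_SEQUENCE; the column name, directory, and numeric parsing are assumptions for illustration only.
```python
import csv
from pathlib import Path

SEQ_COLUMN = "INFA_ORDERABLE_SEQUENCE"  # assumed metadata column name

def rows_in_activity_order(data_dir):
    """Read all CSV data files in a directory and return the rows ordered by the
    orderable sequence value, which is ascending but may contain gaps."""
    rows = []
    for csv_file in sorted(Path(data_dir).glob("*.csv")):
        with csv_file.open(newline="") as handle:
            rows.extend(csv.DictReader(handle))
    return sorted(rows, key=lambda row: int(row[SEQ_COLUMN]))

for row in rows_in_activity_order("./cycle_output"):
    print(row)
```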
    3Under Table Renaming Rules, if you want to rename the target objects that are associated with the selected source tables, define renaming rules. Click the + (Add new row) icon, and then enter a source table name or name mask and a corresponding target table name or name mask. To define a mask, include one or more asterisk (*) wildcards. Then press Enter.
    For example, to add the prefix "PROD_" to the names of target tables that correspond to all selected source tables, enter the * wildcard for the source table and enter PROD_* for the target table.
    You can enter multiple rules.
    Notes:
    4Under Custom Properties, you can enter one or more custom properties that Informatica provides to improve performance or to meet your special requirements. To add a property, click the + icon to add a row. In the Property Name field, select a property and then enter a property value, or select the Custom option and manually enter both the property name and value.
    The following table describes the properties that are available for this target:
    Property
    Description
    Writer Helper Thread Count
    The number of writer helper threads that are used to convert incoming change data rows or initial unload rows to the output format configured for the target, such as Avro, CSV, or Parquet.
    Default value is 2. If two threads can’t keep up with the incoming volume of data, you can increase the number of threads. Consider increasing the number of threads in the following situations: 1) the incoming volume is high, 2) multiple writer distributors are in use, or 3) some rows are very large or wide, which increases conversion time.
    Custom
    Select this option to manually enter the name of a property and its value. Use this option to enter properties that Informatica Global Customer Support or a technical staff member has provided to you for a special case. Available for any supported load type.
    Custom properties are intended to address performance or special processing needs. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).
    Tip: To delete a custom property after you've entered it, click the Delete icon at the right end of the property row.
    5Click Next to proceed, or click Save.

Configure a PostgreSQL target

Define target properties for the destination that you selected on the Destination page.
    1Under Target Properties, define the following required PostgreSQL target properties:
    Property
    Description
    Target Creation
    The only available option is Create Target Tables, which generates the target tables based on the source tables.
    Note: After the target table is created, Database Ingestion and Replication intelligently handles the target tables on subsequent job runs. Database Ingestion and Replication might truncate or re-create the target tables depending on the specific circumstances.
    Schema
    Select the target schema in which Database Ingestion and Replication creates the target tables.
    Apply Mode
    For incremental load and combined initial and incremental load jobs, indicates how source DML changes, including inserts, updates, and deletes, are applied to the target. Options are:
    • - Standard. Accumulate the changes in a single apply cycle and intelligently merge them into fewer SQL statements before applying them to the target. For example, if an update followed by a delete occurs on the source row, no row is applied to the target. If multiple updates occur on the same column or field, only the last update is applied to the target. If multiple updates occur on different columns or fields, the updates are merged into a single update record before being applied to the target.
    • - Audit. Apply an audit trail of every DML operation made on the source tables to the target. A row for each DML change on a source table is written to the generated target table along with the audit columns you select under the Advanced section. The audit columns contain metadata about the change, such as the DML operation type, time, owner, transaction ID, generated ascending sequence number, and before image. Consider using Audit apply mode when you want to use the audit history to perform downstream computations or processing on the data before writing it to the target database or when you want to examine metadata about the captured changes.
    • - Soft Deletes. Apply source delete operations to the target as soft deletes. A soft delete marks the deleted row as deleted without actually removing it from the database. For example, a delete on the source results in a change record on the target with "D" displayed in the INFA_OPERATION_TYPE column.
    • Consider using soft deletes if you have a long-running business process that needs the soft-deleted data to finish processing, if you need to restore data after an accidental delete operation, or if you want to track deleted values for audit purposes.
      Note: If you use Soft Deletes mode, you must not perform an update on the primary key in a source table. Otherwise, data corruption can occur on the target.
    The default value is Standard.
    Note: The Audit and Soft Deletes apply modes are supported for jobs that have an Oracle source.
    Note: This field does not appear if you selected Query-based as the CDC method on the Source page of the task wizard.
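    In Standard apply mode, described above, the changes in one apply cycle are merged so that only the net effect for each row reaches the target. The following sketch mimics that merging behavior for the examples given in the Apply Mode description; it is a conceptual illustration, not the product's algorithm.
```python
def merge_cycle(changes):
    """Collapse one apply cycle's ordered (op, key, values) changes into net
    operations per key, following the examples described above."""
    net = {}  # key -> (operation, merged column values)
    for op, key, values in changes:
        if op == "D":
            if key in net:
                del net[key]        # e.g. an update followed by a delete: no row is applied
            else:
                net[key] = ("D", None)
        elif key in net:
            prev_op, prev_values = net[key]
            merged = {**(prev_values or {}), **(values or {})}  # later updates win per column
            net[key] = (prev_op if prev_op == "I" else "U", merged)
        else:
            net[key] = (op, dict(values or {}))
    return net

cycle = [
    ("U", 1, {"status": "SHIPPED"}), ("D", 1, None),           # update then delete: nothing applied
    ("U", 2, {"qty": 5}), ("U", 2, {"status": "CANCELLED"}),   # merged into a single update
]
print(merge_cycle(cycle))   # {2: ('U', {'qty': 5, 'status': 'CANCELLED'})}
```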
    2To view advanced properties, toggle on Show Advanced Options. Then under Advanced Target Properties, define any of the following optional advanced target properties that you want to use:
    Property
    Description
    Add Operation Type
    Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target database or inserts into the target table.
    This field is available only when the Apply Mode option is set to Audit or Soft Deletes.
    In Audit mode, the job writes "I" for insert, "U" for update, or "D" for delete.
    In Soft Deletes mode, the job writes "D" for deletes or NULL for inserts and updates. When the operation type is NULL, the other "Add Operation..." metadata columns are also NULL. Only when the operation type is "D" will the other metadata columns contain non-null values.
    By default, this check box is selected. You cannot deselect it if you are using soft deletes.
    Add Operation Time
    Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target database or inserts into the audit table on the target system.
    This field is available only when Apply Mode is set to Audit or Soft Deletes.
    By default, this check box is not selected.
    Add Operation Owner
    Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target database or inserts into the audit table on the target system.
    This field is available only when Apply Mode is set to Audit or Soft Deletes.
    By default, this check box is not selected.
    Add Operation Transaction Id
    Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations.
    This field is available only when Apply Mode is set to Audit or Soft Deletes.
    By default, this check box is not selected.
    Add Operation Sequence
    Select this check box to add a metadata column that records a generated, ascending sequence number for each change operation that the job inserts into the audit table on the target system. The sequence number reflects the change stream position of the operation.
    This field is available only when Apply Mode is set to Audit.
    By default, this check box is not selected.
    Add Before Images
    Select this check box to add _OLD columns with UNDO "before image" data in the output that the job inserts into the target tables. You can then compare the old and current values for each data column. For a delete operation, the current value will be null.
    This field is available only when Apply Mode is set to Audit.
    By default, this check box is not selected.
    Prefix for Metadata Columns
    Add a prefix to the names of the added metadata columns to easily identify them and to prevent conflicts with the names of existing columns.
    The default value is INFA_.
    3Under Table Renaming Rules, if you want to rename the target objects that are associated with the selected source tables, define renaming rules. Click the + (Add new row) icon, and then enter a source table name or name mask and a corresponding target table name or name mask. To define a mask, include one or more asterisk (*) wildcards. Then press Enter.
    For example, to add the prefix "PROD_" to the names of target tables that correspond to all selected source tables, enter the * wildcard for the source table and enter PROD_* for the target table.
    You can enter multiple rules.
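    For illustration only, using hypothetical table names, rules can take forms such as the following:
    ORDERS → ORDERS_RAW (exact rename of a single table)
    * → *_HIST (mask rule that appends the _HIST suffix to every target table name)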
    Notes:
    4Under Data Type Rules, if you want to override the default mappings of source data types to target data types, define data type rules. Click the + (Add new row) icon and enter a source data type and corresponding target data type. Then press Enter.
    This feature is supported for tasks that have the following source and target combinations:
    For example, you can create a rule that maps Oracle NUMBER columns that have no precision to Snowflake target NUMBER() columns that also have no precision.
    Also, in the Source Data Type value, you can include the percent (%) wildcard to represent the data type precision, scale, or size, for example, NUMBER(%,4), NUMBER(8,%), or NUMBER(%). Use the wildcard to cover all source columns that have the same data type but use different precision, scale, or size values, instead of specifying each one individually. For example, enter FLOAT(%) to cover FLOAT(16), FLOAT(32), and FLOAT(84). You cannot enter the % wildcard in the target data type. A source data type that uses the % wildcard must map to a target data type that uses a specific precision, scale, or size value. For example, you could map the source data type FLOAT(%) to a target data type specification such as NUMBER(38,10).
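    As an illustrative sketch only, assuming an Oracle source and a Snowflake target, rules such as the following map all variable-precision numeric source columns to fixed-precision target columns, regardless of the precision or scale of the individual source columns:
    FLOAT(%) → NUMBER(38,10)
    NUMBER(%) → NUMBER(38,10)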
    5Under Custom Properties, you can enter one or more custom properties that Informatica provides to improve performance or to meet your special requirements. To add a property, click the + icon to add a row. In the Property Name field, select a property and then enter a property value, or select the Custom option and manually enter both the property name and value.
    The following table describes the properties that are available for this target:
    Property
    Description
    Writer Distributor Count
    The number of distributors that can run on separate threads in parallel to process data during an initial load job or the unload phase of a combined load job when the Writer Unload Multiple Distributors custom property is set to true. Using parallel distributor threads can improve job performance, particularly for high-volume data transfers.
    Default value is 3. If your system has ample resources, Informatica recommends that you set this parameter to 8.
    Writer Helper Thread Count
    The number of writer helper threads that are used to convert incoming change data rows or initial unload rows to the output format configured for the target, such as Avro, CSV, or Parquet.
    Default value is 2. If two threads can’t keep up with the incoming volume of data, you can increase the number of threads. Consider increasing the number of threads in the following situations: 1) the incoming volume is high, 2) multiple writer distributors are in use, or 3) some rows are very large or wide, which increases conversion time.
    Writer Unload Multiple Distributors
    Indicates whether multiple distributor threads can be used to process data in parallel during initial load jobs and the unload phase of combined load jobs. The distributors perform work such as uploading data files to staging areas and flushing data to the target. Set this property to true to use multiple distributor threads.
    Default value is false.
    Custom
    Select this option to manually enter the name of a property and its value. Use this option to enter properties that Informatica Global Customer Support or a technical staff member has provided to you for a special case. Available for any supported load type.
    Custom properties are intended to address performance or special processing needs. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).
    Tip: To delete a custom property after you've entered it, click the Delete icon at the right end of the property row.
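    For example, to enable parallel distributor processing for the unload phase, you might enter a combination such as the following. The values are illustrative; tune them to your workload and available resources:
    Writer Unload Multiple Distributors = true
    Writer Distributor Count = 8
    Writer Helper Thread Count = 4
    The Writer Distributor Count value takes effect only when Writer Unload Multiple Distributors is set to true.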
    6Click Next to proceed, or click Save.

Configure a Snowflake Data Cloud target

Define target properties for the destination that you selected on the Destination page.
    1Under Target Properties, define the following required Snowflake target properties:
    Property
    Description
    Target Creation
    The only available option is Create Target Tables, which generates the target tables based on the source tables.
    Note: After the target tables are created, Database Ingestion and Replication handles them automatically on subsequent job runs and might truncate or re-create them, depending on the circumstances.
    Schema
    Select the target schema in which Database Ingestion and Replication creates the target tables.
    Stage
    The name of the internal staging area that holds the data read from the source before the data is written to the target tables. This name must not include spaces. If the staging area does not exist, it is created automatically.
    Note: This field is not available if you selected the Superpipe option in the Advanced Target Properties.
    Apply Mode
    For incremental load and combined initial and incremental load jobs, indicates how source DML changes, including inserts, updates, and deletes, are applied to the target. Options are:
    • - Standard. Accumulate the changes in a single apply cycle and intelligently merge them into fewer SQL statements before applying them to the target. For example, if an update followed by a delete occurs on the source row, no row is applied to the target. If multiple updates occur on the same column or field, only the last update is applied to the target. If multiple updates occur on different columns or fields, the updates are merged into a single update record before being applied to the target.
    • - Soft Deletes. Apply source delete operations to the target as soft deletes. A soft delete marks the deleted row as deleted without actually removing it from the database. For example, a delete on the source results in a change record on the target with "D" displayed in the INFA_OPERATION_TYPE column.
    • Consider using soft deletes if you have a long-running business process that needs the soft-deleted data to finish processing, to restore data after an accidental delete operation, or to track deleted values for audit purposes.
      Note: If you use Soft Deletes mode, you must not perform an update on the primary key in a source table. Otherwise, data corruption can occur on the target.
    • - Audit. Apply an audit trail of every DML operation made on the source tables to the target. A row for each DML change on a source table is written to the generated target table along with the audit columns you select under the Advanced section. The audit columns contain metadata about the change, such as the DML operation type, time, owner, transaction ID, generated ascending sequence number, and before image. Consider using Audit apply mode when you want to use the audit history to perform downstream computations or processing on the data before writing it to the target database or when you want to examine metadata about the captured changes.
    The default value is Standard.
    Note: This field does not appear if you selected Query-based as the CDC method on the Source page of the task wizard.
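    For example, the following sketch shows how Soft Deletes mode might look on the target for a hypothetical source table with columns ID and STATUS, assuming the default INFA_ metadata prefix:
    ID=1, STATUS='SHIPPED', INFA_OPERATION_TYPE=NULL (row inserted or updated normally)
    ID=2, STATUS='CANCELLED', INFA_OPERATION_TYPE='D' (row deleted on the source but retained on the target as a soft delete)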
    2To view advanced properties, toggle on Show Advanced Options. Then under Advanced Target Properties, define any of the following optional advanced target properties that you want to use:
    Property
    Description
    Add Last Replicated Time
    Select this check box to add a metadata column that records the timestamp at which a record was inserted or last updated in the target table. For initial loads, all loaded records have the same timestamp, except for Snowflake targets that use the Superpipe option where minutes and seconds might vary slightly. For incremental and combined initial and incremental loads, the column records the timestamp of the last DML operation that was applied to the target.
    By default, this check box is not selected.
    Add Operation Type
    Select this check box to add a metadata column that records the source SQL operation type in the output that the job propagates to the target database or inserts into the audit table on the target system.
    This field is available only when the Apply Mode option is set to Audit or Soft Deletes.
    In Audit mode, the job writes "I" for inserts, "U" for updates, "E" for upserts, or "D" for deletes to this metadata column.
    In Soft Deletes mode, the job writes "D" for deletes or NULL for inserts, updates, and upserts. When the operation type is NULL, the other "Add Operation..." metadata columns are also NULL. Only when the operation type is "D" will the other metadata columns contain non-null values.
    By default, this check box is selected. You cannot deselect it if you are using soft deletes.
    Add Operation Time
    Select this check box to add a metadata column that records the source SQL operation timestamp in the output that the job propagates to the target database or inserts into the audit table on the target system.
    This field is available only when Apply Mode is set to Audit or Soft Deletes.
    By default, this check box is not selected.
    Add Operation Owner
    Select this check box to add a metadata column that records the owner of the source SQL operation in the output that the job propagates to the target database or inserts into the audit table on the target system.
    This field is available only when Apply Mode is set to Audit or Soft Deletes.
    By default, this check box is not selected.
    This property is not available for jobs that have a MongoDB or PostgreSQL source.
    Note: This property is not supported for jobs that have a SQL Server source and use the CDC Tables capture method.
    Add Operation Transaction Id
    Select this check box to add a metadata column that includes the source transaction ID in the output that the job propagates to the target for SQL operations.
    This field is available only when Apply Mode is set to Audit or Soft Deletes.
    By default, this check box is not selected.
    Add Operation Sequence
    Select this check box to add a metadata column that records a generated, ascending sequence number for each change operation that the job inserts into the audit table on the target system. The sequence number reflects the change stream position of the operation.
    This field is available only when Apply Mode is set to Audit.
    By default, this check box is not selected.
    Add Before Images
    Select this check box to add _OLD columns with UNDO "before image" data in the output that the job inserts into the target tables. You can then compare the old and current values for each data column. For a delete operation, the current value will be null.
    This field is available only when Apply Mode is set to Audit.
    By default, this check box is not selected.
    Add Cycle ID
    Select this check box to add a metadata column that includes the cycle ID of each CDC cycle in each target table. A cycle ID is a number that's generated by the CDC engine for each successful CDC cycle. If you integrate the job with Data Integration taskflows, the job can pass the minimum and maximum cycle IDs in output fields to the taskflow so that the taskflow can determine the range of cycles that contain new CDC data. This capability is useful if data from multiple cycles accumulates before the previous taskflow run completes. By default, this check box is not selected.
    Note: If you select this option, you can't also select the Superpipe option for the Snowflake target.
    Prefix for Metadata Columns
    Add a prefix to the names of the added metadata columns to easily identify them and to prevent conflicts with the names of existing columns.
    The default value is INFA_.
    Superpipe
    Select this check box to use the Snowpipe Streaming API to quickly stream rows of data directly to Snowflake Data Cloud target tables with low latency instead of first writing the data to stage files. This option is available for all load types.
    When you configure the target connection, select KeyPair authentication.
    By default, this check box is selected. Deselect it if you want to write data to intermediate stage files.
    Note: If you enable the Superpipe option for a task that uses the Soft Deletes apply mode, make sure the source tables contain a primary key.
    Merge Frequency
    When Superpipe is selected, you can optionally set the frequency, in seconds, at which change data rows are merged and applied to the Snowflake target tables.
    The merge frequency affects how often the stream change data is merged to the Snowflake base table. A Snowflake view joins the stream change data with the base table. Set this value to balance the costs of merging data to the base table with the performance of view join processing.
    This field applies to incremental load and combined initial and incremental load tasks. Valid values are 60 through 604800 seconds. Default is 3600 seconds.
    Enable Case Transformation
    By default, target table names and column names are generated in the same case as the corresponding source names, unless cluster-level or session-level properties on the target override this case-sensitive behavior. If you want to control the case of letters in the target names, select this check box. Then select a Case Transformation Strategy option.
    Case Transformation Strategy
    If you selected Enable Case Transformation, select one of the following options to specify how to handle the case of letters in generated target table (or object) names and column (or field) names:
    • - Same as source. Use the same case as the source table (or object) names and column (or field) names.
    • - UPPERCASE. Use all uppercase.
    • - lowercase. Use all lowercase.
    The default value is Same as source.
    Note: The selected strategy will override any cluster-level or session-level properties on the target for controlling case.
    3Under Table Renaming Rules, if you want to rename the target objects that are associated with the selected source tables, define renaming rules. Click the + (Add new row) icon and enter a source table name or name mask and a corresponding target table name or name mask. To define a mask, include one or more asterisk (*) wildcards. Then press Enter.
    For example, to add the prefix "PROD_" to the names of target tables that correspond to all selected source tables, enter the * wildcard for the source table and enter PROD_* for the target table.
    You can enter multiple rules.
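    For illustration only, with hypothetical table names, rules can take forms such as the following:
    INVOICES → INVOICES_RAW (exact rename of a single table)
    * → *_STG (mask rule that appends the _STG suffix to every target table name)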
    Notes:
    4Under Data Type Rules, if you want to override the default mappings of source data types to target data types, define data type rules. Click the + (Add new row) icon and enter a source data type and corresponding target data type. Then press Enter.
    This feature is supported for tasks that have the following source and target combinations:
    For example, you can create a rule that maps Oracle NUMBER columns that have no precision to Snowflake target NUMBER() columns that also have no precision.
    Also, in the Source Data Type value, you can include the percent (%) wildcard to represent the data type precision, scale, or size, for example, NUMBER(%,4), NUMBER(8,%), or NUMBER(%). Use the wildcard to cover all source columns that have the same data type but use different precision, scale, or size values, instead of specifying each one individually. For example, enter FLOAT(%) to cover FLOAT(16), FLOAT(32), and FLOAT(84). You cannot enter the % wildcard in the target data type. A source data type that uses the % wildcard must map to a target data type that uses a specific precision, scale, or size value. For example, you could map the source data type FLOAT(%) to a target data type specification such as NUMBER(38,10).
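    For instance, building on the wildcard examples above, the following illustrative rules (hypothetical, for a task with an Oracle source) map wildcarded numeric source types to fixed-precision Snowflake types:
    FLOAT(%) → NUMBER(38,10)
    NUMBER(%,4) → NUMBER(38,4)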
    5Under Custom Properties, you can enter one or more custom properties that Informatica provides to improve performance or to meet your special requirements. To add a property, click the + icon to add a row. In the Property Name field, select a property and then enter a property value, or select the Custom option and manually enter both the property name and value.
    The following table describes the properties that are available for this target:
    Property
    Description
    Writer Distributor Count
    The number of distributors that can run on separate threads in parallel to process data during an initial load job or the unload phase of a combined load job when the Writer Unload Multiple Distributors custom property is set to true. Using parallel distributor threads can improve job performance, particularly for high-volume data transfers.
    Default value is 3. If your system has ample resources, Informatica recommends that you set this parameter to 8.
    Writer Helper Thread Count
    The number of writer helper threads that are used to convert incoming change data rows or initial unload rows to the output format configured for the target, such as Avro, CSV, or Parquet.
    Default value is 2. If two threads can’t keep up with the incoming volume of data, you can increase the number of threads. Consider increasing the number of threads in the following situations: 1) the incoming volume is high, 2) multiple writer distributors are in use, or 3) some rows are very large or wide, which increases conversion time.
    Writer Unload Multiple Distributors
    Indicates whether multiple distributor threads can be used to process data in parallel during initial load jobs and the unload phase of combined load jobs. The distributors perform work such as uploading data files to staging areas and flushing data to the target. Set this property to true to use multiple distributor threads.
    Default value is false.
    Custom
    Select this option to manually enter the name of a property and its value. Use this option to enter properties that Informatica Global Customer Support or a technical staff member has provided to you for a special case. Available for any supported load type.
    Custom properties are intended to address performance or special processing needs. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).
    Tip: To delete a custom property after you've entered it, click the Delete icon at the right end of the property row.
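    As an illustrative combination only, a high-volume initial load on a well-resourced Secure Agent machine might use settings along these lines:
    Writer Unload Multiple Distributors = true
    Writer Distributor Count = 8
    Writer Helper Thread Count = 3
    Keep in mind that the distributor count is honored only when Writer Unload Multiple Distributors is set to true, and that additional helper threads matter most when the incoming volume is high or rows are very wide.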
    6Click Next to proceed, or click Save.

Transform the data

You can apply trim transformations to selected tables and columns to remove spaces to the left or right of character column values. You can also define row-level filter rules that contain column conditions to filter out data rows retrieved from source tables before the data is applied to the target.
Note: If you edit row-level filters in the task for a deployed job, you must Redeploy the job afterwards for the updated filters to take effect.
    1On the Transform Data page, select the tables and columns to which you want to assign a transformation.
    You can use the Find box to search for items based on table name or column name. Select Table Names or Columns next to the Find box and then enter all or part of the name in the box. If found, the string is highlighted in the names of the listed items.
    2To add a trim transformation, click Add Transformation (not the down-pointing arrow next to it).
    Note: You can apply trim transformations and row-level filters to the same tables and columns.
    The How do you want to transform your data? dialog box appears.
    3Click the + (Add a new row) icon to add a row. Then, in the Transformation Type list, select one of the following options:
    Click the Save icon to add the entry.
    4Click Next to go to the Summary tab where you can review your transformation settings.
    5If the settings are correct on the Summary tab, click Save to save them and return to the initial Transform Data page.
    6To add another trim transformation type for a different table or set of tables, repeat steps 1 through 5.
    When you return to the initial Transform Data page, the list shows the tables to which the trim transformations were applied. For example:
    List of tables after multiple trim transformations are applied.
    Tip: You can remove a transformation assignment on the Transform Data page. Select the table with the unwanted transformation and click Clear All.
    7To add row-level filters to tables and columns, select the tables and then click the down arrow next to Add Transformation and select Add Row Filter.
    The Add Row Filter option is available only for Oracle, Microsoft SQL Server, Db2 for i, Db2 for LUW, and Db2 for z/OS source tables. The task can use any load type.
    The How do you want to filter your data? dialog box appears.
    8Select the table and filter type to apply the filter conditions.
    1. aSelect a table that you want to assign a filter to.
    2. bSelect one of the following filter types:
    3. The default option is Basic.
    9To add a Basic filter, complete the following substeps:
    1. aClick the + (Add a new row) icon to add a row.
    2. bUnder Column Name, select a column.
    3. Columns with unsupported data types for row filtering are marked as "Not supported."
    4. cUnder Operator, select an operator type to use with the value.
    5. dUnder Value, select or enter a value, depending on the column type. Then click the Save icon on the right end of the row to save the condition.
    6. For example, a basic filter might set conditions on two columns, as in the following sketch:
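      Column Name: ORDER_STATUS, Operator: =, Value: SHIPPED
      Column Name: ORDER_TOTAL, Operator: >=, Value: 1000
      The column names and values are hypothetical. Because both conditions are defined for the same table, they are combined with the AND operator.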
      The following table describes the values that are valid for each column data type supported for filtering:
      Column data type
      Description
      INTEGER
      Enter a numeric value. You can use "+" and "-" only once before the number. The value must be between -2147483648 and 2147483647.
      LONG
      Enter a numeric value. You can use "+" and "-" only once before the number. The value must be between -9,223,372,036,854,775,808 and 9,223,372,036,854,775,807.
      BIGINT
      Enter a numeric value. You can use "+" and "-" only once before the number. Maximum length is 50 digits.
      BIGDEC
      Enter a numeric value. You can use "+" and "-" only once before the number. A decimal is allowed. Maximum length is 50 digits.
      STRING
      Enter text.
      DATE
      Use the date picker to select the date.
      TIME
      Enter the value in the format HH:MM:SS.MS. Milliseconds are optional and can have a maximum length of 9 digits.
      For example, 13:14:15.123456789
      DATETIME
      Use the date picker to select the date and time.
      OFFSET_DATETIME
      Use the date picker to select the date, time, and time zone.
      Note: Database Ingestion and Replication does not support BOOLEAN, BINARY, BLOB, CLOB, and graphic column data types.
    7. eClick Validate to test the syntax of the specified condition.
    8. fTo add another Basic condition for the same column or a different column in the table, repeat steps a through e.
    9. The AND operator is used to combine the multiple conditions.
    10. gClick Save to validate and save the changes.
    11. hWhen done defining Basic filter conditions, click OK to return to the Transform Data page.
    10To define an Advanced filter that consists of multiple conditions combined with the AND or OR operator, manually enter the conditions in the box.
    Note: If you entered a Basic filter condition for a column and then switched to the Advanced filter type, the Basic condition is displayed so that you can add to it to create a more complex filter.
    1. aUnder Column Name, select a column and click the > arrow.
    2. The column name appears in the Filter Condition box.
      Note: For combined load tasks, do not include columns that you expect will be updated during CDC processing. If the column is updated, it might become ineligible for replication and cause unpredictable results. In this case, you'd need to Resync the job.
    3. bIn the Filter Condition box, type one or more conditions for the selected column. Manually enter conditions using the supported syntax and the appropriate operators, which can vary based on the column data type. You can also nest conditions using parentheses. See Syntax for row-level filtering. When done, click the Save icon on the right end of the row to save the advanced filter.
    4. The following table describes the values that are valid for each column data type supported for filtering:
      Column data type
      Description
      INTEGER
      Enter a numeric value. You can use "+" and "-" only once before the number. The value must be between -2147483648 and 2147483647.
      LONG
      Enter a numeric value. You can use "+" and "-" only once before the number. The value must be between -9,223,372,036,854,775,808 and 9,223,372,036,854,775,807.
      BIGINT
      Enter a numeric value. You can use "+" and "-" only once before the number. Maximum length is 50 digits.
      BIGDEC
      Enter a numeric value. You can use "+" and "-" only once before the number. A decimal is allowed. Maximum length is 50 digits.
      STRING
      Enter an input attribute in single quotes (').
      DATE
      Enter the value in the format YYYY-MM-DD. Enter the input attribute in single quotes (').
      TIME
      Enter the value in the format HH:MM:SS.MS. Milliseconds (MS) are optional and can have a maximum length of 9 digits. Enter the input attribute in single quotes (').
      For example, 13:14:15.123456789
      DATETIME
      Enter the date and time in the following format:
      YYYY-MM-DDTHH:MM:SS.MS
      For example, 2024-12-31T03:04:05.123456789
      Enter the input attribute in single quotes (').
      OFFSET_DATETIME
      Enter the date, time, and time zone in the following format:
      YYYY-MM-DDTHH:MM:SS.MS+05:00
      For example, 2024-03-15T10:03:04.123456789+05:00
      Enter the input attribute in single quotes (').
      Notes:
    5. cClick Validate to test the syntax of the specified conditions.
    6. Note: Switching from the Advanced to the Basic filter type after creating or editing an Advanced filter condition deletes all changes to the filter condition, even if you saved it.
    7. dClick Save to validate and save the changes and then click OK to return to the Transform Data page.
    8. Note: Do not modify any column included in the filter after the task has been deployed. If you do so, the row-level filtering might not work properly.
    The Filters column on the Transform Data page shows the applied filters as hyperlinks. Clicking the link opens the selected filter in edit mode. Tables with an advanced filter display Advanced next to their filter conditions in the Filters column.
    On the Transform Data page, clicking the Clear All button in the top right corner removes all trim transformations and row-level filters from the selected tables.
    11When done, click Next.

Syntax for row-level filtering

If you create Advanced row-level filters when you define a database ingestion and replication task, ensure that you enter the filter conditions using the correct syntax. Otherwise, filter validation is likely to fail.
Operators
In an Advanced filter, you can use the following operators within a condition, depending on the column data type:
Operator
Description
=
Equals
!=
Does not equal
>
Greater than
>=
Greater than or equal to
<
Less than
<=
Less than or equal to
IS NULL
Contains a null
IS NOT NULL
Cannot contain a null
BETWEEN x AND y
Greater than or equal to x and less than or equal to y
NOT BETWEEN x AND y
Not between x and y, that is, less than x or greater than y
LIKE
A comparison operator for string columns only.
Example: LIKE '%06%7__'. This condition matches the following values: 06789, A06X789, AB06XY789, 06X789, and A06789. However, it does not match these values: A06789Z, A0678, A6789, or an empty string.
NOT LIKE
A comparison operator for string columns only.
IN
True if the operand is equal to one of a list of expressions
NOT IN
True if the operand is NOT equal to one of a list of expressions
+ - / *
Numeric computation operators for addition, subtraction, division, and multiplication
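For example, the following condition uses the hypothetical columns ORDER_TOTAL, REGION, and ORDER_DATE to illustrate how operators and parentheses can be combined:
(ORDER_TOTAL BETWEEN 100 AND 5000 OR REGION IN ('EAST', 'WEST')) AND ORDER_DATE >= '2024-01-01'
String and date values are enclosed in single quotes, and parentheses control the order in which the AND and OR operators are evaluated.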
Syntax rules
In Advanced filters, use the following syntax rules:

Finalize the task definition

Almost done! On the Let's Go! page, complete a few more properties. Then you can Save and Deploy the task.
    1At the top of the page, view the number of source tables that have been selected for processing.
    2Under General Properties, set the following properties:
    Property
    Description
    Task Name
    Enter a name that you want to use to identify the database ingestion and replication task, if you do not want to use the generated name. Using a descriptive name will make finding the task easier later.
    Task names can contain Latin alphanumeric characters, spaces, periods (.), commas (,), underscores (_), plus signs (+), and hyphens (-). Task names cannot include other special characters. Task names are not case sensitive. Maximum length is 50 characters.
    Note: If you include spaces in the task name, after you deploy the task, the spaces do not appear in the corresponding job name.
    Location
    The project or project\folder in Explore that will contain the task definition. If you do not specify a project, the "Default" project is used.
    Runtime Environment
    Select the runtime environment that you want to use to run the task. By default, the runtime environment that you initially entered when you began defining the task is displayed. You can use this runtime environment or select another one.
    Tip: To refresh the list of runtime environments, click Refresh.
    The runtime environment can be a Secure Agent group that consists of one or more Secure Agents. A Secure Agent is a lightweight program that runs tasks and enables secure communication.
    Alternatively, for selected cloud source types, you can use a serverless runtime environment hosted on Microsoft Azure.
    Note: You cannot choose a serverless runtime environment if a local runtime environment was previously selected.
    The Cloud Hosted Agent is not supported.
    Select Set as default to use the specified runtime environment as your default environment for all tasks you create. Otherwise, leave this check box cleared.
    Description
    Optionally, enter a description you want to use for the task.
    Maximum length is 4,000 characters.
    Schedule
    If you want to run an initial load task based on a schedule instead of manually starting it, select Run this task based on a schedule. Then select a schedule that was previously defined in Administrator.
    The default option is Do not run this task based on a schedule.
    Note: This field is not available for incremental load and combined initial and incremental load tasks.
    To view and edit the schedule options, go to Administrator. If you edit the schedule, the changes will apply to all jobs that use the schedule. If you edit the schedule after deploying the task, you do not need to redeploy the task.
    If the schedule criteria for running the job are met but the previous job run is still active, Database Ingestion and Replication skips the new job run.
    Execute in Taskflow
    Select this check box to make the task available in Data Integration to add to a taskflow as an event source. You can then include transformations in the taskflow to transform the ingested data. Available for initial load and incremental load tasks with Snowflake targets that don't use the Superpipe option.
    3To display advanced properties, toggle on Show Advanced Options.
    4In the Number of Rows in Output File field, specify the maximum number of rows that the database ingestion and replication task writes to an output data file, if you do not want to use the default value.
    Note: The Number of Rows in Output File field is not displayed for jobs that have an Apache Kafka target or if you use the Superpipe option for a Snowflake target.
    For incremental load operations and combined initial and incremental load operations, change data is flushed to the target either when this number of rows is reached or when the flush latency period expires and the job is not in the middle of processing a transaction. The flush latency period is the time that the job waits for more change data before flushing data to the target. The latency period is internally set to 10 seconds and cannot be changed.
    Valid values are 1 through 100000000. The default value for Amazon S3, Microsoft Azure Data Lake Storage Gen2, and Oracle Cloud Infrastructure (OCI) Object Storage targets is 1000 rows. For the other targets, the default value is 100000 rows.
    Note: For Microsoft Azure Synapse Analytics targets, the data is first sent to a Microsoft Azure Data Lake Storage staging file before being written to the target tables. After data is written to the target, the entire contents of the table-specific directory that includes the staging files are emptied. For Snowflake targets, the data is first stored in an internal stage area before being written to the target tables.
    5For initial load jobs only, optionally clear the File Extension Based on File Type check box if you want the output data files for Flat File, Amazon S3, Google Cloud Storage, Microsoft Azure Data Lake Storage, Microsoft Fabric OneLake, or Oracle Cloud Object Storage targets to have the .dat extension. This check box is selected by default, which causes the output files to have file-name extensions based on their file types.
    Note: For incremental load jobs with these target types, this option is not available. Database Ingestion and Replication always uses output file-name extensions based on file type.
    6For database ingestion and replication incremental load tasks that have Amazon S3, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, Microsoft Fabric OneLake, or Oracle Cloud Object Storage targets, configure the following apply cycle options:
    Option
    Description
    Apply Cycle Interval
    Specifies the amount of time that must elapse before a database ingestion and replication job ends an apply cycle. You can specify days, hours, minutes, and seconds, or specify values for a subset of these time fields and leave the other fields blank.
    The default value is 15 minutes.
    Apply Cycle Change Limit
    Specifies the total number of records in all tables of a database ingestion and replication job that must be processed before the job ends an apply cycle. When this record limit is reached, the database ingestion and replication job ends the apply cycle and writes the change data to the target.
    The default value is 10000 records.
    Note: During startup, jobs might reach this limit more frequently than the apply cycle interval if they need to catch up on processing a backlog of older data.
    Low Activity Flush Interval
    Specifies the amount of time, in hours, minutes, or both, that must elapse during a period of no change activity on the source before a database ingestion and replication job ends an apply cycle. When this time limit is reached, the database ingestion and replication job ends the apply cycle and writes the change data to the target.
    If you do not specify a value for this option, a database ingestion and replication job ends apply cycles only after either the Apply Cycle Change Limit or Apply Cycle Interval limit is reached.
    No default value is provided.
    7Under Schema Drift Options, if the detection of schema drift is supported for your source and target combination, specify the schema drift option to use for each of the supported types of DDL operations.
    Schema drift options are supported for incremental load and combined initial and incremental load jobs with the following source - target combinations:
    Source
    Target
    Db2 for i
    Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Kafka (incremental loads only), Microsoft Azure Data Lake Storage, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, Oracle Cloud Object Storage, PostgreSQL, Snowflake, and SQL Server
    Db2 for LUW
    Snowflake
    Db2 for z/OS, except Db2 11
    Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Kafka (incremental loads only), Microsoft Azure Data Lake Storage, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, Oracle Cloud Object Storage, Snowflake, and SQL Server
    Microsoft SQL Server
    Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Kafka (incremental loads only), Microsoft Azure Data Lake Storage, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, Oracle Cloud Object Storage, PostgreSQL, Snowflake, and SQL Server
    Oracle
    Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Kafka (incremental loads only), Microsoft Azure Data Lake Storage, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, Oracle Cloud Object Storage, PostgreSQL, Snowflake, and SQL Server
    PostgreSQL
    Incremental loads: Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Kafka, Microsoft Azure Data Lake Storage, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, Oracle Cloud Object Storage, PostgreSQL, and Snowflake
    Combined initial and incremental loads: Oracle, PostgreSQL, and Snowflake
    The following types of DDL operations are supported:
    Note: The Modify Column and Rename Column options are not supported and not displayed for database ingestion and replication jobs that have Google BigQuery targets.
    The following table describes the schema drift options that you can set for a DDL operation type:
    Option
    Description
    Ignore
    Do not replicate DDL changes that occur on the source database to the target. For Amazon Redshift, Kafka, Microsoft Azure Synapse Analytics, PostgreSQL, Snowflake and SQL Server targets, this option is the default option for the Drop Column and Rename Column operation types.
    For Amazon S3, Google Cloud Storage, Microsoft Azure Data Lake Storage, and Oracle Cloud Object Storage targets that use the CSV output format, the Ignore option is disabled. For the AVRO output format, this option is enabled.
    Replicate
    Replicate the DDL operation to the target. For Amazon S3, Google Cloud Storage, Microsoft Azure Data Lake Storage, Microsoft Fabric OneLake, and Oracle Cloud Object Storage targets, this option is the default option for all operation types. For other targets, this option is the default option for the Add Column and Modify Column operation types.
    Stop Job
    Stop the entire database ingestion and replication job.
    Stop Table
    Stop processing the source table on which the DDL change occurred. When one or more of the tables are excluded from replication because of the Stop Table schema drift option, the job state changes to Running with Warning.
    Important: The database ingestion and replication job cannot retrieve the data changes that occurred on the source table after the job stopped processing it. Consequently, data loss might occur on the target. To avoid data loss, you will need to resynchronize the source and target objects that the job stopped processing. Use the Resume With Options > Resync option.
    8For incremental load jobs that have an Apache Kafka target, configure the following Checkpoint Options:
    Option
    Description
    Checkpoint All Rows
    Indicates whether a database ingestion and replication job performs checkpoint processing for every message that is sent to the Kafka target.
    Note: If this check box is selected, the Checkpoint Every Commit, Checkpoint Row Count, and Checkpoint Frequency (secs) options are ignored.
    Checkpoint Every Commit
    Indicates whether a database ingestion and replication job performs checkpoint processing for every commit that occurs on the source.
    Checkpoint Row Count
    Specifies the maximum number of messages that a database ingestion and replication job sends to the target before adding a checkpoint. If you set this option to 0, the job does not perform checkpoint processing based on the number of messages. If you set this option to 1, the job adds a checkpoint for each message.
    Checkpoint Frequency (secs)
    Specifies the maximum number of seconds that must elapse before a database ingestion and replication job adds a checkpoint. If you set this option to 0, a database ingestion and replication job does not perform checkpoint processing based on elapsed time.
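    For example, as an illustrative combination only, you might leave Checkpoint All Rows and Checkpoint Every Commit cleared and set the following values so that checkpoints are based on message volume and elapsed time:
    Checkpoint Row Count = 1000
    Checkpoint Frequency (secs) = 30
    With these settings, the job adds a checkpoint after at most 1000 messages are sent to the Kafka target and after at most 30 seconds elapse.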
    9Under Custom Properties, you can specify one or more custom properties that Informatica provides to meet your special requirements. To add a property, in the Create Property field, enter the property name and value. Then click Add Property.
    Specify these properties only at the direction of Informatica Global Customer Support. Usually, these properties address unique environments or special processing needs. You can specify multiple properties, if necessary. A property name can contain only alphanumeric characters and the following special characters: periods (.), hyphens (-), and underscores (_).
    10Click Save to save the task.
    11Click Deploy to deploy a job instance for the task, or click View to view or edit the task.
    You can run a job that has the status of Deployed from the My Jobs page.