Database Ingestion and Replication sources - preparation and usage
Before you configure database ingestion and replication tasks for initial load, incremental load, or combined initial and incremental operations, prepare the source database and review any usage considerations for your sources to avoid unexpected results.
Db2 for i sources
To use Db2 for i sources in database ingestion and replication tasks, first prepare the source database and review the usage considerations.
Source preparation
•For incremental load jobs and combined initial and incremental load jobs, grant the user ID that runs database ingestion and replication jobs the appropriate level of authority for accessing the Db2 journal and file objects that Database Ingestion and Replication uses. The following table lists these objects and their Db2 authority requirements:
Object                      Authority
Journal                     *USE, *OBJEXIST
Journal Library             *USE
Journal Receiver            *USE
Journal Receiver Library    *USE
File                        *USE
File Library                *USE
•For incremental load jobs, journaling must be active on each database physical file that corresponds to a selected source table. Also, each journal must be configured with the IMAGES(*BOTH) option to store both before and after images of change data.
If journaling is not active on a physical file for a source table, when you define a database ingestion and replication task, you can generate a CDC script that activates it. The script issues the following command, which activates journaling and sets the IMAGES option to BOTH:
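The generated command depends on your library, file, and journal names. A representative form, with placeholder object names, is:
STRJRNPF FILE(MYLIB/MYFILE) JRN(JRNLIB/MYJRN) IMAGES(*BOTH)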
If journaling is already active for a physical file for a source table, the CDC script output contains the following comment:
Table 'table_name' is skipped because journaling is already enabled.
•By default, Database Ingestion and Replication uses the DataDirect JDBC for IBM Db2 driver to connect to the Db2 for i database. Informatica recommends that the first user who creates and tests a Db2 for i connection to the source database has DBA authority on the database. This authority is needed for the driver to create and upload the packages that it uses for Db2 access and to grant the EXECUTE privilege on the packages to PUBLIC. If a DBA user does not perform the first connection test, you must grant *USE authority on the CRTSQLPKG command for creating the packages and grant *CHANGE authority on the library in which the packages are created.
•To use SSL data encryption for Db2 for i sources, when you configure the Db2 for i connection properties, select JTOpen in the JDBC Driver field and select SSL in the Encryption Method field.
Also, add the required certificates to the Informatica Cloud Secure Agent JRE cacerts keystore in one of the following locations:
After you add the certificates, restart the Secure Agent to ensure the changes are reflected in the agent services app-truststore.jks files for the latest instances.
•Because Database Ingestion and Replication expects each source table row to be unique, Informatica recommends that each source table have a primary key. Database Ingestion and Replication does not honor unique indexes in place of a primary key. If no primary key is specified, Database Ingestion and Replication treats all columns as if they are part of the primary key.
•When you define a database ingestion and replication task, on the Source page, specify a journal name that is associated with the source tables that are enabled for journaling.
Important: Ensure that the case and spelling of the name in the Journal Name field and table selection rules match the journal and table name values in the Db2 source catalog.
•Schema drift options are available for Db2 for i sources in database ingestion and replication incremental load and combined initial and incremental load jobs.
If you set the Add Column option to Replicate and then add a column with a default value to a Db2 for i source table, Database Ingestion and Replication writes the default value only in the rows that are subsequently replicated to the target. Existing rows on the target are not updated to reflect the default value. To populate the default value in the existing target rows, perform another initial load to re-materialize the target.
•Database Ingestion and Replication does not support the following Db2 for i data types:
- BLOB
- CLOB
- DATALINK
- DBCLOB
- GRAPHIC
- LONG VARGRAPHIC
- VARGRAPHIC
- XML
Database ingestion and replication jobs propagate nulls for columns that have these data types.
For information about the default mappings of supported source data types to target data types, see Default Data Type Mappings.
•To prevent the system from deleting Db2 journal receivers while database ingestion and replication incremental load and combined initial and incremental load jobs are reading changes from the receivers, Database Ingestion and Replication provides a journal receiver exit program. The exit program locks the journal receivers during CDC processing. To use the exit program, you must install it and set a few custom properties. For more information, see Database Ingestion and Replication journal receiver exit and Installing the Database Ingestion and Replication journal receiver exit.
Change capture mechanism for Db2 for i sources
Database Ingestion and Replication provides a single change capture mechanism and architecture for capturing change data from a Db2 source on IBM i and applying the data to a target.
The Secure Agent must run on a Linux or Windows box, separate from the IBM i source system. The network bandwidth between the IBM i system and the Secure Agent box must be robust. Informatica recommends network transfer rates that can handle hundreds of gigabytes of log data or more. If the network cannot deliver the log data to the Secure Agent at a rate equal to or greater than the rate at which Db2 produces log data of CDC interest, the database ingestion and replication job will not be able to provide the data to the target in a timely manner. If the data throughput does not meet your SLA, consider changing the hardware to increase the Ethernet bandwidth between the IBM i system and the Secure Agent box.
The following image shows the Db2 for i change capture components and data flow:
1. The log collector, which runs in the DBMI Agent service under the Secure Agent, sends a request to the Db2 DISPLAY LOG command for Db2 for i journaled data.
Each request includes a starting RBA along with a list of tables of CDC interest for database ingestion.
2. The command requests the data for the source tables from the Db2 journals and journal receiver.
3. The command returns the journal entries containing the data to the UOW (Unit Of Work) Manager.
4. The UOW Manager sends the journaled data in committed transaction order to the log parser.
5. The log parser parses the DML changes from the committed transactions. Before sending the data to the CDC writer, the log parser transforms it into a canonical form of the Db2 journal data that can be consumed by the DBMI Agent task and applied to the target.
Note: Because this resource-intensive activity occurs on the Secure Agent box, CPU consumption on the IBM i system is minimized.
6. The CDC writer applies the formatted data to the target.
Database Ingestion and Replication journal receiver exit
Database Ingestion and Replication provides an exit program to prevent the deletion of Db2 for i journal receivers that are in use for change data capture processing. Use the journal receiver exit to prevent potential errors related to journal receiver deletion during CDC.
The exit program locks the journal receivers while they're being read by database ingestion and replication incremental load or combined load jobs. If the exit program finds a lock record for a journal receiver during CDC processing, the exit program returns a response that prevents the system from deleting the journal receiver.
When the database ingestion and replication job switches to the next journal receiver on the chain, Database Ingestion and Replication removes the lock record for the journal receiver that was last read and adds a lock record for the next journal receiver to be read. The system will not remove any journal receivers that are created subsequent to the currently locked receiver.
The control information for the journal receiver exit is stored in a Db2 for i table on the source system. The table contains the following columns:
•JOURNAL_RECEIVER_LIBRARY. The name of the journal receiver library.
•JOURNAL_RECEIVER_NAME. The name of a journal receiver.
•DBMI_MAPPING_NAME. A unique lock token name.
•LOCKED_BY_USER. The user ID of the connected user for which the lock was acquired.
•TIME_LOCKED. The time at which the lock was created.
Each combination of JOURNAL_RECEIVER_LIBRARY and JOURNAL_RECEIVER_NAME entries identifies a journal receiver instance that will be locked so that it can't be deleted during change data capture.
To use the journal receiver exit, install the exit program and set the following custom properties:
When pwx.cdcreader.iseries.option.useJournalReceiverExit is set to true, you must specify a token string up to 256 characters in length that is unique for each database ingestion and replication job instance.
If the token string is not unique across all job instances, unpredictable results can occur, especially when multiple jobs are accessing the same journal receivers.
Also, ensure that the token value remains the same after a job Resume or Redeploy operation.
Set this property to true to enable Database Ingestion and Replication to check the maintenance level on the Db2 machine to determine if it's capable of doing journal receiver queries.
Installing the Database Ingestion and Replication journal receiver exit
Use the Database Ingestion and Replication journal receiver exit to prevent the deletion of Db2 for i journal receivers while they're in use for change data capture. To manually install the journal receiver exit, complete the following steps:
1. Unzip the V01_Exit_Install file to a local folder on your computer.
2. Run the SQL statements in the SQL_1.txt file to check if the system has enough space for the installation. A return value of 1 means the system has enough space.
3. Run the SQL statements in the SQL_2.txt file to create the schema and the default journal and journal receiver for journal receiver exit point handling.
4. Run the FTP commands in the FTP_3.txt file to install the journal receiver exit.
Note: Before starting FTP, navigate to the directory with the IBMi_SaveFile_V01.savf file, which contains the journal receiver exit program.
5. Run the SQL statements in the SQL_4.txt file to add the exit point to the IBM i system.
Note: Before running the SQL, replace <userId> with a valid user ID for the system.
6. Use the SQL statements in the SQL_5.txt file to create the Db2 objects used for locking journal receivers to prevent them from being deleted.
Db2 for LUW sources
To use Db2 for Linux, UNIX, and Windows (LUW) sources in database ingestion and replication tasks, first prepare the source database and review the usage considerations.
Source preparation
•Database Ingestion and Replication requires users to have the SELECT privilege on several system catalog tables and views. Use the following grant statement:
GRANT SELECT ON <catalog_table> TO <dbmi_user>
Issue this grant for each of the following catalog tables or views:
- SYSCAT.COLUMNS
- SYSCAT.DATAPARTITIONEXPRESSION
- SYSCAT.DATAPARTITIONS
- SYSCAT.INDEXCOLUSE
- SYSCAT.INDEXES
- SYSCAT.SCHEMATA
- SYSCAT.TABLES
- SYSIBM.COLUMNS
- SYSIBM.SYSDUMMY1
- SYSIBM.SYSPLAN
- SYSIBM.SYSVERSIONS
•To create and execute the Db2 package required for Database Ingestion and Replication to successfully connect to the Db2 database and issue SQL requests, ensure that the BINDADD privilege is granted to the dbmi_user:
GRANT BINDADD ON DATABASE TO <dbmi_user>
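Putting these requirements together, the preparation grants might look like the following sketch, in which DBMI_USER is a placeholder for the connection user:
GRANT SELECT ON SYSCAT.COLUMNS TO DBMI_USER;
GRANT SELECT ON SYSCAT.DATAPARTITIONEXPRESSION TO DBMI_USER;
GRANT SELECT ON SYSCAT.DATAPARTITIONS TO DBMI_USER;
GRANT SELECT ON SYSCAT.INDEXCOLUSE TO DBMI_USER;
GRANT SELECT ON SYSCAT.INDEXES TO DBMI_USER;
GRANT SELECT ON SYSCAT.SCHEMATA TO DBMI_USER;
GRANT SELECT ON SYSCAT.TABLES TO DBMI_USER;
GRANT SELECT ON SYSIBM.COLUMNS TO DBMI_USER;
GRANT SELECT ON SYSIBM.SYSDUMMY1 TO DBMI_USER;
GRANT SELECT ON SYSIBM.SYSPLAN TO DBMI_USER;
GRANT SELECT ON SYSIBM.SYSVERSIONS TO DBMI_USER;
GRANT BINDADD ON DATABASE TO DBMI_USER;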
Usage considerations
•Database ingestion and replication incremental load and combined initial and incremental load jobs that have a Db2 for LUW source must use the Query-based capture method. Query-based change data capture uses a SQL statement with a WHERE clause that references a common CDC query column to identify the rows with insert and update changes. Configuration of the source database is limited to adding the CDC query column to each source table, as shown in the example after this list. Users must have at least read-only access to the source tables. The CDC query column type must be equivalent to timestamp, without a time zone. Currently, this CDC functionality has been tested with Snowflake targets only.
•Because Database Ingestion and Replication expects each source table row to be unique, Informatica recommends that each source table have a primary key. Database Ingestion and Replication does not honor unique indexes in place of a primary key. If no primary key is specified, Database Ingestion and Replication treats all columns as if they are part of the primary key.
•Database ingestion and replication jobs can replicate data from columns that have LOB data types to Microsoft Azure Data Lake Storage Gen 2, Microsoft Azure Synapse Analytics, or Snowflake targets. For information about the default mappings of supported source data types to target data types, see Default Data Type Mappings.
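A sketch of adding a CDC query column to a Db2 for LUW source table, as referenced in the first consideration above. The table and column names are placeholders; whether an automatically maintained ROW CHANGE TIMESTAMP column is appropriate should be verified for your environment.
-- Placeholder table and column names. The ROW CHANGE TIMESTAMP clause is one way to have Db2
-- maintain the value automatically on every insert and update; a plain TIMESTAMP column that the
-- application or a trigger keeps current also satisfies the "timestamp without time zone" requirement.
ALTER TABLE SALES.ORDERS
  ADD COLUMN CDC_UPDATE_TS TIMESTAMP NOT NULL
  GENERATED ALWAYS FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP;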
Db2 for z/OS sources
To use Db2 for z/OS sources in database ingestion and replication tasks, first prepare the source database and review the usage considerations.
Source preparation
•Database ingestion and replication incremental load jobs with a Db2 for z/OS source use a stored procedure to call the Db2 Instrumentation Facility Interface (IFI) to read change data from Db2 logs on the z/OS source system. Database Ingestion and Replication delivers the stored procedure libraries and JCL in the Db2 for zOS Database Ingestion connector package ZIP file. You must receive the stored procedure libraries into an APF-authorized library and then customize the JCL for your environment. The stored procedure runs in a Workload Manager (WLM) address space on the Db2 source system. For more information about z/OS system requirements, stored procedure setup, and required permissions, see Installing and configuring the stored procedure for Db2 for z/OS CDC.
•When you define a Db2 source in a database ingestion and replication incremental load task, you must select the option Enable CDC for all columns in the CDC Script field. Database Ingestion and Replication generates a script for enabling Db2 DATA CAPTURE CHANGES on the source tables and on certain Db2 catalog tables that are used for CDC. After this attribute is set for one job, all other jobs recognize that the attribute is enabled in Db2 for the catalog tables needed. You can either execute the CDC script from the user interface if you have sufficient privileges or ask a Db2 DBA who has SYSADM authority to run the script.
•Make sure the Database Ingestion and Replication user has the Db2 for z/OS privileges that are required for the database ingestion and replication load type to be performed. For more information, see Db2 for z/OS privileges.
•Database Ingestion and Replication uses the Progress DataDirect JDBC IBM DB2 driver to connect to a Db2 for z/OS source. For a new implementation of Database Ingestion and Replication with Db2 for z/OS sources, the user who first runs an incremental job must have SYSADM or DBADM authority to establish JDBC connectivity.
•If you perform both initial loads and incremental loads and want to use different users with different Db2 for z/OS privileges for these job types, perform the following steps:
1. Create separate Db2 for z/OS Database Ingestion connections for initial loads and incremental loads. In the connection properties, specify a user who has the Db2 privileges required for an initial load job or incremental load job that uses the connection.
2. To restrict access to connections to certain users, you can set explicit user permissions on a connection asset in Administrator. To use the connection for running a job, the user must have the Execute permission. This practice prevents a user who has a lower level of privileges from running a job that requires a higher level of permissions.
3. When you create a database ingestion and replication task that uses the initial load or incremental load type, select the connection that specifies a user with the required Db2 privileges and asset permission.
Usage considerations
•Because Database Ingestion and Replication expects each source table row to be unique, Informatica recommends that each source table have a primary key. Database Ingestion and Replication does not honor unique indexes in place of a primary key. If no primary key is specified, Database Ingestion and Replication treats all columns as if they are part of the primary key.
•The default setting for z/OS is that schema drift is enabled when you run the ALTER statements for the catalog tables. Once schema drift is enabled for one job, it is enabled for all jobs. If jobs are running when the ALTER statements are running, the running jobs do not have schema drift enabled until they are subsequently stopped and resumed.
•Database Ingestion and Replication does not support schema drift for Db2 11 for z/OS sources.
•Database Ingestion and Replication does not support the following Db2 for z/OS data types:
- BLOB
- CLOB
- DBCLOB
- XML
Database ingestion and replication jobs propagate nulls for columns that have these data types.
Installing and configuring the stored procedure for Db2 for z/OS CDC
To perform Db2 for z/OS CDC processing for incremental load jobs, Database Ingestion and Replication provides a stored procedure that runs on the z/OS source system. The stored procedure calls the Db2 Instrumentation Facility Interface (IFI) to collect change data from the Db2 logs.
z/OS system requirements
Before you begin, verify that the Db2 for z/OS source system meets the following requirements:
•The z/OS system has the recommended operating system version of 2.3 or later.
•The Db2 for z/OS source uses Db2 version 11 or 12.
•Ensure that a Workload Manager (WLM) address space exists for executing the Database Ingestion and Replication stored procedure.
If you have not set up a Db2 WLM address space, see the following IBM documentation for more information:
•Ensure that the library where the stored procedure for Db2 for z/OS CDC will be received exists and is APF-authorized on the z/OS system.
Install the Stored Procedure Libraries and Customize the JCL
On your client machine, perform the following steps:
1. Verify that the Db2 for zOS Database Ingestion connector is available.
The Db2 for zOS Database Ingestion connector package contains the Db2 for z/OS stored procedure libraries. When the connector is enabled for the Secure Agent Group (runtime environment), the connector package .zip file is downloaded to the downloads folder in the installation location. The package name has the format package-DB2ZMI.nnnnn, where nnnnn is an incremented package version number. If multiple package versions exist, use the latest one.
2. Unzip the package-DB2ZMI.nnnnn.zip file. The stored procedure files are added to the Db2WLMStoredProcedure folder under the package name.
3. Use FTP to transfer the #STPINST file in the Db2WLMStoredProcedure folder to a sequential file, PDS, or PDSE on the z/OS system.
Notes:
- Transfer the file without binary mode set.
- Add a High Level Qualifier (HLQ) if needed to meet system requirements.
4. Use FTP or another file transfer method to transfer each of the following files to a separate data set on the z/OS system:
- DBMI.ZOS.DBRMLIB.XMI
- DBMI.ZOS.LOADLIB.XMI
- DBMI.ZOS.USERJCL.XMI
Notes:
- Add or edit the HLQ if needed to meet system requirements.
- Transfer the files in binary mode.
- Ensure that each data set has the DCB attributes of LRECL=80, BLKSIZE=3120, and RECFM=FB.
You might need to pre-allocate a data set to specify the required attribute values.
On the z/OS system, use TSO to receive the transmit (XMI) data sets into an APF-authorized library and then edit the stored procedure JCL member. Also, set Db2 user privileges and add a row to the resource limit table if used.
1. Receive the DBRMLIB transmit data set:
RECEIVE INDATASET(DBMI.ZOS.DBRMLIB.XMI)
Notes:
- If you specified a HLQ when transferring the data set to z/OS, include the HLQ.
- When you see the message INMR906A Enter restore parameters or 'DELETE' or 'END' +, enter your APF-authorized library:
DA(your.library_name) UNIT(unit) VOLUME(volume)
The UNIT() and VOLUME() operands are optional. Include them if your installation does not put RECEIVE files on a work unit or volume by default.
2. Receive the LOADLIB transmit data set:
RECEIVE INDATASET(DBMI.ZOS.LOADLIB.XMI)
See the Notes in Step 1.
3. Receive the USERJCL transmit data set:
RECEIVE INDATASET(DBMI.ZOS.USERJCL.XMI)
See the Notes in Step 1.
4. Customize the #STPINST file that contains the stored procedure JCL for your environment.
The JCL creates the stored procedure and a global temporary table that holds the results of the requests to the Db2 IFI for data. It also binds the stored procedure package.
Notes:
- Based on the comments at the top of the JCL file, replace variables in the JCL with the values appropriate for your z/OS environment, including the Db2 subsystem name (!DSN!), stored procedure schema name (!SCHEMA!), stored procedure name (!STRPRC!), WLM environment name (!WLMENV!), and the name of the DBRMLIB transmit data set (!DBRMLIB!) received in Step 1.
- If you used a HLQ for the received data sets, include the HLQ in the JCL.
- The WLM environment name is specified in the procedure APPLENV parameter or in the EXEC PARM of the WLM address space.
- You can use the received LOADLIB library after it is APF-authorized, or copy the contents of the library to your own APF-authorized library.
- The STEPLIB concatenation in the WLM address space must contain only APF-authorized libraries for the Db2 IFI to run.
5. Customize the JCL in members of the received USERJCL data set. For more information, see Db2 for z/OS USERJCL Data Set.
6. Ensure that the required Db2 privileges have been granted before you run the stored procedure JCL job. For more information, see Db2 for z/OS privileges.
7. Add a row in the Db2 DSNRLSTxx resource limit table for database ingestion packages to ensure processor resources are adequate for stored procedure processing. Otherwise, incremental load jobs might abend. Add the row with the following columns:
- An AUTHID column with the authentication ID that the database ingestion and replication task uses OR an RLFPKG column with the same package name as the database ingestion and replication task OR both of these columns.
- An ASUTIME column defined with NULL or with a resource limit that is greater than the default limit.
Then issue the Db2 -START RLIMIT command for the changes to the resource limit table to take effect.
Db2 for z/OS USERJCL Data Set
The downloaded USERJCL data set is a partitioned data set (PDS) or extended partitioned data set (PDSE) that contains a JCL member for running a job that collects information that is not otherwise readily available when using the Informatica-supplied Db2 WLM stored procedure.
You can fully customize member JCL in the USERJCL PDS or PDSE. Informatica recommends that you create a copy of a delivered JCL member under another name and then customize the copy so that the original member is retained intact for reference.
If you install multiple stored procedures on multiple Db2 subsystems, you can install only one USERJCL library and create members in it that are tailored for each Db2 subsystem. Alternatively, you can create a separate library for a specific Db2 subsystem. Make sure that the database ingestion and replication job contains the correct library and member information for the Db2 subsystem.
The USERJCL PDS or PDSE contains the following member:
LOGINV member
LOGINV contains JCL for a job that obtains a Db2 log inventory list. If you set the Initial Start Point for Incremental Load property to Specific Date and Time for an ingestion task in the task wizard, the inventory list is used to determine the start point in the logs when the database ingestion and replication job runs for the first time or restarts. The log inventory list provides Database Ingestion and Replication with a starting RBA or LSN and an ending RBA or LRSN, along with the starting and ending timestamps of all active and archive logs for the Db2 subsystem. Database Ingestion and Replication uses this information to select the appropriate log archive to search for the requested start point. If the Initial Start Point for Incremental Load property is set to any option other than Specific Date and Time, you do not need to install the USERJCL library.
Contents of the LOGINV member:
//<USERID>I JOB 'LOG INVENTORY',MSGLEVEL=(1,1),MSGCLASS=X,
// NOTIFY=&SYSUID,CLASS=A,REGION=0M
//* -------------------------------------------------------
//*
//* PLEASE CHANGE DSN? TO THE DB2 SUBSYSTEM OF YOUR CHOICE.
//*
//* THIS JCL MEMBER CAN BE USED TO RUN A LOG INVENTORY
//* LIST. DBMI WILL REQUEST THIS IF A DBMI JOB IS TRYING TO
//* RESTART USING TIME. THE LOG INVENTORY GIVES DBMI THE
//* ABILITY TO CORRELATE RBA/LRSN TO ACTUAL LOG RESTART
//* POSITIONS. DBMI PARSES THE SYSPRINT OUTPUT TO GET THE
//* REQUIRED INFORMATION.
//*
//*
//* SUBSTITUTION TAGS
//* _________________
//*
//* SUBSTITUTION TAGS ARE USED TO INSERT DATA INTO THE JOB
//* BY DBMI BEFORE IT IS SUBMITTED. YOU MAY COPY THIS JCL
//* INTO ANOTHER MEMBER AND MODIFY ITS CONTENTS. SUBSTITUTION
//* TAGS MAY ALSO BE REMOVED AND HARD CODED FOR INDIVIDUAL
//* JOB NEEDS.
//*
//*
//* <USERID> WILL BE REPLACED WITH THE USER ID.
//*
//* -------------------------------------------------------
//LOGINV   EXEC PGM=DSNJU004
//STEPLIB  DD DISP=SHR,DSN=DSN?10.SDSNLOAD
//         DD DISP=SHR,DSN=DSN?10.DSNC.RUNLIB.LOAD
//SYSUT1   DD DISP=SHR,DSN=DSN?.BSDS01
//SYSPRINT DD SYSOUT=*
//SYSSUMRY DD SYSOUT=*
//
To submit USERJCL member jobs and retrieve output from submitted jobs, Database Ingestion and Replication executes Db2-supplied stored procedures in batch as required. Make sure that the Database Ingestion and Replication user has the required privileges to execute these stored procedures. For more information about Db2-supplied procedures, see https://www.ibm.com/docs/en/db2-for-zos/12?topic=sql-procedures-that-are-supplied-db2.
A USERJCL member is processed in the following manner:
1. For the LOGINV member, if the Initial Start Point for Incremental Load advanced property is set to Specific Date and Time on the Source page of the database ingestion and replication task wizard, Database Ingestion and Replication checks whether the following custom properties are specified for the source:
- pwx.cdcreader.ZOS.Db2JobsDSN, which specifies the name of the installed USERJCL PDS or PDSE.
- pwx.cdcreader.ZOS.Db2JobLLOGINVMember, which specifies the LOGINV member name that will be used for a database ingestion and replication job.
2. Database Ingestion and Replication reads the LOGINV member by using the Db2-supplied stored procedure ADMIN_DS_BROWSE.
3. Database Ingestion and Replication substitutes any tags required for the job execution.
4. Database Ingestion and Replication submits the job to the Db2 for z/OS database by using the Db2-supplied stored procedure ADMIN_SUBMIT_JOB.
5. Database Ingestion and Replication queries the status of the submitted jobs by using the Db2-supplied stored procedure ADMIN_JOB_QUERY.
6. After the job completes, Database Ingestion and Replication obtains the job output by using the Db2-supplied stored procedure ADMIN_JOB_FETCH.
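The Database Ingestion and Replication user therefore needs EXECUTE authority on these Db2-supplied stored procedures. A sketch of the grants, assuming the procedures reside in the default SYSPROC schema and DBMIUSER is a placeholder for the user:
GRANT EXECUTE ON PROCEDURE SYSPROC.ADMIN_DS_BROWSE TO DBMIUSER;
GRANT EXECUTE ON PROCEDURE SYSPROC.ADMIN_SUBMIT_JOB TO DBMIUSER;
GRANT EXECUTE ON PROCEDURE SYSPROC.ADMIN_JOB_QUERY TO DBMIUSER;
GRANT EXECUTE ON PROCEDURE SYSPROC.ADMIN_JOB_FETCH TO DBMIUSER;
COMMIT;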
Db2 for z/OS privileges
To deploy and run a database ingestion and replication task that has a Db2 for z/OS source, the source connection must specify a Database Ingestion and Replication user who has the privileges required for the ingestion load type.
Privileges for initial load processing
•For initial load jobs, grant the following privilege to the user specified in the Db2 for zOS Database Ingestion connection properties:
GRANT SELECT ON schema.table TO user;
Where schema.table identifies a source table.
•To run queries against Db2 catalog tables during source unload processing, grant the SELECT privilege on the following catalog tables, as shown in the example after this list:
- SYSIBM.SYSCOLUMNS
- SYSIBM.SYSINDEXES
- SYSIBM.SYSKEYS
- SYSIBM.SYSTABLEPART
- SYSIBM.SYSTABLES
- SYSIBM.SYSTABLESPACE
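A sketch of these catalog grants, assuming DBMIUSER is a placeholder for the user specified in the connection properties:
GRANT SELECT ON SYSIBM.SYSCOLUMNS TO DBMIUSER;
GRANT SELECT ON SYSIBM.SYSINDEXES TO DBMIUSER;
GRANT SELECT ON SYSIBM.SYSKEYS TO DBMIUSER;
GRANT SELECT ON SYSIBM.SYSTABLEPART TO DBMIUSER;
GRANT SELECT ON SYSIBM.SYSTABLES TO DBMIUSER;
GRANT SELECT ON SYSIBM.SYSTABLESPACE TO DBMIUSER;
COMMIT;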
Privileges for incremental load processing
For incremental load jobs, grant the following privileges.
Note: If you do not have the authority required to issue these grants, ask a Db2 administrator who has SYSADM or a higher authority level to issue them.
•To generate the CDC script that enables the database CDC option from the user interface, issue the following grant to the interface user:
GRANT ALTER ON TABLE schema.table TO user;   <-- for each source table
COMMIT;
•For incremental change data processing, grant the following privileges to the user specified in the Db2 for zOS Database Ingestion connection properties:
- To make the change data in the Db2 logs available to the stored procedure:
ALTER TABLE schema.table DATA CAPTURE CHANGES;   <-- for each source table
COMMIT;
- To enable the user to obtain required information from the Db2 Instrumentation Facility Interface (IFI):
GRANT MONITOR2 TO user;
COMMIT;
- To grant authorities on the global temporary table to the user:
GRANT INSERT ON !SCHEMA!.!STRPRC!_RS_TBL TO user;
GRANT DELETE ON !SCHEMA!.!STRPRC!_RS_TBL TO user;
GRANT EXECUTE ON PROCEDURE !SCHEMA!.!STRPRC! TO user;
COMMIT;
Where !SCHEMA! and !STRPRC! are variables in the job JCL, which represent the stored procedure schema name and procedure name respectively.
The first two privileges allow the user to read and delete the contents of the Global Temporary Table to which the stored procedure is writing the log data. The third privilege allows the user to run the stored procedure.
Note: If the GRANT INSERT ON or GRANT DELETE ON statement fails with SQLCode -526, use the following GRANT instead:
GRANT ALL ON !SCHEMA!.!STRPRC!_RS_TBL to user;
- To enable the stored procedure to bind its Db2 plan, grant the following privileges to PUBLIC:
GRANT BIND, EXECUTE ON PLAN !STRPRC! TO PUBLIC;
COMMIT;
- To run queries against Db2 catalog tables during source unload processing, grant the SELECT privilege on the following catalog tables:
▪ SYSIBM.SYSCOLUMNS
▪ SYSIBM.SYSDUMMY1
▪ SYSIBM.SYSINDEXES
▪ SYSIBM.SYSKEYS
▪ SYSIBM.SYSTABLEPART
▪ SYSIBM.SYSTABLES
▪ SYSIBM.SYSTABLESPACE
Permissions Required for Executing the Stored Procedure on z/OS
Ensure that the following Db2 permissions are granted before you run the stored procedure JCL job:
•Ensure that the user who executes the stored procedure job has SYSADM authority, or ask the Db2 for z/OS DBA to run it.
•For the stored procedure to run, you must grant the following Db2 permissions to the procedure schema name specified in the #STPINST JCL file:
- SELECT authority on Db2 catalog tables:
GRANT SELECT ON SYSIBM.* TO schema;
- EXECUTE authority on the package name specified in the JCL.
GRANT EXECUTE ON PACKAGE package_name TO schema;
Additionally, grant INSERT and DELETE authority on the schema and stored procedure name that are specified in the JCL for the global temporary table:
GRANT INSERT, DELETE ON schema.stored_procedure_name_RS_TBL TO user
Change capture mechanism for Db2 for z/OS sources
Database Ingestion and Replication provides a single change capture mechanism and architecture for capturing change data from a Db2 source on z/OS and applying the data to a target. This architecture uses multithreaded processing to optimize the performance of collecting data and of parsing and transforming the data into a format the target accepts.
The Secure Agent must run on a Linux or Windows box, separate from the Db2 z/OS source system. The network bandwidth between the z/OS system and the Secure Agent box must be robust. Informatica recommends network transfer rates that can handle hundreds of gigabytes of log data or more. If the network transfer rate is not capable of delivering the log data to the Secure Agent at speeds equal to or greater than the rate at which Db2 produces log data of CDC interest, the database ingestion and replication job will not be able to provide the data to the target in a timely manner. If the data throughput does not meet your SLA, consider changing the hardware to increase the Ethernet bandwidth between the z/OS system and the Secure Agent box.
The following image shows the Db2 for z/OS change capture components and data flow:
1. The multithreaded log collector, which runs in the DBMI Agent service under the Secure Agent, issues multiple concurrent requests for Db2 log data to the Db2 stored procedure.
Each request includes a starting RBA, or a starting LRSN if the logs are in a data-sharing environment, along with a list of tables of CDC interest for database ingestion.
While processing a sequence of log data, the log collector can request the next sequence of log data.
2. The Db2 Instrumentation Facility Interface (IFI) API extracts data for the selected source tables of CDC interest from active and archive logs. The IFI then transfers the data, in raw, native form, to the Informatica Db2 stored procedure in z/OS Workload Manager (WLM).
3. The Db2 stored procedure returns the UOWs with captured data to the UOW Manager.
4. The UOW Manager sends the UOWs in commit order to the log parser.
5. The multithreaded log parser concurrently parses DML changes from the committed UOWs. The result is the raw, native Db2 log data transformed into the format that the target type expects.
Note: Because this resource-intensive activity occurs on the Secure Agent box, CPU consumption on the z/OS system is minimized.
6. The CDC writer applies the formatted data to the target.
Microsoft SQL Server, RDS for SQL Server, Azure SQL Database, and Azure Managed Instance sources
To use Microsoft SQL Server sources in database ingestion and replication tasks, first prepare the source database and review the usage considerations. SQL Server source types include on-premises SQL Server, Relational Database Service (RDS) for SQL Server, Azure SQL Database, and Azure SQL Managed Instance.
•For incremental change data capture (CDC) operations with SQL Server sources, Database Ingestion and Replication provides multiple change capture methods. The preparation of your SQL Server source database depends on the CDC method you use.
The available change capture methods are:
- Log-based change data capture with the transaction log and CDC tables. Database Ingestion and Replication reads data changes from the SQL Server transaction log and the enabled SQL Server CDC tables. This method requires users to have extended privileges. SQL Server CDC must be enabled on the source tables.
- Change data capture with CDC tables only. Users must have at least SELECT permission on the source and CDC tables. SQL Server CDC must be enabled on the source tables.
- Query-based change data capture. Change data capture uses a SQL statement with a WHERE clause that references a common CDC query column to identify the rows with insert and update changes. Configuration of the source database is limited to adding the CDC query column to each source table. Users must have at least read-only access to the source tables.
•For incremental load jobs that use log-based CDC with transaction logs, ensure that the database user you specify in the SQL Server source connection has the db_owner role and the VIEW ANY DEFINITION privilege. To grant these privileges, use one of the following sets of SQL statements, depending on your SQL Server source type.
For SQL Server on-premises sources, including Azure SQL Managed Instance:
USE master;
CREATE DATABASE <database>;
CREATE LOGIN <login_name> WITH PASSWORD = '<password>';
CREATE USER <user> FOR LOGIN <login_name>;
GRANT SELECT ON master.sys.fn_dblog TO <user>;
GRANT VIEW SERVER STATE TO <login_name>;
GRANT VIEW ANY DEFINITION TO <login_name>;

USE <db>;
CREATE USER <user> FOR LOGIN <login_name>;
EXEC sp_addrolemember 'db_owner', '<user>';
EXEC sys.sp_cdc_enable_db
For RDS for SQL Server:
USE master;
CREATE DATABASE <database>;
CREATE LOGIN <login> WITH PASSWORD = '<password>';

USE <database>;
EXEC msdb.dbo.rds_cdc_enable_db '<database>';
CREATE USER <user> FOR LOGIN <login_name>;

USE master;
GRANT VIEW SERVER STATE TO <login_name>;
GRANT VIEW ANY DEFINITION TO <login_name>;

USE <database>;
EXEC sp_addrolemember 'db_owner', '<user>';
•For database ingestion and replication incremental load and initial and incremental load jobs that have SQL Server sources and use log-based CDC, you must enable SQL Server Change Data Capture (CDC) on the source database.
- For on-premises SQL Server sources, run the sys.sp_cdc_enable_db stored procedure in the database context, as shown in the example after this list. You must have the sysadmin role.
- For Amazon Relational Database Service (RDS) for SQL Server sources, log in as the master user and run the msdb.dbo.rds_cdc_enable_db 'database_name' stored procedure.
After SQL Server CDC is enabled, SQL Server writes additional information to the transaction log and CDC tables, which Database Ingestion and Replication uses during incremental CDC processing.
Alternatively, when you create a database ingestion and replication task, you have the option of generating a script that enables CDC on the database and on all columns in the selected source tables. To execute the CDC script, you need to have the sysadmin role.
Restriction: Database Ingestion and Replication cannot enable CDC for tables that contain more than 1019 columns.
•For incremental load or combined initial and incremental load operations that use query-based CDC, the source table must contain the CDC query column that is used to indicate the changed rows. You must add the query column to the source tables before creating the database ingestion and replication task. The CDC query column type must be equivalent to timestamp, without a time zone. The supported SQL Server data types for the query column are DATETIME and DATETIME2.
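The following T-SQL sketch, with placeholder database, schema, table, and column names, illustrates the preparation described above: enabling CDC on the database and on a source table for the log-based and CDC Tables methods, and adding a CDC query column for the query-based method. It is an illustration only, not the script that Database Ingestion and Replication generates.
-- Enable CDC on the source database and on one source table
-- (used by the log-based and CDC Tables capture methods; requires the sysadmin role)
USE MySourceDb;
EXEC sys.sp_cdc_enable_db;
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'Orders',
    @role_name     = NULL;

-- Add a CDC query column for the query-based capture method.
-- The application or a trigger must update this column whenever a row changes.
ALTER TABLE dbo.Orders
    ADD cdc_update_ts DATETIME2 NOT NULL
        CONSTRAINT DF_Orders_cdc_update_ts DEFAULT SYSDATETIME();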
Usage considerations
•Database ingestion and replication jobs support all load types for on-premises SQL Server, Amazon RDS for SQL Server, Azure SQL Managed Instance, and Azure SQL Database sources.
•Database Ingestion and Replication provides the following alternative capture methods for SQL Server sources in incremental load or combined initial and incremental load jobs:
- CDC Tables. Capture change data from the SQL Server CDC tables only.
- Log-based. Captures change data from the SQL Server transaction log and CDC tables.
Note: If you select this capture method for jobs that have Azure SQL Database sources, change data is read from CDC tables only, not from the transaction log.
- Query-based. Capture Inserts and Updates by using a SQL WHERE clause that points to a CDC query column.
The CDC Tables option provides the best replication performance and highest reliability of results.
If you use the Log-based method for multiple jobs that run against the same database simultaneously, performance can be significantly degraded.
•When you enable CDC on the SQL Server database for a task that uses Log-based CDC, SQL Server automatically creates a capture job and a cleanup job that will be executed by the SQL Server Agent. The capture job is responsible for populating the SQL Server CDC tables. The cleanup job is responsible for cleaning up records from the CDC tables. The default value for data retention in the CDC table is 72 hours, or 3 days. You can check the current retention period by running the sys.sp_cdc_help_jobs stored procedure and checking the retention value in the results. If you expect a downtime greater than 3 days, you can adjust the retention by using the sys.sp_cdc_change_job stored procedure or the SQL Server Agent cleanup job, as shown in the example after this list. You can also suspend the cleanup job.
•Database Ingestion and Replication supports SQL Server page compression and row compression of source data.
•Because Database Ingestion and Replication expects each source table row to be unique, Informatica recommends that each source table have a primary key. Database Ingestion and Replication does not honor unique indexes in place of a primary key. If no primary key is specified, Database Ingestion and Replication treats all columns as if they are part of the primary key. Exception: For SQL Server query-based CDC, a primary key is required in each source table.
•For log-based CDC with the transaction log, Database Ingestion and Replication requires read-write access on the source database. If you use SQL Server Always On availability groups, this requirement means that Database Ingestion and Replication can capture change data from the read-write primary replica but not from the read-only secondary replica.
•If a Microsoft SQL Server source uses the Always Encrypted method to encrypt column data, the CDC script that is generated from the CDC Script field on the Source page in the database ingestion and replication task fails to run. This problem is caused by a SQL Server limitation. This problem does not occur with Transparent Data Encryption (TDE).
•Database Ingestion and Replication supports schema drift options for Microsoft SQL Server sources in database ingestion and replication incremental load jobs. The following limitations apply:
- Microsoft SQL Server does not support renaming tables and columns for which Change Data Capture (CDC) is enabled.
- Microsoft SQL Server does not support changing primary keys for CDC tables.
- When Database Ingestion and Replication reads change data directly from the CDC tables, the CDC tables do not change once they are created. DDL changes that occur on the source tables are replicated as nulls in the CDC tables. To replicate the DDL changes to the CDC tables, set the pwx.custom.sslr_cdc_manage_instances custom property to 1 on the Source page of the task wizard. This custom property enables the modification of the CDC tables to reflect DDL changes on the source tables and enhance DML capture. You must have the db_owner role to enable the active management of the CDC tables.
•If source table partition changes cause rowset IDs to change, Database Ingestion and Replication can process the changes to enable database ingestion and replication jobs to continue capturing DML changes from the tables.
•For incremental load and initial and incremental load jobs that use the query-based CDC method, the following limitations apply:
- A primary key is required in each selected source table. If a primary key is not present in a source table, change data capture ignores the table and continues processing with the rest of the selected source tables. If none of the source tables have a primary key, the job will fail.
- Query-based CDC does not capture Delete operations.
- All Insert and Update operations are treated as Upserts and displayed in the monitoring interface and logs as Updates.
- Data replication from large-object (LOB) columns is not supported. If the source table contains LOB columns, Database Ingestion and Replication propagates nulls for these columns.
- If a Daylight Savings Time change or a time zone change is detected at the start of a particular cycle or when the job resumes from a failed or stopped state, Database Ingestion and Replication will resume and process the changes that occurred in that cycle.
•Database ingestion and replication jobs can replicate data from Microsoft SQL Server large-object (LOB) columns if you select Include LOBs under Advanced on the Source page of the task wizard.
The supported target types depend on the load type:
- For initial load jobs: Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, Microsoft Azure Synapse Analytics, Oracle, Oracle Cloud Object Storage, PostgreSQL, Snowflake, and SQL Server.
- For incremental load jobs: Azure Event Hubs, Databricks, PostgreSQL, Snowflake, and SQL Server.
- For combined initial and incremental load jobs: Databricks, PostgreSQL, Snowflake, and SQL Server.
LOB data types are GEOGRAPHY, GEOMETRY, IMAGE, VARBINARY(MAX), VARCHAR(MAX), NVARCHAR(MAX), TEXT, NTEXT, and XML. LOB data might be truncated before being written to the target. The truncation point depends on the data type, target type, and load type. For more information, see the "Include LOBs" description in Configuring the source.
•Database Ingestion and Replication does not replicate data from SQL Server non-persisted computed columns. For initial load jobs and for incremental load and combined initial and incremental load jobs that use the log-based or query-based CDC method, persisted computed columns are replicated to the target. For incremental load and combined initial and incremental load jobs that capture changes from CDC Tables only, persisted computed columns are replicated as NULLs or as an empty value, depending on column nullability.
•Database ingestion and replication initial load jobs that have a SQL Server source and a SQL Server target and include a sql_variant source column convert the sql_variant data to hexadecimal format on the target. To convert data from hexadecimal format to varbinary format, run the following query:
SELECT <column_name>, CONVERT(varbinary,<column_name>) from <table_name>;
Replace <column_name> and <table_name> with the actual target column and table names.
•The SQL Server Hierarchyid data type is not supported in database ingestion and replication incremental load and combined initial and incremental load jobs that have a Snowflake target. Database Ingestion and Replication will propagate nulls for columns that have this data type. For more information, see Default Data Type Mappings.
•If you use a Secure Agent group with multiple agents and the active agent goes down unexpectedly, database ingestion and replication initial load, incremental load, and combined initial and incremental load jobs can automatically switch over to another available agent in the Secure Agent group. The automatic switchover occurs after the 15-minute heartbeat interval elapses. Automatic switchovers are supported for initial load jobs with any source type and a target type other than Kafka. They're also supported for incremental load and combined load jobs that have a SQL Server source, subject to the following limitations:
- The job cannot have a Kafka target.
- The job cannot have persistent storage enabled.
- The job cannot use the Query-based CDC method to capture changes from the SQL Server source.
•If a SQL Server source is in an Always On availability group, database ingestion and replication incremental load and combined load jobs can capture change data from transaction logs or CDC tables on the primary node or a secondary node. Also, if a node becomes unavailable, database ingestion and replication jobs can fail over to a primary or secondary database in an availability replica to continue processing, provided that you've configured the SQL Server connection to point to an availability group listener. An availability group listener is a virtual network name (VNN) that Database Ingestion and Replication can use to access a database in an availability replica of an availability group, without having to know the SQL Server physical instance name.
•After a database ingestion and replication job with an SQL Server source has run, if you select additional source columns for replication and redeploy the task, the job does not immediately re-create the target table with the additional columns or replicate data for them. However, an incremental load or combined initial and incremental load job will add the newly selected columns to the target and replicate data to them when it processes the next new DML change record, provided that you set the schema drift Add Column option to Replicate. An initial load job will add the newly selected columns to the target and replicate data to them the next time the job runs.
•SQL Server memory-optimized tables are supported as sources in database ingestion and replication incremental load and combined initial and incremental load jobs that use the Query-based CDC method. However, memory-optimized tables are not supported as sources in jobs that use the CDC Tables or Log-based CDC method.
•If your SQL Server sources include persisted computed columns, database ingestion and replication incremental load and combined load jobs can replicate the computed column expression values, regardless of which CDC method you use.
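As noted in the consideration about the capture and cleanup jobs earlier in this list, you can inspect and adjust the CDC retention period. A minimal T-SQL sketch, assuming a 7-day retention is wanted:
-- Check the current capture and cleanup job settings, including the retention value (in minutes)
EXEC sys.sp_cdc_help_jobs;

-- Extend the cleanup retention to 7 days (10080 minutes)
EXEC sys.sp_cdc_change_job
    @job_type  = N'cleanup',
    @retention = 10080;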
Change capture mechanisms for SQL Server sources
Database Ingestion and Replication provides multiple change capture mechanisms for capturing change data from a SQL Server source and applying the data to a target.
Log-based change data capture with transaction log and CDC tables
Database ingestion and replication incremental load jobs that use log-based CDC capture DML and DDL changes by parsing records from the active transaction log and by reading change records directly from the CDC tables. Change data is read from the active transaction log if the required restart point (LSN) is available there. If the capture start point pre-dates records in the transaction log or in certain other situations, Database Ingestion and Replication automatically transitions to reading change data from the CDC tables. After reading changes from the CDC tables, Database Ingestion and Replication switches back to reading changes from the active transaction log in a transparent manner.
The following components are involved in log-based change capture processing:
•The Informatica Intelligent Cloud Services Secure Agent, which can run on a Linux or Windows box that is local to or remote from the SQL Server instance.
•The dbo.$InfaXact table, which is associated with an open transaction and temporarily stores some information about the currently running capture job. If the table does not exist, the change capture process creates it.
•The Log Reader and its subcomponents, which are required to parse and read DML and DDL change records from the transaction log.
•The SQL Server fn_dblog() function, to which the Log Reader loops calls to retrieve log records from the active part of the transaction log file for the source database.
•The SQL Server CDC tables: cdc.lsn_time_mapping, cdc.ddl_history, and cdc.<capture instance>_CT. SQL Server creates these tables in the CDC schema when the source database is enabled for CDC. One or two cdc.<capture instance>_CT tables are created for each CDC-enabled source table to store DML changes captured by the native log reader.
•The CDC Capture-Instance Table Reader, which reads change records from the cdc.<capture_instance>_CT tables.
•The CDC Writer, which writes the changes to the target.
The following image shows the SQL Server log-based change data capture components and data flow:
1. The Log Reader process reads log records to capture DML and DDL changes in committed transactions.
- The Transaction Manager subcomponent interacts with the dbo.$InfaXact table associated with the open transaction.
- The Log Parser subcomponent loops calls to the fn_dblog() function to read log records from the active transaction log for the selected source tables that are enabled for CDC.
Capture processing continues until the committed transactions are complete, the capture process is stopped or interrupted by a fatal error, or a switch to reading change records from a cdc.<capture_instance>_CT table is triggered.
2. In certain situations, Database Ingestion and Replication automatically switches to reading changes from a cdc.<capture instance>_CT table by using the CDC Capture-Instance Table Reader.
Processing switches to the cdc.<capture instance>_CT table under the following conditions:
- During initialization and startup of the capture process if the start LSN is earlier than the LSN of the active transaction log and exists within the cdc.<capture instance>_CT table.
- Transaction log truncation occurs, causing data loss. Normally, SQL Server prevents log truncation when an open transaction is associated with dbo.$InfaXact. However, a lost network connection might end a transaction, causing log truncation.
Note: Routine log backups can also truncate the transaction log. To prevent data loss, transactions that use the dbo.$InfaXact table lock the active transaction log.
- Off-row LOBs or primary row records are truncated. In this situation, the Log Reader selectively patches the log records by reading column information from the cdc.<capture instance>_CT tables.
Note: Records read from fn_dblog() function are truncated to 8000 bytes.
3. The Log Reader and CDC Capture-Instance Table Reader send change records to the CDC Writer.
4. The CDC Writer formats the change records and applies them to the target.
Change capture with CDC tables only
Database ingestion and replication incremental load jobs can capture changes directly from the SQL Server CDC tables without using the transaction log.
The following components are involved in the change capture processing that uses CDC tables only:
•The Informatica Intelligent Cloud Services Secure Agent, which can run on a Linux or Windows box that is local to or remote from the SQL Server instance.
•The SQL Server CDC tables: cdc.lsn_time_mapping, cdc.ddl_history, and cdc.<capture instance>_CT. SQL Server creates these tables in the CDC schema when the source database is enabled for CDC. One or two cdc.<capture instance>_CT tables are created for each CDC-enabled source table to store DML changes captured by the native log reader.
•The CDC Capture-Instance Table Reader, which reads change records from the cdc.<capture_instance>_CT tables.
•The CDC Writer, which writes the changes to the target.
The following image shows the components and data flow for SQL Server change data capture from CDC tables only:
1. The CDC Capture-Instance Table Reader reads changes from a cdc.<capture instance>_CT table.
2. The CDC Capture-Instance Table Reader sends change records to the CDC Writer.
3. The CDC Writer formats the change records and applies them to the target.
Query-based change capture
Database ingestion and replication jobs capture insert and update changes in the source by querying a timestamp column that is updated when a change occurs. Configuration of the source is limited to adding the common CDC query column to each source table. The query-based CDC method uses the query column to identify the rows that changed since the beginning of a specified CDC interval.
To implement the query-based change capture, set the following options on the Source page of the task wizard:
•CDC Method. Select Query-based to enable this capture method.
•CDC Query Column Name. The case-sensitive name of the CDC query column in the source table. The supported SQL Server data types for the query column are DATETIME and DATETIME2. The column must be present in the source table.
•CDC Interval. The frequency of a query-based change data capture cycle. Default is 5 minutes.
•Initial Start Point for Incremental Load. The point from which you want to start the change capture cycle. Default is Latest Available.
After the CDC interval elapses, Database Ingestion and Replication uses a SQL query with a WHERE clause that references the CDC query column to identify the rows that received changes during the CDC interval. The change data is captured and applied to the target.
If the source tables selected for the query-based CDC do not have the CDC query column, change data capture will ignore these tables and will continue with the rest of the tables. For the tables that are skipped, corresponding tables generated in the target database will be empty. If none of the source tables have a CDC query column, the job will fail at runtime.
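Conceptually, the query that runs at the end of each CDC interval resembles the following sketch. The table name, column name, and boundary parameters are placeholders; the actual statement is generated internally by Database Ingestion and Replication.
SELECT *
FROM dbo.Orders
WHERE cdc_update_ts >  @previous_cycle_end_ts
  AND cdc_update_ts <= @current_cycle_end_ts;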
The following image shows the SQL Server query-based change data capture components and data flow:
1. After the CDC interval elapses, Database Ingestion and Replication executes a SQL query in the source database that uses the CDC query column to extract the change data.
2. Change records are sent to the CDC Writer.
3. The CDC Writer formats the change records and applies them to the target.
MongoDB sources
To use MongoDB sources in database ingestion and replication tasks, review the following considerations.
Source preparation
•Database Ingestion and Replication uses MongoDB change streams to access real-time data changes on a single collection, a database, or an entire deployment.
•For incremental load jobs, ensure that the database user you specify in the MongoDB connection has the readAnyDatabase role and the privileges that grant the changeStream and find actions.
•To open a change stream on a single database, applications must have privileges that grant the changeStream and find actions on all non-system collections in the database.
•To open a change stream on an entire deployment, applications must have privileges that grant the changeStream and find actions on all non-system collections for all databases in the deployment. For one way to grant these privileges, see the example after this list.
•The database ingestion and replication task moves the MongoDB data to the target as key-value pairs, where the key is the ObjectID and the value is the JSON string that comprises a BSON document.
•For MongoDB sources, data type mappings do not occur. All data is persisted on the target as string data.
•In incremental load operations, the data change at the source is tracked by means of a unique key (ObjectID) and the same changed JSON string is applied to the target.
•Database Ingestion and Replication does not support time series collections in incremental load jobs that have MongoDB sources.
•In incremental load operations, Database Ingestion and Replication retrieves the change records from the date and time specified as the restart point. For MongoDB sources, the default value for the restart point is the current time. You can specify a different date and time in Greenwich Mean Time (GMT).
•If schema drift occurs on the MongoDB source, the data in BSON documents that are sent to the target reflect the schema changes. However, Database Ingestion and Replication does not specifically detect and report the schema changes.
MySQL sources
To use MySQL sources in database ingestion and replication tasks, first prepare the source database and review the usage considerations.
Source preparation
•To deploy and run a database ingestion and replication task that includes a MySQL source, the source connection must specify a database user who has the required privileges. Use the following SQL statements to grant these privileges to the user:
GRANT SELECT ON database_name.* TO 'user_name'@'%';
GRANT SELECT ON TABLE database_name.* TO 'user_name'@'%';
For incremental load jobs, grant the following additional privileges to the user:
/* To fetch table and column details from system tables */
GRANT SELECT ON sys.* TO 'user_name'@'%';
/* To allow the user to monitor binary log information such as file name, position, and GTID */
GRANT REPLICATION CLIENT ON *.* TO 'user_name'@'%';
/* Required for a multi-node MySQL cluster with master and slave configuration */
GRANT REPLICATION SLAVE ON *.* TO 'user_name'@'%';
•For incremental load jobs, set the default_storage_engine variable to InnoDB in the [mysqld] section of the my.cnf file. Then restart the MySQL server. To verify the default_storage_engine setting, use the following statement:
SHOW VARIABLES LIKE '%engine%';
In the output, make sure that the default_storage_engine variable is set to InnoDB.
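For example, the [mysqld] section of my.cnf might contain the following entry; the file location and other settings vary by installation:
[mysqld]
default_storage_engine=InnoDB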
•Database Ingestion and Replication uses binary log files generated by MySQL for capturing change events at the source. The binlog is a set of log files that contain information about data modifications made to a MySQL server instance.
To enable binary logging, start the server with the --log-bin option or use the key-value log-bin="[HostName]-bin" setting in the my.cnf file. Replace [HostName] with the name of your host. Then restart the MySQL server. To verify that binary logging is enabled, use the following statement:
SHOW VARIABLES LIKE 'log_bin';
In the output, make sure that the log_bin variable is set to ON.
•For incremental load jobs, enable row-based logging by using the following statement:
SET GLOBAL binlog_format = 'ROW';
To verify that row-based logging is enabled, use the following statement:
SHOW VARIABLES LIKE 'binlog_format';
In the output, make sure that the binlog_format system variable is set to ROW.
•To enable database ingestion and replication incremental load and combined load jobs to process Updates and Deletes, set the following system variable for binary logging:
binlog_row_image=full
This setting causes both before images and after images to be logged to the binary log for all columns. It's applicable to any MySQL source type and version that Database Ingestion and Replication supports.
To verify this setting, use the following statement:
SHOW VARIABLES LIKE 'binlog_row_image';
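Settings made with SET GLOBAL typically do not survive a server restart, so you might also persist the binary logging options in my.cnf; a sketch, with a placeholder host name:
[mysqld]
log-bin=myhost-bin
binlog_format=ROW
binlog_row_image=FULL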
•Database Ingestion and Replication can read the binlog files in either one of the following ways:
- Global Transaction ID (GTID) - If you enable MySQL GTID mode, every transaction in MySQL is assigned a GTID to uniquely identify the transaction. Use GTID mode in a multi-cluster environment.
- Binlog file name and position - All transactions in MySQL are saved as anonymous and fetched by using the binlog file name and position. Do not use this method if the MySQL GTID mode is enabled or if you have a multi-cluster environment. In a multi-cluster environment, the binlog file position might vary if a failover occurs, causing inconsistent data.
To enable the GTID mode, use the following statements on each MySQL server:
SET @@GLOBAL.ENFORCE_GTID_CONSISTENCY = WARN;
SET @@GLOBAL.ENFORCE_GTID_CONSISTENCY = ON;
SET @@GLOBAL.GTID_MODE = OFF_PERMISSIVE;
SET @@GLOBAL.GTID_MODE = ON_PERMISSIVE;
On each server, wait until the status variable 'Ongoing_anonymous_transaction_count' is 0 (zero). You can check the status variable's value by using the following statement:
SHOW STATUS LIKE 'Ongoing_anonymous_transaction_count';
When the count is 0, use the following statement to enable GTID mode:
SET @@GLOBAL.GTID_MODE = ON;
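To confirm the final state, you can check the related system variables, for example:
SHOW VARIABLES LIKE 'gtid_mode';
SHOW VARIABLES LIKE 'enforce_gtid_consistency';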
•To connect to a MySQL source, you must download the MySQL driver file and copy it to a specific installation subdirectory if you use any of the following load type and source edition combinations:
- Incremental load jobs that have MySQL Community Edition or MySQL Enterprise Edition sources
- Initial load jobs that have MySQL Community Edition sources
- Initial load jobs that have Amazon Relational Database Service (RDS) for MySQL sources
Note: You do not need to download the driver if you only run initial load jobs that have MySQL Enterprise Edition sources.
Download the MySQL JDBC driver file, mysql-connector-java-<version>.jar, from the MySQL Community Downloads website and copy it to the following directory:
If you want to be able to test a connection in Administrator after defining connection properties, you must also set the MySQL_JDBC_DRIVER_JARNAME parameter for the Data Integration Server service in the Secure Agent's system configuration details. After the test, you can remove the parameter. This parameter is not used when you use the connection to create a database ingestion and replication task or run the associated job.
Usage considerations
•Database Ingestion and Replication supports MySQL, Amazon Aurora MySQL, Cloud SQL for MySQL, and RDS for MySQL sources for initial load, incremental load, and combined initial and incremental load jobs. Also, Database Ingestion and Replication supports Azure Database for MySQL sources with Snowflake targets for all load types and with Confluent Kafka for incremental loads.
•Because Database Ingestion and Replication expects each source table row to be unique, Informatica recommends that each source table have a primary key. Database Ingestion and Replication does not honor unique indexes in place of a primary key. If no primary key is specified, Database Ingestion and Replication treats all columns as if they are part of the primary key.
•If you update the primary key value in the MySQL source table for a record that does not exist in the target table, the record is not replicated to the target. However, the monitoring interface increments the Update count to include the primary key update. The data will replicate to the target if the record already exists in the target table before the update of the primary key value.
•Database ingestion and replication jobs that contain a MySQL source column with the SET or ENUM data type replicate the SET and ENUM column data as numeric values to the target. For initial load jobs, you can set the mysql.set.and.enum.as.numeric custom property to false on the Source page of the task wizard to replicate the SET or ENUM data in string or varchar format. The default value is true, which causes Database Ingestion and Replication to replicate the SET or ENUM data as numeric values.
Note: For incremental load jobs, you cannot toggle between the numeric and string or varchar representation of the SET or ENUM column data. If you set the mysql.set.and.enum.as.numeric custom property to false and run an initial load job, followed by an incremental load job, Database Ingestion and Replication replicates the SET and ENUM data to the target as numeric values only.
•Database Ingestion and Replication does not support the following MySQL data types:
- BLOB
- JSON
- LONGBLOB
- LONGTEXT
- MEDIUMBLOB
- MEDIUMTEXT
- TEXT
- TINYBLOB
- TINYTEXT
If a database ingestion and replication task specifies a source schema that includes columns with the JSON data type, deployment of the task ignores the JSON columns and does not create corresponding columns on the target. For other unsupported data types, database ingestion jobs propagate nulls.
For information about the default mappings of supported source data types to target data types, see Default Data Type Mappings.
Netezza sources
To use Netezza sources in database ingestion and replication tasks, first prepare the source database and review the usage considerations.
Source preparation
•Download and install the Netezza JDBC driver.
1. Download the Netezza JDBC driver from the IBM website.
2. Copy the Netezza JDBC driver jar file, nzjdbc.jar, to the following directory:
•To deploy and run a database ingestion and replication task that includes a Netezza source, the source connection must specify a database user who has the privileges that are required to perform an initial load operation. Configure SELECT permissions for the Netezza user account on the following system views:
- _V_JDBC_SCHEMA1
- _V_JDBC_SCHEMA3
- _V_ODBC_TABLES3
- _V_ODBC_COLUMNS3
- _V_ODBC_PRIMARYKEYS3
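For example, assuming a user named cmid_user and standard Netezza GRANT syntax, the grants might look like the following:
GRANT SELECT ON _V_JDBC_SCHEMA1 TO cmid_user;
GRANT SELECT ON _V_JDBC_SCHEMA3 TO cmid_user;
GRANT SELECT ON _V_ODBC_TABLES3 TO cmid_user;
GRANT SELECT ON _V_ODBC_COLUMNS3 TO cmid_user;
GRANT SELECT ON _V_ODBC_PRIMARYKEYS3 TO cmid_user;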
Usage considerations
•Database Ingestion and Replication does not support the following Netezza data type:
- ST_GEOMETRY
Database ingestion and replication jobs either fail to deploy or propagate nulls for columns that have this data type.
For information about the default mappings of supported source data types to target data types, see Default Data Type Mappings.
Oracle sources
To use Oracle sources in database ingestion and replication tasks, first prepare the source database and review the usage considerations.
Source preparation
•Define the following system environment variables on the Linux or Windows system where the Secure Agent runs:
- ORACLE_HOME environment variable. Points to the Oracle client installation directory, one level up from the bin directory on Windows or lib directory on Linux. This environment variable is not required. However, if you do not define it, you'll need to specify the full Oracle client installation path when you define other environment variables or Secure Agent properties that specify a path to a subdirectory.
- TNS_ADMIN environment variable. If you specify a TNS name in the Database Connect String property of the Oracle Database Ingestion connection properties, use this environment variable to point to the directory location of the tnsnames.ora file when the file is not in the default $ORACLE_HOME/network/admin directory. The tnsnames.ora file, along with the Oracle Call Interface (OCI), is used to communicate with the Oracle source database.
Note: In Administrator, you can set the ociPath property for the Database Ingestion agent service (DBMI agent) to point to the OCI library that contains the oci.dll or libclntsh.so file. The OCI library is used by database ingestion and replication CDC tasks to connect to Oracle. By default, the ociPath value is $ORACLE_HOME/lib on Linux or %ORACLE_HOME%\bin on Windows.
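For example, on a Linux Secure Agent host you might define the variables as follows; the paths are placeholders for your Oracle client installation and tnsnames.ora location:
export ORACLE_HOME=/opt/oracle/product/19.0.0/client_1
export TNS_ADMIN=/etc/oracle/network/admin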
•Make sure that the Database Ingestion and Replication user has the Oracle privileges that are required for the database ingestion and replication load type to be performed.
Note: For combined initial and incremental loads that use log-based CDC, make sure that the FLASHBACK privilege is granted for each selected source table or by using the ANY TABLE option. Database Ingestion and Replication uses an Oracle Flashback Query, which is a SELECT ... AS OF SCN statement, to query for row data from source tables in the Oracle database. Oracle requires the FLASHBACK privilege for this query to be used.
•If you use Oracle 11.2.0.4, set the Oracle COMPATIBLE initialization parameter to 11.2.0.4 to ensure that Oracle has all of the most current redo log fixes for that release.
•Database ingestion and replication jobs that use log-based CDC require read access to Oracle online and archive redo logs to read incremental change data. If the redo logs are remote from the on-premises system where the Secure Agent runs, make sure that read access to the logs is provided, for example, by using Oracle Automatic Storage Management (ASM), mounting the logs to a network file system (NFS), or configuring BFILE access to logs that are on the Oracle file system.
•If you plan to read data from redo log files in Oracle ASM, Informatica recommends that you set the sqlnet.recv_timeout parameter in the local sqlnet.ora file to less than 5 minutes. This parameter specifies the duration of time that the Oracle client waits for a response from ASM before a query times out. Network interrupts and other factors can occasionally make Oracle connections unresponsive. Setting this value ensures that the reader can respond and recover from any such situation in a timely manner.
•Ensure that the Oracle Database Client or Instant Client is installed and configured on the Secure Agent server for the Secure Agent to communicate with Oracle. If you do not already have an Oracle client installed, you can download a client and access installation information from the Oracle web site, or ask your Oracle DBA to download and configure an Oracle client.
•For incremental load or combined initial and incremental load operations that use log-based CDC, perform the following prerequisite tasks in Oracle:
- Enable ARCHIVELOG mode for the Oracle database. If the database is not in an Amazon RDS environment, issue the following SQL statements:
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER DATABASE ARCHIVELOG;
ALTER DATABASE OPEN;
SHUTDOWN IMMEDIATE;
STARTUP;
For an Amazon RDS for Oracle database, set the backup retention period to place the database in ARCHIVELOG mode and enable automated backups.
- Define an archive log destination.
- Enable Oracle minimal global supplemental logging on the source database.
- If your Oracle source tables have primary keys, ensure that supplemental logging is enabled for all primary key columns. For source tables that do not have primary keys, ensure that supplemental logging is enabled for all columns from which change data will be captured.
Note: When you create a database ingestion and replication task, you have the option of generating a script that implements supplemental logging for all columns or only primary key columns for the selected source tables.
- Ensure that the Oracle MAX_STRING_SIZE initialization parameter is not set to EXTENDED. If it is set to EXTENDED, Database Ingestion and Replication will not be able to replicate inserts and updates for tables containing columns defined with large (extended size) VARCHAR2, NVARCHAR2, or RAW columns.
If you do not have the authority to perform these tasks, ask your Oracle database administrator to perform them. For more information, see the Oracle documentation.
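For reference, minimal global supplemental logging and table-level supplemental logging are commonly enabled with statements of the following form; the schema and table names are placeholders, and the generated CDC script can issue the table-level statements for you:
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
ALTER TABLE myschema.mytable ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;
ALTER TABLE myschema.mytable ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;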
•For incremental load operations that use query-based CDC, the source table must contain the CDC query column that is used to indicate the changed rows. You must add the query column to the source tables before creating the database ingestion and replication task. The supported Oracle data type for the query column is TIMESTAMP.
If the source tables selected for the query-based CDC do not have the CDC query column, change data capture will ignore these tables and will continue with the rest of the tables. For the tables that are skipped, corresponding tables generated in the target database will be empty. If none of the source tables have a CDC query column, the deployment of the job will fail.
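For example, a CDC query column could be added with a statement like the following, where the table and column names are placeholders; your application must also populate the column on every insert and update:
ALTER TABLE myschema.mytable ADD (CDC_QUERY_TS TIMESTAMP);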
Amazon Relational Database Service (RDS) for Oracle source preparation:
1. Create the ONLINELOG_DIR and ARCHIVELOG_DIR directories that will hold the online and archive redo logs, respectively, on the RDS file system. Use the following exec statements:
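A sketch that uses the Amazon RDS master utility procedures, assuming they are available in your RDS instance:
exec rdsadmin.rdsadmin_master_util.create_archivelog_dir;
exec rdsadmin.rdsadmin_master_util.create_onlinelog_dir;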
4. In the Amazon RDS console, set the backup retention period for the source database to a value greater than zero to enable automated backups of the database instance.
Note: This step enables ARCHIVELOG mode for the database.
5. Ensure that supplemental logging is enabled at the database level. Use the following statement:
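A sketch that uses the Amazon RDS administration package, assuming it is available in your RDS instance:
exec rdsadmin.rdsadmin_util.alter_supplemental_logging(p_action => 'ADD');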
When you create a database ingestion and replication task, you can generate a script to enable supplemental logging for the selected source tables.
6. Optionally, in the Amazon RDS console, you can create a parameter group and define the buffer pool cache sizes. The default buffer pool holds buffers that use the primary database block size. For nondefault block sizes, set the following DB_nK_CACHE_SIZE parameters as needed:
- DB_2K_CACHE_SIZE
- DB_4K_CACHE_SIZE
- DB_16K_CACHE_SIZE
- DB_32K_CACHE_SIZE
Then select the parameter group for the source database.
Usage considerations
•For incremental load operations with Oracle sources, Database Ingestion and Replication provides alternative capture methods for capturing change data from the source and applying the data to a target. The available change capture methods are:
- Log-based change data capture. Database Ingestion and Replication reads data changes from Oracle redo logs. This method requires users to have extended privileges.
- Query-based change data capture. Change data capture uses a SQL statement with a WHERE clause that references a common CDC query column to identify the rows with insert and update changes. Configuration of the source database is limited to adding the CDC query column to each source table. Users must have at least read only access to the source tables.
•Because Database Ingestion and Replication expects each source table row to be unique, Informatica recommends that each source table have a primary key. Database Ingestion and Replication does not honor unique indexes in place of a primary key. If no primary key is specified, Database Ingestion and Replication treats all columns as if they are part of the primary key. Exception: For Oracle query-based CDC, a primary key is required in each source table.
•For Oracle sources that use the multitenant architecture, the source tables must reside in a single pluggable database (PDB) within a multitenant container database (CDB).
•You can use Oracle Transparent Data Encryption (TDE) to encrypt data in tablespaces that contain Oracle source tables for incremental load processing. Database Ingestion and Replication supports storing the master encryption key in a TDE keystore that is in a file system, in ASM, or in an external hardware security module (HSM) that supplies PKCS#11 interfaces, such as Oracle Key Vault (OKV). For more information, contact Informatica Global Customer Support.
•If Oracle source CHAR or VARCHAR columns contain nulls, the database ingestion and replication job does not delimit the null values with double-quotation (") marks or any other delimiter when writing data to an Amazon S3, Flat File, Microsoft Azure Data Lake, or Microsoft Azure Synapse Analytics target.
•Database Ingestion and Replication can process data across a RESETLOGS boundary. To avoid the source and targets becoming out of sync, Informatica recommends that you stop capture processing before performing a RESETLOGS and then restart capture processing after the RESETLOGS event. Otherwise, the capture process might send data to the target that is subsequently reverted by the RESETLOGS event, causing the source and target to become out of sync.
•Alternative strategies for accessing the Oracle redo logs are available. For more information, see Oracle log access methods for CDC.
•If a database ingestion and replication incremental load or combined initial and incremental load task contains an Oracle source table name or one or more column names that are longer than 30 characters, Oracle suppresses supplemental logging for the entire table, including primary keys and foreign keys. As a result, most operations on the table fail. This problem is caused by an Oracle restriction. In this situation, exclude the table from capture processing or rename the long table and column names to names of 30 characters or less.
•Database Ingestion and Replication can replicate data from Oracle BLOB, CLOB, NCLOB, LONG, LONG RAW, and XML columns to Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, Microsoft Azure Synapse Analytics, Microsoft Fabric OneLake, Oracle, Oracle Cloud Object Storage, PostgreSQL, Snowflake, and SQL Server targets. You must select Include LOBs under Advanced on the Source page when you configure the task. LOB column data might be truncated on the target if it is larger than the byte limit that the target type allows.
Note: Columns that have the LONG, LONG RAW, and XML data types are supported in initial load jobs and in incremental load and combined load jobs that use the Query-based CDC method. However, jobs that use the Log-based CDC method do not replicate data from these types of columns to the generated target table.
•Database Ingestion and Replication does not support the following Oracle source data types with any target type or load type:
- "ANY" types such as ANYTYPE, ANYDATA, ANYDATASET
- Extended types
- INTERVAL
- JSON
- BFILE type for storing LOB data externally
- UROWID
- Spatial types such as SDO_GEOMETRY
- User-defined types such as OBJECT, REF, VARRAY, nested table types
Source columns that have unsupported data types are excluded from the target definition.
For information about the default mappings of supported Oracle data types to target data types, see Default Data Type Mappings.
•To use the Oracle TIMESTAMP WITH LOCAL TIME ZONE data type, set the DBMI_ORACLE_SOURCE_ENABLE_TIMESTAMP_WITH_LOCAL_TZ environment variable to true for the Database Ingestion agent service. In Administrator, open your Secure Agent and click Edit. Under Custom Configuration Details, add the environment variable with the following details:
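For example, the entry might use values like the following; verify the exact service and type names in Administrator for your agent version:
Service: Database Ingestion
Type: DBMI_AGENT_ENV
Name: DBMI_ORACLE_SOURCE_ENABLE_TIMESTAMP_WITH_LOCAL_TZ
Value: true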
•Do not use custom data-type mapping rules to map an Oracle source RAW column to a target CHAR or VARCHAR column. Otherwise, the deployment of the database ingestion and replication task might fail.
•Database Ingestion and Replication does not support invisible columns in Oracle source columns, regardless of the target type. For these columns, database ingestion and replication incremental load jobs and combined initial and incremental load jobs propagate nulls to the corresponding target columns.
•If you update the primary key value in the Oracle source table for a record that does not exist in the target table, the record is not replicated to the target. However, the monitoring interface increments the Update count to include the primary key update. The data will replicate to the target if the record already exists in the target table before the update of the primary key value.
•If an Update to an Oracle table does not change an existing column value, the Update count in the monitoring details for the table is still incremented but no Update row is applied to the target. Database Ingestion and Replication ignores Update rows that do not actually change values. Also, for most database targets, Database Ingestion and Replication does some aggregation of change records at a micro-batch level before writing changes to the targets. This situation can also lead to a mismatch between the Update count in the monitoring statistics and the rows applied to the target.
•The supplemental logging settings for tables might be ignored by Oracle if the table name or any table column name is longer than 30 characters. In this case, the results of database ingestion and replication incremental load or combined load jobs are unpredictable.
•Database Ingestion and Replication does not support derived columns in jobs that have an Oracle source.
•For Oracle combined initial and incremental load jobs, Oracle Flashback queries are used to get committed data that was current at a specific point in the change stream. Ensure that no source table is truncated during the initial load period. If truncation occurs, any DDL change performed during a flashback query causes the query to fail.
•If you use a Secure Agent group with multiple agents and the active agent goes down unexpectedly, database ingestion and replication initial load, incremental load, and combined initial and incremental load jobs can automatically switch over to another available agent in the Secure Agent group. The automatic switchover occurs after the 15-minute heartbeat interval elapses. Automatic switchovers are supported for initial load jobs with any source type and a target type other than Kafka. They're also supported for incremental load and combined load jobs that have an Oracle source, subject to the following limitations:
- The job cannot have a Kafka target.
- The job cannot have persistent storage enabled.
- The job cannot use the Query-based CDC method to capture changes from the Oracle source.
•For incremental load and combined initial and incremental load jobs that use the query-based CDC method, the following limitations apply:
- A primary key is required in each selected source table. If a primary key is not present in a source table, change data capture ignores the table and continues processing with the rest of the selected source tables. If none of the source tables have a primary key, the job will fail.
- Query-based CDC does not capture Delete operations.
- All Insert and Update operations are treated as Upserts and displayed in the monitoring interface and logs as Updates.
- If a Daylight Saving Time change or a time zone change is detected at the start of a particular cycle or when the job resumes from a failed or stopped state, Database Ingestion and Replication will resume and process the changes that occurred in that cycle. You must restart the Oracle database to apply a Daylight Saving Time change or a time zone change.
- Database ingestion and replication combined initial and incremental load jobs that have Oracle sources can read changes from copies of the archive redo logs. You must set the Reader Mode property to ARCHIVECOPY in the Oracle Database Ingestion connection properties and also set the source custom property pwx.cdcreader.oracle.reader.additional with the dir and file parameters. The dir parameter points to the name of the base directory that the CDC log reader scans for the archive log copies, and the file parameter specifies a mask that is used to filter the log copies.
•After a database ingestion and replication job with an Oracle source has run, if you select additional source columns for replication and redeploy the task, the job does not immediately re-create the target table with the additional columns or replicate data for them. However, an incremental load or combined initial and incremental load job will add the newly selected columns to the target and replicate data to them when it processes the next new DML change record, provided that you set the schema drift Add Column option to Replicate. An initial load job will add the newly selected columns to the target and replicate data to them the next time the job runs.
•Database Ingestion and Replication can capture change data from Oracle Exadata machines but does not support Oracle Exadata Hybrid Columnar Compression (EHCC).
•Database Ingestion and Replication jobs of any load type cannot use Oracle synonyms as sources.
•If LOB data is incompletely logged for an Oracle source for any reason, database ingestion and replication jobs that process the source and have the Include LOBs option enabled will issue a warning message such as:
PWX-36678 ORAD WARN: Failed to fetch LOB data for table <schema>.<table> column <column> to resolve partially logged data.
Usually, data is incompletely logged because it was appended to an existing LOB column. This issue can occur for any load type.
To resolve the issue, specify the pwx.cdcreader.oracle.option.additional custom property with the LOB_FETCHBACK=Y parameter on the Source page of the task wizard. This setting causes the current data to be fetched directly from the database, instead of from the logs. To successfully use this solution, the user must be granted the SELECT ANY TABLE privilege.
•Database Ingestion and Replication BFILE access to Oracle data using directory objects can support multiple log locations, such as those in environments with primary and standby databases, RDS database instances, or a Fast Recovery Area to which the USE_DB_RECOVERY_FILE_DEST parameter points. Database Ingestion and Replication can automatically find the logs by querying for all directory objects that begin with ARCHIVELOG_DIR and ONLINELOG_DIR and end with any suffix, for example, ARCHIVELOG_DIR_01, ARCHIVE_LOG_DIR_02, and so on. You can override the ARCHIVELOG_DIR and ONLINELOG_DIR names by using the custom properties pwx.cdcreader.oracle.database.additional BFILEARCHIVEDIR=<directory_object> and pwx.cdcreader.oracle.database.additional BFILEONLINEDIR=<directory_object>.
If you grant the CREATE ANY DIRECTORY and DROP ANY DIRECTORY privileges to database users who run database ingestion and replication jobs, the directory objects can be created at runtime, as needed. For example, if multiple archive and online log destinations exist in a database, the directory objects could be created with the following naming convention:
- <ARCHIVEDIRNAME>_01
- <ARCHIVEDIRNAME>_02
- <ONLINEDIRNAME>_01
- <ONLINEDIRNAME>_02
Database Ingestion and Replication performs no cleanup processing on these generated directory objects.
If you use the USE_DB_RECOVERY_FILE_DEST parameter and the CREATE ANY DIRECTORY and DROP ANY DIRECTORY privileges have not been granted, your DBA must create the directory objects daily or weekly, before your database ingestion and replication jobs run, by using a script such as:
create or replace directory ARCHIVELOG_DIR_2024_08_19 as '<DB_RECOVERY_FILE_DEST>/2024_08_19'
If you use the USE_DB_RECOVERY_FILE_DEST parameter and the database user has the CREATE ANY DIRECTORY and DROP ANY DIRECTORY privileges, the directory objects are created as needed at runtime and dropped after 14 days. These directory objects have the naming convention <ARCHIVEDIRNAME>_YYYY_MM_DD.
•Database ingestion and replication incremental load or combined load jobs that have an Oracle source and a file-based target, such as Azure Data Lake Storage Gen2, Amazon S3, Microsoft Fabric OneLake, or Google Cloud Storage, replicate only updated column values and primary key column values to the target if you set the CDC Script field to Enable CDC for primary key columns in the source properties. For other columns, the jobs replicate nulls to the target because supplemental logging of data to redo logs is not enabled for those columns.
If you need all column values for updates replicated to your file-based target, set the CDC Script field to Enable CDC for all columns.
Gathering Information About the Database Ingestion and Replication environment
Before you start creating database ingestion and replication tasks, gather the following information:
•What type of load operation do you plan to perform: an initial load (point-in-time bulk load), incremental load (only the changes), or combined initial and incremental load (initial load followed by incremental load)?
•Do you need to capture change data from tablespaces that use Oracle Transparent Data Encryption (TDE)? If yes, what are the TDE wallet directory and password?
•Are the redo logs in an Oracle Automatic Storage Management (ASM) environment? If you plan to connect to an ASM instance to read redo logs, are you allowed to create a login user ID for ASM that has SYSDBA or SYSASM authority?
•If you do not have the authority to read the redo logs directly, can the archived redo log files be copied to shared disk or to a file system from which you can access them?
•Can you make your archived redo logs available for diagnostic use by Informatica Global Customer Support, if necessary, to diagnose an error or anomaly during CDC processing?
•Do the source tables contain columns that have unsupported data types? To determine which data types are not supported for your source types, see the source-specific topics under "Database Ingestion and Replication source considerations" in the Database Ingestion and Replication help.
•Are you allowed to create a new Oracle user and assign the privileges that Database Ingestion and Replication requires to that user? Determine the user name to use.
Oracle privileges
To deploy and run a database ingestion and replication task that has an Oracle source, the source connection must specify a Database Ingestion and Replication user who has the privileges required for the ingestion load type.
Privileges for incremental load processing with log-based CDC
Note: If the Oracle logs are managed by ASM, the user must have SYSASM or SYSDBA authority.
For a database ingestion and replication task that performs an incremental load or combined initial and incremental load using the log-based CDC method, ensure that the Database Ingestion and Replication user (cmid_user) has been granted the following privileges:
GRANT CREATE SESSION TO <cmid_user>;
GRANT SELECT ON table TO <cmid_user>; -- For each source table created by user
-- The following grants are required for combined initial and incremental loads only.
-- Do not use ANY TABLE unless your security policy allows it.
GRANT EXECUTE ON DBMS_FLASHBACK TO <cmid_user>;
GRANT FLASHBACK ON table|ANY TABLE TO <cmid_user>;
-- Include the following grant only if you want to execute the CDC script for enabling
-- supplemental logging from the user interface. If you manually enable supplemental
-- logging, this grant is not needed.
GRANT ALTER table|ANY TABLE TO <cmid_user>;
GRANT SELECT ON DBA_CONSTRAINTS TO <cmid_user>;
GRANT SELECT ON DBA_CONS_COLUMNS TO <cmid_user>;
GRANT SELECT ON DBA_INDEXES TO <cmid_user>;
GRANT SELECT ON DBA_LOG_GROUPS TO <cmid_user>;
GRANT SELECT ON DBA_LOG_GROUP_COLUMNS TO <cmid_user>;
GRANT SELECT ON DBA_OBJECTS TO <cmid_user>;
GRANT SELECT ON DBA_OBJECT_TABLES TO <cmid_user>;
GRANT SELECT ON DBA_TABLES TO <cmid_user>;
GRANT SELECT ON DBA_TABLESPACES TO <cmid_user>;
GRANT SELECT ON DBA_USERS TO <cmid_user>;
GRANT SELECT ON "PUBLIC".V$ARCHIVED_LOG TO <cmid_user>; GRANT SELECT ON "PUBLIC".V$CONTAINERS TO <cmid_user>; -- For Oracle multitenant environments GRANT SELECT ON "PUBLIC".V$DATABASE TO <cmid_user>; GRANT SELECT ON "PUBLIC".V$DATABASE_INCARNATION TO <cmid_user>; GRANT SELECT ON "PUBLIC".V$ENCRYPTION_WALLET TO <cmid_user>; -- For Oracle TDE access GRANT SELECT ON "PUBLIC".V$LOG TO <cmid_user>; GRANT SELECT ON "PUBLIC".V$LOGFILE TO <cmid_user>; GRANT SELECT ON "PUBLIC".V$PARAMETER TO <cmid_user>; GRANT SELECT ON "PUBLIC".V$PDBS TO <cmid_user>; -- For Oracle multitenant environments GRANT SELECT ON "PUBLIC".V$SPPARAMETER TO <cmid_user>; GRANT SELECT ON "PUBLIC".V$STANDBY_LOG TO <cmid_user>; GRANT SELECT ON "PUBLIC".V$THREAD TO <cmid_user>; GRANT SELECT ON "PUBLIC".V$TRANSACTION TO <cmid_user>; GRANT SELECT ON "PUBLIC".V$TRANSPORTABLE_PLATFORM TO <cmid_user>; GRANT SELECT ON "PUBLIC".V$VERSION TO <cmid_user>;
GRANT SELECT ON SYS.ATTRCOL$ TO <cmid_user>;
GRANT SELECT ON SYS.CCOL$ TO <cmid_user>;
GRANT SELECT ON SYS.CDEF$ TO <cmid_user>;
GRANT SELECT ON SYS.COL$ TO <cmid_user>;
GRANT SELECT ON SYS.COLTYPE$ TO <cmid_user>;
GRANT SELECT ON SYS.IDNSEQ$ TO <cmid_user>;
GRANT SELECT ON SYS.IND$ TO <cmid_user>;
GRANT SELECT ON SYS.INDPART$ TO <cmid_user>;
GRANT SELECT ON SYS.OBJ$ TO <cmid_user>;
GRANT SELECT ON SYS.PARTOBJ$ TO <cmid_user>;
GRANT SELECT ON SYS.RECYCLEBIN$ TO <cmid_user>;
GRANT SELECT ON SYS.TAB$ TO <cmid_user>;
GRANT SELECT ON SYS.TABCOMPART$ TO <cmid_user>;
GRANT SELECT ON SYS.TABPART$ TO <cmid_user>;
GRANT SELECT ON SYS.TABSUBPART$ TO <cmid_user>;
-- Also ensure that you have access to the following ALL_* views:
-- ALL_CONSTRAINTS, ALL_CONS_COLUMNS, ALL_ENCRYPTED_COLUMNS, ALL_INDEXES, ALL_IND_COLUMNS,
-- ALL_OBJECTS, ALL_TABLES, ALL_TAB_COLS, ALL_TAB_PARTITIONS, ALL_USERS
Privileges for incremental load processing with query-based CDC
For a database ingestion and replication task that performs an incremental load or combined initial and incremental load using the query-based CDC method, ensure that the user has the following privileges at minimum:
GRANT CREATE SESSION TO <cmid_user>;
GRANT SELECT ON DBA_INDEXES TO <cmid_user>;
GRANT SELECT ON DBA_OBJECT_TABLES TO <cmid_user>;
GRANT SELECT ON DBA_OBJECTS TO <cmid_user>;
GRANT SELECT ON DBA_TABLES TO <cmid_user>;
GRANT SELECT ON DBA_USERS TO <cmid_user>;
GRANT SELECT ON DBA_VIEWS TO <cmid_user>;   -- Only if you unload data from views
GRANT SELECT ANY TABLE TO <cmid_user>;
-- or --
GRANT SELECT ON table TO <cmid_user>;   -- For each source table created by user
GRANT SELECT ON ALL_CONSTRAINTS TO <cmid_user>;
GRANT SELECT ON ALL_CONS_COLUMNS TO <cmid_user>;
GRANT SELECT ON ALL_ENCRYPTED_COLUMNS TO <cmid_user>;
GRANT SELECT ON ALL_IND_COLUMNS TO <cmid_user>;
GRANT SELECT ON ALL_INDEXES TO <cmid_user>;
GRANT SELECT ON ALL_OBJECTS TO <cmid_user>;
GRANT SELECT ON ALL_TAB_COLS TO <cmid_user>;
GRANT SELECT ON ALL_USERS TO <cmid_user>;
GRANT SELECT ON "PUBLIC"."V$DATABASE" TO cmid_user; GRANT SELECT ON "PUBLIC"."V$CONTAINERS" TO cmid_user; GRANT SELECT ON SYS.ATTRCOL$ TO <cmid_user>; GRANT SELECT ON SYS.CCOL$ TO <cmid_user>; GRANT SELECT ON SYS.CDEF$ TO <cmid_user>; GRANT SELECT ON SYS.COL$ TO <cmid_user>; GRANT SELECT ON SYS.COLTYPE$ TO <cmid_user>; GRANT SELECT ON SYS.IND$ TO <cmid_user>; GRANT SELECT ON SYS.IDNSEQ$ TO cmid_user; GRANT SELECT ON SYS.OBJ$ TO <cmid_user>; GRANT SELECT ON SYS.RECYCLEBIN$ TO <cmid_user>; GRANT SELECT ON SYS.TAB$ TO <cmid_user>;
Privileges for initial load processing
For a database ingestion and replication task that performs an initial load, ensure that the user has the following privileges at minimum:
GRANT CREATE SESSION TO <cmid_user>;
GRANT SELECT ON DBA_INDEXES TO <cmid_user>;
GRANT SELECT ON DBA_OBJECT_TABLES TO <cmid_user>;
GRANT SELECT ON DBA_OBJECTS TO <cmid_user>;
GRANT SELECT ON DBA_TABLES TO <cmid_user>;
GRANT SELECT ON DBA_USERS TO <cmid_user>;
GRANT SELECT ON DBA_VIEWS TO <cmid_user>;   -- Only if you unload data from views
GRANT SELECT ANY TABLE TO <cmid_user>;
-- or --
GRANT SELECT ON table TO <cmid_user>;   -- For each source table created by user
GRANT SELECT ON ALL_CONSTRAINTS TO <cmid_user>;
GRANT SELECT ON ALL_CONS_COLUMNS TO <cmid_user>;
GRANT SELECT ON ALL_ENCRYPTED_COLUMNS TO <cmid_user>;
GRANT SELECT ON ALL_IND_COLUMNS TO <cmid_user>;
GRANT SELECT ON ALL_INDEXES TO <cmid_user>;
GRANT SELECT ON ALL_OBJECTS TO <cmid_user>;
GRANT SELECT ON ALL_TAB_COLS TO <cmid_user>;
GRANT SELECT ON ALL_USERS TO <cmid_user>;
GRANT SELECT ON "PUBLIC"."V$DATABASE" TO cmid_user; GRANT SELECT ON "PUBLIC"."V$CONTAINERS" TO cmid_user; GRANT SELECT ON SYS.ATTRCOL$ TO <cmid_user>; GRANT SELECT ON SYS.CCOL$ TO <cmid_user>; GRANT SELECT ON SYS.CDEF$ TO <cmid_user>; GRANT SELECT ON SYS.COL$ TO <cmid_user>; GRANT SELECT ON SYS.COLTYPE$ TO <cmid_user>; GRANT SELECT ON SYS.IND$ TO <cmid_user>; GRANT SELECT ON SYS.IDNSEQ$ TO cmid_user; GRANT SELECT ON SYS.OBJ$ TO <cmid_user>; GRANT SELECT ON SYS.RECYCLEBIN$ TO <cmid_user>; GRANT SELECT ON SYS.TAB$ TO <cmid_user>;n
Oracle privileges for Amazon RDS for Oracle sources
If you have an Amazon RDS for Oracle source, you must grant certain privileges to the Database Ingestion and Replication user.
Important: You must log in to Amazon RDS under the master username to run GRANT statements and procedures.
Grant the SELECT privilege, at minimum, on objects and system tables that are required for CDC processing to the Database Ingestion and Replication user (cmid_user). Some additional grants are required in certain situations.
Use the following GRANT statements:
GRANT SELECT ON "PUBLIC"."V$ARCHIVED_LOG" TO "cmid_user";
GRANT SELECT ON "PUBLIC"."V$DATABASE" TO "cmid_user"; GRANT SELECT ON "PUBLIC"."V$LOG" TO "cmid_user"; GRANT SELECT ON "PUBLIC"."V$LOGFILE" TO "cmid_user"; GRANT SELECT ON "PUBLIC"."V$TRANSPORTABLE_PLATFORM" TO "cmid_user"; GRANT SELECT ON "PUBLIC"."V$THREAD" TO "cmid_user"; GRANT SELECT ON "PUBLIC"."V$DATABASE_INCARNATION" TO "cmid_user"; GRANT SELECT ON "PUBLIC"."V$TRANSACTION" TO "cmid_user";
GRANT SELECT ON "SYS"."DBA_CONS_COLUMNS" TO "cmid_user"; GRANT SELECT ON "SYS"."DBA_CONSTRAINTS" TO "cmid_user"; GRANT SELECT ON DBA_INDEXES TO "cmid_user"; GRANT SELECT ON "SYS"."DBA_LOG_GROUP_COLUMNS" TO "cmid_user"; GRANT SELECT ON "SYS"."DBA_TABLESPACES" TO "cmid_user";
GRANT SELECT ON "SYS"."OBJ$" TO "cmid_user"; GRANT SELECT ON "SYS"."TAB$" TO "cmid_user"; GRANT SELECT ON "SYS"."IND$" TO "cmid_user"; GRANT SELECT ON "SYS"."COL$" TO "cmid_user";
GRANT SELECT ON "SYS"."PARTOBJ$" TO "cmid_user"; GRANT SELECT ON "SYS"."TABPART$" TO "cmid_user"; GRANT SELECT ON "SYS"."TABCOMPART$" TO "cmid_user"; GRANT SELECT ON "SYS"."TABSUBPART$" TO "cmid_user"; COMMIT;
/* For combined load jobs:*/ GRANT EXECUTE ON DBMS_FLASHBACK TO "cmid_user";
/*To provide read access to the Amazon RDS online and archived redo logs:*/ GRANT READ ON DIRECTORY ONLINELOG_DIR TO "cmid_user"; GRANT READ ON DIRECTORY ARCHIVELOG_DIR TO "cmid_user";
Additionally, log in as the master user and run the following Amazon RDS procedures to grant the SELECT privilege on additional objects:
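A representative call, assuming the rdsadmin.rdsadmin_util.grant_sys_object procedure and a user named CMID_USER; repeat it for each required object:
begin
    rdsadmin.rdsadmin_util.grant_sys_object(
        p_obj_name  => 'DBA_TABLES',
        p_grantee   => 'CMID_USER',
        p_privilege => 'SELECT');
end;
/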
Oracle log access methods for CDC
Database ingestion and replication incremental load and combined initial and incremental load jobs can access the Oracle redo logs for CDC processing in alternative ways, depending on your environment and requirements.
Direct log access
Database ingestion and replication jobs can directly access the physical Oracle redo logs on the on-premises source system to read change data.
Note: If you store the logs on solid-state disk (SSD), this method can provide the best performance.
The following image shows the data flow:
1. The Oracle database writes change records to the database log files on disk.
2. The Database Ingestion and Replication CDC Reader reads the physical log files and extracts change records from the log files for the source tables of CDC interest.
3. The Database Ingestion and Replication CDC Writer reads the change records.
4. The CDC Writer applies the change records to the target.
NFS-mounted logs
Database ingestion and replication jobs can access Oracle database logs from shared disk by using a Network File System (NFS) mount or another method such as Network Attached Storage (NAS) or clustered storage.
The following image shows the data flow:
1. The Oracle database writes change records to database log files. The log files are written to shared disk.
The shared disk can be on any system that allows the files to appear as local to both the database and Secure Agent hosts. This sharing can be achieved by using NFS, as shown above, or by using Network Attached Storage (NAS) or clustered storage.
2. The Database Ingestion and Replication CDC Reader reads the log files from the NFS server over the network and extracts the change records for the source tables of CDC interest.
3. The Database Ingestion and Replication CDC Writer reads the change records.
4. The CDC Writer applies the change records to the target.
ASM-managed logs
Database ingestion and replication jobs can access Oracle redo logs that are stored in an Oracle Automatic Storage Management (ASM) system. To read change data from the ASM-managed redo logs, the ASM user must have SYSASM or SYSDBA authority on the ASM instance.
When you configure an Oracle Database Ingestion connection, complete the properties that include "ASM" in their names.
Also, Informatica recommends that you set the sqlnet.recv_timeout parameter in the local sqlnet.ora file to less than 5 minutes when reading data from redo log files in Oracle ASM. This parameter specifies the duration of time that the Oracle client waits for a response from ASM before a query times out. Network interrupts and other factors can occasionally make Oracle connections unresponsive. Setting this value ensures that the reader can respond and recover from any such situation in a timely manner.
The following image shows the data flow:
1. The Oracle database writes change records to the ASM-managed database log files.
2. The Database Ingestion and Replication CDC Reader reads the ASM-managed log files and extracts the change records for the source tables of CDC interest.
3. The Database Ingestion and Replication CDC Writer reads the change records.
4. The CDC Writer applies the change records to the target.
ASM-managed logs with a staging directory
Database ingestion jobs can access ASM-managed redo logs from a staging directory in the ASM environment. In comparison to using ASM only, this method can provide faster access to the log files and reduce I/O on the ASM system. To read change data from the ASM-managed logs, the ASM user must have SYSASM or SYSDBA authority on the ASM instance.
The following image shows the data flow:
1. The Oracle database writes change records to the ASM-managed log files.
2. ASM copies the logs to a staging directory.
The staging directory must be on shared disk, such as an NFS mount, so that ASM can write data to it and database ingestion and replication jobs can read data from it.
3. The Database Ingestion and Replication CDC Reader reads the log files in the staging directory and extracts the change records for the source tables of CDC interest.
4. The Database Ingestion and Replication CDC Writer reads the change records.
5. The CDC Writer applies the change records to the target.
BFILE access to logs in the Oracle server file system by using directory objects
On an on-premises Oracle source system, you can configure Database Ingestion and Replication to read online and archived redo logs from the local Oracle server file system by using Oracle directory objects with BFILE locators. You must create Oracle directory objects named ARCHIVELOG_DIR and ONLINELOG_DIR that point to the locations of the Oracle redo log files. For information about configuring BFILE access, see Configuring BFILE access to Oracle redo logs in the Oracle file system.
The following image shows the data flow:
1. The Oracle database writes change records to the redo log files in the local Oracle server file system. When a database ingestion and replication task needs to read log files, it connects to Oracle and issues a select request that references the ARCHIVELOG_DIR or ONLINELOG_DIR directory object to access the logs.
Note: If you use BFILE access to Oracle data that is stored in multiple log locations, such as in an environment with a standby database, RDS database instance, or Fast Recovery Area, Database Ingestion and Replication can automatically find the logs by querying for all directory objects that begin with ARCHIVELOG_DIR and ONLINELOG_DIR and end with any suffix, for example, ARCHIVELOG_DIR_01, ARCHIVE_LOG_DIR_02, and so on. If database users who run database ingestion and replication jobs have the CREATE ANY DIRECTORY and DROP ANY DIRECTORY privileges, the directory objects can be created at runtime, as needed.
2. The Database Ingestion and Replication CDC Reader reads the log file by using the OCI Client that is installed on the Secure Agent box and extracts the change records for the source tables of CDC interest.
3. The CDC Reader sends the change records to the CDC Writer.
4. The CDC Writer applies the change records to the target.
Configuring BFILE access to Oracle redo logs in the Oracle file system
If you store redo logs in the local Oracle server file system and want to access the logs by using Oracle directory objects with BFILEs, perform the following configuration tasks:
Complete the following usual Oracle source prerequisite tasks that are not specific to BFILE access:
•Define the ORACLE_HOME environment variable on the Linux or Windows system where the Secure Agent runs for Database Ingestion and Replication to use the Oracle Call Interface (OCI) to communicate with the Oracle source database.
•Make sure the Database Ingestion and Replication user has the Oracle privileges that are required for the database ingestion and replication incremental load processing. For more information, see Oracle privileges.
•Enable ARCHIVELOG mode for the Oracle database.
•Define the archive log destination.
•Enable Oracle minimal global supplemental logging on the source database.
•If your Oracle source tables have primary keys, ensure that supplemental logging is enabled for all primary key columns. For source tables that do not have primary keys, ensure that supplemental logging is enabled for all columns from which change data will be captured.
Note: When you create a database ingestion and replication task, you have the option of generating a script that implements supplemental logging for all columns or only primary key columns for the selected source tables.
•If the Oracle database parameter MAX_STRING_SIZE=EXTENDED is set in the database and a source table selected for capture processing has VARCHAR2, NVARCHAR2, or RAW columns greater than 4000 characters, those columns are excluded from capture processing unless you select the Include LOBs check box for the database ingestion and replication task.
Additionally, for BFILE access, perform the following steps to configure an online log destination and archive log destination:
1. Query the Oracle database for the online and archived redo log locations in the Oracle server file system. You can use the following example queries:
To get the location of the online redo logs:
select * from v$logfile;
To get the archive log destination:
select dest_id, dest_name, destination, status from V$ARCHIVE_DEST;
2. Create the ONLINELOG_DIR and ARCHIVELOG_DIR directory objects that point to the locations of log files from step 1. An Oracle directory object specifies a logical alias name for a physical directory on the Oracle server file system under which the log files to be accessed are located. For example:
CREATE DIRECTORY ONLINELOG_DIR AS '/u01/oracle/data';
CREATE DIRECTORY ARCHIVELOG_DIR AS '/u01/oracle/archivedata';
If you plan to set the reader mode to ARCHIVEONLY in the Oracle Database Ingestion connection to read changes only from archive logs, you do not need to create an ONLINELOG_DIR directory or directory object.
The Oracle database does not verify that the directories you specify exist. Make sure that you specify valid directories that exist in the Oracle file system.
3. To verify that the directory objects were created with the correct file system paths for the redo logs, issue a select statement such as:
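For example, you can query the ALL_DIRECTORIES view:
SELECT DIRECTORY_NAME, DIRECTORY_PATH
FROM ALL_DIRECTORIES
WHERE DIRECTORY_NAME IN ('ONLINELOG_DIR', 'ARCHIVELOG_DIR');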
4. Grant read access on the ONLINELOG_DIR and ARCHIVELOG_DIR directory objects to the Database Ingestion and Replication user who is specified in the Oracle Database Ingestion connection properties. For example:
grant read on directory "ARCHIVELOG_DIR" to "cmid_user";
grant read on directory "ONLINELOG_DIR" to "cmid_user";
Note: If the ONLINELOG_DIR path does not exist or match the path to the active redo logs, Database Ingestion and Replication tries to create the directory. If you do not have sufficient privileges to create the directory object, an error message is issued. In this case, ask your DBA to create the directory object with the correct path.
5. In the Oracle Database Ingestion connection properties, select the BFILE Access check box.
Important: If you use BFILE access to Oracle data stored in multiple log locations, Database Ingestion and Replication can automatically find the logs by querying for all directory objects that begin with ARCHIVELOG_DIR and ONLINELOG_DIR and end with any suffix, for example, ARCHIVELOG_DIR_01, ARCHIVE_LOG_DIR_02, and so on. This capability enables Database Ingestion and Replication to support Oracle standby databases that use log locations different from those on the primary database, replica logs on Amazon RDS database instances, and archive logs in the Fast Recovery Area to which the USE_DB_RECOVERY_FILE_DEST parameter points.
You can override the ARCHIVELOG_DIR and ONLINELOG_DIR names by using the custom properties pwx.cdcreader.oracle.database.additional BFILEARCHIVEDIR=<directory_object> and pwx.cdcreader.oracle.database.additional BFILEONLINEDIR=<directory_object> if necessary.
If you grant the CREATE ANY DIRECTORY and DROP ANY DIRECTORY privileges to database users who run database ingestion and replication jobs, the directory objects can be created at runtime, as needed. For example, if multiple archive and online log destinations exist in a database, the directory objects could be created with the following naming convention:
•<ARCHIVEDIRNAME>_01
•<ARCHIVEDIRNAME>_02
•<ONLINEDIRNAME>_01
•<ONLINEDIRNAME>_02
Database Ingestion and Replication performs no cleanup processing on these generated directory objects.
If you use the USE_DB_RECOVERY_FILE_DEST parameter and the CREATE ANY DIRECTORY and DROP ANY DIRECTORY privileges have not been granted, your DBA must create the directory objects daily or weekly, before your database ingestion and replication jobs run, by using a script such as:
create or replace directory ARCHIVELOG_DIR_2024_08_19 as '<DB_RECOVERY_FILE_DEST>/2024_08_19'
If you use the USE_DB_RECOVERY_FILE_DEST parameter and the database user has the CREATE ANY DIRECTORY and DROP ANY DIRECTORY privileges, the directory objects are created as needed at runtime and dropped after 14 days. These directory objects have the naming convention <ARCHIVEDIRNAME>_YYYY_MM_DD.
Oracle Data Guard databases or far sync instances as sources
Database Ingestion and Replication can capture change data from Oracle Data Guard primary databases, logical or physical standby databases, and far sync instances.
A far sync instance is a remote Oracle Data Guard destination that accepts redo from the primary database and then ships that redo to other members of the Oracle Data Guard configuration.
You can initially load a target with data either from the Oracle Data Guard primary database or from a standby database that is open in read mode.
Configuration
Oracle change capture configuration depends on the Oracle Data Guard database type.
•For the primary Oracle database, grant SELECT permissions on the V$STANDBY_LOG view to the Database Ingestion and Replication user:
GRANT SELECT ON "PUBLIC".V$STANDBY_LOG TO <cmid_user>;
If the primary database is in an Amazon RDS for Oracle environment:
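In that case, the master user typically cannot grant privileges directly on SYS-owned objects; a possible call, assuming the rdsadmin.rdsadmin_util.grant_sys_object procedure and a user named CMID_USER, is:
begin
    rdsadmin.rdsadmin_util.grant_sys_object(
        p_obj_name  => 'V_$STANDBY_LOG',
        p_grantee   => 'CMID_USER',
        p_privilege => 'SELECT');
end;
/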
•For a physical standby database in mount mode (not open with read only access), set the following Oracle Database Ingestion connection properties:
- Database Connect String - Ensure that it points to the primary database to read the Oracle catalog.
- Standby Connect String - An Oracle connection string, defined in TNS, that the log reader uses to connect to the Oracle physical standby database and monitor the logs.
- Standby User Name - A user ID that the log reader uses to connect to the Oracle physical standby database. This user ID must have SYSDBA authority.
- Standby Password - A password that the log reader uses to connect to the Oracle physical standby database.
Note: With a database in mount mode, you can use a password file for user authentication. Initially, you must grant SYSDBA authority to the user. If you want to avoid granting permanent SYSDBA authority to the user, you can copy the primary password file to the physical standby or far sync instance and then revoke SYSDBA authority for the user. Repeat this process whenever you refresh the password file.
Optionally, configure the following additional connection properties:
- RAC Members - The maximum number of active threads on the Data Guard primary database when the database is in a RAC environment.
- Reader Standby Log Mask - A mask that the log reader uses for selecting redo logs for an Oracle standby database when the database uses multiplexing of redo logs.
For more information, see "Oracle Database Ingestion connection properties" in Connectors and Connections.
•For a logical standby database, no special configuration tasks are required. Configure it the same way as for an Oracle database that is not in a Data Guard environment.
Standby-to-primary role transitions
In an Oracle Data Guard environment, a physical standby database can transition to the primary role. Usually, the role transition occurs because of a failover or switchover. During the transition, all active connections to the physical standby database terminate.
To be able to resume CDC processing after the physical standby database transitions to the primary role, you might need to adjust some connection configuration properties on the original standby system for Database Ingestion and Replication to process past the transition point. After the transition, you can adjust the properties again for optimal performance in the new primary database environment.
The following list describes these connection properties by transition phase:
•RAC Members
- Before transition: Specify the number of active threads on the primary database.
- During transition: Specify the total number of active threads with unique thread IDs on both the standby database and the primary database. For example, if the primary database is a two-node RAC database that uses thread IDs 1 and 2 and the standby database is a three-node RAC database that uses thread IDs 2, 3, and 4, specify a property value of 4.
- After transition: After the restart point has progressed beyond the transition point, edit the property value, as needed, for optimal performance of change data capture from the new primary database. Informatica recommends that you use the lowest value that is suitable for your environment to minimize the overhead of CDC thread tracking.
•Reader Standby Log Mask, Standby Connect String, Standby User Name, and Standby Password
- Before transition: Remove all standby properties. They're not applicable to physical standby databases open for read only access.
- During transition: The properties remain removed.
- After transition: Do not specify these properties. They're not used for a primary database.
•Database Connect String
- Before transition: If the standby database is not open, define the connection string for the primary database. If the standby database is open, define the connection string for the standby database.
- During transition: Specify the connection string for the database that will have the primary role after the role transition.
- After transition: Ensure that this property defines the connection string for the new primary database.
Oracle archive log retention considerations
Database ingestion and replication incremental load and combined initial and incremental load jobs must be able to access transaction data in Oracle online and archive redo logs. If the logs are not available, database ingestion and replication jobs end with an error.
Typically, the Oracle DBA sets the archive log retention period based on your organization's particular business needs and Oracle environment. Make sure that the source archive logs are retained for the longest period for which you expect change capture to be stopped or latent, plus about 1 hour, so that the logs will be available for restart processing.
To determine if the current log retention policy in your environment is sufficient to accommodate database ingestion and replication change capture processing, consider the following factors:
•How long are Oracle transactions typically open on a source?
•What is the longest period of time that change capture is allowed to be down or latent, accounting for weekends and holidays?
•What is the replication latency from source to target?
•Do you run database ingestion and replication jobs based on a schedule? If yes, what type of schedule?
•Is the pwx.cdcreader.oracle.option.additional ageOutPeriod=minutes custom property set on the Source page of the task wizard?
Note: This property specifies the age at which outstanding UOWs without change records of CDC interest are removed from the calculation of the next restart point. You can use the property to prevent CDC failures that might occur if you shut down and then restart capture processing while the transaction is outstanding and the redo log in which the UOW started is not available.
•What is the redo generation rate?
•Do you ship copies of archive logs to a secondary system?
If the archive logs are not available when you need to restart capture processing in the logs, you can ask your DBA to restore them and to modify the retention period if necessary. Otherwise, perform another initial load to re-materialize the target and then start incremental change data processing again. However, in this case, you might lose some changes.
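To gauge whether the current retention policy leaves enough redo history available for restart processing, it can help to check which archived logs Oracle still reports as present. The following is a minimal sketch that assumes the standard V$ARCHIVED_LOG view; the query is for illustration only:
-- Illustrative only: list archived logs that have not been deleted
SELECT thread#, sequence#, first_time, next_time, name
FROM v$archived_log
WHERE deleted = 'NO'
ORDER BY thread#, sequence#;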
PostgreSQL sources
To use PostgreSQL sources in database ingestion and replication tasks, first prepare the source database and review the usage considerations.
Source preparation
On the Secure Agent system, install the ODBC driver that is appropriate for your operating system.
•On Windows, install the latest version of the 64-bit PostgreSQL ODBC driver.
1. Download and install the PostgreSQL ODBC driver.
Note: If the source database contains objects with multibyte-character names, such as table names, column names, and publication names, you must use either a PostgreSQL Unicode ODBC driver or the DataDirect ODBC for PostgreSQL driver. This requirement applies to all PostgreSQL source types, including Amazon Aurora PostgreSQL, Azure Database for PostgreSQL - Flexible Server, Cloud SQL for PostgreSQL, and RDS for PostgreSQL. If you do not use a Unicode-compatible ODBC driver, your incremental load jobs will fail when encountering a multibyte-character name.
2. Set the PGSQL_ODBC_DRIVER environment variable to the driver name that is displayed by the ODBC Data Source Administrator (64-bit).
Note: You can override this driver for a database ingestion and replication task by setting the pwx.custom.pgsql_odbc_driver custom property on the Source page of the task wizard.
•On Linux or UNIX, the DataDirect ODBC driver for PostgreSQL is delivered as part of the Linux installation. Alternatively, you can install the unixODBC or iODBC driver manager or the PostgreSQL ODBC driver.
1. If you do not want to use the DataDirect ODBC for PostgreSQL driver that is provided in the Linux installation, install the unixODBC or iODBC driver manager or install the PostgreSQL ODBC driver.
Note: If the source database contains objects with multibyte-character names, such as table names, column names, and publication names, you must use either a PostgreSQL Unicode ODBC driver or the DataDirect ODBC for PostgreSQL driver. This requirement applies to all PostgreSQL source types, including Amazon Aurora PostgreSQL, Azure Database for PostgreSQL - Flexible Server, Cloud SQL for PostgreSQL, and RDS for PostgreSQL. If you do not use a Unicode-compatible ODBC driver, your incremental load jobs will fail when encountering a multibyte-character name.
3. Optional. Set the following environment variables:
▪ Set the ODBCSYSINI variable to the directory where odbcinst.ini is located. If odbcinst.ini is located in the default /etc directory, you do not need to set the ODBCSYSINI variable.
▪ Add the directory where the PostgreSQL ODBC driver is installed to the LD_LIBRARY_PATH variable. If the driver is installed in the default directory of /usr/lib64, you do not need to add the path to the LD_LIBRARY_PATH variable.
▪ Set the PGSQL_ODBC_DRIVER environment variable to the driver name that you specified in odbcinst.ini.
On the PostgreSQL database system, perform the following configuration steps:
1. For incremental load and initial and incremental load jobs, ensure that the PostgreSQL postgresql.conf configuration file specifies the wal_level=logical parameter.
This parameter determines how much information PostgreSQL writes to the Write-Ahead Log (WAL). The setting of logical adds information that is required to support logical decoding.
To set wal_level to logical on Amazon Aurora PostgreSQL or Amazon RDS for PostgreSQL sources, set the rds.logical_replication parameter to 1 in the cluster parameter group. For Azure Database for PostgreSQL - Flexible Server, set the wal_level parameter to logical on the Server Parameters page in the Azure portal.
For Cloud SQL for PostgreSQL sources, perform the following actions:
a. Connect to the database by using the public IP.
Note: Ensure that you add the required IPs under Authorized networks in the Google Cloud console.
b. Create a Cloud SQL for PostgreSQL database instance replica.
c. In the Cloud Shell, as a local admin user, run the following commands:
alter database postgres set default_transaction_read_only = off;
gcloud sql connect database_replica --user=postgres --quiet;
ALTER USER postgres WITH REPLICATION;
CREATE USER replication_user WITH REPLICATION IN ROLE cloudsqlsuperuser LOGIN PASSWORD 'password';
ALTER USER postgres WITH REPLICATION;
d. In the Google Cloud console, add the following database flags:
▪ cloudsql.logical_decoding. Set the value to on.
▪ max_replication_slots. Set the value to 64.
▪ cloudsql.enable_pglogical. Set the value to on.
▪ max_wal_senders. Set the value to 64.
e. Restart the database instance.
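Whichever of the preceding variants applies to your environment, you can confirm the effective setting after the change. A minimal check, run in a SQL session against the source database (shown for illustration):
-- Illustrative only: confirm that logical decoding information is written to the WAL
SHOW wal_level;  -- expected value: logical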
2. If you use the DataDirect ODBC for PostgreSQL driver, ensure that the database does not use the SCRAM-SHA-256 authentication method. Use another authentication method, such as MD5.
Note: The PostgreSQL ODBC driver supports the SCRAM-SHA-256 authentication method. In PostgreSQL 13, this authentication method became the default method.
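If you are unsure which hashing method new passwords use, the following illustrative check can help; the pg_hba.conf rules remain the authoritative source for the authentication method that the server enforces:
-- Illustrative only: show the hashing method applied to newly set passwords
SHOW password_encryption;  -- md5 or scram-sha-256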
3. To deploy and run a database ingestion and replication task that includes a PostgreSQL source, the source connection must specify a database user who has the required privileges. Create the user and grant privileges to that user in the following ways:
- For initial load jobs, use the following SQL statements:
CREATE USER dbmi_user WITH PASSWORD 'password';
GRANT SELECT ON ALL TABLES IN SCHEMA schema TO dbmi_user;
- For incremental load and initial and incremental load jobs with on-premises PostgreSQL sources, use the following SQL statement:
CREATE USER dbmi_user WITH PASSWORD 'password' REPLICATION;
For Amazon Aurora PostgreSQL and RDS for PostgreSQL sources, use the following statements:
CREATE USER dbmi_user WITH PASSWORD 'password';
GRANT rds_replication TO dbmi_user;
Additionally, if you use the pgoutput plugin, use the following SQL statement to grant the dbmi_user that you created ownership of the tables in the database that you want to add to the pgoutput publication:
GRANT CREATE ON DATABASE database TO dbmi_user;
4. If you plan to use the wal2json plugin for logical decoding output for incremental load or initial and incremental load jobs, install the plugin.
5. If you plan to use the pgoutput plugin for incremental load or initial and incremental load jobs, use the following SQL statement to create publications for database ingestion jobs:
CREATE PUBLICATION publication_name [FOR TABLE [ONLY] table_name [*] [,...] | FOR ALL TABLES ];
Ensure that the publication includes all tables that you want to replicate to the target.
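For example, a publication that is limited to two source tables might be created as follows; the publication, schema, and table names are illustrative:
-- Illustrative only: publication that covers two source tables
CREATE PUBLICATION dbmi_pub FOR TABLE myschema.orders, myschema.customers;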
6. For incremental load and initial and incremental load jobs, use the following function to create a PostgreSQL logical replication slot:
Where the plugin_type is either the pgoutput plugin or the wal2json plugin.
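The standard PostgreSQL function for creating a logical replication slot is pg_create_logical_replication_slot. A minimal sketch, with an illustrative slot name and the pgoutput plugin:
-- Illustrative only: create a logical replication slot that uses the pgoutput plugin
SELECT pg_create_logical_replication_slot('dbmi_slot', 'pgoutput');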
7. For incremental load and initial and incremental load jobs with PostgreSQL 9.6 sources, ensure that the max_replication_slots parameter in the postgresql.conf configuration file has a value greater than or equal to the number of concurrent database ingestion jobs that you plan to use.
Important: All replication slot names must be unique across all concurrent jobs.
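To check which slot names already exist before you assign slots to jobs, a query like the following (illustrative) can help:
-- Illustrative only: list existing replication slots and whether each one is in use
SELECT slot_name, plugin, active FROM pg_replication_slots;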
8. Ensure that the PostgreSQL sources use the UTF-8 encoding. If you use another encoding for the source database, your initial load, incremental load, and combined initial and incremental load jobs might fail.
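A quick, illustrative way to confirm the encoding of the source database:
-- Illustrative only: show the encoding of the connected database
SHOW server_encoding;  -- expected value: UTF8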
Usage considerations
•Database Ingestion and Replication supports the following types of PostgreSQL sources in database ingestion and replication jobs that use any load type: on-premises PostgreSQL, Amazon Aurora PostgreSQL, Azure Database for PostgreSQL - Flexible Server, Cloud SQL for PostgreSQL, and RDS for PostgreSQL.
•Database Ingestion and Replication supports PostgreSQL sources in database ingestion and replication jobs that have any target type.
•For Cloud SQL for PostgreSQL sources, Database Ingestion and Replication supports any load type but only Google BigQuery and Snowflake targets.
•Because Database Ingestion and Replication expects each source table row to be unique, Informatica recommends that each source table have a primary key. Database Ingestion and Replication does not honor unique indexes in place of a primary key. If no primary key is specified, Database Ingestion and Replication treats all columns as if they are part of the primary key.
•Database Ingestion and Replication supports schema drift options for PostgreSQL sources in database ingestion and replication incremental load and initial and incremental load jobs with the following limitations:
- PostgreSQL does not support changes to primary keys for tables for which change data capture is enabled.
- Database ingestion and replication jobs cannot capture DML changes from source tables for which table partition IDs are changed.
•Database Ingestion and Replication does not support generated columns in incremental load jobs that have a PostgreSQL source. If the source table contains generated columns, change data capture will ignore them and will continue with the rest of the columns.
•Database ingestion and replication jobs can replicate data from PostgreSQL BYTEA, JSON, JSONB, TEXT, and XML columns if you select Include LOBs under Advanced on the Source page of the task wizard.
The supported target types depend on the load type:
- For initial load jobs: Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, Microsoft Azure Synapse Analytics, Oracle, Oracle Cloud Object Storage, PostgreSQL, Snowflake, and SQL Server.
- For incremental load jobs: Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Kafka-enabled Azure Event Hubs, Microsoft Azure Data Lake Storage Gen2, Microsoft Azure Synapse Analytics, Oracle, Oracle Cloud Object Storage, PostgreSQL, Snowflake, and SQL Server.
- For combined initial and incremental load jobs: Amazon Redshift, Amazon S3, Databricks, Google BigQuery, Google Cloud Storage, Microsoft Azure Data Lake Storage Gen2, Microsoft Azure Synapse Analytics, Oracle, Oracle Cloud Object Storage, PostgreSQL, Snowflake, and SQL Server.
Database ingestion and replication incremental load and initial and incremental load jobs can replicate data from TEXT, XML, BIT VARYING, and CHARACTER VARYING columns without length limits if you set the pwx.custom.pgsql_enable_lobs custom property to true on the Source page of the task wizard. Incremental load and initial and incremental load jobs also always replicate data from BYTEA, JSON, and JSONB columns.
LOB column data is truncated before being written to the target if it is greater in size than a byte limit that depends on the LOB type and target type. For more information, see Configuring the source.
•For PostgreSQL 9.6, the pgoutput plugin is not available.
•For initial load jobs, Database Ingestion and Replication does not support the following PostgreSQL data types:
- ABSTIME
- Array types
- NAME
- Object identifier types
- PG_LSN
- RELTIME
- Text search types:
▪ TSQUERY
▪ TSVECTOR
- User-defined types
For incremental load and initial and incremental load jobs, Database Ingestion and Replication does not support the following PostgreSQL data types, in addition to those not supported for initial load jobs:
- Spatial types
▪ Box
▪ Circle
▪ Line
▪ LSeg
▪ Path
▪ Point
▪ Polygon
- Unbounded varying types
Database ingestion and replication jobs either fail to deploy or propagate nulls for columns that have these data types.
For information about the default mappings of supported PostgreSQL data types to target types, see Default Data Type Mappings.
•If you update the primary key value in the PostgreSQL source table for a record that does not exist in the target table, the record is not replicated to the target. However, the monitoring interface increments the Update count to include the primary key update. The data will be replicated to the target only if the record exists in the target table before the primary key update occurs.
SAP HANA and SAP HANA Cloud sources
To use SAP HANA and SAP HANA Cloud sources in database ingestion and replication tasks, first prepare the source database and review the usage considerations.
Source preparation
•The SAP HANA Database Ingestion connector uses JDBC to connect to the SAP HANA and SAP HANA Cloud database to read data and metadata and to test connection properties. You must download the SAP HANA JDBC driver file, ngdbc.jar, and copy it to a specific subdirectory of the Secure Agent installation directory on the machine where the Secure Agent runs.
1. Download the SAP HANA JDBC driver jar file, ngdbc.jar, to the Linux or Windows machine where the Secure Agent runs.
Verify that you download the most recent version of the file. If you encounter any issues with downloading the file, contact SAP Customer Support.
2. Copy the ngdbc.jar file to the following directory:
•Create a Database Ingestion and Replication user. Connect to the source database as a user with admin authority and execute the following statement:
CREATE USER dbmi_user password "<password>" NO FORCE_FIRST_PASSWORD_CHANGE;
This statement creates a user in the database with the default permissions, which permit basic data dictionary views to be read and the required CDC objects to be created in the user's own schema.
•To deploy and run a database ingestion and replication task that includes an SAP HANA or SAP HANA Cloud source, the source connection must specify a Database Ingestion and Replication user (dbmi_user) who has the privileges to read metadata and other information from the following system views:
- SYS.M_DATABASE - Used to fetch the database version.
- SYS.M_CS_PARTITIONS - Used to identify if a table is partitioned. (Not applicable for SAP HANA Cloud)
- SYS.SCHEMAS - Used to fetch the list of schemas for a database.
- SYS.TABLES - Used to fetch a list of table names for a schema.
- SYS.TABLE_COLUMNS - Used to fetch column metadata for a table.
- SYS.INDEXES - Used to fetch index information for a table.
- SYS.INDEX_COLUMNS - Used to fetch index column information for a table.
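One quick way to confirm that the dbmi_user can read this metadata is to query one of the views, for example (illustrative):
-- Illustrative only: returns the SAP HANA version if metadata access works
SELECT VERSION FROM SYS.M_DATABASE;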
•For incremental load jobs, grant the following privileges:
- For triggers to write rows to the PKLOG and shadow _CDC tables of the dbmi_user, grant INSERT access on the dbmi_user's schema to the user that owns the schema of the source tables (schema_user):
GRANT INSERT ON SCHEMA dbmi_user TO schema_user;
- To capture change data from source tables in the schema_user's schema by using triggers, execute one of the following statements:
GRANT TRIGGER ON SCHEMA schema_user TO dbmi_user;
This statement grants trigger access to all tables in the schema.
- or -
GRANT TRIGGER ON schema_user.table_name TO dbmi_user;
This statement grants trigger access to a specific source table. Use this statement when you want to capture data from just a few selected tables. Repeat the grant for each source table of CDC interest.
•For initial load jobs, execute one of the following grant statements to read data from source tables:
GRANT SELECT ON SCHEMA schema_user TO dbmi_user;
This statement grants SELECT access to all tables in the schema.
- or -
GRANT SELECT ON schema_user.table_name TO dbmi_user;
This statement grants SELECT access on a specific source table. Repeat this grant for each source table from which you want to read data.
•For SAP HANA Cloud sources, the connection requires an SAP HANA JDBC custom connection property setting for encryption. Enter the following properties in the Advanced Connection Properties field in the SAP HANA Database Ingestion connection properties:
encrypt=true&validateCertificate=false
Usage considerations
•Database Ingestion and Replication supports SAP HANA and SAP HANA Cloud sources on Red Hat Linux or SUSE Linux for initial load and incremental load jobs but not for combined initial and incremental load jobs.
•Database ingestion and replication incremental load jobs support table names up to 120 characters in length.
•Schema drift options are not supported for incremental load jobs with SAP HANA or SAP HANA Cloud sources.
•Database Ingestion and Replication does not require primary keys on the SAP HANA source tables for initial load or incremental load jobs.
•Database Ingestion and Replication does not support the following source data types, even though they're mapped to default column data types on the target:
- ARRAY
- BINTEXT
- BLOB
- CLOB
- NCLOB
- ST_GEOMETRY
- ST_POINT
- TEXT
Database Ingestion and Replication jobs propagate nulls for columns that have these data types.
Note: The ALPHANUM, BINTEXT, CHAR and CLOB data types are not available in SAP HANA Cloud.
For information about the default mappings of SAP HANA data types to target data types, see Default Data Type Mappings.
•For incremental load jobs, Database Ingestion and Replication requires the following tables in the source database:
- PKLOG log table. Contains metadata about captured DML changes, such as the change type and timestamp, transaction ID, schema name, and table name.
- PROCESSED log table. Contains the maximum sequence number (SCN) for the most recent change data capture cycle.
- Shadow <schema>.<tablename>_CDC tables. Contains before images of updates and after images of inserts, updates, and deletes captured from the source tables, with metadata such as the transaction ID and timestamp. A shadow table must exist for each source table from which changes are captured.
Also, Database Ingestion and Replication uses AFTER DELETE, AFTER INSERT, and AFTER UPDATE triggers to get before images and after images of DML changes for each source table and to write entries for the changes to the PKLOG and shadow _CDC tables. Database Ingestion and Replication also writes SAP HANA sequence values to the PKLOG and shadow _CDC tables for each insert, update, and delete row processed. The sequence values link the rows of the shadow _CDC table to the rows of the PKLOG table during CDC processing.
When you deploy a task, Database Ingestion and Replication validates the existence of the PKLOG, PROCESSED, and shadow _CDC tables, the triggers, and the sequences. The deploy operation fails if these items do not exist.
•From the Source page in the task wizard, you can download or execute a CDC script that creates the PKLOG, PROCESSED, and shadow _CDC tables, the triggers, and the sequences. If you specified a Trigger Prefix value in the SAP HANA Database Ingestion connection properties, the names of the generated triggers begin with prefix_.
By default, the triggers capture the application's system user. If you want to capture the transaction user instead, download the CDC script and replace all occurrences of 'APPLICATIONUSER' with 'XS_APPLICATIONUSER' in the script. For example, you could make this substitution in the AFTER DELETE triggers so that you can identify and filter out deletes associated with an archiving process.
•In incremental load jobs, only source table columns that are included in the shadow _CDC tables are part of the source schema definition. When a new column is added to the source table and is not present in the shadow _CDC tables, the newly added column is ignored.
•If you remove or rename an existing source column, modify the triggers and the corresponding shadow _CDC table accordingly. Otherwise, the job will fail.
•During the execution of an incremental load job, housekeeping deletes outdated records from the PKLOG and shadow _CDC tables to limit the size of these tables. To enable automatic housekeeping of the PKLOG table and shadow _CDC tables, specify a value greater than 0 in the Log Clear field in the SAP HANA Database Ingestion connection properties. The default value is 14 days, and the maximum value is 366 days. A value of 0 deactivates housekeeping.
Housekeeping takes place while an incremental load job is running. If multiple jobs are running against different tables, each job performs housekeeping against the PKLOG table and against the shadow _CDC tables that are defined for that job only. If you remove a source table from the job, no purging for the corresponding shadow _CDC table occurs.
•Deployment of a database ingestion and replication task that has an SAP HANA source and Microsoft Azure Synapse Analytics target might fail if a source table contains a multiple-column primary key of a long length. In this case, reduce the length of the primary key and then deploy the task again.
Teradata sources
To use Teradata sources in database ingestion and replication tasks, first prepare the source database and review the usage considerations.
Source preparation
•To deploy and run a database ingestion and replication task that includes a Teradata source, the source connection must specify a database user who has the privileges that are required to perform an initial load operation. Use the following SQL statement to grant these privileges to the user:
GRANT SELECT ON database_name TO user_name/user_role;
Usage considerations
•Database ingestion and replication jobs that have a Teradata source can have any target type except for PostgreSQL and SQL Server.
•Database Ingestion and Replication does not support the following Teradata data types:
- ARRAY
- BLOB
- CLOB
- JSON
- ST_GEOMETRY
- XML
For information about the default mappings of supported source data types to target data types, see Default Data Type Mappings.