Partitioned Relational Sources
When a mapping that is enabled for partitioning reads from an IBM DB2 for LUW or Oracle source, the Data Integration Service can use multiple threads to read the relational source. The Data Integration Service creates a separate connection to the database for each thread.
Note: If a mapping reads from a relational source other than DB2 for LUW or Oracle, the Data Integration Service uses one thread to read from the source. The Data Integration Service can use multiple threads for the remaining mapping pipeline stages.
The Data Integration Service queries the DB2 for LUW or Oracle database system for partition information. If the source tables support database partitioning, the Data Integration Service can use multiple threads to read the partitioned data from the corresponding nodes in the database. The Data Integration Service generates an SQL query for each reader thread.
The number of reader threads that the Data Integration Service uses depends on the following situations:
- Number of database partitions is less than or equal to the maximum parallelism value.
The Data Integration Service uses one reader thread for each database partition. The Data Integration Service distributes one database partition to each reader thread.
For Oracle sources that use composite partitioning, the Data Integration Service uses one reader thread for each database subpartition. For example, if an Oracle source contains three partitions and two subpartitions for each partition, then the Data Integration Service uses six reader threads.
- Number of database partitions is more than the maximum parallelism value.
The Data Integration Service uses the number of reader threads defined by the maximum parallelism value. The Data Integration Service distributes multiple database partitions to some of the reader threads. For example, a DB2 for LUW source has five database partitions, and the maximum parallelism value is three. The Data Integration Service uses three reader threads. The Data Integration Service distributes two database partitions to the first reader thread and the second reader thread. The service distributes one database partition to the third reader thread.
- No database partitions.
- The Data Integration Service uses one thread to read from the source. The Data Integration Service can use multiple threads for the remaining mapping pipeline stages.
Relational Connection Types for Partitioning
The Data Integration Service can use multiple threads to read a DB2 for LUW or Oracle relational source based on the connection type used to connect to the database.
You can use any of the following connection types to connect to a DB2 for LUW or Oracle database:
- •DB2 for LUW connection or Oracle connection
- •JDBC connection
- •ODBC connection
To use multiple threads to read a DB2 for LUW or Oracle relational source, the relational data object must use a DB2 for LUW or Oracle connection.
If the DB2 for LUW or Oracle relational data object uses a JDBC or ODBC connection, the Data Integration Service uses one thread to read the source. The Data Integration Service can use multiple threads for the remaining mapping pipeline stages.
SQL Queries for Partitioned Relational Sources
When the Data Integration Service uses multiple threads to read a relational source, it generates an SQL query for each reader thread.
If the database source has more database partitions than the maximum parallelism value, the Data Integration Service distributes the data across the reader threads. The Data Integration Service can generate SQL queries that read from multiple database partitions. When an Oracle source contains subpartitions, the Data Integration Service can generate SQL queries that read from multiple database subpartitions.
DB2 for LUW or Oracle Source Example
The maximum parallelism value is three, and the relational source has five database partitions. When the Data Integration Service runs SQL queries against the database partitions, the first and second reader threads receive data from two database partitions. The third reader thread receives data from one database partition. In this example, the simply query in the Read transformation does not have the select distinct option enabled.
When you use a DB2 for LUW source, the Data Integration Service generates SQL statements similar to the following statements for the first reader thread:
SELECT <column list> FROM <table name>
WHERE (nodenumber(<column 1>)=0 OR nodenumber(<column 1>) = 3)
When you use an Oracle source, the Data Integration Service generates SQL statements similar to the following statements for the first reader thread:
SELECT <column list> FROM <table name> PARTITION <database_partition1 name> UNION ALL
SELECT <column list> FROM <table name> PARTITION <database_partition4 name> UNION ALL
Oracle Source with Subpartitions Example
An Oracle source has five partitions, 1–5, and two subpartitions, a and b, in each partition. The maximum parallelism value is three. The first reader thread receives data from four database subpartitions. The second and third reader threads receive data from three database subpartitions. In this example, the simple query in the Read transformation does not have the select distinct option enabled.
The Data Integration Service generates SQL statements similar to the following statements for the first reader thread:
SELECT <column list> FROM <table name> SUBPARTITION <database_subpartition1_a name> UNION ALL
SELECT <column list> FROM <table name> SUBPARTITION <database_subpartition1_b name> UNION ALL
SELECT <column list> FROM <table name> SUBPARTITION <database_subpartition4_a name> UNION ALL
SELECT <column list> FROM <table name> SUBPARTITION <database_subpartition4_b name> UNION ALL
Rules and Guidelines for Relational Source Partitions
Consider the following rules and guidelines when you enable partitioning for a mapping that reads from a relational source:
- •The Data Integration Service uses one thread to read the source, but can use multiple threads for the remaining mapping pipeline stages in the following situations:
- - The mapping reads from a relational source other than DB2 for LUW or Oracle.
- - The mapping uses a JDBC or ODBC connection to read from a DB2 for LUW or Oracle source.
- - The mapping pushes transformation logic to the source database.
- - You use the simple query in the Read transformation to select the ports to sort by or to configure a user-defined join.
- - You use the advanced query in the Read transformation to create a custom SQL query.
- •If you use the simple query in the Read transformation to create hints, select distinct values, or enter a source filter, the Data Integration Service can use multiple threads to read the source. The Data Integration Service adds the hints, distinct values, or source filter to the SQL query generated for each partition.