You can read the thread statistics in the session log to determine if the source is the bottleneck. When a mapping spends more time on the reader thread than the transformation or writer threads, you have a source bottleneck.
If the mapping reads from a cloud database or data warehouse source, use the following methods to identify source bottlenecks:
•Use a Filter transformation.
•Use a read test mapping.
•Use a database query.
If the mapping reads from a flat file source, you probably don't have a source bottleneck.
Using a Filter transformation
You can use a Filter transformation in the mapping to measure the time it takes to read source data.
Add a Filter transformation after each source. Set the filter condition to false so that no data is processed by the Filter transformation. If the time it takes to run the new mapping remains about the same, you have a source bottleneck.
Using a read test mapping
You can create a read test mapping to identify source bottlenecks. A read test mapping isolates the read query by removing the transformation in the mapping.
To create a read test mapping, complete the following steps:
1Make a copy of the original mapping.
2In the copy of the mapping, keep only the source transformations and any custom joins or queries.
3Remove all other transformations.
4Connect the sources to a flat file target.
Run the read test mapping. If the mapping performance is similar to the original run, you have a source bottleneck.
Using a query
To identify bottlenecks, run the read query directly against the source database or data warehouse.
Copy the read query directly from the session log. Run the query against the source with a query tool. On Windows, you can load the result of the query in a file. On Linux, you can load the result of the query in /dev/null.
Measure the query run time and the time it takes for the query to return the first row.