Optimizing Aggregator transformations

Aggregator transformations often slow performance because they must group data before processing it. They also need additional memory to hold intermediate group results.

For optimal performance, place Aggregator transformations as close to the Source transformation as possible.

Use the following guidelines to optimize the performance of an Aggregator transformation:

• Group by simple columns.
• Use sorted input.
• Filter data before you aggregate it.
• Limit field connections.

Grouping by simple columns

You can optimize Aggregator transformations when you group by simple columns. When possible, use numbers instead of strings and dates in the columns used for the GROUP BY. Avoid complex expressions in the aggregate expressions.
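The following Python sketch illustrates the guideline, assuming a hypothetical data set that carries both a free-text region name and an integer region ID. It only shows the shape of the two approaches and is not how Data Integration evaluates groups internally. All names and sample values are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical rows: (region name, region id, amount).
rows = [
    (" East ", 1, 10),
    ("east", 1, 5),
    ("WEST", 2, 7),
]

def total_by_expression(rows):
    """Group by a string expression that must be evaluated for every row."""
    totals = defaultdict(int)
    for name, _region_id, amount in rows:
        totals[name.strip().lower()] += amount
    return dict(totals)

def total_by_id(rows):
    """Group by an existing integer column; no per-row expression is needed."""
    totals = defaultdict(int)
    for _name, region_id, amount in rows:
        totals[region_id] += amount
    return dict(totals)

by_expression = total_by_expression(rows)
by_id = total_by_id(rows)
```

Both functions produce the same totals per region, but the second one compares small numeric keys and does no string cleanup while grouping.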

Using sorted input

To increase mapping performance, sort data before it reaches the Aggregator transformation. Enable the Sorted Input option to indicate that the incoming data is already sorted.

The Sorted Input option decreases the use of aggregate caches. When you use the Sorted Input option, Data Integration assumes all data is sorted by group. As Data Integration reads rows for a group, it performs aggregate calculations. When necessary, it stores group information in memory.
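The difference between cached aggregation and sorted-input aggregation can be sketched in plain Python. This is an illustration of the general technique, not Data Integration's internal implementation. The function names and sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows: (group key, amount), already sorted by group key.
rows = [("A", 10), ("A", 5), ("B", 7), ("C", 1), ("C", 2)]

def aggregate_unsorted(rows):
    """Sum amounts per key without assuming order.

    Every group stays in the cache until the last row has been read.
    """
    cache = {}
    for key, amount in rows:
        cache[key] = cache.get(key, 0) + amount
    return list(cache.items())

def aggregate_sorted(rows):
    """Sum amounts per key, assuming rows are sorted by key.

    Only the current group is held; its result is emitted as soon as
    the key changes.
    """
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, sum(amount for _, amount in group)

unsorted_result = aggregate_unsorted(rows)
sorted_result = list(aggregate_sorted(rows))
```

If the rows are not actually sorted by the group key, the streaming version emits the same key more than once, which is why the input must be sorted on the group-by fields before this approach is used.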

The Sorted Input option reduces the amount of data cached during the task run and improves performance. To pass sorted data to the Aggregator transformation, use sorted input with either the Source transformation source filter or a Sorter transformation.

Sorting input can also increase performance in mappings with multiple partitions.

Filtering data before you aggregate

Filter the data before you aggregate it. If you use a Filter transformation in the mapping, place the transformation before the Aggregator transformation to reduce unnecessary aggregation.
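As a rough illustration of why filter placement matters, the Python sketch below counts how many rows reach the aggregation step when the filter runs first. The row layout, the "open" status value, and the function name are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical rows: (region, status, amount).
rows = [
    ("east", "open", 10),
    ("east", "closed", 4),
    ("west", "open", 3),
    ("west", "closed", 8),
    ("north", "closed", 6),
]

def aggregate(rows):
    """Sum amount per region and report how many rows were aggregated."""
    totals = defaultdict(int)
    processed = 0
    for region, _status, amount in rows:
        totals[region] += amount
        processed += 1
    return dict(totals), processed

# Filter first: only "open" rows reach the aggregation step.
open_rows = [row for row in rows if row[1] == "open"]
totals, processed = aggregate(open_rows)
```

With the filter applied first, the aggregation step handles two rows instead of five, and no group is ever created for the "north" region.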

Limiting connected fields

Limit the number of connected input/output or output fields to reduce the amount of data the Aggregator transformation stores in the data cache.
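The effect of connected fields on cache size can be modeled conceptually in Python. The sketch assumes a simplified cache that stores one entry per group containing every connected field plus the running total. The actual cache layout in Data Integration is not described here, and all field names are hypothetical.

```python
# Hypothetical wide rows with fields that the aggregation never uses.
rows = [
    {"customer_id": 1, "amount": 20, "comment": "long free text", "address": "12 Main St"},
    {"customer_id": 1, "amount": 5, "comment": "more free text", "address": "12 Main St"},
    {"customer_id": 2, "amount": 7, "comment": "other text", "address": "9 Oak Ave"},
]

NEEDED_FIELDS = ("customer_id", "amount")

def project(row, fields=NEEDED_FIELDS):
    """Keep only the fields that the aggregation needs."""
    return {name: row[name] for name in fields}

def aggregate(rows, group_field, sum_field):
    """Cache one entry per group: all connected fields plus the running total."""
    cache = {}
    for row in rows:
        key = row[group_field]
        total = cache.get(key, {}).get("total", 0) + row[sum_field]
        cache[key] = {**row, "total": total}  # pass-through fields ride along in the cache
    return cache

wide_cache = aggregate(rows, "customer_id", "amount")
narrow_cache = aggregate([project(row) for row in rows], "customer_id", "amount")
```

Both caches hold the same totals, but each entry in `narrow_cache` carries three values instead of five because the unused fields were never connected.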