Data Integration allocates cache memory for Aggregator, Joiner, Lookup, Rank, and Sorter transformations in a mapping.
You can configure the cache sizes for these transformations. The cache size determines how much memory Data Integration allocates for each transformation cache at the start of a mapping run.
If the cache size is larger than the available memory on the machine, Data Integration cannot allocate enough memory and the task fails.
If the cache size is smaller than the amount of memory required to run the transformation, Data Integration processes part of the transformation in memory and stores the overflow data in cache files. Paging cache files to disk increases processing time. For optimal performance, configure the cache size so that Data Integration can process all of the transformation data in cache memory.
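The three outcomes described above can be summarized in a short sketch. This is an illustration of the rules as stated in this topic, not Data Integration code: the function name `cache_outcome` and its parameters are hypothetical, and the memory figures are values you would supply yourself.

```python
def cache_outcome(configured_bytes: int, available_bytes: int, required_bytes: int) -> str:
    """Classify what happens for a configured cache size (illustrative only).

    configured_bytes: cache size set on the transformation
    available_bytes:  memory available on the machine
    required_bytes:   memory the transformation needs to run fully in memory
    """
    if configured_bytes > available_bytes:
        # Memory cannot be allocated, so the task fails.
        return "fail"
    if configured_bytes < required_bytes:
        # Part of the data is processed in memory; the rest goes to cache files.
        return "overflow"
    # The whole transformation fits in cache memory.
    return "in_memory"


print(cache_outcome(200, 100, 50))  # fail
print(cache_outcome(50, 100, 80))   # overflow
print(cache_outcome(80, 100, 80))   # in_memory
```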
By default, Data Integration automatically calculates the memory requirements at run time based on the maximum amount of memory that it can allocate. After you run a mapping in auto cache mode, you can tune the cache sizes for each transformation.
Cache types
Aggregator, Joiner, Lookup, and Rank transformations require an index cache and a data cache. Sorter transformations require one cache.
The following table describes the type of information that Data Integration stores in each cache:
Transformation  Cache   Description
Aggregator      Index   Stores group values as configured in the group by fields.
Aggregator      Data    Stores calculations based on the group by fields.
Joiner          Index   Stores all master rows in the join condition that have unique keys.
Joiner          Data    Stores master source rows.
Lookup          Index   Stores lookup condition information.
Lookup          Data    Stores lookup data that is not stored in the index cache.
Rank            Index   Stores group values as configured in the group by fields.
Rank            Data    Stores row data based on the group by fields.
Sorter          Sorter  Stores sort keys and data.
Cache files
When you run a mapping, Data Integration creates at least one cache file for each Aggregator, Joiner, Lookup, Rank, and Sorter transformation. If Data Integration cannot run a transformation in memory, it writes the overflow data to the cache files.
For Aggregator, Joiner, Lookup, and Rank transformations, Data Integration creates index and data cache files to run the transformation. For Sorter transformations, Data Integration creates one sorter cache file. By default, Data Integration stores cache files in the directory entered in the Secure Agent $PMCacheDir property for the Data Integration Server. You can change the cache directory on the Advanced tab of the transformation properties. If you change the cache directory, verify that the directory exists and contains enough disk space for the cache files.
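If you change the cache directory, a quick check like the following can confirm that the directory exists and has enough free space. This is a minimal sketch that uses only the Python standard library and runs on the machine that hosts the directory. The function name `cache_dir_ready`, the example path, and the space estimate are hypothetical and are not part of Data Integration.

```python
import os
import shutil


def cache_dir_ready(cache_dir: str, required_bytes: int) -> bool:
    """Return True if cache_dir exists and has at least required_bytes free."""
    if not os.path.isdir(cache_dir):
        return False
    # disk_usage reports total, used, and free bytes for the file system
    # that contains cache_dir.
    free_bytes = shutil.disk_usage(cache_dir).free
    return free_bytes >= required_bytes


if __name__ == "__main__":
    # Hypothetical path and size estimate; replace with your own values.
    print(cache_dir_ready("/data/infa_cache", 2 * 1024**3))
```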
Cache size
Cache size determines how much memory Data Integration allocates for each transformation cache at the start of a mapping run. You can configure a transformation to use auto cache mode or use a specific value.
Auto cache
By default, a transformation cache size is set to Auto. Data Integration automatically calculates the cache memory requirements at run time. When you configure the task, you can also use the advanced session properties to define the maximum amount of memory that Data Integration can allocate.
Data Integration allocates more memory to transformations with higher processing times. For example, Data Integration allocates more memory to the Sorter transformation because the Sorter transformation typically takes longer to run.
In transformations that use a data and an index cache, Data Integration also allocates more memory to the data cache than to the index cache. It allocates all of the memory for the Sorter transformation to the sorter cache.
Specific cache size
You can configure a specific cache size for a transformation. Data Integration allocates the specified amount of memory to the transformation cache at the start of the mapping run. Configure a specific value in bytes when you tune the cache size.
You can use session logs to determine the optimal cache size. When you set the cache size to the value reported in the session log, no allocated memory is wasted. However, the optimal cache size varies with the size of the source data. Review the logs after subsequent mapping runs to monitor changes to the required cache size.
To define specific cache sizes, enter the cache size values on the Advanced tab in the transformation properties.
Optimizing the cache size
For optimal mapping performance, configure the cache sizes so that Data Integration can run the complete transformation in the cache memory.
1. On the Advanced tab of the transformation properties, set the tracing level to verbose initialization.
2. Run the task in auto cache mode.
3. Analyze the transformation statistics in the session log to determine the cache sizes required for optimal performance.
   For example, you have a Joiner transformation called "Joiner." The session log contains the following text:
   CMN_1795 [2023-01-06 16:16:59.026] The index cache size that would hold [10005] input rows from the master for [Joiner], in memory, is [8437760] bytes
   CMN_1794 [2023-01-06 16:16:59.026] The data cache size that would hold [10005] input rows from the master for [Joiner], in memory, is [103891920] bytes
   The log shows that the index cache requires 8,437,760 bytes and the data cache requires 103,891,920 bytes.
4. On the Advanced tab of the transformation properties, enter the value in bytes that the session log recommends for the cache sizes.
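If you review many session logs, reading the recommended sizes out of the log text can be scripted. The following is a minimal sketch, assuming the messages have the same wording as the CMN_1795 and CMN_1794 lines in the Joiner example above. Messages for other transformations may be worded differently and would need their own patterns. The function name `recommended_cache_sizes` is hypothetical.

```python
import re

# Pattern modeled on the CMN_1795 / CMN_1794 messages in the Joiner example.
# Other transformations may log differently worded messages (assumption).
CACHE_LINE = re.compile(
    r"The (index|data) cache size that would hold \[(\d+)\] input rows "
    r"from the master for \[([^\]]+)\], in memory, is \[(\d+)\] bytes"
)


def recommended_cache_sizes(log_text: str) -> dict:
    """Map each transformation name to its recommended cache sizes in bytes."""
    sizes = {}
    for kind, _rows, name, size in CACHE_LINE.findall(log_text):
        sizes.setdefault(name, {})[kind] = int(size)
    return sizes


sample_log = (
    "CMN_1795 [2023-01-06 16:16:59.026] The index cache size that would hold "
    "[10005] input rows from the master for [Joiner], in memory, is [8437760] bytes\n"
    "CMN_1794 [2023-01-06 16:16:59.026] The data cache size that would hold "
    "[10005] input rows from the master for [Joiner], in memory, is [103891920] bytes\n"
)

print(recommended_cache_sizes(sample_log))
# {'Joiner': {'index': 8437760, 'data': 103891920}}
```

The values returned for `index` and `data` are the byte counts to enter on the Advanced tab in step 4.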