Eliminating mapping task bottlenecks

To eliminate mapping task bottlenecks, tune the buffer memory, caches, and logging level that the mapping task uses.

Buffer memory

When Data Integration initializes a task run, it allocates blocks of memory to hold source and target data.

Data Integration allocates at least two blocks for each source and target partition. Mapping tasks that use a large number of sources and targets might require additional memory blocks. If Data Integration can't allocate enough memory blocks to hold the data, the task fails.
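As a rough illustration of the two-blocks-per-partition rule, the following Python sketch computes a lower bound on the number of buffer blocks that a task needs. The function name and parameters are hypothetical, and the sketch assumes that every source and target uses the same number of partitions. The actual requirement can be higher.

```python
def min_buffer_blocks(sources: int, targets: int, partitions: int = 1) -> int:
    """Return a lower bound on buffer blocks for a task.

    Assumption: each source and each target has the same number of
    partitions, and each source or target partition needs two blocks.
    """
    if sources < 0 or targets < 0 or partitions < 1:
        raise ValueError("counts must be non-negative and partitions >= 1")
    return 2 * (sources + targets) * partitions


# Hypothetical task: 3 sources, 2 targets, 4 partitions each.
print(min_buffer_blocks(sources=3, targets=2, partitions=4))  # 40
```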

You can configure the amount of buffer memory, or you can configure Data Integration to calculate buffer settings at run time.

To increase the number of available memory blocks, adjust the DTM Buffer Size and Default Buffer Block Size mapping task properties, as described in the following sections.

Note: If data partitioning is enabled, the DTM buffer size is the total size of all memory buffer pools allocated to all partitions. For a task that contains n partitions, set the DTM Buffer Size to at least n times the value for the task with one partition.
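The partition rule in the note is simple arithmetic, sketched below in Python. The function name and the 12,000,000-byte starting value are illustrative assumptions, not product defaults.

```python
def min_dtm_buffer_size(single_partition_bytes: int, partitions: int) -> int:
    """Return the smallest DTM Buffer Size suggested by the n-times rule.

    single_partition_bytes: the value that works for the task with one
    partition (an assumed, illustrative input).
    """
    if single_partition_bytes <= 0 or partitions < 1:
        raise ValueError("size must be positive and partitions >= 1")
    return single_partition_bytes * partitions


# Hypothetical example: 12,000,000 bytes works for one partition,
# and the task now runs with 4 partitions.
print(min_dtm_buffer_size(12_000_000, 4))  # 48000000
```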

Increasing DTM buffer size

The DTM buffer size setting specifies the amount of memory that Data Integration uses as DTM buffer memory. When you increase the DTM buffer memory, Data Integration creates more buffer blocks, which improves performance during momentary slowdowns.

Increasing DTM buffer memory allocation generally causes performance to improve initially and then level off. If you don't see a significant increase in performance, DTM buffer memory allocation isn't a factor in mapping performance.

To increase the DTM buffer size, open the task and edit the DTM Buffer Size advanced session property. Increase the DTM buffer size by multiples of the buffer block size.
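One way to keep the DTM buffer size a multiple of the buffer block size is to round a desired value up to the next whole block. The Python sketch below shows the arithmetic only. The function name and the sample values are hypothetical.

```python
def round_up_to_block_multiple(desired_bytes: int, block_size_bytes: int) -> int:
    """Round desired_bytes up to the nearest multiple of block_size_bytes."""
    if desired_bytes <= 0 or block_size_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Ceiling division without floating-point arithmetic.
    blocks = -(-desired_bytes // block_size_bytes)
    return blocks * block_size_bytes


# Hypothetical example: about 50,000,000 bytes with 64,000-byte blocks.
print(round_up_to_block_multiple(50_000_000, 64_000))  # 50048000
```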

Optimizing the buffer block size

If the Secure Agent machine has limited physical memory and the mapping contains a large number of sources, targets, or partitions, you might need to decrease the buffer block size.

If you're manipulating unusually large rows of data, increase the buffer block size to improve performance. If you don't know the approximate size of the rows, estimate it by adding up the precision of all fields in each source and target, and then take the largest total.

The total precision represents the total bytes needed to move the largest row of data. For example, if the total precision equals 33,000, then Data Integration requires 33,000 bytes in the buffer block to move that row. If the buffer block size is only 64,000 bytes, then Data Integration can't move more than one row at a time.
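The example above can be checked with integer division. The following Python sketch computes how many rows fit in one buffer block and what block size a given row count would need. The function names are hypothetical, and the 100-row target is only an illustrative value.

```python
def rows_per_block(block_size_bytes: int, row_bytes: int) -> int:
    """Return how many whole rows of row_bytes fit in one buffer block."""
    if block_size_bytes <= 0 or row_bytes <= 0:
        raise ValueError("sizes must be positive")
    return block_size_bytes // row_bytes


def block_size_for_rows(row_bytes: int, rows: int) -> int:
    """Return the block size needed to hold the given number of rows."""
    if row_bytes <= 0 or rows < 1:
        raise ValueError("row_bytes must be positive and rows >= 1")
    return row_bytes * rows


# Values from the text: total precision 33,000 and a 64,000-byte block.
print(rows_per_block(64_000, 33_000))    # 1
# Illustrative target of 100 rows per block.
print(block_size_for_rows(33_000, 100))  # 3300000
```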

To set the buffer block size, open the task and edit the Default Buffer Block Size advanced session property.

As with DTM buffer memory allocation, increasing buffer block size should improve performance. If you don't see an increase, then buffer block size isn't a factor in task performance.

Caches

Data Integration uses the index and data caches for XML targets and Aggregator, Rank, Lookup, and Joiner transformations.

Data Integration stores transformed data in the data cache before returning it to the pipeline. Data Integration stores group information in the index cache. Also, Data Integration uses a cache to store data for Sorter transformations.

To configure the amount of cache memory, specify the cache size. If the allocated cache isn't large enough to store the data, Data Integration writes the overflow to a temporary file on disk, called a cache file, as it processes the task data. Performance slows each time Data Integration pages to a cache file.

To optimize caches, limit the number of connected fields and increase the cache sizes, as described in the following sections.

Limiting connected fields

For transformations that use a data cache, limit the number of connected input/output and output-only fields. Fewer connected fields means less data for the transformation to store in the data cache.

Increasing cache sizes

Configure the cache size to specify the amount of memory allocated to process a transformation. The amount of memory you configure depends on how much memory cache and disk cache you want to use.

If the cache size isn't big enough, Data Integration processes some of the transformation in memory and pages information to cache files to process the rest of the transformation. Each time Data Integration pages to a cache file, performance slows.

If the mapping contains a transformation that uses a cache, and you run the task on a machine with sufficient memory, increase the cache sizes to process the transformation in memory.
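To reason about whether a transformation can run entirely in memory, compare an estimate of the data it must cache with the configured cache size. The Python sketch below is a deliberately simplified model: the function name is hypothetical, the byte figures are made up, and real cache usage depends on the transformation type and data.

```python
def estimated_spill_bytes(data_bytes: int, cache_bytes: int) -> int:
    """Estimate how many bytes would be paged to cache files on disk.

    Simplified model: anything that doesn't fit in the configured cache
    spills to disk. A result of 0 means the data fits in memory.
    """
    if data_bytes < 0 or cache_bytes < 0:
        raise ValueError("sizes must be non-negative")
    return max(0, data_bytes - cache_bytes)


# Hypothetical example: 3 GB of cached data and a 2 GB cache.
GB = 1024 ** 3
print(estimated_spill_bytes(3 * GB, 2 * GB) // GB)  # 1
print(estimated_spill_bytes(1 * GB, 2 * GB))        # 0
```

A nonzero result suggests that increasing the cache size, if the machine has enough memory, could avoid paging.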

Verbose logs

You can run a mapping task in standard or verbose execution mode. When you run the task in verbose execution mode, the mapping generates additional data in the logs that you can use for troubleshooting.

Use verbose execution mode only for troubleshooting. Verbose execution mode slows performance because of the amount of log data that it generates.