Data Integration Performance Tuning > Optimizing mappings > Optimizing Lookup transformations
  

Optimizing Lookup transformations

If the lookup table is on the same database as the source table and caching isn't feasible, join the tables in the source database rather than using a Lookup transformation.
When you use a Lookup transformation, perform the following tasks to increase performance:

Caching lookup tables

If a mapping contains Lookup transformations, you might want to enable lookup caching.
When you enable caching, Data Integration caches the lookup table and queries the lookup cache during the task run. When this option isn't enabled, Data Integration queries the lookup table on a row-by-row basis.
The result of the lookup query and processing is the same, whether or not you cache the lookup table. However, using a lookup cache can increase mapping performance for smaller lookup tables. In general, you want to cache lookup tables that need less than 300 MB.
If you enable lookup caching, perform the following tasks to increase performance:

Cache types

Consider the type of cache you use to improve performance.
You can use the following cache types:
Persistent cache
To save and reuse the cache files, you can configure the transformation to use a persistent cache. Use a persistent cache when you know the lookup table doesn't change between task runs. Using a persistent cache can improve performance because Data Integration builds the memory cache from the cache files instead of from the database.
Dynamic cache
Use a dynamic lookup cache to keep the lookup cache synchronized with the target. If the cache is static, the data in the lookup cache doesn't change as the mapping task runs. If the task uses the cache multiple times, the task uses the same data. If the cache is dynamic, the task updates the cache based on the actions in the task, so if the task uses the lookup multiple times, downstream transformations can use updated data.

Enabling concurrent caches

When Data Integration runs a mapping that contains Lookup transformations, it builds a cache in memory when it processes the first row of data in a cached Lookup transformation. If there are multiple Lookup transformations in a mapping, Data Integration creates the caches sequentially when the first row of data is processed by the Lookup transformation. This slows Lookup transformation processing.
You can enable concurrent caches to improve performance. When the number of additional concurrent pipelines is set to one or more, Data Integration builds caches concurrently rather than sequentially. Performance improves greatly when the tasks contain a number of active transformations that might take time to complete, such as Aggregator, Joiner, or Sorter transformations. When you enable multiple concurrent pipelines, Data Integration doesn't wait for active task runs to complete before it builds the cache. Other Lookup transformations in the pipeline also build caches concurrently.

Optimizing lookup condition matching

When the Lookup transformation matches lookup cache data with the lookup condition, it sorts and orders the data to determine the first matching value and the last matching value.
You can configure the transformation to return any value that matches the lookup condition. When you configure the Lookup transformation to return any matching value, the transformation returns the first value that matches the lookup condition. It doesn't index all fields as it does when you configure the transformation to return the first matching value or the last matching value. When you use any matching value, performance can improve because the transformation doesn't index on all fields.

Reducing cached rows

You can reduce the number of rows included in the cache to increase performance. Use the Lookup SQL Override option to add a WHERE clause to the default SQL statement.

Overriding the ORDER BY statement

By default, Data Integration generates an ORDER BY statement for a cached lookup. The ORDER BY statement contains all lookup fields.
To increase performance, suppress the default ORDER BY statement and enter an override ORDER BY with fewer columns.
Data Integration always generates an ORDER BY statement, even if you enter one in the override. Place two dashes (--) after the ORDER BY override to suppress the generated ORDER BY statement.
For example, a Lookup transformation uses the following lookup condition:
ITEM_ID = IN_ITEM_ID
PRICE <= IN_PRICE
The Lookup transformation includes three lookup fields used in the mapping, ITEM_ID, ITEM_NAME, and PRICE. When you enter the ORDER BY statement, enter the columns in the same order as the fields in the lookup condition. Enclose all database reserved words in quotes.
Enter the following lookup query in the lookup SQL override:
SELECT ITEMS_DIM.ITEM_NAME, ITEMS_DIM.PRICE, ITEMS_DIM.ITEM_ID FROM ITEMS_DIM ORDER BY ITEMS_DIM.ITEM_ID, ITEMS_DIM.PRICE --

Using a machine with more memory

To increase performance, run the mapping on a Secure Agent machine with a large amount of memory. Increase the index and data cache sizes as high as you can without straining the machine.
If the Secure Agent machine has enough memory, increase the cache so it can hold all data in memory without paging to disk.

Optimizing the lookup condition

If you include more than one lookup condition, place the conditions in an optimal order to increase lookup performance.
Use the following order:
  1. 1Equal to (=)
  2. 2Less than (<), greater than (>), less than or equal to (<=), greater than or equal to (>=)
  3. 3Not equal to (!=)

Filtering lookup rows

Create a filter condition to reduce the number of lookup rows retrieved from the source when the lookup cache is built.

Indexing the lookup table

Data Integration needs to query, sort, and compare values in the lookup condition columns. The index needs to include every column used in a lookup condition.
You can improve performance for the following types of lookups:

Optimizing multiple lookups

If a mapping contains multiple lookups, even with caching enabled and enough heap memory, the lookups can slow performance. Tune the Lookup transformations that query the largest amounts of data to improve overall performance.