Compute Component

The compute component of the Data Integration Service is the execution Data Transformation Manager (DTM). The DTM extracts, transforms, and loads data to complete a data transformation job.

The DTM must run on a node with the compute role. A node with the compute role can perform computations requested by application services.

Execution Data Transformation Manager

The execution Data Transformation Manager (DTM) extracts, transforms, and loads data to run a data transformation job such as a preview or mapping.

When a service module in the Data Integration Service receives a request to run a job, the service module sends the request to the logical Data Transformation Manager (LDTM). The LDTM optimizes and compiles the job, and then sends the compiled job to the DTM. The Data Integration Service starts a DTM instance to run the job and complete the request.

A DTM instance is a specific, logical representation of the DTM. The Data Integration Service runs multiple instances of the DTM to complete multiple requests. For example, the Data Integration Service runs a separate instance of the DTM each time it receives a request from the Developer tool to preview a mapping.
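The request flow above can be modeled as a small sketch. Everything in it is hypothetical and only illustrates the documented sequence: a service module hands the request to a stand-in for the LDTM (`ldtm_compile`), which returns a compiled job, and every request gets its own new DTM instance. None of these names are Informatica APIs.

```python
import itertools
from dataclasses import dataclass, field

# Hypothetical model of the request flow; names are illustrative only.
_instance_ids = itertools.count(1)


@dataclass
class CompiledJob:
    job_type: str   # for example "preview" or "mapping"
    plan: tuple     # steps produced by the (stand-in) LDTM


@dataclass
class DTMInstance:
    job: CompiledJob
    # Each instance gets a fresh id, mirroring one DTM instance per request.
    instance_id: int = field(default_factory=lambda: next(_instance_ids))

    def run(self) -> str:
        # A real DTM would extract, transform, and load data here.
        return f"instance {self.instance_id} ran {self.job.job_type}"


def ldtm_compile(job_type: str, steps: list) -> CompiledJob:
    # Stand-in for LDTM optimization and compilation: freeze the step list.
    return CompiledJob(job_type, tuple(steps))


def service_module_handle(job_type: str, steps: list) -> DTMInstance:
    # The service module forwards the request to the LDTM, then a new
    # DTM instance runs the compiled job.
    compiled = ldtm_compile(job_type, steps)
    return DTMInstance(compiled)
```

Two preview requests from the Developer tool would produce two distinct `DTMInstance` objects, each with its own `instance_id`.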

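The DTM completes the following types of jobs:
- Run or preview mappings.
- Run mappings in workflows.
- Run profiles.
- Run SQL queries against an SQL data service.
- Run web service operations.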

DTM Resource Allocation Policy

The Data Transformation Manager resource allocation policy determines how to allocate the CPU resources for tasks. The DTM uses an on-demand resource allocation policy to allocate CPU resources.

When the DTM runs a mapping, it converts the mapping into a set of tasks such as:
- Reading data from the source.
- Transforming data.
- Writing data to the target.

The DTM allocates CPU resources only when a DTM task needs a thread. When a task completes or if a task is idle, the task returns the thread to a thread pool. The DTM reuses the threads in the thread pool for other DTM tasks.
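The on-demand policy resembles a standard thread pool. The following sketch uses Python's `concurrent.futures.ThreadPoolExecutor` as an analogy, not as the DTM's actual implementation; `run_task` and `run_mapping_tasks` are hypothetical names. In CPython 3.8 and later, the executor creates a worker thread only when a submitted task needs one and no idle thread is available, and a thread returns to the pool when its task completes.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_task(name: str) -> tuple[str, str]:
    # Report which pooled thread ran this task. When the function returns,
    # the thread goes back to the pool and can pick up another task.
    return name, threading.current_thread().name


def run_mapping_tasks(task_names: list[str], max_threads: int = 2) -> list[tuple[str, str]]:
    # Threads are started lazily as tasks arrive and are reused for later
    # tasks, so six tasks can run on at most `max_threads` threads.
    with ThreadPoolExecutor(max_workers=max_threads, thread_name_prefix="dtm") as pool:
        return list(pool.map(run_task, task_names))
```

Running six tasks with `max_threads=2` shows the reuse: every task reports one of at most two thread names.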

Processing Threads

When the DTM runs mappings, it uses reader, transformation, and writer pipelines that run in parallel to extract, transform, and load data.

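The DTM separates a mapping into pipeline stages and uses one reader thread, one transformation thread, and one writer thread to process each stage. Each pipeline stage runs in one of the following threads:
- Reader thread, which extracts data from the source.
- Transformation thread, which processes data in the pipeline.
- Writer thread, which loads data to the target.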

Because the pipeline contains three stages, the DTM can process three sets of rows concurrently and optimize mapping performance. For example, while the reader thread processes the third row set, the transformation thread processes the second row set, and the writer thread processes the first row set.
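The overlap between the three stages can be illustrated with a minimal reader/transformer/writer pipeline built on `threading` and bounded `queue.Queue` objects. This is a generic producer-consumer sketch, assuming row sets are simple Python list slices; the function names and the `block_size` parameter are hypothetical. Because each queue holds only a couple of row sets, the reader can extract a later row set while the transformation and writer threads are still busy with earlier ones.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the row stream


def reader(source: list, out_q: queue.Queue, block_size: int) -> None:
    # Reader thread: extract rows in row sets and pass them downstream.
    for i in range(0, len(source), block_size):
        out_q.put(source[i:i + block_size])
    out_q.put(_DONE)


def transformer(in_q: queue.Queue, out_q: queue.Queue, fn) -> None:
    # Transformation thread: apply fn to every row of each row set.
    while (row_set := in_q.get()) is not _DONE:
        out_q.put([fn(row) for row in row_set])
    out_q.put(_DONE)


def writer(in_q: queue.Queue, target: list) -> None:
    # Writer thread: load transformed row sets into the target.
    while (row_set := in_q.get()) is not _DONE:
        target.extend(row_set)


def run_pipeline(source: list, fn, block_size: int = 2) -> list:
    # Bounded queues let the three stages work on different row sets at once.
    q1, q2 = queue.Queue(maxsize=2), queue.Queue(maxsize=2)
    target: list = []
    threads = [
        threading.Thread(target=reader, args=(source, q1, block_size)),
        threading.Thread(target=transformer, args=(q1, q2, fn)),
        threading.Thread(target=writer, args=(q2, target)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return target
```

With a single thread per stage, row order is preserved from source to target.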

If you have the partitioning option, the Data Integration Service can maximize parallelism for mappings and profiles. When you maximize parallelism, the DTM separates a mapping into pipeline stages and uses multiple threads to process each stage.
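To contrast with the single-threaded stage shown earlier, the following sketch splits the rows of one stage into partitions and processes each partition on its own thread. Round-robin partitioning and the helper names are assumptions chosen for illustration; this section does not describe how the DTM actually partitions data.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows: list, n: int) -> list[list]:
    # Hypothetical round-robin partitioning: row i goes to partition i % n.
    return [rows[i::n] for i in range(n)]


def transform_partition(rows: list) -> list:
    # Example transformation applied independently to each partition.
    return [r * 10 for r in rows]


def run_partitioned(rows: list, num_partitions: int = 3) -> list:
    parts = partition(rows, num_partitions)
    # One thread per partition processes the same stage in parallel.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        results = list(pool.map(transform_partition, parts))
    # Concatenating partition outputs does not preserve the source row order.
    return [row for part in results for row in part]
```

The output contains every transformed row, but grouped by partition rather than in source order.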

Output Files

The DTM generates output files when it runs mappings, mappings included in a workflow, profiles, SQL queries to an SQL data service, or web service operation requests. Based on transformation cache settings and target types, the DTM can create cache, reject, target, and temporary files.

By default, the DTM stores output files in the directories defined by execution options for the Data Integration Service.

Data objects and transformations in the Developer tool use system parameters to access the values of these Data Integration Service directories. By default, the system parameters are assigned to the flat file directory, cache file directory, and temporary file directory fields.

For example, when a developer creates an Aggregator transformation in the Developer tool, the CacheDir system parameter is the default value assigned to the cache directory field. The value of the CacheDir system parameter is defined in the Cache Directory property for the Data Integration Service. Developers can remove the default system parameter and enter a different value for the cache directory. However, jobs fail to run if the Data Integration Service cannot access the directory.
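The resolution rule described above can be sketched as follows. `DirectoryField`, `resolve_directory`, and the dictionary of service properties are hypothetical stand-ins, and treating "can access" as "exists and is writable" is an assumption. A field that holds a system parameter such as CacheDir takes its value from the matching service property, a value typed in by the developer is used as-is, and an inaccessible directory makes the job fail.

```python
import os
from dataclasses import dataclass


@dataclass
class DirectoryField:
    value: str                        # parameter name or literal path
    is_system_parameter: bool = False


def resolve_directory(field: DirectoryField, service_properties: dict[str, str]) -> str:
    # A system parameter takes its value from the Data Integration Service
    # property; a literal value entered by the developer is used unchanged.
    path = service_properties[field.value] if field.is_system_parameter else field.value
    # Assumed access check: the directory must exist and be writable.
    if not (os.path.isdir(path) and os.access(path, os.W_OK)):
        raise RuntimeError(f"job fails: cannot access directory {path!r}")
    return path
```

For example, `resolve_directory(DirectoryField("CacheDir", is_system_parameter=True), {"CacheDir": cache_path})` returns `cache_path` when that directory is accessible.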

In the Developer tool, developers can change the default system parameters to define different directories for each transformation or data object.

Cache Files

The DTM creates at least one cache file for each Aggregator, Joiner, Lookup, Rank, and Sorter transformation included in a mapping, profile, SQL data service, or web service operation mapping.

If the DTM cannot process a transformation in memory, it writes the overflow values to cache files. When the job completes, the DTM releases cache memory and usually deletes the cache files.

By default, the DTM stores cache files for Aggregator, Joiner, Lookup, and Rank transformations in the list of directories defined by the Cache Directory property for the Data Integration Service. The DTM creates index and data cache files. It names the index file PM*.idx and the data file PM*.dat.

The DTM stores the cache files for Sorter transformations in the list of directories defined by the Temporary Directories property for the Data Integration Service. The DTM creates one sorter cache file.
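The overflow behavior can be illustrated with a toy cache that keeps a fixed number of entries in memory and spills the rest to disk. The file format here is an assumption: the data file holds pickled values, the index file holds a key-to-offset map, and only the PM*.idx and PM*.dat naming pattern comes from the text above. `SpillingCache` and its methods are hypothetical.

```python
import os
import pickle
import tempfile


class SpillingCache:
    """Toy cache: up to `memory_limit` entries stay in memory; overflow values
    go to a PM*.dat data file, with key offsets recorded in a PM*.idx file."""

    def __init__(self, cache_dir: str, memory_limit: int):
        self.memory: dict = {}
        self.limit = memory_limit
        self.offsets: dict = {}
        fd, self.dat_path = tempfile.mkstemp(prefix="PM", suffix=".dat", dir=cache_dir)
        os.close(fd)
        self.idx_path = self.dat_path[:-4] + ".idx"

    def put(self, key, value) -> None:
        if key in self.memory or len(self.memory) < self.limit:
            self.memory[key] = value
            return
        # Overflow: append the value to the data file and record its offset.
        with open(self.dat_path, "ab") as dat:
            dat.seek(0, os.SEEK_END)
            self.offsets[key] = dat.tell()
            pickle.dump(value, dat)
        # Persist the index (this sketch also keeps it in memory for lookups).
        with open(self.idx_path, "wb") as idx:
            pickle.dump(self.offsets, idx)

    def get(self, key):
        if key in self.memory:
            return self.memory[key]
        with open(self.dat_path, "rb") as dat:
            dat.seek(self.offsets[key])
            return pickle.load(dat)

    def close(self) -> None:
        # When the job completes, release cache memory and delete cache files.
        self.memory.clear()
        for path in (self.dat_path, self.idx_path):
            if os.path.exists(path):
                os.remove(path)
```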

Reject Files

The DTM creates a reject file for each target instance in a mapping or web service operation mapping. If the DTM cannot write a row to the target, the DTM writes the rejected row to the reject file. If the reject file does not contain any rejected rows, the DTM deletes the reject file when the job completes.

By default, the DTM stores reject files in the directory defined by the Rejected Files Directory property for the Data Integration Service. The DTM names reject files based on the name of the target data object. The default name for reject files is <file_name>.bad.
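The reject-file lifecycle can be sketched in a few lines. `write_with_rejects` and its `accept` predicate are hypothetical: `accept` stands in for whatever makes a target write fail, and the one-row-per-line text format is an assumption. The sketch follows the documented rules: rejected rows go to `<file_name>.bad` in the reject directory, and an empty reject file is deleted when the job completes.

```python
import os


def write_with_rejects(rows, target_name: str, reject_dir: str, accept):
    # Hypothetical sketch: rows that fail `accept` go to <target_name>.bad.
    reject_path = os.path.join(reject_dir, f"{target_name}.bad")
    written = []
    rejected = 0
    with open(reject_path, "w", encoding="utf-8") as bad:
        for row in rows:
            if accept(row):
                written.append(row)
            else:
                bad.write(f"{row}\n")
                rejected += 1
    # A reject file without rejected rows is deleted when the job completes.
    if rejected == 0:
        os.remove(reject_path)
    return written, reject_path if rejected else None
```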

Target Files

If a mapping or web service operation mapping writes to a flat file target, the DTM creates the target file based on the configuration of the flat file data object.

By default, the DTM stores target files in the list of directories defined by the Target Directory property for the Data Integration Service. The DTM names target files based on the name of the target data object. The default name for target files is <file_name>.out.

Temporary Files

The DTM can create temporary files when it runs mappings, profiles, SQL queries, or web service operation mappings. When the jobs complete, the temporary files are usually deleted.

By default, the DTM stores temporary files in the list of directories defined by the Temporary Directories property for the Data Integration Service. The DTM also stores the cache files for Sorter transformations in the list of directories defined by the Temporary Directories property.
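The default locations described in this section can be summarized as a lookup table. The table contents come from the text above. The dictionary keys, function names, and the shape of the service properties (each property modeled as a list of paths) are assumptions for illustration. How the DTM chooses among several listed directories is not described here, so the sketch returns the whole list.

```python
# Service property that supplies the default directory for each file type.
DIRECTORY_PROPERTY = {
    "aggregator_cache": "Cache Directory",
    "joiner_cache": "Cache Directory",
    "lookup_cache": "Cache Directory",
    "rank_cache": "Cache Directory",
    "sorter_cache": "Temporary Directories",
    "reject": "Rejected Files Directory",
    "target": "Target Directory",
    "temporary": "Temporary Directories",
}

# Default file name suffixes for files named after the target data object.
DEFAULT_SUFFIX = {"reject": ".bad", "target": ".out"}


def default_directories(file_type: str, service_properties: dict[str, list[str]]) -> list[str]:
    # Return every configured directory for the property; selection among
    # multiple directories is left unspecified in this sketch.
    return service_properties[DIRECTORY_PROPERTY[file_type]]


def default_file_name(target_name: str, file_type: str) -> str:
    # For example, target data object "customers" -> "customers.bad".
    return target_name + DEFAULT_SUFFIX[file_type]
```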