
Best practices

Follow best practices to optimize performance.
Consider employing best practices in the following areas:

Secure Agent and cloud regions

To avoid performance impact due to network latency, locate the Secure Agent and the components it interacts with in the same region. For example, if your cloud data warehouse end points are located in an AWS US-West region, locate the Secure Agent machine within the US-West region.
If the Secure Agent and the components it interacts with are deployed in the same cloud environment without on-premises components, put them in the same VPC and subnet, and keep the availability zones in the same region.
The following image shows the Informatica Intelligent Cloud Services Secure Agent and the components that it interacts with:
The image shows that Informatica Intelligent Cloud Services and cloud data are outside the firewall, while the Secure Agent and on-premises data are within the firewall.

Secure Agent machines

For cloud data lake to cloud data warehouse mappings, choose hardware that supports high throughput in terms of disk input/output (I/O) and network bandwidth.
Be sure that the machine the Secure Agent is installed on meets the requirements described in the following sections.

Secure Agent machine sizing requirements

Consider Secure Agent machine memory requirements for optimal performance.
A typical cloud data lake to cloud data warehouse mapping might require up to 3 CPU cores and 1 GB of JVM heap memory for a data size of approximately 7.5 GB. The default value for JVM heap memory is 64 MB. Additional DTM buffer block sizing and buffer pool sizing increases the memory footprint.
The following graph illustrates the physical memory (resident memory) usage for a flat file to cloud data warehouse passthrough mapping as it relates to the number of partitions. The default buffer block size is set to 100 MB and the JVM heap memory is set to 1 GB.
The image shows that resident memory usage increases as the number of partitions increases. For example, with one partition, resident memory usage is 1536 MB. With eight partitions, resident memory usage is 9626 MB.
Adding partitions to a cloud data lake to cloud data warehouse mapping linearly increases the requirement for CPU cores. The following graph illustrates the CPU consumption in cores for a flat file to cloud data warehouse passthrough mapping, with an increasing number of partitions.
The image shows that CPU core usage increases as the number of partitions increases. For example, a mapping with one partition requires 1.8 CPU cores, while a mapping with eight partitions requires 14.4 CPU cores.
To improve performance, configure the maxDTMProcesses custom property and the JVM options.
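As a rough capacity-planning aid, the following sketch extrapolates from the figures above. It assumes that CPU and resident memory scale roughly linearly with the number of partitions, at about 1.8 CPU cores per partition and about 1155 MB of additional resident memory per partition on top of a roughly 1536 MB single-partition baseline. These constants are read off the graphs and are estimates only, not guarantees; measure your own workloads before sizing production machines.

# Back-of-the-envelope sizing sketch based on the graphs above (estimates only).
# Assumes ~1.8 CPU cores per partition and ~1155 MB of extra resident memory
# per partition beyond a ~1536 MB single-partition baseline for one
# flat file to cloud data warehouse passthrough mapping.

def estimate_resources(partitions: int) -> tuple[float, int]:
    """Return an estimated (cpu_cores, resident_memory_mb) for one mapping."""
    cpu_cores = 1.8 * partitions
    resident_memory_mb = 1536 + 1155 * (partitions - 1)
    return cpu_cores, resident_memory_mb

for p in (1, 4, 8):
    cores, mem = estimate_resources(p)
    print(f"{p} partition(s): ~{cores:.1f} CPU cores, ~{mem} MB resident memory")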

Secure Agent machine sizing requirements for advanced mode

For a Secure Agent that processes mappings in advanced mode, consider Secure Agent machine requirements for optimal performance.
At a minimum, the Secure Agent requires 4 CPUs, 16 GB of memory, and 100 GB of disk space. For optimal processing, use an SSD.
The following table lists the optimal Secure Agent configuration based on the number of concurrent Spark tasks that the Secure Agent processes:
Number of concurrent Spark tasks | CPU and memory | JVM heap size
0-250 Spark tasks | 8 CPUs and 32 GB memory | 2 GB
250-500 Spark tasks | 16 CPUs and 64 GB memory (on AWS, use 8 CPUs and 32 GB memory) | 4 GB (on AWS, use 2 GB)
The JVM heap size is set to 2 GB by default. Increase the heap size to avoid out of memory errors.
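For example, if the agent typically processes around 300 concurrent Spark tasks, the table above suggests 16 CPUs and 64 GB of memory (8 CPUs and 32 GB of memory on AWS) and a JVM heap size of 4 GB (2 GB on AWS).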

maxDTMProcesses property

Set the maxDTMProcesses custom property to improve performance.
By default, a Secure Agent can schedule two mapping tasks for execution. If there are more than two tasks, additional tasks are queued and then scheduled for execution when a slot becomes available. This can cause the Secure Agent machine's capacity to be underutilized.
To achieve better utilization of the CPU capacity of the Secure Agent machine and achieve a higher degree of concurrency, you can set the maxDTMProcesses custom property for the Data Integration Server to the number of parallel tasks. For example, setting this property to 16 allows 16 tasks to run simultaneously.
The following image shows configuration for the maxDTMProcesses custom property on the agent details page in Administrator:
The Custom Configuration Details section of the agent details page shows the maxDTMProcesses property with a value of 16.
The recommended maxDTMProcesses value varies based on the connection types of the jobs that the agent runs. Use the recommended value as a starting point and adjust it iteratively.
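As an illustrative starting point only: the sizing graphs earlier in this section show roughly 1.8 CPU cores per partition for a passthrough mapping, so an agent machine with 32 CPU cores that runs mostly single-partition tasks might start with a maxDTMProcesses value of about 16, leaving headroom for staging I/O and other processes. Measure CPU utilization under load and adjust the value from there.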
For more information about setting agent properties, see Runtime Environments.

INFA_MEMORY and JVM options

You can configure the INFA_MEMORY property and JVM options to achieve optimal performance and avoid Java heap and memory-related errors. Define these properties on the agent details page in Administrator.
You can configure the INFA_MEMORY and JVM options using the following formats for values:
Format | Description
-Xms**m | The initial amount of memory allocated to the JVM when the Java process starts. Because this is only the initial value, the -Xms value can be small, for example 64m or 128m. The Java process allocates more space as required.
-Xmx****m | The maximum amount of memory that the JVM can allocate as heap. After the Java process starts, it continues to allocate space to store its objects until the maximum setting is reached. Set the value large enough to hold all the Java objects and classes. On a 64-bit agent, the value can be about 1024m or 2048m.
-XX:MaxPermSize=***m | The maximum amount of permanent space that the JVM can use at a time. If the JVM needs more than the specified amount, the Java process fails. You can set this value to an average of 512m, but it must be less than the -Xmx value. Increase the MaxPermSize value if you receive an error about permGen space.
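For example, a combined setting along the following lines uses illustrative values only, not prescribed ones: -Xms256m -Xmx2048m -XX:MaxPermSize=512m. This starts the JVM with 256 MB of memory, caps the heap at 2 GB, and caps permanent space at 512 MB, which keeps the MaxPermSize value below the -Xmx value as required.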

INFA_MEMORY

Define the INFA_MEMORY property on the agent details page in the System Configuration Details section. Select the Data Integration Service and the Tomcat JRE type, then define the INFA_MEMORY property.
The following image shows the INFA_MEMORY property on the agent details page:
The System Configuration Details section of the agent details page shows the INFA_MEMORY property with the value -Xms32m -Xmx512m.

JVM options

Define JVM options on the agent details page in the System Configuration Details section. Select the Data Integration Service and the DTM type, then define a JVMOption property.
Each JVMOption property can include one JVM attribute. You can define up to five JVMOption properties. If you need more than five, define the additional JVM options as custom properties.
The -Xmx property determines the heap size. The default value is 64MB. For mappings that process large data volumes with multiple partitions, the default value can cause mapping failure because of insufficient Java heap space. The recommended value for this property is 2024MB.
The following image shows configuration for the JVMOption1 property on the agent details page:
The System Configuration Details section of the agent details page shows the JVMOption1 property with the value '-Xmx2024m'.

Secure Agent upgrades

If your organization uses multiple services and connectors, demand on a Secure Agent group can be high. To improve Secure Agent upgrade performance, you can disable the services and connectors that a Secure Agent group doesn't use.

SQL ELT optimization

For a cloud data lake to cloud data warehouse integration, Data Integration intelligently identifies the opportunities to leverage cloud ecosystem or cloud data warehouse APIs for data pipeline processing.
Using SQL ELT optimization, Data Integration pushes the processing down to the cloud ecosystems or cloud data warehouses. This improves data processing performance because data is processed close to the source, and it saves your company data transfer costs.
Data Integration supports full SQL ELT optimization for cloud data warehouse to cloud data warehouse integrations. Data Integration converts mapping logic to equivalent, optimized SQL queries that the cloud data warehouse executes, without data transfers that would normally incur additional processing time and data transfer charges. Data Integration uses the cloud data warehouse's compute capacity to process the data without additional resources, thus achieving compute efficiency.
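For example, consider a mapping that filters and aggregates data from one cloud data warehouse table into another. With SQL ELT optimization, Data Integration can push the logic down as a single statement along the lines of INSERT INTO sales_summary SELECT region, SUM(amount) FROM sales WHERE sale_year = 2023 GROUP BY region, so the rows are processed entirely inside the warehouse and never move through the Secure Agent. The table names here are hypothetical, and the exact SQL that Data Integration generates depends on the mapping and the warehouse.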
Use SQL ELT optimization for cloud data lake to cloud data warehouse or cloud data warehouse to cloud data warehouse integrations whenever it is supported.

Cloud connector performance

Cloud data lake and cloud data warehouse connectors are designed for optimal data loading and unloading performance.
A common design pattern across these connectors is in the way Informatica stages data locally on disk before uploading to an end point or after downloading data from an end point. This staging process is a disk-intensive operation and requires both CPU and disk I/O resources. Keep this in mind for all cloud data lake to cloud data lake, cloud data lake to cloud data warehouse, and cloud data warehouse to cloud data warehouse integrations when SQL ELT optimization isn't used.
The following graph represents the performance of a concurrent cloud data warehouse mapping and the impact that sustained disk I/O has on it:
This image shows how disk I/O throughput affects processing time. For example, with a disk throughput of 125 MB/s, execution time is 23 minutes and 1 second. With 500 MB/s, execution time is 7 minutes and 38 seconds.
Informatica recommends a storage device with 500 MB/s disk throughput for a Data Integration workload with either partitioning enabled or concurrent executions. For detailed information on the various tuning options for these endpoints, see the appropriate connector guides. Performance tuning articles are available in the Informatica Knowledge Base for some connectors.
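If you're not sure whether the agent's staging disk can sustain this throughput, one quick sanity check is to time a large sequential write to the staging location. The following sketch is illustrative only: the directory is a placeholder, and real staging I/O patterns differ from a single sequential write.

import os
import time

# Rough sequential write-throughput check (illustrative only).
# Writes 1 GB in 100 MB chunks to a staging directory and reports MB/s.
STAGING_DIR = "/tmp"  # placeholder; point this at the agent's staging location
TEST_FILE = os.path.join(STAGING_DIR, "throughput_test.bin")
CHUNK = os.urandom(100 * 1024 * 1024)  # 100 MB of data
N_CHUNKS = 10  # 1 GB total

start = time.time()
with open(TEST_FILE, "wb") as f:
    for _ in range(N_CHUNKS):
        f.write(CHUNK)
    f.flush()
    os.fsync(f.fileno())  # make sure the data reaches the disk before timing stops
elapsed = time.time() - start

os.remove(TEST_FILE)
print(f"Sequential write throughput: {N_CHUNKS * 100 / elapsed:.0f} MB/s")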

Mapping design and environment

When you design a mapping, follow best practices to optimize mapping performance.
Consider the following best practices:
Reduce data volume.
Enable source partitioning.
Enable source partitioning whenever possible. The mapping task divides the source data into partitions and processes the partitions concurrently.
Optimize data conversion.
Use local flat file staging.
When a mapping writes to or reads from a cloud data warehouse, you can optimize the mapping performance by configuring the Secure Agent to stage data locally in a temporary folder before writing to or reading from the cloud data warehouse end point.
In the Secure Agent properties, set the staging property INFA_DTM_STAGING_ENABLED_CONNECTORS for Tomcat to the plugin ID of the cloud data warehouse connector. Data Integration creates a flat file locally to stage the data and then loads the data from the staging file to the data warehouse target or unloads data from the data warehouse source and stages it locally. For more information, see the individual cloud connector performance tuning guides.
Tune hardware.
For example, improve network speed and use multiple CPUs.
Consider the Secure Agent virtual machine instance type.
Choose performant cloud instances, such as Amazon Elastic Compute Cloud (EC2) instances, Azure Virtual Machines, or Google Cloud Platform (GCP) instances, based on the resource requirements.