Resource requirements for cluster nodes

When you select instance types in an advanced configuration, make sure that the master and worker nodes have enough resources to run advanced jobs successfully.

Master node

The master node should have at least 8 GB of memory and 4 CPUs.
Note: Because processing on the master node is network-intensive, avoid T instance types in an AWS environment.

Worker nodes

Worker nodes should have at least 16 GB of memory and 8 CPUs.
The following table lists the default resource requirements for worker nodes:
Component             | Default memory requirement          | Default CPU requirement
Kubernetes system     | 1 GB per worker node                | 0.5 CPU per worker node, with an additional 0.5 CPU across the cluster
Spark shuffle service | 2 GB per worker node                | 1 CPU per worker node
Spark driver          | 4 GB                                | 0.75 CPU
Spark executor        | 6 GB (3 GB per Spark executor core) | 1.5 CPUs (0.75 CPU per Spark executor core)
Based on the default resource requirements, a cluster with one worker node requires 13 GB of memory and 4.25 CPUs.
When worker nodes are added to the cluster, each worker node reserves an additional 3 GB of memory and 1.5 CPUs for the Kubernetes system and the Spark shuffle service. Therefore, a cluster with two worker nodes requires 16 GB of memory and 5.75 CPUs.
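The following Python sketch walks through the same arithmetic. The function name and layout are illustrative only; they are not part of the product.

    # A minimal sketch of the default resource arithmetic described above.
    def default_cluster_requirements(worker_nodes):
        """Return (memory_gb, cpus) that a cluster reserves with the default settings."""
        memory_gb = (
            1 * worker_nodes      # Kubernetes system: 1 GB per worker node
            + 2 * worker_nodes    # Spark shuffle service: 2 GB per worker node
            + 4                   # Spark driver: 4 GB
            + 6                   # Spark executor: 6 GB (two cores at 3 GB per core)
        )
        cpus = (
            0.5 * worker_nodes + 0.5   # Kubernetes system: 0.5 CPU per node plus 0.5 per cluster
            + 1 * worker_nodes         # Spark shuffle service: 1 CPU per worker node
            + 0.75                     # Spark driver
            + 1.5                      # Spark executor (two cores at 0.75 CPU per core)
        )
        return memory_gb, cpus

    print(default_cluster_requirements(1))   # (13, 4.25)
    print(default_cluster_requirements(2))   # (16, 5.75)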

Reconfiguring resource requirements

If you cannot provision enough resources to fulfill the default requirements, you can reconfigure some of the requirements.
You can reconfigure the requirements for the following components:
Spark shuffle service
If you disable the shuffle service, the Spark engine cannot use dynamic allocation. For more details, contact Informatica Global Customer Support.
Spark driver
To reconfigure the amount of memory for the Spark driver, use the Spark session property spark.driver.memory in the mapping task. To set the memory in GB, use a value such as 2G. To set it in MB, use a value such as 1500m.
For information about reconfiguring the CPU requirement for the Spark driver, contact Informatica Global Customer Support.
Spark executor
To reconfigure the amount of memory for the Spark executor, use the Spark session property spark.executor.memory in the mapping task. As with the Spark driver, you can specify the memory in GB or MB. The sketch at the end of this list illustrates these property names and value formats.
You can also change the number of Spark executor cores using the Spark session property spark.executor.cores. The default number of cores for GPU-enabled clusters is 4. The default number of cores for all other clusters is 2.
If you edit the number of cores, you change the number of Spark tasks that run concurrently. For example, two Spark tasks can run concurrently inside each Spark executor when you set spark.executor.cores=2.
For information about reconfiguring the CPU requirement for Spark executors, contact Informatica Global Customer Support.
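For reference, the following Python sketch shows the same property names and value formats in a standalone SparkConf object. It is illustrative only; in an advanced job, you set these values as Spark session properties in the mapping task, not in code.

    # Illustrative only: shows the property names and value formats described above.
    from pyspark import SparkConf

    conf = (
        SparkConf()
        .set("spark.driver.memory", "2G")     # memory in GB; use a value such as 1500m for MB
        .set("spark.executor.memory", "3G")   # same value format as the driver memory
        .set("spark.executor.cores", "1")     # number of Spark tasks that run concurrently per executor
    )

    for key, value in conf.getAll():
        print(key, "=", value)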
Note: If you set the memory for the Spark driver and Spark executor too low, these components might encounter an OutOfMemoryException.
You cannot edit the resource requirements for the Kubernetes system. The resources are required to maintain a functional Kubernetes system.
For more information about the Spark session properties, see Tasks in the Data Integration help.

Resource requirements example

You have an advanced cluster with one worker node. The worker node has 16 GB of memory and 4 CPUs.
If you run an advanced job using the default requirements, the job fails. The Kubernetes system and the Spark shuffle service reserve 3 GB and 2 CPUs, so the cluster has a remaining 13 GB and 2 CPUs to run jobs. The job cannot run because the cluster requires 10 GB of memory and 2.25 CPUs to start the Spark driver and Spark executor.
If you cannot provision a larger instance type, you can reduce the CPU requirement by setting the following advanced session property in the mapping task:
    spark.executor.cores=1
When the number of Spark executor cores is 1, the Spark executor requires only 0.75 CPUs instead of 1.5 CPUs.
If you process a small amount of data, the Spark driver and executor require only a few hundred MB, so you might consider reducing the memory requirements for the driver and executor as well. For example, you can reduce the requirements in the following way:
    spark.driver.memory=1G
    spark.executor.memory=1G
After you reconfigure the resource requirements, the cluster must have at least 5 GB of memory and 3.5 CPUs. One worker node with 16 GB and 4 CPUs fulfills the requirements to run the job successfully.
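For reference, the following Python lines rerun the arithmetic for this example, first with the default requirements and then with the reconfigured settings. The variable names are illustrative only.

    # Arithmetic check for one worker node with 16 GB of memory and 4 CPUs.
    node_memory_gb, node_cpus = 16, 4

    overhead_memory = 1 + 2          # Kubernetes system (1 GB) + Spark shuffle service (2 GB)
    overhead_cpus = 0.5 + 0.5 + 1    # Kubernetes per node and per cluster, plus shuffle service

    free_memory = node_memory_gb - overhead_memory   # 13 GB left for Spark
    free_cpus = node_cpus - overhead_cpus            # 2 CPUs left for Spark

    # Default requirements: the job fails because 2.25 CPUs are needed but only 2 remain.
    print(free_memory >= 4 + 6, free_cpus >= 0.75 + 1.5)         # True False

    # Reconfigured: spark.executor.cores=1, spark.driver.memory=1G, spark.executor.memory=1G.
    print(overhead_memory + 1 + 1, overhead_cpus + 0.75 + 0.75)  # 5 GB and 3.5 CPUs in total
    print(free_memory >= 1 + 1, free_cpus >= 0.75 + 0.75)        # True True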