Resource requirements for cluster nodes

When you select instance types in an advanced configuration, make sure that the master and worker nodes have enough resources to run advanced jobs successfully.

Master node

The master node should have at least 8 GB of memory and 4 CPUs.
Note: Because processing on the master node is network-intensive, avoid T instance types in an AWS environment.

Worker nodes

Worker nodes should have at least 16 GB of memory and 8 CPUs.
The following table lists the default resource requirements for worker nodes:
Component             | Default memory requirement          | Default CPU requirement
Kubernetes system     | 1 GB per worker node                | 0.5 CPU per worker node, with an additional 0.5 CPU across the cluster
Spark shuffle service | 2 GB per worker node                | 1 CPU per worker node
Spark driver          | 4 GB                                | 0.75 CPU
Spark executor        | 6 GB (3 GB per Spark executor core) | 1.5 CPUs (0.75 CPU per Spark executor core)
Based on the default resource requirements, a cluster with one worker node requires 13 GB of memory and 4.25 CPUs.
When worker nodes are added to the cluster, each worker node reserves an additional 3 GB of memory and 1.5 CPUs for the Kubernetes system and the Spark shuffle service. Therefore, a cluster with two worker nodes requires 16 GB of memory and 5.75 CPUs.
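The following Python sketch walks through the same arithmetic. The function name and layout are illustrative only; they are not part of the product.

    # A minimal sketch of the default resource arithmetic described above.
    def default_cluster_requirements(worker_nodes):
        """Return (memory_gb, cpus) that a cluster reserves with the default settings."""
        memory_gb = (
            1 * worker_nodes      # Kubernetes system: 1 GB per worker node
            + 2 * worker_nodes    # Spark shuffle service: 2 GB per worker node
            + 4                   # Spark driver: 4 GB
            + 6                   # Spark executor: 6 GB (two cores at 3 GB per core)
        )
        cpus = (
            0.5 * worker_nodes + 0.5   # Kubernetes system: 0.5 CPU per node plus 0.5 per cluster
            + 1 * worker_nodes         # Spark shuffle service: 1 CPU per worker node
            + 0.75                     # Spark driver
            + 1.5                      # Spark executor (two cores at 0.75 CPU per core)
        )
        return memory_gb, cpus

    print(default_cluster_requirements(1))   # (13, 4.25)
    print(default_cluster_requirements(2))   # (16, 5.75)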

Reconfiguring resource requirements

If you cannot provision enough resources to fulfill the default requirements, you can reconfigure some of the requirements.
You can reconfigure the requirements for the following components:
Spark shuffle service
If you disable the shuffle service, the Spark engine cannot use dynamic allocation. For more details, contact Informatica Global Customer Support.
Spark driver
To reconfigure the amount of memory for the Spark driver, use the Spark session property spark.driver.memory in the mapping task. To set the memory in GB, use a value such as 2G. To set it in MB, use a value such as 1500m.
For information about reconfiguring the CPU requirement for the Spark driver, contact Informatica Global Customer Support.
Spark executor
To reconfigure the amount of memory for the Spark executor, use the Spark session property spark.executor.memory in the mapping task. As with the Spark driver, you can specify the memory in GB or MB. The sketch at the end of this list illustrates these property names and value formats.
You can also change the number of Spark executor cores using the Spark session property spark.executor.cores. The default number of cores for GPU-enabled clusters is 4. The default number of cores for all other clusters is 2.
If you edit the number of cores, you change the number of Spark tasks that run concurrently. For example, two Spark tasks can run concurrently inside each Spark executor when you set spark.executor.cores=2.
For information about reconfiguring the CPU requirement for Spark executors, contact Informatica Global Customer Support.
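For reference, the following Python sketch shows the same property names and value formats in a standalone SparkConf object. It is illustrative only; in an advanced job, you set these values as Spark session properties in the mapping task, not in code.

    # Illustrative only: shows the property names and value formats described above.
    from pyspark import SparkConf

    conf = (
        SparkConf()
        .set("spark.driver.memory", "2G")     # memory in GB; use a value such as 1500m for MB
        .set("spark.executor.memory", "3G")   # same value format as the driver memory
        .set("spark.executor.cores", "1")     # number of Spark tasks that run concurrently per executor
    )

    for key, value in conf.getAll():
        print(key, "=", value)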
Note: If you set the memory for the Spark driver and Spark executor too low, these components might encounter an OutOfMemoryException.
You cannot edit the resource requirements for the Kubernetes system. The resources are required to maintain a functional Kubernetes system.
For more information about the Spark session properties, see Tasks in the Data Integration help.

Resource requirements example

You have an advanced cluster with one worker node. The worker node has 16 GB of memory and 4 CPUs.
If you run an advanced job using the default requirements, the job fails. The Kubernetes system and the Spark shuffle service reserve 3 GB and 2 CPUs, so the cluster has a remaining 13 GB and 2 CPUs to run jobs. The job cannot run because the cluster requires 10 GB of memory and 2.25 CPUs to start the Spark driver and Spark executor.
If you cannot provision a larger instance type, you can reduce the CPU requirement by setting the following advanced session property in the mapping task:
    spark.executor.cores=1
When the number of Spark executor cores is 1, the Spark executor requires only 0.75 CPUs instead of 1.5 CPUs.
If you process a small amount of data, the Spark driver and executor require only a few hundred MB, so you might consider reducing the memory requirements for the driver and executor as well. For example, you can reduce the requirements in the following way:
    spark.driver.memory=1G
    spark.executor.memory=1G
After you reconfigure the resource requirements, the cluster must have at least 5 GB of memory and 3.5 CPUs. One worker node with 16 GB and 4 CPUs fulfills the requirements to run the job successfully.
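For reference, the following Python lines rerun the arithmetic for this example, first with the default requirements and then with the reconfigured settings. The variable names are illustrative only.

    # Arithmetic check for one worker node with 16 GB of memory and 4 CPUs.
    node_memory_gb, node_cpus = 16, 4

    overhead_memory = 1 + 2          # Kubernetes system (1 GB) + Spark shuffle service (2 GB)
    overhead_cpus = 0.5 + 0.5 + 1    # Kubernetes per node and per cluster, plus shuffle service

    free_memory = node_memory_gb - overhead_memory   # 13 GB left for Spark
    free_cpus = node_cpus - overhead_cpus            # 2 CPUs left for Spark

    # Default requirements: the job fails because 2.25 CPUs are needed but only 2 remain.
    print(free_memory >= 4 + 6, free_cpus >= 0.75 + 1.5)         # True False

    # Reconfigured: spark.executor.cores=1, spark.driver.memory=1G, spark.executor.memory=1G.
    print(overhead_memory + 1 + 1, overhead_cpus + 0.75 + 0.75)  # 5 GB and 3.5 CPUs in total
    print(free_memory >= 1 + 1, free_cpus >= 0.75 + 0.75)        # True True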