Select worker node instance types based on the kind of data logic that you process on the advanced cluster.
Using GPU instances
GPU instances can provide a 5x performance gain and 72% lower total cost of ownership (TCO). However, to see these benefits, a significant number of the operations in the job must be able to run on GPU.
To find out which operations run on GPU, search the Spark event log for GPU-CPU data exchange operations such as GPUColumnarToRow and GPURowToColumnar. These operations mark the points where data moves between GPU and CPU.
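The search described above can be sketched as a small script. This is a minimal illustration, not a supported tool. It assumes the event log is an uncompressed text file, and the function names (count_gpu_cpu_transitions, scan_event_log) are hypothetical. Because the exact capitalization of the operator names in a given log is an assumption, the match is case-insensitive.

```python
import re
from collections import Counter
from typing import Iterable

# Operator names taken from the text above. Capitalization in a real log
# may differ (assumption), so the pattern ignores case.
TRANSITION_PATTERN = re.compile(r"GPUColumnarToRow|GPURowToColumnar", re.IGNORECASE)


def count_gpu_cpu_transitions(lines: Iterable[str]) -> Counter:
    """Count how often each data exchange operator name appears in the lines."""
    counts: Counter = Counter()
    for line in lines:
        for match in TRANSITION_PATTERN.findall(line):
            # Normalize to lowercase so spelling variants share one key.
            counts[match.lower()] += 1
    return counts


def scan_event_log(path: str) -> Counter:
    """Scan an uncompressed Spark event log file (hypothetical path)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_gpu_cpu_transitions(handle)
```

For example, scan_event_log("/path/to/eventlog") returns a Counter keyed by the lowercase operator name. A high count relative to the size of the plan suggests that data crosses between GPU and CPU often, which is worth checking before you commit to GPU instances.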
Using Graviton instances
Graviton2 instances in an advanced cluster on AWS can be up to 26% faster for CPU-intensive jobs and 41% cheaper. Shuffle-intensive jobs that include the Aggregator, Joiner, Rank, and Sorter transformations might not see a difference.
Using AMD chipsets
AMD chipsets, such as AMD EPYC 7452, on master and worker nodes in an advanced cluster on Microsoft Azure can be 1.2x faster than Intel Xeon. Shuffle-intensive jobs that include the Aggregator, Joiner, Rank, and Sorter transformations, as well as mappings with complex expressions, can be 1.3x to 1.4x faster.