Spark configuration tips

This page provides some guidance on configuring Spark applications to run on Data Mechanics. It complements the API reference.

How to choose a Spark image?

If you don't need to use your own image, you can rely on Data Mechanics to choose one for you.

Here is the information used to select an image:

  • the application type, set by the type field (possible values: Java, Scala, or Python),
  • the Spark version, set by the sparkVersion field (possible values: 2.4.4 or 3.0.0).
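
For example, a minimal configuration fragment like the following (a sketch; the field names and values come from the list above) lets Data Mechanics pick an image for a Python application running Spark 3.0.0:

type: "Python"
sparkVersion: "3.0.0"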

How to use a custom Spark image?

The API lets you specify your own Spark images:

  • image sets the Docker image for both driver and executors
  • driver.image sets the Docker image for the driver only
  • executor.image sets the Docker image for the executors only

The image must be referenced by a fully-qualified URI containing the registry, the image name, and the image tag. Some examples:

  • AWS: <account_id>.dkr.ecr.<region>.amazonaws.com/<image_name>:<image_tag>
  • GCP: gcr.io/<project_name>/<image_name>:<image_tag>
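
For instance, to use a single custom image for both the driver and the executors, the configuration could contain something like this (a sketch; replace the placeholders with your own registry, image name, and tag):

image: "gcr.io/<project_name>/<image_name>:<image_tag>"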

If your registry requires authentication, the Images page of the Kubernetes documentation details the different options.

EKS natively interacts with ECR to refresh authentication tokens, so your pods already have access to Docker images stored in ECR.

We strongly recommend that you use one of our images as the base for your custom image. They have been designed to run on Kubernetes and come with some optimizations!

How to maximize resource utilization on the cluster?

When using Spark on Kubernetes, it is important to size executors so that they make the most of the cluster's resources. Unfortunately, doing this properly requires knowledge of both Spark and Kubernetes internals. Below is our take on this complex topic!

There are roughly two situations to avoid: wasted capacity on nodes, and unschedulable pods.

Wasted capacity on nodes

Suppose that you have m5.xlarge nodes on a cluster deployed on EKS. These instances have 4 cores and 16GiB of memory each.

You have set the following size for your executor pods:

cores: 3
memory: "10g"

In this case, Kubernetes cannot schedule more than one executor per node, so every additional executor causes a new instance to be launched.

As a result, if your Spark app has 10 executors, Spark uses 30 cores while Kubernetes runs 10 m5.xlarge instances, for a total of 40 cores: 25% of the CPU capacity is wasted!

Unschedulable pods

Let's continue with the example of m5.xlarge instances.

To make full use of your cluster, you might be tempted to set the following size for your executor pods:

cores: 4
memory: "16g"

Kubernetes won't be able to schedule executor pods on the cluster with these settings.

The issue is that Kubernetes reserves some capacity on each node for the system daemons. Only the rest of the capacity, dubbed allocatable capacity, is available to Spark pods.

In our example, Kubernetes cannot allocate all 4 cores (nor all 16GiB of memory) to a single pod.

Good rules of thumb for the non-allocatable capacity are:

  • up to 25% of memory can be reserved by Kubernetes
  • up to 5% of CPU can be reserved by Kubernetes
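
Applied to the m5.xlarge example, these rules of thumb give a rough estimate of the allocatable capacity (not the exact figure reported by Kubernetes):

allocatable CPU ≈ 4 cores × 95% = 3.8 cores
allocatable memory ≈ 16GiB × 75% = 12GiB

This is why a pod requesting 4 cores and 16g of memory cannot be scheduled on these nodes.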

One executor per node

Here is a heuristic you can follow to run one executor per node and maximize the share of the cluster's resources used by executor pods.

cores: <number of cores in a node>
coreRequest: <number of cores in a node - 15%>
memory: <(memory in a node - 25%) / 1.4>

On an m5.xlarge (4 cores and 16GiB), this translates to the following settings:

cores: 4
coreRequest: "3400m"
memory: "8.5g"

Here's the rationale behind the formulas:

  • When cores and coreRequest are both set, cores controls the number of parallel tasks that a Spark executor runs, and coreRequest the actual amount of CPU requested from Kubernetes. Since a Spark executor pod almost never uses 100% of its resources, this works well.
  • The 15% of CPU accounts for the 5% of non-allocatable capacity plus an extra 10% of capacity for daemonsets.
  • The 25% of memory simply accounts for the non-allocatable capacity.
  • We divide the requested amount of memory by 1.4 because Spark asks Kubernetes for an extra 40% of memory. This is controlled by the memoryOverhead parameter, but we do not advise changing this internal Spark parameter.
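
As a sanity check, here is how the m5.xlarge settings above follow from these formulas (values rounded):

coreRequest = 4 cores × (1 - 15%) = 3.4 cores = "3400m"
memory = 16GiB × (1 - 25%) / 1.4 ≈ 8.5GiB = "8.5g"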

To go further, the page Running Spark on Kubernetes contains some useful information.