Troubleshooting errors
This page lists common errors and explains how to troubleshoot and fix them.
My Spark application never finishes
You need to call spark.stop() at the end of your application code, where spark is your Spark session or Spark context. Otherwise your application may keep running indefinitely.
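For example, a minimal PySpark sketch (the app name and logic are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("my-app").getOrCreate()

spark.range(10).show()  # ... your application logic ...

# Shut down the Spark session so the application can terminate.
spark.stop()
```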
Python worker exited unexpectedly (crashed)
If you see this message in the Spark UI or in the Driver logs, it means that a Python process running inside a Spark executor has crashed while running a Spark task. The full error typically looks like this:
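An abbreviated version, as it typically appears in the driver logs (the exact stack trace varies with your Spark version):

```
org.apache.spark.SparkException: Python worker exited unexpectedly (crashed)
	at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala)
	...
Caused by: java.io.EOFException
	...
```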
The root cause of this issue is usually that the container ran out of memory, so the operating system killed a Python process to free up memory.
To fix it, you should increase your container memory overhead. Increasing your container memory size, for example by choosing a memory-optimized instance type, can also help.
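As a sketch, assuming you control the Spark configuration at session creation (the 2g value is illustrative; on a managed platform you may need to set this at submission time instead):

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("my-app")
    # Extra off-heap memory reserved per executor container;
    # Python worker processes consume this overhead space.
    .config("spark.executor.memoryOverhead", "2g")
    .getOrCreate()
)
```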
Executing into Containers
Often when debugging applications, it can be useful to execute into containers directly to view configuration files, resource usage metrics, or manually kick off scripts. You can think of executing into a container as using ssh to tunnel into a physical machine or cloud VM. You can execute into containers using the two methods below:
Docker (for local containers)
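To open a shell in a running container (the container ID is a placeholder):

```bash
docker exec -it <container ID> /bin/bash
```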
This will give you shell access to a local docker container. To find the container ID, run docker ps to see a list of running containers. You can also execute into a new docker container by running:
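```bash
# Start a new container from an image with an interactive shell
docker run -it <image name> /bin/bash
```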
Kubernetes
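To open a shell in a running pod (the pod name and namespace are placeholders):

```bash
kubectl exec -it <pod name> -n <namespace> -- /bin/bash
```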
This command lets you execute into a currently running Kubernetes container. To see the list of all pods within a namespace, run:
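```bash
# List all pods in the given namespace
kubectl get pods -n <namespace>
```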
In the Data Mechanics Kubernetes cluster, your Spark applications run in the spark-apps namespace. Other namespaces contain resources that enable Kubernetes or the Data Mechanics platform to operate; avoid executing into them, or do so with extreme caution.
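For example, to shell into a Spark driver pod (the pod name is a placeholder for one returned by the command above):

```bash
kubectl get pods -n spark-apps
kubectl exec -it <driver pod name> -n spark-apps -- /bin/bash
```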