Run a first Spark application

This page describes how to run a first Spark application to test your cluster.

Run a Spark application

The command below runs the Monte Carlo Pi computation that ships with every Spark distribution.

curl -X POST \
  http(s)://<your-cluster-url>/api/apps/ \
  -H 'Content-Type: application/json' \
  -d '{
    "jobName": "spark-pi",
    "configOverrides": {
      "type": "Scala",
      "sparkVersion": "3.0.0",
      "mainApplicationFile": "local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0-preview.jar",
      "mainClass": "org.apache.spark.examples.SparkPi",
      "arguments": ["10000"],
      "executor": {
        "cores": 2
      }
    }
  }'

Here's a breakdown of the payload:

  • We assign the jobName spark-pi to the application. Applications with the same jobName are grouped in the Data Mechanics dashboard. A job is typically an application that runs every day, every hour, etc.
  • This is a Scala application running Spark 3.0.0.
  • The command to run is specified by mainApplicationFile, mainClass, and arguments.
  • We override the default configuration and request 2 cores per executor.
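
If you prefer to keep the payload in a file, curl can read the request body from disk with the @ syntax. This is a minimal sketch: spark-pi.json is a hypothetical file holding the JSON body shown above.

# spark-pi.json is a hypothetical file containing the payload above
curl -X POST \
  http(s)://<your-cluster-url>/api/apps/ \
  -H 'Content-Type: application/json' \
  -d @spark-pi.json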

The API should return something like:

{
  "appName": "spark-pi-20191208-154504-xh6x5",
  "jobName": "spark-pi",
  "configTemplateName": "",
  "config": {
    "type": "Scala",
    "sparkVersion": "3.0.0",
    "mode": "cluster",
    "image": "gcr.io/dm-docker/spark-gcs:3.0.0",
    "mainClass": "org.apache.spark.examples.SparkPi",
    "mainApplicationFile": "local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0-preview.jar",
    "arguments": [
      "10000"
    ],
    "sparkConf": {
      "spark.dynamicAllocation.enabled": "true",
      "spark.dynamicAllocation.shuffleTracking.enabled": "true"
    },
    "driver": {
      "serviceAccount": "spark-driver",
      "cores": 1,
      "memory": "1g"
    },
    "executor": {
      "instances": 1,
      "cores": 2,
      "memory": "4g"
    },
    "restartPolicy": {
      "type": "Never"
    }
  }
}

Note that some additional configurations (the Docker image, dynamic allocation settings, driver and executor resources) are set automatically by Data Mechanics.

To learn more about the API routes and parameters, check out the API reference or navigate to http(s)://<your-cluster-url>/api/ in your browser.
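
For example, the sketch below assumes the /api/apps/ collection also answers GET requests; this is an assumption based on the POST route above, not a documented route, so confirm it against the API reference first.

# Assumption: the apps collection supports GET for listing applications
curl -X GET http(s)://<your-cluster-url>/api/apps/

# Assumption: a single application can be fetched by its appName
curl -X GET http(s)://<your-cluster-url>/api/apps/spark-pi-20191208-154504-xh6x5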

The running application should automatically appear in the dashboard:

[Screenshot: an app running in the dashboard]

Clicking on the application opens the application page. It shows the app configuration as a JSON blob and a live log stream.

[Screenshot: the application page]

Note how the number of executors increases over time. Data Mechanics enables dynamic allocation by default when running Spark 3.0.0.
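
If you would rather pin the number of executors, you can presumably turn dynamic allocation off through the same configOverrides mechanism. This is a sketch only: it assumes configOverrides accepts the sparkConf and executor.instances fields that appear in the returned config above, which you should verify in the API reference.

# Sketch (unverified): disable dynamic allocation and pin 4 executors
curl -X POST \
  http(s)://<your-cluster-url>/api/apps/ \
  -H 'Content-Type: application/json' \
  -d '{
    "jobName": "spark-pi",
    "configOverrides": {
      "type": "Scala",
      "sparkVersion": "3.0.0",
      "mainApplicationFile": "local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0-preview.jar",
      "mainClass": "org.apache.spark.examples.SparkPi",
      "arguments": ["10000"],
      "sparkConf": {
        "spark.dynamicAllocation.enabled": "false"
      },
      "executor": {
        "instances": 4,
        "cores": 2
      }
    }
  }'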

This example uses a JAR embedded in the Spark Docker image and neither reads nor writes data. For a more real-world use case, refer to this page.