Configuration templates

This page shows how to create config templates and use them to simplify Spark application submission. It assumes that you know how to run a Spark application on Data Mechanics.

Data Mechanics lets you store configurations and reuse them when launching Spark applications. This is useful when several applications share a large configuration, or when you simply don't want to store configurations on your side.

Config templates are Spark application configuration fragments stored in Data Mechanics. The API routes under http(s)://<your-cluster-url>/api/config-templates/ let you manage them as a REST resource.

To learn more about the API routes and their parameters, see the API reference or navigate to http(s)://<your-cluster-url>/api/ in your browser.

The following command creates a config template named spark-resources. The template contains a block of Spark application configuration that sets driver and executor resources:

curl -X POST \
  https://<your-cluster-url>/api/config-templates/ \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "spark-resources",
    "config": {
      "driver": {
        "cores": 1,
        "memory": "1g"
      },
      "executor": {
        "instances": 1,
        "cores": 2,
        "memory": "4g"
      }
    }
  }'
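For larger templates, it may be more convenient to keep the payload in a file and let curl read it with -d @<file>. The sketch below sends the same request that way. The file name spark-resources.json and the CLUSTER_URL environment variable are illustrative assumptions, not part of the Data Mechanics API; the request is skipped when CLUSTER_URL is unset.

```shell
# Write the template payload to a file (hypothetical file name).
cat > spark-resources.json <<'EOF'
{
  "name": "spark-resources",
  "config": {
    "driver": {
      "cores": 1,
      "memory": "1g"
    },
    "executor": {
      "instances": 1,
      "cores": 2,
      "memory": "4g"
    }
  }
}
EOF

# CLUSTER_URL is an assumed variable standing in for <your-cluster-url>.
# The request is only sent when it is set.
if [ -n "${CLUSTER_URL:-}" ]; then
  curl -X POST \
    "https://${CLUSTER_URL}/api/config-templates/" \
    -H 'Content-Type: application/json' \
    -d @spark-resources.json
fi
```

Keeping the payload in a file makes it possible to version the template alongside the application code.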

The template can now be referenced when submitting a Spark application by setting the configTemplateName field:

curl -X POST \
  https://<your-cluster-url>/api/apps/ \
  -H 'Content-Type: application/json' \
  -d '{
    "jobName": "word-count",
    "configTemplateName": "spark-resources",
    "configOverrides": {
      "type": "Scala",
      "sparkVersion": "3.0.0",
      "mainApplicationFile": "gs://<your-bucket>/wordcount.jar",
      "mainClass": "org.<your-org>.wordcount.WordCount",
      "arguments": ["gs://<your-bucket>/input/*", "gs://<your-bucket>/output"]
    }
  }'

Data Mechanics merges the configuration from the spark-resources config template with the configuration in configOverrides. When both define the same field, the value in configOverrides takes precedence over the value in the config template.
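As an illustration, the template and the overrides in the example above have no fields in common, so the merged result is simply their union. The application would conceptually run with a configuration equivalent to the fragment below. This is a sketch of the outcome, not the exact representation Data Mechanics uses internally.

```json
{
  "type": "Scala",
  "sparkVersion": "3.0.0",
  "mainApplicationFile": "gs://<your-bucket>/wordcount.jar",
  "mainClass": "org.<your-org>.wordcount.WordCount",
  "arguments": ["gs://<your-bucket>/input/*", "gs://<your-bucket>/output"],
  "driver": {
    "cores": 1,
    "memory": "1g"
  },
  "executor": {
    "instances": 1,
    "cores": 2,
    "memory": "4g"
  }
}
```

Had configOverrides also set a field that the template defines, the value from configOverrides would be the one used.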