Authentication

If your cluster is accessible from the Internet, or if you need to restrict who can run Spark applications in your company, you may want to set up an authentication mechanism on Data Mechanics.

Note that both authentication methods described below involve sending credentials (passwords, API keys) with HTTP requests. For this reason, it is strongly advised to set up TLS on your ingress.
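Because credentials travel with every request, a client can refuse to send them over plain HTTP. The helper below is a minimal sketch of such a guard; the function name require_https is hypothetical and is not part of Data Mechanics.

```python
from urllib.parse import urlsplit


def require_https(url: str) -> str:
    """Return the URL unchanged if it uses HTTPS, otherwise raise ValueError.

    Hypothetical client-side guard: call it before attaching a password or
    API key to a request so credentials are never sent in clear text.
    """
    scheme = urlsplit(url).scheme
    if scheme != "https":
        raise ValueError(
            f"refusing to send credentials over {scheme or 'an unknown scheme'}: {url}"
        )
    return url
```

Calling require_https on every URL before adding credentials makes an ingress without TLS fail fast on the client side rather than leak a secret.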

Basic auth set up by the user

If you provided a Data Mechanics customer key when you installed the Helm chart on your cluster (see the Installation section), then the Data Mechanics platform does not authenticate connections:

  • requests to the API under http(s)://<your-cluster-url>/api/ do not need to specify credentials,
  • the dashboard under http(s)://<your-cluster-url>/dashboard/ does not ask for a user name and password, and
  • the notebook service under http(s)://<your-cluster-url>/notebooks/ does not need credentials.

To provide minimal protection, you can enforce basic auth by creating a custom ingress.

The manifest below is adapted from this example in the NGINX controller docs:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-with-auth
  namespace: data-mechanics
  annotations:
    # type of authentication
    nginx.ingress.kubernetes.io/auth-type: basic
    # name of the secret that contains the user/password definitions
    nginx.ingress.kubernetes.io/auth-secret: basic-auth
    # message to display with an appropriate context why the authentication is required
    nginx.ingress.kubernetes.io/auth-realm: 'Authentication Required'
spec:
  rules:
  - http:
      paths:
      - backend:
          serviceName: submission-service
          servicePort: 80

This assumes that a secret named basic-auth exists in the data-mechanics namespace (see here to create the secret).
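Once basic auth is enforced at the ingress, every client has to send an Authorization header with its requests. The sketch below builds such a request with the Python standard library only. The function names are illustrative, and the user name and password must match an entry in the basic-auth secret.

```python
import base64
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Encode a user name and password as an HTTP Basic Authorization value."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def build_basic_auth_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that carries basic auth credentials."""
    return urllib.request.Request(
        url,
        headers={"Authorization": basic_auth_header(user, password)},
    )
```

Passing the returned object to urllib.request.urlopen sends the request. With wrong or missing credentials, the NGINX ingress is expected to answer with 401 and the realm configured in the manifest.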

Data Mechanics authentication

Data Mechanics comes with its own authentication mechanism. It is enforced if no Data Mechanics customer key was provided when the Helm chart was installed.

helm install data-mechanics data-mechanics.tgz \
  --namespace data-mechanics \
  --set customerKey=<your-customer-key> # Omit this flag (and the trailing backslash on the line above) to activate authentication

Here's how it works:

  • requests to the API under http(s)://<your-cluster-url>/api/ must specify a Data Mechanics customer key in header X-API-Key,
  • the dashboard under http(s)://<your-cluster-url>/dashboard/ asks for a user name and password upon connection, and
  • the notebook service under http(s)://<your-cluster-url>/notebooks/ needs a Data Mechanics customer key in header Authentication. You should not call this service manually, though; see Jupyter notebooks.
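As an illustration of the first rule above, the sketch below builds an API request that carries the key in the X-API-Key header, using only the Python standard library. The function name and the path argument are hypothetical: this section does not list the API's endpoints.

```python
import urllib.request


def build_api_request(cluster_url: str, path: str, customer_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request to the Data Mechanics API.

    `path` is whatever endpoint you want to reach under /api/; the endpoint
    names themselves are not defined here and are left to the caller.
    """
    url = f"{cluster_url.rstrip('/')}/api/{path.lstrip('/')}"
    # The key goes in the X-API-Key header. HTTP header names are
    # case-insensitive, so urllib's internal re-capitalization is harmless.
    return urllib.request.Request(url, headers={"X-API-Key": customer_key})
```

Pass the result to urllib.request.urlopen to send it. Since the key is sent with each request, prefer an https:// cluster URL.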

The user name and password for the dashboard are stored on the Data Mechanics side.