This page shows how to connect your Data Mechanics Spark applications and jobs to an external Hive metastore database.
The process involves the following steps:
- Make the jar file containing the JDBC driver of your database accessible to Data Mechanics
- Configure the Spark configuration with the metastore connection settings
JDBC Driver jar file
For PostgreSQL, the JDBC driver jar file can be found here. You have several options to resolve this dependency; here are two of them:
Option 1: Download and Copy the JDBC driver jar file to your Data Mechanics image
Download the jar file and create a new Docker image like this:
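For example, a minimal Dockerfile could copy the driver into Spark's jars directory. The base image tag and the driver version below are placeholders; substitute the Data Mechanics image and PostgreSQL driver version you actually use.

```dockerfile
# Placeholder base image tag - use the Data Mechanics image your applications run on.
FROM gcr.io/datamechanics/spark:platform-3.1-latest

# Copy the downloaded PostgreSQL JDBC driver into Spark's classpath.
COPY postgresql-42.2.5.jar /opt/spark/jars/
```

Build and push this image to your registry, then reference it in your application configuration.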
Option 2: Reference the dependency to the jar file in your Data Mechanics template
For a complete reference on template attributes, see here.
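As a sketch, the template could declare the jar as a dependency so it is fetched at submission time. The attribute names and the download URL below are assumptions; check the template reference for the exact schema.

```json
{
  "deps": {
    "jars": [
      "https://jdbc.postgresql.org/download/postgresql-42.2.5.jar"
    ]
  }
}
```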
You have several options to configure the connection credentials; here are two of them:
Option 1: Specify the connection in a Data Mechanics template
The configuration can be specified in a Data Mechanics template or in the core-site.xml file of the Hadoop configuration.
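A template sketch might look like the following. The `sparkConf` attribute name, host, database name, and credentials are placeholders; the `javax.jdo.option.*` properties themselves are the standard Hive metastore connection settings, passed to Hadoop via the `spark.hadoop.` prefix.

```json
{
  "sparkConf": {
    "spark.hadoop.javax.jdo.option.ConnectionURL": "jdbc:postgresql://<host>:5432/<metastore-db>",
    "spark.hadoop.javax.jdo.option.ConnectionDriverName": "org.postgresql.Driver",
    "spark.hadoop.javax.jdo.option.ConnectionUserName": "<username>",
    "spark.hadoop.javax.jdo.option.ConnectionPassword": "<password>"
  }
}
```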
Additionally, if you use an older version of Hive, you can add:
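For instance, Spark can be pointed at a specific metastore version with the standard `spark.sql.hive.metastore.*` properties; the version value below is a placeholder for the Hive version your metastore runs.

```json
{
  "sparkConf": {
    "spark.sql.hive.metastore.version": "<hive-version>",
    "spark.sql.hive.metastore.jars": "maven"
  }
}
```

With `"maven"`, Spark downloads the matching Hive jars at runtime; you can instead point this setting at a local classpath if the jars are baked into your image.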
Option 2: Specify the connection in the core-site.xml file
You can also keep the confidential information (user name and password) in the core-site.xml file, while the other parameters remain in the Data Mechanics template.
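A minimal core-site.xml carrying only the credentials would look like this, using the standard Hadoop configuration XML format (the values are placeholders):

```xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>username</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password</value>
  </property>
</configuration>
```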
The core-site.xml file is located in the directory pointed to by $HADOOP_CONF_DIR. See configuring environment variables.