How to dynamically change variables in a Databricks notebook based on which environment it was deployed to?

I want to move data from an S3 bucket to Databricks. On both platforms I have separate environments for DEV, QA, and PROD.

I use a Databricks notebook, which I deploy to Databricks using Terraform.

Within the notebook there are some hardcoded variables pointing at a specific AWS account and bucket.

I want to dynamically change those variables based on the Databricks environment the notebook is deployed to.

This could probably be achieved with Databricks secrets, but I'd rather not use the Databricks CLI. Are there other options?

Does Terraform provide control over specific code cells within a notebook?

2 Answers

wookash (Best Answer)

I ended up using the cluster's environment variables.

resource "databricks_job" "my_job" {
  # (...)
  new_cluster {
    # (...)
    spark_env_vars = {
      "ENVIRONMENT" : var.environment
    }
  }

  notebook_task {
    notebook_path = databricks_notebook.my_notebook.path
  }
}

Then, in the notebook, I hardcoded the constants in a dictionary and select the right one via the cluster's environment variable:

from os import environ

# Set on the job cluster via spark_env_vars in the Terraform definition
db_env = environ["ENVIRONMENT"]

# Per-environment constants (placeholder account IDs)
aws_account_ids = {
    "dev": 123,
    "qa": 456,
    "prod": 789,
}

aws_account_id = aws_account_ids[db_env]

Alex Ott

There are different options to achieve this:

  • Hardcode all constants in the source code and then select what is necessary via widgets, something like this (you can select the value interactively or pass it as a parameter of the notebook task in a job):
dbutils.widgets.dropdown("env", "dev", ["dev", "prod"])
# separate cell
env = dbutils.widgets.get("env")
if env == "dev":
  bucket = "..."
  ...
elif env == "prod":
  bucket = "..."
else:
  raise Exception("Unknown environment")
  • You can inject the necessary variables into a notebook template using Terraform's built-in templatefile function when deploying to a specific environment (see the first sketch after this list)

  • Or, if you're using databricks_job, you can simply pass all parameters in the notebook task's base_parameters map and pull them in the notebook via dbutils.widgets.get (see the second sketch below).
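
A minimal sketch of the templatefile approach, assuming the notebook source lives in a template file (the notebooks/my_notebook.py.tpl path, the resource name, and the variables here are hypothetical):

resource "databricks_notebook" "my_notebook" {
  path     = "/Shared/my_notebook"
  language = "PYTHON"

  # Render the template with environment-specific values at deploy time;
  # inside the .tpl file, ${aws_account_id} and ${bucket} are substituted.
  content_base64 = base64encode(templatefile("${path.module}/notebooks/my_notebook.py.tpl", {
    aws_account_id = var.aws_account_id
    bucket         = var.bucket
  }))
}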
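
And a sketch of the base_parameters approach (the resource and variable names are hypothetical); the notebook reads the value back with dbutils.widgets.get("env"):

resource "databricks_job" "my_job" {
  # (...)
  notebook_task {
    notebook_path = databricks_notebook.my_notebook.path

    # Delivered to the notebook as widget values
    base_parameters = {
      env = var.environment
    }
  }
}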