Default values in Databricks deployment.yaml file

In our deployment.yaml file we have basically the same instructions for each environment, but there are some settings I might want to set differently per environment, e.g. schedules.

Can I, for example, define a default profile where I put the steps once, and then have just the override values per environment?

  default:
    workflows:
      - name: "Load_Daily"
        schedule:
          quartz_cron_expression: "0 1 * * * ?" #
          timezone_id: "Europe/Helsinki"
          pause_status: "PAUSED"
        job_clusters:
          - job_cluster_key: "default"
            <<: *basic-static-cluster
        max_concurrent_runs: 1
  prod:
    workflows:
      - name: "Load_Daily"
        schedule:
          quartz_cron_expression: "0 */1 * * * ?" #
          pause_status: "UNPAUSED"

There is 1 best solution below

Yes, you can :)
You just have to put an anchor on the default workflow (&default_workflow) and merge it into the other environment with <<: *default_workflow, like you did for basic-static-cluster.

Here is a working example based on yours:

default:
  workflows:
    - &default_workflow
      name: "Load_Daily"
      schedule:
        quartz_cron_expression: "0 1 * * * ?" #
        timezone_id: "Europe/Helsinki"
        pause_status: "PAUSED"
      job_clusters:
        - job_cluster_key: "default"
      max_concurrent_runs: 1

prod:
  workflows:
    - <<: *default_workflow
      schedule:
        quartz_cron_expression: "0 */1 * * * ?" #
        pause_status: "UNPAUSED"

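If you load it and print it (assuming you saved the YAML as yaml_test.yml), you can see that prod inherits name, job_clusters and max_concurrent_runs from the default workflow and only the schedule part is replaced. Note that the merge key << is shallow: the prod schedule replaces the default schedule as a whole, so timezone_id is not inherited and has to be repeated in prod if you need it there: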

import yaml
from pprint import pprint

with open("yaml_test.yml","r") as f:
    yaml_conf = yaml.safe_load(f)

pprint(yaml_conf)

The output is:

{'default': {'workflows': [{'job_clusters': [{'job_cluster_key': 'default'}],
                            'max_concurrent_runs': 1,
                            'name': 'Load_Daily',
                            'schedule': {'pause_status': 'PAUSED',
                                         'quartz_cron_expression': '0 1 * * * '
                                                                   '?',
                                         'timezone_id': 'Europe/Helsinki'}}]},
 'prod': {'workflows': [{'job_clusters': [{'job_cluster_key': 'default'}],
                         'max_concurrent_runs': 1,
                         'name': 'Load_Daily',
                         'schedule': {'pause_status': 'UNPAUSED',
                                      'quartz_cron_expression': '0 */1 * * * '
                                                                '?'}}]}}
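
In the output above, the prod schedule has no timezone_id, because the YAML merge key only merges the top level of the workflow mapping. If you want nested values to be inherited too, one option is to deep-merge the dictionaries after loading them. The following is only a minimal sketch of that idea: deep_merge and merge_workflows are hypothetical helper names (not part of PyYAML or Databricks tooling), and it assumes every workflow has a unique name and that you run this as your own pre-processing step before deploying.

```python
from copy import deepcopy


def deep_merge(base, override):
    """Return a copy of base with override applied recursively to nested dicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars and lists in the override replace the default value.
            merged[key] = deepcopy(value)
    return merged


def merge_workflows(defaults, overrides):
    """Match workflows by 'name' and deep-merge each override onto its default."""
    overrides_by_name = {wf["name"]: wf for wf in overrides}
    return [deep_merge(wf, overrides_by_name.get(wf["name"], {})) for wf in defaults]


# Same shape as the dictionary yaml.safe_load returns for the layout in the question
# (job_clusters left out to keep the sketch short).
conf = {
    "default": {"workflows": [{
        "name": "Load_Daily",
        "schedule": {
            "quartz_cron_expression": "0 1 * * * ?",
            "timezone_id": "Europe/Helsinki",
            "pause_status": "PAUSED",
        },
        "max_concurrent_runs": 1,
    }]},
    "prod": {"workflows": [{
        "name": "Load_Daily",
        "schedule": {
            "quartz_cron_expression": "0 */1 * * * ?",
            "pause_status": "UNPAUSED",
        },
    }]},
}

prod_workflows = merge_workflows(conf["default"]["workflows"], conf["prod"]["workflows"])
```

With this approach prod only needs the workflow name plus the values that differ, as in the original question, and timezone_id is carried over from the default schedule. Lists are replaced rather than merged here, which is a design choice of the sketch.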