Databricks DBX Artifact Location

We are using Databricks dbx in the following way:

  • dbx execute for development in the IDE.
  • Upload the resulting package as a Python wheel to a GCS bucket using dbx deploy <workflow> --assets-only. We don't create a permanent job in Databricks Workflows.
  • Execute the Python wheel on a Databricks job cluster through Airflow's DatabricksSubmitRunOperator (see the sketch after this list).
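
For reference, the submit task in our DAG looks roughly like the sketch below; the cluster spec, bucket path, package name, and entry point are placeholders, and the exact artifact path under the bucket depends on what dbx generated for that deployment.

    from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator

    # Submit a one-time run on a job cluster; the wheel was uploaded earlier
    # by `dbx deploy <workflow> --assets-only`.
    run_wheel = DatabricksSubmitRunOperator(
        task_id="run_wheel",
        databricks_conn_id="databricks_default",
        json={
            "run_name": "my-workflow",
            "new_cluster": {
                "spark_version": "12.2.x-scala2.12",
                "node_type_id": "n2-standard-4",  # illustrative GCP node type
                "num_workers": 2,
            },
            "libraries": [
                # This path gains a new version segment on every dbx deploy,
                # which is exactly what makes it hard to reference here.
                {"whl": "gs://my-dev-bucket/dbx/my-project/<run-id>/artifacts/dist/my_package-0.1.0-py3-none-any.whl"},
            ],
            "python_wheel_task": {
                "package_name": "my_package",
                "entry_point": "main",
            },
        },
    )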

I have two questions related to the artifact location, which is specified in the .dbx/project.json file (shown roughly below).
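
For context, our project.json looks roughly like this (bucket name and paths are placeholders):

    {
        "environments": {
            "default": {
                "profile": "DEFAULT",
                "storage_type": "mlflow",
                "properties": {
                    "workspace_directory": "/Shared/dbx/my-project",
                    "artifact_location": "gs://my-dev-bucket/dbx/my-project"
                }
            }
        }
    }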

Q1: Each time dbx deploy runs, a new version of the wheel is uploaded to the GCS bucket. Is it possible to skip this versioning (our code is already versioned in Git) and simply overwrite the wheel at the same location each time? The multiple versions make it difficult to pass a stable filepath for the wheel to our Airflow DatabricksSubmitRunOperator (see the helper sketch below).
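
To illustrate the pain point: with versioned uploads we would need glue code like the following hypothetical helper just to locate the newest wheel (bucket name and prefix are placeholders), which we would prefer to avoid entirely.

    from google.cloud import storage

    def latest_wheel_uri(bucket_name: str, prefix: str) -> str:
        """Return the gs:// URI of the most recently updated wheel under prefix."""
        client = storage.Client()
        blobs = [
            b for b in client.list_blobs(bucket_name, prefix=prefix)
            if b.name.endswith(".whl")
        ]
        newest = max(blobs, key=lambda b: b.updated)  # pick the latest upload
        return f"gs://{bucket_name}/{newest.name}"

    # e.g. latest_wheel_uri("my-dev-bucket", "dbx/my-project/")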

Q2: We have separate GCS buckets for dev, test, and prod, but the artifact_location is hard-coded in the JSON file. Is there a way to parameterize it per environment? Or what is the recommended pattern in a CI/CD pipeline: deploy the wheel to the DEV bucket using dbx deploy and then copy that wheel to TEST and PROD?
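
One idea we had is to define multiple environments in project.json, each with its own artifact_location, and then select one at deploy time with dbx deploy <workflow> --assets-only --environment test. We are unsure whether this is the intended pattern (bucket names and profiles below are placeholders):

    {
        "environments": {
            "dev": {
                "profile": "dev-profile",
                "storage_type": "mlflow",
                "properties": {
                    "workspace_directory": "/Shared/dbx/my-project",
                    "artifact_location": "gs://my-dev-bucket/dbx/my-project"
                }
            },
            "test": {
                "profile": "test-profile",
                "storage_type": "mlflow",
                "properties": {
                    "workspace_directory": "/Shared/dbx/my-project",
                    "artifact_location": "gs://my-test-bucket/dbx/my-project"
                }
            },
            "prod": {
                "profile": "prod-profile",
                "storage_type": "mlflow",
                "properties": {
                    "workspace_directory": "/Shared/dbx/my-project",
                    "artifact_location": "gs://my-prod-bucket/dbx/my-project"
                }
            }
        }
    }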
