We are using Databricks dbx the following way:

- `dbx execute` for development in the IDE.
- Upload the resulting package as a Python wheel to a GCS bucket using `dbx deploy <workflow> --assets-only`. We don't create a permanent job in Databricks Workflows.
- Execute the Python wheel on a Databricks job cluster through Airflow's `DatabricksSubmitRunOperator`.
I have two questions related to the artifact location. This location is specified in the `.dbx/project.json` file.
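For context, our `.dbx/project.json` looks roughly like the sketch below. All names and paths are placeholders, and the exact schema may differ between dbx versions; the point is that `artifact_location` is a single hard-coded value per environment:

```json
{
  "environments": {
    "default": {
      "profile": "DEFAULT",
      "storage_type": "mlflow",
      "properties": {
        "workspace_directory": "/Shared/dbx/projects/my_project",
        "artifact_location": "gs://my-dev-bucket/dbx/my_project"
      }
    }
  }
}
```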
Q1: Each time a `dbx deploy` is done, a new version of the wheel is uploaded to the GCS bucket. Is there a way to skip this versioning (our code is already versioned) and simply overwrite the wheel at the same location each time? The multiple versions make it difficult to reference the wheel's file path in our Airflow `DatabricksSubmitRunOperator`.
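To show why the changing path is a problem, here is a sketch of the Runs Submit payload we hand to `DatabricksSubmitRunOperator` via its `json` argument. The wheel path, package name, and cluster settings are hypothetical placeholders; with dbx's versioned uploads, `WHEEL_PATH` would have to change on every deploy:

```python
# Hypothetical, stable wheel location -- this is what we would like to have.
# With versioned deploys, a new unique path appears on every `dbx deploy`.
WHEEL_PATH = (
    "gs://my-dev-bucket/dbx/my_project/"
    "dist/my_package-0.1.0-py3-none-any.whl"
)

def build_submit_run_payload(wheel_path: str) -> dict:
    """Build the Databricks runs/submit JSON for a python_wheel_task
    on an ephemeral job cluster (no permanent job is created)."""
    return {
        "run_name": "my-package-run",
        "new_cluster": {
            "spark_version": "13.3.x-scala2.12",
            "node_type_id": "n2-standard-4",  # GCP node type, placeholder
            "num_workers": 2,
        },
        # The wheel is attached as a cluster library from GCS.
        "libraries": [{"whl": wheel_path}],
        "python_wheel_task": {
            "package_name": "my_package",  # placeholder package name
            "entry_point": "main",         # placeholder entry point
        },
    }

payload = build_submit_run_payload(WHEEL_PATH)
```

In the DAG this would be passed as `DatabricksSubmitRunOperator(task_id="run_wheel", json=payload)`.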
Q2: We have separate GCS buckets for dev, test, and prod, but the `artifact_location` is hard-coded in the JSON file. Is there a way to parameterize it per environment? Or what is the recommended pattern in a CI/CD pipeline: deploy the wheel to the DEV bucket using `dbx deploy`, and then copy that wheel to TEST and PROD?
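The "promote the artifact" pattern we are considering could be sketched as below: build and deploy once to the dev bucket, then copy the identical wheel to the test/prod buckets instead of rebuilding. Bucket names, the object prefix, and the wheel filename are all placeholders:

```python
# Placeholder buckets per environment; in a real pipeline these would come
# from CI/CD variables rather than being hard-coded.
ENV_BUCKETS = {
    "dev": "gs://my-dev-bucket",
    "test": "gs://my-test-bucket",
    "prod": "gs://my-prod-bucket",
}

def promotion_copy_command(wheel_name: str, source_env: str, target_env: str) -> str:
    """Return the gsutil command that promotes a wheel from one
    environment's bucket to another, keeping the same object path."""
    src = f"{ENV_BUCKETS[source_env]}/dbx/my_project/{wheel_name}"
    dst = f"{ENV_BUCKETS[target_env]}/dbx/my_project/{wheel_name}"
    return f"gsutil cp {src} {dst}"

# Promote the dev-deployed wheel to the test bucket.
cmd = promotion_copy_command("my_package-0.1.0-py3-none-any.whl", "dev", "test")
```

This way the artifact running in TEST and PROD is byte-for-byte the one validated in DEV, which is the usual argument for copying over redeploying.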