I have a PySpark DataFrame inside a Databricks environment, which I'm trying to write to a specific schema depending on which target environment I'm deploying with Databricks Asset Bundles. The cluster is Unity Catalog enabled.
My bundle.yml looks like this:
```yaml
bundle:
  name: XXX

workspace:
  host: XXX

targets:
  dev:
    mode: development
    default: true
    resources:
      jobs:
        YYY:
          name: my_job
          job_clusters:
            - job_cluster_key: my_cluster
              new_cluster:
                spark_version: 13.3.x-scala2.12
                node_type_id: Standard_D3_v2
          tasks:
            - task_key: my_task
              job_cluster_key: my_cluster
              spark_python_task:
                python_file: do_task.py
```
And my code looks like this:
```python
spark = SparkSession.builder.appName("my_application").getOrCreate()
schema_string = dev_username_bronze | prod_bronze  # pseudocode
df = spark.read.parquet("/my/nice/location")
df.write.saveAsTable(f"my_catalog_name.{schema_string}.my_table")
```
I would run my bundle using `databricks bundle run -t dev` (Databricks CLI v0.206.0).

The `schema_string` assignment above is pseudocode, but the idea is there: how can I access the value I supplied with the `-t` flag inside my Python code, so that I can set `schema_string` properly?
For `spark_python_task`, you can pass parameters to your script using the `parameters` field in your YAML file (reference: https://docs.databricks.com/en/workflows/jobs/jobs-2.0-api.html#jobssparkpythontask).