I am trying to trigger a Databricks job in a pipeline step, using the job ID passed as a variable from the previous step.
This is how I create the job and pass its ID as a variable:
- script: |
    job_id=$(databricks jobs create --json '{"name": "test", "existing_cluster_id" : "'"$(db_clusterid)"'", "notebook_task ": {"notebook_path": "'"$(nbpath)"'"}}')
    echo "##vso[task.setvariable variable=db_job_id;]'"$job_id"'"
  env:
    DB_HOST: $(db_host)
    DB_TOKEN: $(db_token)
  displayName: 'Create Job'
When I echo the variable in the next step it looks as expected:
- script: |
    echo $DB_JOB_ID
  env:
    DB_JOB_ID: $(db_job_id)
    DB_HOST: $(db_host)
    DB_TOKEN: $(db_token)
  displayName: 'Echo Job ID'
Output from echo:
'{ "job_id": 123 }'
However, when I try to run the job as follows:
- script: |
    databricks jobs run-now --job-id $DB_JOB_ID
  env:
    DB_JOB_ID: $(db_job_id)
    DB_HOST: $(db_host)
    DB_TOKEN: $(db_token)
  displayName: 'Run Job'
The following error message arises:
Error: Got unexpected extra arguments ("job_id": 123}')
Instead of providing $DB_JOB_ID, I also tried "$DB_JOB_ID" and "'"$DB_JOB_ID"'", neither of which worked.
What would be the correct statement?
The problem is that you are passing the entire JSON document returned by databricks jobs create, whereas run-now requires only the job ID, which is a number. You can replace --job-id $DB_JOB_ID with --job-id $(echo $DB_JOB_ID | sed -e 's|^.*:[ ]*\([0-9][0-9]*\)[ ]*.*$|\1|'), which extracts only the numeric job ID.
P.S. Instead of calling databricks jobs create in one step and then running databricks jobs run-now, it's better to use databricks jobs submit (or the Run Submit REST API): it runs the job without creating it first. You can also look at the dbx package developed inside Databricks; it may simplify how you schedule jobs, wait for results, etc.