I am trying to pull the latest changes to the main branch in each folder in Databricks Repos using the Azure Databricks API. This is the documentation I am referring to:

Azure Databricks repos

When I use Postman to send a PATCH request to the following endpoint, the pull succeeds, as shown below:

endpoint:

https://<databricks-workspace>.azuredatabricks.net/api/2.0/repos/<repo-id>

[Image: successful pull from Postman]

These are the headers of the same request:

[Image: headers of the request in Postman]

In more detail, the request headers consist of a bearer token, a management token, and a third field containing the subscription, resource group, and Databricks workspace, in the following form:

/subscriptions/<Azure-subscription-id>/resourceGroups/<resourcegroup-name>/providers/Microsoft.Databricks/workspaces/<databricks-workspace-name>

As shown above, this works perfectly when I call it from my local machine with Postman. But when I make the same call from an Azure DevOps pipeline, it fails with the error:

Exception: b'{"error_code":"PERMISSION_DENIED","message":"Missing Git provider credentials. Go to User Settings > Git Integration to add your personal access token."}'
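
For reference, the error body is JSON, so a script can detect this specific failure instead of only raising the raw bytes. The following is a minimal sketch; the helper name `is_missing_git_credentials` is hypothetical and not part of any Databricks library, and it only matches the error code and message text quoted above.

```python
import json


def is_missing_git_credentials(raw_body):
    """Return True if an error body reports missing Git provider credentials."""
    try:
        payload = json.loads(raw_body)  # accepts str or bytes
    except (ValueError, TypeError):
        return False  # not JSON, so not the structured error we look for
    if not isinstance(payload, dict):
        return False
    message = str(payload.get("message") or "")
    return (
        payload.get("error_code") == "PERMISSION_DENIED"
        and "Missing Git provider credentials" in message
    )


# The exact body from the failing pipeline run.
body = (
    b'{"error_code":"PERMISSION_DENIED","message":"Missing Git provider '
    b'credentials. Go to User Settings > Git Integration to add your '
    b'personal access token."}'
)
print(is_missing_git_credentials(body))  # prints True
```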

Note that, following this link, I have already generated a PAT in Azure DevOps and added it to my service principal; otherwise the call would not have worked in Postman. Still, the DevOps pipeline gives this error, as shown below:

[Image: Azure DevOps pipeline fails]

The pipeline does exactly what I already did with Postman: it calls a Python script that constructs the same request headers and body shown above. The Python code is below, but I am almost sure the script is not causing the issue, because I have used the same approach to list repos, get a specific repo, create clusters, and more. I think it must be an administrative problem that I cannot pinpoint.

The python script:

import requests
import os
import json


## Constructing the request headers
DBRKS_REQ_HEADERS = {
    'Authorization': 'Bearer ' + os.environ['DBRKS_BEARER_TOKEN'],
    'X-Databricks-Azure-Workspace-Resource-Id': (
        '/subscriptions/' + os.environ['DBRKS_SUBSCRIPTION_ID']
        + '/resourceGroups/' + os.environ['DBRKS_RESOURCE_GROUP']
        + '/providers/Microsoft.Databricks/workspaces/' + os.environ['DBRKS_WORKSPACE_NAME']
    ),
    'X-Databricks-Azure-SP-Management-Token': os.environ['DBRKS_MANAGEMENT_TOKEN'],
}


TRIGGERING_BRANCH = "\"" + os.environ["TRIGGERING_BRANCH"] + "\""
print("TRIGGERING_BRANCH path is {}".format(TRIGGERING_BRANCH))


## Constructing the body request
body_json = '''
    {
    "branch": "main"
    }
'''

## Checking the request body format is correct
print("Request body in json format:")
print(body_json)

## The prints are only for code tracing
DBRKS_REPOS_LIST_JSON = os.environ["DBRKS_REPOS_LIST"]
print("Type of DBRKS_REPOS_LIST_JSON is {}".format(type(DBRKS_REPOS_LIST_JSON)))

## This section extracts repo Ids from the variable containing repo Ids and branches to later construct url endpoint
str_obj = DBRKS_REPOS_LIST_JSON.replace('[','').replace(']','').replace('(','').replace(')','').replace('\'','').replace(' ','').split(',')
output = {}
str_to_list = [str_obj[i:i+2] for i in range(0, len(str_obj), 2)]
print("str_to_list")
print(str_to_list)

for e in str_to_list:
    output[e[0]] = e[1]

print("output")
print(output)

repo_ids_for_main_branch = []
for key, value in output.items():
    if value == 'main':
        repo_ids_for_main_branch.append(key)

print("repo_ids_for_main_branch")
print(repo_ids_for_main_branch)

## This is the main part which is making the API call like postman:
for repo_id in repo_ids_for_main_branch:
    dbrks_pull_repo_url = "https://"+os.environ['DBRKS_INSTANCE']+".azuredatabricks.net/api/2.0/repos/"+str(repo_id)
    print("endpoint url is {}".format(dbrks_pull_repo_url))
    response = requests.patch(dbrks_pull_repo_url, headers=DBRKS_REQ_HEADERS, data=body_json) 

    if response.status_code == 200:
        print("Branch changed or pulled successfully!")
        print(response.status_code)
        print(response.content)
        print('####################')
    else:
        print("Couldn't pull or change branch!")
        raise Exception(response.content)

All the environment variables in the code are passed from the Azure DevOps pipeline to the script. I have checked their values by printing them, and they are all correct.
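
As an aside, the repo ID extraction in the script could be written more compactly. This is only a sketch, and it assumes that `DBRKS_REPOS_LIST` holds a Python-literal list of `(repo_id, branch)` tuples, which is what the chain of `replace` calls suggests but which is not confirmed here. The function name `repo_ids_on_branch` is hypothetical.

```python
import ast


def repo_ids_on_branch(repos_list_text, branch="main"):
    """Return the repo IDs (as strings) whose branch equals `branch`.

    Assumes the input looks like "[('1234', 'main'), ('5678', 'dev')]".
    """
    pairs = ast.literal_eval(repos_list_text)  # safely parses Python literals only
    return [str(repo_id) for repo_id, repo_branch in pairs if repo_branch == branch]


# Made-up sample value, for illustration only.
sample = "[('1234', 'main'), ('5678', 'dev'), ('9012', 'main')]"
print(repo_ids_on_branch(sample))  # prints ['1234', '9012']
```

Unlike the string-splitting version, this raises an error on malformed input rather than silently pairing up the wrong values.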

I would like to know what the root cause is and how I can resolve it.

There is 1 best solution below

There is one small step to carry out. The main issue was with the Git integration, and it can be resolved as follows.

Enable support for arbitrary files in Databricks Repos

This setting needs to be enabled. Follow the link for the full series of steps.
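
Since the error message refers to Git provider credentials, it may also be worth checking that the identity whose token the pipeline uses has a Git credential registered in the workspace. The following is a minimal sketch under assumptions: it assumes the Databricks Git Credentials REST API at `/api/2.0/git-credentials` with the fields `git_provider`, `git_username`, and `personal_access_token`, and the provider value `azureDevOpsServices`; verify these against the current API reference. The function names and the placeholder values are hypothetical.

```python
import json

# Assumption: endpoint path taken from the Databricks Git Credentials REST API.
GIT_CREDENTIALS_PATH = "/api/2.0/git-credentials"


def git_credentials_url(instance):
    """Build the endpoint URL the same way the question's script builds its URL."""
    return "https://" + instance + ".azuredatabricks.net" + GIT_CREDENTIALS_PATH


def build_git_credential_body(git_username, personal_access_token,
                              git_provider="azureDevOpsServices"):
    """Build the JSON body for registering a Git credential (assumed field names)."""
    return json.dumps({
        "git_provider": git_provider,
        "git_username": git_username,
        "personal_access_token": personal_access_token,
    })


def register_git_credential(instance, headers, git_username, personal_access_token):
    """POST the credential using the same headers as the repos call."""
    import requests  # third-party; imported lazily so the builders work without it

    response = requests.post(
        git_credentials_url(instance),
        headers=headers,
        data=build_git_credential_body(git_username, personal_access_token),
    )
    response.raise_for_status()
    return response.json()


# Placeholder values; nothing is sent over the network here.
print(git_credentials_url("<databricks-workspace>"))
print(build_git_credential_body("<git-username>", "<azure-devops-pat>"))
```

Calling `register_git_credential` with the `DBRKS_REQ_HEADERS` from the question would register the PAT for whichever identity those tokens belong to, which may differ from the identity used in Postman.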