Databricks PySpark Connect to Azure SQL with Service Principal

502 Views Asked by At

We would like to connect azure DB from azure databricks notebook, can you please someone help us to provide samples, I saw many of the answer and story but not clear about for me below two points.

what is the service principle id and secrete, where should I get it and if we have AD access, what should I keep inform to my admin to enable for this.

where we can get it the above two point in azure portal.

below stack overflow reference does not have any answer.

https://stackoverflow.com/collectives/azure/articles/75189853/how-to-connect-from-azure-databricks-to-azure-sql-db-using-service-principal

as per above link, below code available, how to use it with secrete credentials .

Note : my requirement is azure databricks, we have new job cluster, it is not a fulltime job cluster. we have to use when we required, our job cluster will be start and stop.

1

There are 1 best solutions below

5
DileeprajnarayanThumula On

Thanks to @The data swamp. for the excellent documentation

The below approach will help you connect to Azure SQL database.

Step 1

You will need to install the SQL Spark Connector and the Microsoft Azure Active Directory Authentication Library for Python.

In your databricks cluster install com.microsoft.azure:spark-mssql-connector_2.12_3.0:1.0.0-alpha from Maven and adal from PyPI.

enter image description here

In the key vault, generate secrets that represent the values from the app registration.

The Client Id and Client Secret

Create a secret scope to your Key Vault for your client Id, secret, and tenant Id will be generated.

Learn More how to create Secretscope in databricks

Step 2

Create user (SPN) in the database and then grant permissions on the objects.

CREATE USER [thedataswamp-dbr-dev] FROM EXTERNAL PROVIDER  WITH DEFAULT_SCHEMA=[dbo]
GO
GRANT SELECT ON SCHEMA :: dbo TO [thedataswamp-dbr-dev];

Table:

CREATE TABLE Emperor 
(
    Id INT,
    Emperor NVARCHAR(25)
)

INSERT INTO dbo.Emperor (Id, Emperor)
VALUES (1, 'Augustus'),
       (2, 'Tiberius'),
       (3, 'Caligula'),
       (4, 'Claudius'),
       (4, 'Nero');

Step 3 - connecting to Azure SQL database

Retrieving service principal, secret, and tenant id from Key Vault. The database URL, database, and table

import adal
resource_app_id_url = "https://database.windows.net/"
service_principal_id = dbutils.secrets.get(scope = "defaultScope", key = "DatabricksSpnId")
service_principal_secret = dbutils.secrets.get(scope = "defaultScope", key = "DatabricksSpnSecret")
tenant_id = dbutils.secrets.get(scope = "defaultScope", key = "TenantId")
authority = "https://login.windows.net/" + tenant_id
azure_sql_url = "jdbc:sqlserver://thedataswamp.database.windows.net"
database_name = "dev"
db_table = "dbo.Emperor" 
encrypt = "true"
host_name_in_certificate = "*.database.windows.net"
context = adal.AuthenticationContext(authority)
token = context.acquire_token_with_client_credentials(resource_app_id_url, service_principal_id, service_principal_secret)
access_token = token["accessToken"]
emperorDf = spark.read \
             .format("com.microsoft.sqlserver.jdbc.spark") \
             .option("url", azure_sql_url) \
             .option("dbtable", db_table) \
             .option("databaseName", database_name) \
             .option("accessToken", access_token) \
             .option("encrypt", "true") \
             .option("hostNameInCertificate", "*.database.windows.net") \
             .load()
             display(emperorDf) 

enter image description here

enter image description here

  • This code performs connection to Azure SQL retrieves an authentication token from Active Directory application is registered.
  • Next it creates a df containing emperors using the configuration and authentication token retrieved
  • Displays df containing emperors