Mounting an Azure Storage Container to a Databricks Workspace / Notebook results in AttributeError


I'm trying to mount an Azure Blob Storage container in a Databricks notebook using a Key Vault-backed secret scope.

Setup:

  1. Created a Key Vault
  2. Created a Secret in Key Vault
  3. Created a Databricks Secret Scope
  • This is known-good (as sketched below):
    • Running dbutils.secrets.get(scope = dbrick_secret_scope, key = dbrick_secret_name) raises no errors
    • Viewing the secret in Databricks shows [REDACTED]
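
For reference, the verification in step 3 was roughly the following (scope and key names match the cell below; the secrets.list call is just an extra sanity check):

dbutils.secrets.list(scope = "dbricks_kv_dev")                                    # scope resolves
secret = dbutils.secrets.get(scope = "dbricks_kv_dev", key = "scrt-account-key")  # no error raised
print(secret)                                                                     # prints [REDACTED]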

Cell in Databricks:

%python

dbrick_secret_scope = "dbricks_kv_dev"
dbrick_secret_name = "scrt-account-key"

storage_account_key = dbutils.secrets.get(scope = dbrick_secret_scope, key = dbrick_secret_name)
storage_container = 'abc-test'
storage_account = 'stgdev'

dbutils.fs.mount(
    source = f'abfss://{storage_container}@{storage_account}.dfs.core.windows.net/',
    mount_point = f'/mnt/{storage_account}',
    extra_configs = {f'fs.azure.accountkey.{storage_account}.dfs.core.windows.net:{storage_account_key}'}
)

Results:

  • Error: AttributeError: 'set' object has no attribute 'keys' with the mount_point line of dbutils.fs.mount() highlighted in red.
  • Full error:
AttributeError                            Traceback (most recent call last)
<command-3166320686381550> in <module>
      9     source = f'abfss://{storage_container}@{storage_account}.dfs.core.windows.net/',
     10     mount_point = f'/mnt/{storage_account}',
---> 11     extra_configs = {f'fs.azure.accountkey.{storage_account}.dfs.core.windows.net:{storage_account_key}'}
     12 )

/local_disk0/tmp/1625601199293-0/dbutils.py in f_with_exception_handling(*args, **kwargs)
    298             def f_with_exception_handling(*args, **kwargs):
    299                 try:
--> 300                     return f(*args, **kwargs)
    301                 except Py4JJavaError as e:
    302                     class ExecutionError(Exception):

/local_disk0/tmp/1625601199293-0/dbutils.py in mount(self, source, mount_point, encryption_type, owner, extra_configs)
    389                 self.check_types([(owner, string_types)])
    390             java_extra_configs = \
--> 391                 MapConverter().convert(extra_configs, self.sc._jvm._gateway_client)
    392             return self.print_return(self.dbcore.mount(source, mount_point,
    393                                                        encryption_type, owner,

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py in convert(self, object, gateway_client)
    520         HashMap = JavaClass("java.util.HashMap", gateway_client)
    521         java_map = HashMap()
--> 522         for key in object.keys():
    523             java_map[key] = object[key]
    524         return java_map

AttributeError: 'set' object has no attribute 'keys'

It appears to be related to the extra_configs parameter, but I'm not exactly sure how. Can anyone see what I'm missing?

1 Answer

The real error here is that extra_configs must be a dictionary, but you're passing a set: {f'fs.azure.accountkey.{storage_account}.dfs.core.windows.net:{storage_account_key}'}. The quoting is off - the whole key:value expression sits inside a single f-string, so Python builds a one-element set rather than a dict. The correct syntax is: {f'fs.azure.accountkey.{storage_account}.dfs.core.windows.net': storage_account_key}
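
To make the set-vs-dict mixup concrete, here's a minimal standalone snippet (the values are dummies, not real credentials):

storage_account = 'stgdev'
storage_account_key = 'SOME_KEY'   # dummy value for illustration

# One f-string containing "key:value" -> Python builds a one-element set
broken = {f'fs.azure.accountkey.{storage_account}.dfs.core.windows.net:{storage_account_key}'}
print(type(broken))   # <class 'set'> -> MapConverter fails on .keys()

# Close the key string after .net and pass the value separately -> a dict
fixed = {f'fs.azure.accountkey.{storage_account}.dfs.core.windows.net': storage_account_key}
print(type(fixed))    # <class 'dict'> -> what extra_configs expects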

But you really can't mount with the abfss protocol using a storage account key - that's only supported when mounting with the wasbs protocol. For abfss you must use a service principal and provide its ID & secret, like this (see the documentation):

configs = {"fs.azure.account.auth.type": "OAuth",
          "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
          "fs.azure.account.oauth2.client.id": "<application-id>",
          "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>",key="<service-credential-key-name>"),
          "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<directory-id>/oauth2/token"}

dbutils.fs.mount(
  source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
  mount_point = "/mnt/<mount-name>",
  extra_configs = configs)
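
Once mounted, a quick listing confirms the mount is readable (the mount name matches the placeholder above):

display(dbutils.fs.ls("/mnt/<mount-name>"))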

And although you theoretically can mount ADLS Gen2 storage using the wasbs protocol and a storage key, it's not recommended, as you can run into problems (I've hit that personally). It's also better not to use storage keys at all - a Shared Access Signature (SAS) is more secure because it can be scoped and time-limited.
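
If you do go the wasbs route, the mount call follows the pattern below (container, account, scope, and key names are placeholders; the SAS token is read from a secret scope rather than hard-coded):

dbutils.fs.mount(
  source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
  mount_point = "/mnt/<mount-name>",
  extra_configs = {
    # SAS-based auth; for account-key auth the config key would instead be
    # "fs.azure.account.key.<storage-account-name>.blob.core.windows.net"
    "fs.azure.sas.<container-name>.<storage-account-name>.blob.core.windows.net":
      dbutils.secrets.get(scope = "<scope-name>", key = "<sas-token-key-name>")
  })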