In a Delta Live Table in Azure Databricks, how can I import a module defined in an adjacent directory?


There is a module in a sibling folder, one level up from the pipeline's directory:

  • root
    • mylibs
      • mylib.py
    • pipelines
      • mypipeline.py

mypipeline.py defines a Delta Live Table. How can I import the mylib module?

The problem is that I cannot determine the directory of mypipeline.py, since every command I have tried only returns the current working directory.
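For illustration, this is the kind of check that does not help (a minimal sketch, not the exact code from my notebook):

import os

# only yields the current working directory of the driver,
# not the location of mypipeline.py
print(os.getcwd())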


1 Answer


There is a difference between code running from a Repo and code running from a notebook in the workspace:

  • when you use a Repo, Python's sys.path is automatically populated with two entries: the current directory and the root of the Repo
  • when you use a notebook in a workspace, only the current directory is added to sys.path
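You can verify this from the DLT notebook itself by printing sys.path (an optional sanity check, not part of the fix):

import sys

# in a Repo you should see both the notebook's directory and the Repo root;
# in a workspace notebook typically only the notebook's directory shows up
for entry in sys.path:
    print(entry)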

So in your case you will need some code in your DLT notebook that adds the directory with your package to sys.path. There are two approaches to that:

  • detect the pipeline's directory automatically from sys.path and manipulate it to point to mylibs. Something like this:
import sys
import os

# find the sys.path entry that corresponds to the notebook's own directory
pipelines_dir_name = "pipelines"
mypipelines_path = [p for p in sys.path
                    if p.startswith("/Workspace/") and p.endswith(pipelines_dir_name)]
if len(mypipelines_path) == 0:
    raise Exception("Can't find pipeline directory in sys.path")
if len(mypipelines_path) > 1:
    raise Exception("There are multiple matches")

# go one level up, into mylibs, and add that directory to sys.path
sys.path.append(os.path.abspath(os.path.join(mypipelines_path[0], "..", "mylibs")))
  • explicitly pass the path to mylibs via the DLT pipeline settings (the configuration section) and just do:
import sys
sys.path.append(spark.conf.get("mylibs-path-setting"))
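Here mylibs-path-setting is an arbitrary key that you define under the pipeline's configuration in the DLT settings, with the workspace path of mylibs as its value. With either approach, once mylibs is on sys.path the import in mypipeline.py is a plain import (build_table below is just an illustrative name for whatever mylib actually exposes):

# assumes one of the snippets above has already put .../mylibs on sys.path
import dlt
import mylib  # the module from mylibs/mylib.py

@dlt.table
def my_table():
    return mylib.build_table(spark)  # illustrative: call whatever mylib provides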