Importing the dbutils package in a Python module on Databricks


Executing a notebook in an Azure Databricks environment that imports this function

from pyspark.sql import SparkSession
from pyspark.dbutils import DBUtils

def myfunc(..., spark: SparkSession):

    dbutils = DBUtils(spark)

    for file in dbutils.fs.ls(folder):
        ...

from a Python file results in the error:

ModuleNotFoundError: No module named 'pyspark.dbutils'

How can I solve issues like this?

Thanks!

1 Answer

On Databricks, when you're working in a notebook, you automatically have access to spark and dbutils as globals, but those globals aren't available in the Python modules you import.

You need to pass dbutils explicitly into your Python modules unless you abstract the process of obtaining dbutils into a dedicated function.
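
A minimal sketch of the first approach, passing dbutils in explicitly (the module name and folder path here are just for illustration):

# mymodule.py (hypothetical module name, for illustration)
def list_folder(dbutils, folder):
    # The notebook passes its own dbutils global into the function.
    return [file.path for file in dbutils.fs.ls(folder)]

# In the notebook, where dbutils already exists as a global:
#   from mymodule import list_folder
#   paths = list_folder(dbutils, "/mnt/my-folder")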

For the second approach, I came across this in another answer on Stack Overflow:

from pyspark.sql import SparkSession

def get_dbutils():
    spark = SparkSession.builder.getOrCreate()

    # Databricks Connect sets this config; there, DBUtils can be
    # constructed directly from the SparkSession. The import lives
    # inside the branch because pyspark.dbutils only exists in the
    # Databricks Connect distribution of pyspark.
    if spark.conf.get("spark.databricks.service.client.enabled", "false") == "true":
        from pyspark.dbutils import DBUtils
        return DBUtils(spark)

    # On a Databricks cluster, dbutils lives in the notebook's
    # IPython user namespace.
    try:
        import IPython
        return IPython.get_ipython().user_ns["dbutils"]
    except (ImportError, KeyError):
        raise RuntimeError("dbutils is not available in this environment.")

With that, you can modify your code to work like this:

def myfunc(...):
    dbutils = get_dbutils()
    for file in dbutils.fs.ls(folder):
        ...
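
As a side note, on newer Databricks runtimes there is a simpler route via the Databricks SDK for Python. A sketch, assuming the databricks-sdk package is available in your environment:

# Importable from both notebooks and modules on recent Databricks
# runtimes; requires the databricks-sdk package.
from databricks.sdk.runtime import dbutils

def myfunc(folder):
    for file in dbutils.fs.ls(folder):
        ...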