Executing a notebook in a Databricks Azure environment that imports the function
from pyspark.sql import SparkSession
from pyspark.dbutils import DBUtils

def myfunc(..., spark: SparkSession):
    dbutils = DBUtils(spark)
    for file in dbutils.fs.ls(folder):
        ...
from a Python file results in the error
ModuleNotFoundError: No module named 'pyspark.dbutils'
How can I solve issues like this?
Thanks!
In Databricks Repos, when you're working in a notebook, you automatically have access to spark and dbutils, but you won't have access to them in your Python modules. You need to pass dbutils into your modules explicitly, unless you abstract the process of obtaining dbutils into a dedicated function. I came across this approach in another answer on Stack Overflow.
With that approach, you can modify your code to work like this:
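Below is a minimal sketch of that pattern, not the exact code from the linked answer: the helper name get_dbutils and the folder parameter are illustrative, and the fallback to the notebook's IPython user namespace is one common way to obtain dbutils when pyspark.dbutils cannot be imported.

from pyspark.sql import SparkSession


def get_dbutils(spark: SparkSession):
    """Return a dbutils handle, whether running in a notebook or an imported module."""
    try:
        # Works where pyspark.dbutils is importable (e.g. on the cluster or with Databricks Connect).
        from pyspark.dbutils import DBUtils
        return DBUtils(spark)
    except ImportError:
        # Fall back to the dbutils object Databricks injects into the notebook's
        # IPython user namespace.
        import IPython
        return IPython.get_ipython().user_ns["dbutils"]


def myfunc(folder: str, spark: SparkSession):
    dbutils = get_dbutils(spark)
    for file in dbutils.fs.ls(folder):
        ...

Because the import of pyspark.dbutils now happens lazily inside the helper, importing the module itself no longer raises the ModuleNotFoundError.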