Executing a notebook in an Azure Databricks environment that imports the function
from pyspark.sql import SparkSession
from pyspark.dbutils import DBUtils

def myfunc(..., spark: SparkSession):
    dbutils = DBUtils(spark)
    for file in dbutils.fs.ls(folder):
        ...
from a Python file results in the error:

ModuleNotFoundError: No module named 'pyspark.dbutils'
How do I solve issues like this?
Thanks!
On Databricks Repos, when you're working in a notebook, you automatically have access to spark and dbutils, but you won't have access to them in your modules. You need to pass dbutils into your Python modules explicitly, unless you abstract the process of obtaining dbutils into a dedicated function. I came across this approach in another answer on Stack Overflow.
With that, you can modify your code to work like this:
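(A sketch under assumptions: the helper name get_dbutils and the IPython fallback follow the pattern from that linked answer, and folder is made an explicit parameter here only because the question elides myfunc's real signature.)

from pyspark.sql import SparkSession

def get_dbutils(spark: SparkSession):
    # Import pyspark.dbutils lazily: it is only available in the
    # Databricks Runtime, so a module-level import breaks elsewhere.
    try:
        from pyspark.dbutils import DBUtils
        return DBUtils(spark)
    except ImportError:
        # Fall back to the dbutils object that Databricks injects into
        # the notebook's IPython user namespace.
        import IPython
        return IPython.get_ipython().user_ns["dbutils"]

def myfunc(folder: str, spark: SparkSession):
    dbutils = get_dbutils(spark)
    for file in dbutils.fs.ls(folder):
        ...  # per-file processing elided, as in the question

Because pyspark.dbutils is now imported inside get_dbutils rather than at module level, importing your Python file no longer raises ModuleNotFoundError, and the same function works both in notebooks and in modules.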