How does importing a module in another file in a Databricks Repos work?


I am using Databricks Repos.

I have two files. My function is in a file called func.py inside a folder called folder1:

def lower_events(df):
    return df.withColumn("event", f.lower(f.col("event")))

In my main notebook I call lower_events:

import pyspark.sql.functions as f
from pyspark.sql.functions import udf, col, lower
import sys
 
sys.path.append("..")
from folder1 import func
 
df_clean = func.lower_events(df)

This raises an error:

NameError: name 'f' is not defined

But this version works:

def lower_events(df):
    import pyspark.sql.functions as f
    from pyspark.sql.functions import col, when

    return df.withColumn("event", f.lower(f.col("event")))

1 Answer

BEST ANSWER

The error is expected: each Python module has its own imports and doesn't see the imports done in the main module or in other modules (see the Python docs on modules for more details).
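To make the scoping concrete, here is a minimal sketch of the two files from the question - the alias f bound by the notebook's import exists only in the notebook's namespace, so the lookup inside func.py fails:

# folder1/func.py - defines lower_events but never imports 'f'
def lower_events(df):
    # Python resolves 'f' against this module's globals and builtins,
    # not against the caller's imports, hence the NameError
    return df.withColumn("event", f.lower(f.col("event")))

# main notebook
import pyspark.sql.functions as f   # binds 'f' only in the notebook module
from folder1 import func

func.lower_events(df)               # NameError: name 'f' is not defined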

So your func.py should contain the imports itself - not necessarily inside the function; they can live at the top level of the file:

import pyspark.sql.functions as f
from pyspark.sql.functions import col, when

def lower_events(df):
    return df.withColumn("event", f.lower(f.col("event")))

P.S. You may also not need sys.path.append("..") - Databricks Repos automatically adds the root of the repository to sys.path.
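Assuming the notebook lives in a repo whose root contains folder1/func.py (as in the layout above), the notebook cell could then be reduced to a sketch like:

# main notebook - no sys.path manipulation needed when the repo root
# is already on sys.path (Databricks Repos adds it automatically)
from folder1 import func

df_clean = func.lower_events(df)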