I'm trying to use Spark Connect to create a Spark session on a remote Spark cluster with pyspark in Python 3.12:
from pyspark.sql import SparkSession

ingress_ep = "..."
access_token = "..."
conn_string = f"sc://{ingress_ep}/;token={access_token}"
spark = SparkSession.builder.remote(conn_string).getOrCreate()
When running this I get a ModuleNotFoundError message:
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
Cell In[13], line 11
9 conn_string = f"sc://{ingress_ep}/;token={access_token}"
10 print(conn_string)
---> 11 spark = SparkSession.builder.remote(conn_string).getOrCreate()
File c:\Users\...\venv2\Lib\site-packages\pyspark\sql\session.py:464, in SparkSession.Builder.getOrCreate(self)
458 if (
459 "SPARK_CONNECT_MODE_ENABLED" in os.environ
460 or "SPARK_REMOTE" in os.environ
461 or "spark.remote" in opts
462 ):
463 with SparkContext._lock:
--> 464 from pyspark.sql.connect.session import SparkSession as RemoteSparkSession
466 if (
467 SparkContext._active_spark_context is None
468 and SparkSession._instantiatedSession is None
469 ):
470 url = opts.get("spark.remote", os.environ.get("SPARK_REMOTE"))
File c:\Users\...\venv2\Lib\site-packages\pyspark\sql\connect\session.py:19
1 #
2 # Licensed to the Apache Software Foundation (ASF) under one or more
3 # contributor license agreements. See the NOTICE file distributed with
...
---> 24 from distutils.version import LooseVersion
26 try:
27 import pandas
ModuleNotFoundError: No module named 'distutils'
I'm aware that the distutils module has been removed from Python 3.12. So I have installed setuptools and set SETUPTOOLS_USE_DISTUTILS='local', as suggested in "Why did I got an error ModuleNotFoundError: No module named 'distutils'?" and "No module named 'distutils' despite setuptools installed", but I'm still getting the error.
Going back to an older version of Python is not an option for me. Am I missing something? How can I get this to work?
You probably need to import setuptools before any attempt to import distutils.
The long answer is that setuptools employs a MetaPathFinder to tell Python how to locate distutils. This MetaPathFinder is only added to sys.meta_path when setuptools is imported.
This might be something to report to the library developers.
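A minimal sketch of that import ordering, assuming setuptools is installed in the same virtual environment as pyspark. The names HAVE_SETUPTOOLS, distutils_importable and connect are hypothetical helpers introduced here for illustration; only the SparkSession.builder.remote(...).getOrCreate() call comes from the question.

```python
import importlib.util

# Import setuptools first, purely for its side effect of making
# `distutils` resolvable on Python 3.12+.
try:
    import setuptools  # noqa: F401
    HAVE_SETUPTOOLS = True
except ImportError:
    HAVE_SETUPTOOLS = False


def distutils_importable() -> bool:
    """Report whether `import distutils` is expected to succeed."""
    try:
        return importlib.util.find_spec("distutils") is not None
    except (ImportError, ValueError):
        return False


def connect(conn_string: str):
    """Hypothetical helper: open a Spark Connect session.

    pyspark is imported inside the function, i.e. after setuptools,
    so the ordering described in the answer is preserved.
    """
    from pyspark.sql import SparkSession

    return SparkSession.builder.remote(conn_string).getOrCreate()


print("setuptools imported:", HAVE_SETUPTOOLS)
print("distutils importable:", distutils_importable())
```

If the second line prints True, calling connect(conn_string) with the connection string from the question should get past the distutils import; whether the session itself can then be created depends on the cluster and is not covered by this sketch.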
If the workaround described above still does not work, there might be another dependency that is trying to explicitly disable this MetaPathFinder.
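To check whether something has interfered, a rough diagnostic like the one below can be run right before the failing pyspark import. It rests on an assumption about setuptools internals: that the finder is defined in a module named _distutils_hack. That may differ between setuptools versions, and the function name distutils_state is made up for this example.

```python
import sys


def distutils_state() -> dict:
    """Collect hints about how `distutils` would currently be resolved."""
    finders = []
    for finder in sys.meta_path:
        # Some entries on sys.meta_path are classes, others are instances.
        cls = finder if isinstance(finder, type) else type(finder)
        module = str(getattr(cls, "__module__", ""))
        # Assumption: the setuptools shim lives in `_distutils_hack`.
        if module.startswith("_distutils_hack"):
            finders.append(f"{module}.{cls.__qualname__}")
    return {
        "shim_finders": finders,
        "distutils_loaded": "distutils" in sys.modules,
        "setuptools_loaded": "setuptools" in sys.modules,
    }


print(distutils_state())
```

An empty shim_finders list is not proof of a problem on its own, since distutils may already be present in sys.modules. If both shim_finders is empty and distutils_loaded is False after importing setuptools, that would be consistent with another package having removed or disabled the finder.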