I have an Apache Spark 3.2.1 Docker container running with the code below. Spark 3.2.1 ships the pandas API on Spark (pyspark.pandas), so I changed the import line to "from pyspark import pandas as ps", but I am still getting the error below:
root@140f39049f6e:/opt/spark/work-dir# python3 extract.py
Traceback (most recent call last):
File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/pandas/utils.py", line 27, in require_minimum_pandas_version
ModuleNotFoundError: No module named 'pandas'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/spark/work-dir/extract.py", line 4, in <module>
from pyspark import pandas as ps
File "<frozen zipimport>", line 259, in load_module
File "/opt/spark/python/lib/pyspark.zip/pyspark/pandas/__init__.py", line 31, in <module>
File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/pandas/utils.py", line 33, in require_minimum_pandas_version
ImportError: Pandas >= 0.23.2 must be installed; however, it was not found.
----------------- my code imports -----------------
import mysql.connector
from pyspark import pandas as ps
from pyspark.sql import SparkSession
from datetime import datetime
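The failure happens at import time: a minimal script containing nothing but the pyspark.pandas import (hypothetical file name repro.py) produces the same traceback, so the rest of extract.py never runs.

# repro.py - the pyspark.pandas import alone raises the ImportError shown above
from pyspark import pandas as ps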
======================================================
The environment inside the container (output of env) is below:
JAVA_HOME=/usr/local/openjdk-11
PWD=/opt/spark/work-dir
HOME=/root
LANG=C.UTF-8
PYTHONPATH=/opt/spark/python/lib/py4j-0.10.9.3-src.zip:/opt/spark/python/lib/pyspark.zip:
TERM=xterm
SHLVL=1
SPARK_HOME=/opt/spark
PATH=/opt/spark/bin:/usr/local/openjdk-11/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
OLDPWD=/opt/spark
JAVA_VERSION=11.0.14.1
_=/usr/bin/env
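A quick check in the same container confirms that the pandas package itself is not importable for python3, which is consistent with the ModuleNotFoundError above:

root@140f39049f6e:/opt/spark/work-dir# python3 -c "import pandas"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'pandas'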
Can you please help with this? Thanks.