I have an Apache Spark 3.2.1 Docker container running with the code below. Version 3.2.1 includes the pandas API on Spark (pyspark.pandas), so I changed the import line to "from pyspark import pandas as ps", but I am still getting this error:

 root@140f39049f6e:/opt/spark/work-dir# python3 extract.py
Traceback (most recent call last):
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/pandas/utils.py", line 27, in require_minimum_pandas_version
ModuleNotFoundError: No module named 'pandas'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/spark/work-dir/extract.py", line 4, in <module>
    from pyspark import pandas as ps
  File "<frozen zipimport>", line 259, in load_module
  File "/opt/spark/python/lib/pyspark.zip/pyspark/pandas/__init__.py", line 31, in <module>
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/pandas/utils.py", line 33, in require_minimum_pandas_version
ImportError: Pandas >= 0.23.2 must be installed; however, it was not found.
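
The root cause in the chained traceback is a plain ModuleNotFoundError for pandas itself. A throwaway check like the one below, run with the same python3 inside the container (it is not part of extract.py; the file name is just an example), should show whether pandas is importable at all:

    # check_pandas.py - throwaway check, not part of extract.py
    import sys

    print(sys.executable)            # which interpreter is actually running
    try:
        import pandas
        print("pandas", pandas.__version__)
    except ModuleNotFoundError as exc:
        print("pandas is not importable:", exc)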

----------------- my code imports -----------------

    import mysql.connector
    from pyspark import pandas as ps
    from pyspark.sql import SparkSession
    from datetime import datetime

======================================================
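
As far as I understand, "from pyspark import pandas as ps" and the "import pyspark.pandas as ps" style from the documentation load the same package, so a minimal reproduction independent of the rest of extract.py (which I have left out) would just be the import plus any trivial use of ps, for example:

    # repro.py - minimal reproduction; the DataFrame line is only illustrative
    import pyspark.pandas as ps     # same package as "from pyspark import pandas as ps"

    # On my container the import above is already where the
    # "Pandas >= 0.23.2 must be installed" error is raised.
    print(ps.DataFrame({"a": [1, 2, 3]}))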

The ENV inside the container is below:

JAVA_HOME=/usr/local/openjdk-11
PWD=/opt/spark/work-dir
HOME=/root
LANG=C.UTF-8
PYTHONPATH=/opt/spark/python/lib/py4j-0.10.9.3-src.zip:/opt/spark/python/lib/pyspark.zip:
TERM=xterm
SHLVL=1
SPARK_HOME=/opt/spark
PATH=/opt/spark/bin:/usr/local/openjdk-11/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
OLDPWD=/opt/spark
JAVA_VERSION=11.0.14.1
_=/usr/bin/env
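
From the PYTHONPATH above, only the py4j and pyspark zip files are added explicitly, so a pandas install would have to be visible somewhere on the interpreter's normal search path. A throwaway snippet like this (again, not part of extract.py) would show where that path points:

    # path_check.py - throwaway, prints where this python3 looks for packages
    import sys
    import site

    for entry in sys.path:
        print(entry)
    print("site-packages:", site.getsitepackages())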

Can you please help with this? Thanks.
