Docker container on EMR

I am trying to run my Python container on EMR with a main.py, using the following spark-submit command:

spark-submit --master yarn --deploy-mode cluster \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
  --conf spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=$DOCKER_IMAGE_NAME \
  --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker \
  --conf spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=$DOCKER_IMAGE_NAME \
  --num-executors 2 \
  main.py -v

import sys
import numpy as np
from pyspark.sql import SparkSession

# Start (or reuse) a Spark session for this job
spark = SparkSession.builder.appName("docker-numpy").getOrCreate()
sc = spark.sparkContext

# Confirm that numpy is importable and works
a = np.arange(15).reshape(3, 5)
print(a)

# Report which Python interpreter ran this script
print("Python version")
print(sys.version)
print("Version info.")
print(sys.version_info)

The script above is my main.py. This is my Dockerfile:

FROM amazoncorretto:8
RUN yum -y update
RUN yum -y install yum-utils
RUN yum -y groupinstall development
RUN yum list python3*
RUN yum -y install python3 python3-dev python3-pip python3-virtualenv
RUN python -V
RUN python3 -V
ENV PYSPARK_DRIVER_PYTHON python3
ENV PYSPARK_PYTHON python3
RUN pip3 install --upgrade pip
RUN pip3 install numpy pandas
RUN python3 -c "import numpy as np"

The output reports Python 3.7.8, which is not the Python version in the container but the version on my machine. I also tried importing pandas in the code, but got an error. I am not sure why the job is not using my Docker image's environment.
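
One way to narrow this down might be to print the interpreter details from both the driver and the executors, since in cluster mode the driver runs inside the YARN application master and is governed by the spark.yarn.appMasterEnv.* settings, while the executors are governed by spark.executorEnv.*. The following is only a diagnostic sketch: interpreter_info and report are hypothetical helper names, and num_tasks=4 is an arbitrary illustrative value.

```python
import platform
import socket
import sys


def interpreter_info(_=None):
    """Describe the Python interpreter running the current process.

    The unused argument lets this function be passed directly to RDD.map().
    """
    return {
        "host": socket.gethostname(),
        "executable": sys.executable,
        "version": platform.python_version(),
    }


def report(spark, num_tasks=4):
    """Print interpreter details for the driver and for the executors.

    `spark` is an existing SparkSession. `num_tasks` is an assumed,
    arbitrary number of small tasks used to reach the executors.
    """
    print("driver:", interpreter_info())
    sc = spark.sparkContext
    # Each task runs interpreter_info() in an executor's Python worker.
    results = sc.parallelize(range(num_tasks), num_tasks).map(interpreter_info).collect()
    for info in results:
        print("executor:", info)
```

Calling report(spark) at the end of main.py would write these lines to the driver's stdout, which in cluster mode ends up in the YARN container logs rather than the terminal. If the driver and the executors report different hosts, executables, or versions, that would suggest which group of --conf entries is not taking effect.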
