I am trying to "debug" a PySpark script on an EMR (EC2) cluster (v7.0.0) by stepping through the code in PyCharm Professional.
The script lives on the master node of the EMR cluster and is run on YARN.
from pyspark import SparkConf, SparkContext

sc_conf = SparkConf()
sc_conf.setAppName(app_name)
sc_conf.setMaster('yarn')
sc = SparkContext(conf=sc_conf)  # creating the context itself works fine
Using a conda-installed pyspark (in its own environment), I can step through the initial part of the code (creating the SparkContext etc.), right up to the point where it starts reading data from S3, at which point I get this error:
com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
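For reference, the step that fails looks roughly like this (the bucket and prefix are placeholders, not the real path):

rdd = sc.textFile("s3://my-bucket/some-prefix/")  # placeholder S3 path
rdd.count()  # the EmrFileSystem class-not-found error is raised once the data is actually read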
I guess this is related to: com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found on PySpark script on AWS EMR
However, if I remove the self-installed Spark, pyspark can no longer be found and the script fails on the very first import:
from pyspark import SparkConf, SparkContext
Is there a way to solve this, or to sort out the configuration?
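My guess is that the script should pick up the Spark installation that EMR itself provides (so that the EMR Hadoop/S3 jars containing EmrFileSystem end up on the classpath) instead of the conda-installed pyspark, roughly along the lines below, but I am not sure this is the right approach (the /usr/lib/spark path and the use of findspark are my assumptions):

import os

# Assumption: EMR installs Spark under /usr/lib/spark on the master node.
os.environ["SPARK_HOME"] = "/usr/lib/spark"

import findspark  # installed into the conda environment
findspark.init()  # adds $SPARK_HOME/python and its bundled py4j to sys.path

from pyspark import SparkConf, SparkContext  # now resolves to the EMR-provided Spark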