TaskResultLost when running an EMR Serverless Spark job in a VPC


I am receiving the following error while running my EMR Serverless PySpark SQL code:

ERROR:root:An error occurred while calling o221.collectToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 53.0 failed 1 times, most recent failure: Lost task 0.0 in stage 53.0 (TID 176) (ip-10-1-20-165.ec2.internal executor driver): TaskResultLost (result lost from block manager)

I don't see this issue when running outside a VPC, but I do when running in a VPC. When I run in a VPC with a small number of rows (< 10k) I don't receive the error either.

I use Spark SQL for some operations as well as DataFrame functions, and I partition the data with dfp = df.repartition(200, "vehicle_id").
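
The failure happens on a driver-side collect: o221.collectToPython is the JVM call PySpark makes under the hood for collect()/toPandas(). A minimal sketch of the pattern I described, with a placeholder S3 path and query (the real job is larger):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("vehicle-job").getOrCreate()

# Placeholder input path; the real data has a vehicle_id column.
df = spark.read.parquet("s3://my-bucket/vehicle-data/")

# Repartition by vehicle_id into 200 partitions before the heavy work.
dfp = df.repartition(200, "vehicle_id")
dfp.createOrReplaceTempView("vehicles")

# Mix of Spark SQL and DataFrame operations.
agg = spark.sql("SELECT vehicle_id, COUNT(*) AS n FROM vehicles GROUP BY vehicle_id")

# collect() is what triggers collectToPython and where TaskResultLost is raised.
rows = agg.collect()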

I start with the following initial capacity configuration, although EMR Serverless should scale automatically:

InitialCapacity:
        - Key: DRIVER
          Value:
            WorkerCount: 2
            WorkerConfiguration:
              Cpu: 16vCPU
              Memory: "64GB"
              Disk: "200GB"
        - Key: EXECUTOR
          Value:
            WorkerCount: 5
            WorkerConfiguration:
              Cpu: 16vCPU
              Memory: "64GB"
              Disk: "200GB"

I'm expecting this code to work; I've run the same code previously in a provisioned EMR container using the same VPC without issues.
