EMR Serverless Airflow Operator not allowing EMR custom images

I want to launch a Spark job on EMR Serverless from Airflow. I want to use Spark 3.3.0 and Scala 2.13, but the emr-6.9.0 release ships with Scala 2.12. I created a fat JAR that includes all the Spark dependencies, but that does not work either. As an alternative, I am trying to use an EMR custom image by creating the application with --image-configuration, but the Airflow operator does not pass all of the boto API arguments through.

    from airflow.providers.amazon.aws.operators.emr import EmrServerlessCreateApplicationOperator

    create_app = EmrServerlessCreateApplicationOperator(
        task_id="create_my_app",
        job_type="SPARK",
        release_label="emr-6.9.0",
        config={
            "name": "data-ingestion",
            "imageConfiguration": {
                "imageUri": "xxxxxxx.dkr.ecr.eu-west-1.amazonaws.com/emr-custom-image:0.0.1"
            },
        },
    )

Airflow gives the following error message:

    Unknown parameter in input: "imageConfiguration", must be one of:
    name, releaseLabel, type, clientToken, initialCapacity, maximumCapacity, tags, autoStartConfiguration, autoStopConfiguration, networkConfiguration

This other config won't work either:

config={"name": "data-ingestion",
        "imageUri": "xxxxxxx.dkr.ecr.eu-west-1.amazonaws.com/emr-custom-image:0.0.1"})

Does anybody have any ideas other than downgrading my Scala version?

1 Answer

The Airflow operator passes the config to the boto3 client, and that client creates the application.
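
For illustration, here is a rough sketch of that call path, assuming the operator forwards the config dict as keyword arguments to create_application; the client setup is made up for the example, and the region is taken from the ECR URI in the question:

    # Rough sketch (not the provider's actual source): the config dict ends up
    # as keyword arguments on the boto3 emr-serverless client.
    import boto3

    config = {
        "name": "data-ingestion",
        "imageConfiguration": {
            "imageUri": "xxxxxxx.dkr.ecr.eu-west-1.amazonaws.com/emr-custom-image:0.0.1"
        },
    }

    client = boto3.client("emr-serverless", region_name="eu-west-1")

    # If the installed botocore service model predates imageConfiguration,
    # parameter validation rejects the call with the same
    # "Unknown parameter in input" error shown in the question.
    client.create_application(
        releaseLabel="emr-6.9.0",
        type="SPARK",
        **config,
    )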

The imageConfiguration parameter was added to the boto3 client in version 1.26.44 (PR), and the other configuration options were added in different versions (please check the changelog).

So you can try to upgrade the boto3 version on your Airflow server, provided it is compatible with the other dependencies; if not, you may need to upgrade your Airflow version.
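
As a quick sanity check, the following sketch (a hypothetical helper script, not part of the answer) prints the installed boto3/botocore versions and inspects botocore's service model to see whether CreateApplication accepts imageConfiguration:

    # Hypothetical check: confirm the installed boto3/botocore are new enough
    # to accept imageConfiguration on CreateApplication.
    import boto3
    import botocore

    print("boto3:", boto3.__version__)
    print("botocore:", botocore.__version__)

    # Creating the client makes no API call, so no credentials are needed; on
    # very old botocore versions the emr-serverless service may be missing
    # entirely (UnknownServiceError).
    client = boto3.client("emr-serverless", region_name="eu-west-1")
    members = (
        client.meta.service_model.operation_model("CreateApplication")
        .input_shape.members
    )
    print("imageConfiguration supported:", "imageConfiguration" in members)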