Spark Session With Multiple S3 Roles


I have a Spark job that reads files from an S3 bucket, formats them, and places them in another S3 bucket. I'm using the SparkSession `spark.read.csv` and `spark.write.csv` functionality to accomplish this.

When I read the files, I need to use one IAM role (an assumed role), and when I write the files, I need to drop the assumed role and revert to my default role.

Is this possible within the same spark session? And if not, is there another way to do this?

Any and all help is appreciated!

1 Answer

The S3A connector in Hadoop 2.8+ supports per-bucket settings, so you can use different login options for different buckets.
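As a sketch, a per-bucket override might look like the following (the bucket name and keys are placeholders; the `fs.s3a.bucket.<bucket>.*` form overrides the matching base `fs.s3a.*` option for that one bucket, and in Spark these options are set with a `spark.hadoop.` prefix):

```properties
# Base (default) credentials, used for any bucket without an override
fs.s3a.access.key=BASE_ACCESS_KEY
fs.s3a.secret.key=BASE_SECRET_KEY

# Override just for the bucket "source-bucket" (placeholder name)
fs.s3a.bucket.source-bucket.access.key=OTHER_ACCESS_KEY
fs.s3a.bucket.source-bucket.secret.key=OTHER_SECRET_KEY
```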

At some point (possibly around then, certainly by Hadoop 3), the S3A connector gained the AssumedRoleCredentialProvider, which takes a set of full credentials and calls AssumeRole for a given role ARN, then interacts with S3 under that role instead.
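Combining this with per-bucket settings, a minimal sketch for the read side might look like this (the bucket name and role ARN are placeholders):

```properties
# Assume a role only for the source bucket
fs.s3a.bucket.source-bucket.aws.credentials.provider=org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider
fs.s3a.bucket.source-bucket.assumed.role.arn=arn:aws:iam::123456789012:role/read-role

# Full credentials used to make the AssumeRole call itself (the base login)
fs.s3a.bucket.source-bucket.assumed.role.credentials.provider=org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
```

Buckets without an override (such as the destination bucket) keep using the base `fs.s3a.*` credentials, which gives you the "read as assumed role, write as default role" split within one session.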

It should be a matter of:

  1. Make sure your Hadoop JARs are recent.
  2. Set the base settings with your full login.
  3. Add a per-bucket setting for the source bucket that uses the assumed-role credential provider with the chosen ARN.
  4. Make sure things work from the Hadoop command line before trying to get submitted jobs to work.
  5. Then submit the job.
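The last two steps above might look like this (bucket name and ARN are placeholders, and `my_job.py` is a hypothetical job script; note the `spark.hadoop.` prefix that passes S3A options through Spark):

```
# Step 4: verify the assumed-role read works from the Hadoop CLI first
hadoop fs -ls s3a://source-bucket/

# Step 5: pass the same settings to the Spark job
spark-submit \
  --conf spark.hadoop.fs.s3a.bucket.source-bucket.aws.credentials.provider=org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider \
  --conf spark.hadoop.fs.s3a.bucket.source-bucket.assumed.role.arn=arn:aws:iam::123456789012:role/read-role \
  my_job.py
```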