I have a Spark job that reads files from an S3 bucket, formats them, and writes them to another S3 bucket. I'm using the SparkSession `spark.read.csv` and `spark.write.csv` functionality to accomplish this.
When I read the files, I need to use one IAM role (an assumed role), and when I write the files, I need to drop the assumed role and revert to my default role.
Is this possible within the same spark session? And if not, is there another way to do this?
Any and all help is appreciated!
The S3A connector in Hadoop 2.8+ supports per-bucket settings, so you can use different login options for different buckets.
At some point (possibly around then, and certainly by Hadoop 3), the AssumedRoleCredentialProvider was added: it takes a set of full credentials and calls AssumeRole for a given role ARN, so it interacts with S3 under that role instead.
It should be a matter of configuring the source bucket to use the assumed role while leaving the destination bucket on your default credentials.
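As a rough sketch of how those two mechanisms combine (bucket names and the role ARN below are placeholders, and the exact property names are worth checking against your Hadoop version's S3A documentation):

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("cross-role-s3-copy")
    # Per-bucket override for the SOURCE bucket only: reads go through
    # the AssumedRoleCredentialProvider, which calls AssumeRole on the
    # given ARN using your base credentials.
    .config(
        "spark.hadoop.fs.s3a.bucket.source-bucket.aws.credentials.provider",
        "org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider",
    )
    .config(
        "spark.hadoop.fs.s3a.bucket.source-bucket.assumed.role.arn",
        "arn:aws:iam::123456789012:role/ReadOnlyRole",  # placeholder ARN
    )
    # No override for dest-bucket, so writes fall back to the default
    # credential chain (your default role).
    .getOrCreate()
)

df = spark.read.csv("s3a://source-bucket/input/")
df.write.csv("s3a://dest-bucket/output/")
```

The per-bucket form `fs.s3a.bucket.<name>.<option>` is applied only to that bucket, which is what lets a single SparkSession talk to the two buckets with different identities.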