I want my Spark app (Scala) to be able to read S3 files:
spark.read.parquet("s3://my-bucket-name/my-object-key")
On my dev machine, I can access S3 files using awscli with a pre-configured profile in ~/.aws/config or ~/.aws/credentials, like:
aws --profile my-profile s3 ls s3://my-bucket-name/my-object-key
But when trying to read those files from Spark, with the AWS profile provided as an environment variable (AWS_PROFILE), I get the following error:
doesBucketExist on my-bucket-name: com.amazonaws.AmazonClientException: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider SharedInstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: Unable to load credentials from service endpoint
I also tried providing the profile as a JVM option (-Daws.profile=my-profile), with no luck.
Thanks for reading.
                        
The solution is to set the Spark property fs.s3a.aws.credentials.provider to com.amazonaws.auth.profile.ProfileCredentialsProvider. If you can change the code that builds the SparkSession, set the property there, as in the sketch below. The other way is to provide the JVM option
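A minimal sketch of the builder approach, assuming the hadoop-aws (s3a) connector is on the classpath and the profile is selected via the AWS_PROFILE environment variable; the app name, bucket, and key below are placeholders:

import org.apache.spark.sql.SparkSession

// Tell the s3a connector to load credentials from the local AWS profile
// (~/.aws/credentials / ~/.aws/config); the profile itself is picked up
// from the AWS_PROFILE environment variable.
val spark = SparkSession.builder()
  .appName("read-s3-with-profile")
  .config("spark.hadoop.fs.s3a.aws.credentials.provider",
    "com.amazonaws.auth.profile.ProfileCredentialsProvider")
  .getOrCreate()

// fs.s3a.* settings apply to the s3a filesystem, hence the s3a:// scheme here.
val df = spark.read.parquet("s3a://my-bucket-name/my-object-key")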
-Dspark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.profile.ProfileCredentialsProvider. NOTE the spark.hadoop prefix.