I am trying to access a parquet file in an S3 bucket using PySpark running locally via PyCharm. I have the AWS Toolkit configured in PyCharm and my access key and secret key added to ~/.aws/credentials, yet the credentials are not being picked up, and I get the error "Unable to load AWS credentials from any provider in the chain".

import os
import pyspark
from pyspark.sql import SparkSession


# Must be set before the SparkSession is created so the S3A packages are pulled in
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.amazonaws:aws-java-sdk-pom:1.10.34,org.apache.hadoop:hadoop-aws:2.7.3 pyspark-shell'

spark = SparkSession.builder\
            .appName('Pyspark').getOrCreate()

my_df = spark.read\
    .parquet("s3a://<parquet_file_location>")  # using s3:// instead gives me a "no file system" error

my_df.printSchema()

Is there any alternative approach for running PySpark locally and accessing AWS resources?

Also, I should be able to use s3 in the parquet path, but that throws a "file system not found" error. Does any dependency or jar file need to be added to run PySpark locally?

There is one answer below.
If you set the secrets in the AWS_ environment variables (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY) they will be picked up and then propagated with the job. Otherwise you can set them in spark-defaults.conf with the appropriate spark.hadoop.fs.s3a.access.key and spark.hadoop.fs.s3a.secret.key entries.
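A minimal sketch of the second approach, passing the same fs.s3a.* properties on the SparkSession builder instead of spark-defaults.conf. Reading the keys from the AWS_ environment variables here is just an illustration, and the bucket path is a placeholder; it assumes the hadoop-aws / aws-java-sdk packages from the question are already on the classpath:

import os
from pyspark.sql import SparkSession

# The "spark.hadoop." prefix forwards these values into the Hadoop
# configuration, which is where the S3A connector reads its credentials.
spark = (
    SparkSession.builder
    .appName("Pyspark")
    .config("spark.hadoop.fs.s3a.access.key", os.environ["AWS_ACCESS_KEY_ID"])
    .config("spark.hadoop.fs.s3a.secret.key", os.environ["AWS_SECRET_ACCESS_KEY"])
    .getOrCreate()
)

# Placeholder path; replace with the real bucket/key.
df = spark.read.parquet("s3a://<parquet_file_location>")
df.printSchema()

Alternatively, simply exporting AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in the shell (or in the PyCharm run configuration) before launching is enough, since the S3A default credential chain checks those environment variables.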