I have tried with a p12 keyfile and it works successfully; I was able to fetch data from the GCS bucket. But with a JSON keyfile the SparkSession does not pick up the JSON config values and falls back to the default metadata server credentials instead. I am using Maven and IntelliJ for development. Below is the code snippet:
import org.apache.spark.SparkFiles
import org.apache.spark.sql.SparkSession

object GcsConnect {
  def main(args: Array[String]): Unit = {
    System.out.println("hello gcp connect")
    System.setProperty("hadoop.home.dir", "C:/hadoop/")

    // GCS connector and service account settings
    val sparkSession =
      SparkSession.builder()
        .appName("my first project")
        .master("local[*]")
        .config("spark.hadoop.fs.gs.project.id", "shaped-radius-297301")
        .config("spark.hadoop.fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
        .config("spark.hadoop.fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
        .config("spark.hadoop.google.cloud.project.id", "shaped-radius-297301")
        .config("spark.hadoop.google.cloud.auth.service.account.enable", "true")
        .config("spark.hadoop.google.cloud.auth.service.account.email", "[email protected]")
        .config("spark.hadoop.google.cloud.service.account.json.keyfile", "C:/Users/shaped-radius-297301-5bf673d7f0d2.json")
        .getOrCreate()

    // Pull a sample file from the bucket and display it
    sparkSession.sparkContext.addFile("gs://test_bucket/sample1.csv")
    sparkSession.read.csv(SparkFiles.get("sample1.csv")).show()
  }
}
You need to work on your configurations. From the image you provided, it looks like your service account email and service account key are not correct. Make sure you are using the correct service account email, one that has the Cloud Storage Admin role in IAM, for example an address of the form my-service-account@shaped-radius-297301.iam.gserviceaccount.com.
And the path of your service account key must be one your configuration can actually reach: the "path to json" should point to the directory where your key file is currently located on the machine running Spark.
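As a rough sketch of how the key path and property names could be wired up (the key path below is a placeholder, not your actual location), note that the GCS connector documents the JSON keyfile property as google.cloud.auth.service.account.json.keyfile, with an "auth" segment that the snippet in the question leaves out:

import java.nio.file.{Files, Paths}

import org.apache.spark.sql.SparkSession

object GcsAuthSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder path; point this at the JSON key you downloaded from the GCP console.
    val keyfile = "C:/Users/your-user/keys/shaped-radius-297301-5bf673d7f0d2.json"

    // Fail fast if the key is not where the configuration claims it is.
    require(Files.exists(Paths.get(keyfile)), s"Service account key not found at $keyfile")

    val spark = SparkSession.builder()
      .appName("gcs auth sketch")
      .master("local[*]")
      .config("spark.hadoop.fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
      .config("spark.hadoop.fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
      .config("spark.hadoop.fs.gs.project.id", "shaped-radius-297301")
      .config("spark.hadoop.google.cloud.auth.service.account.enable", "true")
      // Note the ".auth." segment in the property name.
      .config("spark.hadoop.google.cloud.auth.service.account.json.keyfile", keyfile)
      .getOrCreate()

    spark.read.csv("gs://test_bucket/sample1.csv").show()
  }
}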
Also, make sure that you are using a bucket that exists in your project, or else you will get errors such as "bucket does not exist" or "access denied".
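If you want to confirm the bucket is actually visible to the credentials the session is using, a quick probe through the Hadoop FileSystem API can tell you before any read is attempted. This is only a sketch: the bucket name is a placeholder, and it assumes the same fs.gs.* settings as in the question are already set on the session:

import java.net.URI

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

object BucketCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("bucket check").master("local[*]").getOrCreate()

    // Placeholder bucket; replace with a bucket that exists in your project.
    val bucket = "gs://test_bucket"

    // Resolves the GCS filesystem registered via fs.gs.impl and probes the bucket root.
    val fs = FileSystem.get(new URI(bucket), spark.sparkContext.hadoopConfiguration)
    if (fs.exists(new Path(bucket + "/"))) {
      println(s"$bucket is reachable with the current credentials")
    } else {
      println(s"$bucket does not exist or is not visible to this service account")
    }
  }
}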
UPDATE
OP updated the question, refer to this link. It is possible that GOOGLE_APPLICATION_CREDENTIALS is pointing to the wrong location, or that the service account it points to does not have the right IAM permissions.
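A minimal way to check the environment variable, assuming the connector is falling back to Application Default Credentials, is to verify that it is set and that the file it names actually exists:

import java.nio.file.{Files, Paths}

object CredentialsCheck {
  def main(args: Array[String]): Unit = {
    // GOOGLE_APPLICATION_CREDENTIALS is the variable the Google client libraries read
    // when resolving Application Default Credentials.
    sys.env.get("GOOGLE_APPLICATION_CREDENTIALS") match {
      case Some(path) if Files.exists(Paths.get(path)) =>
        println(s"GOOGLE_APPLICATION_CREDENTIALS points to an existing file: $path")
      case Some(path) =>
        println(s"GOOGLE_APPLICATION_CREDENTIALS is set but the file is missing: $path")
      case None =>
        println("GOOGLE_APPLICATION_CREDENTIALS is not set; the connector may fall back to metadata-server credentials")
    }
  }
}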