How to authenticate to Azurite using PySpark?

I am building an application using two Docker containers on the same network:

  • mcr.microsoft.com/azure-storage/azurite
  • jupyter/pyspark-notebook

Here is my docker-compose file:

version: "3.9"

services:
  azurite:
    image: mcr.microsoft.com/azure-storage/azurite:latest
    ports:
      - "10000:10000"
      - "10001:10001"
      - "10002:10002"
    volumes:
      - azurite_volume:/data
  
  pyspark:
    image: jupyter/pyspark-notebook:latest
    ports:
      - 10003:8888
    user: root
    working_dir: /home/${NB_USER}
    environment:
      - NB_USER=${NB_USER}
      - CHOWN_HOME=yes
      - GRANT_SUDO=yes
    command: start-notebook.sh --NotebookApp.password="" --NotebookApp.token=""
    volumes:
      - /my/local/folder:/home/${NB_USER}/work

volumes:
  azurite_volume:
    driver: local

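For context, Azurite exposes the well-known development account devstoreaccount1 with a fixed account key (the same key I pass to Spark below), and from inside the compose network its blob endpoint is http://azurite:10000/devstoreaccount1. As a sanity check, something along these lines (using the azure-storage-blob SDK rather than Spark, assuming the Azurite host resolves as the compose service name azurite) can confirm the endpoint is reachable:

from azure.storage.blob import BlobServiceClient

# Azurite's well-known development-storage connection string; the host
# "azurite" is the service name from the compose file above.
conn_str = (
    "DefaultEndpointsProtocol=http;"
    "AccountName=devstoreaccount1;"
    "AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;"
    "BlobEndpoint=http://azurite:10000/devstoreaccount1;"
)

client = BlobServiceClient.from_connection_string(conn_str)

# List the containers to verify that the endpoint and key work.
for container in client.list_containers():
    print(container.name)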
From the Jupyter notebook I am trying to connect to Azurite and read data from it. Here is my code:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName('test') \
    .config(
        'fs.azure.account.key.devstoreaccount1.blob.core.windows.net',
        'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==') \
    .getOrCreate()

df = spark.read.json('wasb://my-container@devstoreaccount1/path/to/file.json')

However, this code returns an error:

 org.apache.hadoop.fs.azure.AzureException: org.apache.hadoop.fs.azure.AzureException: Unable to access container bronze in account devstoreaccount1 using anonymous credentials, and no credentials found for them in the configuration.

The container in Azurite has already been set to "public", although that shouldn't be necessary since I am providing the account key in the Spark config. Even so, the error says I am using anonymous credentials...
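For reference, public access on a container can be enabled through the azure-storage-blob SDK roughly like this (a sketch; same connection string as in the sanity check above, and my-container is a placeholder name):

from azure.storage.blob import BlobServiceClient

# Same Azurite connection string as in the sanity-check snippet above.
conn_str = (
    "DefaultEndpointsProtocol=http;"
    "AccountName=devstoreaccount1;"
    "AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;"
    "BlobEndpoint=http://azurite:10000/devstoreaccount1;"
)

client = BlobServiceClient.from_connection_string(conn_str)
container_client = client.get_container_client("my-container")

# "container" grants anonymous read access to the container and its blobs;
# an empty signed_identifiers dict leaves stored access policies unset.
container_client.set_container_access_policy(
    signed_identifiers={}, public_access="container")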

I am probably setting the credentials incorrectly, but I couldn't find any documentation on how to set them properly for Azurite.
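For completeness, here is the kind of variant I would expect to be closer to the documented setup (an untested sketch on my part): as I understand it, Hadoop filesystem properties set through SparkSession.builder need the spark.hadoop. prefix to reach the Hadoop configuration, and the wasb URL normally includes the full account host. Even with those changes, I don't see what would point the driver at the Azurite endpoint rather than the real blob.core.windows.net host, which is the part I am unsure about:

from pyspark.sql import SparkSession

# Untested sketch: prefix the Hadoop property with "spark.hadoop." so Spark
# propagates it to the Hadoop configuration, and spell out the full account
# host in both the property name and the wasb URL.
spark = (
    SparkSession.builder
    .appName('test')
    .config(
        'spark.hadoop.fs.azure.account.key.devstoreaccount1.blob.core.windows.net',
        'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==')
    .getOrCreate()
)

df = spark.read.json(
    'wasb://my-container@devstoreaccount1.blob.core.windows.net/path/to/file.json')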

How can I set up the credentials so that I can read from Azurite using PySpark?
