I am building an application using two Docker containers on the same network:
- mcr.microsoft.com/azure-storage/azurite
- jupyter/pyspark-notebook
Here is my docker-compose file:
version: "3.9"
services:
  azurite:
    image: mcr.microsoft.com/azure-storage/azurite:latest
    ports:
      - "10000:10000"
      - "10001:10001"
      - "10002:10002"
    volumes:
      - azurite_volume:/data
  pyspark:
    image: jupyter/pyspark-notebook:latest
    ports:
      - 10003:8888
    user: root
    working_dir: /home/${NB_USER}
    environment:
      - NB_USER=${NB_USER}
      - CHOWN_HOME=yes
      - GRANT_SUDO=yes
    command: start-notebook.sh --NotebookApp.password="" --NotebookApp.token=""
    volumes:
      - /my/local/folder:/home/${NB_USER}/work
volumes:
  azurite_volume:
    driver: local
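For context, with this compose file both services share the default network, so from inside the pyspark container Azurite would presumably be reachable by its service name (`azurite`) rather than `localhost`. A minimal sketch of the connection string for that setup, assuming Azurite's default blob port and its path-style endpoint; the helper name `azurite_blob_connection_string` is hypothetical:

```python
# Well-known Azurite development account (same values as used in this question).
ACCOUNT_NAME = "devstoreaccount1"
ACCOUNT_KEY = (
    "Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/"
    "K1SZFPTOtr/KBHBeksoGMGw=="
)


def azurite_blob_connection_string(host: str = "azurite", port: int = 10000) -> str:
    """Build a blob connection string for Azurite (hypothetical helper).

    Assumption: `host` is the compose service name, resolvable from other
    containers on the same network; 10000 is Azurite's default blob port.
    """
    # Azurite uses path-style URLs: the account name is part of the path.
    blob_endpoint = f"http://{host}:{port}/{ACCOUNT_NAME}"
    return (
        "DefaultEndpointsProtocol=http;"
        f"AccountName={ACCOUNT_NAME};"
        f"AccountKey={ACCOUNT_KEY};"
        f"BlobEndpoint={blob_endpoint};"
    )


print(azurite_blob_connection_string())
```

This only describes how the blob endpoint is addressed inside the compose network; whether the Hadoop `wasb` driver can be pointed at such an endpoint is a separate matter.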
From the Jupyter notebook, I am trying to connect to Azurite and read data from it. Here is my code:
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName('test') \
    .config(
        'fs.azure.account.key.devstoreaccount1.blob.core.windows.net',
        'Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==') \
    .getOrCreate()

df = spark.read.json('wasb://my-container@devstoreaccount1/path/to/file.json')
However, this code returns an error:
org.apache.hadoop.fs.azure.AzureException: org.apache.hadoop.fs.azure.AzureException: Unable to access container bronze in account devstoreaccount1 using anonymous credentials, and no credentials found for them in the configuration.
The container in Azurite has already been set to "public", although that shouldn't be necessary because I am providing the credential in the Spark config. Even so, the error tells me that I am using anonymous credentials.
I am probably setting the credentials incorrectly, but I couldn't find anywhere how to set them properly.
How can I set up the credentials to be able to read from azurite using pyspark?
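
One thing that can be checked without Spark is whether the property name I set lines up with the account in the URI. The sketch below rests on two assumptions: that Hadoop options passed through the Spark config need the `spark.hadoop.` prefix to reach the Hadoop configuration, and that the `wasb` driver looks up `fs.azure.account.key.<account>` using the part of the URI authority after the `@`. The helper `expected_account_key_property` is hypothetical and uses only the standard library:

```python
from urllib.parse import urlparse

SPARK_HADOOP_PREFIX = "spark.hadoop."         # Spark copies spark.hadoop.* into the Hadoop conf
ACCOUNT_KEY_PREFIX = "fs.azure.account.key."  # assumed lookup prefix used by the wasb driver


def expected_account_key_property(wasb_uri: str) -> str:
    """Return the Spark property name under which the key would be looked up.

    Hypothetical helper: derives the account from '<container>@<account>'.
    """
    authority = urlparse(wasb_uri).netloc  # e.g. 'my-container@devstoreaccount1'
    _container, _, account = authority.rpartition("@")
    return SPARK_HADOOP_PREFIX + ACCOUNT_KEY_PREFIX + account


# The property name used in the code above, and the URI that is read.
configured = "fs.azure.account.key.devstoreaccount1.blob.core.windows.net"
expected = expected_account_key_property(
    "wasb://my-container@devstoreaccount1/path/to/file.json"
)

print("configured:", configured)
print("expected:  ", expected)
print("match:     ", configured == expected)
```

If those assumptions hold, the configured name differs from the expected one in two ways (no `spark.hadoop.` prefix, and a different account suffix), which might explain the "anonymous credentials" message. It does not show whether the driver can actually reach Azurite from the notebook container.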