According to How do you write polars data frames to Azure blob storage?, we can write Parquet files with Polars directly to Azure Storage, such as a basic blob storage container.
In my case I was required to write in Delta format, which is built on top of Parquet, so I modified the code a bit, since Polars also supports Delta:
import adlfs
import polars as pl
from azure.identity.aio import DefaultAzureCredential

# pdf: pl.DataFrame
# path: str
# account_name: str
# container_name: str

credential = DefaultAzureCredential()
fs = adlfs.AzureBlobFileSystem(account_name=account_name, credential=credential)

with fs.open(f"{container_name}/way/to/{path}", mode="wb") as f:
    if path.endswith(".parquet"):
        pdf.write_parquet(f)
    else:
        pdf.write_delta(f, mode="append")
Using this code, I was able to write to the Azure filesystem when I specified path = path/to/1.parquet, but not path = path/to/delta_folder/.
In the second case, only a 0-byte file was written to delta_folder on Azure Storage, because f is a single file handle.
What's more, if I just use the local filesystem with pdf.write_delta(path, mode="append"), it just works.
How can I modify my code to support writing recursively into delta_folder/ in the cloud?
The issue is that Delta wants a folder it can write (potentially) multiple files into, so fsspec's model of opening one file at a time isn't going to work. You'll need to do something like the sketch below.
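A minimal sketch: pass write_delta the table folder as a URI plus a storage_options dict, instead of an fsspec file handle. It assumes a service principal whose client_id, client_secret and tenant_id are available; those names, and the exact storage-option keys, are illustrative placeholders, so check the deltalake/object_store documentation for your version.

import polars as pl

# pdf: pl.DataFrame
# path: str                e.g. "path/to/delta_folder"
# account_name: str
# container_name: str

# Hypothetical service-principal credentials; key names are illustrative.
credentials_dict = {
    "azure_storage_account_name": account_name,
    "azure_storage_client_id": client_id,
    "azure_storage_client_secret": client_secret,
    "azure_storage_tenant_id": tenant_id,
}

# Hand write_delta the table *folder* as a URI and let delta-rs create the
# data files and the _delta_log directory itself, rather than writing
# through a single fsspec file handle.
pdf.write_delta(
    f"abfss://{container_name}@{account_name}.dfs.core.windows.net/way/to/{path}",
    mode="append",
    storage_options=credentials_dict,
)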
See here for the key fields that are compatible with credentials_dict.
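This works because delta-rs (which Polars uses under the hood for write_delta) talks to the object store directly and can create as many files as it needs under the table folder, whereas fs.open only ever hands it a single blob to write into.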