According to How do you write polars data frames to Azure blob storage?, we can write Parquet files with Polars directly to Azure Storage, such as a basic blob storage container.
In my case I was required to write in Delta format, which is built on top of Parquet, so I modified the code a bit, since Polars also supports Delta:
import adlfs
import polars as pl
from azure.identity.aio import DefaultAzureCredential

# pdf: pl.DataFrame
# path: str
# account_name: str
# container_name: str

credential = DefaultAzureCredential()
fs = adlfs.AzureBlobFileSystem(account_name=account_name, credential=credential)

with fs.open(f"{container_name}/way/to/{path}", mode="wb") as f:
    if path.endswith(".parquet"):
        pdf.write_parquet(f)
    else:
        pdf.write_delta(f, mode="append")
Using this code, I was able to write to the Azure filesystem when I specified path = path/to/1.parquet, but not path = path/to/delta_folder/.
In the second case, only a 0-byte file was written to delta_folder on Azure Storage, because f is a single file handle.
What's more, if I just use the local filesystem with pdf.write_delta(path, mode="append"), it just works.
How can I modify my code to support writing recursively into delta_folder/ in the cloud?
The issue is that Delta wants a folder it can write (potentially) multiple files into, so fsspec's model of opening one file at a time isn't going to work. You'll need to do something like the sketch below.
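A minimal sketch: pass write_delta the table folder as a URI plus a storage_options dict, instead of an fsspec file handle. It assumes a service principal whose client_id, client_secret and tenant_id are available; those names, and the exact storage-option keys, are illustrative placeholders, so check the deltalake/object_store documentation for your version.

import polars as pl

# pdf: pl.DataFrame
# path: str                e.g. "path/to/delta_folder"
# account_name: str
# container_name: str

# Hypothetical service-principal credentials; key names are illustrative.
credentials_dict = {
    "azure_storage_account_name": account_name,
    "azure_storage_client_id": client_id,
    "azure_storage_client_secret": client_secret,
    "azure_storage_tenant_id": tenant_id,
}

# Hand write_delta the table *folder* as a URI and let delta-rs create the
# data files and the _delta_log directory itself, rather than writing
# through a single fsspec file handle.
pdf.write_delta(
    f"abfss://{container_name}@{account_name}.dfs.core.windows.net/way/to/{path}",
    mode="append",
    storage_options=credentials_dict,
)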
See here for the key fields that are compatible with credentials_dict.
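This works because delta-rs (which Polars uses under the hood for write_delta) talks to the object store directly and can create as many files as it needs under the table folder, whereas fs.open only ever hands it a single blob to write into.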