Connection error when accessing a Delta table in S3 using the delta-rs Python deltalake package

I am trying to read a Delta table in S3 using the delta-rs deltalake library. To access S3 I need to pass aws_access_key_id, aws_secret_access_key, and an SSL certificate. With the code below (which does not use a certificate):

from deltalake import DeltaTable
from pyarrow import fs

storage_options = {"AWS_ACCESS_KEY_ID": "aws_my_key",
                   "AWS_SECRET_ACCESS_KEY": "my_secret",
                   "AWS_ENDPOINT_URL": "https://domain.company.com"}


url = 's3://bucket/sales_data/'

raw_fs, normalized_path = fs.FileSystem.from_uri(url)
filesystem = fs.SubTreeFileSystem(normalized_path, raw_fs)

dt = DeltaTable(url, storage_options=storage_options)
ds = dt.to_pyarrow_dataset(filesystem=filesystem)

I get the following error:

deltalake.PyDeltaTableError: Failed to load checkpoint: Failed to read checkpoint content: Generic S3 error: Error performing get request sales_data/_delta_log/_last_checkpoint: response error "request error", after 0 retries: error sending request for url (https://domain.company.com/sales_data/_delta_log/_last_checkpoint): error trying to connect: The certificate was not trusted.

I have checked the delta-rs package and its documentation, but could not find a way to pass a certificate to it. From the error message, I guess that it checks some default path for certificates, but the message does not say which path.

Is there a way to pass a certificate during initialization?

There is 1 answer below

Passing AWS credentials through storage_options should work in delta-rs v0.6.0 and newer; see the documentation [here]:

from deltalake import DeltaTable

storage_options = {"AWS_ACCESS_KEY_ID": "...", "AWS_SECRET_ACCESS_KEY": "..."}
dt = DeltaTable("s3://<bucket>/<path>", storage_options=storage_options)
dt.to_pyarrow_table().to_pydict()
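
The snippet above covers the credentials but not the certificate. One thing that may be worth trying is pointing the process at your company's CA bundle through the SSL_CERT_FILE environment variable before the table is opened. This is only a sketch under assumptions: whether SSL_CERT_FILE is honored depends on the TLS backend and platform your deltalake wheel uses (OpenSSL-based and rustls-native-certs-based builds generally read it, while on macOS the certificate may need to be trusted in the system keychain instead). The helper names trust_ca_bundle and open_table, and the bundle path in the usage comment, are hypothetical.

```python
import os


def trust_ca_bundle(ca_bundle_path, env=os.environ):
    """Point TLS clients that honor SSL_CERT_FILE at a custom CA bundle.

    Assumption: the TLS backend inside deltalake reads SSL_CERT_FILE.
    This is not a documented delta-rs option.
    """
    path = os.path.abspath(ca_bundle_path)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"CA bundle not found: {path}")
    env["SSL_CERT_FILE"] = path
    return path


def open_table(url, storage_options, ca_bundle_path=None):
    """Open a Delta table, optionally registering a CA bundle first."""
    if ca_bundle_path is not None:
        # Must happen before the first HTTPS request is made.
        trust_ca_bundle(ca_bundle_path)
    # Imported lazily so the helper above can be used without deltalake.
    from deltalake import DeltaTable

    return DeltaTable(url, storage_options=storage_options)


# Usage (hypothetical bundle path, placeholders kept from the question):
# storage_options = {
#     "AWS_ACCESS_KEY_ID": "...",
#     "AWS_SECRET_ACCESS_KEY": "...",
#     "AWS_ENDPOINT_URL": "https://domain.company.com",
# }
# dt = open_table("s3://bucket/sales_data/", storage_options,
#                 ca_bundle_path="/path/to/company-ca.pem")
# table = dt.to_pyarrow_table()
```

Note that the pyarrow filesystem built with fs.FileSystem.from_uri(url) in the question does not receive anything from storage_options, so the sketch reads the data with to_pyarrow_table() and lets deltalake use its own storage backend.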