DatasetGenerationError: An error occurred while generating the dataset


I'm trying to load my PubLayNet dataset from an S3 bucket into Databricks using Hugging Face datasets, like this:

from datasets import load_dataset

dataset_id = "/dbfs/mnt/ocr/dataset/publaynet"
dataset = load_dataset(dataset_id, data_files={"train": "/dbfs/mnt/ocr/dataset/publaynet/train.json", "validation": "/dbfs/mnt/ocr/dataset/publaynet/val.json"}, split="train", cache_dir="./cache")

My S3 bucket is laid out as shown in the screenshot below:

[Screenshot: S3 bucket layout]

I'm getting this error in Databricks:

[Screenshot: error output in Databricks]
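
The full error text is not reproduced here, so the actual cause is unknown. One possibility worth checking, assuming train.json and val.json use a COCO-style layout (a single top-level object with "images" and "annotations" lists), is that the JSON loader cannot turn that one nested object into rows. The sketch below uses only the standard library to flatten such a file into JSON Lines, one record per image. The function name `coco_to_jsonl`, the output field names, and the file layout are all assumptions for illustration, not anything confirmed by the question.

```python
import json
from pathlib import Path


def coco_to_jsonl(coco_path, jsonl_path):
    """Flatten an (assumed) COCO-style file into one JSON record per image.

    Returns the number of records written. The whole input file is read
    into memory, which may be slow for a very large annotation file.
    """
    coco = json.loads(Path(coco_path).read_text(encoding="utf-8"))

    # Group annotations by the image they belong to.
    by_image = {}
    for ann in coco.get("annotations", []):
        by_image.setdefault(ann["image_id"], []).append(ann)

    count = 0
    with open(jsonl_path, "w", encoding="utf-8") as out:
        for img in coco.get("images", []):
            record = {
                "image_id": img["id"],
                "file_name": img["file_name"],
                "width": img.get("width"),
                "height": img.get("height"),
                # Keep only the fields needed for layout detection.
                "annotations": [
                    {"bbox": a["bbox"], "category_id": a["category_id"]}
                    for a in by_image.get(img["id"], [])
                ],
            }
            out.write(json.dumps(record) + "\n")
            count += 1
    return count
```

If the assumption holds, the flattened file could then be loaded with the generic JSON builder, for example `load_dataset("json", data_files={"train": "train.jsonl"})`, where `train.jsonl` is a hypothetical output path.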
