How to load a 10% split of the Common Voice Spanish dataset from HF into Colab?


I am trying to load only 10% of the Spanish ("es") Common Voice dataset in Colab, since the full dataset is too large. However, it still downloads the complete dataset from Hugging Face. I have tried two approaches, slicing by percentage and slicing by row count, but neither worked: the whole dataset is still downloaded, and the session crashes. How can I solve this?

common_voice["train"] = load_dataset("mozilla-foundation/common_voice_16_1", "es", split="train[:10%]", use_auth_token=True)

Downloading only 10 rows did not work either:

common_voice["train"] = load_dataset("mozilla-foundation/common_voice_16_1", "es", split="train[:10]", use_auth_token=True)

I also tried:

common_voice["train"] = load_dataset("mozilla-foundation/common_voice_16_1", "es",
                                     split=ReadInstruction('train', to=10, unit='%'))

In short, I just want to download a small slice of the dataset instead of the whole thing.
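
For reference, split slicing such as train[:10%] is applied only after the full archives have been downloaded, so it cannot reduce the download size. A possible workaround is streaming mode, which iterates over the remote data without downloading the archives. Below is a minimal sketch, assuming a recent version of the datasets library and that you have accepted the dataset's terms on the Hugging Face Hub; note that streaming does not support percentage slices, so you take a fixed number of examples instead (the 500 here is an arbitrary example).

from datasets import load_dataset, Dataset

# Stream the split instead of downloading the full archives.
# `token=True` replaces `use_auth_token=True` in newer releases of datasets.
streamed = load_dataset(
    "mozilla-foundation/common_voice_16_1",
    "es",
    split="train",
    streaming=True,
    token=True,
)

# Keep only the first 500 examples (percentage slices are not
# supported in streaming mode, so use an absolute count).
subset = streamed.take(500)

# Optionally materialize the slice as a regular Dataset.
def gen():
    yield from subset

common_voice = {}
common_voice["train"] = Dataset.from_generator(gen, features=streamed.features)

Materializing is optional: the streamed IterableDataset can also be passed directly to .map() and to most training loops.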
