How to load a 10% split of the Common Voice Spanish dataset from HF into Colab?


I am trying to load only 10% of the Spanish ("es") Common Voice dataset in Colab, since the full dataset is too large. However, it still downloads the complete dataset from Hugging Face. I have tried two approaches, slicing by percentage and slicing by row count, but neither worked: the whole dataset is still downloaded, and the session crashes. How can I solve this?

common_voice["train"] = load_dataset("mozilla-foundation/common_voice_16_1", "es", split="train[:10%]", use_auth_token=True)

Downloading only 10 rows did not work either:

common_voice["train"] = load_dataset("mozilla-foundation/common_voice_16_1", "es", split="train[:10]", use_auth_token=True)

I also tried:

common_voice["train"] = load_dataset("mozilla-foundation/common_voice_16_1", "es",
                                     split=ReadInstruction('train', to=10, unit='%'))

In short, I just want to download a small slice of the dataset instead of the whole thing.
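
For reference, split slicing such as train[:10%] is applied only after the full archives have been downloaded, so it cannot reduce the download size. A possible workaround is streaming mode, which iterates over the remote data without downloading the archives. Below is a minimal sketch, assuming a recent version of the datasets library and that you have accepted the dataset's terms on the Hugging Face Hub; note that streaming does not support percentage slices, so you take a fixed number of examples instead (the 500 here is an arbitrary example).

from datasets import load_dataset, Dataset

# Stream the split instead of downloading the full archives.
# `token=True` replaces `use_auth_token=True` in newer releases of datasets.
streamed = load_dataset(
    "mozilla-foundation/common_voice_16_1",
    "es",
    split="train",
    streaming=True,
    token=True,
)

# Keep only the first 500 examples (percentage slices are not
# supported in streaming mode, so use an absolute count).
subset = streamed.take(500)

# Optionally materialize the slice as a regular Dataset.
def gen():
    yield from subset

common_voice = {}
common_voice["train"] = Dataset.from_generator(gen, features=streamed.features)

Materializing is optional: the streamed IterableDataset can also be passed directly to .map() and to most training loops.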
