I am trying to load only 10% of the Spanish ("es") Common Voice dataset from Colab, since the full dataset is too large, but `load_dataset` still downloads the complete dataset from Hugging Face. I have tried two approaches, slicing by percentage and by row count, and neither worked: the whole dataset is still downloaded, so the session crashes. How can I solve this?
common_voice["train"] = load_dataset("mozilla-foundation/common_voice_16_1", "es", split="train[:10%]", use_auth_token=True)
Not even downloading only 10 rows worked:
common_voice["train"] = load_dataset("mozilla-foundation/common_voice_16_1", "es", split="train[:10]", use_auth_token=True)
I also tried:
from datasets import ReadInstruction

common_voice["train"] = load_dataset("mozilla-foundation/common_voice_16_1", "es",
                                     split=ReadInstruction('train', to=10, unit='%'))
In short: I want to download only a small slice of the dataset, not the full archive.
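One workaround I am considering is streaming mode, which (as I understand the `datasets` docs) iterates over the remote files instead of downloading the full archive first; split slicing like `train[:10%]` is otherwise applied only after the complete download. This is a hedged sketch, assuming `datasets` is installed, a Hugging Face token is configured, and `streaming=True`/`token` behave as documented:

```python
from itertools import islice


def take_first(examples, n):
    """Materialize the first n items of any iterable (e.g. a streamed
    dataset) as a plain list, without consuming the rest."""
    return list(islice(examples, n))


def load_spanish_sample(n_rows=10):
    """Sketch: stream Common Voice Spanish and keep only the first
    n_rows examples. Not run here because it needs network access and
    an authenticated Hugging Face account."""
    from datasets import load_dataset
    stream = load_dataset(
        "mozilla-foundation/common_voice_16_1", "es",
        split="train",
        streaming=True,  # iterate remotely instead of downloading everything
        token=True,      # newer replacement for the deprecated use_auth_token
    )
    return take_first(stream, n_rows)
```

Note that streaming returns an `IterableDataset`, so anything that needs random access or a fixed length would have to materialize the slice first, as `take_first` does.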