Error generating dataset from huggingface


I am trying to load the bitext/Bitext-customer-support-llm-chatbot-training-dataset dataset from Hugging Face, but the same error occurs every time I run my script and the loading process never completes. The script loads the dataset, groups each instruction and response by intent, and writes the result to a JSON file:

import json
from datasets import load_dataset

# load the dataset
dataset = load_dataset("bitext/Bitext-customer-support-llm-chatbot-training-dataset", download_mode="force_redownload")
print(dataset)
# dataset = dataset['train'][0]


# Group instructions and responses by intent, keeping first-seen order
intents_by_tag = {}

# Loop through the dataset
for data in dataset['train']:
    intent = data['intent']

    # If the intent does not exist yet, create a new entry
    if intent not in intents_by_tag:
        intents_by_tag[intent] = {
            "tag": intent,
            "patterns": [],
            "responses": []
        }

    # Add the question and the response to the corresponding intent
    intents_by_tag[intent]['patterns'].append(data['instruction'])
    intents_by_tag[intent]['responses'].append(data['response'])

# Convert to the list-of-intents layout written to the JSON file
intents = list(intents_by_tag.values())

# save intents to json file
with open("intents/intents-bitext-01.json", "w") as f:
    json.dump(intents, f, indent=4)
    
print("Extraction complete")

Running it produces the following output and traceback:

Downloading readme: 100%|█████████████████████████████████████████████████████████| 11.3k/11.3k [00:00<00:00, 11.3MB/s]
Downloading data: 100%|████████████████████████████████████████████████████████████| 19.2M/19.2M [01:34<00:00, 203kB/s]
Downloading data files: 100%|████████████████████████████████████████████████████████████| 1/1 [01:34<00:00, 94.54s/it]
Extracting data files: 100%|████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 111.42it/s]
Generating train split: 0 examples [00:00, ? examples/s]
Traceback (most recent call last):
  File "C:\Programming\CSB\csb_venv\Lib\site-packages\datasets\builder.py", line 1908, in _prepare_split_single
    writer = writer_class(
             ^^^^^^^^^^^^^
  File "C:\Programming\CSB\csb_venv\Lib\site-packages\datasets\arrow_writer.py", line 335, in __init__
    self.stream = self._fs.open(fs_token_paths[2][0], "wb")
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Programming\CSB\csb_venv\Lib\site-packages\fsspec\spec.py", line 1307, in open
    f = self._open(
        ^^^^^^^^^^^
  File "C:\Programming\CSB\csb_venv\Lib\site-packages\fsspec\implementations\local.py", line 180, in _open
    return LocalFileOpener(path, mode, fs=self, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Programming\CSB\csb_venv\Lib\site-packages\fsspec\implementations\local.py", line 302, in __init__
    self._open()
  File "C:\Programming\CSB\csb_venv\Lib\site-packages\fsspec\implementations\local.py", line 307, in _open
    self.f = open(self.path, mode=self.mode)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'C:/Users/ASAA/.cache/huggingface/datasets/bitext___bitext-customer-support-llm-chatbot-training-dataset/default-21d4b5f37915169d/0.0.0/eea64c71ca8b46dd3f537ed218fc9bf495d5707789152eb2764f5c78fa66d59d.incomplete/bitext-customer-support-llm-chatbot-training-dataset-train-00000-00000-of-NNNNN.arrow'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Programming\CSB\extract.py", line 5, in <module>
    dataset = load_dataset("bitext/Bitext-customer-support-llm-chatbot-training-dataset", download_mode="force_redownload")
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Programming\CSB\csb_venv\Lib\site-packages\datasets\load.py", line 2152, in load_dataset
    builder_instance.download_and_prepare(
  File "C:\Programming\CSB\csb_venv\Lib\site-packages\datasets\builder.py", line 948, in download_and_prepare
    self._download_and_prepare(
  File "C:\Programming\CSB\csb_venv\Lib\site-packages\datasets\builder.py", line 1043, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "C:\Programming\CSB\csb_venv\Lib\site-packages\datasets\builder.py", line 1805, in _prepare_split
    for job_id, done, content in self._prepare_split_single(
  File "C:\Programming\CSB\csb_venv\Lib\site-packages\datasets\builder.py", line 1950, in _prepare_split_single
    raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset

What could be the problem, and how can I fix it?
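
One possibility worth checking (an assumption, not a confirmed diagnosis): the path in the FileNotFoundError looks longer than 260 characters, which is the traditional Windows MAX_PATH limit when long-path support is not enabled. The sketch below only measures the path copied from the traceback. The helper name `exceeds_windows_limit` and the split into cache root and dataset-specific part are illustrative choices, not part of the `datasets` library.

```python
# Traditional Windows limit (MAX_PATH); it includes the terminating null,
# so a path of 260 or more characters is already too long.
WINDOWS_MAX_PATH = 260

# Cache root and dataset-specific part, copied from the FileNotFoundError above.
cache_root = "C:/Users/ASAA/.cache/huggingface/datasets/"
dataset_part = (
    "bitext___bitext-customer-support-llm-chatbot-training-dataset/"
    "default-21d4b5f37915169d/0.0.0/"
    "eea64c71ca8b46dd3f537ed218fc9bf495d5707789152eb2764f5c78fa66d59d.incomplete/"
    "bitext-customer-support-llm-chatbot-training-dataset-train-00000-00000-of-NNNNN.arrow"
)
failing_path = cache_root + dataset_part


def exceeds_windows_limit(path: str, limit: int = WINDOWS_MAX_PATH) -> bool:
    """Return True if the path is too long for the classic MAX_PATH limit."""
    return len(path) >= limit


# Characters left for the cache root if the dataset-specific part stays the same.
root_budget = (WINDOWS_MAX_PATH - 1) - len(dataset_part)

print("full path length:", len(failing_path))
print("too long for MAX_PATH:", exceeds_windows_limit(failing_path))
print("characters available for the cache root:", root_budget)
```

If the length does turn out to be the cause, the budget printed above shows how short a cache root would have to be. `load_dataset` accepts a `cache_dir` argument for pointing the cache somewhere else, but because most of the length comes from the dataset name and hash, a shorter cache directory alone may not be enough. In that case enabling long-path support in Windows would be the other thing to try.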
