Python random seed over pandas and huggingface

Asked by Mehdi Abbassi on 28 June 2025 at 15:15 (70 views)

I am trying to reproduce the results of a research paper whose dataset is available on the Hugging Face Hub. The paper splits the dataset into training and test sets with pandas, using a fixed random seed. Here is the code snippet used in the paper:

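import pandas as pd

# new_df is the paper's DataFrame holding the full dataset (not defined in this snippet)
train_size = 0.8
# Randomly draw 80% of the rows; random_state fixes the draw
train_dataset = new_df.sample(frac=train_size, random_state=200)
# The remaining 20% of the rows form the test set
test_dataset = new_df.drop(train_dataset.index).reset_index(drop=True)
train_dataset = train_dataset.reset_index(drop=True)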

However, since I cannot call the pandas method on a DatasetDict, I tried to split the dataset with the Hugging Face datasets API instead, using the same random seed. Unfortunately, this produced a different split. Here is the code snippet for my approach:

ds = dataset["train"].train_test_split(test_size=0.2, seed=200, shuffle=False)

Could you please suggest a way to split the dataset that would result in the same training and testing sets specified in the paper?
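
One possible approach, sketched here under assumptions rather than confirmed against the paper: the two snippets are unlikely to agree, because with shuffle=False train_test_split keeps the rows in their original order (so the seed should have no effect), and pandas draws its sample from its own NumPy random state. Instead of matching seeds across libraries, the pandas draw itself can be replayed on row positions. The helper name pandas_split_indices is hypothetical, and the sketch assumes that new_df had a default 0..n-1 index and the same rows, in the same order, as dataset["train"]; if the paper filtered or reordered rows first, or used a pandas version that samples differently, the result will differ.

```python
import pandas as pd


def pandas_split_indices(n_rows, frac=0.8, seed=200):
    """Replay the paper's pandas split on row positions only.

    Hypothetical helper. Assumes the paper's DataFrame had a default
    RangeIndex, so index labels are the same as row positions.
    """
    # A Series of length n_rows with the default 0..n_rows-1 index.
    positions = pd.Series(range(n_rows))
    # Same call as in the paper: which rows are drawn depends only on
    # the number of rows, frac, and random_state.
    train = positions.sample(frac=frac, random_state=seed)
    # Everything that was not drawn, in original order, is the test set.
    test = positions.drop(train.index)
    return train.index.tolist(), test.index.tolist()


if __name__ == "__main__":
    train_idx, test_idx = pandas_split_indices(10)
    print(len(train_idx), len(test_idx))
```

The two index lists could then be passed to the datasets API, for example dataset["train"].select(train_idx) and dataset["train"].select(test_idx), which keeps everything as Dataset objects. An alternative is to convert with dataset["train"].to_pandas(), run the paper's snippet unchanged, and convert back with Dataset.from_pandas.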
