I am currently wrangling data for a brand-new project, and it is proving to be a challenge.

I have EEG data that has been preprocessed with EEGLAB in MATLAB, and I would like to load it into Python to train a classifier. I also have a .csv file with each individual's subject ID, along with a number (1, 2 or 3) indicating which third of the sample they are in.

Currently, the data are saved as .mat files, one per individual (104 in total). Each file contains an array shaped 64x2000x700: 64 channels, 2000 data points per 2-second segment (sampling frequency of 1000 Hz), and 700 segments. I would like to load each participant's data into a dataframe alongside their subject ID and classification score.

I tried this:

import glob
import os

import pandas as pd
from scipy.io import loadmat

all_files = glob.glob(os.path.join(path, "*.mat"))

lang_class = pd.read_csv("TestLangLabels.csv")

df_dict = {}

for file in all_files:
    # Use the file name (without extension) as the dictionary key
    file_name = os.path.splitext(os.path.basename(file))[0]
    # loadmat returns a plain dict of variables, not a DataFrame
    df_dict[file_name] = loadmat(file, appendmat=False)

But the files are so large that this maxes out my memory and doesn't complete.
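A rough estimate shows why holding everything in memory at once fails. The sketch below assumes the arrays are stored as 64-bit floats (MATLAB's default numeric type); if EEGLAB saved them in single precision, the float32 figure applies instead. The helper name `footprint_gb` is made up for illustration.

```python
import math

import numpy as np

# Shape of one participant's array as described above:
# 64 channels x 2000 samples x 700 segments.
shape = (64, 2000, 700)
n_subjects = 104


def footprint_gb(shape, dtype, n_subjects=1):
    """In-memory size in gigabytes (1 GB = 1e9 bytes)."""
    n_values = math.prod(shape)
    return n_values * np.dtype(dtype).itemsize * n_subjects / 1e9


per_subject_f64 = footprint_gb(shape, np.float64)
all_f64 = footprint_gb(shape, np.float64, n_subjects)
all_f32 = footprint_gb(shape, np.float32, n_subjects)

print(f"one subject, float64:  {per_subject_f64:.2f} GB")  # 0.72 GB
print(f"all subjects, float64: {all_f64:.2f} GB")          # 74.55 GB
print(f"all subjects, float32: {all_f32:.2f} GB")          # 37.27 GB
```

Under these assumptions the full dataset needs roughly 75 GB as float64, so loading every file into one dictionary will exhaust the RAM of most workstations.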

Then I tried building the dataframe in a loop with pandas:


main_dataframe = pd.DataFrame(loadmat(all_files[0]))

for i in range(1, len(all_files)):
    data = loadmat(all_files[i])
    df = pd.DataFrame(data)
    main_dataframe = pd.concat([main_dataframe, df], axis=1)

At that point I got the error: ValueError: Data must be 1-dimensional

Is there a way of doing this that I am overlooking, or will downsampling be inevitable?

subjectID    Data           Class
AA123        64x2000x700    2

I believe that something like this could then be used as a test/train dataset for my model, but I welcome any and all advice!
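One way to get this layout without holding every array in memory is to keep only the subject ID, file path and class in the dataframe and load each array on demand. The ValueError above most likely arises because loadmat returns a dict (the MATLAB variables plus metadata keys such as `__header__`), and pandas cannot build a column from a 3-D array. The following is a minimal sketch, not a definitive solution. It assumes the CSV columns are named `subjectID` and `Class`, that each file name (without extension) is the subject ID, and that the array is stored under a variable called `data`; all three are guesses, and the function names are made up for illustration.

```python
import glob
import os

import numpy as np
import pandas as pd
from scipy.io import loadmat


def build_manifest(mat_dir, labels_csv, id_col="subjectID", class_col="Class"):
    """One row per participant: subject ID, path to the .mat file, class."""
    paths = sorted(glob.glob(os.path.join(mat_dir, "*.mat")))
    files = pd.DataFrame({
        # Assumption: the file name without extension is the subject ID.
        id_col: [os.path.splitext(os.path.basename(p))[0] for p in paths],
        "path": paths,
    })
    labels = pd.read_csv(labels_csv, dtype={id_col: str})
    # Inner join keeps only participants with both a file and a label.
    return files.merge(labels[[id_col, class_col]], on=id_col, how="inner")


def load_subject(path, var_name="data", dtype=np.float32):
    """Load one (channels, samples, segments) array from a .mat file."""
    # variable_names restricts loading to the one variable we need.
    mat = loadmat(path, variable_names=[var_name])
    # Casting to float32 halves the footprint if the data are float64.
    return np.asarray(mat[var_name], dtype=dtype)


def iter_subjects(manifest, id_col="subjectID", class_col="Class",
                  var_name="data"):
    """Yield (subject_id, array, label) with one array in memory at a time."""
    rows = zip(manifest[id_col], manifest["path"], manifest[class_col])
    for subject_id, path, label in rows:
        yield subject_id, load_subject(path, var_name), label
```

A call such as `manifest = build_manifest(path, "TestLangLabels.csv")` would then give the small table, and `for subject_id, eeg, label in iter_subjects(manifest): ...` would process one participant at a time, for example to extract features before training. Note that scipy.io.loadmat does not read files saved in MATLAB's v7.3 format; those are HDF5 files and need a reader such as h5py.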

Thank you in advance.

There is 1 best solution below

Is there a reason you have such a high sampling rate? I don't believe I've heard a compelling reason to go over 512 Hz, and I normally take it down to 256 Hz. I don't know if it matters for ML, but most other approaches really don't need that. Going from 1000 Hz to 500 Hz or even 250 Hz might help.
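As a rough illustration of that suggestion, the sketch below downsamples one participant's array along the time axis with scipy.signal.decimate, which applies a low-pass anti-aliasing filter before dropping samples. It assumes the array is laid out as (channels, samples, segments) as in the question; the function name `downsample` and the 250 Hz target are only examples.

```python
import numpy as np
from scipy.signal import decimate


def downsample(eeg, fs_in=1000, fs_out=250, time_axis=1):
    """Reduce the sampling rate by an integer factor along time_axis."""
    if fs_in % fs_out != 0:
        raise ValueError("fs_in must be an integer multiple of fs_out")
    q = fs_in // fs_out
    # Filter in float64 for numerical safety, then store as float32.
    x = np.asarray(eeg, dtype=np.float64)
    # zero_phase=True filters forwards and backwards, so no time shift.
    y = decimate(x, q, axis=time_axis, zero_phase=True)
    return y.astype(np.float32)
```

Going from 1000 Hz to 250 Hz turns each 64x2000x700 array into 64x500x700, a fourfold reduction, and storing the result as float32 halves it again if the original was float64. Whether the discarded frequencies above 125 Hz matter depends on the features the classifier is expected to use.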