I am currently data wrangling on a very new project, and it is proving a challenge.
I have EEG data that has been preprocessed with EEGLAB in MATLAB, and I would like to load it into Python to train a classifier. I also have a .csv file with each individual's subject ID, along with a number (1, 2 or 3) indicating which third of the sample they are in.
Currently, I have the data saved as .mat files, one per individual (104 in total), each containing an array shaped 64x2000x700 (64 channels, 2000 data points per 2-second segment at a sampling frequency of 1000 Hz, 700 segments). I would like to load each participant's data into a dataframe alongside their subject ID and classification score.
I tried this:
import glob
import os

import pandas as pd
from scipy.io import loadmat

all_files = glob.glob(os.path.join(path, "*.mat"))
lang_class = pd.read_csv("TestLangLabels.csv")
df_dict = {}
for file in all_files:
    # Use the file name (without extension) as the dictionary key;
    # loadmat returns a plain dict, so there is no index name to set
    file_name = os.path.splitext(os.path.basename(file))[0]
    df_dict[file_name] = loadmat(file, appendmat=False)
But the files are so large that this maxes out my memory and doesn't complete.
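A rough back-of-the-envelope estimate shows the scale of the problem. The sketch below uses only the numbers given above; the 8 bytes per value is an assumption (MATLAB's default double precision), so the real figure may differ if the arrays are stored as single precision.

```python
# Sizes taken from the description above
n_subjects = 104
n_channels, n_samples, n_segments = 64, 2000, 700
bytes_per_value = 8  # assumption: float64 (MATLAB double)

values_per_subject = n_channels * n_samples * n_segments
bytes_per_subject = values_per_subject * bytes_per_value
total_bytes = bytes_per_subject * n_subjects

print(f"per subject:  {bytes_per_subject / 1e9:.2f} GB")  # 0.72 GB
print(f"all subjects: {total_bytes / 1e9:.2f} GB")        # 74.55 GB
```

Under that assumption, holding all 104 subjects in memory at once needs roughly 75 GB, and casting to float32 would only halve it.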
Then I tried building up a single pandas dataframe in a loop:
main_dataframe = pd.DataFrame(loadmat(all_files[0]))
for i in range(1, len(all_files)):
    data = loadmat(all_files[i])
    df = pd.DataFrame(data)
    main_dataframe = pd.concat([main_dataframe, df], axis=1)
At which point I got the error:
ValueError: Data must be 1-dimensional
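For reference, the error can be reproduced on a tiny stand-in. `loadmat` returns a dict of metadata keys plus the saved variables, and a DataFrame column has to be 1-dimensional, so passing a dict that holds a 3-D array fails. The sketch below imitates such a dict by hand; the variable name `data` and the small shape are assumptions, and the exact error wording may vary with the pandas version. The reshape at the end is just one possible 2-D layout (one row per segment), not necessarily the right one for a given model.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for one subject's array: channels x samples x segments
eeg = np.zeros((4, 10, 3))

# Hand-built imitation of what loadmat returns ("data" is a hypothetical name)
mat = {
    "__header__": b"MATLAB 5.0 MAT-file",
    "__version__": "1.0",
    "__globals__": [],
    "data": eeg,
}

try:
    pd.DataFrame(mat)
    raised = False
except ValueError as err:
    raised = True
    print(f"ValueError: {err}")

# One possible 2-D layout: one row per segment, channels * samples columns
flat = eeg.transpose(2, 0, 1).reshape(eeg.shape[2], -1)
segments_df = pd.DataFrame(flat)
print(segments_df.shape)  # (3, 40)
```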
Is there a way of doing this that I am overlooking, or will downsampling be inevitable?
| subjectID | Data | Class |
|---|---|---|
| AA123 | 64x2000x700 | 2 |
I believe that something like this could then be used as a test/train dataset for my model, but welcome any and all advice!
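One way to get a table of this shape without holding every array in memory is to keep only the file path in the dataframe and load each subject's array on demand. The following is a minimal sketch under several assumptions: the label file has columns named `subjectID` and `Class`, each file is named `<subjectID>.mat`, and the array is stored under the MATLAB variable name `data`. The helper names `build_manifest` and `iter_subjects` are hypothetical, and the demo runs on tiny synthetic files rather than real EEG data.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from scipy.io import loadmat, savemat


def build_manifest(mat_dir, labels):
    """Return one row per subject: ID, class, and path to its .mat file."""
    manifest = labels.copy()
    manifest["path"] = [
        os.path.join(mat_dir, f"{sid}.mat") for sid in manifest["subjectID"]
    ]
    return manifest


def iter_subjects(manifest, var_name="data"):
    """Yield (subject_id, array, label), loading one file at a time."""
    for row in manifest.itertuples(index=False):
        mat = loadmat(row.path, variable_names=[var_name])
        # float32 halves the memory of a float64 array
        yield row.subjectID, mat[var_name].astype(np.float32), row.Class


# Demo on two tiny synthetic files (4 channels x 10 samples x 3 segments)
with tempfile.TemporaryDirectory() as tmp:
    labels = pd.DataFrame({"subjectID": ["AA123", "BB456"], "Class": [2, 1]})
    for sid in labels["subjectID"]:
        savemat(os.path.join(tmp, f"{sid}.mat"), {"data": np.random.rand(4, 10, 3)})
    manifest = build_manifest(tmp, labels)
    loaded = [(sid, x.shape, label) for sid, x, label in iter_subjects(manifest)]

print(loaded)
```

Only one subject's array is in memory at any point in the loop, so features can be extracted or batches written out per subject. Note that `scipy.io.loadmat` does not read files saved in MATLAB's v7.3 format; those need an HDF5 reader instead.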
Thank you in advance.
Is there a reason you have such a high sampling rate? I don't believe I've heard a compelling reason to go over 512 Hz, and I normally take it down to 256 Hz. I don't know if it matters for ML, but most other approaches really don't need that. Going from 1000 Hz to 500 Hz or even 250 Hz might help.
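As a rough sketch of what that downsampling could look like in Python, `scipy.signal.decimate` applies an anti-aliasing filter and then keeps every q-th sample along a chosen axis. The example assumes the channels x samples x segments layout from the question and a target of 250 Hz, and it runs on a small random stand-in array rather than real data.

```python
import numpy as np
from scipy.signal import decimate

fs_in, fs_out = 1000, 250
q = fs_in // fs_out  # integer downsampling factor (4)

# Small stand-in: 4 channels x 2000 samples x 3 segments
eeg = np.random.randn(4, 2000, 3)

# Filter and downsample along the time axis (axis=1)
eeg_ds = decimate(eeg, q, axis=1)

print(eeg.shape, "->", eeg_ds.shape)  # (4, 2000, 3) -> (4, 500, 3)
```

Going from 1000 Hz to 250 Hz cuts the data to a quarter of its size, which on its own may or may not be enough to fit all 104 subjects in memory.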