I am working with a dataset in .mat format. It contains nested arrays, and I need the data in CSV format.
What is the most effective way to convert this nested .mat dataset to CSV? My dataset link: https://ora.ox.ac.uk/objects/uuid:03ba4b01-cfed-46d3-9b1a-7d4a7bdf6fac/files/m5ac36a1e2073852e4f1f7dee647909a7
import numpy as np
import pandas as pd
import scipy.io as sio
mat = sio.loadmat('Oxford_Battery_Degradation_Dataset_1.mat')
mat
My output:
{'__header__': b'MATLAB 5.0 MAT-file, Platform: PCWIN64, Created on: Mon Jun 05 11:16:25 2017',
'__version__': '1.0',
'__globals__': [],
'Cell1': array([[(array([[(array([[(array([[735954.85896553],
[735954.8589771 ],
[735954.85898867],
...,
[735954.8995558 ],
dtype=[('t', 'O'), ('v', 'O'), ('q', 'O'), ('T', 'O')])) ]],
dtype=[('C1ch', 'O'), ('C1dc', 'O'), ('OCVch', 'O'), ('OCVdc', 'O')])) ]],
dtype=[('cyc0000', 'O'), ('cyc0100', 'O'), ('cyc0300', 'O'), ('cyc0400', 'O'), ('cyc0500', 'O'), ('cyc0600', 'O'), ('cyc0700', 'O'), ('cyc0800', 'O'), ('cyc0900', 'O'), ('cyc1000', 'O'), ('cyc1100', 'O'), ('cyc1200', 'O'), ('cyc1300', 'O'), ('cyc1400', 'O'), ('cyc1600', 'O'), ('cyc1800', 'O'), ('cyc1900', 'O'), ('cyc2000', 'O'), ('cyc2100', 'O'), ('cyc2200', 'O'), ('cyc2300', 'O'), ('cyc2400', 'O'), ('cyc2500', 'O'), ('cyc2600', 'O'), ('cyc2700', 'O'), ('cyc2800', 'O'), ('cyc2900', 'O'), ('cyc3000', 'O'), ('cyc3100', 'O'), ('cyc3200', 'O'), ('cyc3300', 'O'), ('cyc3500', 'O'), ('cyc3600', 'O'), ('cyc3700', 'O'), ('cyc3800', 'O'), ('cyc3900', 'O'), ('cyc4000', 'O'), ('cyc4100', 'O'), ('cyc4200', 'O'), ('cyc4300', 'O'), ('cyc4400', 'O'), ('cyc4500', 'O'), ('cyc4600', 'O'), ('cyc4800', 'O'), ('cyc5000', 'O'), ('cyc5100', 'O'), ('cyc5200', 'O'), ('cyc5300', 'O'), ('cyc5400', 'O'), ('cyc5500', 'O'), ('cyc5600', 'O'), ('cyc5700', 'O'), ('cyc5800', 'O'), ('cyc5900', 'O'), ('cyc6000', 'O'), ('cyc6100', 'O'), ('cyc6200', 'O'), ('cyc6300', 'O'), ('cyc6400', 'O'), ('cyc6500', 'O'), ('cyc6600', 'O'), ('cyc6700', 'O'), ('cyc6800', 'O'), ('cyc6900', 'O'), ('cyc7000', 'O'), ('cyc7100', 'O'), ('cyc7200', 'O'), ('cyc7300', 'O'), ('cyc7400', 'O'), ('cyc7500', 'O'), ('cyc7600', 'O'), ('cyc7700', 'O'), ('cyc7800', 'O'), ('cyc7900', 'O'), ('cyc8000', 'O'), ('cyc8100', 'O')])}
I should end up with eight datasets, one per cell, whose columns correspond to the 't', 'v', 'q', and 'T' fields within the arrays. Here is a sample of the expected result for one cell:
cell8 = pd.DataFrame(columns=['Time', 'Voltage', 'Capacity', 'Temperature'])
cell8
I'm not sure you realize the volume of data you have here. I have code that can extract the data, but there are just over 61 million data items here. Printed as a CSV file, that comes out to about 2.5 gigabytes.
The start of this file looks like:
The first column runs Cell1 through Cell8. The second column has between 70 and 80 entries: cyc0000, cyc0100, etc. The third column has 4 entries: C1ch, C1dc, OCVch, OCVdc. The fourth column has 4 entries: t, v, q, T. You can't run the numbers across, because the size of the last dimension varies considerably, from 2,500 to 10,000 entries.
Followup:
Here is code that converts the mat file into a set of nested dicts. You can see on the last line how to access this. Maybe this will work for your purposes.
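A minimal sketch of this kind of conversion, not necessarily the exact code referred to above. It assumes the file is read with the default `scipy.io.loadmat` settings, so every MATLAB struct arrives as a 1x1 NumPy structured array with object fields, and that the `t`, `v`, `q`, and `T` arrays within one phase all have the same length. The helper names `to_nested_dict`, `load_cells`, and `write_cell_csv` are hypothetical, as is the output file name.

```python
import csv

import numpy as np


def to_nested_dict(value):
    """Recursively convert a loadmat struct array into plain nested dicts."""
    if isinstance(value, np.ndarray) and value.dtype.names is not None:
        # Assumption: each struct is 1x1, so take its single record.
        record = value.reshape(-1)[0]
        return {name: to_nested_dict(record[name]) for name in value.dtype.names}
    if isinstance(value, np.ndarray):
        # Leaf data arrives as an N x 1 column; flatten it to 1-D.
        return value.ravel()
    return value


def load_cells(path):
    """Return {cell: {cycle: {phase: {field: 1-D array}}}} for a .mat file."""
    import scipy.io as sio  # imported here so the converter itself needs only NumPy

    mat = sio.loadmat(path)
    # Skip the '__header__', '__version__' and '__globals__' bookkeeping keys.
    return {key: to_nested_dict(val) for key, val in mat.items()
            if not key.startswith("__")}


def write_cell_csv(cell, out_path):
    """Write one cell in long format: one row per sample."""
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["cycle", "phase", "t", "v", "q", "T"])
        for cycle, phases in cell.items():
            for phase, fields in phases.items():
                # Assumption: the four arrays in a phase have equal length.
                columns = [np.asarray(fields[k]).ravel().tolist()
                           for k in ("t", "v", "q", "T")]
                writer.writerows([cycle, phase, *row] for row in zip(*columns))


# Usage (requires the dataset file to be present):
# cells = load_cells('Oxford_Battery_Degradation_Dataset_1.mat')
# voltage = cells['Cell1']['cyc0000']['C1ch']['v']
# write_cell_csv(cells['Cell8'], 'cell8.csv')
```

Writing one row per sample, with the cycle and phase repeated as leading columns, sidesteps the varying array lengths: nothing has to line up across a row except the four measurements taken at the same instant. Writing each cell to its own file also keeps any single CSV well below the full size.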