I am collecting time series data that can be separated into "tasks" based on a particular target value, and the tasks can be numbered by their associated target. The length of the data for each task differs, because a task may take more or less time to complete. Right now, in MATLAB, the data is separated by target number into a cell array. This is extremely convenient: the analysis is the same for the data associated with each target, so I can run it simply by looping over the cells with a for loop. As far as I know, the closest equivalent in Python is a ragged array. However, while researching this I found that automatic creation of ragged arrays in NumPy has been deprecated, and that to create one you must now set dtype=object explicitly. I have a few questions about this scenario:
1. Does setting dtype=object for the ragged array come with any inherent limitations on how the data within the array can be accessed?
2. Is there a more convenient way of saving these ragged arrays as NumPy files besides reducing the dimensionality from 3D to 2D and also saving a file with the associated index? That would be fairly inconvenient, as I have thousands of files that I would like to save as ragged arrays.
3. Related to 2, is saving the data as a .npz file any different in practice when it comes to saving an associated index? More specifically, would I be able to unpack the ragged arrays automatically, given that each one is technically stored as a separate .npy file, and assume that the data for each target is stored the same way in every file?
4. Most importantly, are ragged arrays really the best equivalent set-up for my task? Or do I get the deprecation warning about setting dtype=object because manipulating data this way has become redundant and Python 3 has a better method for dealing with stacked arrays of varying size?
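For context, here is a minimal sketch of the kind of object array I am asking about. The variable names and values are made up for illustration, and it assumes each task's data is a 1-D NumPy array:

```python
import numpy as np

# Hypothetical per-target recordings of different lengths.
task_a = np.array([0.1, 0.2, 0.3])
task_b = np.array([1.0, 1.1, 1.2, 1.3, 1.4])

# Explicit dtype=object gives a 1-D array whose elements are the sub-arrays.
ragged = np.array([task_a, task_b], dtype=object)

# Element access and looping work much like a MATLAB cell array.
lengths = [len(task) for task in ragged]
means = [float(task.mean()) for task in ragged]
print(ragged.shape, lengths)
```

Each element is still an ordinary array, so per-task analysis works as usual, but operations on the outer array are not vectorized across tasks.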
I have decided to move forward with a known solution to my problem, and it seems to be adapting well. I organize each set of separate data into its own array, and then store the arrays in sequence in a list, as I would with cells in MATLAB. To save this information, I stored the subsequent index value in a list as I separated out the data. By this I mean that:
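A minimal sketch of how this could look, assuming each task is a 1-D NumPy array. The names `tasks`, `flat` and `split_idx` are illustrative only, and bundling the two pieces with `np.savez` is one possible choice rather than a requirement (an in-memory buffer stands in for a real file here):

```python
import io
import numpy as np

# Hypothetical data: one array per target, each with a different length.
tasks = [np.arange(4.0), np.arange(10.0, 16.0), np.arange(20.0, 23.0)]

# Flatten into one array and record where each task after the first begins.
flat = np.concatenate(tasks)
split_idx = np.cumsum([len(t) for t in tasks])[:-1]

# Save both pieces together; BytesIO stands in for a real file path.
buffer = io.BytesIO()
np.savez(buffer, data=flat, split_idx=split_idx)
buffer.seek(0)

# Load and rebuild the list of per-task arrays.
with np.load(buffer) as archive:
    restored = np.split(archive["data"], archive["split_idx"])

for original, recovered in zip(tasks, restored):
    assert np.array_equal(original, recovered)
```

Because only plain numeric arrays are written, no dtype=object array is needed, and the list of per-task arrays can be looped over just like a MATLAB cell array.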
This solution is working quite well. I hope this helps someone in the future.