I am trying to classify accelerometer data (sampled at 100 Hz) into 4 different transportation modes (0, 1, 2, 3). I have 41 CSV files, each representing a time series, and I stored them in a list called subjects. Each CSV file looks as follows:
# Check if the label mapping worked
test = subjects[0]
print(test.head())
print(test.info())
print(len(test))
          x         y         z  label
0 -0.154881  0.383397 -0.653029      0
1 -0.189302  0.410185 -0.597840      0
2 -0.202931  0.408217 -0.490296      0
3 -0.205011  0.407853 -0.360820      0
4 -0.196665  0.430047 -0.147033      0
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 128628 entries, 0 to 128627
Data columns (total 4 columns):
 #   Column  Non-Null Count   Dtype
---  ------  --------------   -----
 0   x       128628 non-null  float64
 1   y       128628 non-null  float64
 2   z       128628 non-null  float64
 3   label   128628 non-null  int64
dtypes: float64(3), int64(1)
memory usage: 3.9 MB
None
128628
I would like to start by implementing a Random Forest algorithm. However, I am not sure how to create the train and test datasets, since the data is spread across different CSV files.
How can I create the train and test sets for this task? My first thought was to concatenate all CSV files, but since each file represents a time series, I am not sure whether that is the correct approach.
Thanks in advance for your help!
Here's a rough example of what you want to do:
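The key idea is to split at the file (subject) level rather than shuffling rows, then segment each series into fixed windows and summarize each window with simple features before fitting the forest. Below is a minimal sketch: the 200-sample window (2 s at 100 Hz) and the mean/std features are assumptions you should tune, and the synthetic subjects list is only a stand-in for the DataFrames you loaded from your CSV files.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

WINDOW = 200  # 2 s at 100 Hz -- an assumption, tune for your problem

def make_windows(df, window=WINDOW):
    """Slice one subject's series into non-overlapping windows and
    compute simple per-axis features (mean and std)."""
    feats, labels = [], []
    for start in range(0, len(df) - window + 1, window):
        chunk = df.iloc[start:start + window]
        # Skip windows that span a mode change, so each window has one label
        if chunk["label"].nunique() > 1:
            continue
        row = []
        for axis in ("x", "y", "z"):
            row += [chunk[axis].mean(), chunk[axis].std()]
        feats.append(row)
        labels.append(chunk["label"].iloc[0])
    return np.array(feats), np.array(labels)

def build_set(subject_list):
    """Stack the window features of several subjects into one matrix."""
    Xs, ys = zip(*(make_windows(s) for s in subject_list))
    return np.vstack(Xs), np.concatenate(ys)

# Synthetic stand-in for your `subjects` list (8 short recordings);
# replace this with the 41 DataFrames you loaded from the CSV files.
rng = np.random.default_rng(0)
subjects = []
for i in range(8):
    label = i % 4
    subjects.append(pd.DataFrame({
        "x": rng.normal(label, 1.0, 2000),
        "y": rng.normal(0.0, 1.0, 2000),
        "z": rng.normal(-label, 1.0, 2000),
        "label": label,
    }))

# Split at the *file* level, not the row level: windows from one
# recording are correlated, so mixing them across train and test
# would leak information and inflate the test score.
train_subjects, test_subjects = subjects[:6], subjects[6:]

X_train, y_train = build_set(train_subjects)
X_test, y_test = build_set(test_subjects)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
print(f"test accuracy: {acc:.2f}")
```

Once you tag each window with the file it came from, scikit-learn's GroupShuffleSplit or GroupKFold can automate this per-subject split for you.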