Accelerometer Data Classification

I am trying to classify accelerometer data (sampled at 100 Hz) into 4 transportation modes (labeled 0, 1, 2, 3). I have 41 CSV files, each representing one time series. I stored every file in a list called subjects. Each CSV file looks as follows:

    # Check if the label mapping worked
    test = subjects[0]
    print(test.head())
    print(test.info())
    print(len(test))
              x         y         z  label
    0 -0.154881  0.383397 -0.653029      0
    1 -0.189302  0.410185 -0.597840      0
    2 -0.202931  0.408217 -0.490296      0
    3 -0.205011  0.407853 -0.360820      0
    4 -0.196665  0.430047 -0.147033      0

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 128628 entries, 0 to 128627
    Data columns (total 4 columns):
     #   Column  Non-Null Count   Dtype  
    ---  ------  --------------   -----  
     0   x       128628 non-null  float64
     1   y       128628 non-null  float64
     2   z       128628 non-null  float64
     3   label   128628 non-null  int64  
    dtypes: float64(3), int64(1)
    memory usage: 3.9 MB
    None

    128628

I would like to start by implementing a Random Forest classifier. However, I am not sure how to create the train and test datasets, as the data is spread across separate CSV files.

How can I create the train and test sets for this task? My first thought was to concatenate all CSV files, but since each file represents its own time series, I am not sure whether this is the correct approach.

Thanks in advance for helping!

Musabbir Arrafi On

Here's a rough example of what you want to do:

    # Import necessary libraries
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score, classification_report

    # Concatenate your list of DataFrames (called subjects in the question);
    # ignore_index=True avoids duplicate row indices across files
    df = pd.concat(subjects, ignore_index=True)
    print(df.head())

    # Split the data into features (X) and target labels (y)
    X = df[['x', 'y', 'z']]
    y = df['label']

    # Split the data into training and testing sets.
    # Note: this split is random per row, so neighbouring samples from the
    # same recording land in both sets and the score is likely optimistic.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Create a Random Forest Classifier
    clf = RandomForestClassifier(n_estimators=100, random_state=42)

    # Fit the classifier to the training data
    clf.fit(X_train, y_train)

    # Make predictions on the test data
    y_pred = clf.predict(X_test)

    # Evaluate the classifier's performance
    accuracy = accuracy_score(y_test, y_pred)
    report = classification_report(y_test, y_pred)

    print("Accuracy:", accuracy)
    print("Classification Report:\n", report)
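Since each CSV file is its own time series, one possible variation is to keep every file entirely in either the training or the test set, so the model is evaluated on recordings it has never seen. The sketch below is only an illustration under assumptions: `make_fake_subjects` generates synthetic stand-in data (it is not the real dataset), and `file_id` and `split_by_recording` are hypothetical names introduced here. It uses scikit-learn's `GroupShuffleSplit` with the file index as the group.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GroupShuffleSplit


def make_fake_subjects(n_files=8, n_rows=400, seed=0):
    """Hypothetical stand-in for the `subjects` list from the question."""
    rng = np.random.default_rng(seed)
    frames = []
    for _ in range(n_files):
        # Four equally long label segments (0..3) per synthetic recording
        label = np.repeat(np.arange(4), n_rows // 4)
        # Shift the mean per label so the classes are weakly separable
        xyz = rng.normal(loc=label[:, None] * 0.5, scale=1.0, size=(n_rows, 3))
        frame = pd.DataFrame(xyz, columns=["x", "y", "z"])
        frame["label"] = label
        frames.append(frame)
    return frames


def split_by_recording(frames, test_size=0.25, seed=42):
    """Concatenate recordings and split so that no file spans train and test."""
    # Tag each row with the index of the file it came from (assumed name: file_id)
    tagged = [f.assign(file_id=i) for i, f in enumerate(frames)]
    df = pd.concat(tagged, ignore_index=True)
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(splitter.split(df, groups=df["file_id"]))
    return df.iloc[train_idx], df.iloc[test_idx]


subjects_demo = make_fake_subjects()  # replace with your own `subjects` list
train_df, test_df = split_by_recording(subjects_demo)

features = ["x", "y", "z"]
clf = RandomForestClassifier(n_estimators=50, random_state=42)
clf.fit(train_df[features], train_df["label"])
y_pred = clf.predict(test_df[features])

print("Held-out recordings:", sorted(test_df["file_id"].unique()))
print("Accuracy:", accuracy_score(test_df["label"], y_pred))
```

With the real data you would pass `subjects` instead of `subjects_demo`. The accuracy from a split like this will usually be lower than from a random row-wise split, but it is closer to what you can expect on a new recording.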