I have several datasets, each consisting of time series data such as acceleration recorded over time. Each acceleration series corresponds to one specific speed. I want to train an LSTM model on the full acceleration series together with their corresponding speeds so that later I can input only acceleration data and predict the speed (please see the attached image for the problem definition).
Example:
X_train = x1, x2, x3, x4, ..., xn (time series data, all stacked in one column)
Y_train = y, y, y, y, ..., y (the same value repeated for each series)
I will feed in multiple series like this. Then I will input an unseen dataset X_test = x1, x2, x3, x4, ..., xn and want to predict y_test (one single value).
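To make the intended layout concrete, here is a minimal NumPy sketch with made-up sizes (5 series of 100 samples each). The names `n_series` and `n_steps` and the random values are placeholders, not my real data. It only shows the shapes I have in mind: one 3-D array of acceleration series and one speed value per series. Whether the repeated y column should be collapsed to a single value like this is part of what I am asking.

```python
import numpy as np

rng = np.random.default_rng(0)

n_series, n_steps = 5, 100  # hypothetical sizes, not the real dataset

# One acceleration series per sample, shaped (samples, timesteps, features),
# which is the 3-D input layout Keras LSTM layers expect.
X_train = rng.normal(size=(n_series, n_steps)).reshape(n_series, n_steps, 1)

# One speed per series (assumption: not repeated for every time step).
Y_train = rng.uniform(10.0, 30.0, size=(n_series,))

# An unseen series uses the same layout with a single sample.
X_test = rng.normal(size=(1, n_steps, 1))

print(X_train.shape, Y_train.shape, X_test.shape)
```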
- How should I compile the dataset?
- Is this the correct format for compiling the dataset, as shown in the attached figure?
- What other algorithms, besides LSTM, can address a similar problem?
- Can you provide sample code for a similar problem?
Thank you for your help.
LSTM models are commonly used to predict future time steps from past information, for example predicting X_3 at time step t+1 using X_1 and X_2 at time step t. In this scenario the goal is different: the objective is not to forecast future values, but to predict a new variable (e.g., speed) at time steps that correspond to those in the input time series (e.g., X_1 and X_2 for acceleration data).
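The difference can be sketched with a toy array. The helper `make_windows` and the values are hypothetical and for illustration only. With the same sliding windows over the input, a forecasting setup takes the target from the step after the window, whereas the setup I am describing takes the target from the last step inside the window.

```python
import numpy as np

def make_windows(x, y, seq_len, forecast):
    """Build sliding windows over x and pick one target per window.

    forecast=True  -> target is y at the step right after the window
    forecast=False -> target is y at the last step inside the window
    """
    X, Y = [], []
    last_start = len(x) - seq_len - (1 if forecast else 0)
    for start in range(last_start + 1):
        end = start + seq_len  # window covers x[start:end]
        X.append(x[start:end])
        Y.append(y[end] if forecast else y[end - 1])
    return np.array(X), np.array(Y)

x = np.arange(6, dtype=float)  # toy input: 0, 1, 2, 3, 4, 5
y = 10.0 * x                   # toy target aligned with x

X_f, Y_f = make_windows(x, y, seq_len=3, forecast=True)
X_s, Y_s = make_windows(x, y, seq_len=3, forecast=False)
print(Y_f)  # targets one step ahead of each window
print(Y_s)  # targets at each window's last step
```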
I have tried the code below, but it doesn't give accurate results. The dataset looks something like the attached image (file: Train_Disp.xlsx).
# Import libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout
# Read the dataset (CSV alternative kept for reference)
# df = pd.read_csv('Train.csv')
df = pd.read_excel('Train_Disp.xlsx')
# Training Data
No_data_total = len(df)
No_data_train = len(df) - 200
df_train = df.iloc[:No_data_train, 1:] # Training data without time
print(df_train.shape)
# Normalizing the training data
scX = StandardScaler()
df_trainX_norm = scX.fit_transform(df_train)
scY = StandardScaler()
df_trainY_norm = scY.fit_transform(df_train[['Load']])
# The input and output data
seq_len = 8
X_train, Y_train = [], []
for i in range(seq_len, len(df_trainX_norm)):
    X_train.append(df_trainX_norm[i-seq_len:i])
    Y_train.append(df_trainY_norm[i][0])
X_train, Y_train = np.array(X_train), np.array(Y_train)
print(X_train.shape)
print(Y_train.shape)
# Building the model
model = Sequential()
model.add(LSTM(units=50, return_sequences=True, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Dropout(0.1))
model.add(LSTM(units=50))
model.add(Dropout(0.1))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')
# Training the model
model.fit(X_train, Y_train, epochs=3, batch_size=32)
# Preparing Data for Prediction (including all data points)
df_allX_norm = scX.transform(df.iloc[:, 1:]) # Normalizing all data
X_test_pred = []
for i in range(seq_len, len(df_allX_norm)):
    X_test_pred.append(df_allX_norm[i-seq_len:i])
X_test_pred = np.array(X_test_pred)
# Prediction
Y_test_pred = model.predict(X_test_pred)
Y_test_pred = scY.inverse_transform(Y_test_pred)
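To put a number on "not accurate", the predictions can be compared with the true Load on the 200 held-out rows. The sketch below uses placeholder arrays instead of my real data; `load_true`, `load_pred`, and the helper `holdout_rmse` are hypothetical names. The one detail it illustrates is the offset: prediction k belongs to row k + seq_len, because the first seq_len rows have no full window.

```python
import numpy as np

def holdout_rmse(load_true, load_pred, seq_len, n_holdout):
    """RMSE over the last n_holdout rows.

    load_true: true values for every row, shape (n_rows,)
    load_pred: one prediction per window, shape (n_rows - seq_len,) or
               (n_rows - seq_len, 1); load_pred[k] belongs to row k + seq_len
    """
    load_true = np.asarray(load_true, dtype=float)
    load_pred = np.asarray(load_pred, dtype=float).ravel()
    aligned_true = load_true[seq_len:]  # drop rows that have no full window
    assert aligned_true.shape == load_pred.shape
    err = aligned_true[-n_holdout:] - load_pred[-n_holdout:]
    return float(np.sqrt(np.mean(err ** 2)))

# Placeholder data: 1000 rows, predictions off by a constant 0.5
seq_len, n_holdout = 8, 200
load_true = np.linspace(0.0, 1.0, 1000)
load_pred = load_true[seq_len:] + 0.5
rmse = holdout_rmse(load_true, load_pred, seq_len, n_holdout)
print(rmse)
```

With the code above, the call would presumably be `holdout_rmse(df['Load'].to_numpy(), Y_test_pred, seq_len, 200)`.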