How can I format multiple time series inputs (e.g., acceleration data) for an LSTM to predict a single output (e.g., speed)?


I have several datasets, each consisting of time series data such as acceleration recorded over a period of time. Each acceleration series corresponds to one specific speed. I want to train an LSTM model on each full acceleration series together with its corresponding speed, so that later I can input only acceleration data and predict the speed (please see the attached image for the problem definition).

Example:

X_train = x1, x2, x3, x4, ..., xn (time series data, all stacked in one column)
Y_train = y, y, y, y, ..., y (the same value for every step of a series)

I will feed in multiple series like these. Then I will input an unseen dataset X_test = x1, x2, x3, x4, ..., xn and predict y_test (one single value).

  1. How should I compile the dataset?
  2. Is this the correct format for compiling the dataset, as shown in the attached figure?
  3. What other algorithms, besides LSTM, can address a similar problem?
  4. Can you provide a sample code for a similar problem? Thank you for your help.
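One common way to lay out this kind of data for a sequence-to-one model is a 3D input array of shape (n_series, n_timesteps, n_features) with one target value per series, rather than repeating y at every step. The following is only a minimal sketch under assumptions: the function name `build_sequence_dataset`, the random example data, and the requirement that all series have equal length are hypothetical and not taken from the actual dataset.

```python
import numpy as np

def build_sequence_dataset(series_list, targets):
    """Stack equal-length series into X of shape (n_series, n_timesteps, n_features)
    and pair each series with a single target value (hypothetical helper)."""
    arrays = []
    for s in series_list:
        a = np.asarray(s, dtype=np.float32)
        if a.ndim == 1:
            a = a[:, None]  # single channel -> (n_timesteps, 1)
        arrays.append(a)
    X = np.stack(arrays, axis=0)  # raises ValueError if lengths differ
    y = np.asarray(targets, dtype=np.float32)
    if len(X) != len(y):
        raise ValueError("need exactly one target per series")
    return X, y

# Made-up example: 5 acceleration series of 100 samples each, one speed per series.
rng = np.random.default_rng(0)
series_list = [rng.normal(size=100) for _ in range(5)]
targets = [12.0, 15.5, 9.8, 20.1, 17.3]

X, y = build_sequence_dataset(series_list, targets)
print(X.shape, y.shape)
```

An array shaped like X can be passed to an LSTM layer whose input_shape is (n_timesteps, n_features), with y as the regression target. Series of different lengths would first need padding or resampling, which this sketch does not cover.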

LSTM models are commonly used to predict future time steps based on past information, for example predicting X_3 at time step t+1 using X_1 and X_2 at time step t. In this scenario the goal is different: the objective is not to forecast future events, but to predict a new variable (e.g., speed) at time steps that correspond to those in the input time series (e.g., X_1 and X_2 for acceleration data).

I have tried this code, but it doesn't give an accurate result. The dataset looks something like the attached image (Train_Disp.xlsx).

# Import libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout

# Read the dataset
# df = pd.read_csv('Train.csv')
df = pd.read_excel('Train_Disp.xlsx')

# Training Data
No_data_total = len(df)
No_data_train = len(df) - 200
df_train = df.iloc[:No_data_train, 1:] # Training data without time
print(df_train.shape)

# Normalizing the training data
# Note: df_train still contains the 'Load' column, so the target is also part of the inputs
scX = StandardScaler()
df_trainX_norm = scX.fit_transform(df_train)
scY = StandardScaler()
df_trainY_norm = scY.fit_transform(df_train[['Load']])

# The input and output data
seq_len = 8
X_train, Y_train = [], []
for i in range(seq_len, len(df_trainX_norm)):
    X_train.append(df_trainX_norm[i-seq_len:i])
    Y_train.append(df_trainY_norm[i][0])
X_train, Y_train = np.array(X_train), np.array(Y_train)
print(X_train.shape)
print(Y_train.shape)

# Building the model
model = Sequential()
model.add(LSTM(units=50, return_sequences=True, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Dropout(0.1))
model.add(LSTM(units=50))
model.add(Dropout(0.1))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mean_squared_error')

# Training the model
model.fit(X_train, Y_train, epochs=3, batch_size=32)

# Preparing Data for Prediction (including all data points)
df_allX_norm = scX.transform(df.iloc[:, 1:]) # Normalizing all data
X_test_pred = []
for i in range(seq_len, len(df_allX_norm)):
    X_test_pred.append(df_allX_norm[i-seq_len:i])
X_test_pred = np.array(X_test_pred)

# Prediction
Y_test_pred = model.predict(X_test_pred)
Y_test_pred = scY.inverse_transform(Y_test_pred)
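
The windowing loop in the code above pairs the target at step i with inputs from steps i-seq_len to i-1, which is a forecasting setup. If the aim is to predict the target at the same time step as the most recent input, the window could instead end at step t and be paired with the target at t, with the target column kept out of the features. This is a rough sketch only: `make_aligned_windows` and the toy arrays are hypothetical and are not the layout of Train_Disp.xlsx.

```python
import numpy as np

def make_aligned_windows(features, target, seq_len):
    """Pair each window ending at step t (inclusive) with target[t].
    `features` must not contain the target column (hypothetical helper)."""
    features = np.asarray(features, dtype=np.float32)
    target = np.asarray(target, dtype=np.float32)
    if features.ndim == 1:
        features = features[:, None]  # single channel -> (n_steps, 1)
    X, y = [], []
    for t in range(seq_len - 1, len(features)):
        X.append(features[t - seq_len + 1 : t + 1])  # steps t-seq_len+1 .. t
        y.append(target[t])                          # same step as the last input row
    return np.stack(X), np.array(y)

# Toy data: 10 time steps, 2 input channels, one target value per step.
features = np.arange(20, dtype=np.float32).reshape(10, 2)
target = np.arange(10, dtype=np.float32) * 10

X, y = make_aligned_windows(features, target, seq_len=4)
print(X.shape, y.shape)
```

With 10 steps and seq_len=4 this yields 7 windows, and the last row of each window comes from the same time step as its target.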