Inconsistent Results in Reconstruction Error Calculation for Anomaly Detection with LSTM


I ran into an issue while using an LSTM model for anomaly detection. Even though I train and test on separate datasets with distinct distributions, the reconstruction error computed for the attack (test) data varies from run to run. Can someone help me understand why this discrepancy occurs and suggest potential solutions?

I am encountering an issue with my anomaly detection system, specifically related to the calculation of reconstruction error. I have implemented an autoencoder-based anomaly detection model using TensorFlow/Keras. The dataset contains both normal and anomalous samples; the autoencoder itself is trained on the normal samples only.

The problem I am facing is that each time I run the model and calculate the reconstruction error for the anomalous data samples (reconstruction_error_attack), I get different results. However, I have not made any changes to the code or the dataset between runs.

Here is a summary of the steps I am taking:

  • Preprocessing the data: Scaling the data using StandardScaler and reshaping it into sequences.
  • Building the autoencoder model: I am using a simple LSTM-based autoencoder architecture.
  • Training the model: I train the autoencoder on the normal data samples (df_normal).
  • Testing the model: I calculate the reconstruction error for the anomalous data samples (df_attack) using the trained autoencoder.

Each time I calculate the reconstruction error for the anomalous data samples, I get different results. However, I do not encounter this issue when calculating the reconstruction error for the normal data samples (reconstruction_error_normal). Is it possible that the reconstruction error for the anomalous data samples is being influenced by the reconstruction error for the normal data samples?
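The run-to-run variation most likely comes from re-training the autoencoder each run with randomly initialized weights (and non-deterministic GPU kernels), not from the normal-data error leaking into the attack-data error. A minimal sketch of pinning the seeds before building and training the model, assuming TF >= 2.7 (the TF calls are shown commented so the sketch runs without TensorFlow installed):

```python
import random
import numpy as np

SEED = 42
random.seed(SEED)
np.random.seed(SEED)

# With the seed fixed, repeated draws are identical:
first = np.random.rand(3)
np.random.seed(SEED)
second = np.random.rand(3)

# For the Keras model itself, call these BEFORE building/training it:
# import tensorflow as tf
# tf.keras.utils.set_random_seed(SEED)            # seeds Python, NumPy, and TF together (TF >= 2.7)
# tf.config.experimental.enable_op_determinism()  # deterministic GPU/CPU kernels (TF >= 2.9)
```

Alternatively, train once, save the model with `model.save(...)`, and reload it for every evaluation so the weights never change between runs.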

I cannot pin down static values for the top and bottom thresholds: dynamic_threshold_top and dynamic_threshold_bottom change between runs.

I tried to define dynamic thresholds based on percentiles of the attack reconstruction error and then identify anomaly points above and below these thresholds. I expected to visualize the reconstruction error along with the dynamic thresholds and highlight the anomaly points accordingly. However, the resulting anomaly points seemed inconsistent with what I anticipated.

import numpy as np
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the normal data only, then apply the same scaling to the attack data
scaler = StandardScaler()
df_normal_scaled = scaler.fit_transform(df_normal)
df_attack_scaled = scaler.transform(df_attack)

# Reshape data into sequences of length 1: (samples, timesteps, features)
def reshape_data(data):
    return data.reshape(data.shape[0], 1, data.shape[1])

X_train = reshape_data(df_normal_scaled)
X_test = reshape_data(df_attack_scaled)

# Testing: per-sample mean squared reconstruction error
def calculate_reconstruction_error(model, data):
    reconstructed = model.predict(data)
    error = np.mean(np.square(data - reconstructed), axis=(1, 2))
    return error

attack_reconstruction_error = calculate_reconstruction_error(autoencoder, X_test)

# Define dynamic thresholds from percentiles of the attack reconstruction error
dynamic_threshold_top = np.percentile(attack_reconstruction_error, 99)
dynamic_threshold_bottom = np.percentile(attack_reconstruction_error, 30)
# dynamic_threshold_top = np.percentile(attack_reconstruction_error, 70)
# dynamic_threshold_bottom = np.percentile(attack_reconstruction_error, 1)

# Find indices above the top threshold and below the bottom threshold
anomaly_points_top = np.where(attack_reconstruction_error > dynamic_threshold_top)[0]
anomaly_points_bottom = np.where(attack_reconstruction_error < dynamic_threshold_bottom)[0]
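One property of this thresholding scheme worth noting: because the percentiles are computed from the attack errors themselves, they always flag a roughly fixed fraction of points (about 1% above the 99th percentile, 30% below the 30th), and the cutoff values shift whenever the errors shift. A minimal NumPy sketch on synthetic data (the gamma distribution here is just a stand-in for reconstruction errors):

```python
import numpy as np

rng = np.random.default_rng(0)
errors = rng.gamma(shape=2.0, scale=1.0, size=1000)  # stand-in reconstruction errors

top = np.percentile(errors, 99)     # cutoff depends on the data itself
bottom = np.percentile(errors, 30)

above = np.where(errors > top)[0]    # ~1% of points, whatever the data looks like
below = np.where(errors < bottom)[0] # ~30% of points

print(len(above), len(below))  # roughly 10 and 300 out of 1000
```

If the goal is a stable decision rule, a common alternative is to derive the threshold from the *normal* training errors instead (e.g. a high percentile of reconstruction_error_normal) and apply that fixed cutoff to the attack data.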

Which pair of top and bottom percentiles is correct?
