I have a Python script that plots a dataset with the following metrics: "99th percentile", "50th percentile", "Mean", "2xx", and "4xx/5xx". The 'IsError' column is the label: it indicates whether the service is in a normal or anomalous state, and I combine it with the 'Service' column into a 'Service_Status' label.
I create a pair plot to get a better visual understanding of the dataset. However, in many cases the anomalous and normal points overlap in the plot, giving the impression that only one of the two cases exists. For example, when '3_True' and '3_False' behave the same way, only '3_False' is visible, which wrongly suggests that there are no anomalous occurrences. I've tried adjusting the alpha, up to the maximum, without success. Any suggestions on how to address this issue?
Script:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset and drop the timestamp column, which is not plotted
data = pd.read_csv('../datasets/application anomalies/data.csv')
data = data.drop('Time', axis=1)
# Combine the service ID and the error flag into one label, e.g. '3_True'
data['Service_Status'] = data['Service'].astype(str) + '_' + data['IsError'].astype(str)
# 'D' (diamond) marks normal points, 'X' marks anomalous points
markers = {'0_False': 'D', '1_False': 'D', '2_False': 'D', '3_False': 'D', '4_False': 'D', '5_False': 'D', '6_False': 'D', '0_True': 'X', '1_True': 'X', '3_True': 'X', '4_True': 'X', '5_True': 'X', '6_True': 'X'}
# One color per Service_Status value (defined here, but not passed to pairplot below)
custom_palette = sns.color_palette("Set1", len(data['Service_Status'].unique()))
# Keyword arguments forwarded to the scatter plots in the off-diagonal panels
scatter_kws = {'s': 100, 'alpha': 0.2, 'style': data['Service_Status'], 'markers': markers}
sns.pairplot(data, hue='Service_Status', vars=["99th percentile", "50th percentile", "Mean", "2xx", "4xx/5xx"], diag_kind="kde", plot_kws=scatter_kws, corner=False)
plt.show()
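One possible direction, as a rough sketch rather than a confirmed fix: if the anomalous points are hidden because the normal points are drawn over them, moving the anomalous rows to the end of the DataFrame and listing their labels last in `hue_order` may put them on top. This assumes that seaborn draws scatter points in row order (or hue level by hue level in older versions), and that 'IsError' holds True/False values. The helper names `order_anomalies_last` and `pairplot_anomalies_on_top` are hypothetical.

```python
import pandas as pd


def order_anomalies_last(df, flag_col="IsError"):
    """Return a copy of df with the rows flagged as anomalous moved to the end.

    Assumption: flag_col holds booleans or the strings 'True'/'False'.
    """
    flags = df[flag_col].astype(str).str.lower().eq("true")
    # A stable sort keeps the original relative order inside each group.
    order = flags.to_numpy().argsort(kind="stable")
    return df.iloc[order].reset_index(drop=True)


def pairplot_anomalies_on_top(df, metrics):
    """Pair plot in which anomalous rows come last in the data and in hue_order."""
    # Imported here so the helper above can be used without plotting libraries.
    import seaborn as sns

    ordered = order_anomalies_last(df)
    ordered["Service_Status"] = (
        ordered["Service"].astype(str) + "_" + ordered["IsError"].astype(str)
    )
    # Rows are already sorted, so unique() lists normal labels first.
    hue_order = ordered["Service_Status"].unique().tolist()
    # Assumes labels of the form '3_True' / '3_False', as in the script above.
    markers = {s: ("X" if s.endswith("_True") else "D") for s in hue_order}
    return sns.pairplot(
        ordered,
        hue="Service_Status",
        hue_order=hue_order,
        vars=metrics,
        diag_kind="kde",
        plot_kws={
            "s": 100,
            "alpha": 0.2,
            "style": ordered["Service_Status"],
            "markers": markers,
        },
    )
```

With the script above, this would be called as `pairplot_anomalies_on_top(data, ["99th percentile", "50th percentile", "Mean", "2xx", "4xx/5xx"])` followed by `plt.show()`. Points that coincide exactly will still cover each other; the sketch only changes which one ends up on top.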
Dataset sample:
https://github.com/jnobre/data-sample/blob/main/data-sample.csv
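To check how often the overlap described above actually occurs, a small pandas-only sketch like the following could count, per service, the anomalous rows whose metric values are exactly equal to those of some normal row. The function name `count_hidden_anomalies` is hypothetical, and the sketch assumes that 'IsError' holds True/False values and that "overlap" means exact equality on every listed metric.

```python
import pandas as pd


def count_hidden_anomalies(df, metrics, service_col="Service", flag_col="IsError"):
    """Per service, count anomalous rows that exactly match a normal row.

    Returns a DataFrame indexed by service with two columns:
    'overlapping' (anomalous rows equal to a normal row on all metrics)
    and 'anomalous' (all anomalous rows for that service).
    """
    is_error = df[flag_col].astype(str).str.lower().eq("true")
    keys = [service_col] + list(metrics)
    # Distinct normal points, so each anomalous row matches at most once.
    normal = df.loc[~is_error, keys].drop_duplicates()
    anomalous = df.loc[is_error, keys]
    # indicator=True adds a '_merge' column: 'both' means a normal twin exists.
    merged = anomalous.merge(normal, on=keys, how="left", indicator=True)
    hidden = merged["_merge"] == "both"
    counts = hidden.groupby(merged[service_col]).agg(["sum", "count"])
    return counts.rename(columns={"sum": "overlapping", "count": "anomalous"})
```

Passing only two metrics, for example `["Mean", "2xx"]`, checks the overlap for a single panel of the pair plot instead of for all metrics at once.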