Ways to fill NaN Values for Intrusion Detection with ML, Unsupervised ML

75 Views Asked by AudioBubble At 27 July 2025 at 17:28

I created a CSV file. It contains 250800 rows and 75 columns. I am doing exploratory data analysis (EDA) to prepare the data for ML.

According to df.info(), all of the columns are float or integer values. When I run:

df.dropna()

This removes the rows containing NaN values, but the problem is that columns such as Protocol then lose all but one of their unique values, and the same happens for DstPort. I do not want that; losing this much data is not acceptable. As suggested here, I tried this instead:

df = df.dropna(subset = ["Protocol","DstPort", "State"])

But the result is the same: there are still NaN values, and I cannot apply K-means clustering, for example.

I would like to ask for your suggestions. What should I do? Can I fill these values somehow, and if so, on what basis? Which machine learning model should I choose?

There is 1 best solution below

AudioBubble On 15 April 2022 at 23:04

I found three common ways to fill NaN values.

Average: df.fillna(df.mean(), inplace=True)
Most Frequent: df['col'] = df['col'].fillna(df['col'].mode().iloc[0])
Median: df.fillna(df.median(), inplace=True)
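As a rough sketch of how these fills could be combined per column, the snippet below uses a tiny made-up frame. The values and the "Bytes" column are hypothetical, not taken from the real dataset. It assumes that code-like columns such as Protocol and DstPort are better filled with the most frequent value (an average of port numbers has no meaning), while a continuous column can take the median.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the flow data; the values and the "Bytes" column are made up.
df = pd.DataFrame({
    "Protocol": [6, 17, np.nan, 6, 6],
    "DstPort": [80, np.nan, 443, 80, 53],
    "Bytes": [1200.0, np.nan, 300.0, 4500.0, 800.0],
})

# Code-like columns (protocol numbers, ports): fill with the most frequent value.
for col in ["Protocol", "DstPort"]:
    df[col] = df[col].fillna(df[col].mode().iloc[0])

# Continuous column: the median is less sensitive to outliers than the mean.
df["Bytes"] = df["Bytes"].fillna(df["Bytes"].median())

# Count how many NaN values are left in the whole frame.
remaining_nans = int(df.isna().sum().sum())
print(df)
print("NaNs left:", remaining_nans)
```

Assigning the result back to the column avoids relying on inplace=True on a column selection, which may not update the original frame in recent pandas versions.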

I am not sure whether this is the correct approach for my data, since it is network traffic, but I just wanted to share.
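For the K-means step mentioned in the question, one possible way to wire imputation into clustering is a scikit-learn pipeline. This is only a minimal sketch under assumptions: the data is random, the feature names f0..f3 are placeholders, and the choice of median imputation, scaling, and 3 clusters is illustrative, not a recommendation for real network traffic.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic numeric data with roughly 10% of the values knocked out (assumption).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 4)), columns=["f0", "f1", "f2", "f3"])
X = X.mask(rng.random(X.shape) < 0.1)

# Impute first, then scale, then cluster; KMeans itself rejects NaN input.
pipeline = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    KMeans(n_clusters=3, n_init=10, random_state=0),
)
labels = pipeline.fit_predict(X)

# Show how many rows landed in each cluster.
print(pd.Series(labels).value_counts())
```

Keeping the imputer inside the pipeline means the same fill values are reused when the fitted pipeline is later applied to new data.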
