Machine learning: handling features that are expected to have missing data

Asked by Sankalpa Wijewickrama on 28 December 2022 at 08:25

I am currently working on a project for my MSc and I am having this issue with my dataset. I don't have previous experience in machine learning; this is my first exposure to it.

I started my EDA (Exploratory Data Analysis) on the dataset and found a categorical feature with missing data: Province_State. This column has 52,360 missing values, which is 5.40% of the rows. I guess that is not too bad, and according to what I have learnt, I should either impute these missing values or delete the column if I have a sound reason to.
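A count and percentage like the ones above can be produced per column with pandas. The sketch below is a minimal example, assuming the data is loaded into a pandas DataFrame; the toy rows and the columns `Country_Region` and `Confirmed` are hypothetical stand-ins, since the actual dataset is not reproduced here.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column."""
    counts = df.isna().sum()
    percent = (counts / len(df) * 100).round(2)
    summary = pd.DataFrame({"missing": counts, "percent": percent})
    # Keep only columns that actually have gaps, largest first.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Hypothetical toy data; only Province_State comes from the question.
df = pd.DataFrame({
    "Country_Region": ["US", "US", "France", "Sri Lanka"],
    "Province_State": ["Texas", "Ohio", np.nan, np.nan],
    "Confirmed": [10, 20, 30, 40],
})
result = missing_summary(df)
print(result)
```

On the real data, the same call would report the 52,360 missing Province_State values alongside any other incomplete columns.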

My reasoning is that not every country has provinces, so it is quite normal for this column to have missing values. I don't see the point of imputing them with a random value: that would not be logical, and it would also introduce inaccuracy into the model, because we would be inventing a province that does not actually exist for that country.
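One way to check this reasoning is to look at the missing rate of Province_State per country: if every country is either fully missing or fully present, the gaps are structural rather than random. This is a sketch assuming pandas and a hypothetical `Country_Region` column identifying the country of each row; the helper name `missing_rate_by_group` is illustrative.

```python
import numpy as np
import pandas as pd


def missing_rate_by_group(df, group_col, target_col):
    """Fraction of rows in each group where target_col is missing."""
    return (
        df[target_col].isna()
        .groupby(df[group_col])
        .mean()
        .sort_values(ascending=False)
    )


# Hypothetical toy data; Country_Region is an assumed column name.
df = pd.DataFrame({
    "Country_Region": ["US", "US", "France", "France", "Sri Lanka"],
    "Province_State": ["Texas", "Ohio", np.nan, np.nan, np.nan],
})
rates = missing_rate_by_group(df, "Country_Region", "Province_State")
print(rates)

# Rates strictly between 0 and 1 mean a country reports provinces only
# some of the time, which would point to genuinely missing data.
mixed = rates[(rates > 0) & (rates < 1)]
print("Countries with partial gaps:", list(mixed.index))
```

If `mixed` turns out to be non-empty on the real data, those countries have province values that are genuinely missing rather than not applicable, and they may deserve separate treatment.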

I think I should do one of the following:

Impute all the missing values with a constant value such as -1 or "NotApplicable"
Remove the feature from the dataset
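Both options are one-liners in pandas. The sketch below assumes a DataFrame with the question's Province_State column plus a hypothetical `Country_Region` column. For a text column, a string placeholder such as "NotApplicable" keeps the column a single type, whereas mixing an integer like -1 into strings can cause trouble for some encoders.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
df = pd.DataFrame({
    "Country_Region": ["US", "US", "France", "Sri Lanka"],
    "Province_State": ["Texas", "Ohio", np.nan, np.nan],
})

# Option 1: treat "no province" as its own explicit category.
filled = df.copy()
filled["Province_State"] = filled["Province_State"].fillna("NotApplicable")

# Option 2: drop the feature entirely.
dropped = df.drop(columns=["Province_State"])

print(filled["Province_State"].value_counts())
print(dropped.columns.tolist())
```

Option 1 keeps the information that a country has no provinces, which is exactly the signal the missing values carry here; option 2 discards it along with the province values themselves.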

Please help me with a solution. Thank you very much in advance.

(This dataset can be accessed from this link)


Answer by Aashutosh sinha on 29 December 2022 at 06:37

There are many ways to handle missing data. Deleting the whole column is not a good idea in most cases, as you will be discarding information. However, if you still want to delete the feature, perform a univariate analysis on it to see whether it is useful, and decide accordingly. Instead of removing the feature, you can use any of the following approaches:

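Impute missing values with the mean or median (for a categorical feature, the mode).
Predict the missing values from the other features.
Impute all the missing values with a constant such as -1.
Use algorithms that support missing values natively.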
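The imputation options above can be sketched in pandas as follows. This is a minimal illustration on hypothetical toy data (the `Confirmed` column is assumed), showing median imputation for a numeric column and mode imputation, the categorical analogue, for Province_State. The `Province_State_missing` indicator column is an extra step not in the list above; it records which rows were imputed before they are overwritten.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one categorical and one numeric column.
df = pd.DataFrame({
    "Province_State": ["Texas", "Texas", "Ohio", np.nan],
    "Confirmed": [10.0, np.nan, 30.0, 40.0],
})

out = df.copy()
# Record which rows were missing before overwriting them.
out["Province_State_missing"] = out["Province_State"].isna().astype(int)

# Mean/median only make sense for numeric columns.
out["Confirmed"] = out["Confirmed"].fillna(out["Confirmed"].median())

# For a categorical column the closest analogue is the mode
# (Series.mode ignores NaN by default).
mode_value = out["Province_State"].mode().iloc[0]
out["Province_State"] = out["Province_State"].fillna(mode_value)

print(out)
```

For Province_State specifically, mode imputation assigns a real province to countries that have none, which is the objection raised in the question; the indicator column at least preserves that fact for the model. As for the last option, scikit-learn's `HistGradientBoostingClassifier` and `HistGradientBoostingRegressor`, for example, accept NaN in numeric input features without any imputation.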
