I recently wrote some code to remove outliers from my dataset. The weird thing is that it removes rows from my Excel sheet when I run the code below:
calculate_outliers <- function(df, value_col) {
  Q1 <- quantile(df[[value_col]], probs = 0.25, na.rm = TRUE)
  Q3 <- quantile(df[[value_col]], probs = 0.75, na.rm = TRUE)
  IQR <- Q3 - Q1
  lower <- Q1 - 1.5 * IQR
  upper <- Q3 + 1.5 * IQR
  outliersdf <- df[df[[value_col]] > lower & df[[value_col]] < upper, ]
  return(outliersdf)
}
When I run this code after importing my Excel sheets, the number of rows in both of them decreases. The co2 sheet goes from 43,068 rows to 34,861, and the tbdata sheet goes from 4,557 rows to 3,897.
When I change the logic above so that outliersdf records only the outliers, like this:
outliersdf <- df[df[[value_col]] < lower & df[[value_col]] > upper, ]
return(outliersdf)
outliersdf has no observations. Can you help me find the logical mistake I am making? The code works one way but not the other. If the original version is able to remove records, then after I change the condition, the removed records should be stored in the data frame, right?
I have uploaded the data at the links below for reference. You can download the files and check whether this has something to do with the data.
https://anonymfile.com/4YARa/all-formsoftbincidenceestimated-1.csv
https://anonymfile.com/396RJ/co2-terr.csv
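For reference, here is a minimal sketch of what may be going on with the second condition. It uses made-up toy values rather than the uploaded files, and the names toy, with_and and with_or are hypothetical. A single value cannot be below lower and above upper at the same time, so combining the two tests with & would be expected to select nothing, while | selects values outside the fences on either side.

```r
# Toy data (hypothetical values, not taken from the uploaded files).
toy <- data.frame(value = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100))

# Same fences as in calculate_outliers().
q1 <- quantile(toy$value, probs = 0.25, na.rm = TRUE)
q3 <- quantile(toy$value, probs = 0.75, na.rm = TRUE)
iqr <- q3 - q1
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr

# `&` requires both tests to hold for the same value, which is impossible
# whenever lower <= upper, so this selects no rows.
with_and <- toy[toy$value < lower & toy$value > upper, , drop = FALSE]

# `|` selects values that fall outside the fences on either side.
with_or <- toy[toy$value < lower | toy$value > upper, , drop = FALSE]

nrow(with_and)
nrow(with_or)
```

On this toy data with_and is empty and with_or contains only the row with value 100. Whether the same explanation fully accounts for the real co2 and tbdata results is an assumption, since rows with missing values in the column may also behave differently under logical subsetting.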