I am having a logical issue while running my R code


I recently wrote a function to remove outliers from my dataset. The odd thing is that it removes rows from the data I import from my Excel sheets when I run the code below:

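calculate_outliers <- function(df, value_col) {
  # First and third quartiles of the chosen column, ignoring NA values
  Q1 <- quantile(df[[value_col]], probs=0.25, na.rm = TRUE)
  Q3 <- quantile(df[[value_col]], probs=0.75, na.rm = TRUE)
  IQR <- Q3 - Q1

  # Fences at 1.5 * IQR below Q1 and above Q3
  lower <- Q1 - 1.5 * IQR
  upper <- Q3 + 1.5 * IQR

  # Keeps the rows strictly between the fences, i.e. drops the outliers
  outliersdf <- df[df[[value_col]] > lower & df[[value_col]] < upper, ]
  return(outliersdf)
}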

When I run this function after importing my Excel sheets, the number of rows in both datasets drops: the co2 data goes from 43,068 to 34,861 rows, and the tbdata data goes from 4,557 to 3,897 rows.

When I change the filter so that outliersdf holds only the outliers, like this:

  outliersdf <- df[df[[value_col]] < lower & df[[value_col]] > upper, ]
  return(outliersdf)

outliersdf has no observations. Can you help me find the logical mistake I am making? The code works one way but not the other. If the first version is able to remove records, then after I change the condition the removed records should end up in the data frame, right?
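
On the second version: a single value cannot be both below `lower` and above `upper`, so a condition joined with `&` is FALSE for every row and the result is empty. A minimal sketch of an outliers-only filter, assuming the intent is to collect the rows that the first version drops, would join the two tests with `|` instead. The function name `extract_outliers` and the toy data are hypothetical, made up for illustration.

```r
# Hypothetical helper: returns only the rows flagged by the 1.5 * IQR rule.
extract_outliers <- function(df, value_col) {
  x <- df[[value_col]]
  q1 <- quantile(x, probs = 0.25, na.rm = TRUE)
  q3 <- quantile(x, probs = 0.75, na.rm = TRUE)
  iqr <- q3 - q1
  lower <- q1 - 1.5 * iqr
  upper <- q3 + 1.5 * iqr

  # `|` (or), not `&`: a value is an outlier if it is below lower OR above upper.
  # which() skips NA comparisons so they do not come back as all-NA rows.
  df[which(x < lower | x > upper), , drop = FALSE]
}

# Toy data (made up): 100 is far above the rest of the values.
toy <- data.frame(value = c(1, 2, 3, 4, 5, 100))
outliers <- extract_outliers(toy, "value")
print(outliers)
```

Both versions use strict comparisons, so a value exactly equal to `lower` or `upper` is dropped by the first version but is not returned by this sketch either; the two row counts may therefore not add up exactly to the original total.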

I have uploaded the data files at the links below for reference. You can download them and check whether this has something to do with the data.

https://anonymfile.com/4YARa/all-formsoftbincidenceestimated-1.csv https://anonymfile.com/396RJ/co2-terr.csv
