I have a dataset that has 2 columns; column A is State_Name and has 5 different options of state, and column B is Total_Spend which has the average total spend of that state per day. There are 365 observations for each state.
What I want to do is count the number of outliers PER STATE using the 1.5 IQR rule and save the count of outliers per state to a new df or table.
So I would expect an output something like:
| State | Outlier Count |
|---|---|
| ATL | 5 |
| GA | 20 |
| MI | 11 |
| NY | 50 |
| TX | 23 |
I have managed to get it to work by doing it one state at a time but I can't figure out what to do to achieve this in a single go.
Here is my code at the moment (to return the result for a single state):
daily_agg %>%
select(State_Name, Total_Spend) %>%
filter(State_Name == "NY")
outlier_NY <- length(boxplot.stats(outlier_df$Total_Spend)$out)
Any help would be appreciated.
Thanks!
EDIT WITH TEST DATASET
outlier_mtcars <-
df %>%
select(cyl, disp) %>%
filter(cyl == "6")
outliers <- length(boxplot.stats(outlier_mtcars$disp)$out)
The above shows me 1 outlier for 6 cyl cars but I want a table that shows how many outliers for 4, 6, 8 cyl cars
Since I'm not very familiar with the function
boxplot.stats, I didn't use this in my solution and instead manually calculates 1.5 * IQR + upper quantile.Here
mtcarswas used as an example. For the records that are outliers, they are "flagged" asTRUE, where we can sum them up insummarize.