Getting the median by date using dplyr's summarise() in R


I have a data frame of integer-count observations listed by date and time interval. I want to find the median of these observations by date using the dplyr package. I've already formatted the date column correctly, and used group_by like so:

data.bydate <- group_by(data.raw, date)

When I use summarise() to find the median of each date group, all I get is a column of zeros. There are NAs in the data, so I've been removing them with na.rm = TRUE:

data.median <- summarise(data.bydate, median = median(count, na.rm = TRUE))

Is there another way I should be doing this?

There are 3 best solutions below

You can do the grouping and summarising in a single pipeline:

library(dplyr)
data.raw %>% group_by(date) %>% summarise(median = median(count, na.rm = TRUE))
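
To see that this pipeline does return one median per date, here is a small sketch on made-up data. The dates and counts below are hypothetical stand-ins for the real data.raw, which isn't shown in the question:

```r
library(dplyr)

# Hypothetical toy data: three time intervals on each of two dates, one NA
data.raw <- data.frame(
  date  = as.Date(c("2014-01-01", "2014-01-01", "2014-01-01",
                    "2014-01-02", "2014-01-02", "2014-01-02")),
  count = c(4, NA, 10, 0, 6, 2)
)

# Group by date, then take the median of the non-missing counts in each group
data.median <- data.raw %>%
  group_by(date) %>%
  summarise(median = median(count, na.rm = TRUE))

print(data.median)
```

With these numbers the first date gets a median of 7 (the mean of 4 and 10, since the NA is dropped) and the second gets 2 (the middle of 0, 2, 6). If your real data gives 0 for every date, the code is probably fine and the data itself is mostly zeros.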

Here is an example of how I did this with dplyr, starting from the already grouped data:

data.median <- data.bydate %>% summarise(median = median(count, na.rm = TRUE))

It's possible that most of the values in each group are zero. median() returns the middle value of the sorted non-missing values, so if more than half of a date's counts are 0, that date's median is 0 and summarise() is doing exactly what it should. To check, look at how many distinct values and how many total values the count variable has in each group:

summarise(data.bydate, unique_values = n_distinct(count, na.rm = TRUE), total_count = n())
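
Going one step further, the share of zeros per date shows directly whether a zero median is expected. This is a sketch on made-up data, assuming dplyr 1.0 or later; the dates, counts and the column names unique_values and zero_share are illustrative only:

```r
library(dplyr)

# Hypothetical data: on the first date most intervals are zero, on the second they are not
data.raw <- data.frame(
  date  = as.Date(rep(c("2014-01-01", "2014-01-02"), each = 5)),
  count = c(0, 0, 0, 5, NA,
            3, 0, 8, 2, 6)
)

diagnostics <- data.raw %>%
  group_by(date) %>%
  summarise(
    unique_values = n_distinct(count, na.rm = TRUE),  # distinct non-missing counts
    total_count   = n(),                              # rows in the group, NAs included
    zero_share    = mean(count == 0, na.rm = TRUE),   # fraction of non-missing counts that are 0
    median        = median(count, na.rm = TRUE)
  )

print(diagnostics)
```

Here the first date has a zero_share of 0.75 and a median of 0, while the second has a zero_share of 0.2 and a median of 3. Since the counts can't be negative, any date whose zero_share is above 0.5 will always have a median of 0.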