How to assign a different bin size to the head and tails of a distribution in hist?

I am trying to create a histogram in which the bin size differs depending on whether the data lie in the head or in the tails of the distribution.

I tried to create the following function my_f to use as the input to the argument breaks=, but it is not working. Here's my code, along with the error I get.

x <- rnorm(1000, 10, 275)
my_f <- function(x){
  loc <- list(x[x < -500], x[x >= -500 & x <= 500], x[x > 500])
  dx <- c(5, 1, 5)
  breaks <- sapply(1:length(x), function(i) if(x[i] %in% loc[[1]])
     {seq(min(loc[[1]]), max(loc[[1]])+dx[1], dx[1])} else
       if(x[i] %in% loc[[2]]){seq(min(loc[[2]]), max(loc[[2]])+dx[2], dx[2])} else
         {seq(min(loc[[3]]), max(loc[[3]])+dx[3], dx[3])})
  return(breaks)
}

h <- hist(x, breaks = my_f)

Error in hist.default(x, breaks = my_f, plot = F) : 
  c("Invalid breakpoints produced by 'breaks(x)': 200.1702, 210.1702, ....

I also tried without the sapply function, but I didn't get anything out of that either. Any suggestions on how to solve or work around this issue?

Best answer

I believe you're overcomplicating this; what you want is probably the following.

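my_f2 <- function(x) {
  c(seq(min(x), max(x[x < -500]), 5),   # left tail: bins of width 5
    seq(-500, 500, 1),                  # head: bins of width 1
    seq(min(x[x > 500]), max(x), 5),    # right tail: bins of width 5
    max(x))                             # close the last bin at the maximum
}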

set.seed(666)
x <- rnorm(1000, 10, 275)
hist(x, my_f2)

(Image: the resulting histogram.)
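
As for the error itself: my_f builds one full grid of breaks per data point, and because those grids differ in length, sapply cannot simplify them and returns a list, whereas hist needs breaks(x) to be a single numeric vector. A minimal sketch of that check, using a condensed restatement of my_f (the name per_point is mine, not from the question):

```r
set.seed(666)
x <- rnorm(1000, 10, 275)

# Condensed restatement of my_f: for every data point, return the whole grid
# of breaks belonging to that point's region.
per_point <- sapply(x, function(xi) {
  region <- if (xi < -500) {
    x[x < -500]
  } else if (xi <= 500) {
    x[x >= -500 & x <= 500]
  } else {
    x[x > 500]
  }
  dx <- if (xi < -500 || xi > 500) 5 else 1
  seq(min(region), max(region) + dx, dx)
})

is.list(per_point)     # the grids differ in length, so sapply cannot simplify
is.numeric(per_point)  # hist() needs a numeric vector of breaks
```

Building each region's grid once and concatenating the three grids, as my_f2 does, avoids the problem.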

Note, however, that the width of the last bin in my_f2 depends on the data: seq(min(x[x > 500]), max(x), 5) generally stops short of max(x), so max(x) has to be appended as an extra break.
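
If the data-dependent last bin is a concern, one possible variation is to anchor the tail breaks at the cut points and extend them outward in fixed steps until they cover the data. This is only a sketch under my own assumptions: the helper name tail_head_breaks and its arguments lo, hi, dx_head and dx_tail are hypothetical and not part of the original answer.

```r
# Hypothetical helper: breaks anchored at `lo` and `hi`, extended outward in
# steps of `dx_tail` until they cover range(x), so every tail bin has the
# same width.
tail_head_breaks <- function(x, lo = -500, hi = 500, dx_head = 1, dx_tail = 5) {
  # Walk left from `lo` until the grid passes min(x).
  left  <- if (min(x) < lo) seq(lo, min(x) - dx_tail, by = -dx_tail) else lo
  # Walk right from `hi` until the grid passes max(x).
  right <- if (max(x) > hi) seq(hi, max(x) + dx_tail, by = dx_tail) else hi
  # unique() drops the duplicated cut points shared by the three grids.
  sort(unique(c(left, seq(lo, hi, by = dx_head), right)))
}

set.seed(666)
x <- rnorm(1000, 10, 275)
b <- tail_head_breaks(x)

# Compute the histogram without drawing it; every observation falls in a bin.
h <- hist(x, breaks = b, plot = FALSE)
```

To draw it, call hist(x, breaks = b, freq = FALSE). Because the bins have unequal widths, requesting densities explicitly keeps the bar areas, rather than the bar heights, proportional to the counts.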