I am trying to create a histogram in which I can assign different bin sizes to the data depending on whether they lie in the head or the tail of the distribution.
I tried to create the following function, my_f, to use as the input to the breaks= argument, but it is not working. Here's my code, along with the error I get.
x <- rnorm(1000, 10, 275)
my_f <- function(x){
  loc <- list(x[x < -500], x[x >= -500 & x <= 500], x[x > 500])
  dx <- c(5, 1, 5)
  breaks <- sapply(1:length(x), function(i) if(x[i] %in% loc[[1]])
    {seq(min(loc[[1]]), max(loc[[1]])+dx[1], dx[1])} else
    if(x[i] %in% loc[[2]]){seq(min(loc[[2]]), max(loc[[2]])+dx[2], dx[2])} else
    {seq(min(loc[[3]]), max(loc[[3]])+dx[3], dx[3])})
  return(breaks)
}
h <- hist(x, breaks = my_f)
Error in hist.default(x, breaks = my_f, plot = F) :
c("Invalid breakpoints produced by 'breaks(x)': 200.1702, 210.1702, ....
I also tried without the sapply function, but I didn't get anything out of that either. Any suggestions on how to solve or get around this issue?
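I believe you're overcomplicating this. What you want is to pass breaks a single vector of break points that covers the whole range of x, built from one seq() call per region.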
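A minimal sketch of that idea, assuming the cut points at -500 and 500 and the bin widths 5, 1 and 5 from the question. The fixed seed and the sort(unique(...)) clean-up are my own additions, and the code assumes x has at least one value below -500 and one above 500:

```r
set.seed(1)                          # assumption: fixed seed so the sketch is reproducible
x <- rnorm(1000, 10, 275)

# One break vector for the whole range: wide bins in the tails, narrow bins in the head.
breaks <- c(
  seq(min(x), -500, 5),              # left tail, width 5 (needs min(x) <= -500)
  seq(-500, 500, 1),                 # head, width 1
  seq(min(x[x > 500]), max(x), 5),   # right tail, width 5 (needs some x > 500)
  max(x)                             # make sure the largest value is covered
)
breaks <- sort(unique(breaks))       # drop duplicates where the pieces meet

h <- hist(x, breaks = breaks)
```

Because the breaks are not equidistant, hist() plots densities rather than counts by default; the raw counts per bin are still available in h$counts.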
Note, however, that the last bin is somewhat dynamic, because max(x) generally isn't included in seq(min(x[x > 500]), max(x), 5), and we therefore have to add it explicitly as the final break.