Label a dataset according to bins of a histogram


I have a data frame with more than 40,000 rows and two columns, AccountNumber and NumberOfContacts. I created a histogram with the following code:

library(ggplot2)
p <- ggplot(contactsInfo, aes(x = NumberOfContacts)) +
  geom_histogram(binwidth = 10, boundary = 0) +  # bins start at 0: 0-10, 10-20, ...
  xlim(10, 300) +
  xlab("Number of contacts") + ylab("Number of accounts")
p

I would now like to add a column called 'Bin' to my original data frame that records which histogram bin each account falls into.

For example:

If an AccountNumber has between 0 and 10 contacts, Bin should be 1 for that AccountNumber.

Similarly, if an AccountNumber has between 50 and 60 contacts, Bin should be 6, and so on.

The only way I can think of is a long chain of nested ifelse() statements, which would be extremely lengthy. I was hoping there is an easier way to do this.

Any help would be much appreciated.
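For comparison with a chain of nested ifelse() calls, the whole mapping can be done in a single vectorized call. The sketch below uses base R's findInterval(), which is a different function from the ones used in the answers; the toy data frame and its values are hypothetical, and it assumes 10-wide bins starting at 0 with the first bin numbered 1.

```r
# Hypothetical toy data standing in for contactsInfo
contactsInfo <- data.frame(
  AccountNumber    = c(101, 102, 103, 104),
  NumberOfContacts = c(0, 9, 10, 55)
)

# Break points every 10 contacts, extending past the largest value
breaks <- seq(0, max(contactsInfo$NumberOfContacts) + 10, by = 10)

# findInterval() returns i such that breaks[i] <= x < breaks[i + 1],
# so 0-9 -> 1, 10-19 -> 2, ..., 50-59 -> 6
contactsInfo$Bin <- findInterval(contactsInfo$NumberOfContacts, breaks)

print(contactsInfo$Bin)  # [1] 1 1 2 6
```

Because the bins here are closed on the left, an account with exactly 10 contacts is placed in bin 2.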


Best answer:

I don't know all the details of your dataset, but you can do this with mutate() from the dplyr package:

library(dplyr)
contactsInfo <- mutate(contactsInfo, Bin = floor(NumberOfContacts / 10) + 1)  # 0-9 -> 1, 10-19 -> 2, ...
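A quick base R check of the floor() idea on made-up data (the account numbers and counts are hypothetical), using transform() in place of dplyr::mutate() and adding 1 so that the first bin is numbered 1 as in the question. Since floor() makes each bin closed on the left, an account with exactly 10 contacts lands in bin 2:

```r
# Hypothetical toy data standing in for contactsInfo
contactsInfo <- data.frame(
  AccountNumber    = c(101, 102, 103, 104),
  NumberOfContacts = c(0, 9, 10, 55)
)

# transform() returns a modified copy of the data frame, so assign it back
contactsInfo <- transform(contactsInfo, Bin = floor(NumberOfContacts / 10) + 1)

print(contactsInfo$Bin)  # [1] 1 1 2 6
```

Like mutate(), transform() does not modify the data frame in place, which is why the result is assigned back to contactsInfo.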
Another answer:

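You can use cut() from base R:

breaks <- seq(0, max(contactsInfo$NumberOfContacts) + 10, by = 10)  # 10-wide bins covering every value
contactsInfo$Bin <- cut(contactsInfo$NumberOfContacts, breaks = breaks, labels = FALSE, include.lowest = TRUE)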
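To see how cut() treats the bin edges, here is a small sketch on made-up data (the values are hypothetical). By default cut() builds right-closed intervals (0,10], (10,20], ..., so exactly 10 contacts falls in bin 1, and include.lowest = TRUE keeps a count of 0 from becoming NA:

```r
# Hypothetical toy data standing in for contactsInfo
contactsInfo <- data.frame(
  AccountNumber    = c(101, 102, 103, 104),
  NumberOfContacts = c(0, 9, 10, 55)
)

# Break points every 10 contacts, extending past the largest value
breaks <- seq(0, max(contactsInfo$NumberOfContacts) + 10, by = 10)

# labels = FALSE returns integer bin codes instead of factor labels;
# include.lowest = TRUE makes the first interval [0,10] rather than (0,10]
contactsInfo$Bin <- cut(contactsInfo$NumberOfContacts, breaks = breaks,
                        labels = FALSE, include.lowest = TRUE)

print(contactsInfo$Bin)  # [1] 1 1 1 6
```

Passing right = FALSE to cut() switches to left-closed intervals [0,10), [10,20), ..., which numbers these breaks the same way as floor(NumberOfContacts / 10) + 1. Which convention to pick depends on how the histogram's bins are closed; assuming ggplot2's default for geom_histogram() (closed = "right") with bins starting at 0, the default cut() behaviour is the closer match.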