I want to discretize a column that contains a continuous variable.
The data looks like this:
c(0,25,77,423,6,8,3,65,32,22,10,0,8,0,15,0,10,1,2,4,5,5,6)
I want to turn the numbers into categories by discretizing, but zeros should form their own category. Discretizing directly can put other numbers into the same bin as zero.
I thought that if I set the zeros aside and discretized the rest, I would get what I want. But I can't do that in a data frame column, because the row indexes no longer line up.
Here is an example dput() output:
structure(list(dummy_column = c(0, 25, 77, 423, 6, 8, 3, 65,
32, 22, 10, 0, 8, 0, 15, 0, 10, 1, 2, 4, 5, 5, 6)), class = "data.frame", row.names = c(NA,
-23L))
For example, if I'd like to use 2 breaks, the categories should be zero plus the other 3 discretized ones, 4 categories in total. Ideally, I could write a function that discretizes a column and can be used directly inside dplyr::mutate().
Thanks in advance.
If I understood correctly, your goal is to keep "0" as a separate category when discretizing. Here's a solution using arules::discretize to make a new function that can accomplish this:
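A minimal sketch of such a wrapper, under some assumptions: the name discretize_keep_zero and its zero_label argument are hypothetical, the "frequency" method is just one possible choice, and breaks is passed straight to arules::discretize, where (for this method) it means the number of bins, so breaks = 3 gives three non-zero bins plus the zero category. The zeros are labelled in place with a logical mask, so the result has the same length and order as the input and works inside dplyr::mutate().

```r
library(arules)
library(dplyr)

# Hypothetical helper: discretize the non-zero values and keep 0 as its own level.
discretize_keep_zero <- function(x, breaks = 3, method = "frequency",
                                 zero_label = "0") {
  out <- rep(NA_character_, length(x))   # NAs in x stay NA
  is_zero <- !is.na(x) & x == 0
  is_nonzero <- !is.na(x) & x != 0

  # Zeros get their own label, in their original positions
  out[is_zero] <- zero_label

  # Only the non-zero values are binned
  bins <- arules::discretize(x[is_nonzero], method = method, breaks = breaks)
  out[is_nonzero] <- as.character(bins)

  # Put the zero level first, followed by the bins in their natural order
  factor(out, levels = c(zero_label, levels(bins)))
}

df <- structure(list(dummy_column = c(0, 25, 77, 423, 6, 8, 3, 65,
32, 22, 10, 0, 8, 0, 15, 0, 10, 1, 2, 4, 5, 5, 6)), class = "data.frame",
row.names = c(NA, -23L))

df <- dplyr::mutate(df, dummy_cat = discretize_keep_zero(dummy_column, breaks = 3))
table(df$dummy_cat)
```

Because the mask keeps every value in its original row, there is no need to filter the zeros out and join them back afterwards.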