Labels for levels of a categorical variable after using discretize


I tried to discretize a variable with the discretize function from the arules package, but the output has very awkward labels. Can anyone suggest how to convert these labels into something like "low", "medium", "high", or simply 1, 2, 3?

library(arules)
#> Warning: package 'arules' was built under R version 3.6.3
#> Loading required package: Matrix
#> 
#> Attaching package: 'arules'
#> The following objects are masked from 'package:base':
#> 
#>     abbreviate, write
discretize(iris[,1], breaks = 3)
#>   [1] [4.3,5.4) [4.3,5.4) [4.3,5.4) [4.3,5.4) [4.3,5.4) [5.4,6.3) [4.3,5.4)
#>   [8] [4.3,5.4) [4.3,5.4) [4.3,5.4) [5.4,6.3) [4.3,5.4) [4.3,5.4) [4.3,5.4)
#>  [15] [5.4,6.3) [5.4,6.3) [5.4,6.3) [4.3,5.4) [5.4,6.3) [4.3,5.4) [5.4,6.3)
#>  [22] [4.3,5.4) [4.3,5.4) [4.3,5.4) [4.3,5.4) [4.3,5.4) [4.3,5.4) [4.3,5.4)
#>  [29] [4.3,5.4) [4.3,5.4) [4.3,5.4) [5.4,6.3) [4.3,5.4) [5.4,6.3) [4.3,5.4)
#>  [36] [4.3,5.4) [5.4,6.3) [4.3,5.4) [4.3,5.4) [4.3,5.4) [4.3,5.4) [4.3,5.4)
#>  [43] [4.3,5.4) [4.3,5.4) [4.3,5.4) [4.3,5.4) [4.3,5.4) [4.3,5.4) [4.3,5.4)
#>  [50] [4.3,5.4) [6.3,7.9] [6.3,7.9] [6.3,7.9] [5.4,6.3) [6.3,7.9] [5.4,6.3)
#>  [57] [6.3,7.9] [4.3,5.4) [6.3,7.9] [4.3,5.4) [4.3,5.4) [5.4,6.3) [5.4,6.3)
#>  [64] [5.4,6.3) [5.4,6.3) [6.3,7.9] [5.4,6.3) [5.4,6.3) [5.4,6.3) [5.4,6.3)
#>  [71] [5.4,6.3) [5.4,6.3) [6.3,7.9] [5.4,6.3) [6.3,7.9] [6.3,7.9] [6.3,7.9]
#>  [78] [6.3,7.9] [5.4,6.3) [5.4,6.3) [5.4,6.3) [5.4,6.3) [5.4,6.3) [5.4,6.3)
#>  [85] [5.4,6.3) [5.4,6.3) [6.3,7.9] [6.3,7.9] [5.4,6.3) [5.4,6.3) [5.4,6.3)
#>  [92] [5.4,6.3) [5.4,6.3) [4.3,5.4) [5.4,6.3) [5.4,6.3) [5.4,6.3) [5.4,6.3)
#>  [99] [4.3,5.4) [5.4,6.3) [6.3,7.9] [5.4,6.3) [6.3,7.9] [6.3,7.9] [6.3,7.9]
#> [106] [6.3,7.9] [4.3,5.4) [6.3,7.9] [6.3,7.9] [6.3,7.9] [6.3,7.9] [6.3,7.9]
#> [113] [6.3,7.9] [5.4,6.3) [5.4,6.3) [6.3,7.9] [6.3,7.9] [6.3,7.9] [6.3,7.9]
#> [120] [5.4,6.3) [6.3,7.9] [5.4,6.3) [6.3,7.9] [6.3,7.9] [6.3,7.9] [6.3,7.9]
#> [127] [5.4,6.3) [5.4,6.3) [6.3,7.9] [6.3,7.9] [6.3,7.9] [6.3,7.9] [6.3,7.9]
#> [134] [6.3,7.9] [5.4,6.3) [6.3,7.9] [6.3,7.9] [6.3,7.9] [5.4,6.3) [6.3,7.9]
#> [141] [6.3,7.9] [6.3,7.9] [5.4,6.3) [6.3,7.9] [6.3,7.9] [6.3,7.9] [6.3,7.9]
#> [148] [6.3,7.9] [5.4,6.3) [5.4,6.3)
#> attr(,"discretized:breaks")
#> [1] 4.3 5.4 6.3 7.9
#> attr(,"discretized:method")
#> [1] frequency
#> Levels: [4.3,5.4) [5.4,6.3) [6.3,7.9]
table(discretize(iris[,1], breaks = 3))
#> 
#> [4.3,5.4) [5.4,6.3) [6.3,7.9] 
#>        46        53        51

There are 2 best solutions below


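If I read your objective correctly, you can do the same thing with the base cut function. Pass right = FALSE and include.lowest = TRUE so the bins match the [4.3,5.4) [5.4,6.3) [6.3,7.9] intervals from discretize; with cut's defaults the intervals are right-closed and the minimum value 4.3 becomes NA. E.g.,

cut(iris$Sepal.Length, breaks = c(4.3, 5.4, 6.3, 7.9), labels = c('lo', 'med', 'hi'), right = FALSE, include.lowest = TRUE)

If you want to replace the values with the cuts:

cuts <- cut(iris$Sepal.Length, breaks = c(4.3, 5.4, 6.3, 7.9), labels = c('lo', 'med', 'hi'), right = FALSE, include.lowest = TRUE)
iris$Sepal.Length <- cuts

Just replace the labels with your own.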
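How cut treats the interval ends is easy to get wrong, so here is a minimal sketch in base R only that compares the default behaviour with left-closed bins. The variable names (x, brks, labs, default_cut, matched_cut) are made up for illustration; the break points are the ones printed in the question.

```r
x    <- iris$Sepal.Length
brks <- c(4.3, 5.4, 6.3, 7.9)   # break points reported by discretize in the question
labs <- c("lo", "med", "hi")    # illustrative labels, swap in your own

# Default cut(): right-closed intervals (4.3,5.4], (5.4,6.3], (6.3,7.9],
# so the minimum value 4.3 belongs to no interval and turns into NA.
default_cut <- cut(x, breaks = brks, labels = labs)
sum(is.na(default_cut))

# Left-closed intervals [4.3,5.4), [5.4,6.3), [6.3,7.9]; include.lowest = TRUE
# closes the last interval on the right when right = FALSE.
matched_cut <- cut(x, breaks = brks, labels = labs,
                   right = FALSE, include.lowest = TRUE)
table(matched_cut)
```

With the second call the table should show the same 46 / 53 / 51 split as the discretize output in the question, and no NA values.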

0
On

For one column you can do:

discretize(iris[, 1], breaks = 3, labels = letters[1:3])

For a whole data.frame, pass the same arguments to discretizeDF through its default = argument:

discretizeDF(iris, default = list(method = "interval", breaks = 3, labels = 1:3))

Both calls are adapted from the examples on the help page (?discretize).
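If the discretized factor already exists, the labels can also be changed after the fact, since the result is an ordinary factor (note the Levels: line in the question's output). The sketch below is an assumption-laden illustration: it uses base cut as a stand-in for discretize so that it runs without arules, and the names x and codes are hypothetical.

```r
# Stand-in for the discretize() result: a factor with interval labels.
x <- cut(iris$Sepal.Length, breaks = c(4.3, 5.4, 6.3, 7.9),
         right = FALSE, include.lowest = TRUE)
levels(x)   # the interval-style labels

# Rename the levels in place; the order must follow the existing level order.
levels(x) <- c("low", "medium", "high")
table(x)

# For plain 1, 2, 3 codes, take the underlying integer codes of the factor.
codes <- as.integer(x)
table(codes)
```

Assigning to levels() keeps every observation in its bin and only swaps the names, so the counts per level stay the same before and after the rename.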