I am using the "adult" dataset (http://archive.ics.uci.edu/ml/datasets/Adult). I have mined association rules with apriori and sorted them by lift:
library(arules)
trans = read.transactions("adult.data", format = "basket", sep = ",", rm.duplicates = TRUE)
rules <- apriori(trans)
rules.lift <- sort(rules, decreasing = TRUE, by="lift")
When I execute
inspect(head(rules.lift,100))
I obtain the following:
    lhs                  rhs            support   confidence lift
1   { 13,
      Male,
      United-States}  => { Bachelors}   0.1024507 0.9976077  6.066125
2   { 0,
      13,
      Male,
      United-States}  => { Bachelors}   0.1024507 0.9976077  6.066125
ETC
For example, in the rule:
{ 0,
13,
Male,
United-States} => { Bachelors}
How can I tell which attributes the 0 and the 13 belong to? I have looked at the description of the data set and at the data itself, so I guess that 13 is education-num and 0 is capital-loss. However, two or more attributes can share the same range of values, so in general I would not know how to tell them apart.
>class(rules.lift)
[1] "rules"
attr(,"package")
[1] "arules"
I've read in "How could we know the ColumnName /attribute of items generated in Rules" that the problem is that I haven't preprocessed the data. So, how can I do that?
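As far as I understand it, preprocessing here would mean reading the file as a data frame with named columns and converting every column to a factor before coercing to transactions, so that each item is labelled "column=value". Below is a minimal sketch of that idea, not a confirmed solution. The column names are my transcription of the UCI adult.names description with hyphens replaced by underscores, "income" is my own label for the class column, and as_labelled_transactions is a hypothetical helper name.

```r
library(arules)

# Column names transcribed from the UCI description file (adult.names);
# adult.data itself has no header row. "income" is an assumed label
# for the final class column.
adult_cols <- c("age", "workclass", "fnlwgt", "education", "education_num",
                "marital_status", "occupation", "relationship", "race", "sex",
                "capital_gain", "capital_loss", "hours_per_week",
                "native_country", "income")

# Hypothetical helper: turn every column into a factor so that each
# item is labelled "column=value" after coercion to transactions.
as_labelled_transactions <- function(df) {
  df[] <- lapply(df, function(col) if (is.factor(col)) col else factor(col))
  as(df, "transactions")
}

if (file.exists("adult.data")) {
  # strip.white removes the space after each comma; "?" marks missing values.
  adult <- read.csv("adult.data", header = FALSE, col.names = adult_cols,
                    strip.white = TRUE, na.strings = "?",
                    stringsAsFactors = TRUE)
  trans <- as_labelled_transactions(adult)
  rules <- apriori(trans)
  rules.lift <- sort(rules, decreasing = TRUE, by = "lift")
  inspect(head(rules.lift, 10))
}
```

With this, an item should show up as, for example, education_num=13 or capital_loss=0 instead of a bare 13 or 0. Turning each distinct numeric value into its own level is only for illustration; continuous columns such as age or fnlwgt would probably need to be binned into ranges first.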
Thank you very much!