I generated association rules using the apriori algorithm in R. Normally, I can visualize the rules generated using the code below:
# Load the arules and arulesViz packages
library(arules)
library(arulesViz)
# Generate example transaction data
example_transactions <- list(
c("item1", "item2", "item3"),
c("item1", "item3"),
c("item2", "item4"),
c("item1", "item2", "item4"),
c("item2", "item3")
)
# Convert the transaction data to a transactions object
example_transactions <- as(example_transactions, "transactions")
# Mine association rules
example_rules <- apriori(example_transactions,
parameter = list(support = 0.02, confidence = 0.2),
control = list(verbose = FALSE))
# Plot the association rules using the "paracoord" method
plot(example_rules, method = "paracoord")
I exported the rules to CSV and selected rules manually using the Lift Increase Criterion (LIC), since I could not find a way to do this in R. LIC keeps a rule only if adding a new item to the LHS of its parent rule increases the parent's lift by at least a given threshold, so applying it reduced the number of rules.
Now, I want to import the manually selected rules into R again and visualize them using plot(). However, when I do this, the code does not work properly.
write(example_rules,
file = "rules.csv",
sep = ",",
quote = TRUE,
row.names = FALSE)
# Read the CSV file into a data frame
rules_df <- read.csv("rules.csv")
# Create a rules object
rules <- as(rules_df, "rules")
The error received is:
Error in as(rules_df, "rules") :
no method or default for coercing “data.frame” to “rules”
I also tried to convert the data frame into a transactions object, but this did not work either.
rules_df <- read.csv("rules.csv")
rules_df <- as.data.frame(unlist(rules_df))
rules_trans <- as(rules_df, "transactions")
plot(rules_trans, method = "paracoord")
I noticed that the structure of the object created by the apriori function and the data frame are different. But how can I work around this and generate the plot?
> str(example_rules)
Formal class 'rules' [package "arules"] with 4 slots
..@ lhs :Formal class 'itemMatrix' [package "arules"] with 3 slots
.. .. ..@ data :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
.. .. .. .. ..@ i : int [1:22] 3 0 3 1 2 0 2 1 0 1 ...
.. .. .. .. ..@ p : int [1:21] 0 0 0 0 0 1 2 3 4 5 ...
.. .. .. .. ..@ Dim : int [1:2] 4 20
.. .. .. .. ..@ Dimnames:List of 2
.. .. .. .. .. ..$ : NULL
.. .. .. .. .. ..$ : NULL
.. .. .. .. ..@ factors : list()
.. .. ..@ itemInfo :'data.frame': 4 obs. of 1 variable:
.. .. .. ..$ labels: chr [1:4] "item1" "item2" "item3" "item4"
.. .. ..@ itemsetInfo:'data.frame': 0 obs. of 0 variables
..@ rhs :Formal class 'itemMatrix' [package "arules"] with 3 slots
.. .. ..@ data :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
.. .. .. .. ..@ i : int [1:20] 3 2 0 1 0 3 1 3 0 2 ...
.. .. .. .. ..@ p : int [1:21] 0 1 2 3 4 5 6 7 8 9 ...
.. .. .. .. ..@ Dim : int [1:2] 4 20
.. .. .. .. ..@ Dimnames:List of 2
.. .. .. .. .. ..$ : NULL
.. .. .. .. .. ..$ : NULL
.. .. .. .. ..@ factors : list()
.. .. ..@ itemInfo :'data.frame': 4 obs. of 1 variable:
.. .. .. ..$ labels: chr [1:4] "item1" "item2" "item3" "item4"
.. .. ..@ itemsetInfo:'data.frame': 0 obs. of 0 variables
..@ quality:'data.frame': 20 obs. of 5 variables:
.. ..$ support : num [1:20] 0.4 0.6 0.6 0.8 0.2 0.2 0.4 0.4 0.4 0.4 ...
.. ..$ confidence: num [1:20] 0.4 0.6 0.6 0.8 0.5 ...
.. ..$ coverage : num [1:20] 1 1 1 1 0.4 0.6 0.4 0.8 0.6 0.6 ...
.. ..$ lift : num [1:20] 1 1 1 1 0.833 ...
.. ..$ count : int [1:20] 2 3 3 4 1 1 2 2 2 2 ...
..@ info :List of 5
.. ..$ data : symbol example_transactions
.. ..$ ntransactions: int 5
.. ..$ support : num 0.02
.. ..$ confidence : num 0.2
.. ..$ call : chr "apriori(data = example_transactions, parameter = list(support = 0.02, confidence = 0.2), control = list(verbose = FALSE))"
> str(rules_df)
'data.frame': 20 obs. of 6 variables:
$ rules : chr "{} => {item4}" "{} => {item3}" "{} => {item1}" "{} => {item2}" ...
$ support : num 0.4 0.6 0.6 0.8 0.2 0.2 0.4 0.4 0.4 0.4 ...
$ confidence: num 0.4 0.6 0.6 0.8 0.5 ...
$ coverage : num 1 1 1 1 0.4 0.6 0.4 0.8 0.6 0.6 ...
$ lift : num 1 1 1 1 0.833 ...
$ count : int 2 3 3 4 1 1 2 2 2 2 ...
Does anyone know how to deal with this issue? I would immensely appreciate your support.
Once rules are written in a human-readable form using write(), they cannot easily be read back in, since a lot of information is lost. One option is to use PMML instead (see write.PMML). However, the result is an XML file, which you cannot use in Excel.
It is easy to filter using LIC in package arules. Doing so will remove all rules with LIC <= 1. It technically uses improvement, but that has the same effect.
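As a minimal sketch of the points above, the code below shows two routes that avoid rebuilding a rules object from a data frame. It assumes an arules version in which is.redundant() accepts a measure argument. The names trans_list and csv_path, and the lift > 1 filter standing in for a manual selection, are illustrative only and not part of the original post.

```r
library(arules)

# Toy transactions mirroring the question (illustrative data)
trans_list <- list(
  c("item1", "item2", "item3"),
  c("item1", "item3"),
  c("item2", "item4"),
  c("item1", "item2", "item4"),
  c("item2", "item3")
)
trans <- as(trans_list, "transactions")

rules <- apriori(trans,
                 parameter = list(support = 0.02, confidence = 0.2),
                 control = list(verbose = FALSE))

# Route 1: filter inside R.
# Assumption: is.redundant() with measure = "lift" flags rules whose lift
# does not improve on any more general rule (i.e. LIC <= 1).
# which() is used so that any NA entries are dropped instead of causing an error.
keep <- !is.redundant(rules, measure = "lift")
rules_lic <- rules[which(keep)]

# Route 2: the selection was already made outside R.
# Keep the original rules object in the session and subset it by matching
# the rule labels stored in the "rules" column of the exported CSV.
csv_path <- tempfile(fileext = ".csv")  # hypothetical file location
write(rules, file = csv_path, sep = ",", quote = TRUE, row.names = FALSE)

selected_df <- read.csv(csv_path)
# Stand-in for the manual selection done in a spreadsheet
selected_df <- selected_df[selected_df$lift > 1, ]

rules_selected <- rules[labels(rules) %in% selected_df$rules]
```

Both rules_lic and rules_selected are ordinary rules objects, so a call such as plot(rules_selected, method = "paracoord") from arulesViz should accept them. Route 2 only works while the original rules object (or a saved copy, for example via saveRDS()) is still available, because the CSV is used purely as a list of labels to keep.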