Visualizing association rules imported from a .csv file

I generated association rules using the apriori algorithm in R. Normally, I can visualize the rules generated using the code below:

# Load the arules and arulesViz packages
library(arules)
library(arulesViz)

# Generate example transaction data
example_transactions <- list(
  c("item1", "item2", "item3"),
  c("item1", "item3"),
  c("item2", "item4"),
  c("item1", "item2", "item4"),
  c("item2", "item3")
)

# Convert the transaction data to a transactions object
example_transactions <- as(example_transactions, "transactions")

# Mine association rules
example_rules <- apriori(example_transactions, 
                         parameter = list(support = 0.02, confidence = 0.2),
                         control = list(verbose = FALSE))

# Plot the association rules using the "paracoord" method
plot(example_rules, method = "paracoord")

I exported the rules to CSV and selected rules manually using the Lift Increase Criterion (LIC), since I could not do this in R. LIC keeps a rule only if adding a new item to the existing items in the lhs of its parent rule increases the parent rule's lift by at least a given threshold. Applying the criterion reduced the number of rules.

Now I want to import the manually selected rules back into R and visualize them using plot(). However, when I try this, the code fails.

write(example_rules,
      file = "rules.csv",
      sep = ",",
      quote = TRUE,
      row.names = FALSE)


# Read the CSV file into a data frame
rules_df <- read.csv("rules.csv")

# Create a rules object
rules <- as(rules_df, "rules")

The error received is:

Error in as(rules_df, "rules") : 
  no method or default for coercing “data.frame” to “rules”

I also tried to convert the data frame into a transactions object, but this did not work either.

rules_df <- read.csv("rules.csv")

rules_df <- as.data.frame(unlist(rules_df))
rules_trans <- as(rules_df, "transactions")
plot(rules_trans, method = "paracoord")

I noticed that the object created by the apriori function and the data frame have different structures. How can I work around this and generate the plot?

> str(example_rules)
Formal class 'rules' [package "arules"] with 4 slots
  ..@ lhs    :Formal class 'itemMatrix' [package "arules"] with 3 slots
  .. .. ..@ data       :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
  .. .. .. .. ..@ i       : int [1:22] 3 0 3 1 2 0 2 1 0 1 ...
  .. .. .. .. ..@ p       : int [1:21] 0 0 0 0 0 1 2 3 4 5 ...
  .. .. .. .. ..@ Dim     : int [1:2] 4 20
  .. .. .. .. ..@ Dimnames:List of 2
  .. .. .. .. .. ..$ : NULL
  .. .. .. .. .. ..$ : NULL
  .. .. .. .. ..@ factors : list()
  .. .. ..@ itemInfo   :'data.frame':   4 obs. of  1 variable:
  .. .. .. ..$ labels: chr [1:4] "item1" "item2" "item3" "item4"
  .. .. ..@ itemsetInfo:'data.frame':   0 obs. of  0 variables
  ..@ rhs    :Formal class 'itemMatrix' [package "arules"] with 3 slots
  .. .. ..@ data       :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
  .. .. .. .. ..@ i       : int [1:20] 3 2 0 1 0 3 1 3 0 2 ...
  .. .. .. .. ..@ p       : int [1:21] 0 1 2 3 4 5 6 7 8 9 ...
  .. .. .. .. ..@ Dim     : int [1:2] 4 20
  .. .. .. .. ..@ Dimnames:List of 2
  .. .. .. .. .. ..$ : NULL
  .. .. .. .. .. ..$ : NULL
  .. .. .. .. ..@ factors : list()
  .. .. ..@ itemInfo   :'data.frame':   4 obs. of  1 variable:
  .. .. .. ..$ labels: chr [1:4] "item1" "item2" "item3" "item4"
  .. .. ..@ itemsetInfo:'data.frame':   0 obs. of  0 variables
  ..@ quality:'data.frame': 20 obs. of  5 variables:
  .. ..$ support   : num [1:20] 0.4 0.6 0.6 0.8 0.2 0.2 0.4 0.4 0.4 0.4 ...
  .. ..$ confidence: num [1:20] 0.4 0.6 0.6 0.8 0.5 ...
  .. ..$ coverage  : num [1:20] 1 1 1 1 0.4 0.6 0.4 0.8 0.6 0.6 ...
  .. ..$ lift      : num [1:20] 1 1 1 1 0.833 ...
  .. ..$ count     : int [1:20] 2 3 3 4 1 1 2 2 2 2 ...
  ..@ info   :List of 5
  .. ..$ data         : symbol example_transactions
  .. ..$ ntransactions: int 5
  .. ..$ support      : num 0.02
  .. ..$ confidence   : num 0.2
  .. ..$ call         : chr "apriori(data = example_transactions, parameter = list(support = 0.02, confidence = 0.2), control = list(verbose = FALSE))"
> str(rules_df)
'data.frame':   20 obs. of  6 variables:
 $ rules     : chr  "{} => {item4}" "{} => {item3}" "{} => {item1}" "{} => {item2}" ...
 $ support   : num  0.4 0.6 0.6 0.8 0.2 0.2 0.4 0.4 0.4 0.4 ...
 $ confidence: num  0.4 0.6 0.6 0.8 0.5 ...
 $ coverage  : num  1 1 1 1 0.4 0.6 0.4 0.8 0.6 0.6 ...
 $ lift      : num  1 1 1 1 0.833 ...
 $ count     : int  2 3 3 4 1 1 2 2 2 2 ...

Does anyone know how to deal with this issue? I would immensely appreciate your support.

There is 1 best solution below

Once rules are written in human-readable form using write(), they cannot easily be read back in because much of the information is lost. One option is to use PMML instead (see write.PMML). However, the result is an XML file, which you cannot use in Excel.
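
If the CSV round trip cannot be avoided, the rule strings can in principle be parsed back into a rules object by hand. The following is only a rough sketch, not a built-in arules import: parse_itemset is a hypothetical helper, and it assumes the CSV has a column named rules in the form "{lhs} => {rhs}" (as in the str(rules_df) output of the question) and that no item label contains a comma or a brace. The example data and the temporary file are there only to make the sketch self-contained.

```r
library(arules)

# Example rules and CSV export as in the question (temporary file for illustration)
trans <- as(list(
  c("item1", "item2", "item3"),
  c("item1", "item3"),
  c("item2", "item4"),
  c("item1", "item2", "item4"),
  c("item2", "item3")
), "transactions")
example_rules <- apriori(trans,
                         parameter = list(support = 0.02, confidence = 0.2),
                         control = list(verbose = FALSE))
csv_file <- tempfile(fileext = ".csv")
write(example_rules, file = csv_file, sep = ",", quote = TRUE, row.names = FALSE)

# Read the (possibly manually filtered) CSV back in
rules_df <- read.csv(csv_file, stringsAsFactors = FALSE)

# Hypothetical helper: turn "{item1,item2}" into c("item1", "item2")
# Assumption: item labels contain no commas or braces
parse_itemset <- function(s) {
  s <- gsub("[{}]", "", s)
  if (nchar(s) == 0) character(0) else strsplit(s, ",", fixed = TRUE)[[1]]
}

# Split each rule string into its lhs and rhs item sets
sides <- strsplit(rules_df$rules, " => ", fixed = TRUE)
lhs_items <- lapply(sides, function(p) parse_itemset(p[1]))
rhs_items <- lapply(sides, function(p) parse_itemset(p[2]))

# Both sides must be encoded against the same item labels
item_labels <- sort(unique(unlist(c(lhs_items, rhs_items))))

# Rebuild a rules object; the remaining CSV columns become the quality slot
rules <- new("rules",
             lhs = encode(lhs_items, itemLabels = item_labels),
             rhs = encode(rhs_items, itemLabels = item_labels),
             quality = rules_df[, setdiff(names(rules_df), "rules")])
```

The rebuilt object carries only the lhs, rhs and quality columns; the info slot (original call, number of transactions) is not restored, so anything that depends on it may behave differently than with the original rules.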

Filtering by LIC is easy within the arules package itself:

example_rules[!is.redundant(example_rules, measure = "lift")]

This removes all rules with LIC <= 1. Technically, it uses the improvement measure, but that has the same effect.
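
Put together with the data from the question, the whole workflow stays inside R and the filtered rules can be plotted directly. This is a minimal sketch that reuses the question's example transactions; the name selected_rules is chosen here for illustration, and a threshold stricter than LIC > 1 is not covered.

```r
library(arules)
library(arulesViz)

# Example transactions and rules as in the question
trans <- as(list(
  c("item1", "item2", "item3"),
  c("item1", "item3"),
  c("item2", "item4"),
  c("item1", "item2", "item4"),
  c("item2", "item3")
), "transactions")
example_rules <- apriori(trans,
                         parameter = list(support = 0.02, confidence = 0.2),
                         control = list(verbose = FALSE))

# Keep only rules that are not redundant with respect to lift
keep <- !is.redundant(example_rules, measure = "lift")
selected_rules <- example_rules[keep]

# The result is still a rules object, so plot() works as before
plot(selected_rules, method = "paracoord")
```

Because selected_rules never leaves R, there is no need to export to CSV and rebuild the object before visualizing.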