read.transactions() (arules) returns fewer items than the R data frame?


I have an R data frame with the columns Id and Items. If I filter it for a particular item, e.g. 'RABBIT NIGHT LIGHT', I get 178 records. But if I read the same data in with read.transactions(), I only get 167 records for that item. What is the reason for this, and how can I correct it? I suspect an encoding problem. You can download my dataset here.

MWE:

# Load the orders and count the rows whose item is 'RABBIT NIGHT LIGHT'
df <- read.csv(file = 'orders.csv', header = TRUE, sep = ",", strip.white = TRUE)
dim(df[df$Items == 'RABBIT NIGHT LIGHT', ])
# Output: 178 2

write.csv(df, "_transactions.csv", row.names=FALSE)
transactions <- read.transactions(
        file = "_transactions.csv",
        format = "single",
        sep = ",",
        cols=c("Id","Items"),
        rm.duplicates = TRUE,
        header = TRUE
)

summary(transactions)

# Output: 167 items for RABBIT NIGHT LIGHT

How can the difference be explained, and how can I fix it?
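
One thing worth checking before blaming the encoding (this is a guess and has not been verified against the linked dataset): in arules, an item is either present or absent in a transaction, and rm.duplicates = TRUE drops repeated items within the same transaction. If some Id values list 'RABBIT NIGHT LIGHT' more than once, the data frame counts every row, while the transactions count each Id only once. The sketch below uses a small made-up data frame in place of orders.csv, with the same column names Id and Items, and relies on base R only.

```r
# Hypothetical toy data standing in for orders.csv (columns Id and Items).
df <- data.frame(
  Id    = c(1, 1, 2, 3, 3, 3),
  Items = c("RABBIT NIGHT LIGHT", "RABBIT NIGHT LIGHT", "RABBIT NIGHT LIGHT",
            "RABBIT NIGHT LIGHT", "RABBIT NIGHT LIGHT", "MUG"),
  stringsAsFactors = FALSE
)

term <- "RABBIT NIGHT LIGHT"
rows_with_term <- df[df$Items == term, ]

# Row-level count: what filtering the data frame reports.
n_rows <- nrow(rows_with_term)

# Transaction-level count: distinct Ids that contain the item at least once.
n_transactions <- length(unique(rows_with_term$Id))

# Rows that repeat an (Id, Items) pair already seen earlier.
n_duplicates <- sum(duplicated(rows_with_term[, c("Id", "Items")]))

cat("rows:", n_rows,
    "transactions:", n_transactions,
    "duplicates:", n_duplicates, "\n")
```

If running the same three counts on the real data gives 178 rows, 167 distinct Ids, and 11 duplicates, the gap comes from repeated items within an order and not from the encoding. In that case the 167 is the correct transaction-level count, and there is nothing to fix unless quantities matter for the analysis.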
