I have an R data frame that consists of the columns ID and Items. If I filter for a certain term, e.g. 'RABBIT NIGHT LIGHT', I get 178 records.
But if I read in the data with read.transactions(), I only get 167 records for the term. What is the reason for this and how can I correct it? I suspect that there is an encoding problem.
You can download my dataset here.
MWE:
df <- read.csv(file = 'orders.csv', header = TRUE, sep = ",", strip.white = TRUE)
dim(df[df$Items == 'RABBIT NIGHT LIGHT',])
# Output: 178 2
write.csv(df, "_transactions.csv", row.names=FALSE)
transactions <- read.transactions(
  file = "_transactions.csv",
  format = "single",
  sep = ",",
  cols = c("Id", "Items"),
  rm.duplicates = TRUE,
  header = TRUE
)
summary(transactions)
# Output: 167 items for RABBIT NIGHT LIGHT
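One check that might help tell an encoding problem apart from repeated rows is to compare the number of matching rows with the number of distinct (Id, Items) pairs among them. The sketch below uses a small made-up data frame in place of orders.csv, and it assumes the ID column is named `Id`, as in the `cols` argument above; the values are purely illustrative and are not taken from my dataset.

```r
# Toy stand-in for orders.csv (made-up values, for illustration only).
df <- data.frame(
  Id    = c(1, 1, 2, 3, 3),
  Items = c("RABBIT NIGHT LIGHT", "RABBIT NIGHT LIGHT", "RABBIT NIGHT LIGHT",
            "RABBIT NIGHT LIGHT", "OTHER ITEM"),
  stringsAsFactors = FALSE
)

# All rows that match the term exactly.
hits <- df[df$Items == "RABBIT NIGHT LIGHT", ]

# Number of matching rows, as counted by the dim() call in the MWE.
n_rows <- nrow(hits)

# Number of distinct (Id, Items) pairs among those rows.
n_pairs <- nrow(unique(hits[, c("Id", "Items")]))

# Number of rows that repeat an earlier (Id, Items) pair.
n_repeats <- sum(duplicated(hits[, c("Id", "Items")]))

print(c(rows = n_rows, pairs = n_pairs, repeats = n_repeats))
```

If the same comparison on the real data showed a gap between the row count and the pair count, that would point toward repeated rows rather than encoding, but I have not confirmed this.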
How can the difference be explained and what might be a fix to this?