Convert arules Transaction Data to an item matrix in R programming

I have a dataset with 100,000 rows in transaction format, as below:

B038-82C81778E81C   Toy Story
B038-82C81778E81C   Planet of the apes
B038-82C81778E81C   Iron Man
9C05-EE9B44E8C18F   Bruce Almighty
9C05-EE9B44E8C18F   Iron Man
9C05-EE9B44E8C18F   Toy Story
8F59-9956070D8005   Toy Story
8F59-9956070D8005   Gravity
8F59-9956070D8005   Iron Man
8F59-9956070D8005   Gone
B52F-9936734525AF   Planet of the Apes
B52F-9936734525AF   Bruce Almighty

I want to convert it to a matrix format as below (or to TRUE/FALSE flags):

Matrix              Toy Story  Planet of the Apes  Iron Man  Bruce Almighty   Gone  Gravity
B038-82C81778E81C    1             1                 1             0            0     0
9C05-EE9B44E8C18F    1             0                 1             1            0     0 
8F59-9956070D8005    1             0                 1             0            1     1
B52F-9936734525AF    0             1                 0             1            0     0

I have tried the following steps:

TrnsDataset1 <- read.transactions("~/Desktop/movieswid_1Copy.txt", format = "single", sep = "\t", cols = c(1, 2), rm.duplicates = TRUE)
L <- as(TrnsDataset1, "list")
M <- as(L, "matrix")
CM <- as(M, "ngCMatrix")

But after the list conversion I get this output:

B038-82C81778E81C   c("Toy Story\nB038-82C81778E81C\tPlanet of the apes\nB038-82C81778E81C\tIron Man")
9C05-EE9B44E8C18F   c("Bruce Almighty","Iron Man","Toy Story")

So some rows are correct, but in others the unique ID is embedded in the movie list together with \t and \n.

I want every entry of the list in the format below:

9C05-EE9B44E8C18F   c("Bruce Almighty","Iron Man","Toy Story")

I believe that with the list in this form I can easily get the required result. I would really appreciate your help.

There is 1 best solution below

I'm a bit confused because you say you want two things: the list and the matrix. If you just want the sparse matrix, you can skip both the list and the standard matrix conversion and coerce the transactions object directly:

TrnsDataset1 <- read.transactions(...);
mm <- t(as(TrnsDataset1,"ngCMatrix"))

This results in

4 x 6 sparse Matrix of class "ngCMatrix"
                  Bruce Almighty Gone Gravity Iron Man Planet Toy Story
8F59-9956070D8005              .    |       |        |      .         |
9C05-EE9B44E8C18F              |    .       .        |      .         |
B038-82C81778E81C              .    .       .        |      |         |
B52F-9936734525AF              |    .       .        .      |         .

which is a matrix of TRUE/FALSE values, where | marks TRUE and . marks FALSE (the column names are abbreviated here to fit the space). There is no need to go through the list form at all.
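
As a minimal, self-contained sketch of the approach above, the snippet below writes a small tab-separated sample to a temporary file, reads it with read.transactions, and converts it. The sample rows, the temporary file, and the names trans, mm, and dense are assumptions made for illustration, and the titles are spelled with consistent capitalisation so that each movie maps to a single column. The coercion to "ngCMatrix" stores items in rows, which is why t() is applied to get one row per transaction.

```r
library(arules)

# Hypothetical sample in "single" format: one ID/item pair per line,
# separated by a tab (stands in for the real 100,000-row file).
rows <- c(
  "B038-82C81778E81C\tToy Story",
  "B038-82C81778E81C\tPlanet of the Apes",
  "B038-82C81778E81C\tIron Man",
  "9C05-EE9B44E8C18F\tBruce Almighty",
  "9C05-EE9B44E8C18F\tIron Man",
  "9C05-EE9B44E8C18F\tToy Story",
  "8F59-9956070D8005\tToy Story",
  "8F59-9956070D8005\tGravity",
  "8F59-9956070D8005\tIron Man",
  "8F59-9956070D8005\tGone",
  "B52F-9936734525AF\tPlanet of the Apes",
  "B52F-9936734525AF\tBruce Almighty"
)
tmp <- tempfile(fileext = ".txt")
writeLines(rows, tmp)

# Column 1 holds the transaction ID, column 2 the item.
trans <- read.transactions(tmp, format = "single", sep = "\t",
                           cols = c(1, 2), rm.duplicates = TRUE)

# Sparse TRUE/FALSE matrix: transpose so transactions are rows, items are columns.
mm <- t(as(trans, "ngCMatrix"))

# Optional dense 0/1 version; adding 0L turns the logical values into integers.
dense <- as(mm, "matrix") + 0L
print(dense)
```

The dense step is probably only worth doing on small data. With 100,000 rows the sparse matrix is likely the more practical form to keep.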