I have a dataset of 100,000 rows in transaction format, like this:
B038-82C81778E81C Toy Story
B038-82C81778E81C Planet of the apes
B038-82C81778E81C Iron Man
9C05-EE9B44E8C18F Bruce Almighty
9C05-EE9B44E8C18F Iron Man
9C05-EE9B44E8C18F Toy Story
8F59-9956070D8005 Toy Story
8F59-9956070D8005 Gravity
8F59-9956070D8005 Iron Man
8F59-9956070D8005 Gone
B52F-9936734525AF Planet of the Apes
B52F-9936734525AF Bruce Almighty
I want to convert it into a matrix like the one below (or into TRUE/FALSE flags):
Matrix             Toy Story  Planet of the Apes  Iron Man  Bruce Almighty  Gone  Gravity
B038-82C81778E81C  1          1                   1         0               0     0
9C05-EE9B44E8C18F  1          0                   1         1               0     0
8F59-9956070D8005  1          0                   1         0               1     1
B52F-9936734525AF  0          1                   0         1               0     0
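For reference, the target matrix itself can be built in base R without arules. This is a minimal sketch that assumes the rows are already in a two-column data frame (the names `trans`, `id` and `title` are hypothetical). `table()` counts each (id, title) pair, and `> 0` turns the counts into TRUE/FALSE flags. The sample spells one title both "Planet of the apes" and "Planet of the Apes", so the sketch maps each title to its first spelling, ignoring case, to keep them in one column.

```r
# Sample rows from the question as a two-column data frame (id, title)
trans <- data.frame(
  id = c(rep("B038-82C81778E81C", 3), rep("9C05-EE9B44E8C18F", 3),
         rep("8F59-9956070D8005", 4), rep("B52F-9936734525AF", 2)),
  title = c("Toy Story", "Planet of the apes", "Iron Man",
            "Bruce Almighty", "Iron Man", "Toy Story",
            "Toy Story", "Gravity", "Iron Man", "Gone",
            "Planet of the Apes", "Bruce Almighty"),
  stringsAsFactors = FALSE
)

# Map every title to the first spelling seen, ignoring case,
# so "apes" and "Apes" end up in a single column
key <- tolower(trans$title)
trans$title <- trans$title[match(key, key)]

# Factors fix the row and column order to order of first appearance
ids    <- factor(trans$id,    levels = unique(trans$id))
titles <- factor(trans$title, levels = unique(trans$title))

# table() counts each (id, title) pair; "> 0" turns counts into TRUE/FALSE
flags <- unclass(table(ids, titles)) > 0

# 0/1 version, matching the layout above
m01 <- flags * 1L
print(m01)
```

Columns come out in order of first appearance, so Gravity precedes Gone here. With 100,000 rows this builds a dense ids × titles matrix. That matrix can get large when there are many distinct ids and titles, whereas arules stores the same data sparsely.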
I have tried the following steps:
TrnsDataset1<-read.transactions("~/Desktop/movieswid_1Copy.txt", format= c("single"), sep="\t", cols = c(1,2), rm.duplicates=TRUE);
L <- as(TrnsDataset1,"list");
M <- as(L,"matrix")
CM<- as (M,"ngCMatrix");
But the list conversion gives me output like this:
B038-82C81778E81C c("Toy Story\nB038-82C81778E81C\tPlanet of the apes\nB038-82C81778E81C\tIron Man")
9C05-EE9B44E8C18F c("Bruce Almighty","Iron Man","Toy Story")
So some rows come out correctly, but in others the unique ID is pulled into the movie list, together with \t and \n characters.
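One plausible cause is a quote character in the file. This is an assumption, since the raw file isn't shown. If a title field starts with " or ', the reader treats everything up to the next matching quote as one field, including the newlines and tabs in between. That matches the "Toy Story\nB038-82C81778E81C\tPlanet of the apes..." item above. The base-R sketch below reproduces the symptom on a hypothetical four-line file with `read.table()`. Turning quoting off with `quote = ""` reads one row per line. Recent arules versions also accept a `quote` argument in `read.transactions()`. Check `?read.transactions` for your installed version.

```r
# Hypothetical sample file: the first title starts with a stray double quote,
# and the matching closing quote only appears two lines later
tmp <- tempfile(fileext = ".txt")
writeLines(c('B038-82C81778E81C\t"Toy Story',
             'B038-82C81778E81C\tPlanet of the apes',
             'B038-82C81778E81C\tIron Man"',
             '9C05-EE9B44E8C18F\tBruce Almighty'), tmp)

# With " and ' treated as quotes, the quoted field swallows the next lines,
# embedding "\n" and "\t" in a single title
quoted <- read.table(tmp, sep = "\t", quote = "\"'",
                     stringsAsFactors = FALSE)

# With quoting switched off, every line becomes its own row
plain <- read.table(tmp, sep = "\t", quote = "",
                    stringsAsFactors = FALSE)

print(quoted$V2[1])
print(nrow(plain))
unlink(tmp)
```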
I want every entry of the list in this format:
9C05-EE9B44E8C18F c("Bruce Almighty","Iron Man","Toy Story")
That way, I believe I can easily get the required result. I would really appreciate your help.
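The list format described above, one character vector of titles named by ID, can be produced in base R with `split()`. This is a minimal sketch assuming the rows are in a two-column data frame (the names `trans`, `id`, `title` and `baskets` are hypothetical). `unique()` plays the role of `rm.duplicates = TRUE`.

```r
# Sample rows from the question as a two-column data frame (id, title)
trans <- data.frame(
  id = c(rep("B038-82C81778E81C", 3), rep("9C05-EE9B44E8C18F", 3),
         rep("8F59-9956070D8005", 4), rep("B52F-9936734525AF", 2)),
  title = c("Toy Story", "Planet of the apes", "Iron Man",
            "Bruce Almighty", "Iron Man", "Toy Story",
            "Toy Story", "Gravity", "Iron Man", "Gone",
            "Planet of the Apes", "Bruce Almighty"),
  stringsAsFactors = FALSE
)

# Drop repeated (id, title) pairs
trans <- unique(trans)

# One character vector of titles per id; ids kept in order of first appearance
baskets <- split(trans$title, factor(trans$id, levels = unique(trans$id)))

print(baskets[["9C05-EE9B44E8C18F"]])
```

A named list like `baskets` can then be handed to arules with `as(baskets, "transactions")`.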
I'm a bit confused, because you say you want two things. If you just want the sparse matrix, you can skip the conversions to a list and to a standard matrix. You can just do:
This results in:
which is a matrix of TRUE/FALSE values (abbreviated here to fit the space). There is no need to go through the list form at all.
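A minimal sketch of that direct route in arules follows. It builds the transactions from an in-memory list instead of the file and coerces them straight to a logical matrix with `as(..., "matrix")`. The list name `baskets` is hypothetical, and unifying "Planet of the apes" into "Planet of the Apes" is an assumption about the intended data. Multiplying by `1L` gives the 0/1 layout. Item columns follow arules' own ordering, so select them by name rather than by position.

```r
library(arules)

# In-memory stand-in for the file: one character vector of titles per id.
# "Planet of the apes" / "Planet of the Apes" unified to one spelling here.
baskets <- list(
  "B038-82C81778E81C" = c("Toy Story", "Planet of the Apes", "Iron Man"),
  "9C05-EE9B44E8C18F" = c("Bruce Almighty", "Iron Man", "Toy Story"),
  "8F59-9956070D8005" = c("Toy Story", "Gravity", "Iron Man", "Gone"),
  "B52F-9936734525AF" = c("Planet of the Apes", "Bruce Almighty")
)

trans <- as(baskets, "transactions")

# Dense logical matrix: one row per transaction, one column per title
flags <- as(trans, "matrix")

# 0/1 version of the same matrix
m01 <- flags * 1L
print(m01)
```

With the real file, the `read.transactions()` call from the question would replace the `baskets` list and the `as(baskets, "transactions")` step.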