How to add per-row counts of each string as new columns in a data matrix with R


Suppose I have a 5-by-5 matrix of fruit names (five kinds of fruit). I want to add five new columns to this matrix, one per fruit, giving the number of times that fruit occurs in each row, and finally one extra row showing the total for each fruit. The data matrix looks like this:

    [,1]   [,2]   [,3]   [,4]   [,5]
[1,]mango        banana         mango
[2,]apple  kiwi         banana
[3,]            mango
[4,]mango       apple
[5,]                    orange

I want to get output (as a data frame) like this:

    [,1]  [,2]  [,3]  [,4]  [,5] [apple] [banana] [kiwi] [mango] [orange]
[1,]mango      banana       mango   0        1       0      2        0
[2,]apple kiwi       banana         1        1       1      0        0
[3,]           mango                0        0       0      1        0
[4,]mango      apple                1        0       0      1        0   
[5,]                 orange         0        0       0      0        1
[6,]                                2        2       1      4        1

I have tried grep, but it flattens the whole matrix into a vector. I do not know how to do this for the whole data matrix in R. Here is my code:

fruits <- matrix(c("mango", "", "banana", "", "mango", "apple", "kiwi", "", "banana", "","", "", "mango", "", "", "mango", "", "apple", "", "", "", "", "", "orange", ""), nrow = 5, ncol = 5, byrow = TRUE)
fruits$apple <- length(grep("apple", fruits[1:nrow(fruits), 1:ncol(fruits)]))
fruits$banana <- length(grep("banana", fruits[1:nrow(fruits), 1:ncol(fruits)]))
fruits$kiwi <- length(grep("kiwi", fruits[1:nrow(fruits), 1:ncol(fruits)]))
fruits$mango <- length(grep("mango", fruits[1:nrow(fruits), 1:ncol(fruits)]))
fruits$orange <- length(grep("orange", fruits[1:nrow(fruits), 1:ncol(fruits)]))

Please help.


Best answer

We can melt the matrix into long format with gather (from tidyr), cast it back to wide format with per-row counts using dcast (from reshape2), and then add a row of sums:

library(reshape2)
library(tidyr)

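# melt the fruits matrix to long format; after t(), each column is one original row
g <- gather(as.data.frame(t(fruits)))

# cast to wide counts per row (dcast defaults to length as the aggregation),
# drop the key column and the empty-string column, then bind to the original matrix
d <- cbind(fruits, dcast(g, key~value)[-(1:2)])

# add a row of sums (combining with "" turns the counts into character strings)
rbind(d,c(rep("", 5),colSums(d[-(1:5)])))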
#       1    2      3      4     5 apple banana kiwi mango orange
# 1 mango      banana        mango     0      1    0     2      0
# 2 apple kiwi        banana           1      1    1     0      0
# 3             mango                  0      0    0     1      0
# 4 mango       apple                  1      0    0     1      0
# 5                   orange           0      0    0     0      1
# 6                                    2      2    1     4      1
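
If you would rather avoid the extra packages, the same counts can probably be built in base R. This is only a sketch, not part of the approach above: the names fruit_levels, counts and result are my own, and it assumes empty cells are stored as "". It tabulates each row against a fixed set of levels so that fruits missing from a row still get a zero.

```r
fruits <- matrix(c("mango", "", "banana", "", "mango",
                   "apple", "kiwi", "", "banana", "",
                   "", "", "mango", "", "",
                   "mango", "", "apple", "", "",
                   "", "", "", "orange", ""),
                 nrow = 5, ncol = 5, byrow = TRUE)

# Distinct non-empty names, sorted alphabetically (assumed: blanks are "")
fruit_levels <- sort(unique(fruits[fruits != ""]))

# Tabulate every row against the same levels so absent fruits count as 0;
# "" is not a level, so it becomes NA and table() ignores it
counts <- t(apply(fruits, 1, function(r) {
  as.vector(table(factor(r, levels = fruit_levels)))
}))
colnames(counts) <- fruit_levels

# Append the row of column totals
counts <- rbind(counts, colSums(counts))

# Pad the fruit matrix with a blank row and combine; the counts stay numeric
result <- data.frame(rbind(fruits, ""), counts)
result
```

Because the counts are combined through data.frame() rather than pasted into a character vector, they remain numbers and can be summed or filtered later.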

Second answer

It is not possible to create the output that you specify using a matrix, because a matrix contains values of a single type. The counts would thus be converted to characters, which is a solution, but maybe not what you want. I propose that you use a data frame to store your results.
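
To illustrate the type issue, here is a small sketch with a made-up 2-by-2 matrix (the names m, counts, as_matrix and as_df are only for this example). Binding numbers to a character matrix turns them into strings, while a data frame lets each column keep its own type.

```r
# Toy data, not the matrix from the question
m <- matrix(c("mango", "", "apple", "kiwi"), nrow = 2, byrow = TRUE)
counts <- c(1, 2)

# cbind() on a character matrix coerces the numeric counts to character
as_matrix <- cbind(m, counts)
typeof(as_matrix)

# data.frame() stores each column separately, so counts stays numeric
as_df <- data.frame(m, counts = counts)
sapply(as_df, class)
```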

I propose the following solution in four steps.

  1. Create a vector of all the fruit names in your matrix. I use an extra step to remove the empty string from that vector.

    all_fruits <- unique(as.vector(fruits))
    all_fruits <- all_fruits[nchar(all_fruits) > 0]
    
  2. Create a list that contains the counts per row of each fruit in all_fruits.

    fruit_count <- lapply(all_fruits, function(fruit)
                      rowSums(matrix(grepl(fruit, fruits), nrow = nrow(fruits))))
    names(fruit_count) <- all_fruits
    

    This part is a bit tricky, so here is a short explanation. grepl returns a logical vector that is TRUE wherever the pattern is found. Unfortunately, grepl drops the dim attribute of fruits and returns a plain vector, which must be converted back to a matrix with the original number of rows. rowSums then adds up how many times the search term (i.e., the name of the fruit) was found in each row. This works because TRUE is converted to 1 and FALSE to 0 when summing.

  3. Convert fruits to a data frame and add an extra row of empty strings. Convert fruit_count to a data frame and add a row of column sums.

    fruits_df <- rbind(as.data.frame(fruits), "")
    fruit_count_df <- as.data.frame(fruit_count)
    fruit_count_df[nrow(fruits) + 1, ] <- colSums(fruit_count_df)
    
  4. Put both data frames together.

    out <- data.frame(fruits_df, fruit_count_df)
    out
    ##      X1   X2     X3     X4    X5 mango apple kiwi banana orange
    ## 1 mango      banana        mango     2     0    0      1      0
    ## 2 apple kiwi        banana           0     1    1      1      0
    ## 3             mango                  1     0    0      0      0
    ## 4 mango       apple                  1     1    0      0      0
    ## 5                   orange           0     0    0      0      1
    ## 6                                    4     2    1      2      1
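
One caveat about step 2: grepl matches substrings, so a search for "apple" would also count a cell containing "pineapple". That does not happen with the fruits in the question, but if it could in your data, comparing with == may be a safer option. The sketch below uses a hypothetical matrix (with "pineapple" added purely for illustration) to show the difference; == also keeps the matrix shape, so no conversion back to a matrix is needed.

```r
# Hypothetical data: "pineapple" is not in the original question
fruits_demo <- matrix(c("apple", "pineapple",
                        "",      "apple"),
                      nrow = 2, byrow = TRUE)

# grepl() matches substrings and returns a plain vector,
# so it has to be reshaped before rowSums()
substring_hits <- rowSums(matrix(grepl("apple", fruits_demo),
                                 nrow = nrow(fruits_demo)))

# `==` compares whole strings and returns a logical matrix directly
exact_hits <- rowSums(fruits_demo == "apple")

substring_hits
exact_hits
```

In the lapply call from step 2, this would amount to replacing the grepl expression with rowSums(fruits == fruit), assuming exact matches are what you want.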