Count freq of word and which column it appears in using R

60 Views Asked by At

I have a dataset of a series of names in different columns. Each column determines the time in which the names were entered into the system. Is it possible to find the number of times ALL the names appear and the most recent column entry. I added a picture to show how the dataset works.

enter image description here

1

There are 1 best solutions below

3
On

Here's one method:

library(dplyr)
set.seed(42)
dat <- setNames(as.data.frame(replicate(4, sample(letters, size = 10, replace = TRUE))), 1:4)
dat
#    1 2 3 4
# 1  q x c c
# 2  e g i z
# 3  a d y a
# 4  y y d j
# 5  j e e x
# 6  d n m k
# 7  r t e o
# 8  z z t v
# 9  q r b z
# 10 o o h h

tidyverse

library(dplyr)
library(tidyr)
pivot_longer(dat, everything(), names_to = "colname", values_to = "word") %>%
  mutate(colname = as.integer(colname)) %>%
  group_by(word) %>%
  summarize(n = n(), latest = max(colname), .groups = "drop")
# # A tibble: 20 x 3
#    word      n latest
#    <chr> <int>  <int>
#  1 a         2      4
#  2 b         1      3
#  3 c         2      4
#  4 d         3      3
#  5 e         4      3
#  6 g         1      2
#  7 h         2      4
#  8 i         1      3
#  9 j         2      4
# 10 k         1      4
# 11 m         1      3
# 12 n         1      2
# 13 o         3      4
# 14 q         2      1
# 15 r         2      2
# 16 t         2      3
# 17 v         1      4
# 18 x         2      4
# 19 y         3      3
# 20 z         4      4

data.table

library(data.table)
melt(as.data.table(dat), integer(0), variable.name = "colname", value.name = "word")[
  , colname := as.integer(colname)
    ][, .(n = .N, latest = max(colname)), by = .(word) ]

(though it is not sorted by word, the values are the same)