I have a dataframe as follows:
df <- data.frame(A = c(2, 0, 1), B = c(0, 3, 2))
#   A B
# 1 2 0
# 2 0 3
# 3 1 2
The number in each cell indicates how many times the corresponding column name should be repeated. The repeats should be joined with semicolons (;) into a single string, and a count of 0 should give NA. The expected output is:
#      A     B
# 1  A;A  <NA>
# 2 <NA> B;B;B
# 3    A   B;B
I'm looking for an efficient way to handle a much larger dataset:
set.seed(1234)
df <- as.data.frame(matrix(sample(0:5, 1e4*26, replace = TRUE), 1e4, 26))
names(df) <- LETTERS
# A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
# 1 3 0 3 5 4 0 3 2 2 1 3 3 4 3 2 4 0 1 4 5 2 5 5 2 0 0
# 2 1 5 1 0 3 3 2 0 1 5 5 2 5 0 2 5 1 1 2 4 5 5 0 5 0 0
# 3 5 5 2 0 1 4 5 4 0 5 5 1 1 1 2 2 4 5 4 5 5 5 0 4 0 0
# ...
# [ reached 'max' / getOption("max.print") -- omitted 9997 rows ]
I prefer base or tidyverse solutions. data.table solutions are welcome too, although I'm unfamiliar with it.
One base option:
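A minimal base R sketch, assuming every count is a non-negative whole number: strrep() repeats each column name with a trailing semicolon, sub() trims the final semicolon, and zero counts are set to NA. The helper name rep_names is made up here for illustration.

```r
df <- data.frame(A = c(2, 0, 1), B = c(0, 3, 2))

# rep_names is a hypothetical helper name, not from any package.
# Assumes all cells are non-negative whole numbers.
rep_names <- function(d) {
  out <- d
  out[] <- Map(function(n, nm) {
    # Repeat "A;" n times (vectorized over the column), then drop the last ";"
    s <- sub(";$", "", strrep(paste0(nm, ";"), n))
    # A count of 0 yields "", which the question wants as NA
    s[n == 0] <- NA_character_
    s
  }, d, names(d))
  out
}

res <- rep_names(df)
res
#      A     B
# 1  A;A  <NA>
# 2 <NA> B;B;B
# 3    A   B;B
```

The same call, rep_names(df), should work unchanged on the larger 1e4 x 26 data frame, since each column is processed in a single vectorized pass rather than cell by cell.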