How to combine two melt / gather / pivot_longer commands in R into one command?

38 Views Asked by At

I have a for my understanding rather simple problem, but I could not find a solution so far.

I have a dataset with 4 columns that start with time (time1, time2...) and 4 columns that start with count (count1, count2...).

#Step 1: Create data
time_cols <- data.frame(time1 = sample(1:50, 10, replace = TRUE),
                        time2 = sample(1:50, 10, replace = TRUE),
                        time3 = sample(1:50, 10, replace = TRUE),
                        time4 = sample(1:50, 10, replace = TRUE))

count_cols <- data.frame(count1 = sample(1:4, 10, replace = TRUE),
                         count2 = sample(1:4, 10, replace = TRUE),
                         count3 = sample(1:4, 10, replace = TRUE),
                         count4 = sample(1:4, 10, replace = TRUE))

# Step 2: Create the "id" column
id <- 1:10

# Step 3: Combine all the columns to form the dataframe
df <- cbind(time_cols, count_cols, id)


   time1 time2 time3 time4 count1 count2 count3 count4 id
1     40    47    24    48      1      2      3      2  1
2     16    15    39    16      1      4      2      1  2
3     16    41    16    21      1      3      1      4  3
4     16    47    14     3      4      2      1      1  4
5     31    28    29    30      3      4      4      1  5
6      5    15    41    13      4      2      2      4  6
7     46    19    29    30      1      1      3      1  7
8     28    43    10    27      2      2      3      2  8
9     23    37    35    49      2      4      2      2  9
10    43    28     6    20      3      3      3      1 10

I want to reshape the dataset from wide to long and within one command: I want to gather the time columns into a variable column "timepoint" and a value column "time_count_value". At the same time, I also want to gather the count columns the same way into "count_no" and "count_value". The output should look like this:

  timepoint time_count_value count_no count_value id
1      time1               40   count1           1  1
2      time2               47   count2           2  1
3      time3               24   count3           3  1
4      time4               48   count4           2  1
5      time1               16   count1           1  2
6      time2               15   count2           4  2
7      time3               39   count3           2  2
8      time4               16   count4           1  2
9      time1               16   count1           1  3
10     time2               41   count2           3  3

So I do not want to gather multiple columns into one (for this problem, there are multiple answers on stackoverflow), but in some way I want to perfom "two melt/gather functions simultaneously".

I tried the following with melt:

example_trials2 <- setnames(
  reshape2::melt(example_trials2, measure = data.table:::patterns("^time", "^count"),
                 value.name =  c("time_count_value", "count_value"),
                 variable.name = c("timepoint", "count_no")))

However, it first told me "could not find function "patterns". I therefore added data.table::: before patterns, now it says "Error: Pattern(s) not found". I also tried reinstalling data.table and installing an older version of the package, without success.

I am confident that this function is actually doing what I am searching for, but I don't get why it always throws an error with the regular expression/patterns.

So: I would be happy to hear how to fix the error with patterns or if you have a different idea for using melt or another package, e.g. pivot_longer.

2

There are 2 best solutions below

2
ThomasIsCoding On

Here is a base R option using reshape + transform

dfout <- transform(
    reshape(
        setNames(df, gsub("(\\d+)", ".\\1", names(df))),
        direction = "long",
        idvar = "id",
        varying = -length(df),
        timevar = "grp"
    ),
    timepoint = paste0("time", grp),
    count_no = paste0("count", grp)
)[c("timepoint", "time", "count_no", "count", "id")]


dfout <- `row.names<-`(dfout[order(dfout$id), ], NULL)

and you will see

> head(dfout, 10)
   timepoint time count_no count id
1      time1   14   count1     4  1
2      time2   21   count2     3  1
3      time3   41   count3     2  1
4      time4   33   count4     4  1
5      time1    4   count1     2  2
6      time2   21   count2     4  2
7      time3   25   count3     2  2
8      time4   20   count4     2  2
9      time1   39   count1     4  3
10     time2   42   count2     2  3

data

> dput(df)
structure(list(time1 = c(14L, 4L, 39L, 1L, 34L, 23L, 43L, 14L,
18L, 33L), time2 = c(21L, 21L, 42L, 46L, 10L, 7L, 9L, 15L, 21L,
37L), time3 = c(41L, 25L, 46L, 37L, 37L, 34L, 42L, 25L, 44L,
15L), time4 = c(33L, 20L, 35L, 6L, 10L, 42L, 38L, 47L, 20L, 28L
), count1 = c(4L, 2L, 4L, 1L, 3L, 2L, 1L, 4L, 4L, 1L), count2 = c(3L,
4L, 2L, 2L, 3L, 3L, 2L, 2L, 4L, 4L), count3 = c(2L, 2L, 1L, 2L, 
2L, 2L, 2L, 1L, 3L, 3L), count4 = c(4L, 2L, 3L, 3L, 4L, 2L, 4L,
3L, 3L, 1L), id = 1:10), class = "data.frame", row.names = c(NA,
-10L))

> df
   time1 time2 time3 time4 count1 count2 count3 count4 id
1     14    21    41    33      4      3      2      4  1
2      4    21    25    20      2      4      2      2  2
3     39    42    46    35      4      2      1      3  3
4      1    46    37     6      1      2      2      3  4
5     34    10    37    10      3      3      2      4  5
6     23     7    34    42      2      3      2      2  6
7     43     9    42    38      1      2      2      4  7
8     14    15    25    47      4      2      1      3  8
9     18    21    44    20      4      4      3      3  9
10    33    37    15    28      1      4      3      1 10
0
Wimpel On

I believe this is a valid data.table solution.

Using the sample data provided by @ThomasIsCoding (and make sure you use data.table::melt(), not reshape2::melt() !)

library(data.table)
# melt
final <- data.table::melt(setDT(df), id.vars = "id", measure.vars=measure(value.name, number=as.integer, pattern="(.*)([0-9])"))
# order on id
setkey(final, id)

Key: <id>
       id number  time count
    <int>  <int> <int> <int>
 1:     1      1    14     4
 2:     1      2    21     3
 3:     1      3    41     2
 4:     1      4    33     4
 5:     2      1     4     2
 6:     2      2    21     4
 7:     2      3    25     2
 8:     2      4    20     2
 9:     3      1    39     4
10:     3      2    42     2
...

the number-column indicates what time/count combination is presented on each row.