I have a dataframe including the values for each cell (possibly more than one value) and the row and the column index.


df = data.frame(values = c(1,"Sven", 20,"Mueller","sept",2,30,"John","Mar","Hynes","Marc"), 
                colI = c(1,2,3,2,4,1,3,2,4,2,2), rowI = c(1,1,1,1,1,2,2,2,2,2,2))

And I want to get something similar to the following data.frame:

df_final= data.frame(Index = c(1,2), name = c("Sven, Mueller", "John, Hynes, Marc"), age = c(20,30), 
                     month = c("sept","Mar"))

However, I haven't been able to do anything and I have also not found a solution online. I am not able to find a solution to bring the values to their corresponding place in the dataframe and I think that it is even a bigger challenge that the cells can contain a different amount of values.

Thanks for your help.

3

There are 3 best solutions below

1
On

You could use dplyr, tidyr and stringr, all included in tidyverse:

df %>%
# bring your data into a wider format
  pivot_wider(id_cols=rowI, names_from=colI, values_from=values, values_fn=list) %>% 
# remove the nested listing
  unnest(everything()) %>%
# rename the columns
  select(Index = rowI, name=`2`, age=`3`, month=`4`) %>%
# group all rows based on the index
  group_by(Index) %>%
# concatenate the name column
  mutate(name=str_c(name, collapse=", ")) %>%
# remove duplicates
  distinct()

returns

# A tibble: 2 x 4
# Groups:   Index [2]
  Index name              age   month
  <dbl> <chr>             <chr> <chr>
1     1 Sven, Mueller     20    sept 
2     2 John, Hynes, Marc 30    Mar 

Note: I changed your input data a little bit and added a single 2 to your rowI column (see Maurits Evers' comment).

1
On

Using base R, you can first aggregate your data to one comma separated string for each row and column index and then use unstack.

temp <- aggregate(values~colI + rowI, df, toString)
unstack(temp, values~colI)

#  X1                X2 X3   X4
#1  1     Sven, Mueller 20 sept
#2  2 John, Hynes, Marc 30  Mar

data

df <- structure(list(values = c("1", "Sven", "20", "Mueller", "sept", 
"2", "30", "John", "Mar", "Hynes", "Marc"), colI = c(1, 2, 3, 
2, 4, 1, 3, 2, 4, 2, 2), rowI = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 
2, 2)), class = "data.frame", row.names = c(NA, -11L))
0
On

Another solution

df %>% 
  pivot_wider(rowI, names_from = colI, values_from = values, values_fn = toString) %>% 
  select(-rowI) %>% 
  purrr::set_names(c("ID", "name", "age", "month"))