How to transform a dataframes row into columns in R?

1.1k Views Asked by At

I have a data frame which I need to transform. I need to change the rows into unique columns based on the value of a column.

ex:

The Input DataFrame

| column_1 | column_2 |
-----------------------
|   A      |     B    |
|   A      |     C    |
|   B      |     E    |
|   B      |     C    |
|   C      |     F    |
|   C      |     G    |

The Output DataFrame

| column_1 | column_2 | column_3 |
----------------------------------
|   A      |     B    |     C    |
|   B      |     E    |     C    |
|   C      |     F    |     G    |

The final DataFrame should have all the unique values in column_1 and the values from column_2 from input DataFrame will be added as new columns in new DataFrame i.e. Column_2 and Column_3.

I have tried to use reshape and melt packages in R but I am getting erroneous data frame.

3

There are 3 best solutions below

0
On BEST ANSWER

We can use the dplyr and cSplit function from the splitstackshape. It will also work for cases when there are more than two values per group.

library(dplyr)
library(splitstackshape)
dt2 <- dt %>%
  group_by(column_1) %>%
  summarise(column_2 = toString(column_2)) %>%
  cSplit("column_2") %>%
  setNames(paste0("column_", 1:ncol(.)))

dt2
   column_1 column_2 column_3
1:        A        B        C
2:        B        E        C
3:        C        F        G

Data

dt <- data.frame(column_1 = c("A", "A", "B", "B", "C", "C"),
                 column_2 = c("B", "C", "E", "C", "F", "G"),
                 stringsAsFactors = FALSE)
1
On

Assuming there are always 2 lines for each value in column_1.

Extract the first line for each column_1 element in a first data.table, then the last line in a second data.table, finally merge them into a new data.table

library(data.table)

df <- data.frame(column_1=c("A","A","B","B","C","C"),column_2=c("B","C","E","C","F","G"))
df <- as.data.table(df)
setkey(df,column_1)
first_part <- df[J(unique(column_1)), mult = "first"]
second_part <- df[J(unique(column_1)), mult = "last"]
setnames(second_part,"column_2","column_3")

new_df <- merge(first_part,second_part, by="column_1")

   column_1 column_2 column_3
1:        A        B        C
2:        B        E        C
3:        C        F        G
0
On

Here's a short solution with dplyr and tidyr :

library(dplyr)
library(tidyr)
df %>% mutate(col = c("column_2","column_3")[duplicated(column_1)+1]) %>%
  spread(col,column_2)

#   column_1 column_2 column_3
# 1        A        B        C
# 2        B        E        C
# 3        C        F        G

And a general solution :

df <- data.frame(column_1 = c("A", "A", "B", "B", "C", "C","A","B","C"),
                 column_2 = c("B", "C", "E", "C", "F", "G","X","Y","Z"),
                 stringsAsFactors = FALSE)

df %>% group_by(column_1) %>%
  mutate(col=paste0("column_",row_number()+1)) %>%
  spread(col,column_2) %>% ungroup

# # A tibble: 3 x 4
#   column_1 column_2 column_3 column_4
# *    <chr>    <chr>    <chr>    <chr>
# 1        A        B        C        X
# 2        B        E        C        Y
# 3        C        F        G        Z