R - Transposing portions of a data frame

1.1k Views Asked by At

I have a .dbf file I exported from ArcGIS 10.1 and I need to reorganize the it. An example of the data is:

       V1               V2
40.000000000000000 41.000000000000000

40.000000000000000 42.000000000000000

41.000000000000000 40.000000000000000

41.000000000000000 42.000000000000000

41.000000000000000 43.000000000000000

42.000000000000000 40.000000000000000

42.000000000000000 41.000000000000000

42.000000000000000 43.000000000000000

43.000000000000000 41.000000000000000

43.000000000000000 42.000000000000000

I need the data in a format where there is only one row for each unique value in the first column, with all of the corresponding values from the second column now appearing in that row, for example:

  V1                   V2              V3                  V4
40.000000000000000 41.000000000000000 42.000000000000000

41.000000000000000 40.000000000000000 42.000000000000000 43.000000000000000

42.000000000000000 40.000000000000000 41.000000000000000 43.000000000000000

43.000000000000000 41.000000000000000 42.000000000000000

If anyone can help me with this problem I would appreciate it. Thanks!

3

There are 3 best solutions below

1
On

You can split up your data frame using the split function on the first column and the use lapply to extract your vectors:

dat = data.frame(X1=c(40, 40, 41, 41, 41, 42, 42, 42, 43, 43),
                 X2=c(41, 42, 40, 42, 43, 40, 41, 43, 41, 42))
res <- lapply(split(dat, dat[,1]), function(d) c(d[1,1], sort(unique(d[,2]))))
res
# $`40`
# [1] 40 41 42
# 
# $`41`
# [1] 41 40 42 43
# 
# $`42`
# [1] 42 40 41 43
# 
# $`43`
# [1] 43 41 42

Most would probably prefer to keep the data in this format, but you can also combine the list into a matrix, right-padding the vectors with NA values:

max.len <- max(unlist(lapply(res, length)))
do.call(rbind, lapply(res, function(x) { length(x) <- max.len ; x }))
#    [,1] [,2] [,3] [,4]
# 40   40   41   42   NA
# 41   41   40   42   43
# 42   42   40   41   43
# 43   43   41   42   NA
4
On

This is essentially a reshape problem, but you don't have a "time" variable.

You can easily create a "time" variable as follows:

dat$time <- with(dat, ave(X1, X1, FUN = seq_along))

From there, use reshape from base R...

reshape(dat, direction = "wide", idvar="X1", timevar="time")
#   X1 X2.1 X2.2 X2.3
# 1 40   41   42   NA
# 3 41   40   42   43
# 6 42   40   41   43
# 9 43   41   42   NA

... or dcast from "reshape2"...

library(reshape2)
dcast(dat, X1 ~ time, value.var="X2")
#   X1  1  2  3
# 1 40 41 42 NA
# 2 41 40 42 43
# 3 42 40 41 43
# 4 43 41 42 NA
0
On

You could also do this in dplyr

library(dplyr)
library(tidyr)
  dat%>% 
  group_by(X1) %>%
  mutate(Time=seq_along(X1)) 
  %>%spread(Time,X2)
 #Source: local data frame [4 x 4]

 #X1  1  2  3
#1 40 41 42 NA
#2 41 40 42 43
#3 42 40 41 43
#4 43 41 42 NA