R Spread function with duplicates- still can't get to work after adding transient row

Question

R Spread function with duplicates- still can't get to work after adding transient row

208 Views Asked by user5691676 At 16 December 2016 at 15:21

trying to get the spread() function to work with duplicates in the key column- yes, this has been covered before but I can't seem to get it to work and I've spent the better part of a day on it (somewhat new to R).

I have two columns of data. The first column 'snowday' represents the first day of a winter season, with the corresponding snow depth in the 'depth' column. This is several years of data (~62 years). So there should be sixty two years of first, second, third, etc days for the snowday column- this produces duplicates in snowday:

    snowday row depth
       1   1     0
       1   2     0
       1   3     0
       1   4     0
       1   5     0
       1   6     0
...

      75 4633    24
      75 4634     4
      75 4635     6
      75 4636    20
      75 4637    29
      75 4638     1

I added a "row" column to make the data frame more transient (which I vaguely understand to be hones so 1:4638 rows is the total measurements taken over ~62 years at 75 days per year . Now i'd like to spread it wide:

wide <- spread(seasondata, key = snowday, value =  depth, fill = 0)

and i get all zeros:

row 1 2 3 4 5 6 7 8 9 10 11 12 13 14 
 1 0 0 0 0 0 0 0 0 0  0  0   0  0 0
 2 0 0 0 0 0 0 0 0 0  0  0   0  0 0
 3 0 0 0 0 0 0 0 0 0  0  0   0  0 0

what I want it to look like is something like this (the columns are defined by the "snowday" and the row values are the various depths recorded on for that particular day over the various years- e.g. days 1 through 11 :

   1 2 3 4 5 6 7 8 9 10 11 12 13 14 
   2 1 3 4 0 0 1 0 2  8  9 19  0 3
   0 8 0 0 0 4 0 6 6  0  1  0  2 0
   3 5 0 0 0 2 0 1 0  2  7  0 12 4

I think I'm fundamentally missing something here- I've tried working through drop=TRUE or convert = TRUE, and the output values are either all zeros or NA's depending on how I tinker. Also, all values in the data.frame(seasondata) are integers. Any thoughts?

Original Q&A

There are 1 best solutions below

**Weihuang Wong** · Answer 1 · 2016-12-17T05:17:23.443000

It seems to me what you wish to do is to split up the the depth column according to values of snowday, and then bind all the 75 columns together.

There is a complication, in that 62*75 is not 4638, so I assume we do not observe 75 snowdays in some years. That is, some of the 75 columns (snowdays) will not have 62 observations. We'll make sure all 75 columns are 62 entries long by filling short columns up with NAs.

I make some fake data as an example. We observe 3 "years" of data for snowdays 1 and 2, but only 2 "years" of data for snowdays 3 and 4.

set.seed(1)
seasondata <- data.frame(
  snowday = c(rep(1:2, each = 3), rep(3:4, each = 2)),
  depth = round(runif(10, 0, 10), 0))
#    snowday depth
# 1        1     3
# 2        1     4
# 3        1     6
# 4        2     9
# 5        2     2
# 6        2     9
# 7        3     9
# 8        3     7
# 9        4     6
# 10       4     1

We first figure out how long a column should be. In your case, m == 62. In my example, m == 3 (the years of data).

m <- max(table(seasondata$snowday))

Now, we use the by function to split up depth by values of snowdays, and fill short columns with NAs, and finally cbind all the columns together:

out <- do.call(cbind, 
  by(seasondata$depth, seasondata$snowday,
    function(x) {
      c(x, rep(NA, m - length(x)))
    }
  )
)
out
#      1 2  3  4
# [1,] 3 9  9  6
# [2,] 4 2  7  1
# [3,] 6 9 NA NA

Using spread:

You can use spread if you wish. In this case, you have to define row correctly. row should be 1 for the first first snowday (snowday == 1), 2 for the second first snowday, etc. row should also be 1 for the first second snowday, 2 for the second second snowday, etc.

seasondata$row <- unlist(sapply(rle(seasondata$snowday)$lengths, seq_len))
seasondata
#    snowday depth row
# 1        1     3   1
# 2        1     4   2
# 3        1     6   3
# 4        2     9   1
# 5        2     2   2
# 6        2     9   3
# 7        3     9   1
# 8        3     7   2
# 9        4     6   1
# 10       4     1   2

Now we can use spread:

library(tidyr)
spread(seasondata, key = snowday, value = depth, fill = NA)
#   row 1 2  3  4
# 1   1 3 9  9  6
# 2   2 4 2  7  1
# 3   3 6 9 NA NA

R Spread function with duplicates- still can't get to work after adding transient row

There are 1 best solutions below

Related Questions in R

Related Questions in DUPLICATES

Related Questions in TIDYR

Related Questions in SPREAD

Trending Questions

Popular # Hahtags

Popular Questions