trying to get the spread() function to work with duplicates in the key column- yes, this has been covered before but I can't seem to get it to work and I've spent the better part of a day on it (somewhat new to R).
I have two columns of data. The first column 'snowday' represents the first day of a winter season, with the corresponding snow depth in the 'depth' column. This is several years of data (~62 years). So there should be sixty two years of first, second, third, etc days for the snowday column- this produces duplicates in snowday:
snowday row depth
1 1 0
1 2 0
1 3 0
1 4 0
1 5 0
1 6 0
...
75 4633 24
75 4634 4
75 4635 6
75 4636 20
75 4637 29
75 4638 1
I added a "row" column to make the data frame more transient (which I vaguely understand to be hones so 1:4638 rows is the total measurements taken over ~62 years at 75 days per year . Now i'd like to spread it wide:
wide <- spread(seasondata, key = snowday, value = depth, fill = 0)
and i get all zeros:
row 1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0
what I want it to look like is something like this (the columns are defined by the "snowday" and the row values are the various depths recorded on for that particular day over the various years- e.g. days 1 through 11 :
1 2 3 4 5 6 7 8 9 10 11 12 13 14
2 1 3 4 0 0 1 0 2 8 9 19 0 3
0 8 0 0 0 4 0 6 6 0 1 0 2 0
3 5 0 0 0 2 0 1 0 2 7 0 12 4
I think I'm fundamentally missing something here- I've tried working through drop=TRUE or convert = TRUE, and the output values are either all zeros or NA's depending on how I tinker. Also, all values in the data.frame(seasondata) are integers. Any thoughts?
It seems to me what you wish to do is to split up the the
depthcolumn according to values ofsnowday, and then bind all the 75 columns together.There is a complication, in that 62*75 is not 4638, so I assume we do not observe 75 snowdays in some years. That is, some of the 75 columns (snowdays) will not have 62 observations. We'll make sure all 75 columns are 62 entries long by filling short columns up with
NAs.I make some fake data as an example. We observe 3 "years" of data for snowdays 1 and 2, but only 2 "years" of data for snowdays 3 and 4.
We first figure out how long a column should be. In your case,
m == 62. In my example,m == 3(the years of data).Now, we use the
byfunction to split updepthby values ofsnowdays, and fill short columns with NAs, and finallycbindall the columns together:Using
spread:You can use
spreadif you wish. In this case, you have to definerowcorrectly.rowshould be 1 for the first first snowday (snowday == 1), 2 for the second first snowday, etc.rowshould also be 1 for the first second snowday, 2 for the second second snowday, etc.Now we can use
spread: