Duplicate rows while using Merge function in R - but I dont want the sum

514 Views Asked by At

So here's my problem, I have about 40 datasets, all csv files that contain only two columns, (a) Date and (b) Price (for each dataset the price column is named as its country).. I used the merge function as follows to consolidate all data into a single dataset with one date column and several price columns. This was the function I used:

merged <- Reduce(function(x, y) merge(x, y, by="Date", all=TRUE), list(a,b,c,d,e,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,aa,ab,ac,ad,ae,af,ag,ah,ai,aj,ak,al,am,an))

What has happened is I have for instance in date column, 3 values for same date but the corresponding country values are split. e.g.:

# Date          India China South Korea

# 01-Jan-2000   5445   NA   4445   NA
# 01-Jan-2000   NA    1234  NA     NA
# 01-Jan-2000   NA     NA   NA    5678

I actually want

# 01-Jan-2000   5445  1234  4445  5678

I dont know how to get this, as the other questions related to this topic ask for summation of values which I clearly do not need. This is a simple example. Unfortunately I have daily data from Jan 2000 to November 2016 for about 43 countries, all messed up. Any help to solve this would be appreciated.

1

There are 1 best solutions below

0
On

I would append all dataframes using rbind and reshape the result with spread(). As merging depends on the dataframe you start with.

Reproducable example:

library(dplyr)

a <- data.frame(date = Sys.Date()-1:10, cntry = "China", price=round(rnorm(10,20,5),2))
b <- data.frame(date = Sys.Date()-6:15, cntry = "Netherlands", price=round(rnorm(10,50,10),2))
c <- data.frame(date = Sys.Date()-11:20, cntry = "USA", price=round(rnorm(10,70,25),2))


all <- do.call(rbind, list(a,b,c))


all %>% group_by(date) %>% spread(cntry, price)

results in:

         date China Netherlands   USA
*      <date> <dbl>       <dbl> <dbl>
1  2016-11-29    NA          NA 78.75
2  2016-11-30    NA          NA 66.22
3  2016-12-01    NA          NA 86.04
4  2016-12-02    NA          NA 17.07
5  2016-12-03    NA          NA 75.72
6  2016-12-04    NA       46.90 39.57
7  2016-12-05    NA       51.80 65.11
8  2016-12-06    NA       57.50 96.36
9  2016-12-07    NA       46.42 46.93
10 2016-12-08    NA       45.71 57.63
11 2016-12-09 15.41       60.09    NA
12 2016-12-10 16.66       60.07    NA
13 2016-12-11 23.72       66.21    NA
14 2016-12-12 19.82       45.46    NA
15 2016-12-13 14.22       45.07    NA
16 2016-12-14 27.26          NA    NA
17 2016-12-15 20.08          NA    NA
18 2016-12-16 15.79          NA    NA
19 2016-12-17 17.66          NA    NA
20 2016-12-18 26.77          NA    NA