Creating Time Series columns in R from Long to Wide format considering Date Range

1.5k Views Asked by At

To start with I've successfully converted my data from long to wide format. The data is as below.

+======+==========+======+======+
| Name |   Date   | Val1 | Val2 |
+======+==========+======+======+
| A    | 1/1/2018 |    1 |    2 |
+------+----------+------+------+
| B    | 1/1/2018 |    2 |    3 |
+------+----------+------+------+
| C    | 1/1/2018 |    3 |    4 |
+------+----------+------+------+
| D    | 1/4/2018 |    4 |    5 |
+------+----------+------+------+
| A    | 1/4/2018 |    5 |    6 |
+------+----------+------+------+
| B    | 1/4/2018 |    6 |    7 |
+------+----------+------+------+
| C    | 1/4/2018 |    7 |    8 |
+------+----------+------+------+

To convert the above table from long to wide format I've used following lines of code:

test_wide <- reshape(test_data, idvar = 'Name', timevar = 'Date', direction = "wide" )

The result from the above code is as follows:

+======+===============+===============+===============+===============+
| Name | Val1.1/1/2018 | Val2.1/1/2018 | Val1.1/4/2018 | Val2.1/4/2018 |
+======+===============+===============+===============+===============+
| A    | 1             | 2             |             5 |             6 |
+------+---------------+---------------+---------------+---------------+
| B    | 2             | 3             |             6 |             7 |
+------+---------------+---------------+---------------+---------------+
| C    | 3             | 4             |             7 |             8 |
+------+---------------+---------------+---------------+---------------+
| D    | NA            | NA            |             4 |             5 |
+------+---------------+---------------+---------------+---------------+

The issue that I am facing is that I need R to consider Date column in date format. The date column ranges from 1/1/2018 to 1/4/2018 since there is no value in date 1/2/2018 and 1/3/2018I won't see any columns as Val1.1/2/2018, Val2.1/3/2018, Val3.1/2/2018 and Val3.1/3/2018.

I want to convert into wide format such that I can get columns for date 1/2/2018 and 1/3/2018, eventhough those columns would contain only NULLS.

Reason for doing it is that I need to use the data as time series.

Edit:

Initial data in copy and pasted:

Name Date Val1 Val2
A 1/1/2018 1 2
B  1/1/2018 2 3
C 1/1/2018 3 4
D 1/4/2018 4 5
A 1/4/2018 5 6
B  1/4/2018 6 7
C 1/4/2018 7 8
", header=TRUE)

Converted data copy and pasted:

Name,Val1.1/1/2018,Val2.1/1/2018,Val1.1/4/2018,Val2.1/4/2018
A,1,2,5,6
B,2,3,6,7
C,3,4,7,8
D,NA,NA,4,5

dput(test_data) results:

structure(list(Name = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L), .Label = c("A", 
"B ", "C", "D"), class = "factor"), Date = structure(c(1L, 1L, 
1L, 2L, 2L, 2L, 2L), .Label = c("1/1/2018", "1/4/2018"), class = "factor"), 
    Val1 = 1:7, Val2 = 2:8), class = "data.frame", row.names = c(NA, 
-7L))
2

There are 2 best solutions below

0
On BEST ANSWER
library(dplyr)
library(tidyr) #complete
library(data.table) #dcast and setDT
df %>% mutate(Date=as.Date(Date,'%m/%d/%Y')) %>% 
       complete(Name, nesting(Date=full_seq(Date,1))) %>%
       setDT(.) %>% dcast(Name ~ Date, value.var=c('Val2','Val1'))
0
On

A tidyverse option

library(lubridate)
library(tidyverse)

df %>% 
  mutate(Date=mdy(Date)) %>% 
  #Or you can do as.Date(Date,'%m/%d/%Y') to avoid loading `lubridate`
  complete(Name, Date = seq(min(Date), max(Date), 1)) %>%
  gather(key, value, -Name, -Date) %>%
  unite(Date, key, Date, sep = ".") %>%
  spread(Date, value)