How can I impute observations on one variable in a list of dataframes? (dyadic time series)

Question

How can I impute observations on one variable in a list of dataframes? (dyadic time series)

45 Views Asked by dorokal At 15 February 2024 at 17:47

I have several individual csv-files on specific country pairs and their trade volumes for the years 1870-2020 (using the COW trade dataset, smoothtotrade variable here). Unfortunately, the dataset is only available until 2014, so all other values are set NA.

After trying several things to impute/forecast the missing data, I've decided it might be best to just carry forward the last available value (i.e., smoothtotrade in 2014). However, I can't get it to work. I've been using the imputeTS package here, using the na_locf function. Can someone help me out?

The list of data frames is called data_frames. My current code:

library(imputeTS)

*Imputation function using carry forward of the average of the last three non-missing values*

impute_smoothtotrade <- function(ts_data) {

  ts_data_imputed <- na.locf(ts_data, option = "locf")
  
  return(ts_data_imputed)
}

*Loop through each data frame (time series) in the list*

for (i in seq_along(data_frames)) {
  
  data_frames[[i]]$smoothtotrade <- impute_smoothtotrade(data_frames[[i]]$smoothtotrade)
}

this is the result of a random country pair, showing clearly that the 2014 value was evidently not carried forward as intended.

51    AUT    CMR 2010     11.484859  
52    AUT    CMR 2011     10.393110  
53    AUT    CMR 2012      6.902980  
54    AUT    CMR 2013      4.058900  
55    AUT    CMR 2014      9.018300  
89    AUT    CMR 2015      2.582298  
90    AUT    CMR 2016      2.582298  
91    AUT    CMR 2017      2.582298  
92    AUT    CMR 2018      2.582298  
93    AUT    CMR 2019      2.582298  
94    AUT    CMR 2020      2.582298

Original Q&A

There are 1 best solutions below

**L Tyrone** · Answer 1 · 2024-02-18T19:26:09.723000

Two (of many) options:

Sample data

# Sample dataframes and data_frame list
df1 <- data.frame(country = c(rep("AAA", 11)), year = 2010:2020,
                  smoothtotrade = c(11.484859, 10.393110, 6.902980, 4.058900, 9.018300, rep(NA, 6)))

df2 <- data.frame(country = c(rep("BBB", 11)), year = 2010:2020,
                  smoothtotrade = c(12.484859, 1.393110, 3.902980, 8.058900, 5.018300, rep(NA, 6)))

df3 <- data.frame(country = c(rep("CCC", 11)), year = 2010:2020,
                  smoothtotrade = c(8.484859, 9.393110, 10.902980, 9.058900, 8.018300, rep(NA, 6)))

data_frames <- list(df1, df2, df3)

Option 1: Using the dplyr and tidyr packages

library(dplyr)
library(tidyr)

# Single df with all dataframes
df4 <- bind_rows(data_frames, .id = "column_label")

result <- df4 %>%
  group_by(country) %>%
  fill(smoothtotrade, .direction = c("down")) %>%
  ungroup()

result
# A tibble: 33 × 4
   column_label country  year smoothtotrade
   <chr>        <chr>   <int>         <dbl>
 1 1            AAA      2010          11.5 
 2 1            AAA      2011          10.4 
 3 1            AAA      2012          6.90
 4 1            AAA      2013          4.06
 5 1            AAA      2014          9.02
 6 1            AAA      2015          9.02
 7 1            AAA      2016          9.02
 8 1            AAA      2017          9.02
 9 1            AAA      2018          9.02
10 1            AAA      2019          9.02
# ℹ 23 more rows
# ℹ Use `print(n = ...)` to see more rows

Option 2: Using your original method

for (i in seq_along(data_frames)) {
  
  data_frames[[i]]$smoothtotrade <- 
    ifelse(is.na(data_frames[[i]]$smoothtotrade),
           data_frames[[i]]$smoothtotrade[max(which(!is.na(data_frames[[i]]$smoothtotrade)))],
           data_frames[[i]]$smoothtotrade)
  
}

data_frames
[[1]]
   country year smoothtotrade
1      AAA 2010      11.48486
2      AAA 2011      10.39311
3      AAA 2012       6.90298
4      AAA 2013       4.05890
5      AAA 2014       9.01830
6      AAA 2015       9.01830
7      AAA 2016       9.01830
8      AAA 2017       9.01830
9      AAA 2018       9.01830
10     AAA 2019       9.01830
11     AAA 2020       9.01830

[[2]]
   country year smoothtotrade
1      BBB 2010      12.48486
2      BBB 2011       1.39311
3      BBB 2012       3.90298
4      BBB 2013       8.05890
5      BBB 2014       5.01830
6      BBB 2015       5.01830
7      BBB 2016       5.01830
8      BBB 2017       5.01830
9      BBB 2018       5.01830
10     BBB 2019       5.01830
11     BBB 2020       5.01830

[[3]]
   country year smoothtotrade
1      CCC 2010      8.484859
2      CCC 2011      9.393110
3      CCC 2012     10.902980
4      CCC 2013      9.058900
5      CCC 2014      8.018300
6      CCC 2015      8.018300
7      CCC 2016      8.018300
8      CCC 2017      8.018300
9      CCC 2018      8.018300
10     CCC 2019      8.018300
11     CCC 2020      8.018300

How can I impute observations on one variable in a list of dataframes? (dyadic time series)

There are 1 best solutions below

Related Questions in R

Related Questions in FORECASTING

Related Questions in IMPUTATION

Related Questions in IMPUTETS

Trending Questions

Popular # Hahtags

Popular Questions