I have several individual CSV files, one per country pair, with trade volumes for the years 1870-2020 (from the COW trade dataset; the variable here is smoothtotrade). Unfortunately, the dataset only runs through 2014, so all later values are NA.
After trying several things to impute/forecast the missing data, I've decided it might be best to just carry forward the last available value (i.e., smoothtotrade in 2014). However, I can't get it to work. I've been using the imputeTS package here, using the na_locf function. Can someone help me out?
The list of data frames is called data_frames. My current code:
library(imputeTS)
# Imputation function: carry the last non-missing value forward (LOCF)
impute_smoothtotrade <- function(ts_data) {
  ts_data_imputed <- na_locf(ts_data, option = "locf")
  return(ts_data_imputed)
}
# Loop through each data frame (time series) in the list
for (i in seq_along(data_frames)) {
  data_frames[[i]]$smoothtotrade <- impute_smoothtotrade(data_frames[[i]]$smoothtotrade)
}
This is the result for a random country pair. It shows that the 2014 value was not carried forward as intended:
51 AUT CMR 2010 11.484859
52 AUT CMR 2011 10.393110
53 AUT CMR 2012 6.902980
54 AUT CMR 2013 4.058900
55 AUT CMR 2014 9.018300
89 AUT CMR 2015 2.582298
90 AUT CMR 2016 2.582298
91 AUT CMR 2017 2.582298
92 AUT CMR 2018 2.582298
93 AUT CMR 2019 2.582298
94 AUT CMR 2020 2.582298
Two (of many) options:
Sample data
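The original sample data is not shown here, so the following is only a hypothetical stand-in for `data_frames`. The column names `state1`, `state2` and `year` are assumptions (the printed output in the question has no header), the AUT-CMR values for 2010-2014 are copied from the question, and everything else is made up. One row is deliberately stored out of year order, because the row numbers in the question jump from 55 to 89, which suggests (but does not prove) that the rows are not sorted by year.

```r
# Hypothetical stand-in for the asker's list of data frames.
# Column names state1/state2/year are assumed, not taken from the question.
aut_cmr <- data.frame(
  state1 = "AUT",
  state2 = "CMR",
  year = c(2010:2014, 2009, 2015:2020),   # 2009 is stored after 2014 on purpose
  smoothtotrade = c(11.484859, 10.393110, 6.902980, 4.058900, 9.018300,
                    2.582298,             # invented out-of-order row, reproduces the symptom
                    rep(NA_real_, 6))     # 2015-2020 are missing
)

# A second, entirely made-up pair so the list has more than one element
aaa_bbb <- data.frame(
  state1 = "AAA",
  state2 = "BBB",
  year = 2010:2020,
  smoothtotrade = c(1, 2, 3, 4, 5, rep(NA_real_, 6))
)

data_frames <- list(aut_cmr, aaa_bbb)
```

With rows in this order, a plain last-observation-carried-forward on `aut_cmr` would fill 2015-2020 with the value of the row just above them rather than the 2014 value.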
Option 1: Using the dplyr and tidyr packages
Option 2: Using your original method
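A minimal sketch of both options, assuming the rows are not stored in year order, so each data frame is sorted by year before filling. The toy list below is hypothetical and is defined inline only so the snippet runs on its own; the column name `year` is an assumption, and the 2009 value is invented.

```r
library(dplyr)
library(tidyr)
library(imputeTS)

# Hypothetical toy list; the 2009 row sits after 2014 to mimic unsorted input.
data_frames <- list(
  data.frame(
    year = c(2012, 2013, 2014, 2009, 2015, 2016),
    smoothtotrade = c(6.902980, 4.058900, 9.018300, 2.5, NA, NA)
  )
)

# Option 1: sort by year, then fill NAs downward with the last observed value
option1 <- lapply(data_frames, function(df) {
  df %>%
    arrange(year) %>%
    fill(smoothtotrade, .direction = "down")
})

# Option 2: the original imputeTS approach, applied after sorting by year
option2 <- lapply(data_frames, function(df) {
  df <- df[order(df$year), ]
  df$smoothtotrade <- na_locf(df$smoothtotrade, option = "locf")
  df
})

# Inspect the last rows of each result
tail(option1[[1]], 3)
tail(option2[[1]], 3)
```

In both results the 2015 and 2016 rows should take the 2014 value. If your data frames are already sorted by year, the sorting step is harmless and the cause of the problem lies elsewhere.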