I have data consisting of 10 variables. The data are compositional: each variable is a proportion, and each row sums to 1. Here is an example of what that looks like:
Var1 | Var2 | Var3 | Var4 | Var5 | Var6 | Var7 | Var8 | Var9 | Var10 |
---|---|---|---|---|---|---|---|---|---|
0.1 | 0.2 | 0.3 | 0.4 | 0 | 0 | 0 | 0 | 0 | 0 |
0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
0.1 | 0.1 | 0.2 | 0.1 | 0 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |

My main aim is to do time series forecasting on compositional data. Because of this, I want a model that takes a matrix as its input, rather than one I have to run on each variable (vector) separately. I found that a vector autoregression (VAR) model in R can do this.
Because my data are compositional, I took the following steps to use VAR in R:
- Applied the centered log-ratio (CLR) transformation to map my compositional data into real space and remove the simplex constraint.
- Took training data at daily frequency (about 4 years of data) and used `vars::VAR` to fit the model, choosing the lag order by AIC.
- Used `predict()` to forecast the next 365 days for each of my variables.
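
For reference, the CLR step can be sketched roughly as below. This is only a minimal illustration in base R on simulated data, not the actual series; `comp` and `clr` are placeholder names, and the simulated proportions are strictly positive by assumption (the example rows above contain zeros, which would need handling before taking logs).

```r
# Minimal sketch of the CLR step in base R. The data are simulated;
# `comp` and `clr` are placeholder names, not objects from the real analysis.
set.seed(1)
n_days <- 4 * 365   # roughly 4 years of daily observations
n_vars <- 10

# Simulate strictly positive proportions that sum to 1 in each row.
# Real data containing zeros would need them handled first, since log(0) is -Inf.
raw  <- matrix(rgamma(n_days * n_vars, shape = 5), nrow = n_days, ncol = n_vars)
comp <- raw / rowSums(raw)
colnames(comp) <- paste0("Var", seq_len(n_vars))

# CLR: log of each part minus the mean of the logs in the same row.
log_comp <- log(comp)
clr <- log_comp - rowMeans(log_comp)

# By construction every CLR row sums to zero (up to floating-point error),
# so the 10 CLR columns are linearly dependent.
max(abs(rowSums(clr)))
```

A matrix like `clr` is what would then be passed to `vars::VAR`. The zero row sums are a general property of the CLR transform and may be worth keeping in mind when all 10 transformed variables enter the model together.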
Problem I am encountering:
When I use the `predict()` function and check the results, I get NA forecasts for every variable. To troubleshoot, I reran the model with just 2 variables and was able to get forecasts. I then kept adding one variable at a time, and I got forecasts all the way up to 9 variables. When I add the 10th variable, every forecast is NA. I tried different combinations of variables to see whether one of them was to blame, but could not find anything wrong with any single variable. The forecasts only fail once all 10 variables are included. What could be causing this?
Additionally, are there other time series forecasting models that work with compositional data and accept a matrix as input? I appreciate any guidance.