I would like to predict a wave using an autoregressive (AR) model. I have fitted an AR model to the data and tested the prediction on test data, but the mean squared error is lowest for a first-order model (model order 1). What does that mean?
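One way to sanity-check an MSE-versus-order curve is to run the same one-step-ahead evaluation on a synthetic wave where the expected behaviour is known, and to compare it with the persistence baseline (predict each sample by the previous one). The sketch below assumes a noisy sinusoid with 50 samples per period and a contiguous 600/400 train/test split; the signal, split, and variable names are illustrative and not taken from the Cv52 data.

```matlab
% Synthetic stand-in for the measured wave (assumption): a sine with
% 50 samples per period plus a little Gaussian noise.
rng(0);                                  % reproducible noise
n = 1000;
t = (0:n-1)';
y = sin(2*pi*t/50) + 0.01*randn(n, 1);

% Contiguous split: fit on the first 600 samples, test on the rest.
y_train = y(1:600);
y_test  = y(601:end);

max_order = 10;
mse_test = zeros(1, max_order);
for p = 1:max_order
    % Lagged design matrix: row k holds y(k+p-1), ..., y(k); target is y(k+p).
    m = numel(y_train) - p;
    X = zeros(m, p);
    for k = 1:m
        X(k, :) = y_train(k+p-1:-1:k).';
    end
    beta = X \ y_train(p+1:end);         % least-squares AR(p) coefficients

    % One-step-ahead predictions on the test segment (observed lags are used).
    mt = numel(y_test) - p;
    Xt = zeros(mt, p);
    for k = 1:mt
        Xt(k, :) = y_test(k+p-1:-1:k).';
    end
    mse_test(p) = mean((Xt * beta - y_test(p+1:end)).^2);
end

% Persistence baseline: predict each test sample by the previous one.
mse_persist = mean((y_test(2:end) - y_test(1:end-1)).^2);

fprintf('order %2d: test MSE %.3g\n', [1:max_order; mse_test]);
fprintf('persistence: test MSE %.3g\n', mse_persist);
```

A noise-free sinusoid satisfies an exact AR(2) recursion, so on this kind of signal the test MSE should drop clearly from order 1 to order 2. If a curve on real data is lowest at order 1 and rises steadily, it is worth checking the evaluation code (indexing, zero-filled rows) before drawing conclusions about the signal itself.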
Also, I can't predict further ahead than the length of the training set: when I try to predict over a longer span, I get only zeros at the end. Is this expected, and if so, why? See the picture.
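For reference, a standard AR(p) model predicts each sample from the p samples before it. The notation below is chosen for this note ($y_t$ is the sample at step $t$, $p$ the model order, $\beta_k$ the coefficients, $T$ the last observed sample):

$$
y_t = \sum_{k=1}^{p} \beta_k \, y_{t-k} + \varepsilon_t,
\qquad
\hat{\boldsymbol\beta} = \arg\min_{\boldsymbol\beta} \lVert \mathbf{Y} - X\boldsymbol\beta \rVert_2^2 = (X^\top X)^{-1} X^\top \mathbf{Y},
$$

where row $t$ of $X$ holds $(y_{t-1}, \dots, y_{t-p})$, $\mathbf{Y}$ holds the targets $y_t$, and $X$ is assumed to have full column rank. A one-step-ahead prediction $\hat y_t = \sum_{k=1}^{p} \hat\beta_k \, y_{t-k}$ uses observed lags, so it needs data up to $t-1$. Predicting past $T$ requires feeding earlier predictions back in as lags:

$$
\hat y_{T+h} = \sum_{k=1}^{p} \hat\beta_k \, \tilde y_{T+h-k},
\qquad
\tilde y_s = \begin{cases} y_s, & s \le T,\\ \hat y_s, & s > T, \end{cases}
\qquad h = 1, 2, \dots
$$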
% load data
clear;
close all;
load Cv52.mat;
format long
eta_snl = Cv52.PG.eta_snl;
t_snl_dnum = Cv52.PG.t_datenum;
t_snl_time = datetime(t_snl_dnum','Format','HH:mm:ss.SSS','convertFrom','datenum');
initial_t = datetime('09 07, 2019, 17:11:43.000','Format','MM dd, uuuu, HH:mm:ss.SSS');
final_t = datetime('09 07, 2019, 17:18:59.000','Format','MM dd, uuuu, HH:mm:ss.SSS');
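% split data into training/validation/test sets
% note: training and validation are interleaved samples (odd/even indices)
% from the same segment 1:split, so validation is not an independent period
init = 1;
init_val = 2;
final = 2000;
split = 400;
ssize = 2;
data = eta_snl(1:final);
data = data(:);                 % force a column vector
trein_data = data(init:ssize:split);
val_data = data(init_val:ssize:split);
test_data = data(split:ssize:final);
t_snl_time = t_snl_time(init:final);
t_train = t_snl_time(init:ssize:split);
t_val = t_snl_time(init_val:ssize:split);
t_test = t_snl_time(split:ssize:final);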
% hyperparameters
deg = 40; % maximum AR model order (number of lagged samples)
%
trein_score = zeros(1, deg);
test_score = zeros(1, deg);
for j = 1 : deg
    % construct matrices for an AR(j) model: each target trein_data(p+j)
    % is predicted from the j previous samples trein_data(p+j-1), ..., trein_data(p)
    m = length(trein_data) - j;
    X = zeros(m, j);
    Y = zeros(m, 1);
    for p = 1:m
        X(p, :) = trein_data(p+j-1:-1:p).';
        Y(p) = trein_data(p+j);
    end
    % AR model
    % weights beta minimize the least-squares error ||Y - X*beta||^2
    % (backslash solves this via QR, which is better conditioned than inv(X'*X))
    beta = X \ Y;
    Y_pred_trein = X * beta;
    % test and validation of model
    % build X_val and X_test in separate loops: a single loop bounded by
    % length(val_data) - j leaves the remaining rows of X_test as zeros,
    % which makes the predictions at the end of the test set zero
    X_val = zeros(length(val_data) - j, j);
    for p = 1:length(val_data) - j
        X_val(p, :) = val_data(p+j-1:-1:p).';
    end
    X_test = zeros(length(test_data) - j, j);
    for p = 1:length(test_data) - j
        X_test(p, :) = test_data(p+j-1:-1:p).';
    end
    % one-step-ahead predictions (each uses observed lagged samples)
    Y_pred_val = X_val * beta;
    Y_pred_test = X_test * beta;
    % error / test score; prediction p corresponds to sample p+j
    test_score(j) = mean((Y_pred_test - test_data(j+1:end)).^2);
    trein_score(j) = mean((Y_pred_trein - Y).^2);
    if j == 10
        figure;
        hold on;
        plot(t_snl_time, data, 'b')
        plot(t_train(j+1:end), Y_pred_trein, 'r')
        plot(t_val(j+1:end), Y_pred_val, 'k')
        plot(t_test(j+1:end), Y_pred_test, 'r')
        legend('Actual Data', 'Model fit', 'Model validation', 'Model test');
        xlabel('Time');
        title('AR-model, CV52 pole 2');
    end
end
figure;
hold on;
plot(1:deg, trein_score, 'b.-')
plot(1:deg, test_score, 'r.-')
legend('Training score', 'Test score', 'Location', 'Best');
xlabel('AR model order');
ylabel('Mean squared error');
title('AR-model evaluation');
This is the code I have written in MATLAB.
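To predict beyond the last available sample, the lagged inputs have to come from the model's own earlier predictions once the observed data run out (recursive, or iterated, multi-step forecasting). The sketch below assumes a noise-free sinusoid with 40 samples per period and an AR(2) model; the names `H`, `history`, and `y_forecast` are illustrative and not part of the code above.

```matlab
% Synthetic stand-in for the wave (assumption): noise-free sine, 40 samples/period.
n = 400;
t = (0:n-1)';
y = sin(2*pi*t/40);

p = 2;                                   % AR order (illustrative)
m = n - p;
X = zeros(m, p);
for k = 1:m
    X(k, :) = y(k+p-1:-1:k).';           % lags y(k+p-1), ..., y(k), newest first
end
beta = X \ y(p+1:end);                   % least-squares AR(p) coefficients

% Recursive forecast of H samples past the end of the data: each new
% prediction is stored in history and reused as a lag for later steps.
H = 200;
history = [y; zeros(H, 1)];
for h = 1:H
    idx = n + h;                         % position being predicted
    lags = history(idx-1:-1:idx-p);      % p most recent values, newest first
    history(idx) = lags.' * beta;
end
y_forecast = history(n+1:end);

% Compare with the true continuation of the sine (t = n, ..., n+H-1).
y_true = sin(2*pi*(n:n+H-1)'/40);
max_err = max(abs(y_forecast - y_true));
fprintf('max forecast error over %d steps: %.3g\n', H, max_err);
```

With noise-free data the AR(2) forecast reproduces the sine. On real, noisy data the errors accumulate, and for a stable fitted model without an intercept the iterated forecast decays toward zero as the horizon grows, so only a limited horizon is informative.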