How to compare ARIMA model in R to actual observations used to create the model?

1.6k Views Asked by At

I've been using the R forecast package's auto.arima() function to fit an ARIMA model to my time series data. I want to see how good of a fit the ARIMA model is to my original data. I hope to plot my original time series and the ARIMA simulation on the same plot and see how well they match up. How can I do this?

1

There are 1 best solutions below

0
On

I'm not really sure about what you are looking for, but I think you want to have some indication about the accuracy of your model. You may found details on the topic in the online book of Hyndman and Athanasopoulos: https://www.otexts.org/fpp/2/5

Here an example based on the dataset "AirPassengers"

library(forecast)

data(AirPassengers)

# inspect the series

AirPassengers

plot(AirPassengers)


# data are assigned to a convenient vector

series <- AirPassengers

# the accuracy of forecasts can only be determined 
# by considering how well a model performs on new data that were not used when fitting the model.
# The size of the test set is typically about 20% of the total sample 

# training set
# use data from 1949 to 1956 for forecasting

sr = window(series, start=1949, end=c(1956,12))

# test set
# use remaining data from 1957 to 1960 to test accuracy

ser = window(series, start=1957, end=c(1960,12))



######################################################################
# plot training set
######################################################################

plot(sr, main="AirPassengers", ylab="", xlab="Months")

# plot forecasting for 5 years according to four methods
lines(meanf(sr,h=48)$mean, col=4)
lines(rwf(sr,h=48)$mean, col=2)
lines(rwf(sr,drift=TRUE,h=48)$mean, col=3)
lines(snaive(sr,h=48)$mean, col=5)

# legend
legend("topleft", lty=1, col=c(4,2,3, 5),
legend=c("Mean method","Naive method","Drift method", "Seasonal naïve method"),bty="n")

# the test set
lines(ser, col="red")


# accuracy for forecasting of sr (forecasted data) on ser (original data)
# the best model had the lowest error (particularly the MAPE, Mean absolute percentage error)

# Mean method
accuracy(meanf(sr,h=48), ser)

# Naive method
accuracy(rwf(sr,h=48), ser)

# Drift method
accuracy(rwf(sr,drift=TRUE,h=48), ser)

# Seasonal naïve method
accuracy(snaive(sr,h=48), ser)



######################################################################
# plot test set only with the predictions
######################################################################

# calculate the forecasting

sr.mean <- meanf(sr,h=48)$mean
sr.naive <- rwf(sr,h=48)$mean
sr.drift <- rwf(sr,drift=TRUE,h=48)$mean
sr.seas <- snaive(sr,h=48)$mean

# plot the test set
plot(ser, main="AirPassengers", ylab="", xlab="Months", ylim = c(200,600))

# plot forecasting for 4 years according to four methods
lines(sr.mean, col=4)
lines(sr.naive, col=2)
lines(sr.drift, col=3)
lines(sr.seas, col=5)

# legend
legend("topleft", lty=1, col=c(4,2,3,5),
legend=c("Mean method","Naive method","Drift method", "Seasonal naïve method"),bty="n")



########################################################################
# for ARIMA; Hyndman suggest to use auto-arima without stepwise 
########################################################################

library(fpp)

trainData <- sr
testData <- ser

#  the default value in auto.arima() is test="kpss". 
# A KPSS test has a null hypothesis of stationarity
# In general, all the defaults are set to the values that give the best forecasts on average.

# CAUTION! Takes a while to compute

arimaMod <- auto.arima(trainData, stepwise=FALSE, approximation=FALSE)
arimaMod.Fr <-forecast(arimaMod,h=48)

# plot of the prediction and of the test set

plot(arimaMod.Fr)
lines(testData, col="red")
legend("topleft",lty=1,bty = "n",col=c("red","blue"),c("testData","ARIMAPred"))



# plot of the test set and its prediction only

AR.mean <-forecast(arimaMod,h=48)$mean

plot(testData, main="AirPassengers", ylab="", xlab="Months", col="darkblue")  
lines(AR.mean, col="red")

# accuracy

accuracy(arimaMod.Fr,testData)

# test residues of arima

tsdisplay(residuals(arimaMod))