I have time series data and I would like to build an ARIMA forecasting model. I have split my data into training and test sets. I will train the model only on the training set and evaluate it on the test set.
My question is: when I plot the ACF and PACF to get an idea of appropriate p and q parameters, should I plot them for my training set or for the whole data? And for Auto ARIMA, should I feed it the whole data or just the training set?
I tried both the training data and the whole data, and they give different results (for both the ACF/PACF plots and Auto ARIMA). So which data should I use?
The purpose of splitting a dataset into training and test sets is to simulate the real-world scenario in which you fit a model on historical data and evaluate its performance on unseen future data. Choosing p and q from ACF/PACF plots, or letting Auto ARIMA pick the order, is not the parameter estimation itself, but it is still part of building the model, so it should use the training set only. If the test set influences the chosen order, information from the "future" leaks into the model, and the test error will tend to look better than it would in real use.
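As a rough illustration of that workflow, here is a minimal sketch in plain Python. The simulated AR(1) series, the 80/20 split, and the helper `sample_acf` are all assumptions made for the example, not anything from your data; the point is only that the split happens first and the autocorrelations are computed from `train` alone.

```python
import random


def sample_acf(x, nlags):
    """Sample autocorrelation of x for lags 0..nlags (hypothetical helper)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(nlags + 1)
    ]


# Assumed stand-in data: a simulated AR(1) series with coefficient 0.7.
rng = random.Random(0)
series = [0.0]
for _ in range(299):
    series.append(0.7 * series[-1] + rng.gauss(0, 1))

# Chronological split (no shuffling): the last 20% is held out.
split = int(len(series) * 0.8)
train, test = series[:split], series[split:]

# Order selection looks at the training portion only; `test` is not touched
# until the chosen model is evaluated.
acf_train = sample_acf(train, nlags=10)
print([round(r, 2) for r in acf_train])
```

With the usual libraries the pattern would be the same: pass `train`, not the full series, to `plot_acf` and `plot_pacf` from `statsmodels.graphics.tsaplots`, and to `auto_arima` from `pmdarima`.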