I'm currently trying to train a neural network using cross-validation, but I'm not sure I fully understand how cross-validation works. I understand the concept, but I can't quite see how it translates into a code implementation. The following is a description of what I've implemented, which is more or less guesswork.
I split the entire data set into K folds, where one fold is the validation set, one fold is the testing set, and the data in the remaining folds make up the training set.
Then, I loop K times, each time reassigning the validation and testing sets to different folds. Within each iteration, I repeatedly train the network (update the weights) using only the training set until the error produced by the network meets some threshold. However, the error used to decide when to stop training is computed on the validation set, not the training set. After training is done, the error is computed once more, this time on the testing set, and this test error is recorded. Lastly, all the weights are re-initialized (using the same random number generator that initialized them originally) or reset in some fashion to undo the learning, before moving on to the next assignment of validation, training, and testing sets.
Once all K iterations finish, the test errors recorded in each iteration are averaged.
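Concretely, my loop looks roughly like the sketch below. I'm using scikit-learn's MLPClassifier and a synthetic data set here purely as stand-ins for my actual network and data, just to make the structure of what I'm doing explicit; the threshold, fold count, and epoch cap are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Stand-in data; in my case this is my real data set.
X, y = make_classification(n_samples=600, n_features=20, random_state=0)

K = 6
error_threshold = 0.10
max_epochs = 200
seed = 0

fold_X = np.array_split(X, K)
fold_y = np.array_split(y, K)
test_errors = []

for k in range(K):
    val_k, test_k = k, (k + 1) % K                      # rotate the validation / testing folds
    train_ks = [i for i in range(K) if i not in (val_k, test_k)]
    X_train = np.concatenate([fold_X[i] for i in train_ks])
    y_train = np.concatenate([fold_y[i] for i in train_ks])

    # A fresh network each iteration = weights re-initialized from the same random seed.
    net = MLPClassifier(hidden_layer_sizes=(10,), random_state=seed)

    # Train on the training folds, but stop based on the *validation* error.
    for epoch in range(max_epochs):
        net.partial_fit(X_train, y_train, classes=np.unique(y))
        val_error = 1.0 - net.score(fold_X[val_k], fold_y[val_k])
        if val_error <= error_threshold:
            break

    # Record the error on the held-out testing fold for this iteration.
    test_errors.append(1.0 - net.score(fold_X[test_k], fold_y[test_k]))

print("mean test error over K iterations:", np.mean(test_errors))
```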
The parts I'm most confused about are stopping training based on the validation error rather than the training error, and resetting the weights between iterations. Please let me know if I made any mistakes!
I believe your implementation of cross-validation is generally correct. To answer your questions:
You want to use the error on the validation set because it reduces overfitting; this is the reason you always want to have a validation set. If you stopped based on the training error instead, you could drive that error below almost any threshold, and your algorithm would achieve a higher training accuracy than validation accuracy. However, it would generalize poorly to the unseen examples in the real world, which is exactly what your validation set is supposed to model.
The idea behind cross-validation is that each iteration is like training the algorithm from scratch. This is desirable because, by averaging the score across folds, you get a more robust value; it protects against the possibility of a single biased validation set.
My only suggestion would be to not use a test set inside your cross-validation scheme: since your validation set already models unseen examples, a separate test fold during cross-validation is redundant. I would instead split the data into a training set and a test set before you start cross-validation, and then not touch the test set until you want an objective score for your algorithm.
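As a rough sketch of what I mean (using scikit-learn's train_test_split and KFold, with MLPClassifier and a synthetic data set standing in for your network and data):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=0)  # stand-in data

# Hold the test set out first; it is not touched during cross-validation.
X_cv, X_test, y_cv, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

val_errors = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X_cv):
    net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=500, random_state=0)
    net.fit(X_cv[train_idx], y_cv[train_idx])                        # train on K-1 folds
    val_errors.append(1.0 - net.score(X_cv[val_idx], y_cv[val_idx])) # validate on the held-out fold

print("mean cross-validation error:", np.mean(val_errors))
# X_test / y_test are still untouched at this point.
```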
You could use your cross-validation score as an indication of performance on unseen examples. I assume, however, that you will be choosing parameters based on this score, optimizing your model for your training set; the possibility then arises that this does not generalize well to unseen examples, which is why it is good practice to keep a separate, unseen test set that is used only after you have optimized your algorithm.
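Continuing the sketch above (it reuses X_cv, X_test, y_cv and y_test from the previous snippet), you could pick the parameters that minimize the cross-validation error and only then touch the test set once. GridSearchCV just automates the fold loop; the parameter grid here is only an example.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Choose parameters using only the cross-validation data.
search = GridSearchCV(
    MLPClassifier(max_iter=500, random_state=0),
    param_grid={"hidden_layer_sizes": [(5,), (10,), (20,)]},  # example grid
    cv=5)
search.fit(X_cv, y_cv)

# Only now is the test set used, once, for an objective score.
print("chosen parameters:", search.best_params_)
print("final test error:", 1.0 - search.score(X_test, y_test))
```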