I am getting started on Kaggle.
I have just gone through various data science and machine learning competitions.
I have seen that every competition comes with training data, test data and original data.
Can someone explain to me what those are and how we use those datasets while solving a problem?
Training data: Used to train the AI.
Test data: Used to assess how well the AI that was trained on the training data performs on data it has not seen.
Original data: The full dataset, before it was broken up into training and test data.
When doing machine learning, the AI has to be trained in some way. This is why we break the data up and give the AI a subset of the original data (the training data) so that it can learn. We then test its knowledge with the test data, which it did not see during training. Once that is done, we can feed it the original data and see how it does.
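To make the split concrete, here is a minimal sketch in plain Python. Everything in it is made up for illustration: the toy `original` dataset, the helper names `split_train_test`, `fit_threshold` and `accuracy`, and the very simple "model" (a single threshold). A real competition would use the provided files and a proper library, so treat this only as a picture of the train-then-test idea.

```python
import random

def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_threshold(train):
    """'Train' a toy model: the midpoint between the two class means."""
    xs0 = [x for x, label in train if label == 0]
    xs1 = [x for x, label in train if label == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2

def accuracy(threshold, rows):
    """Fraction of rows where 'x > threshold' matches the label."""
    correct = sum((x > threshold) == (label == 1) for x, label in rows)
    return correct / len(rows)

# Hypothetical "original data": (feature, label) pairs.
original = [(x, 0) for x in range(0, 10)] + [(x, 1) for x in range(20, 30)]

# Break the original data up, learn from one part, check on the other.
train, test = split_train_test(original)
threshold = fit_threshold(train)

print("rows used for training:", len(train))
print("rows held back for testing:", len(test))
print("accuracy on test data:", accuracy(threshold, test))
print("accuracy on original data:", accuracy(threshold, original))
```

The point of holding the test rows back is that the model never sees them while learning, so the test score says something about how it handles new data.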