I am learning the basics of NLP and am trying to code a kNN classifier.
In the data preparation stage, I want to reduce the dataset down to a certain number of dimensions, but I am confused about how to do that.
Can anyone help me out?
I have written the code below to get the training dataset:
from sklearn.datasets import fetch_20newsgroups
trainingData = fetch_20newsgroups(subset="train", categories=allCategories)
What you're trying to do is known as dimensionality reduction, which has several variants; in the broadest sense it is divided into supervised and unsupervised methods. Any flavor of it can be implemented with the sklearn API.
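As a rough sketch of the two families, here is one unsupervised reducer (`TruncatedSVD`) and one supervised selector (`SelectKBest` with `chi2`) applied to TF-IDF features. The choice of these two estimators, the tiny stand-in corpus, the labels, and the target of 2 dimensions are all assumptions made for illustration; with the data from the question you would pass `trainingData.data` and `trainingData.target` instead.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical stand-in corpus so the sketch runs without downloading anything.
# With the question's data: docs = trainingData.data, labels = trainingData.target
docs = [
    "the rocket launched into orbit",
    "nasa plans a new space mission",
    "the team won the hockey game",
    "the goalie saved the final shot",
    "astronauts returned from the space station",
    "the playoffs start next week for the hockey league",
]
labels = [0, 0, 1, 1, 0, 1]

# Turn raw text into a sparse TF-IDF matrix (one column per vocabulary term).
X = TfidfVectorizer().fit_transform(docs)

n_dims = 2  # illustrative target dimensionality, not a recommendation

# Unsupervised: project onto n_dims latent components; labels are not used.
svd = TruncatedSVD(n_components=n_dims, random_state=0)
X_unsup = svd.fit_transform(X)

# Supervised: keep the n_dims original terms most associated with the labels.
selector = SelectKBest(chi2, k=n_dims)
X_sup = selector.fit_transform(X, labels)

print(X.shape, X_unsup.shape, X_sup.shape)
```

Either reduced matrix can then be passed to a kNN classifier in place of the full TF-IDF matrix. Remember to call `transform` (not `fit_transform`) on the test set, using the vectorizer and reducer already fitted on the training data.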