I am trying to learn KNN by working on the Breast Cancer dataset from the UCI repository. The dataset has 699 instances, with 9 continuous features and 1 class variable.
I tested my accuracy on a cross-validation set. For K = 21 and K = 19, accuracy is 95.7%.
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

neigh = KNeighborsClassifier(n_neighbors=21)
neigh.fit(X_train, y_train)
y_pred_val = neigh.predict(X_val)
print(accuracy_score(y_val, y_pred_val))
But for K = 1 I am getting accuracy = 97.85%, and for K = 3, accuracy = 97.14%.
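To compare several values of K at once, a sweep like the one below could be run. This is a sketch using scikit-learn's bundled breast-cancer data (the 569×30 Wisconsin diagnostic version, not the 699×9 UCI file from the question), so the numbers will differ from mine; the loop structure is the point.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Stand-in for the UCI data loaded in the question.
X, y = load_breast_cancer(return_X_y=True)

# Evaluate each candidate K with 5-fold cross-validation
# instead of a single train/validation split.
for k in [1, 3, 19, 21]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print("k=%2d  mean CV accuracy=%.4f" % (k, scores.mean()))
```

Averaging over folds gives a more stable estimate than one validation split, which matters here because the gap between 95.7% and 97.85% is only a handful of samples.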
I read this:
The choice of k is very critical. A small value of k means that noise will have a higher influence on the result. A large value makes it computationally expensive and somewhat defeats the basic philosophy behind KNN (that points that are near are likely to have similar densities or classes). A simple approach to selecting k is to set k = n^(1/2). here
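For what it's worth, here is what that rule of thumb gives for this dataset, as a sketch. Rounding to an odd k is a common extra step (not stated in the quote) to avoid ties in binary classification:

```python
import math

n = 699                    # dataset size from the question
k = round(math.sqrt(n))    # sqrt(699) ≈ 26.4 → 26
if k % 2 == 0:
    k += 1                 # prefer odd k to avoid voting ties
print(k)                   # 27
```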
Which value of K should I choose for my model? Can you elaborate on the logic behind it?
Thanks in advance!