This is my Python code to calculate accuracy, precision, recall, and F1 score with K-fold cross-validation.
In my code I sum the accuracy, precision, recall, and F1 score from each fold, then divide each sum by n_folds. But I don't know whether this is the correct way to calculate those scores. How can I tell?
a=0
p=0
r=0
f=0
for fold in range(0, n_folds):
    # splitting the dataset
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=int(len(y)/n_folds))
    clf.fit(X_train, y_train)
    x_test_prediction = clf.predict(X_test)
    a = a + accuracy_score(x_test_prediction, y_test)
    p = p + precision_score(x_test_prediction, y_test)
    r = r + recall_score(x_test_prediction, y_test)
    f = f + f1_score(x_test_prediction, y_test)
accuracy_score=a
precision_score=p
recall_score=r
f1_score=f
print("accuracy score :",(accuracy_score)/n_folds)
print("precision score :",precision_score/n_folds)
print("recall score :",recall_score/n_folds)
print("f1 score :",f1_score/n_folds)
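There is a function that handles cross-validation for you: cross_validate.
Summing the per-fold scores and dividing by n_folds is a correct way to average them, but two details in your loop are worth checking. First, scikit-learn metrics take their arguments as (y_true, y_pred), so precision_score(x_test_prediction, y_test) actually returns the recall and recall_score(x_test_prediction, y_test) returns the precision (for binary labels, accuracy and F1 are unaffected by the order). Second, calling train_test_split on every iteration draws a fresh random split each time, so the test sets can overlap; that is repeated random subsampling rather than K-fold. Note that it is not a good idea to use your entire data set to build your model. You can check the scikit-learn documentation on evaluating estimator performance.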
You can also check the other predefined scoring values in the scikit-learn documentation.
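One way to tell whether a hand-written averaging loop is right is to run it next to cross_validate on the same folds and compare the numbers. The following is only a minimal sketch under stated assumptions: the synthetic data from make_classification, the LogisticRegression classifier, and the choice of StratifiedKFold with 5 folds are placeholders for illustration, not the setup from the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import StratifiedKFold, cross_validate

# Placeholder binary data and classifier (assumptions, not the asker's X, y, clf).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
clf = LogisticRegression(max_iter=1000)

n_folds = 5
# A fixed random_state makes the splitter yield the same folds on every call.
cv = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)

# Let scikit-learn run the folds and score each one.
scoring = ["accuracy", "precision", "recall", "f1"]
cv_results = cross_validate(clf, X, y, cv=cv, scoring=scoring)
cv_means = {name: float(cv_results[f"test_{name}"].mean()) for name in scoring}

# The same computation by hand: each sample lands in exactly one test fold.
metrics = {
    "accuracy": accuracy_score,
    "precision": precision_score,
    "recall": recall_score,
    "f1": f1_score,
}
manual_scores = {name: [] for name in metrics}
for train_idx, test_idx in cv.split(X, y):
    clf.fit(X[train_idx], y[train_idx])
    y_pred = clf.predict(X[test_idx])
    for name, metric in metrics.items():
        # scikit-learn metrics expect (y_true, y_pred), in that order.
        manual_scores[name].append(metric(y[test_idx], y_pred))
manual_means = {name: float(np.mean(vals)) for name, vals in manual_scores.items()}

for name in scoring:
    print(f"{name}: cross_validate={cv_means[name]:.4f}  manual={manual_means[name]:.4f}")
```

If the two columns agree, the manual averaging is doing what cross_validate does. A mismatch usually points to different splits or to swapped metric arguments.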