I am new to machine learning. In a binary classification problem, if I encode/transform the target variable as yes=1 and No=0 (directly in the dataset), I get the following results:
- Accuracy: 95
- Recall: 90
- Precision: 94
- F1: 92
But if I encode/transform the target variable the other way around, yes=0 and No=1 (directly in the dataset), I get these results:
- Accuracy: 95
- Recall: 97
- Precision: 94
- F1: 95
I am using the XGBoost algorithm, and all other variables are numeric (positive and negative). Accuracy is the same in both cases, but I assumed F1 would also be the same. Why does it give different results? I know that scikit-learn can handle the encoding, but why is F1 different in the two cases?
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

xtrain, xtest, ytrain, ytest = train_test_split(X, encoded_Y, test_size=0.3,
                                                random_state=100, shuffle=True)
clf_xgb = xgb.XGBClassifier(nthread=1, random_state=100)
clf_xgb.fit(xtrain, ytrain)
xgb_pred = clf_xgb.predict(xtest)
xgb_pred_prb = clf_xgb.predict_proba(xtest)[:, 1]

# Note: scikit-learn expects confusion_matrix(y_true, y_pred)
print(confusion_matrix(ytest, xgb_pred))
# [[ 984  103]
#  [  57 1856]]

# Accuracy of XGBoost
accuracy_xgb = accuracy_score(ytest, xgb_pred)
print("Accuracy: {}".format(accuracy_xgb))

# Recall of XGBoost
recall_xgb = recall_score(ytest, xgb_pred)
print("Recall: {}".format(recall_xgb))

# Precision of XGBoost
precision_xgb = precision_score(ytest, xgb_pred)
print("Precision: {}".format(precision_xgb))

# F1 score of XGBoost
xgb_f1 = f1_score(ytest, xgb_pred)
print("F1: {}".format(xgb_f1))
This is because the F1 score, precision, and recall are all connected.
The formulas are:

precision = TP / (TP + FP)
recall    = TP / (TP + FN)

and

F1 = 2 * (precision * recall) / (precision + recall)
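These formulas can be computed directly from confusion-matrix counts; a minimal sketch (the counts here are made-up for illustration, not taken from your results):

```python
# Compute precision, recall, and F1 from confusion-matrix counts.
def prf1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative counts: 90 true positives, 10 false positives,
# 10 false negatives -> precision and recall both 0.9, so F1 is 0.9 too.
print(prf1(90, 10, 10))
```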
So precision and recall depend on what you define as positive (1). If you switch your positive/negative cases, as you do by mapping yes/no differently, you get a different result. You can see that with the following calculation. Suppose you have 100 yes and 4900 no cases and the classifier gets 10 of each class wrong, giving this confusion matrix (rows = actual, columns = predicted):

             pred yes   pred no
actual yes         90        10
actual no          10      4890

Then if you define YES as positive (1):

precision = 90 / (90 + 10) = 0.90
recall    = 90 / (90 + 10) = 0.90
F1        = 0.90

while if you define NO as positive (1):

precision = 4890 / (4890 + 10) ≈ 0.998
recall    = 4890 / (4890 + 10) ≈ 0.998
F1        ≈ 0.998

Note that if YES is your positive class, the 90 cell holds the true positives, the 10 in the "actual yes" row the false negatives, and so on; while if you define NO to be the positive class, the 4890 cell becomes the true positives and all the cells swap roles. Accuracy counts (90 + 4890) / 5000 = 0.996 either way, which is why it does not change, but precision, recall, and F1 all follow whichever class you call positive.
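You can reproduce the swap end to end with scikit-learn; a minimal sketch with fabricated labels (100 "yes", 4900 "no", 10 errors in each class — not your actual data), relying on the metric functions' default of treating label 1 as positive:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical ground truth: 100 positives ("yes") and 4900 negatives ("no").
y_true = np.array([1] * 100 + [0] * 4900)
# Hypothetical predictions: 10 mistakes in each class.
y_pred = np.array([1] * 90 + [0] * 10 + [1] * 10 + [0] * 4890)

# Encoding A: yes=1, No=0
acc_a = accuracy_score(y_true, y_pred)
f1_a = f1_score(y_true, y_pred)

# Encoding B: yes=0, No=1 (flip both label arrays)
acc_b = accuracy_score(1 - y_true, 1 - y_pred)
f1_b = f1_score(1 - y_true, 1 - y_pred)

print(acc_a, acc_b)  # identical: accuracy ignores which class is "positive"
print(f1_a, f1_b)    # different: F1 is computed for the class labelled 1
```

Alternatively, you can keep one encoding and pass `pos_label=0` to `f1_score`, `precision_score`, and `recall_score` to get the metrics for the other class.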