Matthews Correlation Coefficient in Lasso model, logistic regression with R

I am using the birthwt data from the MASS package and I'd like to calculate the Matthews Correlation Coefficient (MCC) in R for a Lasso model. I am really confused. Thanks in advance.

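library(MASS)  # provides the birthwt data
birthwt = birthwt[, -10]  # drop bwt (column 10), since low is derived from it
boot = sample(nrow(birthwt), 40)  # hold out 40 random rows for testing
train.data = birthwt[-boot, ]
test.data = birthwt[boot, ]
x = model.matrix(low ~ ., train.data)[, -1]  # predictor matrix without the intercept column
y = train.data$low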

Lasso Model:

library(glmnet)
set.seed(123)
cv.lasso = cv.glmnet(x, y, alpha = 1, family = "binomial")
model.lasso=glmnet(x,y,alpha=1,family="binomial",lambda= cv.lasso$lambda.min)
coef(model.lasso)
x.test = model.matrix(low ~., test.data)[,-1]
proba.lasso = predict(model.lasso,newx = x.test)
class.lasso = ifelse(proba.lasso > 0.5, 1, 0)
class.obs = test.data$low
There is 1 best solution below

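When you call predict on a glmnet object, the default output (type = "link") is on the logit scale, not a probability, so thresholding it at 0.5 does not give the classes you expect. To get the predicted classes directly, you need to do: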

class.lasso = predict(model.lasso,newx = x.test,type="class")
class.lasso = as.numeric(class.lasso)
class.obs = test.data$low

You can also threshold the predicted probability this way:

class.lasso = ifelse(predict(model.lasso,newx = x.test,type="response") > 0.5,1,0)
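
The difference between the link scale and the probability scale can be sketched in base R, without glmnet. This is only an illustration: the logit values are made up, and plogis (the inverse logit from the stats package) stands in for what type="response" does for a binomial model.

```r
# Hypothetical logit (link-scale) predictions, made up for illustration
logit <- c(-2, -0.3, 0.2, 0.4, 1.5)

# The inverse logit converts the link scale to probabilities
prob <- plogis(logit)

# Thresholding the probability at 0.5 is the same as thresholding the logit at 0
class_from_prob  <- ifelse(prob > 0.5, 1, 0)
class_from_logit <- ifelse(logit > 0, 1, 0)
identical(class_from_prob, class_from_logit)

# Thresholding the logit at 0.5 instead drops cases whose probability
# lies between 0.5 and roughly 0.62
class_wrong <- ifelse(logit > 0.5, 1, 0)
```

Here the third and fourth observations are classified as 1 on the probability scale but as 0 when the raw logit is compared against 0.5, which is the mistake in the question's code.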

To calculate the MCC, you can use the mltools package:

library(mltools)
mcc(pred = class.lasso, actual = class.obs)
[1] 0.2581989

Or use a function that calculates Pearson's phi coefficient, which is equivalent to the MCC for a 2x2 table:

library(psych)
phi(table(class.lasso,class.obs),digits=7)

[1] 0.2581989

Or, if you want to derive it from scratch, use the formula from Wikipedia:

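# rows = predicted class, columns = observed class
cm = table(class.lasso,class.obs)
TP = cm[2,2]  # predicted 1, observed 1
FP = cm[2,1]  # predicted 1, observed 0
TN = cm[1,1]  # predicted 0, observed 0
FN = cm[1,2]  # predicted 0, observed 1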

(TP * TN - FP*FN)/sqrt((TP+FP)*(TP+FN)*(TN+FP)*(TN+FN))
[1] 0.2581989
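
The from-scratch version indexes a 2x2 table, so it fails if one class never shows up in the predictions, and the product of integer counts can overflow on large data. A minimal sketch of a more defensive helper follows. The name mcc_manual is hypothetical (it is not from any package), the example vectors are made up, and returning 0 for a zero denominator is an assumed convention rather than part of the formula.

```r
# Hypothetical helper (not from any package): MCC for 0/1 vectors
mcc_manual <- function(pred, actual) {
  # Fix the factor levels so the table is always 2x2,
  # even if one class never appears in the predictions
  cm <- table(factor(pred, levels = c(0, 1)),
              factor(actual, levels = c(0, 1)))
  # Convert counts to double to avoid integer overflow in the products
  TP <- as.numeric(cm[2, 2]); FP <- as.numeric(cm[2, 1])
  TN <- as.numeric(cm[1, 1]); FN <- as.numeric(cm[1, 2])
  denom <- sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
  # Assumption: return 0 when the MCC is undefined (zero denominator)
  if (denom == 0) return(0)
  (TP * TN - FP * FN) / denom
}

# Made-up example data
pred   <- c(1, 1, 0, 0, 1, 0, 0, 0)
actual <- c(1, 0, 0, 0, 1, 1, 0, 0)
mcc_manual(pred, actual)
```

With the objects from the answer, mcc_manual(class.lasso, class.obs) should agree with the values above. For numeric 0/1 vectors, cor(pred, actual) is another quick cross-check, since the Pearson correlation of two binary variables is the phi coefficient.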