Exploratory analysis with logistic regression - Do I need to split into training and test sets?

I have a dataset that I am just exploring at the moment. One thing I am interested in is whether some variables classify a particular outcome measure. I want to use logistic regression just to see how well it does. In this situation, is it okay to train on the entire dataset and then predict on that same data (see code below), or should I be splitting into training and test sets even in this exploratory phase?

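    library(pROC)  # provides roc() and coords()
    # Fit on the full dataset, then predict on the same rows (in-sample)
    model_glm <- glm(outcome ~ var1 + var2, data = DataSet, family = "binomial")
    prob <- predict(model_glm, newdata = DataSet, type = "response")
    # In-sample ROC curve and AUC; named roc_obj to avoid masking roc()
    roc_obj <- roc(DataSet$outcome ~ prob, plot = TRUE, print.auc = TRUE)
    as.numeric(roc_obj$auc)
    coords(roc_obj, "best")  # "best" threshold (Youden index by default)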

I also tried splitting the dataset into training and test sets, in addition to running the code above, and the results were similar.
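For comparison, a train/test version of the same check might look roughly like the sketch below. This is not the exact code that was run: the simulated `DataSet`, the 70/30 split, the seed, and the names `train_idx`, `model_split`, and `auc_rank` are all assumptions made for illustration. AUC is computed here with the rank (Mann-Whitney) formula in base R so the sketch runs without extra packages; `roc()` from pROC, as in the code above, could be used instead.

```r
# Hypothetical simulated data standing in for DataSet (not the real data)
set.seed(42)
n <- 500
DataSet <- data.frame(var1 = rnorm(n), var2 = rnorm(n))
DataSet$outcome <- rbinom(n, 1, plogis(-0.5 + 1.2 * DataSet$var1 - 0.8 * DataSet$var2))

# Assumed 70/30 random split into training and test rows
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- DataSet[train_idx, ]
test  <- DataSet[-train_idx, ]

# Fit on the training rows only
model_split <- glm(outcome ~ var1 + var2, data = train, family = "binomial")

# AUC via the rank (Mann-Whitney) formula; tied scores get average ranks
auc_rank <- function(y, p) {
  r  <- rank(p)
  n1 <- as.numeric(sum(y == 1))
  n0 <- as.numeric(sum(y == 0))
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# In-sample (training) AUC versus held-out (test) AUC
auc_train <- auc_rank(train$outcome, predict(model_split, newdata = train, type = "response"))
auc_test  <- auc_rank(test$outcome,  predict(model_split, newdata = test,  type = "response"))
c(train = auc_train, test = auc_test)
```

With only two predictors, the training and test AUCs would be expected to land fairly close to each other, which would be consistent with the similar results mentioned above. A large gap between them would point to overfitting.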
