I'm thoroughly enjoying pycaret to handle much of the legwork in my analysis. I'm making heavy use of the setup() method in preprocessing to handle normalization, target transformation, and feature selection in my data. After creating and validating my model, using the train/test sets that pycaret generates, I'm aiming to run the model on an unseen dataset to mimic a real world application. It would be nice to make use of the pycaret preprocessing to handle the legwork on the unseen dataset, just as I did for train/test.
Towards Data Science has a great tutorial on analysis with pycaret here, but after configuring a variety of transformations in the setup() call, they appear to feed the raw data_unseen set straight into predict_model() without any obvious preparation. Is there a way to use pycaret's preprocessor on subsequent datasets that aren't train/test splits? Or do we need to do it without pycaret?
Here is their code:
import pandas as pd
df = pd.read_csv('source/heart.csv')
df.head()
data = df.sample(frac=0.95, random_state=42)
data_unseen = df.drop(data.index)
data.reset_index(inplace=True, drop=True)
data_unseen.reset_index(inplace=True, drop=True)
print('Data for Modeling: ' + str(data.shape))
print('Unseen Data For Predictions: ' + str(data_unseen.shape))
Data for Modeling: (288, 14)
Unseen Data For Predictions: (15, 14)
from pycaret.classification import *
from imblearn.over_sampling import RandomOverSampler
model = setup(data = data, target = 'output', normalize = True, normalize_method='minmax', train_size = 0.8,fix_imbalance = True, fix_imbalance_method=RandomOverSampler(), session_id=123)
best = compare_models()
tuned_best = tune_model(best)
plot_model(tuned_best, plot = 'pr')
final_best = finalize_model(tuned_best)
predict_model(final_best)
predict_model(final_best, data = data_unseen)
Clarification needed: Your question needs more detail. Your goal is not clear, and neither is which parts you want to automate to avoid manual work.
Here is the general flow for classification: https://pycaret.gitbook.io/docs/
You need to call compare_models() after setup(). Then you can use the chosen model, which is named best here.
Tutorial: PyCaret's binary classification tutorial may also help: Colab - Binary Classification
Here is the GitHub link for this tutorial: GitHub - Binary Classification
For more PyCaret tutorials: PyCaret Tutorials
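On the question of why the tutorial passes raw data_unseen to predict_model(): as far as I know, predict_model() with data= reapplies the transformations configured in setup(), so the unseen data does not need separate preparation. The idea is that the preprocessing parameters are learned once from the training data and then reused. Below is a minimal pure-Python sketch of that fit-then-apply pattern for min-max normalization. It is not PyCaret's implementation; the class name MinMaxScalerSketch and the numbers are made up for illustration.

```python
class MinMaxScalerSketch:
    """Hypothetical stand-in for one fitted preprocessing step (not PyCaret code)."""

    def fit(self, values):
        # Learn the scaling parameters from the training data only.
        self.lo = min(values)
        self.hi = max(values)
        return self

    def transform(self, values):
        # Reuse the stored training min/max; nothing is re-learned here.
        span = (self.hi - self.lo) or 1.0  # avoid division by zero on constant columns
        return [(v - self.lo) / span for v in values]


train_ages = [30, 40, 50, 70]   # made-up training column
unseen_ages = [35, 80]          # made-up unseen column

# Fit once on the training data, then apply the same step to both sets.
scaler = MinMaxScalerSketch().fit(train_ages)
train_scaled = scaler.transform(train_ages)
unseen_scaled = scaler.transform(unseen_ages)

print(train_scaled)
print(unseen_scaled)
```

Note that an unseen value can land outside the 0 to 1 range, because the scaling bounds come from the training data rather than from the unseen set. That is the expected behaviour when a fitted step is reused instead of refit.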