X = df.drop(columns="CLASS")
y = df.CLASS
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train.shape, X_test.shape, y_train.shape, y_test.shape
preprocessor = ColumnTransformer([
    ('numeric', num_pipe(), ["PINJAM"]),
    ('categoric', cat_pipe(encoder='onehot'), ["JENIS KELAMIN", "STATUS PERNIKAHAN", "JUMLAH TANGGUNGAN"]),
])
from sklearn.naive_bayes import GaussianNB
pipeline = Pipeline([
    ('prep', preprocessor),
    ('algo', GaussianNB)
])
pipeline.fit(X_train, y_train)
Error:
--------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [46], in <cell line: 1>()
----> 1 pipeline.fit(X_train, y_train)
File ~\anaconda3\envs\jcopml\lib\site-packages\sklearn\pipeline.py:394, in Pipeline.fit(self, X, y, **fit_params)
392 if self._final_estimator != "passthrough":
393 fit_params_last_step = fit_params_steps[self.steps[-1][0]]
--> 394 self._final_estimator.fit(Xt, y, **fit_params_last_step)
396 return self
TypeError: fit() missing 1 required positional argument: 'y'
How do I resolve this?
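It's always better to include a fully working example in your question; it can and should be minimal. As @anastasiya-Romanova pointed out, every step of the pipeline has to be an initialized estimator. You passed the class GaussianNB instead of an instance, GaussianNB(). Because fit is then called on the class itself, the transformed data is bound to self, y is bound to X, and Python reports that y is missing. The linked scikit-learn documentation shows the correct initialization as well.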
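A minimal sketch of the corrected pipeline follows. It makes two assumptions that are not from the question: the DataFrame is synthetic (the values and class labels are invented purely for illustration), and plain scikit-learn transformers (StandardScaler, OneHotEncoder) stand in for jcopml's num_pipe() and cat_pipe(), whose internals are not shown. The only change that matters for the error is GaussianNB() with parentheses.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the question's data; all values are invented.
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "PINJAM": rng.integers(1_000, 50_000, size=n),
    "JENIS KELAMIN": rng.choice(["L", "P"], size=n),
    "STATUS PERNIKAHAN": rng.choice(["MENIKAH", "BELUM MENIKAH"], size=n),
    "JUMLAH TANGGUNGAN": rng.choice(["0", "1", "2"], size=n),
    "CLASS": rng.choice(["LANCAR", "MACET"], size=n),
})

X = df.drop(columns="CLASS")
y = df["CLASS"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Assumption: scikit-learn transformers replace jcopml's num_pipe()/cat_pipe().
# sparse_threshold=0 forces dense output, which GaussianNB requires.
preprocessor = ColumnTransformer([
    ("numeric", StandardScaler(), ["PINJAM"]),
    ("categoric", OneHotEncoder(handle_unknown="ignore"),
     ["JENIS KELAMIN", "STATUS PERNIKAHAN", "JUMLAH TANGGUNGAN"]),
], sparse_threshold=0)

pipeline = Pipeline([
    ("prep", preprocessor),
    ("algo", GaussianNB()),  # an instance, not the bare class
])

pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)
```

With the instance in place, fit receives X and y as intended, and predict returns one label per row of X_test.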
This prints:
For completeness, the linked documentation from sklearn demonstrates how to use the pipeline in such a way:
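The following is a sketch along the lines of the example in the scikit-learn Pipeline documentation, written from memory, so treat the details as approximate rather than a verbatim copy. The dataset comes from make_classification and is synthetic. Note that both steps are instances (StandardScaler(), SVC()), not classes.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic classification data, split into train and test sets.
X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Every step is an initialized estimator.
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])

# fit returns the pipeline itself, so score can be chained.
score = pipe.fit(X_train, y_train).score(X_test, y_test)
```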