'DataFrame' object has no attribute '_check_fit_params'


In order to drop one feature at a time and compare accuracy scores, I wrote the following code. Here 'ABCDEFGHIJKLMNO' are the column names (features), 15 features in total.

features = 'ABCDEFGHIJKLMNO'

for i in range(0,len(features)):
    
    pipeline = PMMLPipeline
    ([
    ('mapper', DataFrameMapper([(X_train.columns.drop([features[i:i+1]]).values)])),
    ('pca', PCA(n_components=3)),
    ('classifier', DecisionTreeClassifier())
    ])
    
    pipeline.fit(training_data.drop([features[i:i+1]],axis=1),training_data['Class'])
    
    result = pipeline.predict(X_test)
    actual = np.concatenate(y_test.values)
    
    print("Dropped feature: {}, Accuracy: {}".format(features[i:i+1], metrics.accuracy_score(actual,result)))

I am using the sklearn2pmml.pipeline library, but I get the error below when fitting the data. I cannot figure out why.

[traceback screenshot: AttributeError: 'DataFrame' object has no attribute '_check_fit_params']


1 Answer


The real problem is the line break after PMMLPipeline. Because the opening parenthesis starts on the next line, `pipeline = PMMLPipeline` assigns the class itself rather than an instance, and the bracketed list below it is evaluated as a separate, discarded expression. When you then call `pipeline.fit(X, y)`, your DataFrame is passed as `self`, which is exactly why the error says a 'DataFrame' object has no attribute '_check_fit_params'.

You also most likely don't need DataFrameMapper, which is (according to its help page):

DataFrameMapper, a class for mapping pandas data frame columns to different sklearn transformations

Since you are not applying different transformations to different columns, it can be dropped.
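The failure mode can be reproduced without sklearn2pmml at all (a minimal sketch; `Model` and its `_check_fit_params` helper are stand-ins for the real pipeline class, not part of any library):

```python
class Model:
    """Stand-in for PMMLPipeline: fit() delegates to a private helper."""
    def fit(self, X, y=None):
        self._check_fit_params(X)   # expects `self` to be a Model instance
        return self

    def _check_fit_params(self, X):
        return {}

pipeline = Model   # missing (...) -> `pipeline` is the class, not an instance
try:
    # X is bound to `self`, y is bound to X, y falls back to its default None
    pipeline.fit([1, 2, 3], [0, 1, 0])
except AttributeError as e:
    print(e)   # 'list' object has no attribute '_check_fit_params'
```

With a DataFrame as the first argument you get the exact message from the question.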

Set up an example dataset like:

from sklearn2pmml.pipeline import PMMLPipeline
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
import pandas as pd
import numpy as np

features = 'ABCDEFGHIJKLMNO'

X = pd.DataFrame(np.random.uniform(0, 1, (50, 15)),
                 columns=list(features))
y = np.random.binomial(1, 0.5, 50)

X_train, X_test,y_train, y_test = train_test_split(X,y,test_size=0.3)

And running the corrected code works ok:

for i in range(len(features)):
    pipeline = PMMLPipeline([
        ('pca', PCA(n_components=3)),
        ('classifier', DecisionTreeClassifier())
    ])

    # Drop the i-th feature from both the training and the test set
    pipeline.fit(X_train.drop([features[i]], axis=1), y_train)

    result = pipeline.predict(X_test.drop([features[i]], axis=1))
    actual = y_test

    print("Dropped feature: {}, Accuracy: {}".format(
        features[i], accuracy_score(actual, result)))


Dropped feature: A, Accuracy: 0.9333333333333333
Dropped feature: B, Accuracy: 0.6
Dropped feature: C, Accuracy: 0.7333333333333333
Dropped feature: D, Accuracy: 0.6
Dropped feature: E, Accuracy: 0.6666666666666666
Dropped feature: F, Accuracy: 0.6666666666666666
Dropped feature: G, Accuracy: 0.6
Dropped feature: H, Accuracy: 0.8
Dropped feature: I, Accuracy: 0.6666666666666666
Dropped feature: J, Accuracy: 0.6666666666666666
Dropped feature: K, Accuracy: 0.7333333333333333
Dropped feature: L, Accuracy: 0.8
Dropped feature: M, Accuracy: 0.6
Dropped feature: N, Accuracy: 0.8
Dropped feature: O, Accuracy: 0.6666666666666666
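
If the aim is to decide which single feature is safest to drop, you can collect the scores instead of only printing them and pick the best one. A small sketch using the accuracies from the run above (the data is random, so your numbers will differ from run to run):

```python
# Accuracies copied from the run above (rounded; random data, results vary)
accuracies = {
    'A': 0.9333, 'B': 0.6000, 'C': 0.7333, 'D': 0.6000, 'E': 0.6667,
    'F': 0.6667, 'G': 0.6000, 'H': 0.8000, 'I': 0.6667, 'J': 0.6667,
    'K': 0.7333, 'L': 0.8000, 'M': 0.6000, 'N': 0.8000, 'O': 0.6667,
}

# The feature whose removal left accuracy highest is the best candidate to drop
best = max(accuracies, key=accuracies.get)
print(best, accuracies[best])   # A 0.9333
```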