In order to drop features based on the accuracy score, I wrote the following code. Here 'ABCDEFGHIJKLMNO'
are the columns (features), 15 features in total.
features = 'ABCDEFGHIJKLMNO'
for i in range(0,len(features)):
    pipeline = PMMLPipeline
    ([
        ('mapper', DataFrameMapper([(X_train.columns.drop([features[i:i+1]]).values)])),
        ('pca', PCA(n_components=3)),
        ('classifier', DecisionTreeClassifier())
    ])
    pipeline.fit(training_data.drop([features[i:i+1]],axis=1),training_data['Class'])
    result = pipeline.predict(X_test)
    actual = np.concatenate(y_test.values)
    print("Dropped feature: {}, Accuracy: {}".format(features[i:i+1], metrics.accuracy_score(actual,result)))
I am using the sklearn2pmml.pipeline library, but I get the error below when fitting the data, and I cannot figure out why.
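Seems like your PMMLPipeline call is indented wrongly: the opening `([` sits on its own line, so Python treats it as a separate statement and `pipeline` is bound to the PMMLPipeline class itself rather than to a pipeline instance. Most likely you also don't need DataFrameMapper, because according to its help page it is meant for mapping pandas DataFrame columns to different sklearn transformations. You are not applying different transformations to different columns, so we don't need it.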
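The line-break problem described above can be reproduced in isolation. This minimal illustration uses scikit-learn's plain `Pipeline` (which `PMMLPipeline` extends) so it runs without sklearn2pmml installed; the variable names `broken` and `fixed` are only for illustration.

```python
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Broken: the argument list starts on a new line, so Python parses it as a
# separate expression statement and `broken` is bound to the class itself.
broken = Pipeline
([("classifier", DecisionTreeClassifier())])

# Fixed: the opening parenthesis is on the same line as the class name,
# so the list of steps is actually passed to the constructor.
fixed = Pipeline([
    ("classifier", DecisionTreeClassifier()),
])

print(broken is Pipeline)          # True: still the class, not an instance
print(isinstance(fixed, Pipeline)) # True: a real pipeline object
```

Calling `broken.fit(X, y)` would then invoke the unbound method with `X` in place of `self`, which is why fitting fails.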
Set up an example dataset with the same 15 feature columns and a Class column.
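A minimal sketch of such a dataset, assuming purely random numeric features named A to O and a binary `Class` label (the data, sizes and seed are hypothetical, chosen only so the loop has something to run on). It produces the `training_data`, `X_train`, `X_test` and `y_test` names used in the question.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical random data: 15 numeric features A..O plus a binary Class label.
rng = np.random.default_rng(0)
features = 'ABCDEFGHIJKLMNO'
data = pd.DataFrame(rng.normal(size=(200, len(features))), columns=list(features))
data['Class'] = rng.integers(0, 2, size=200)

# Keep y as a one-column DataFrame so np.concatenate(y_test.values) flattens it.
X_train, X_test, y_train, y_test = train_test_split(
    data[list(features)], data[['Class']], test_size=0.3, random_state=0)

# Training frame with the label column attached, as used in the question.
training_data = X_train.join(y_train)
```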
Running the corrected code, with the whole PMMLPipeline call in one statement and without DataFrameMapper, works fine.
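One possible version of the corrected loop is sketched below; it is not necessarily the answerer's exact code. It assumes the same hypothetical random dataset as before (rebuilt here so the block runs on its own), and falls back to scikit-learn's `Pipeline` if sklearn2pmml is not installed. Two details change relative to the question: without the mapper, `Class` must be dropped from the training inputs explicitly, and the same feature must be dropped from `X_test` so the prediction columns match the fitted ones. `random_state=0` on the tree is added only for reproducibility.

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

try:
    from sklearn2pmml.pipeline import PMMLPipeline
except ImportError:
    # Assumption: plain sklearn Pipeline as a stand-in when sklearn2pmml is absent.
    from sklearn.pipeline import Pipeline as PMMLPipeline

# Hypothetical random dataset: columns A..O plus a binary Class label.
rng = np.random.default_rng(0)
features = 'ABCDEFGHIJKLMNO'
data = pd.DataFrame(rng.normal(size=(200, len(features))), columns=list(features))
data['Class'] = rng.integers(0, 2, size=200)
X_train, X_test, y_train, y_test = train_test_split(
    data[list(features)], data[['Class']], test_size=0.3, random_state=0)
training_data = X_train.join(y_train)

accuracies = {}
for feature in features:
    # Whole constructor call in one statement, and no DataFrameMapper:
    # every remaining column gets the same PCA -> decision tree treatment.
    pipeline = PMMLPipeline([
        ('pca', PCA(n_components=3)),
        ('classifier', DecisionTreeClassifier(random_state=0)),
    ])
    # Drop the current feature and the label column from the inputs.
    pipeline.fit(training_data.drop([feature, 'Class'], axis=1), training_data['Class'])
    # Drop the same feature from the test set so its columns match the fit.
    result = pipeline.predict(X_test.drop(feature, axis=1))
    actual = np.concatenate(y_test.values)
    accuracies[feature] = metrics.accuracy_score(actual, result)
    print("Dropped feature: {}, Accuracy: {}".format(feature, accuracies[feature]))
```

The feature whose removal hurts accuracy least is a candidate to drop; with random data like this the numbers are meaningless, so only the mechanics carry over.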