naive bayes feature vectors in pmml

218 Views Asked by At

I am trying to build my own pmml exporter for Naive Bayes model that I have built in scikit learn. In reading the PMML documentation it seems that for each feature vector you can either output the model in terms of count data if it is discrete or as a Gaussian/Poisson distribution if it is continous. But the coefficients of my scikit learn model are in terms of Empirical log probability of features i.e p(y|x_i). Is it possible to specify the Bayes input parameters in terms of these probability rather than counts?

1

There are 1 best solutions below

0
On BEST ANSWER

Since the PMML representation of the Naive Bayes model implements representing joint probabilities via the "PairCounts" element, one can simply replace that ratio with the probabilities output (not the log probability). Since the final probabilities are normalized, the difference doesn't matter. If the requirements involve a large number of proabilities which are mostly 0, the "threshold" attribute of the model can be used to set the default values for such probabilities.