I have run a RandomForestClassifier model in Python using the sklearn module. I saved the model in a pickle file. I then extract data from Oracle, save it as a .csv file, send this .csv file to a machine that can open the model's pickle file in Python, and score the data. Once the data is scored I send the results back to Oracle.
Is it possible to extract the scoring coefficients behind RandomForestClassifier's predict_proba method so I can load them into Oracle and score the data solely inside Oracle?
After reading the documentation, it appears the scoring algorithm is too complex for this, given that each new record has to be pushed through every tree before a final probability is produced. Is this correct?
I appreciate your help in advance.
Matt
AFAIK there is no ready-made tool to do so, but you can read the Cython source code of the base decision tree class, in particular the predict method, to understand how a prediction is computed from the fitted parameters of the decision tree model. The random forest prediction takes each tree's predicted class probabilities (which are 0 or 1 when the trees are fully grown and every leaf is pure), averages them, and normalizes them, as written here.
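To make the traversal concrete, here is a minimal pure-Python sketch of that scoring logic. It assumes each tree has been copied out of the fitted model's tree_ object into plain lists (children_left, children_right, feature, threshold, value, which are attribute names scikit-learn exposes) and that there is a single output. The two tiny trees are made-up illustration data, not a real model, and tree_proba and forest_proba are hypothetical helper names.

```python
# Pure-Python sketch of decision-tree and random-forest scoring.
# The dict keys mirror the arrays on scikit-learn's fitted `estimator.tree_`;
# the trees below are invented for illustration only.

LEAF = -1  # scikit-learn marks a leaf by a child index of -1


def tree_proba(tree, row):
    """Walk one tree from the root to a leaf and return class probabilities."""
    node = 0
    while tree["children_left"][node] != LEAF:
        # Split rule: go left when the feature value is <= the threshold.
        if row[tree["feature"][node]] <= tree["threshold"][node]:
            node = tree["children_left"][node]
        else:
            node = tree["children_right"][node]
    counts = tree["value"][node]  # per-class counts (or fractions) in the leaf
    total = sum(counts)
    return [c / total for c in counts]  # normalize so the values sum to 1


def forest_proba(trees, row):
    """Average the per-tree class probabilities."""
    per_tree = [tree_proba(t, row) for t in trees]
    n_classes = len(per_tree[0])
    return [sum(p[k] for p in per_tree) / len(per_tree) for k in range(n_classes)]


# Two hand-written stumps: node 0 is the root, nodes 1 and 2 are leaves.
tree_a = {
    "children_left": [1, -1, -1],
    "children_right": [2, -1, -1],
    "feature": [0, -2, -2],          # -2 is a placeholder on leaf nodes
    "threshold": [0.5, -2.0, -2.0],
    "value": [[5, 5], [4, 1], [1, 4]],
}
tree_b = {
    "children_left": [1, -1, -1],
    "children_right": [2, -1, -1],
    "feature": [1, -2, -2],
    "threshold": [2.0, -2.0, -2.0],
    "value": [[6, 4], [6, 0], [0, 4]],
}
forest = [tree_a, tree_b]

print(forest_proba(forest, [0.2, 3.0]))
```

With a real model, the same lists could be read from each element of the forest's estimators_ attribute. Normalizing the leaf values keeps the sketch valid whether the stored values are counts or already fractions.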
Turning that into PL/SQL might not be trivial though. Apparently Oracle Data Mining has some support for PMML import/export of decision tree models, among other models. Unfortunately, I am not aware of any implementation of a PMML exporter for scikit-learn decision trees either (although one could be easier to write by taking the source code of the graphviz tree exporter as an example).
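In the same spirit as the graphviz exporter, a recursive walk over the tree arrays can emit text. The sketch below does not produce PMML; it swaps in a simpler target and emits a nested SQL CASE expression per tree, then averages the trees. It assumes the tree arrays have been copied into plain lists as above, that the column names (AGE and INCOME here) are hypothetical, and that the trees are small enough for the resulting expression to stay manageable. tree_to_sql and forest_to_sql are made-up helper names.

```python
# Sketch: turn tree arrays into a SQL CASE expression (not PMML).
# Dict keys mirror scikit-learn's `estimator.tree_` arrays; the trees and
# column names are invented for illustration.

LEAF = -1  # scikit-learn marks a leaf by a child index of -1


def tree_to_sql(tree, columns, positive_class=1, node=0):
    """Recursively render one tree as a nested CASE expression."""
    if tree["children_left"][node] == LEAF:
        counts = tree["value"][node]
        # Leaf: emit the probability of the chosen class as a literal.
        return repr(counts[positive_class] / sum(counts))
    col = columns[tree["feature"][node]]
    thr = tree["threshold"][node]
    left = tree_to_sql(tree, columns, positive_class, tree["children_left"][node])
    right = tree_to_sql(tree, columns, positive_class, tree["children_right"][node])
    return f"CASE WHEN {col} <= {thr!r} THEN {left} ELSE {right} END"


def forest_to_sql(trees, columns, positive_class=1):
    """Average the per-tree expressions to get the forest probability."""
    parts = [tree_to_sql(t, columns, positive_class) for t in trees]
    return "(" + " + ".join(parts) + ") / " + str(len(parts))


tree_a = {
    "children_left": [1, -1, -1],
    "children_right": [2, -1, -1],
    "feature": [0, -2, -2],
    "threshold": [0.5, -2.0, -2.0],
    "value": [[5, 5], [4, 1], [1, 4]],
}
tree_b = {
    "children_left": [1, -1, -1],
    "children_right": [2, -1, -1],
    "feature": [1, -2, -2],
    "threshold": [2.0, -2.0, -2.0],
    "value": [[6, 4], [6, 0], [0, 4]],
}

sql = forest_to_sql([tree_a, tree_b], ["AGE", "INCOME"])
print("SELECT " + sql + " AS score FROM my_table")  # my_table is a placeholder
```

Two caveats apply to this sketch: a NULL column value falls into the ELSE branch, and scikit-learn casts inputs to 32-bit floats before comparing, so values sitting right on a split threshold may be scored differently in the database.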
Also note that under PostgreSQL, on the other hand, you could use scikit-learn directly in a DB function written in PL/Python.