Meaning of predValue field in H2O Random forest model

Asked by Vijay Kansal on 13 November 2023 at 17:57

I built an H2ORandomForestEstimator model by calling train() on my Spark dataframe, where the target column contains the values 0 or 1. I downloaded the MOJO with model.download_mojo(MOJO_ZIP_PATH) and printed it with h2o.print_mojo(MOJO_ZIP_PATH, tree_index=tree_ind). Part of one tree from that output is shown below.

As can be seen, leaf nodes have a field named predValue containing a value between 0 and 1. What does this predValue field mean? Is it the value the target variable is likely to take when predict() is called on inputs that follow this root-to-leaf path?

Moreover, I want to post-process the H2ORandomForestEstimator model and keep only those rules (root-to-leaf paths) for which my model will predict 1. Is there a way to filter such rules by parsing the MOJO files, without actually running predict() on input data? The predValue field in the MOJO output looked promising for this, but I could not figure out how it relates to the output variable. Can it be used to find the top-N rules?

'trees': [{
    'root': {
        'nodeNumber': 0,
        'weight': 18319.0,
        'colId': 169,
        'colName': 'pkg_items_gl_product_group_desc_1.gl_electronics',
        'leftward': True,
        'isCategorical': False,
        'inclusiveNa': True,
        'splitValue': 0.5,
        'rightChild': {
            'nodeNumber': 25,
            'weight': 462.0,
            'predValue': 0.9935065
        },
        'leftChild': {
            'nodeNumber': 1,
            'weight': 17857.0,
            'colId': 0,
            'colName': 'pkg_attr_total_pkg_price',
            'leftward': True,
            'isCategorical': False,
            'inclusiveNa': True,
            'splitValue': 186.52805,
            'rightChild': {
                'nodeNumber': 26,
                'weight': 201.0,
                'predValue': 0.9900498
            },
            'leftChild': {
                'nodeNumber': 3,
                'weight': 13184.0,
                'colId': 149,
                'colName': 'pkg_items_gl_product_group_desc_1.gl_automotive',
                'leftward': True,
                'isCategorical': False,
                'inclusiveNa': True,
                'splitValue': 0.5,
                'rightChild': {
                    'nodeNumber': 27,
                    'weight': 312.0,
                    'predValue': 0.99038464
                },


There is 1 best solution below

Answered by Maurever on 15 November 2023 at 13:39

Thanks for the question.

Every leaf node contains a predValue: the value that this tree contributes to the final prediction for rows that end up in that leaf.

See tree structure info here:

To determine whether the resulting prediction is 0 or 1, you need the threshold (default_threshold) used for that decision. You can find default_threshold in the model info.
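
As a rough illustration of how the threshold comes into play, here is a minimal sketch. It assumes the forest averages the leaf values that one row reaches across all trees and then compares the class-1 probability with the threshold. The function name forest_label and the flag leaf_is_class0_prob are hypothetical, not part of the H2O API. Whether predValue holds the probability of class 0 or of class 1 is treated as an assumption here, so check it against predict() on a few rows before relying on it.

```python
from statistics import mean


def forest_label(leaf_values, threshold, leaf_is_class0_prob):
    """Average per-tree leaf values for one row and apply the threshold.

    leaf_values: the predValue of the leaf reached in each tree.
    threshold: the model's default_threshold, read from the model info.
    leaf_is_class0_prob: ASSUMPTION flag; set True if predValue turns out
        to be P(class 0) in your model, False if it is P(class 1).
    """
    avg = mean(leaf_values)
    p1 = 1.0 - avg if leaf_is_class0_prob else avg
    return p1, int(p1 >= threshold)


# Made-up leaf values for three trees, only to show the arithmetic.
p1_a, label_a = forest_label([0.9, 0.7, 0.8], threshold=0.5, leaf_is_class0_prob=False)
p1_b, label_b = forest_label([0.9, 0.7, 0.8], threshold=0.5, leaf_is_class0_prob=True)
print(label_a, label_b)
```

The two calls give opposite labels for the same leaf values, which is why the meaning of predValue has to be confirmed before filtering rules on it.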

You can get the decision path for a specific node (see decision_paths) or the decision paths for the whole tree (see tree_decision_path).
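
If you would rather work directly on the dictionary printed in the question, the paths can also be collected with a plain recursive walk. This is only a sketch: leaf_rules is a hypothetical helper, the sample tree reuses a few values from the question with shortened column names, and it assumes that a numeric split sends values below splitValue to leftChild and the rest to rightChild. NA handling (leftward, inclusiveNa) and categorical splits are ignored.

```python
def leaf_rules(node, path=()):
    """Yield (conditions, predValue, weight) for every leaf under `node`."""
    if "leftChild" not in node and "rightChild" not in node:
        yield list(path), node["predValue"], node.get("weight")
        return
    col, split = node["colName"], node["splitValue"]
    # ASSUMPTION: left branch means value < splitValue, right means >=.
    if "leftChild" in node:
        yield from leaf_rules(node["leftChild"], path + (f"{col} < {split}",))
    if "rightChild" in node:
        yield from leaf_rules(node["rightChild"], path + (f"{col} >= {split}",))


sample_tree = {
    "root": {
        "colName": "gl_electronics", "splitValue": 0.5,
        "rightChild": {"nodeNumber": 25, "weight": 462.0, "predValue": 0.9935065},
        "leftChild": {
            "colName": "total_pkg_price", "splitValue": 186.52805,
            "rightChild": {"nodeNumber": 26, "weight": 201.0, "predValue": 0.9900498},
            # Hypothetical leaf standing in for the subtree truncated in the question.
            "leftChild": {"nodeNumber": 3, "weight": 13184.0, "predValue": 0.25},
        },
    }
}

# Sort rules by leaf value and print the top N (here N = 2).
rules = sorted(leaf_rules(sample_tree["root"]), key=lambda r: r[1], reverse=True)
for conditions, pred_value, weight in rules[:2]:
    print(" AND ".join(conditions), "->", pred_value, f"(weight {weight})")
```

Keep in mind that this ranks the leaves of a single tree. The forest's prediction combines all trees, so a high or low predValue in one tree does not by itself fix the final label for a row.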

If you are interested in leaf node assignment, see https://docs.h2o.ai/h2o/latest-stable/h2o-docs/performance-and-prediction.html?#predicting-leaf-node-assignment.

Let me know if you have another question.
