PMML MultipleModels: additional target with information about missing/invalid values

I want to add an additional target ("outputState") to my PMML regression model.

  • outputState = 0: no missing/invalid input values (-> no imputation in the regression model)
  • outputState = 1: there are missing/invalid input values (-> imputation in the regression model)
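
To make the intended behavior concrete, here is a minimal Python sketch (not PMML) of the two outputs I expect for a single record. The helper names are hypothetical. The intercept, coefficients, valid range [-1, 1], "NA" marker and replacement value 0 are taken from the example further down, and I assume that an out-of-range value counts as invalid and is treated like a missing one.

```python
import math

# Intercept and coefficients from the RegressionTable in the PMML example.
INTERCEPT = 2.0
COEFFICIENTS = {"inputA": 1.0, "inputB": 2.0, "inputC": 3.0}
VALID_RANGE = (-1.0, 1.0)  # closedClosed interval from the DataDictionary
REPLACEMENT = 0.0          # missingValueReplacement in the regression MiningSchema


def is_missing_or_invalid(value):
    """Return True for None, the "NA" marker, NaN, non-numbers, or out-of-range values."""
    if value is None or value == "NA":
        return True
    try:
        number = float(value)
    except (TypeError, ValueError):
        return True
    if math.isnan(number):
        return True
    low, high = VALID_RANGE
    return not (low <= number <= high)


def expected_outputs(record):
    """Hypothetical helper: return (outputState, outputResult) for one input record."""
    output_state = 0
    output_result = INTERCEPT
    for name, coefficient in COEFFICIENTS.items():
        value = record.get(name)
        if is_missing_or_invalid(value):
            # Any missing/invalid input flips the state and triggers imputation.
            output_state = 1
            value = REPLACEMENT
        output_result += coefficient * float(value)
    return output_state, output_result
```

For example, `expected_outputs({"inputA": 0.5, "inputB": -1, "inputC": 1})` should report state 0, while a record with an "NA" or out-of-range input should report state 1 and use the replacement value for that input.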

I tried to work with multiple models, but I don't know how to handle multiple models/targets/outputs correctly.

Example (explanation below):

    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <PMML xmlns="http://www.dmg.org/PMML-4_3" xmlns:data="http://jpmml.org/jpmml-model/InlineTable" version="4.3">
    <Header>
      <Application name="JPMML-R" version="1.3.14"/>
      <Timestamp>2020-01-07T15:56:07Z</Timestamp>
    </Header>
    <DataDictionary>
      <DataField name="outputState" optype="categorical" dataType="integer"/>
      <DataField name="outputResult" optype="continuous" dataType="double"/>
      <DataField name="inputA" optype="continuous" dataType="double">
        <Interval closure="closedClosed" leftMargin="-1" rightMargin="1"/>
        <Value property="missing" value="NA"/>
      </DataField>
      <DataField name="inputB" optype="continuous" dataType="double">
        <Interval closure="closedClosed" leftMargin="-1" rightMargin="1"/>
        <Value property="missing" value="NA"/>
      </DataField>
      <DataField name="inputC" optype="continuous" dataType="double">
        <Interval closure="closedClosed" leftMargin="-1" rightMargin="1"/>
        <Value property="missing" value="NA"/>
      </DataField>
    </DataDictionary>
    <TransformationDictionary/>
    <MiningModel functionName="mixed">
    <MiningSchema>
      <MiningField name="outputState" usageType="target"/>
      <MiningField name="outputResult" usageType="target"/>
      <MiningField name="inputA"/>
      <MiningField name="inputB"/>
      <MiningField name="inputC"/>
    </MiningSchema>
    <Output>
      <OutputField name="outputState" optype="categorical" dataType="integer" targetField="outputState"/>
      <OutputField name="outputResult" optype="continuous" dataType="double" targetField="outputResult"/>
    </Output>
    <Segmentation multipleModelMethod="selectAll">
      <Segment id="1">
        <True/>
        <TreeModel modelName="TEST" functionName="classification" noTrueChildStrategy="returnLastPrediction">
          <MiningSchema>
            <MiningField name="outputState" usageType="target"/>
            <MiningField name="inputA" invalidValueTreatment="asMissing"/>
            <MiningField name="inputB" invalidValueTreatment="asMissing"/>
            <MiningField name="inputC" invalidValueTreatment="asMissing"/>
          </MiningSchema>
          <Node score="0">
            <True/>
            <Node score="1">
              <CompoundPredicate booleanOperator="or">
                <SimplePredicate field="inputA" operator="isMissing"/>
                <SimplePredicate field="inputB" operator="isMissing"/>
                <SimplePredicate field="inputC" operator="isMissing"/>
              </CompoundPredicate>
            </Node>
          </Node>
        </TreeModel>
      </Segment>
      <Segment id="2">
        <True/>
        <RegressionModel functionName="regression">
          <MiningSchema>
            <MiningField name="outputResult" usageType="target"/>
            <MiningField name="inputA" missingValueReplacement="0" missingValueTreatment="asMean" invalidValueTreatment="asMissing"/>
            <MiningField name="inputB" missingValueReplacement="0" missingValueTreatment="asMean" invalidValueTreatment="asMissing"/>
            <MiningField name="inputC" missingValueReplacement="0" missingValueTreatment="asMean" invalidValueTreatment="asMissing"/>
          </MiningSchema>
          <RegressionTable intercept="2">
            <NumericPredictor name="inputA" coefficient="1"/>
            <NumericPredictor name="inputB" coefficient="2"/>
            <NumericPredictor name="inputC" coefficient="3"/>
          </RegressionTable>
        </RegressionModel>
      </Segment>
    </Segmentation>
    </MiningModel>
    </PMML>

Explanation:

  1. DataDictionary (with left and right margins for the inputs)
  2. MiningModel (functionName="mixed" seems to be wrong; is Segmentation multipleModelMethod="selectAll" wrong too?):
    • Output definition (seems to be wrong as well, perhaps because of the different targets?)
    • Simple classification TreeModel (to detect missing/imputed values) -> target: outputState
    • Simple regression model -> target: outputResult

Does anyone have an idea or a better suggestion?
