I want to add an additional target ("outputState") to my PMML regression model.
- outputState = 0: no missing/invalid input values (-> no imputation in the regression model)
- outputState = 1: there are missing/invalid input values (-> imputation in the regression model)
I tried to use multiple models, but I don't know how to handle multiple models/targets/outputs correctly.
Example (explanation below):
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_3" xmlns:data="http://jpmml.org/jpmml-model/InlineTable" version="4.3">
<Header>
<Application name="JPMML-R" version="1.3.14"/>
<Timestamp>2020-01-07T15:56:07Z</Timestamp>
</Header>
<DataDictionary>
<DataField name="outputState" optype="categorical" dataType="integer"/>
<DataField name="outputResult" optype="continuous" dataType="double"/>
<DataField name="inputA" optype="continuous" dataType="double">
<Interval closure="closedClosed" leftMargin="-1" rightMargin="1"/>
<Value property="missing" value="NA"/>
</DataField>
<DataField name="inputB" optype="continuous" dataType="double">
<Interval closure="closedClosed" leftMargin="-1" rightMargin="1"/>
<Value property="missing" value="NA"/>
</DataField>
<DataField name="inputC" optype="continuous" dataType="double">
<Interval closure="closedClosed" leftMargin="-1" rightMargin="1"/>
<Value property="missing" value="NA"/>
</DataField>
</DataDictionary>
<TransformationDictionary/>
<MiningModel functionName="mixed">
<MiningSchema>
<MiningField name="outputState" usageType="target"/>
<MiningField name="outputResult" usageType="target"/>
<MiningField name="inputA"/>
<MiningField name="inputB"/>
<MiningField name="inputC"/>
</MiningSchema>
<Output>
<OutputField name="outputState" optype="categorical" dataType="integer" targetField="outputState"/>
<OutputField name="outputResult" optype="continuous" dataType="double" targetField="outputResult"/>
</Output>
<Segmentation multipleModelMethod="selectAll">
<Segment id="1">
<True/>
<TreeModel modelName="TEST" functionName="classification" noTrueChildStrategy="returnLastPrediction">
<MiningSchema>
<MiningField name="outputState" usageType="target"/>
<MiningField name="inputA" invalidValueTreatment="asMissing"/>
<MiningField name="inputB" invalidValueTreatment="asMissing"/>
<MiningField name="inputC" invalidValueTreatment="asMissing"/>
</MiningSchema>
<Node score="0">
<True/>
<Node score="1">
<CompoundPredicate booleanOperator="or">
<SimplePredicate field="inputA" operator="isMissing"/>
<SimplePredicate field="inputB" operator="isMissing"/>
<SimplePredicate field="inputC" operator="isMissing"/>
</CompoundPredicate>
</Node>
</Node>
</TreeModel>
</Segment>
<Segment id="2">
<True/>
<RegressionModel functionName="regression">
<MiningSchema>
<MiningField name="outputResult" usageType="target"/>
<MiningField name="inputA" missingValueReplacement="0" missingValueTreatment="asMean" invalidValueTreatment="asMissing"/>
<MiningField name="inputB" missingValueReplacement="0" missingValueTreatment="asMean" invalidValueTreatment="asMissing"/>
<MiningField name="inputC" missingValueReplacement="0" missingValueTreatment="asMean" invalidValueTreatment="asMissing"/>
</MiningSchema>
<RegressionTable intercept="2">
<NumericPredictor name="inputA" coefficient="1"/>
<NumericPredictor name="inputB" coefficient="2"/>
<NumericPredictor name="inputC" coefficient="3"/>
</RegressionTable>
</RegressionModel>
</Segment>
</Segmentation>
</MiningModel>
</PMML>
Explanation:
- DataDictionary: each input field has an Interval with leftMargin="-1" and rightMargin="1", and "NA" is declared as the missing value
- MiningModel: functionName="mixed" seems to be wrong, and Segmentation multipleModelMethod="selectAll" may be wrong too
- Output definition: this seems to be wrong as well, possibly because the two OutputFields refer to different targets
- Segment 1: a simple classification TreeModel that detects missing/imputed values -> target: outputState
- Segment 2: a simple RegressionModel -> target: outputResult
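
To make the intended behaviour concrete, here is a minimal Python sketch of the two outputs expected for a single input row. It is not a PMML evaluator and says nothing about how the segments should be combined; it only restates the example's values (intercept, coefficients, interval [-1, 1], replacement value 0). The names `is_missing_or_invalid` and `expected_outputs` are hypothetical, and treating `None`, `"NA"`, NaN and non-numeric values as missing is an assumption.

```python
import math

# Values copied from the PMML example above.
INTERCEPT = 2.0
COEFFICIENTS = {"inputA": 1.0, "inputB": 2.0, "inputC": 3.0}
LOW, HIGH = -1.0, 1.0   # closedClosed Interval in the DataDictionary
REPLACEMENT = 0.0       # missingValueReplacement in the regression segment


def is_missing_or_invalid(value):
    """Return True if the value is missing or lies outside [LOW, HIGH]."""
    if value is None or value == "NA":
        return True
    try:
        number = float(value)
    except (TypeError, ValueError):
        return True  # assumption: non-numeric input counts as invalid
    if math.isnan(number):
        return True
    return not (LOW <= number <= HIGH)


def expected_outputs(row):
    """Return (outputState, outputResult) for a dict of input values."""
    flags = {name: is_missing_or_invalid(row.get(name)) for name in COEFFICIENTS}
    # outputState is 1 as soon as any input had to be imputed.
    output_state = 1 if any(flags.values()) else 0
    output_result = INTERCEPT
    for name, coefficient in COEFFICIENTS.items():
        # Invalid values are treated as missing, then replaced (asMissing).
        x = REPLACEMENT if flags[name] else float(row[name])
        output_result += coefficient * x
    return output_state, output_result


if __name__ == "__main__":
    print(expected_outputs({"inputA": 0.5, "inputB": -1, "inputC": 1}))
    print(expected_outputs({"inputA": "NA", "inputB": 0.5, "inputC": 2}))
```

The first row is complete and within range, so nothing is imputed. In the second row inputA is missing and inputC is out of range, so both are replaced by 0 and outputState is set to 1.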
Does anyone have an idea or a better suggestion?