Overview
I am currently working on an executor for PMML normalization models in C#.
These PMML normalization models look like this:
<TransformationDictionary>
<DerivedField displayName="BU01" name="BU01*" optype="continuous" dataType="double">
<Extension name="summary" extender="KNIME" value="Min/Max (0.0, 1) normalization on 17 column(s)"/>
<NormContinuous field="BU01">
<LinearNorm orig="0.0" norm="-0.6148417019560395"/>
<LinearNorm orig="1.0" norm="-0.6140350877192982"/>
</NormContinuous>
</DerivedField>
(...)
I do know how min-max normalization works in theory, using
z_i = (x_i - min(x)) / (max(x) - min(x))
to normalize a dataset into the range of 0-1 and obviously it's not hard to reverse this equation.
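For reference, here is a minimal sketch of that formula and its inverse. It is written in Python for brevity (the arithmetic carries over directly to C#); the function names and the sample array are assumptions made for illustration only.

```python
def min_max_normalize(x, lo, hi):
    """Map x from the range [lo, hi] onto [0, 1]."""
    return (x - lo) / (hi - lo)


def min_max_denormalize(z, lo, hi):
    """Inverse of min_max_normalize: map z from [0, 1] back onto [lo, hi]."""
    return z * (hi - lo) + lo


data = [0, 1, 2, 4, 8]          # illustrative sample values
lo, hi = min(data), max(data)   # min(x) and max(x) from the formula

normalized = [min_max_normalize(x, lo, hi) for x in data]
restored = [min_max_denormalize(z, lo, hi) for z in normalized]

print(normalized)  # [0.0, 0.125, 0.25, 0.5, 1.0]
print(restored)    # [0.0, 1.0, 2.0, 4.0, 8.0]
```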
Problem
So to execute the normalization and denormalization I somehow have to translate these orig/norm values into min/max values. But I just can't figure out how these orig/norm values are calculated and how they relate to min/max.
Question
So I'm asking whether someone knows an equation to transform orig/norm to min/max and back. Or can someone explain how to use the orig/norm values directly to normalize/denormalize my fields?
Further Explanation
EDIT: It looks as if I did not state clearly what exactly the problem is, so here is another approach:
I am trying to normalize an attribute of a dataset into the range 0-1 using the min-max normalization method (aka Feature Scaling). Using the data analysis tool KNIME I can do this and export my "scaling" as a PMML model. (An example of this is the XML provided above.)
With these normalized attributes I train my MLP model. Now if I export my MLP model as PMML, I have to put normalized values in and get normalized output out when calculating a prediction. (Computing the MLP network already works.)
In a deployed scenario where KNIME can't do this normalization for me, I want to use my normalization model. As already described, I do know the theory behind Feature Scaling and can easily compute the de-/normalization if I am given the min and max of my attribute. The problem is that PMML has another, let's say, "notation" for saving this min-max information, which is somehow inside the orig and norm values.
So what I am ultimately looking for is a way to convert orig/norm to min/max or how min/max information is "encoded" into orig/norm values.
Extra Info
[This "encoding" seems to be done in the first place for computation-speed reasons (which are not important in my scenario) and to make it easier to encode min/max normalization info for ranges other than 0-1.]
Example #1
To give an example:
Let's say I want to normalize the array [0, 1, 2, 4, 8] into the range 0-1. Clearly the answer is [0, 0.125, 0.25, 0.5, 1], as computed by Feature Scaling with min = 0, max = 8. Easy. But now if I look at the PMML normalization model:
<TransformationDictionary>
<DerivedField displayName="column1" name="column1*" optype="continuous" dataType="double">
<Extension name="summary" extender="KNIME" value="Min/Max (0.0, 1) normalization on 1 column(s)"/>
<NormContinuous field="column1">
<LinearNorm orig="0.0" norm="0.0"/>
<LinearNorm orig="1.0" norm="0.125"/>
</NormContinuous>
</DerivedField>
</TransformationDictionary>
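As a side note for anyone building an executor, the orig/norm pairs can be pulled out of such a fragment with a few lines. This is only a sketch in Python using the standard library's `xml.etree.ElementTree`; the helper name `read_linear_norms` is hypothetical, and it assumes the fragment carries no XML namespace (a complete PMML file normally declares one, in which case the tag names would need the namespace prefix).

```python
import xml.etree.ElementTree as ET

# The fragment from Example #1, without any namespace declaration (assumption).
PMML_FRAGMENT = """
<TransformationDictionary>
  <DerivedField displayName="column1" name="column1*" optype="continuous" dataType="double">
    <Extension name="summary" extender="KNIME" value="Min/Max (0.0, 1) normalization on 1 column(s)"/>
    <NormContinuous field="column1">
      <LinearNorm orig="0.0" norm="0.0"/>
      <LinearNorm orig="1.0" norm="0.125"/>
    </NormContinuous>
  </DerivedField>
</TransformationDictionary>
"""


def read_linear_norms(xml_text):
    """Return {field name: [(orig, norm), ...]} for every NormContinuous element."""
    root = ET.fromstring(xml_text)
    result = {}
    for norm_cont in root.iter("NormContinuous"):
        pairs = [
            (float(point.get("orig")), float(point.get("norm")))
            for point in norm_cont.findall("LinearNorm")
        ]
        result[norm_cont.get("field")] = pairs
    return result


points = read_linear_norms(PMML_FRAGMENT)
print(points)  # {'column1': [(0.0, 0.0), (1.0, 0.125)]}
```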
Example #2
[1, 2, 3, 4] -> [0, 0.333, 0.667, 1] With:
<TransformationDictionary>
<DerivedField displayName="column1" name="column1*" optype="continuous" dataType="double">
<Extension name="summary" extender="KNIME" value="Min/Max (0.0, 1) normalization on 1 column(s)"/>
<NormContinuous field="column1">
<LinearNorm orig="0.0" norm="-0.3333333333333333"/>
<LinearNorm orig="1.0" norm="0.0"/>
</NormContinuous>
</DerivedField>
</TransformationDictionary>
Question
So how am I supposed to scale with orig/norm or compute min/max from these values?
Found the answer. After carefully reading through the documentation again (which is extremely confusing imo) I came across a sentence which basically explains it all. Normalization in PMML is done by piecewise linear interpolation, here with only 2 points. So in fact it is just a simple conversion function.
In the case of normalization into the range 0-1 it gets even easier, as the two points (at least in the KNIME exports above) will always be at x1 = 0 and x2 = 1 (orig values). The y-axis intercept is therefore always the norm value at orig=0. The slope of the function is also very easy to calculate:
slope = (y2-y1)/(x2-x1) = (y2-y1)/(1-0) = y2-y1
which is just the difference of the 2 norm values. So to get our interpolation function, which will always be a first-degree polynomial, we just calculate:
f(x) = ax + b = (y2-y1)x + y1 = (norm(orig=1)-norm(orig=0)) * x + norm(orig=0)
This is used for normalization. Now we can calculate the inverse:
x = (f(x) - norm(orig=0)) / (norm(orig=1)-norm(orig=0))
This is used for de-normalization.
Hope this helps everyone who will someday also go through the hassle of implementing their own PMML executor engine and gets stuck on this topic.
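The two formulas can be sketched in code as follows. This is a Python illustration rather than a finished implementation (the arithmetic ports directly to C#); the helper names `linear_norm` and `recover_min_max` are hypothetical. It handles exactly two LinearNorm points with arbitrary orig values, and the min/max recovery assumes the target range is 0-1, so that min is the value mapped to 0 and max is the value mapped to 1.

```python
def linear_norm(p0, p1):
    """Build (normalize, denormalize) from two (orig, norm) points."""
    (x0, y0), (x1, y1) = p0, p1
    slope = (y1 - y0) / (x1 - x0)

    def normalize(x):
        # Straight line through both points, evaluated at x.
        return y0 + slope * (x - x0)

    def denormalize(y):
        # Inverse of the same line.
        return x0 + (y - y0) / slope

    return normalize, denormalize


def recover_min_max(p0, p1):
    """Assuming a 0-1 target range: min maps to 0 and max maps to 1."""
    _, denormalize = linear_norm(p0, p1)
    return denormalize(0.0), denormalize(1.0)


# The two LinearNorm points from Example #1.
normalize, denormalize = linear_norm((0.0, 0.0), (1.0, 0.125))

print([normalize(x) for x in [0, 1, 2, 4, 8]])    # [0.0, 0.125, 0.25, 0.5, 1.0]
print(denormalize(0.5))                           # 4.0
print(recover_min_max((0.0, 0.0), (1.0, 0.125)))  # (0.0, 8.0)
```

With the points from Example #1 this reproduces min = 0 and max = 8, so the orig/norm pairs carry the same information as min/max, just expressed as two points on the scaling line.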