Calculation of orig and norm attributes of NormContinuous in PMML


Overview

I am currently working on an executor for PMML normalization models in C#.

These PMML normalization models look like this:

 <TransformationDictionary>
    <DerivedField displayName="BU01" name="BU01*" optype="continuous" dataType="double">
      <Extension name="summary" extender="KNIME" value="Min/Max (0.0, 1) normalization on 17 column(s)"/>
      <NormContinuous field="BU01">
        <LinearNorm orig="0.0" norm="-0.6148417019560395"/>
        <LinearNorm orig="1.0" norm="-0.6140350877192982"/>
      </NormContinuous>
    </DerivedField>
(...)

I do know how min-max normalization in theory works using

z_i = (x_i - min(x)) / (max(x) - min(x))

to normalize a dataset into the range of 0-1 and obviously it's not hard to reverse this equation.
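For reference, the formula above and its inverse can be sketched in a few lines of Python (the function names are made up for illustration; a C# version would look nearly identical):

```python
def min_max_normalize(x, lo, hi):
    """Scale x from the range [lo, hi] into [0, 1]."""
    return (x - lo) / (hi - lo)


def min_max_denormalize(z, lo, hi):
    """Inverse of min_max_normalize: map z from [0, 1] back to [lo, hi]."""
    return z * (hi - lo) + lo


values = [0, 1, 2, 4, 8]
lo, hi = min(values), max(values)

scaled = [min_max_normalize(v, lo, hi) for v in values]
restored = [min_max_denormalize(z, lo, hi) for z in scaled]

print(scaled)    # [0.0, 0.125, 0.25, 0.5, 1.0]
print(restored)  # [0.0, 1.0, 2.0, 4.0, 8.0]
```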

Problem

So to execute the normalization and denormalization I somehow have to translate these orig/norm values into min/max values. But I just can't figure out how these orig/norm values are calculated and how they relate to min/max.

Question

So I'm asking whether someone knows an equation to transform orig/norm to min/max and back. Or is someone able to explain how to use the orig/norm values directly to normalize/denormalize my fields?

Further Explanation

EDIT: It looks as if I did not state clearly what exactly the problem is, so here is another approach:

I am trying to get an attribute of a dataset normalized into the range 0-1 using the min-max normalization method (aka feature scaling). Using the data analysis tool KNIME I can do this and export my "scaling" as a PMML model. (An example of this is the XML provided above.)

With these normalized attributes I train my MLP model. Now if I export my MLP model as PMML, I have to put normalized values in and get normalized output out when calculating a prediction. (Computing the MLP network already works.)

In a deployed scenario where KNIME can't do this normalization for me, I want to use my normalization model. As already described, I know the theory behind feature scaling and can easily compute the normalization and denormalization if I am given the min and max of my attribute. The problem is that PMML has another, let's say, "notation" for saving this min/max information, which is somehow contained in the orig and norm values.

So what I am ultimately looking for is a way to convert orig/norm to min/max or how min/max information is "encoded" into orig/norm values.

Extra Info

[This "encoding" seems to be done in the first place for computation speed reasons (which are not important in my scenario) and to make it easier to encode min/max normalization info for ranges other than 0-1.]

Example #1

To give an example: Let's say I want to normalize the array [0, 1, 2, 4, 8] into the range 0-1. Clearly the answer is [0, 0.125, 0.25, 0.5, 1], as computed by feature scaling with min = 0, max = 8. Easy. But now if I look at the PMML normalization model:

<TransformationDictionary>
  <DerivedField displayName="column1" name="column1*" optype="continuous" dataType="double">
    <Extension name="summary" extender="KNIME" value="Min/Max (0.0, 1) normalization on 1 column(s)"/>
    <NormContinuous field="column1">
      <LinearNorm orig="0.0" norm="0.0"/>
      <LinearNorm orig="1.0" norm="0.125"/>
    </NormContinuous>
  </DerivedField>
</TransformationDictionary>

Example #2

[1, 2, 3, 4] -> [0, 0.333, 0.667, 1] With:

<TransformationDictionary>
  <DerivedField displayName="column1" name="column1*" optype="continuous" dataType="double">
    <Extension name="summary" extender="KNIME" value="Min/Max (0.0, 1) normalization on 1 column(s)"/>
    <NormContinuous field="column1">
      <LinearNorm orig="0.0" norm="-0.3333333333333333"/>
      <LinearNorm orig="1.0" norm="0.0"/>
    </NormContinuous>
  </DerivedField>
</TransformationDictionary>

Question

So how am I supposed to scale with orig/norm or compute min/max from these values?

2 Answers

Answer 1 (best answer)

Found the answer. After carefully reading through the documentation again (which is extremely confusing imo) I came across this passage:

The sequence of LinearNorm elements defines a sequence of points for a stepwise linear interpolation function. The sequence must contain at least two elements. Within NormContinuous the elements LinearNorm must be strictly sorted by ascending value of orig.

That basically explains it all. Normalization in these PMML models is done by a stepwise linear interpolation with only two points, so it is in fact just a simple linear conversion function.
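To make the interpolation concrete, here is a minimal Python sketch (not taken from the PMML documentation) that reads the LinearNorm points from a snippet like Example #1 and evaluates the piecewise linear function through them. The helper names `read_points` and `interpolate` are hypothetical, and extrapolating outside the first and last point is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# NormContinuous element from Example #1. Real PMML files declare an XML
# namespace, which would then have to be part of the tag name below.
PMML_SNIPPET = """
<NormContinuous field="column1">
  <LinearNorm orig="0.0" norm="0.0"/>
  <LinearNorm orig="1.0" norm="0.125"/>
</NormContinuous>
"""


def read_points(xml_text):
    """Return the (orig, norm) pairs of a NormContinuous element, sorted by orig."""
    root = ET.fromstring(xml_text)
    points = [(float(e.get("orig")), float(e.get("norm")))
              for e in root.iter("LinearNorm")]
    return sorted(points)


def interpolate(x, points):
    """Evaluate the piecewise linear function through `points` at x.

    Needs at least two points. Values outside the first/last orig are
    extrapolated along the nearest segment (an assumption of this sketch;
    the `outliers` attribute of NormContinuous can request other handling).
    """
    # Pick the first segment whose right end is >= x; if there is none,
    # the loop ends with the last segment still bound to the variables.
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        if x <= x2:
            break
    return y1 + (x - x1) * (y2 - y1) / (x2 - x1)


points = read_points(PMML_SNIPPET)
print(points)                  # [(0.0, 0.0), (1.0, 0.125)]
print(interpolate(8, points))  # 1.0
```

With only two points the loop runs once, which is exactly the single straight line described above; the same function also covers models with more than two LinearNorm elements.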

In the case of normalization into the range 0-1 (as in the KNIME exports above) it gets even easier, because the two points will always be at x1=0 and x2=1 (the orig values). The y-axis intercept is therefore always the norm value at orig=0. The slope is just as easy to calculate: slope = (y2-y1)/(x2-x1) = (y2-y1)/(1-0) = y2-y1, which is simply the difference of the two norm values.

So to get our interpolation function, which will always be a first-degree polynomial, we just calculate:

f(x) = ax + b = (y2-y1)x + y1 = (norm(orig=1) - norm(orig=0)) * x + norm(orig=0)

This is used for normalization.

and now we can calculate the inverse:

x = (f(x) - norm(orig=0)) / (norm(orig=1) - norm(orig=0))

This is used for de-normalization.
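Put into code, the two formulas might look like the following Python sketch. The function names are made up for illustration, and `min_max` assumes the target range is 0-1, so that min is simply the input that maps to 0 and max the input that maps to 1.

```python
def normalize(x, norm0, norm1):
    """f(x) = (norm(orig=1) - norm(orig=0)) * x + norm(orig=0)."""
    return (norm1 - norm0) * x + norm0


def denormalize(y, norm0, norm1):
    """Inverse of normalize: x = (y - norm(orig=0)) / (norm(orig=1) - norm(orig=0))."""
    return (y - norm0) / (norm1 - norm0)


def min_max(norm0, norm1):
    """Recover (min, max) of the original data, assuming a 0-1 target range."""
    return denormalize(0.0, norm0, norm1), denormalize(1.0, norm0, norm1)


# Norm values from Example #1 in the question
norm0, norm1 = 0.0, 0.125
print(normalize(4, norm0, norm1))      # 0.5
print(denormalize(0.5, norm0, norm1))  # 4.0
print(min_max(norm0, norm1))           # (0.0, 8.0)
```

So the min/max information asked about in the question is not stored directly, but it can be read off the line by asking which inputs map to 0 and to 1.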

Hope this helps anyone who someday also goes through the hassle of implementing their own PMML executor engine and gets stuck on this topic.

Answer 2

What I'm about to say depends on what you mean by (min, max).

I'm going to assume that min equals the value where 0.5% of the total lies below and max equals the value where 0.5% of the total lies above.

If we agree on that, a symmetric normal distribution would have a mean value of approximately mean ~ (max+min)/2. (You call the mean the origin.)

Six standard deviations (mean ± 3 sigma) cover about 99.7% of a normal distribution, which is close to the 99% that lies between min and max, so the standard deviation is approximately sigma ~ (max-min)/6.

The normalized value is then defined as z = (x - mean)/sigma.

With those values you can get yourself back to the denormalized distribution.
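As a rough Python sketch of these steps, under the assumptions stated in this answer (normally distributed data, min and max taken as the 0.5% tails); the function names are hypothetical. Note that this is a z-score standardization, which is a different transformation from the 0-1 min-max scaling in the question.

```python
def estimate_mean_sigma(lo, hi):
    """Approximate mean and standard deviation from min (lo) and max (hi)."""
    mean = (hi + lo) / 2
    sigma = (hi - lo) / 6
    return mean, sigma


def z_score(x, mean, sigma):
    """Normalize: z = (x - mean) / sigma."""
    return (x - mean) / sigma


def from_z_score(z, mean, sigma):
    """Denormalize: x = z * sigma + mean."""
    return z * sigma + mean


mean, sigma = estimate_mean_sigma(0, 6)
print(mean, sigma)                   # 3.0 1.0
print(z_score(6, mean, sigma))       # 3.0
print(from_z_score(3, mean, sigma))  # 6.0
```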