Is such a normalization right for wiggly curves?


I am training a neural network (in C++, without any additional libraries) to learn a random wiggly function:


f(x) = 0.2 + 0.4x^2 + 0.3x·sin(15x) + 0.05·cos(50x)

Plotted in Python as:

import math
import matplotlib.pyplot as plt

lim = 500
x, y = [], []
for i in range(lim):
    x.append(i)
    p = 2*math.pi*i/lim
    y.append(0.2 + 0.4*(p*p) + 0.3*p*math.sin(15*p) + 0.05*math.cos(50*p))

plt.plot(x, y)
plt.show()

which produces the following curve:

[plot of the wiggly target function]

The same neural network has approximated the sine function quite well with a single hidden layer (5 neurons) and tanh activation, but I cannot understand what is going wrong with this wiggly function, even though the mean squared error does seem to dip (the error has been scaled up by 100 for visibility):

[plot of the training error curve]

And this is the expected (GREEN) vs. predicted (RED) graph:

[plot of the expected vs. predicted curves]

I suspect the normalization is at fault. This is how I did it:

I generated the training data as follows:

int numTrainingSets = 100;
double MAXX = -9999999999999999;

for (int i = 0; i < numTrainingSets; i++)
    {
        double p = (2*PI*(double)i/numTrainingSets);
        training_inputs[i][0] = p;  //INSERTING DATA INTO i'th EXAMPLE, 0th INPUT (Single input)
        training_outputs[i][0] = 0.2+0.4*pow(p, 2)+0.3*p*sin(15*p)+0.05*cos(50*p); //Single output

        ///FINDING NORMALIZING FACTOR (IN INPUT AND OUTPUT DATA)
        for(int m=0; m<numInputs; ++m)
            if(MAXX < training_inputs[i][m])
                MAXX = training_inputs[i][m];   //FINDING MAXIMUM VALUE IN INPUT DATA
        for(int m=0; m<numOutputs; ++m)
            if(MAXX < training_outputs[i][m])
                MAXX = training_outputs[i][m];  //FINDING MAXIMUM VALUE IN OUTPUT DATA

        ///NORMALIZE BOTH INPUT & OUTPUT DATA USING THIS MAXIMUM VALUE 
        ////DO THIS FOR INPUT TRAINING DATA
        for(int m=0; m<numInputs; ++m)
            training_inputs[i][m] /= MAXX;
        ////DO THIS FOR OUTPUT TRAINING DATA
        for(int m=0; m<numOutputs; ++m)
            training_outputs[i][m] /= MAXX;
    }

This is what the model trains on. The validation/test data is generated as follows:

int numTestSets = 500;
    for (int i = 0; i < numTestSets; i++)
    {
        //NORMALIZING TEST DATA USING THE SAME "MAXX" VALUE 
        double p = (2*PI*i/numTestSets)/MAXX;
        x.push_back(p);     //FORMS THE X-AXIS FOR PLOTTING

        ///Actual Result
        double res = 0.2+0.4*pow(p, 2)+0.3*p*sin(15*p)+0.05*cos(50*p);
        y1.push_back(res);  //FORMS THE GREEN CURVE FOR PLOTTING

        ///Predicted Value
        double temp[1];
        temp[0] = p;
        y2.push_back(MAXX*predict(temp));  //FORMS THE RED CURVE FOR PLOTTING, scaled up to de-normalize 
    }

Is this normalization correct? If yes, what else could be going wrong? If not, what should I do instead?

2 Answers

Accepted answer (Pe Dro):

It turned out the case was not as straightforward as I thought, and these were the mistakes:

1) I was finding the normalizing factor correctly, but I had to change this:

 for (int i = 0; i < numTrainingSets; i++)
 {
    //Find and update Normalization factor(as shown in the question)

    //Normalize the training example
 }

to

 for (int i = 0; i < numTrainingSets; i++)
 {
    //Find Normalization factor (as shown in the question)
 }

  for (int i = 0; i < numTrainingSets; i++)
 {    
    //Normalize the training example
 }
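
In other words, the normalizing factor must be computed over the whole training set before any example is scaled. A minimal sketch of that two-pass version, reusing the variable names and loop structure from the question (the surrounding setup is assumed to be the same):

///PASS 1: find the maximum over ALL inputs and outputs first
double MAXX = -9999999999999999;
for (int i = 0; i < numTrainingSets; i++)
{
    for(int m=0; m<numInputs; ++m)
        if(MAXX < training_inputs[i][m])
            MAXX = training_inputs[i][m];
    for(int m=0; m<numOutputs; ++m)
        if(MAXX < training_outputs[i][m])
            MAXX = training_outputs[i][m];
}

///PASS 2: only now scale every example by the final MAXX,
///so all examples share the same normalizing constant
for (int i = 0; i < numTrainingSets; i++)
{
    for(int m=0; m<numInputs; ++m)
        training_inputs[i][m] /= MAXX;
    for(int m=0; m<numOutputs; ++m)
        training_outputs[i][m] /= MAXX;
}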

2) Also, the validation set was earlier generated as:

int numTestSets = 500;
for (int i = 0; i < numTestSets; i++)
{
    //Generate data
    double p = (2*PI*i/numTestSets)/MAXX;
    //And other steps...
}

whereas the training data was generated with numTrainingSets = 100. Hence, the p values generated for the training set and those generated for the validation set lie in different ranges, so I had to set numTestSets = numTrainingSets.

3) Lastly,

"Is this normalizing right?"

I had been wrongly normalizing the input used to compute the actual result too! As shown in the question:

double p = (2*PI*i/numTestSets)/MAXX;
x.push_back(p);     //FORMS THE X-AXIS FOR PLOTTING

///Actual Result
double res = 0.2+0.4*pow(p, 2)+0.3*p*sin(15*p)+0.05*cos(50*p);       

Notice: the p used to generate this actual result has been normalized (unnecessarily).
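
Putting the fixes together, the test loop would look roughly like this (a sketch reusing x, y1, y2, predict and MAXX from the question; only the network input is normalized, and the prediction is scaled back up):

int numTestSets = numTrainingSets;  //same sampling as the training set
for (int i = 0; i < numTestSets; i++)
{
    double p = 2*PI*(double)i/numTestSets;  //un-normalized input
    x.push_back(p);                         //x-axis for plotting

    ///Actual result, computed from the un-normalized p
    double res = 0.2+0.4*pow(p, 2)+0.3*p*sin(15*p)+0.05*cos(50*p);
    y1.push_back(res);

    ///Predicted value: normalize only the network input...
    double temp[1];
    temp[0] = p/MAXX;
    ///...and de-normalize the output
    y2.push_back(MAXX*predict(temp));
}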

This is the final result after resolving these issues:

[plot of the expected vs. predicted curves after the fix]

Answer (a_guest):

There's nothing wrong with using that normalization, unless you use a fancy weight initialization for the neural network. It rather seems that something goes wrong during training, but without further details on that side it's hard to pinpoint the problem.

I ran a quick cross-check using TensorFlow (MSE loss, Adam optimizer) and it does converge in that case:

[plot of the function: original vs. predicted]

[plot of the loss curve]

Here's the code for reference:

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf


x = np.linspace(0, 2*np.pi, 500)
y = 0.2 + 0.4*x**2 + 0.3*x*np.sin(15*x) + 0.05*np.cos(50*x)


class Model(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.h1 = tf.keras.layers.Dense(5, activation='tanh')
        self.out = tf.keras.layers.Dense(1, activation=None)

    def call(self, x):
        return self.out(self.h1(x))


model = Model()
loss_object = tf.keras.losses.MeanSquaredError()
train_loss = tf.keras.metrics.Mean(name='train_loss')
optimizer = tf.keras.optimizers.Adam()


@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        loss = loss_object(y, model(x))
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    train_loss(loss)


# Normalize data.
x /= y.max()
y /= y.max()
data_set = tf.data.Dataset.from_tensor_slices((x[:, None], y[:, None]))
train_ds = data_set.shuffle(len(x)).batch(64)

loss_history = []
for epoch in range(5000):
    for train_x, train_y in train_ds:
        train_step(train_x, train_y)

    loss_history.append(train_loss.result())
    print(f'Epoch {epoch}, loss: {loss_history[-1]}')
    train_loss.reset_states()

plt.figure()
plt.xlabel('Epoch')
plt.ylabel('MSE loss')
plt.plot(loss_history)

plt.figure()
plt.plot(x, y, label='original')
plt.plot(x, model(list(data_set.batch(len(x)))[0][0]), label='predicted')
plt.legend()
plt.show()