Simple NN aiming to learn a set of non-linear equations can't converge


I've got a bunch of equations using sums, multiplication and min(x,0) or max(x,0) that yield a result (one output, 18 inputs).

I'm trying to have an NN model in PyTorch learn these so I can generate quick results.

I generated 30k random X-Y pairs in Excel (just using RND()*100-50 for X and calculating Y). I loaded the pairs with pandas and wrote an NN with ReLU (which I hoped would handle the non-linearity). Here's the net:

class MyModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.flatten = nn.Flatten()  # Flatten input data
        self.hidden_layer = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.BatchNorm1d(hidden_size),
            nn.Linear(input_size, hidden_size),
            nn.Linear(input_size, hidden_size),

            nn.ReLU()
        )
        self.output_layer = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.flatten(x)
        x = self.hidden_layer(x)
        x = self.hidden_layer(x)
        x = self.hidden_layer(x)
        output = self.output_layer(x)
        return output

Sizes are 18 for the input and hidden layers and 1 for the output.

It can't converge and is left with quite a big error. I thought that would be a simple task for an NN, learning that set of equations, since there's no noise or anything. What can I do to make this work?


There is 1 answer below.

Karl:

Your nn.Sequential setup doesn't make sense. nn.Sequential runs its modules in the order listed. Yours:

        self.hidden_layer = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.BatchNorm1d(hidden_size),
            nn.Linear(input_size, hidden_size),
            nn.Linear(input_size, hidden_size),

            nn.ReLU()
        )

has linear layers back to back, which is redundant: with no activation in between, the composition of two linear layers is still a single linear layer. Your sizes also don't line up. The first layer maps an input of size input_size to hidden_size, but your second layer expects an input of size input_size. This happens to work for you because you are using the same size for input and hidden, but it will throw an error whenever that is not the case.
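To see why the sizes matter, here's a quick sketch of that failure mode (the concrete sizes are made up for illustration; any case where hidden_size differs from input_size hits the same error):

import torch
import torch.nn as nn

input_size, hidden_size = 18, 32  # hypothetical sizes that differ

broken = nn.Sequential(
    nn.Linear(input_size, hidden_size),  # outputs hidden_size features
    nn.Linear(input_size, hidden_size),  # but this layer expects input_size features
)

x = torch.randn(4, input_size)
try:
    broken(x)
except RuntimeError as e:
    print(e)  # mat1 and mat2 shapes cannot be multiplied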

You want something like this:

self.hidden_layer = nn.Sequential(
    nn.Linear(input_size, hidden_size),
    nn.ReLU(),
    nn.BatchNorm1d(hidden_size),
    nn.Linear(hidden_size, hidden_size),
    nn.ReLU(),
    nn.BatchNorm1d(hidden_size)
)

That example has two blocks of linear/relu/batchnorm. You can add more if you want.

Your forward method is also weird.

First, make sure nn.Flatten is doing what you expect. Check the input/output shapes to be sure.
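A quick way to check (assuming your batches are already 2-D, which seems likely with 18 flat inputs):

import torch
import torch.nn as nn

flatten = nn.Flatten()
x = torch.randn(4, 18)  # a batch of 4 samples with 18 features (assumed shape)
print(x.shape, flatten(x).shape)  # for 2-D input nn.Flatten is a no-op: both torch.Size([4, 18])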

Second, you apply the same block of layers three times. If you want more layers, you should add them to the nn.Sequential block instead of passing the activations through the same layers (and the same weights) three times. A combined sketch follows below.
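Putting both points together, a minimal sketch of the fixed model might look like this (hidden_size=64 is just a guess to tune, and nn.Flatten is dropped on the assumption that the inputs are already 2-D):

import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_layer = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.BatchNorm1d(hidden_size),
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.BatchNorm1d(hidden_size),
        )
        self.output_layer = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.hidden_layer(x)  # pass through the stack once, not three times
        return self.output_layer(x)

model = MyModel(input_size=18, hidden_size=64, output_size=1)
out = model(torch.randn(4, 18))
print(out.shape)  # torch.Size([4, 1])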