Understanding the feedforward function in Michael Nielsen's book

I'm currently reading his awesome book (http://neuralnetworksanddeeplearning.com/chap1.html) and I think I get most of these things quite well so far. The math requires some thinking but is manageable.

What simply doesn't make any sense to me is what the feedforward function does and how it can work. I have been stuck on this for so long that I decided to create an account here and ask for help.

    def feedforward(self, a):
        """Return the output of the network if ``a`` is input."""
        for b, w in zip(self.biases, self.weights):
            a = sigmoid(np.dot(w, a)+b)
        return a

You might say: "Hey, this is pretty straightforward, what is there not to understand?" and basically you are right. What bugs me is that this network is supposed to classify digits, and the code seems to work just fine: if I run it on my machine, it shows the training progress as it is supposed to. So the result of net.feedforward(image) should be an array with 10 entries that indicates which digit the network assigns to our picture. But the actual output is an array with 10 times 30 entries, and this makes my head hurt. Neither summing these entries nor averaging them gives anything close to the desired (0,0,0,0,1,0,0,0,0,0) that you would want for an image of a 4. Even weirder, the rest of the code seems to assume it is going to get exactly that 10-entry output:

    def evaluate(self, test_data):
        """Return the number of test inputs for which the neural
        network outputs the correct result. Note that the neural
        network's output is assumed to be the index of whichever
        neuron in the final layer has the highest activation."""
        test_results = [(np.argmax(self.feedforward(x)), y)
                        for (x, y) in test_data]
        return sum(int(x == y) for (x, y) in test_results)

If I run test = net.feedforward(image) and then call np.argmax(test), I get a result between 0 and 299 that varies with the input, not a value from 0 to 9 as the evaluate function expects.

Nonetheless the code seems to work - this drives me crazy. What is the explanation for this that resolves the knot in my head?

There is 1 best solution below

Basically, your feedforward function returns 10 sigmoid values for each input, each one indicating how "sure" the network is about that output. To retrieve the correct prediction for each input, you have to pass the axis argument to np.argmax; otherwise the result is the index of the largest value in the WHOLE output. What you need is the index of the largest value in each COLUMN (or row, whichever dimension corresponds to the samples).
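As a minimal sketch of the difference, the snippet below compares np.argmax with and without the axis argument. The array `output` is made-up data standing in for a network output with 10 rows (one per digit) and 3 columns (one per sample); it is an assumption for illustration, not the actual output of net.feedforward.

```python
import numpy as np

# Hypothetical network output: 10 output neurons (rows) x 3 samples (columns).
rng = np.random.default_rng(0)
output = rng.random((10, 3)) * 0.5  # background activations below 0.5

# Make the "winning" neuron explicit for each sample.
output[4, 0] = 0.99  # sample 0 looks like a 4
output[7, 1] = 0.98  # sample 1 looks like a 7
output[1, 2] = 0.97  # sample 2 looks like a 1

# Without axis: a single index into the flattened (30-element) array.
flat_index = np.argmax(output)

# With axis=0: one index per column, i.e. one predicted digit per sample.
per_sample = np.argmax(output, axis=0)

print(flat_index)                                  # 12
print(np.unravel_index(flat_index, output.shape))  # row 4, column 0
print(per_sample)                                  # [4 7 1]
```

The flat index 12 is row 4, column 0 of the 10 x 3 array, which is why the unqualified call can return values well above 9.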

Hope this helps

[Edit]

From the NumPy docs for argmax:

axis : int, optional. By default, the index is into the flattened array, otherwise along the specified axis.

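This tells you why you get the wrong result: without an axis argument, the index refers to the flattened array, so an output with 10 times 30 entries yields values from 0 to 299.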
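It may also be worth checking the shape of the input itself. The sketch below is an assumption about the setup, not something stated in the question: it supposes layer sizes of [784, 30, 10], biases stored as (n, 1) column vectors, and an image passed as a flat array of shape (784,) instead of a (784, 1) column. Under those assumptions, NumPy broadcasting alone would produce a 10 x 30 output. The standalone `feedforward` helper and the random weights here are hypothetical stand-ins for the book's Network class.

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(a, biases, weights):
    """Same loop as the method in the question, written as a free function."""
    for b, w in zip(biases, weights):
        a = sigmoid(np.dot(w, a) + b)
    return a

rng = np.random.default_rng(0)
sizes = [784, 30, 10]  # assumed layer sizes
# Assumed parameter shapes: biases are (n, 1) columns, weights are (n_out, n_in).
biases = [rng.standard_normal((y, 1)) for y in sizes[1:]]
weights = [rng.standard_normal((y, x)) for x, y in zip(sizes[:-1], sizes[1:])]

flat_image = rng.random(784)                # shape (784,)
column_image = flat_image.reshape(784, 1)   # shape (784, 1)

out_flat = feedforward(flat_image, biases, weights)
out_col = feedforward(column_image, biases, weights)

# (30,) + (30, 1) broadcasts to (30, 30), so the final layer gives (10, 30).
print(out_flat.shape)  # (10, 30)
print(out_col.shape)   # (10, 1)
```

If this is what is happening, reshaping the image to a column before calling feedforward gives 10 entries, and np.argmax then returns a value from 0 to 9 without an axis argument.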