How can I use CNN feature maps to localize an object in an image?


I'm learning about CNNs. I understand how convolutional and pooling layers work, and how and why feature maps are created. But how do I then localize an object? For example, I'm using the Helen dataset, where every photo is annotated with 194 facial landmark points (contour, eyes, nose and mouth). Feeding those faces to my neural network, I can obtain feature maps and compute from them the probability that the image contains eyes, for example. But how can I tell from those feature maps where exactly the eyes are?
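(For context, one common way to get positions rather than just a presence probability is to have the network regress the landmark coordinates directly. Below is a minimal PyTorch sketch of that idea; the class name, layer sizes and the 128x128 input size are my own illustrative assumptions, not part of the Helen dataset or any specific method.)

    import torch
    import torch.nn as nn

    class LandmarkNet(nn.Module):
        """Sketch: regress (x, y) for all 194 landmarks instead of only
        classifying whether eyes are present."""
        def __init__(self, num_points=194):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 128 -> 64
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 64 -> 32
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32 -> 16
            )
            self.regressor = nn.Sequential(
                nn.Flatten(),
                nn.Linear(64 * 16 * 16, 512), nn.ReLU(),
                nn.Linear(512, num_points * 2),   # one (x, y) pair per landmark
            )

        def forward(self, x):
            return self.regressor(self.features(x))

    model = LandmarkNet()
    images = torch.randn(8, 3, 128, 128)     # dummy batch of face crops
    targets = torch.rand(8, 194 * 2)         # normalized ground-truth landmark coords
    loss = nn.MSELoss()(model(images), targets)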

The only approach that comes to my mind is the following: suppose we have a 16x16 image; with three 3x3 filters we get three 14x14 "basic feature" maps. Skipping pooling (because it makes positions less accurate), we process them with three more 3x3 filters to get 9 maps of more general features. On these maps we find the detected position of the required feature (an eye), go back to the previous layer to find the elements from which that eye response was computed (let's call them contributing basic features), and from each of those elements go back one more layer, to the input, and mark all the input pixels that participated in creating the contributing basic features.
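(This trace-back idea corresponds to computing the receptive field of a feature-map unit. For stride-1 convolutions without pooling it can be done in closed form rather than by walking the layers by hand. The helper below is my own illustrative sketch, not a library function.)

    def receptive_field(layers):
        """layers: (kernel_size, stride) tuples, listed input-to-output.
        Returns (rf, step): the size of the input window one output unit sees,
        and the distance in input pixels between neighbouring output units."""
        rf, step = 1, 1
        for k, s in layers:
            rf += (k - 1) * step
            step *= s
        return rf, step

    # Two 3x3, stride-1 convolutions (the 16x16 -> 14x14 -> 12x12 example above):
    rf, step = receptive_field([(3, 1), (3, 1)])
    print(rf, step)   # 5 1: unit (i, j) in the second map covers input rows i..i+4, cols j..j+4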

This approach seems too complicated and inaccurate, which is why I'm asking for a proper method of localizing objects.

Accepted answer:

What you want is called "object localization". There are many techniques for that; see, for example, lectures 8 and 9 of Stanford CS231n: https://www.youtube.com/playlist?list=PLkt2uSq6rBVctENoVBg1TpCC7OQi31AlC (the Feb 1 and Feb 3 entries in http://cs231n.stanford.edu/syllabus.html).
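One of the simpler ideas in that area is to treat localization as regression: a shared convolutional backbone feeds both a classification head ("is the object present?") and a regression head that outputs its coordinates. Here is a minimal PyTorch sketch of that idea; it is my own illustration, and the class name, layer sizes and the 4-value box output are assumptions, not code from the lectures.

    import torch
    import torch.nn as nn

    class LocalizerNet(nn.Module):
        def __init__(self, num_classes=2, box_dims=4):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            self.classifier = nn.Linear(64, num_classes)  # e.g. "eye present / absent"
            self.box_head = nn.Linear(64, box_dims)       # e.g. (x, y, w, h) of the eye

        def forward(self, x):
            feats = self.backbone(x)
            return self.classifier(feats), self.box_head(feats)

    model = LocalizerNet()
    scores, box = model(torch.randn(1, 3, 64, 64))
    # Train with cross-entropy on `scores` plus an L1/L2 loss on `box`.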