I'm trying to build an ML model for a specific use case. I've read up on various libraries and attempted to train my own classifiers, but I feel like what I'm doing isn't quite right - the setups for object detection all seem to assume that the object you're detecting can take a vast number of forms, and the training methods are designed around that. My use case is different.
I have static, flat imagery that I want to identify, for example a book cover. It therefore seems like I shouldn't need to provide many images of it, just a single image of what it looks like from the front. I want to train an ML model so that, after training, I could show it a photo containing that book cover and it would recognise it.
The photo shown after training may include environmental variation, such as different lighting or a slightly different angle, but the idea is that as long as the book cover itself is in full view, it should still be recognised.
It's proven quite difficult to figure out what to do here. Every guide I've come across is designed for training on objects that can take many forms, and adapting those guides to my purpose hasn't been successful.
I've tried Turi Create's very simple setup, training on the single data point I have for each book and then reusing that same data for validation, since I obviously don't have separate training and validation sets. Turi Create takes care of all the training details and is clearly designed for many examples per class, so I feel like I'm badly misusing it for my purposes. In testing, it also doesn't work for object detection.
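For reference, my setup is roughly along these lines (a simplified sketch assuming the image classifier toolkit, one reference image per book under a covers/ folder, and validation disabled; all paths and labels are illustrative):

```python
import os
import turicreate as tc

# One reference image per book, stored as covers/<book_title>/front.jpg.
data = tc.image_analysis.load_images('covers/', with_path=True)
# Use the parent folder name as the class label.
data['label'] = data['path'].apply(lambda p: os.path.basename(os.path.dirname(p)))

# With a single example per class there is no meaningful validation split,
# so validation is disabled entirely.
model = tc.image_classifier.create(data, target='label', validation_set=None)

# Later, classify a new photo of a cover.
test = tc.image_analysis.load_images('query/', with_path=True)
print(model.predict(test))
```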
I've had some limited success using OpenCV's keypoint detection and nearest-neighbour matching, but the catalogue would eventually be much larger, perhaps 10k books, so it isn't practical to run that kind of pairwise image comparison against every one of them.
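The matching I had limited success with looks roughly like this (a simplified sketch using SIFT and a brute-force matcher with Lowe's ratio test; file names are placeholders):

```python
import cv2

sift = cv2.SIFT_create()

# One reference cover vs. one query photo - this pairwise comparison is what
# doesn't scale to ~10k reference covers.
cover = cv2.imread('book_cover.jpg', cv2.IMREAD_GRAYSCALE)
photo = cv2.imread('query_photo.jpg', cv2.IMREAD_GRAYSCALE)

kp1, des1 = sift.detectAndCompute(cover, None)
kp2, des2 = sift.detectAndCompute(photo, None)

matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)

# Keep only matches that are clearly better than the second-best candidate.
good = [m for m, n in matches if m.distance < 0.75 * n.distance]
print(len(good), 'good matches')  # many good matches => the cover is probably present
```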
I've been learning more about ML and computer vision over the past month, but it's certainly not my area of expertise - I'm primarily a software developer. I'd appreciate any advice.
Your question has no neat out-of-the-box answer (sorry to say), but there are a few key areas of computer vision / machine learning that you'll want to know about to get this solved.
First: if you really want to stay in OpenCV and existing libraries (as in, you don't want this to turn into an algorithm research project), I suggest the following: stick with the keypoint matching you've already had some success with, but make it scale. Extract descriptors (SIFT or ORB) for each cover once, offline, and put them all into a single index, for example OpenCV's FLANN-based matcher or a bag-of-visual-words vocabulary. A query photo then needs one lookup against that index instead of 10k pairwise comparisons, and you can confirm the top candidate with a geometric check such as findHomography with RANSAC. A rough sketch of this is below.
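A minimal sketch of that indexed approach, assuming SIFT descriptors and OpenCV's FLANN-based matcher with a simple per-cover vote (paths, parameters and thresholds are only illustrative):

```python
import os
import cv2
import numpy as np

sift = cv2.SIFT_create()

# Offline: extract descriptors for every reference cover once.
names, descriptor_sets = [], []
for fname in sorted(os.listdir('covers/')):
    img = cv2.imread(os.path.join('covers', fname), cv2.IMREAD_GRAYSCALE)
    _, des = sift.detectAndCompute(img, None)
    if des is not None:
        names.append(fname)
        descriptor_sets.append(des.astype(np.float32))

# One FLANN-backed matcher holding all covers; DMatch.imgIdx later tells us
# which reference cover a match came from.
matcher = cv2.FlannBasedMatcher(dict(algorithm=1, trees=4), dict(checks=50))
matcher.add(descriptor_sets)
matcher.train()

# Query: match the photo's descriptors once and vote per cover (Lowe's ratio test).
query = cv2.imread('query_photo.jpg', cv2.IMREAD_GRAYSCALE)
_, qdes = sift.detectAndCompute(query, None)
votes = np.zeros(len(names))
for pair in matcher.knnMatch(qdes.astype(np.float32), k=2):
    if len(pair) == 2 and pair[0].distance < 0.7 * pair[1].distance:
        votes[pair[0].imgIdx] += 1

best = int(votes.argmax())
print('best match:', names[best], 'with', int(votes[best]), 'votes')
# A geometric check (cv2.findHomography with RANSAC) on the best candidate's
# keypoints can then confirm the cover is really present.
```

At ~10k covers the raw descriptor index gets large, which is where a bag-of-visual-words vocabulary earns its keep, but the shape of the pipeline stays the same.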
Second: if the above is not adequate, you are in for a more advanced research project. I would still recommend something like a Hough transform or SIFT, because the key insight there is that you should be able to find a filter (or filter-like object) that is really good at recognizing this specific book cover. That means typical deep learning approaches are less useful out of the box. If you really want to go down that path, start by reading about data augmentation, then about one-shot or few-shot learning, and then about transfer learning. That's a long road, so I'd strongly favor the first approach I suggested; a sketch of the transfer-learning direction follows if you do want to explore it.
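For the transfer-learning end of that road, the gentlest starting point is to use a pretrained network purely as a feature extractor: embed each cover once and recognise a query by nearest-neighbour search over those embeddings, with no training of your own. A rough sketch, assuming a recent PyTorch/torchvision and a ResNet-18 backbone (paths and the choice of model are illustrative):

```python
import os
import torch
import torchvision
from torchvision import transforms
from PIL import Image

# Pretrained ImageNet backbone used only as an embedder (classifier head removed).
model = torchvision.models.resnet18(weights="DEFAULT")
model.fc = torch.nn.Identity()
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def embed(path):
    """Return an L2-normalised 512-d embedding for one image."""
    with torch.no_grad():
        x = preprocess(Image.open(path).convert('RGB')).unsqueeze(0)
        return torch.nn.functional.normalize(model(x), dim=1)

# Offline: one embedding per reference cover.
names = sorted(os.listdir('covers/'))
gallery = torch.cat([embed(os.path.join('covers', n)) for n in names])

# Query: cosine similarity against the gallery, highest score wins.
q = embed('query_photo.jpg')
scores = (gallery @ q.T).squeeze(1)
best = int(scores.argmax())
print('best match:', names[best], 'similarity:', float(scores[best]))
```

This is also where the data augmentation reading pays off: if the off-the-shelf embeddings aren't discriminative enough, you can fine-tune the backbone on augmented variants of each single cover image rather than collecting more photos.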