Question:
Using ML Kit, I am familiar with how to customize the classifier model, but is there any way to customize or retrain the object detection model?
Background
I'm working on a robotics project where an Android-powered robot has to detect other robots and charging stations. Retraining the image classifier to properly classify robots and charging stations seems fairly straightforward, given a large enough image set. To generate this image set, I run a random walk with the robot and have it take a photo each time an object is recognized, then continue taking a photo every 0.5 seconds while that object is in the view frame. I also crop each image to the detected bounding box and name it using the classifier's label for easier grouping, as sketched below. After running this for half an hour or so, I inspected the images and found that the robots themselves are detected quite readily, under various classifiers, but the charging stations are not: the random walk generated about 1000 images, 100-150 of which were of other robots, whereas I captured only about 5 images of the charging stations.
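The crop-and-name step looks roughly like this (a minimal sketch; saveCrop is my own hypothetical helper, and camera frame acquisition is omitted):

```kotlin
import android.graphics.Bitmap
import android.graphics.Rect
import com.google.mlkit.vision.objects.DetectedObject
import java.io.File
import java.io.FileOutputStream

// Crop the full camera frame to the detected bounding box and save it,
// named after the top classification label for easier grouping.
fun saveCrop(frame: Bitmap, obj: DetectedObject, outDir: File) {
    val box: Rect = obj.boundingBox
    // Clamp the box to the frame in case the detector reports edges slightly outside it.
    val left = box.left.coerceIn(0, frame.width - 1)
    val top = box.top.coerceIn(0, frame.height - 1)
    val width = box.width().coerceAtMost(frame.width - left)
    val height = box.height().coerceAtMost(frame.height - top)
    val crop = Bitmap.createBitmap(frame, left, top, width, height)

    // Use the highest-confidence label (or "unknown") as the file-name prefix.
    val label = obj.labels.maxByOrNull { it.confidence }?.text ?: "unknown"
    val file = File(outDir, "${label}_${System.currentTimeMillis()}.jpg")
    FileOutputStream(file).use { out ->
        crop.compress(Bitmap.CompressFormat.JPEG, 90, out)
    }
}
```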
When I manually position the robots around the charging stations, I notice that the placement has to be very specific before the object is detected. This led me to the idea of retraining the object detection model to better recognize my charging stations. In the docs, all I can find are ways to retrain the classifier, not the object detector. Is there any way to do this?
Code
Although I've modified it quite a bit to do the image capture and other robotic things, the base code I'm using is ML Kit's default vision-quickstart object detection module with a custom classifier.
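For reference, the custom classifier is plugged in through CustomObjectDetectorOptions, roughly like this (a minimal sketch of the standard ML Kit setup; the asset path is a placeholder):

```kotlin
import com.google.mlkit.common.model.LocalModel
import com.google.mlkit.vision.objects.ObjectDetection
import com.google.mlkit.vision.objects.custom.CustomObjectDetectorOptions

// Placeholder path for wherever the custom classifier lives in the app's assets.
val localModel = LocalModel.Builder()
    .setAssetFilePath("custom_models/classifier.tflite")
    .build()

val options = CustomObjectDetectorOptions.Builder(localModel)
    .setDetectorMode(CustomObjectDetectorOptions.STREAM_MODE) // live camera feed
    .enableClassification()                                   // run the custom classifier
    .setClassificationConfidenceThreshold(0.5f)
    .setMaxPerObjectLabelCount(1)
    .build()

val objectDetector = ObjectDetection.getClient(options)
```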
Images
Note that the charging station (right) is a bit larger than one of the wheels on the robot (left), for scale; it also fits under the robot, between the wheels, if that helps to visualize it.
Edit 1
I just tried out the TF Lite object detection sample app, which is separate from ML Kit, and it is immediately apparent that it does a much better job of detecting the charging stations, and smaller objects in general. I tried using the detection model from that sample in ML Kit, but it looks like they are not compatible. I am getting errors:
E/native: calculator_graph.cc:772 INVALID_ARGUMENT: CalculatorGraph::Run() failed in Run:
Calculator::Open() for node "[BoxClassifierCalculator, BoxClassifierCalculator with output stream: detection_results0]" failed: #vk Unexpected number of dimensions for output index 0: got 3D, expected either 2D (BxN with B=1) or 4D (BxHxWxN with B=1, W=1, H=1). [type.googleapis.com/mediapipe.StatusList='\n\xb7\x02\x08\x03\x12\x85\x02\x43\x61lculator::Open() for node \"[BoxClassifierCalculator, BoxClassifierCalculator with output stream: detection_results0]\" failed: #vk Unexpected number of dimensions for output index 0: got 3D, expected either 2D (BxN with B=1) or 4D (BxHxWxN with B=1, W=1, H=1).\x1a+\n$tflite::support::TfLiteSupportStatus\x12\x03\x34\x30\x30']
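For context, the sample's combined detector+classifier model can be driven directly through the TFLite Task Library instead of ML Kit; a minimal sketch (the model file name is a placeholder):

```kotlin
import android.content.Context
import android.graphics.Bitmap
import org.tensorflow.lite.support.image.TensorImage
import org.tensorflow.lite.task.vision.detector.ObjectDetector

fun detectWithTaskLibrary(context: Context, frame: Bitmap) {
    val options = ObjectDetector.ObjectDetectorOptions.builder()
        .setMaxResults(5)
        .setScoreThreshold(0.5f)
        .build()
    // "model.tflite" stands in for the detection model bundled in the app's assets.
    val detector = ObjectDetector.createFromFileAndOptions(context, "model.tflite", options)
    val results = detector.detect(TensorImage.fromBitmap(frame))
    for (detection in results) {
        val category = detection.categories.first()
        println("${category.label} ${category.score} ${detection.boundingBox}")
    }
}
```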
Maybe I can rework this to meet the model requirements? Or am I trying to fit a square peg into a round hole?
Answer:
Currently, we do not provide an API to swap out the detector for ODT (object detection and tracking) in ML Kit. We have a plan to make the detector swappable as well.
The model from the TFLite sample is a single model containing both the detector and the classifier, while the models in ML Kit are two separate models, so swapping out only the detector model would be non-trivial. Let's try some workarounds first:
To make the charging station easier to detect, you could try putting more 'texture' on it compared with the background; for example, dots of different colors along the edges of the charging station.
Also, if you are using single-object mode (primary object), the object must be in the center of the image frame. You could instead try multiple-objects mode, which does not require the object to be centered, but you may then need to filter out detections you are not interested in, as in the sketch below.
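A sketch of the multiple-objects configuration plus a label filter (assuming the localModel from the earlier snippet, with "charging_station" as a placeholder for whatever label your custom classifier emits):

```kotlin
import com.google.mlkit.vision.objects.DetectedObject
import com.google.mlkit.vision.objects.custom.CustomObjectDetectorOptions

// Multiple-objects mode lifts the center-of-frame requirement.
val multiOptions = CustomObjectDetectorOptions.Builder(localModel)
    .setDetectorMode(CustomObjectDetectorOptions.STREAM_MODE)
    .enableMultipleObjects()
    .enableClassification()
    .build()

// Drop detections that are not charging stations.
fun onlyStations(objects: List<DetectedObject>): List<DetectedObject> =
    objects.filter { obj -> obj.labels.any { it.text == "charging_station" } }
```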