I have been trying to train an object detection model using the TensorFlow Object Detection API.
The network trains fine when batch_size is 1; however, increasing the batch_size leads to the following error after some steps.
Network: Faster R-CNN
train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0002
          schedule {
            step: 25000
            learning_rate: .00002
          }
          schedule {
            step: 50000
            learning_rate: .000002
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
Error:
INFO:tensorflow:Error reported to Coordinator: , ConcatOp : Dimensions of inputs should match: shape[0] = [1,841,600,3] vs. shape[3] = [1,776,600,3]
[[node concat (defined at /home/<>/.virtualenvs/dl4cv/lib/python3.6/site-packages/object_detection-0.1-py3.6.egg/object_detection/legacy/trainer.py:190) ]]
Errors may have originated from an input operation.
Input Source operations connected to node concat:
Preprocessor_3/sub (defined at /home/<>/.virtualenvs/dl4cv/lib/python3.6/site-packages/object_detection-0.1-py3.6.egg/object_detection/models/faster_rcnn_inception_v2_feature_extractor.py:100)
Training with an increased batch_size works on SSD MobileNet, however.
While I have solved the issue for my use case for now, I am posting this question on SO to understand the reason for this behavior.
Just from the error it seems like your individual inputs have different sizes. I suppose it tries to concatenate (ConcatOp) four single inputs into one tensor to build a mini-batch as the input.
While trying to concatenate, it has one input of 841x600x3 and one input of 776x600x3 (ignoring the batch dimension). Obviously 841 and 776 are not equal, but they should be. With a batch size of 1 the concat function is probably not called, since you don't need to concatenate inputs to get a mini-batch. There also seems to be no other component that relies on a predefined input size, so the network trains normally, or at least doesn't crash.
I would check the dataset you are using to see whether this is supposed to be the case or whether you have some faulty data samples. If the dataset is fine and this can in fact happen, you need to resize all inputs to some predefined resolution to be able to combine them properly into a mini-batch.
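To illustrate the failure mode, here is a minimal sketch (not the API's actual input pipeline; it assumes TensorFlow 2.x and uses the two spatial sizes from your error message): concatenating two image tensors with different heights fails in exactly this way, while resizing both to a common resolution first makes the batching work.

```python
import tensorflow as tf

# Two "images" with the spatial sizes from the error message
# (batch dimension of 1 kept, as in the log).
img_a = tf.random.uniform([1, 841, 600, 3])
img_b = tf.random.uniform([1, 776, 600, 3])

# This reproduces the ConcatOp complaint: all non-concat dimensions
# must match, but 841 != 776.
try:
    batch = tf.concat([img_a, img_b], axis=0)
except tf.errors.InvalidArgumentError as e:
    print("concat failed:", e.message)

# Resizing both images to a fixed resolution (600x600 here, chosen
# arbitrarily for this sketch) makes the shapes compatible.
fixed_a = tf.image.resize(img_a, [600, 600])
fixed_b = tf.image.resize(img_b, [600, 600])
batch = tf.concat([fixed_a, fixed_b], axis=0)
print(batch.shape)  # (2, 600, 600, 3)
```

In the Object Detection API this corresponds to the image_resizer setting in the model config: if I remember correctly, the SSD sample configs use a fixed_shape_resizer (every image ends up the same size, so batching works), while the Faster R-CNN samples use a keep_aspect_ratio_resizer, which produces variable-sized images and therefore only allows a batch size of 1. That would also explain why SSD MobileNet trains fine with a larger batch_size.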