I am trying to build an object detection model as part of my Master's degree project.
When we work with neural networks, batch size is an important hyperparameter. From previous questions I learned that each minibatch is randomly sampled without replacement from the dataset (https://stats.stackexchange.com/questions/235844/should-training-samples-randomly-drawn-for-mini-batch-training-neural-nets-be-dr).
However, I am uncertain about the TFOD approach to minibatches:
- How does TFOD sample images from the training data into a mini-batch? (random sampling without replacement?)
- What happens if we sample without replacement and reach the end of the dataset? Is the data repeated?
I tried to look for the answer in the internal functions of the TFOD framework, but found nothing except the dataset_builder.build() function, which just builds the finished dataset and is not responsible for batch sampling.
I would appreciate any thoughts! Thank you guys!
def build(
    input_reader_config,
    batch_size=None,
    transform_input_data_fn=None,
    input_context=None,
    reduce_to_frame_fn=None,
):
  """Builds a tf.data.Dataset."""
This question (or answer) isn't specific to object detection. You have the same "problem" in other tasks, as long as you train the model with a gradient descent algorithm.
The first thing to note is why we need batches in the first place: gradients are too heavy and expensive to compute over the whole dataset at once. So, to reduce this cost (especially in terms of memory), instead of computing gradients for your full dataset, you take smaller samples, called batches: you sample a small batch of, say, 32 examples, then compute the gradients and update the weights. This is called a training step, and it needs to be repeated until your whole dataset has been sampled. On each step, a random batch of 32 examples is sampled, without replacement.
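A minimal sketch of one such training step on a toy linear model, using NumPy (the model, data, and learning rate below are made up purely for illustration):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(320, 5))   # toy inputs
y = rng.normal(size=320)        # toy targets
w = np.zeros(5)                 # weights of a linear model y_hat = X @ w
batch_size, lr = 32, 0.01

# One training step: sample a batch without replacement,
# compute the gradients on it, and update the weights.
idx = rng.choice(len(X), size=batch_size, replace=False)
X_b, y_b = X[idx], y[idx]
grad = 2.0 * X_b.T @ (X_b @ w - y_b) / batch_size  # gradient of the batch MSE
w -= lr * grad                                     # gradient descent update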
Once your whole training dataset has been fed to the model, an epoch is completed.
For instance, if your dataset contains 320 examples and you are using a batch size of 32, 10 training steps complete a single epoch.
From an implementation point of view, usually the dataset is shuffled before an epoch begins and then the shuffled data is returned in consecutive batches, something like this:
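(A sketch of that per-epoch shuffle-and-slice logic in plain Python/NumPy; an in-memory, list-like dataset is assumed for illustration.)

import numpy as np

def epoch_batches(dataset, batch_size, rng):
    """Yield consecutive batches of a dataset shuffled once per epoch."""
    order = rng.permutation(len(dataset))       # shuffle before the epoch starts
    for start in range(0, len(dataset), batch_size):
        idx = order[start:start + batch_size]   # consecutive slice of the shuffled order
        yield [dataset[i] for i in idx]         # each example appears exactly once per epoch

rng = np.random.default_rng(0)
for batch in epoch_batches(list(range(320)), batch_size=32, rng=rng):
    pass  # compute gradients and update weights here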
Usually this logic is already implemented in standard packages like TensorFlow or PyTorch.
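For example, with tf.data the same behaviour is typically written as below (a sketch, not TFOD's exact input pipeline): shuffle() reorders the examples each pass, repeat() restarts the dataset once it is exhausted (so yes, the data is repeated), and batch() groups consecutive examples.

import tensorflow as tf

# Illustrative numbers: 320 examples, batch size 32.
dataset = (
    tf.data.Dataset.range(320)
    .shuffle(buffer_size=320, reshuffle_each_iteration=True)  # reshuffled every pass
    .repeat()    # start over when the end of the dataset is reached
    .batch(32)   # group consecutive examples into mini-batches
)

for step, batch in enumerate(dataset.take(10)):  # 10 steps = one epoch here
    print(step, batch.numpy())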
That said, usually a single epoch is not enough, so once it's over, you start another one. The proper number of epochs is really problem-dependent.
Finally, consider that in some special cases epochs may not be defined, as in reinforcement learning problems, and there are also other settings where sampling is not uniformly random or is even done with replacement, but those are not the standard.