How many images should I label from the training set?

We have 10,000 images and we want to implement a deep-learning model to extract the vegetation. What's the minimum number of images we should label if we want an 80% training set?

We want to use semantic segmentation. Should we label every object, or only the vegetation?

Answer by hussein mohamed

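Well, the amount of data you need to annotate is not determined by the train/test split percentage. If you are going to train in a fully supervised manner, you need to annotate 100% of the data you have: the 80% training portion needs labels to learn from, and the remaining 20% needs labels to evaluate the model against.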
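
To make the arithmetic concrete, here is a minimal sketch of an 80/20 split over 10,000 images. Integer IDs stand in for the image files, and `split_dataset` is a hypothetical helper, not part of any library. Both subsets together still cover every image, so all of them need masks.

```python
import random


def split_dataset(image_ids, train_fraction=0.8, seed=0):
    """Shuffle image IDs and split them into train and test subsets.

    Every returned ID still needs a label: training images to learn from,
    test images to measure the model's accuracy against.
    """
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # fixed seed for a reproducible split
    n_train = int(round(len(ids) * train_fraction))
    return ids[:n_train], ids[n_train:]


train_ids, test_ids = split_dataset(range(10_000))
print(len(train_ids), len(test_ids))   # 8000 2000
print(len(train_ids) + len(test_ids))  # 10000 images to label in total
```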

A useful approach is machine-generated (model-assisted) annotation: let a model produce draft labels and limit human labor to correcting the machine's mistakes and reviewing the data. This can save many hours of expensive human labor.
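
One cheap way to produce a machine-generated first pass, before any vegetation model exists, is a classic color heuristic such as the Excess Green index (ExG = 2g - r - b on chromaticity-normalized RGB). This is a swapped-in technique, not necessarily what the answer has in mind (predictions from a trained model would be the usual source). The function name `draft_vegetation_mask` and the 0.1 threshold are illustrative assumptions that would need tuning on real imagery.

```python
import numpy as np


def draft_vegetation_mask(rgb, threshold=0.1):
    """Return a boolean draft mask for an (H, W, 3) RGB image.

    Uses the Excess Green index ExG = 2g - r - b, where r, g, b are each
    channel divided by R + G + B. This is only a stand-in for a model's
    predictions; annotators correct the draft instead of drawing from scratch.
    """
    rgb = rgb.astype(np.float64)
    total = rgb.sum(axis=-1, keepdims=True)
    total[total == 0] = 1.0  # avoid division by zero on pure-black pixels
    r, g, b = np.moveaxis(rgb / total, -1, 0)
    exg = 2 * g - r - b
    return exg > threshold  # threshold is an assumption; tune per dataset


image = np.array([[[0, 200, 0], [200, 0, 0], [0, 0, 0]]], dtype=np.uint8)
print(draft_vegetation_mask(image))  # [[ True False False]]
```

The draft masks would then be loaded into whatever annotation tool you use, so that people only fix the wrong pixels.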

If annotation resources are limited, conventional wisdom in the field suggests starting from a strong pretrained model, possibly trained on a different domain (different data) or a similar task, and fine-tuning it on your images. This can significantly reduce how much in-domain data you need, often without a loss in performance.

You only need to annotate the objects you are interested in, here the vegetation; everything else can be treated as background. The only reason to do anything different is if you want to repurpose the data later for a different domain or task, in which case it may be cheaper to add the additional classes upfront.
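
The trade-off with labeling extra classes is one-directional: a richer label map can always be collapsed into a binary vegetation mask, but a binary mask cannot be split back into, say, buildings versus roads without relabeling. The class names and integer IDs below are hypothetical, chosen only to show the collapse step.

```python
import numpy as np

# Hypothetical class IDs for a richer, multi-class annotation scheme.
CLASS_IDS = {"background": 0, "tree": 1, "grass": 2, "shrub": 3, "building": 4, "road": 5}
VEGETATION = ("tree", "grass", "shrub")


def to_vegetation_mask(label_map, class_ids=CLASS_IDS, vegetation=VEGETATION):
    """Collapse a multi-class label map into a binary mask (1 = vegetation, 0 = other)."""
    veg_ids = [class_ids[name] for name in vegetation]
    return np.isin(label_map, veg_ids).astype(np.uint8)


label_map = np.array([[0, 1, 4],
                      [2, 5, 3]])
print(to_vegetation_mask(label_map))
# [[0 1 0]
#  [1 0 1]]
```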