I'm working on building a classifier CNN that takes satellite imagery as input, where large parts of each image are masked out (i.e., set to NaN). After converting these images to a more CNN-friendly format (uint8), the NaNs become 0s. My concern is that if I use these images as training data, the CNN will learn to ignore the regions that are 0, which is a problem because other images I'll be classifying may contain important data in those same regions. The solutions I've come up with so far are:
- Ensure I have enough varied training data with variable masked regions so that the network doesn't learn the mask position.
- Fill in the masked areas with some sort of random noise (rough sketch below).
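For the second option, this is roughly what I have in mind; it's only a sketch, `fill_nans_with_noise` is a name I made up, and drawing uniform noise from the valid value range is just one possible choice:

```python
import numpy as np

rng = np.random.default_rng(0)

def fill_nans_with_noise(img):
    """Replace NaN pixels with uniform noise drawn from the image's valid range."""
    out = img.copy()
    mask = np.isnan(out)
    lo, hi = np.nanmin(out), np.nanmax(out)   # range of the unmasked pixels
    out[mask] = rng.uniform(lo, hi, size=mask.sum())
    return out

# Example: a float image with a masked (NaN) block
img = rng.random((64, 64)).astype(np.float32)
img[10:30, 10:30] = np.nan
filled = fill_nans_with_noise(img)
```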
But I'm not sure whether either of these will work. Is there a way to push the original masked image, with the NaNs intact or as a masked numpy array, through the training stages of a CNN?
Before training, use a library such as Albumentations to augment the images and create more variety. Augmentations such as rotate, flip up/down, transpose, and shift will vary the positions of the masked areas, giving your model the variety it needs to avoid learning where the mask sits.
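A minimal sketch of such a pipeline (the specific limits and probabilities here are illustrative assumptions, not tuned values):

```python
import albumentations as A
import numpy as np

# Pipeline covering the transforms mentioned above, so the zeroed-out
# regions land in different positions across augmented samples.
transform = A.Compose([
    A.Rotate(limit=90, p=0.5),        # random rotation
    A.VerticalFlip(p=0.5),            # flip up/down
    A.Transpose(p=0.5),               # swap rows and columns
    A.ShiftScaleRotate(               # shift only: scale/rotate disabled
        shift_limit=0.2, scale_limit=0.0, rotate_limit=0, p=0.5),
])

# image is assumed to be a uint8 HxWxC array with the masked regions already 0
image = np.zeros((256, 256, 3), dtype=np.uint8)
augmented = transform(image=image)["image"]
```

Applying the pipeline independently to each training sample (typically inside your data loader) means the network sees the mask in many different positions and orientations, which makes it much harder to memorize a fixed masked region.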