I'm new to machine learning and I'm working on a dataset of 14k pictures of sea, forest, glaciers, streets, buildings and mountains (6 classes). I trained my model on it and reached a validation accuracy of 91%, but for some reason it is biased: when I try to predict new images with my inference code, the only classes it ever picks are glaciers and sea. Here is the GitHub with the model creation code and the inference code.
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# preprocess_input is assumed to be imported from the same MobileNet module as the model

train_datagen = ImageDataGenerator(
    rotation_range=20,  # Randomly rotate images by up to 20 degrees
    zoom_range=0.3,  # Randomly zoom in or out by up to 30%
    horizontal_flip=True,  # Allow horizontal flips of augmented images
    vertical_flip=True,  # Allow vertical flips of augmented images
    brightness_range=[0.6, 1.2],  # Randomly darken (down to 0.6x) or brighten (up to 1.2x)
    fill_mode='nearest',
    preprocessing_function=preprocess_input)
img_data_iterator = train_datagen.flow_from_directory(
    # Where to take the data from; the class names are the sub-folder names
    '../Q2B/archive/seg_train/seg_train/',
    class_mode="categorical",  # Labels are returned as 2D one-hot encoded arrays
    shuffle=True,  # Shuffle the data; True is the default, set here to be explicit
    batch_size=32,
    target_size=(150, 150))  # Resize images to 150x150 (MobileNet's own default input size is 224x224)
validation_generator = ImageDataGenerator(
    preprocessing_function=preprocess_input).flow_from_directory(
        '../Q2B/archive/seg_test/seg_test/',
        class_mode="categorical",
        shuffle=True,
        batch_size=32,
        target_size=(150, 150))
My guess is that it is related to the way I pre-processed the data.
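One way to test that guess is to check that the inference code scales pixels the same way the generators do. The sketch below is plain Python and not taken from the linked code. It assumes `preprocess_input` is the MobileNet one, which maps raw 0-255 pixel values to [-1, 1], and compares it with a `1/255` rescale, a common alternative that gives [0, 1]. Both helper names are hypothetical.

```python
def mobilenet_style_scale(pixel):
    """Scale a raw 0-255 pixel value to [-1, 1].

    Assumption: this mirrors what MobileNet's preprocess_input does.
    """
    return pixel / 127.5 - 1.0


def rescale_0_1(pixel):
    """Scale a raw 0-255 pixel value to [0, 1] (the rescale=1/255 convention)."""
    return pixel / 255.0


# Compare the two conventions on a dark, a mid-grey and a bright pixel.
for raw in (0, 128, 255):
    print(raw, round(mobilenet_style_scale(raw), 3), round(rescale_0_1(raw), 3))
```

If the model was trained on one range and the inference code feeds it the other (or raw 0-255 values), the inputs look very different from the training data, which could plausibly push predictions toward a couple of classes.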
Can you post more of your code? Set class_mode to 'categorical' for both the train and test generators, and change the final dense layer from 1 unit to one unit per class (6 for this dataset), so that it returns a score/probability for every class. Then, when you use argmax, it returns the index of the top score, which indicates the class the model has predicted.
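To illustrate the argmax step, here is a minimal sketch in plain Python. The folder names and the score vector are made up for illustration. In Keras the real mapping is available as the generator's `class_indices` attribute, which `flow_from_directory` builds from the sub-folder names in alphanumeric order.

```python
# Hypothetical sub-folder names; flow_from_directory orders them alphanumerically.
class_names = sorted(["sea", "forest", "glacier", "street", "buildings", "mountain"])

# Mimics generator.class_indices (name -> index), then inverts it (index -> name).
class_indices = {name: i for i, name in enumerate(class_names)}
index_to_class = {i: name for name, i in class_indices.items()}


def decode_prediction(scores):
    """Return (class_name, score) for the highest-scoring class (a plain argmax)."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return index_to_class[best], scores[best]


# Made-up output of a 6-unit softmax layer for a single image.
example_scores = [0.02, 0.05, 0.70, 0.10, 0.08, 0.05]
print(decode_prediction(example_scores))
```

If the inference code decodes the index with a hand-written class list in a different order, the prediction will be mapped to the wrong label even when the model itself is correct.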