I have a model, summarized below, that takes a 224x224x3 image as input and classifies it into one of two categories:

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 rescaling_3 (Rescaling)     (None, 224, 224, 3)       0         
                                                                 
 mobilenetv2_1.00_224 (Funct  (None, 7, 7, 1280)       2257984   
 ional)                                                          
                                                                 
 spatial_dropout2d_12 (Spati  (None, 7, 7, 1280)       0         
 alDropout2D)                                                    
                                                                 
 conv2d_9 (Conv2D)           (None, 7, 7, 2048)        2623488   
                                                                 
 spatial_dropout2d_13 (Spati  (None, 7, 7, 2048)       0         
 alDropout2D)                                                    
                                                                 
 conv2d_10 (Conv2D)          (None, 7, 7, 1024)        2098176   
                                                                 
 spatial_dropout2d_14 (Spati  (None, 7, 7, 1024)       0         
 alDropout2D)                                                    
                                                                 
 conv2d_11 (Conv2D)          (None, 7, 7, 256)         262400    
                                                                 
 spatial_dropout2d_15 (Spati  (None, 7, 7, 256)        0         
 alDropout2D)                                                    
                                                                 
 flatten_3 (Flatten)         (None, 12544)             0         
                                                                 
 dropout_3 (Dropout)         (None, 12544)             0         
                                                                 
 dense_3 (Dense)             (None, 2)                 25090     
                                                                 
=================================================================
Total params: 7,267,138
Trainable params: 5,009,154
Non-trainable params: 2,257,984
_________________________________________________________________

And here are the compile parameters:

from tensorflow.keras.optimizers import RMSprop

model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(learning_rate=0.001),  # `lr` is a deprecated alias
              metrics=['categorical_accuracy'])

When I train it for any number of epochs, I get a "categorical accuracy" of around 70% (chance would be 50%). This is the number reported by model.evaluate(test_set).

However, when I compare the argmax() of each prediction to the correct labels in test_set.labels, they match in only about 50% of cases, i.e. no better than chance.

This happens consistently. I understand that the classification task itself may be too hard, but where does the 70/50 discrepancy come from?

I expected the test set performance to match the output of model.evaluate(test_set), but the performance is much worse.
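
For concreteness, the manual comparison described above can be sketched in plain Python. Everything here is a hypothetical stand-in: `labels` plays the role of test_set.labels, `aligned_preds` plays the role of the model's predictions, and `fake_predict` is an invented classifier that is right about 70% of the time. The sketch also illustrates one possible, unconfirmed source of a 70/50 gap: if the test iterator yields samples in shuffled order, row i of the predictions no longer corresponds to entry i of the label list, and the manual accuracy falls to chance even though the model itself is unchanged. Whether that applies to this particular test_set is an assumption.

```python
import random

def argmax(row):
    """Return the index of the largest score in a list of class scores."""
    return max(range(len(row)), key=row.__getitem__)

def accuracy(predictions, labels):
    """Fraction of rows whose argmax equals the integer class label."""
    hits = sum(argmax(p) == y for p, y in zip(predictions, labels))
    return hits / len(labels)

random.seed(0)

# Hypothetical labels in dataset order (stand-in for test_set.labels).
labels = [random.randint(0, 1) for _ in range(2000)]

def fake_predict(y):
    """Invented classifier: picks the true class about 70% of the time."""
    cls = y if random.random() < 0.7 else 1 - y
    return [0.9, 0.1] if cls == 0 else [0.1, 0.9]

# Case 1: predictions are in the same order as the labels.
aligned_preds = [fake_predict(y) for y in labels]
aligned_acc = accuracy(aligned_preds, labels)

# Case 2: samples were yielded in a shuffled order, so prediction row i
# no longer belongs to labels[i].
order = list(range(len(labels)))
random.shuffle(order)
shuffled_preds = [aligned_preds[i] for i in order]
misaligned_acc = accuracy(shuffled_preds, labels)

# Case 3: compare against the labels taken in the same shuffled order.
realigned_acc = accuracy(shuffled_preds, [labels[i] for i in order])

print("aligned:   ", aligned_acc)
print("misaligned:", misaligned_acc)
print("realigned: ", realigned_acc)
```

If this is the cause, the usual check is whether the test iterator was created with shuffling enabled, and whether the predictions and labels were collected in the same pass over the data.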
