Image sequence detection with Keras, Convolutional and Stateful Neural Network

203 Views Asked by At

I am trying to write a pretty complicated neural network (at least for me) in keras that needs to combine both a common CNN structure and an LSTM/GRU layer.

Basically, I have a dataset of climatological maps of the Mediterranean sea, each map details the wind, pressure and other parameters. I am studying Medicanes (Mediterranean hurricanes) and my goal is to create a neural network that can classify each map with a label zero if there is no trace of such hurricanes or one if the map contains one.

In order to achieve that I need a network with two parts:

  1. feature extractor (normal CNN).
  2. temporal layer (LSTM/GRU).

The main cause of this is that each map is correlated with the previous one because the formation and life cycle of a Medicane can take several days to complete.

Important note: the dataset is too big to be uploaded all at once so I have to work one batch at a time.


I am working with Keras and I found it pretty challenging to adapt its standard framework to my needs so I have come up with some peculiar flow to feed my data into the network.

In particular, I found it hard to pass both my batch size and my time-step parameter to the GRU layer using a more standard alternative.

This is what I tried:

I am positively sure I have overcomplicated the task, but, as I said I am not very proficient with Keras and TensorFlow.

The main problem was that I could not find a way to import the data both in a batch (for RAM reasons) and in a sequence of 10-15 pictures (to be used as the time steps in the GRU layer).

I solved this problem by importing batches of 120 maps in order (no shuffle) and I created a way to turn these batches into the sequence of images I needed then I proceeded to re-batch the sequences and feed them to the model manually.


Data Import

batch_size=120

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    "./Figures_1/Train",
    validation_split=None,
    subset=None,
    labels="inferred",
    label_mode="binary",
    color_mode="rgb",
    interpolation='bilinear',
    batch_size=batch_size,
    image_size=(600, 600),
    shuffle=False,
    seed=123
)

Get a sequence of Images

Here, I break down the 120 map batches into sequences of 60 observations, and I return each sequence one at a time.

sequence_lengh=60

def sequence_x(train_dataset):
    
    x_numpy = np.asarray(list(map(lambda x: x[0], tfds.as_numpy(train_dataset))),dtype=object)
    
    for element in range(0,x_numpy.shape[0]):
        for i in range(0, x_numpy.shape[0],sequence_lengh):
            x_seq = x_numpy[element][i:i+sequence_lengh]
            yield x_seq
        
def sequence_y(train_dataset):
    
    y_numpy = np.asarray(list(map(lambda x: x[1], tfds.as_numpy(train_dataset))),dtype=object)
    
    for element in range(0,y_numpy.shape[0]):
        for i in range(0, y_numpy.shape[0],sequence_lengh):
            y_seq = y_numpy[element][i:i+sequence_lengh]
            yield y_seq

CNN Model

I build the CNN model based on a pre-trained DenseNet

from keras.layers import TimeDistributed, GRU

def build_convnet(shape=(600, 600, 3)):
    
    inputs = keras.Input(shape = shape)
    x = inputs

    # preprocessing
    x = keras.applications.densenet.preprocess_input(x)

    #Convbase
    x = convBase(x)
    x = layers.Flatten()(x)

    # Fine tuning
    x = keras.layers.Dense(1024, activation='relu')(x)
    x = layers.Dropout(0.2)(x)
    x = keras.layers.Dense(512, activation='relu')(x)
    x = keras.layers.GlobalMaxPool2D()
    
    return x

GRU Model

I build the time part of the network with a GRU layer

def action_model(shape=(15, 600, 600, 3), nbout=15):
    # Create our convnet with (112, 112, 3) input shape
    convnet = build_convnet(shape[1:]) #[1:]
    
    # then create our final model
    model = keras.Sequential()
    # add the convnet with (5, 112, 112, 3) shape
    model.add(TimeDistributed(convnet, input_shape=shape))
    # here, you can also use GRU or LSTM
    model.add(GRU(64))
    # and finally, we make a decision network
    model.add(Dense(1024, activation='relu'))
    model.add(Dropout(.5))
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(.5))
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(.5))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(15, activation='softmax'))
    return model

Transfer Learning

I retrain a part of the GRU

convBase = DenseNet121(include_top=False, weights=None, input_shape=(600,600,3), pooling="avg")

for layer in convBase.layers: 
  if 'conv5' in layer.name:
    layer.trainable = True
for layer in convBase.layers: 
  if 'conv4' in layer.name:
    layer.trainable = True

Model Compile

Model compilation ( image size= 600x600x3)

INSHAPE=(15, 600, 600, 3) # (5, 112, 112, 3)
model = action_model(INSHAPE, 1)
optimizer = keras.optimizers.Adam(0.001)

model.compile(
    optimizer,
    'categorical_crossentropy',
    metrics='accuracy'
)

Model Fit

Here I manually batch my data. I turn an array (60, 600, 600, 3) into a (4,15,600,600) array. Meaning 4 batches each one containing a 15-map long sequence.


epochs = 10

for value in range(0, epochs):
    
    train_x, train_y = sequence_x(train_ds), sequence_y(train_ds)
    val_x, val_y = sequence_x(validation_ds), sequence_y(validation_ds)
    
    for i in range(0,278): #
        
        x = next(train_x, "none")
        y = next(train_y, "none")
        
        if (x!="none" or y!="none"):

            if (np.any(x) and np.any(y)):

                x_stack = np.stack((x[:15], x[15:30], x[30:45], x[45:]))
                y_stack = np.stack((y[:15], y[15:30], y[30:45], y[45:]))
                y_stack=y_stack.reshape(4,15)

                model.fit(x=x_stack, y=y_stack, 
                            validation_data=None, 
                            batch_size=None,
                            shuffle=False
                            )

            else:
                continue
        else:
            continue

The idea is to get a model that, when presented with a sequence of images, can categorize each one of them with a 0 or a 1 if they have a Medicane or not.


The model does compile without any errors but the results it provides are horrible:

(Image 1).

What am I doing incorrectly? Is there a more effective way to write all of this?

0

There are 0 best solutions below