I'm building a CNN model for image classification for the first time, and I'm a little confused about what the input shape should be for each type (1D CNN, 2D CNN, 3D CNN) and how to choose the number of filters in the convolution layers. My data is 100x100x30, where 30 is the number of features. Here is my attempt at the 1D CNN using the Keras Functional API:
from tensorflow.keras.layers import Input, Conv1D, MaxPooling1D, Flatten, Dense
from tensorflow.keras.models import Model

def create_CNN1D_model(pool_type='max', conv_activation='relu'):
    # 30 features treated as a length-30 sequence with a single channel
    input_layer = Input(shape=(30, 1))
    conv_layer1 = Conv1D(filters=16, kernel_size=3, activation=conv_activation)(input_layer)
    max_pooling_layer1 = MaxPooling1D(pool_size=2)(conv_layer1)
    conv_layer2 = Conv1D(filters=32, kernel_size=3, activation=conv_activation)(max_pooling_layer1)
    max_pooling_layer2 = MaxPooling1D(pool_size=2)(conv_layer2)
    flatten_layer = Flatten()(max_pooling_layer2)
    dense_layer = Dense(units=64, activation='relu')(flatten_layer)
    # 10 output classes with softmax probabilities
    output_layer = Dense(units=10, activation='softmax')(dense_layer)
    CNN_model = Model(inputs=input_layer, outputs=output_layer)
    return CNN_model

CNN1D = create_CNN1D_model()
CNN1D.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# X is expected to have shape (n_samples, 30, 1); y is one-hot encoded with 10 classes
Trace = CNN1D.fit(X, y, epochs=50, batch_size=100)
However, when I tried the 2D CNN model by just changing Conv1D and MaxPooling1D to Conv2D and MaxPooling2D, I got the following error:
ValueError: Input 0 of layer conv2d_1 is incompatible with the layer: : expected min_ndim=4, found ndim=3. Full shape received: (None, 30, 1)
Can anyone please tell me what the input shape should be for a 2D CNN and a 3D CNN? And what preprocessing can be done on the input data?
TL;DR: your X_train can be looked at as (batch, spatial dims..., channels). A kernel applies to the spatial dimensions for all channels in parallel. So a 2D CNN requires two spatial dimensions: (batch, dim 1, dim 2, channels). So for (100,100,3) shaped images, you will need a 2D CNN that convolves over 100 height and 100 width, across all 3 channels.
Let's understand the above statement.
First, you need to understand what CNN (in general) is doing.
Kernel moves over the spatial dimensions
Now, let's say you have a batch of 100 images. Each image is 28 by 28 pixels and has 3 channels R, G, B (which are also called feature maps in the context of CNNs). If I were to store this data as a tensor, the shape would be (100,28,28,3). However, I could just have an image that doesn't have any height (maybe like a signal), or I could have data that has an extra spatial dimension, such as a video (height, width, and time).
In general, here is what the input to a CNN-based neural network looks like: (batch, spatial dims..., channels).
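To make the layout concrete, here is a small NumPy sketch of the three cases just mentioned. The array names and the sizes for the signal and video (28 steps, 10 frames) are made up for illustration; only the (100,28,28,3) image batch comes from the text above.

```python
import numpy as np

# Hypothetical batches of 100 samples each, channels last.
signal_batch = np.zeros((100, 28, 3))         # 1 spatial dim:  (batch, steps, channels)
image_batch = np.zeros((100, 28, 28, 3))      # 2 spatial dims: (batch, height, width, channels)
video_batch = np.zeros((100, 28, 28, 10, 3))  # 3 spatial dims: (batch, height, width, time, channels)

for name, tensor in [("signal", signal_batch),
                     ("image", image_batch),
                     ("video", video_batch)]:
    # Everything between the batch axis and the channel axis is spatial.
    spatial_dims = tensor.shape[1:-1]
    print(f"{name}: shape={tensor.shape}, "
          f"spatial dims={len(spatial_dims)} -> use a {len(spatial_dims)}D CNN")
```

The number of axes between the batch axis and the channel axis is what decides whether a 1D, 2D, or 3D convolution fits.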
Same kernel, all channels
The second key point you need to know is that a 2D kernel will convolve over 2 spatial dimensions, BUT the same kernel will do this over all the feature maps/channels. So, if I have a (3,3) kernel, this same kernel gets applied over the R, G, B channels (in parallel; in Keras its weights span all input channels, so each filter has shape (3,3,3) here) and moves over the Height and Width of the image.
Operation is a dot product
Finally, the operation (for a single feature map/channel and single convolution window) can be visualized like below.
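As a rough sketch of that single step, here it is in NumPy for one channel and one 3x3 window. The window and kernel values are invented for illustration.

```python
import numpy as np

# One 3x3 window cut out of a single-channel image (made-up values).
window = np.array([[1., 2., 3.],
                   [4., 5., 6.],
                   [7., 8., 9.]])

# One 3x3 kernel (made-up weights; this one responds to horizontal change).
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])

# Element-wise multiply and sum, which is the same as a dot product
# of the two flattened arrays.
value = np.sum(window * kernel)
same_value = np.dot(window.ravel(), kernel.ravel())

print(value, same_value)  # -6.0 -6.0
```

Sliding the window over every valid position and repeating this step gives one output feature map.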
Therefore, in short -
Let's take the example of tensors with a single feature map/channel (so, for an image, it would be greyscale) -
Let's try this with code -
For applying 1D CNN -
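A minimal sketch, assuming TensorFlow's Keras is installed. The data is random and the sizes (8 samples, 30 steps, 1 channel, 16 filters) are only illustrative; (30, 1) matches the input shape used in the question.

```python
import numpy as np
from tensorflow.keras import layers, Model

# Hypothetical data: 8 samples, 30 steps along one spatial dim, 1 channel.
X = np.random.rand(8, 30, 1).astype("float32")

inp = layers.Input(shape=(30, 1))  # (spatial dim, channels); batch is implicit
x = layers.Conv1D(filters=16, kernel_size=3, activation="relu")(inp)
model = Model(inp, x)

out = model.predict(X, verbose=0)
print(out.shape)  # (8, 28, 16)
```

With the default 'valid' padding the spatial length shrinks from 30 to 30 - 3 + 1 = 28, and the last axis becomes the number of filters.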
For applying 2D CNN -
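A minimal sketch along the same lines, again assuming TensorFlow's Keras. The batch of 4 random images is made up; the (100, 100, 3) image shape is the one used in the TL;DR above. This is also the 4-dimensional input (batch, height, width, channels) that the `expected min_ndim=4` error in the question is asking for.

```python
import numpy as np
from tensorflow.keras import layers, Model

# Hypothetical data: 4 images, 100 x 100 pixels, 3 channels.
X = np.random.rand(4, 100, 100, 3).astype("float32")

inp = layers.Input(shape=(100, 100, 3))  # (height, width, channels)
x = layers.Conv2D(filters=16, kernel_size=(3, 3), activation="relu")(inp)
model = Model(inp, x)

out = model.predict(X, verbose=0)
print(out.shape)  # (4, 98, 98, 16)
```

Both spatial dimensions shrink by kernel_size - 1 with 'valid' padding, and the 3 input channels are replaced by 16 feature maps.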
For 3D CNN -
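And a sketch for three spatial dimensions, assuming TensorFlow's Keras as before. The "video" here is random data with invented sizes (32 x 32 pixels, 10 frames, 3 channels), ordered as (height, width, time) to follow the description above.

```python
import numpy as np
from tensorflow.keras import layers, Model

# Hypothetical data: 4 clips, 32 x 32 pixels, 10 frames, 3 channels.
X = np.random.rand(4, 32, 32, 10, 3).astype("float32")

inp = layers.Input(shape=(32, 32, 10, 3))  # (height, width, time, channels)
x = layers.Conv3D(filters=16, kernel_size=(3, 3, 3), activation="relu")(inp)
model = Model(inp, x)

out = model.predict(X, verbose=0)
print(out.shape)  # (4, 30, 30, 8, 16)
```

The kernel now moves along height, width, and time, so all three spatial dimensions shrink by 2 with 'valid' padding.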