How to prepare data for a 1D CNN


I was wondering if someone could clear up my confusion. I have this code:

import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPool1D, BatchNormalization, Flatten, Dense

def create_data(df):

    logits = []
    labels = []

    for x in range(df.shape[0]):

        current = df.iloc[x]

        logits.append(np.array([current["Taste"],current["Look"]]))
        labels.append(current["Score"])

    return np.array(logits), np.array(labels)

X, y = create_data(df)

Tx, Testx, Ty, Testy = train_test_split(X,y,train_size=0.8)

def create_model():

    model = Sequential()
    model.add(Conv1D(128,3,input_shape=(2,1),activation='relu'))
    model.add(MaxPool1D())
    model.add(Conv1D(64,3,activation='relu'))
    model.add(MaxPool1D())
    model.add(BatchNormalization())
    model.add(Flatten())
    model.add(Dense(64,activation='relu'))
    model.add(Dense(1,activation='relu'))

    return model

model = create_model()
model.compile(optimizer='adam',loss='mse',metrics=['accuracy'])

model.fit(Tx, Ty, batch_size=10, epochs=10)

and I get this error:

ValueError: Input 0 of layer sequential is incompatible with the layer: : expected min_ndim=3, found ndim=2. Full shape received: (None, 2)

I know this has something to do with how I'm preparing my data, but I'm super confused. When I researched this, there was a lot of mention of timesteps, but how does that apply to vector classification?

The question that I hope to get answered is how do I prepare the data correctly?


The input shape for the first Conv1D layer is (number of data points per sample, number of channels). Think of each sample as a single row of pixels: the input shape is the number of columns by the number of channels. For an RGB image the channel count would be 3, but since each of your data points is a single value, the channel count here is 1.
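Applied to the data in the question, each sample has two features, so the array returned by create_data has shape (samples, 2) and needs a trailing channel axis to become (samples, 2, 1). A minimal sketch of that reshape, using dummy data in place of the real DataFrame:

```python
import numpy as np

# dummy stand-in for the (samples, features) array from create_data
X = np.random.randn(500, 2)

# add a trailing channel axis: (500, 2) -> (500, 2, 1)
X = np.expand_dims(X, axis=-1)   # equivalent to X.reshape(-1, 2, 1)
print(X.shape)  # (500, 2, 1)
```

Note that with only two data points per sample, a Conv1D kernel of size 3 has nothing to slide over (the output length would be 2 - 3 + 1 = 0), so the kernel sizes in the question's model would also need to be reduced, and the pooling layers removed or adjusted, for such short inputs.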

Working example:

import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def create_model():
    model = keras.Sequential()
    model.add(layers.Conv1D(128,3,input_shape=(100,1), activation='relu'))
    model.add(layers.MaxPool1D())
    model.add(layers.Conv1D(64,3,activation='relu'))
    model.add(layers.MaxPool1D())
    model.add(layers.BatchNormalization())
    model.add(layers.Flatten())
    model.add(layers.Dense(64,activation='relu'))
    model.add(layers.Dense(1,activation='relu'))

    return model

model = create_model()
model.compile(optimizer='adam',loss='mse',metrics=['accuracy'])

# dummy 500 samples each having 100 data points
x = np.random.randn(500, 100, 1)
y = np.random.randn(500, 1)
model.fit(x, y, batch_size=10, epochs=2)

Output:

Epoch 1/2
50/50 [==============================] - 1s 8ms/step - loss: 0.9799 - accuracy: 0.0000e+00
Epoch 2/2
50/50 [==============================] - 0s 7ms/step - loss: 0.9930 - accuracy: 0.0000e+00