I have worked through and understood the notebook from the Deep Learning Specialization (Sequence Models course) on Coursera by Andrew Ng. In the notebook, he provides a detailed walkthrough for building a wake word detection model, but at the end he loads a pre-trained model that was trained on the word "activate."
I attempted to reproduce this in Google Colab with my own data. I collected 369 recordings of people saying "Alexa," which are available on Kaggle. However, they have a sample rate of 16 kHz (16,000 Hz).
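Since my clips are 16 kHz while the notebook's background audio is at a different rate, I resample everything to one common rate before mixing. A minimal sketch of what I do, assuming SciPy is available (the 44.1 kHz random clip is just a placeholder for a real background recording):

```python
import numpy as np
from scipy.signal import resample_poly

def to_target_rate(audio, sr_in, sr_out=16000):
    """Resample a 1-D audio array from sr_in to sr_out with polyphase filtering."""
    g = np.gcd(sr_in, sr_out)
    return resample_poly(audio, sr_out // g, sr_in // g)

# example: one second of 44.1 kHz noise down to 16 kHz
clip = np.random.randn(44100).astype(np.float32)
resampled = to_target_rate(clip, 44100, 16000)
print(resampled.shape)  # one second at 16 kHz -> (16000,)
```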
I also used Google voice commands as negative sounds and collected some clips from YouTube that contain various environmental sounds.
I followed all the steps exactly as instructed, and I'm using Google Colab for training. But when I try to create the dataset, the RAM quickly fills up, and I cannot create the 4,000 samples that Andrew uses in his notebook.
Here is my code for "create_training_examples":
```python
nsamples = 4000
X_train = []
Y_train = []
X_test = []
Y_test = []
train_count = 0
test_count = 0

for i in range(nsamples):
    if i % 500 == 0:
        print(i)
    rand = random.randint(0, 61)  # pick one of the 62 background clips
    if i % 5 == 0:
        # every fifth example goes to the test set
        x, y = create_data_example(backgrounds_list[rand], alexa_list,
                                   negatives_list, Ty, name=str(i), to_test=True)
        X_test.append(x.swapaxes(0, 1))
        Y_test.append(y.swapaxes(0, 1))
        test_count += 1
    else:
        x, y = create_data_example(backgrounds_list[rand], alexa_list,
                                   negatives_list, Ty, name=str(i), to_test=False)
        X_train.append(x.swapaxes(0, 1))
        Y_train.append(y.swapaxes(0, 1))
        train_count += 1

print("Number of training samples:", train_count)
print("Number of testing samples:", test_count)

X_train = np.array(X_train)
Y_train = np.array(Y_train)
np.save('XY_train/X_train.npy', X_train)
np.save('XY_train/Y_train.npy', Y_train)

X_test = np.array(X_test)
Y_test = np.array(Y_test)
np.save('XY_test/X_test.npy', X_test)
np.save('XY_test/Y_test.npy', Y_test)
print('done saving')

print('X_train.shape: ', X_train.shape)
print('Y_train.shape: ', Y_train.shape)
```
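One workaround I'm considering is writing each example straight into a preallocated, memory-mapped .npy file instead of growing Python lists, so the spectrograms never accumulate in RAM. A rough, self-contained sketch of the idea (the random stand-in replaces my real create_data_example, and I've shrunk nsamples for the demo; the shapes Tx=5511, n_freq=101, Ty=1375 come from the notebook):

```python
import numpy as np

nsamples, Tx, n_freq, Ty = 8, 5511, 101, 1375  # tiny demo; the real run uses 4000

# preallocate the output arrays on disk, not in RAM
X = np.lib.format.open_memmap('X_train_demo.npy', mode='w+',
                              dtype=np.float32, shape=(nsamples, Tx, n_freq))
Y = np.lib.format.open_memmap('Y_train_demo.npy', mode='w+',
                              dtype=np.float32, shape=(nsamples, Ty, 1))

for i in range(nsamples):
    # stand-in for create_data_example(...): returns (n_freq, Tx) and (1, Ty)
    x = np.random.rand(n_freq, Tx).astype(np.float32)
    y = np.random.rand(1, Ty).astype(np.float32)
    X[i] = x.swapaxes(0, 1)  # each example is written to disk immediately
    Y[i] = y.swapaxes(0, 1)

X.flush()
Y.flush()
print(X.shape, Y.shape)
```

I'm not sure this is the cleanest approach, but it would keep peak memory at roughly one example instead of all 4,000.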
Here is the model I use:
```python
def model(input_shape):
    X_input = Input(shape=input_shape)

    X = Conv1D(196, 15, strides=4)(X_input)
    X = BatchNormalization()(X)
    X = Activation('relu')(X)
    X = Dropout(0.8)(X)

    X = GRU(128, return_sequences=True)(X)
    X = Dropout(0.8)(X)
    X = BatchNormalization()(X)

    X = GRU(128, return_sequences=True)(X)
    X = Dropout(0.8)(X)
    X = BatchNormalization()(X)
    X = Dropout(0.8)(X)

    X = TimeDistributed(Dense(1, activation="sigmoid"))(X)  # time-distributed sigmoid

    model = Model(inputs=X_input, outputs=X)
    return model
```
And for training:
```python
# `lr` is deprecated in newer Keras versions; `learning_rate` is the current name
opt = Adam(learning_rate=0.0001, beta_1=0.9, beta_2=0.999, decay=0.01)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=["accuracy"])
model.fit(X_train, Y_train, batch_size=5, epochs=20, validation_data=(X_test, Y_test))
```
I have used the notebook as-is, even following the same method for feature extraction; the only thing I modified was the training data. What happened is that the RAM quickly filled up. And when I reduced the number of samples to 1,600, or downsampled the audio to 8,000 Hz, I didn't get good results at all.
I also tried changing batch_size and learning_rate, but nothing changed.
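Relatedly, I'm wondering whether memory-mapping the saved arrays at training time (instead of letting np.load pull everything into RAM) would help. A tiny self-contained demo of mmap_mode, with a throwaway array standing in for my real X_train file:

```python
import numpy as np

# save a small stand-in array, then reopen it memory-mapped
arr = np.arange(12, dtype=np.float32).reshape(3, 4)
np.save('demo_X.npy', arr)

X = np.load('demo_X.npy', mmap_mode='r')  # data stays on disk until sliced
print(type(X).__name__, X.shape)  # memmap (3, 4)
```

If this is valid, I'd pass the memory-mapped arrays straight to model.fit so batches are read from disk on demand.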
Am I doing something wrong? Do you have any advice or suggestions?