How can I edit the wake-word detection notebook on Coursera to fit my own wake word?


I have worked through and completed the wake-word detection notebook from the Deep Learning Specialization (Sequence Models course) by Andrew Ng on Coursera. The notebook is a detailed walkthrough for building a wake-word detection model, but at the end it loads a pre-trained model trained on the word "activate."

I attempted to reproduce it in Google Colab with my own data. I collected 369 recordings of people saying "Alexa," which are available on Kaggle; however, they have a sample rate of 16,000 Hz (16 kHz).
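If I understand the notebook correctly, its audio is 44.1 kHz, so my 16 kHz clips need resampling first. Here is a minimal sketch of how I do that, assuming librosa and soundfile are available (the file name and the 44,100 Hz target rate are my assumptions, not from the notebook):

import librosa
import soundfile as sf

# Load one "Alexa" clip at its native rate (16 kHz here), then resample
# it to 44.1 kHz to match the notebook's background clips.
y, sr = librosa.load('alexa_001.wav', sr=None)  # placeholder file name
y_44k = librosa.resample(y, orig_sr=sr, target_sr=44100)
sf.write('alexa_001_44k.wav', y_44k, 44100)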

I also used the Google Speech Commands dataset as negative sounds, and I collected some clips from YouTube containing various environmental sounds.

I followed all the steps exactly as instructed, and I am using Google Colab for training. But when I try to create the dataset, the RAM quickly fills up, and I cannot create the 4,000 samples Andrew mentions in his notebook.

Here is my code for "create_training_examples":

import os
import random
import numpy as np

nsamples = 4000

X_train, Y_train = [], []
X_test, Y_test = [], []
train_count = 0
test_count = 0

# Make sure the output directories exist before saving.
os.makedirs('XY_train', exist_ok=True)
os.makedirs('XY_test', exist_ok=True)

for i in range(nsamples):
    if i % 500 == 0:
        print(i)

    # Pick one of the 62 background clips at random.
    rand = random.randint(0, 61)

    # Send every fifth example to the test set (an 80/20 split).
    if i % 5 == 0:
        x, y = create_data_example(backgrounds_list[rand], alexa_list,
                                   negatives_list, Ty, name=str(i), to_test=True)
        X_test.append(x.swapaxes(0, 1))   # (n_freq, Tx) -> (Tx, n_freq)
        Y_test.append(y.swapaxes(0, 1))
        test_count += 1
    else:
        x, y = create_data_example(backgrounds_list[rand], alexa_list,
                                   negatives_list, Ty, name=str(i), to_test=False)
        X_train.append(x.swapaxes(0, 1))
        Y_train.append(y.swapaxes(0, 1))
        train_count += 1

print("Number of training samples:", train_count)
print("Number of testing samples:", test_count)

X_train = np.array(X_train)
Y_train = np.array(Y_train)
np.save('XY_train/X_train.npy', X_train)
np.save('XY_train/Y_train.npy', Y_train)

X_test = np.array(X_test)
Y_test = np.array(Y_test)
np.save('XY_test/X_test.npy', X_test)
np.save('XY_test/Y_test.npy', Y_test)

print('done saving')
print('X_train.shape: ', X_train.shape)
print('Y_train.shape: ', Y_train.shape)
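Since appending every example to a Python list is what exhausts the RAM, one workaround I am considering is saving each example to its own .npy file as soon as it is generated, and then loading batches lazily at training time. A minimal sketch of the loader, assuming per-example files named x_<i>.npy / y_<i>.npy (my own naming, not the notebook's):

import numpy as np
from tensorflow.keras.utils import Sequence

class WakeWordSequence(Sequence):
    """Loads one batch of pre-saved .npy examples at a time, so the
    full dataset never has to sit in RAM at once."""

    def __init__(self, file_dir, n_examples, batch_size=5):
        self.file_dir = file_dir
        self.n_examples = n_examples
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches per epoch.
        return int(np.ceil(self.n_examples / self.batch_size))

    def __getitem__(self, idx):
        # Load only the files belonging to batch `idx`.
        start = idx * self.batch_size
        stop = min(start + self.batch_size, self.n_examples)
        xs = [np.load(f'{self.file_dir}/x_{i}.npy') for i in range(start, stop)]
        ys = [np.load(f'{self.file_dir}/y_{i}.npy') for i in range(start, stop)]
        return np.stack(xs), np.stack(ys)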


Here is the model I use:

from tensorflow.keras.layers import (Input, Conv1D, BatchNormalization,
                                     Activation, Dropout, GRU,
                                     TimeDistributed, Dense)
from tensorflow.keras.models import Model

def model(input_shape):

    X_input = Input(shape=input_shape)

    # Conv1D step: shrinks the time dimension from Tx down toward Ty.
    X = Conv1D(196, kernel_size=15, strides=4)(X_input)
    X = BatchNormalization()(X)
    X = Activation('relu')(X)
    X = Dropout(0.8)(X)

    # First GRU layer
    X = GRU(128, return_sequences=True)(X)
    X = Dropout(0.8)(X)
    X = BatchNormalization()(X)

    # Second GRU layer
    X = GRU(128, return_sequences=True)(X)
    X = Dropout(0.8)(X)
    X = BatchNormalization()(X)
    X = Dropout(0.8)(X)

    # One sigmoid unit per output time step
    X = TimeDistributed(Dense(1, activation='sigmoid'))(X)

    model = Model(inputs=X_input, outputs=X)

    return model
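For reference, I instantiate it with the spectrogram dimensions from the original notebook (Tx = 5511 time steps and n_freq = 101 frequency bins for a 10-second clip; these values would change if your feature extraction differs):

Tx = 5511      # spectrogram time steps for a 10 s clip (notebook value)
n_freq = 101   # frequency bins per spectrogram step (notebook value)
model = model(input_shape=(Tx, n_freq))
model.summary()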


And for training:

from tensorflow.keras.optimizers import Adam

# Recent Keras versions spell `lr` as `learning_rate` and no longer
# accept the old `decay` argument, so it is omitted here (a
# LearningRateSchedule would be the modern replacement).
opt = Adam(learning_rate=0.0001, beta_1=0.9, beta_2=0.999)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
model.fit(X_train, Y_train, batch_size=5, epochs=20, validation_data=(X_test, Y_test))
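If the examples were saved per-file as in the generator sketch above, the fit call could take the sequences directly instead of the in-memory arrays (3,200/800 matches the 80/20 split produced by the i % 5 test rule):

train_seq = WakeWordSequence('XY_train', n_examples=3200, batch_size=5)
test_seq = WakeWordSequence('XY_test', n_examples=800, batch_size=5)
model.fit(train_seq, epochs=20, validation_data=test_seq)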


I used the notebook as-is, even following the same feature-extraction method; the only thing I modified was the training data. Still, the RAM filled up quickly, and when I reduced the number of examples to 1,600 or downsampled the audio to 8,000 Hz, I didn't get good results at all.

I also tried changing batch_size and the learning rate; nothing changed.

Am I doing something wrong?

Do you have any advice or suggestions, please?
