I have 70k text samples which I have encoded using Keras 'one hot' preprocessing. This gives me an array of integers such as [40, 20, 142...]
which I then pad to a length of 28 (the longest sample length). All I am trying to do is predict a categorical label (0 to 5, let's say) from these values. When I train the model I cannot get anything beyond -.13% accuracy (currently my error is this; I have tried many ways to pass the input).
This is my data currently, and I am just trying to create a simple LSTM. Again, my data is X -> [28 integer values, the encoded text] and Y -> [1 integer of length 3 (100, 143, etc.)]. Any idea what I am doing wrong? I have asked many people and no one has been able to help. Here is the code for my model:
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense
from keras.optimizers import RMSprop

optimizer = RMSprop(lr=0.01)  # saw this online, no idea
model = Sequential()
model.add(Embedding(input_dim=28, output_dim=1, init='uniform'))  # 28 features, 1 dim output?
model.add(LSTM(150))  # just adding my LSTM nodes
model.add(Dense(1))  # since I want my output to be 1 integer value
model.compile(loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
model.summary()  # prints the layer summary itself
Edit:
Using model.add(Embedding(input_dim=900, output_dim=8, init='uniform'))
seems to work; however, the accuracy still never improves. I am at a loss as to what to do.
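For reference, here is a minimal sketch of the preprocessing described above, written in plain Python. It is only a stand-in for the Keras 'one hot' and padding steps, not the actual pipeline: the helper names `encode_words` and `pad`, the sample sentence, and the hashing range of 900 (borrowed from the Edit) are all assumptions for illustration.

```python
import hashlib

MAX_LEN = 28      # longest sample length, from the question
VOCAB_SIZE = 900  # assumed hashing range, borrowed from the Edit's input_dim


def encode_words(text, vocab_size=VOCAB_SIZE):
    """Map each word to an integer in [1, vocab_size - 1]; 0 is kept for padding."""
    ids = []
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        ids.append(int(digest, 16) % (vocab_size - 1) + 1)
    return ids


def pad(ids, max_len=MAX_LEN):
    """Left-pad with zeros, or keep only the last max_len ids, so every row has max_len entries."""
    ids = ids[-max_len:]
    return [0] * (max_len - len(ids)) + ids


sample = "the quick brown fox"  # hypothetical sample text
row = pad(encode_words(sample))
print(len(row), max(row) < VOCAB_SIZE)
```

Each row has 28 entries, but the values inside it range up to the hashing size. It is the largest index in a row, not the row length, that has to stay below the Embedding layer's input_dim.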
I have two suggestions.