I am currently working with LSTMs. I have a dataset of sentences containing transactional information, and I want to extract fields such as the amount, date, and transactionWith. I have already tried a basic LSTM where the system predicts a tag for each word of a given sequence: amount, date, transactionWith, or irrelevant.
I structured my training data like this:
Input:
You gave 100.00 to John on 13-08-2018
Target: (one label per word)
ir ir amount ir transactionWith ir date
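For concreteness, this is how I encode the per-word labels as integer ids before feeding them to the LSTM; a minimal sketch in plain Python (the tag names are from my example above, the id assignment itself is arbitrary):

```python
# Map each tag to an integer id; "ir" = irrelevant (id choice is arbitrary).
TAGS = {"ir": 0, "amount": 1, "date": 2, "transactionWith": 3}

def encode_example(sentence, tags):
    """Pair each word of a sentence with the integer id of its tag."""
    words = sentence.split()
    tag_ids = [TAGS[t] for t in tags.split()]
    assert len(words) == len(tag_ids), "need exactly one tag per word"
    return words, tag_ids

words, tag_ids = encode_example(
    "You gave 100.00 to John on 13-08-2018",
    "ir ir amount ir transactionWith ir date",
)
# tag_ids == [0, 0, 1, 0, 3, 0, 2]
```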
You can see that the dataset contains mostly "ir" (irrelevant) tags, and I think this imbalance will bias the system toward predicting "ir" on test data.
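One common way to counter that imbalance is to weight the loss per tag, e.g. inversely to tag frequency, so mistakes on rare tags cost more than mistakes on "ir". A rough sketch of computing such weights (the single example sequence is just illustrative):

```python
from collections import Counter

def inverse_frequency_weights(tag_sequences):
    """Weight each tag inversely to its corpus frequency, normalized so
    the most common tag gets weight 1.0 and rarer tags get more."""
    counts = Counter(t for seq in tag_sequences for t in seq)
    max_count = max(counts.values())
    return {tag: max_count / c for tag, c in counts.items()}

weights = inverse_frequency_weights([
    ["ir", "ir", "amount", "ir", "transactionWith", "ir", "date"],
])
# "ir" occurs 4x and each other tag once, so the rare tags get weight 4.0
```

A dictionary like this could then be passed (after mapping tags to ids) as the `class_weight` argument of Keras's `Model.fit`, or used to build per-token sample weights.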
Now I want to try TensorFlow's seq2seq model, where the input is a transactional sentence and the target is a sequence of the extracted information. An example would look like this:
Input:
You gave 100.00 to John on 13-08-2018.
Target:
100.00 13-08-2018 John
Here all my target sequences will follow a fixed format: the first token is the amount, the second is the date, the third is transactionWith, and so on.
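In the usual seq2seq setup, the decoder target also gets special start/end tokens around the fixed field order; a sketch of how I would build the training pairs (the `<go>`/`<eos>` token names are my assumption, not anything TensorFlow requires):

```python
def make_pair(sentence, amount, date, partner):
    """Build an (encoder_input, decoder_target) pair with the fixed
    field order: amount, date, transactionWith."""
    encoder_input = sentence.split()
    # <go> tells the decoder to start; <eos> tells it to stop.
    decoder_target = ["<go>", amount, date, partner, "<eos>"]
    return encoder_input, decoder_target

enc, dec = make_pair(
    "You gave 100.00 to John on 13-08-2018 .",
    "100.00", "13-08-2018", "John",
)
# dec == ["<go>", "100.00", "13-08-2018", "John", "<eos>"]
```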
Can I do this like a language-translation model, with an encoder for the input sequence and a decoder for the target sequence? And how can I make sure that the predicted sequence for a test example comes from the vocabulary of that single input sentence, rather than from the entire target vocabulary?
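The restriction I have in mind could, I think, be approximated at decoding time by masking the decoder's logits so that only token ids present in the current input sentence can be predicted (a crude stand-in for a proper pointer network or copy mechanism); a NumPy sketch with a toy vocabulary:

```python
import numpy as np

def mask_to_input_vocab(logits, input_token_ids):
    """Set logits of all tokens absent from the input sentence to -inf,
    so argmax can only pick tokens that actually occur in the input."""
    masked = np.full_like(logits, -np.inf)
    ids = list(set(input_token_ids))
    masked[ids] = logits[ids]
    return masked

# Toy vocabulary of size 6; the input sentence uses token ids {1, 3, 4}.
logits = np.array([2.0, 0.5, 3.0, 1.0, 0.1, 2.5])
masked = mask_to_input_vocab(logits, [1, 3, 4, 3])
# Unmasked argmax would pick token 2; after masking it picks token 3.
```

Is this a reasonable approach, or should I be looking at pointer networks / copy mechanisms instead?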
Thank you all the awesome people in advance. :)