I'm trying to train a language model with LSTM based on Penn Treebank (PTB) corpus.
I was thinking that I should simply train with every bigram in the corpus so that it could predict the next word given previous words, but then it wouldn't be able to predict next word based on multiple preceding words.
So what exactly is it to train a language model?
In my current implementation, I have batch size=20 and the vocabulary size is 10000, so I have 20 resulting matrices of 10k entries (parameters?) and the loss is calculated by making comparison to 20 ground-truth matrices of 10k entries, where only the index for actual next word is 1 and other entries are zero. Is this a right implementation? I'm getting perplexity of around 2 that hardly changes over iterations, which is definitely not in a right range of what it usually is, say around 100.
how to learn language model?
102 Views Asked by ytrewq At
1
There are 1 best solutions below
Related Questions in MACHINE-LEARNING
- Add image to JCheckBoxMenuItem
- How to access invisible Unordered List element with Selenium WebDriver using Java
- Inheritance in Java, apparent type vs actual type
- Java catch the ball Game
- Access objects variable & method by name
- GridBagLayout is displaying JTextField and JTextArea as short, vertical lines
- Perform a task each interval
- Compound classes stored in an array are not accessible in selenium java
- How to avoid concurrent access to a resource?
- Why does processing goes slower on implementing try catch block in java?
Related Questions in NLP
- Add image to JCheckBoxMenuItem
- How to access invisible Unordered List element with Selenium WebDriver using Java
- Inheritance in Java, apparent type vs actual type
- Java catch the ball Game
- Access objects variable & method by name
- GridBagLayout is displaying JTextField and JTextArea as short, vertical lines
- Perform a task each interval
- Compound classes stored in an array are not accessible in selenium java
- How to avoid concurrent access to a resource?
- Why does processing goes slower on implementing try catch block in java?
Related Questions in LSTM
- Add image to JCheckBoxMenuItem
- How to access invisible Unordered List element with Selenium WebDriver using Java
- Inheritance in Java, apparent type vs actual type
- Java catch the ball Game
- Access objects variable & method by name
- GridBagLayout is displaying JTextField and JTextArea as short, vertical lines
- Perform a task each interval
- Compound classes stored in an array are not accessible in selenium java
- How to avoid concurrent access to a resource?
- Why does processing goes slower on implementing try catch block in java?
Related Questions in LANGUAGE-MODEL
- Add image to JCheckBoxMenuItem
- How to access invisible Unordered List element with Selenium WebDriver using Java
- Inheritance in Java, apparent type vs actual type
- Java catch the ball Game
- Access objects variable & method by name
- GridBagLayout is displaying JTextField and JTextArea as short, vertical lines
- Perform a task each interval
- Compound classes stored in an array are not accessible in selenium java
- How to avoid concurrent access to a resource?
- Why does processing goes slower on implementing try catch block in java?
Related Questions in PENN-TREEBANK
- Add image to JCheckBoxMenuItem
- How to access invisible Unordered List element with Selenium WebDriver using Java
- Inheritance in Java, apparent type vs actual type
- Java catch the ball Game
- Access objects variable & method by name
- GridBagLayout is displaying JTextField and JTextArea as short, vertical lines
- Perform a task each interval
- Compound classes stored in an array are not accessible in selenium java
- How to avoid concurrent access to a resource?
- Why does processing goes slower on implementing try catch block in java?
Trending Questions
- UIImageView Frame Doesn't Reflect Constraints
- Is it possible to use adb commands to click on a view by finding its ID?
- How to create a new web character symbol recognizable by html/javascript?
- Why isn't my CSS3 animation smooth in Google Chrome (but very smooth on other browsers)?
- Heap Gives Page Fault
- Connect ffmpeg to Visual Studio 2008
- Both Object- and ValueAnimator jumps when Duration is set above API LvL 24
- How to avoid default initialization of objects in std::vector?
- second argument of the command line arguments in a format other than char** argv or char* argv[]
- How to improve efficiency of algorithm which generates next lexicographic permutation?
- Navigating to the another actvity app getting crash in android
- How to read the particular message format in android and store in sqlite database?
- Resetting inventory status after order is cancelled
- Efficiently compute powers of X in SSE/AVX
- Insert into an external database using ajax and php : POST 500 (Internal Server Error)
Popular # Hahtags
Popular Questions
- How do I undo the most recent local commits in Git?
- How can I remove a specific item from an array in JavaScript?
- How do I delete a Git branch locally and remotely?
- Find all files containing a specific text (string) on Linux?
- How do I revert a Git repository to a previous commit?
- How do I create an HTML button that acts like a link?
- How do I check out a remote Git branch?
- How do I force "git pull" to overwrite local files?
- How do I list all files of a directory?
- How to check whether a string contains a substring in JavaScript?
- How do I redirect to another webpage?
- How can I iterate over rows in a Pandas DataFrame?
- How do I convert a String to an int in Java?
- Does Python have a string 'contains' substring method?
- How do I check if a string contains a specific word?
I think you don't need to train with every bigram in the corpus. Just use a sequence to sequence model, and when you predict the next word given previous words you just choose the one with the highest probability.
Yes, per step of decoding.
You can first read some open source code as a reference. For instance: word-rnn-tensorflow and char-rnn-tensorflow. The perplexity is at large -log(1/10000) which is around 9 per word(which means the model is not trained at all and selects the words totally randomly, as the model being tuned the complexity will decrease, so 2 is reasonable). I think 100 in your statement may mean the complexity per sentence.
For example, if tf.contrib.seq2seq.sequence_loss is employed to calculate the complexity, the result will be less than 10 if you set both
average_across_timesteps
andaverage_across_batch
to be True as default, but if you set theaverage_across_timesteps
to be False and the average length of the sequence is about 10, it will be about 100.