How many grams should be calculated in an N-gram model?


I use an N-gram model for the probabilistic calculations in my NLP project. Which value of N has been shown by experiment to work well (three-grams, four-grams, five-grams, etc.)? In my project presentation I will be asked why I stopped at a particular level (a particular N). I couldn't find any article that says which number N should be. What kind of answer can I give to that type of question?


There is 1 best solution below


If you need some sort of numbers, one way is to simply measure the performance of your system (e.g. F1-score for an information-retrieval task) using an n-gram model, then an (n+1)-gram model, an (n+2)-gram model, etc., until you no longer get a statistically significant improvement in your score. Of course, you then still have to choose a significance level somewhat arbitrarily... but, luckily, you can use 0.05 as the threshold for the p-value and say with conviction that "most people do it this way".
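The stopping rule above could be sketched roughly as follows. This is only an illustration under assumptions that are not in the answer: the scores are per-fold (or per-document) values of your metric, paired across models, and the significance test is a paired sign-flip permutation test, which is one reasonable choice among several. The names `paired_permutation_test`, `choose_order`, and `scores_by_n` are made up for this example.

```python
import random
from statistics import mean


def paired_permutation_test(scores_a, scores_b, trials=10000, seed=0):
    """Two-sided paired sign-flip permutation test; returns an estimated p-value."""
    rng = random.Random(seed)
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / len(diffs))
    hits = 0
    for _ in range(trials):
        # Under the null hypothesis the sign of each paired difference is arbitrary.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / len(diffs)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)


def choose_order(scores_by_n, alpha=0.05):
    """Increase n while the next order gives a significant gain; return the last winner.

    scores_by_n maps each n to a list of paired scores (assumed: same folds, same order).
    """
    orders = sorted(scores_by_n)
    best = orders[0]
    for n in orders[1:]:
        gain = mean(scores_by_n[n]) - mean(scores_by_n[best])
        p_value = paired_permutation_test(scores_by_n[best], scores_by_n[n])
        if gain > 0 and p_value < alpha:
            best = n
        else:
            break  # no significant improvement: stop here
    return best
```

You would call `choose_order({1: [...], 2: [...], 3: [...]})` with the scores you measured yourself; the order it returns is the one you can defend as "the last n that gave a statistically significant improvement at the 0.05 level".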

Another way would be to calculate the perplexity of each language model (one per value of n) on your test input with its gold-standard annotation, and to stop at the n beyond which perplexity no longer drops noticeably.
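As a rough illustration of the perplexity comparison, here is a minimal sketch of an n-gram model with add-k smoothing in plain Python. The smoothing method, the `<s>`/`</s>` padding symbols, and the function names are assumptions made for this example; a real project would more likely use a toolkit with better smoothing (e.g. Kneser-Ney).

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of one sentence, padded with start and end symbols."""
    padded = ["<s>"] * (n - 1) + list(tokens) + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


def train(sentences, n):
    """Count n-grams and their (n-1)-word contexts over tokenized sentences."""
    ngram_counts, context_counts, vocab = Counter(), Counter(), {"</s>"}
    for sent in sentences:
        vocab.update(sent)
        for gram in ngrams(sent, n):
            ngram_counts[gram] += 1
            context_counts[gram[:-1]] += 1
    return ngram_counts, context_counts, vocab


def perplexity(model, sentences, n, k=1.0):
    """Perplexity of the sentences under an add-k smoothed n-gram model."""
    ngram_counts, context_counts, vocab = model
    v = len(vocab) + 1  # one extra slot for unseen words
    log_prob, predicted = 0.0, 0
    for sent in sentences:
        for gram in ngrams(sent, n):
            p = (ngram_counts[gram] + k) / (context_counts[gram[:-1]] + k * v)
            log_prob += math.log(p)
            predicted += 1
    return math.exp(-log_prob / predicted)
```

Training one model per order on the same training sentences and evaluating each on the same held-out sentences, e.g. `perplexity(train(train_sents, n), test_sents, n)` for n = 1, 2, 3, ..., gives one number per order. Every order predicts the same number of tokens per sentence here (each word plus the end symbol), so the values are directly comparable.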