How many grams should be calculated in an N-gram model?

I use an N-gram model for the probabilistic calculations in my NLP project. What order of N is typically used in practice (trigrams, 4-grams, 5-grams, ...)? In my project presentation I will be asked why I stopped at a particular N, and I couldn't find any article stating what N should be. What kind of answer can I give to that type of question?
328 views, asked by Maduri
There is 1 answer below.
If you need some sort of numbers, one way is simply to measure your system's performance (e.g. F1-score on an information-retrieval task) with an n-gram model, then an (n+1)-gram model, an (n+2)-gram model, and so on, until you no longer get a statistically significant improvement in the score. Of course, you then still have to choose a p-value threshold for significance somewhat arbitrarily... but, luckily, you can use 0.05 and say with conviction that "most people do it this way".
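One way to sketch that stopping rule is a paired permutation test on per-item scores from two consecutive model orders: keep increasing n only while the p-value stays below 0.05. The scores below are hypothetical, and the test itself is a minimal pure-Python sketch, not a substitute for a proper statistics library.

```python
import random

def paired_permutation_test(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Two-sided paired permutation test on per-item score differences.

    Under the null hypothesis the sign of each difference is arbitrary,
    so we randomly flip signs and see how often the resampled total
    difference is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    count = 0
    for _ in range(n_resamples):
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(permuted) >= observed:
            count += 1
    return count / n_resamples

# Hypothetical per-query F1 scores for a bigram vs. a trigram system
bigram  = [0.61, 0.58, 0.70, 0.66, 0.59, 0.63, 0.62, 0.68]
trigram = [0.64, 0.60, 0.71, 0.69, 0.61, 0.66, 0.63, 0.70]

p = paired_permutation_test(bigram, trigram)
print(f"p = {p:.3f}")  # p < 0.05 here: the move to trigrams is significant
```

If the p-value for n → n+1 comes back above your threshold, that is your defensible reason for stopping at n.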
Another way would be to calculate the perplexity of each language model on your held-out test input with its gold-standard annotation, and prefer the order of n that gives the lowest perplexity.
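The perplexity comparison can be sketched as follows. This is a deliberately minimal model with add-one (Laplace) smoothing and a toy corpus; a real evaluation would use a proper toolkit and a better smoothing method, and the corpus strings here are made up for illustration.

```python
import math
from collections import Counter

def train_ngram(tokens, n):
    """Count n-grams and the (n-1)-gram contexts that precede a next token."""
    ngrams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    contexts = Counter(tuple(tokens[i:i + n - 1]) for i in range(len(tokens) - n + 1))
    vocab_size = len(set(tokens))
    return ngrams, contexts, vocab_size

def perplexity(tokens, ngrams, contexts, vocab_size, n):
    """Per-token perplexity under the add-one-smoothed n-gram model."""
    log_prob = 0.0
    count = 0
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        context = gram[:-1]
        p = (ngrams[gram] + 1) / (contexts[context] + vocab_size)
        log_prob += math.log(p)
        count += 1
    return math.exp(-log_prob / count)

# Toy corpora, invented for illustration only
train_tokens = "the cat sat on the mat the cat ate the rat".split()
test_tokens = "the cat sat on the rat".split()

for n in (1, 2, 3):
    ng, ctx, v = train_ngram(train_tokens, n)
    print(n, round(perplexity(test_tokens, ng, ctx, v, n), 2))
```

Running this for increasing n lets you show the perplexity curve and point to where it stops improving, which is exactly the kind of evidence a presentation question about "why this n?" is asking for.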