Please suggest a downloadable English corpus that contains informal, playful words such as 'gonna', 'LOL' and 'wanna'
Is there a downloadable corpus (dictionary/lexicon) of informal, playful English words such as 'gonna', 'LOL' and 'wanna'?
Asked by AudioBubble
clemtoy
I don't know of such a lexicon, but as an alternative you can try this:
- Get the vocabulary V1 of a Twitter corpus, or of another web or chat corpus.
- Get the vocabulary V2 of a literary corpus.
The lexicon you want might be V1 \ V2, i.e. all the words of V1 that are not in V2.
In Python, NLTK provides suitable corpora (see nltk.corpus.webtext). Moreover, as @mbatchkarov said in the comments, Twitter is full of informal language.
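The V1 \ V2 idea can be sketched roughly as follows. This is only an illustration, not a tested recipe: the helper names (vocabulary, informal_lexicon, nltk_example), the min_count threshold and the choice of the Gutenberg corpus as the "literary" side are all assumptions of mine. The NLTK part also assumes that nltk is installed and that the webtext and gutenberg corpora have already been downloaded.

```python
import re
from collections import Counter


def vocabulary(tokens, min_count=1):
    """Return lower-cased word types that occur at least min_count times."""
    # Keep only tokens made of letters and apostrophes (drops punctuation, numbers).
    counts = Counter(
        t.lower() for t in tokens if re.fullmatch(r"[A-Za-z']+", t)
    )
    return {word for word, count in counts.items() if count >= min_count}


def informal_lexicon(informal_tokens, formal_tokens, min_count=1):
    """Compute V1 \\ V2: words of the informal corpus absent from the formal one."""
    v1 = vocabulary(informal_tokens, min_count)
    v2 = vocabulary(formal_tokens)
    return v1 - v2


def nltk_example(min_count=5):
    """Hypothetical usage with NLTK corpora (not called here).

    Assumes nltk.download("webtext") and nltk.download("gutenberg")
    have been run beforehand.
    """
    from nltk.corpus import gutenberg, webtext

    return informal_lexicon(webtext.words(), gutenberg.words(), min_count)


# Toy check on two hand-written "corpora".
chat = "lol i'm gonna go now , wanna come ?".split()
book = "i am going to go now , do you want to come ?".split()
print(sorted(informal_lexicon(chat, book)))
```

On real corpora the difference will also contain names, typos and rare words, so a frequency threshold (min_count here) and some manual filtering are probably needed.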
Use 'NetLingo'. They have rich content :)