I have a Corpus (tm package) containing a collection of 1,300 different text documents [Content: documents: 1.300].
My goal is now to count how often the words from a specific word list occur in each of those documents. E.g., if my word list contains the words "january, february, march, ...", I want to analyze how often each document refers to these words.
Example:
Text 1: I like going on holiday in january and not in february.
Text 2: I went on a holiday in march.
Text 3: I like going on vacation.
The result should look like this:
Text 1: 2
Text 2: 1
Text 3: 0
I tried using the following code:
library(quanteda)
toks <- tokens(x)
toks <- tokens_wordstem(toks)
dtm <- dfm(toks)
dict1 <- dictionary(list(c("january", "february", "march")))
dict_dtm2 <- dfm_lookup(dtm, dict1, nomatch="_unmatched")
tail(dict_dtm2)
This code was proposed in a different chat; however, it does not work on my data and raises an error saying it is only applicable to text or corpus objects.
How can I search for my word list using my existing Corpus from the tm package in R?
To make your quanteda code work, you first have to convert your tm VCorpus object x to a quanteda corpus and fix a few other minor issues: dictionary() expects a named list.
Created on 2023-09-02 with reprex v2.0.2
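As a minimal sketch of what those fixes could look like: here x is assumed to be a tm VCorpus built from the three example texts in the question, and the dictionary key name months is an arbitrary choice of mine. Stemming is left out of this sketch, since stemmed tokens would likely no longer match the unstemmed dictionary entries.

```r
library(tm)
library(quanteda)

# Example data: a small tm VCorpus standing in for the real 1,300-document corpus
x <- VCorpus(VectorSource(c(
  "I like going on holiday in january and not in february.",
  "I went on a holiday in march.",
  "I like going on vacation."
)))

# Convert the tm corpus to a quanteda corpus, then tokenize it
toks <- tokens(quanteda::corpus(x))

# Build the document-feature matrix (tokens are lowercased by default)
dtm <- dfm(toks)

# dictionary() needs a named list; "months" is just an illustrative key name
dict1 <- dictionary(list(months = c("january", "february", "march")))

# Count dictionary matches per document; all other tokens go to "_unmatched"
dict_dtm2 <- dfm_lookup(dtm, dict1, nomatch = "_unmatched")
dict_dtm2
```

For the three example texts, the months column should come out as 2, 1 and 0, which matches the result asked for in the question.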