For a research project, I have read PDF documents into R and created a corpus and a TermDocumentMatrix. I want to check the frequency of specific words in each document in my corpus. The code below gives me the kind of matrix I want, with the frequency of words by document, but it only covers high-frequency terms, not specific terms that I choose.
ft <- findFreqTerms(opinions.tdm, lowfreq = 100, highfreq = Inf)
as.matrix(opinions.tdm[ft,])
I found the code below in another comment. It lets me look up the frequency of specific terms, but it sums across the documents. How do I adapt it so that I get the counts for specific terms within each document rather than across all of them?
library(tm)
data("crude")
crude <- as.VCorpus(crude)
crude <- tm_map(crude, stripWhitespace)
crude <- tm_map(crude, removePunctuation)
crude <- tm_map(crude, content_transformer(tolower))
crude <- tm_map(crude, removeWords, stopwords("english"))
tdm <- TermDocumentMatrix(crude)
# turn tdm into dense matrix and create frequency vector.
freq <- rowSums(as.matrix(tdm))
freq["crude"]
# crude
#    21
freq["oil"]
# oil
#  85
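Skip the rowSums part and just refer to the matrix directly. as.matrix(tdm) has one row per term and one column per document, so indexing its rows by term name gives the counts within each document.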
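As a minimal sketch of that idea, reusing the crude example from the question: the `terms` vector and the `per_doc` name are hypothetical placeholders, so swap in your own words and your own matrix (for example `opinions.tdm`).

```r
library(tm)

# Same preprocessing as in the question.
data("crude")
crude <- as.VCorpus(crude)
crude <- tm_map(crude, stripWhitespace)
crude <- tm_map(crude, removePunctuation)
crude <- tm_map(crude, content_transformer(tolower))
crude <- tm_map(crude, removeWords, stopwords("english"))
tdm <- TermDocumentMatrix(crude)

# Hypothetical list of terms of interest; replace with your own.
terms <- c("crude", "oil", "opec")

# Keep only terms that actually occur in the matrix, because
# indexing by a row name that does not exist raises an error.
terms <- intersect(terms, Terms(tdm))

# Subset the sparse matrix by term first, then convert to a dense
# matrix: rows are the chosen terms, columns are the documents.
per_doc <- as.matrix(tdm[terms, ])
per_doc
```

Subsetting before calling as.matrix avoids building the full dense matrix, which can matter for a large corpus. If you only ever need a fixed word list, another option to consider is passing `control = list(dictionary = terms)` to TermDocumentMatrix so that only those terms are tabulated.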