I would like to know whether it is possible to delete documents from a corpus when their text is in fact "empty". I am building a corpus of texts in order to subsequently fit some text models using the quanteda package in R. The texts are in a column of a CSV file and are imported as follows:
> mycorpus <- corpus(readtext("tablewithdocuments.csv", text_field = "textcolumn"))
> mycorpus
Corpus consisting of 25 documents and 14 docvars.
I know how to remove empty texts from the dfm built from the corpus, but I want a new corpus that is a subset of the original one, excluding the documents whose cell in the CSV column "textcolumn" is empty.
In practice, starting from something like the following corpus:
library("quanteda")
text <- c(
doc1 = "",
doc2 = "pineapples and pizzas taste good",
doc3 = "but please do not mix them together"
)
mycorpus <- corpus(text)
mycorpus
## Corpus consisting of 3 documents and 0 docvars.
summary(mycorpus)
## Corpus consisting of 3 documents:
##
##  Text Types Tokens Sentences
##  doc1     0      0         0
##  doc2     5      5         1
##  doc3     7      7         1
I would like to obtain a new corpus with only doc2 and doc3 in it.
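To make the target concrete, here is a minimal sketch of one approach that might produce it. It assumes that quanteda's corpus_subset(), tokens() and ntoken() can be combined this way, and the names n_tok and nonempty are only illustrative. I have not confirmed that this is the recommended method, nor how a missing CSV cell is represented after import with readtext.

```r
library("quanteda")

# Same toy corpus as above: doc1 is an empty text.
text <- c(
  doc1 = "",
  doc2 = "pineapples and pizzas taste good",
  doc3 = "but please do not mix them together"
)
mycorpus <- corpus(text)

# Count tokens per document; an empty text yields zero tokens.
# (n_tok is an illustrative name, not part of quanteda.)
n_tok <- ntoken(tokens(mycorpus))

# Keep only the documents that contain at least one token.
nonempty <- corpus_subset(mycorpus, n_tok > 0)

docnames(nonempty)
```

Because the filter is based on token counts, a text containing only whitespace would presumably be dropped as well, which is the behaviour I would want here.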
Thank you in advance for your help.
Best wishes,
Michele