How do I tag a document if a word is not present in it?

I am performing text mining on a corpus of 2,500 documents and looking for a specific word in each document.

I want to tag each document that does not contain a given word, say 'laceration', and get a list of the documents that lack it. I would also like to save that list to a text file.

I am using the following code:

library(qdapRegex)

grab2 <- rm_(pattern=S("@around_", 1, "laceration", 1), extract=TRUE)

grab2(l$Text)

Sample of the output I am getting:

[[2164]]
[1] NA

[[2165]]
[1] NA

[[2166]]
[1] "laceration"

[[2167]]
[1] NA

[[2168]]
[1] NA

I want code that returns only the documents without the word 'laceration' and writes that output to a file.

Best answer

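While you could do this in R, it would be much more efficient to do it at the command line (on a Linux-like OS, or with Cygwin on Windows), assuming each document is stored in its own .txt file. The -L option makes grep list the files that contain no matching line; -v would instead print every non-matching line from every file:

grep -L "\blaceration\b" *.txt > ListOfNoLac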

In R, you could do this:

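# .txt files in the working directory, assuming one document per file
fileList <- list.files(".", pattern = "\\.txt$")
# TRUE for each file in which at least one line contains the whole word
hasLac <- vapply(fileList, function(x) any(grepl("\\blaceration\\b", readLines(x))), logical(1))
# Save the names of the files without the word, one per line
writeLines(fileList[!hasLac], "ListOfNoLac")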
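
If the documents are already loaded in R, as in the question (a data frame l with a Text column), a similar check can be run in memory without touching individual files. The following is only a minimal sketch, assuming one document per row; the sample rows, the doc_id column, and the output file name are made up for illustration and are not from the original post.

```r
# Hypothetical stand-in for the asker's data frame `l` with a `Text` column.
l <- data.frame(
  Text = c("Deep laceration on the left arm.",
           "Patient reports mild headache.",
           "Multiple lacerations noted.",
           "No visible injury."),
  stringsAsFactors = FALSE
)

# TRUE where the whole word "laceration" occurs, ignoring case.
hasLac <- grepl("\\blaceration\\b", l$Text, ignore.case = TRUE)

# Row numbers and text of the documents that lack the word.
noLac <- data.frame(doc_id = which(!hasLac),
                    Text = l$Text[!hasLac],
                    stringsAsFactors = FALSE)

# Save as a tab-separated text file (the file name is an arbitrary choice).
outFile <- "docs_without_laceration.txt"
write.table(noLac, outFile, sep = "\t", quote = FALSE, row.names = FALSE)
```

Because of the \\b word boundaries, the plural "lacerations" does not count as a match, so the third sample row is reported as lacking the word. Dropping the trailing \\b would treat the plural as a match as well.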