Can readLines be used for n-gram processing in R?


I am trying to find the frequency of phrases made up of one to eight words. I have been reading about text mining for phrases here and elsewhere, and have found that n-gram tokenization looks like the best way to go.

However, when I copy and paste text from a .txt file, it comes up with an unidentified symbol error for multiple lines. Is it possible to use the readLines function in place of X in the Ngram_tokenizer code from the example given by tomkauffman at GitHub Gist (1)? E.g.:

Bigram_Tokenizer<-function(X(readLines(file.choose())(Ngram_tokenizer(X(readLines(file.choose(),WekaControl(min=#,max=#)

When I copy the readLines printout, it comes up with 'unexpected [ in ['. Do I need to include the same text in both "X" entries?

Thank you, Ben M.
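
For illustration, here is a minimal base-R sketch of the general idea, not the code from the gist. It assumes the character vector returned by readLines is passed as an argument when the tokenizer is called, rather than written into the function definition. The helper name ngram_tokenizer and its min_n/max_n parameters are hypothetical, and the sample lines stand in for a real file. One likely cause of the "unexpected [" error is that pasting console printout such as [1] "some text" makes R try to parse the leading [1] as code.

```r
# Hypothetical helper (not from the gist): returns every n-gram of length
# min_n..max_n found in a character vector such as the output of readLines().
ngram_tokenizer <- function(x, min_n = 1, max_n = 8) {
  # Join all lines, lower-case, and split on runs of non-word characters.
  # Note: joining the lines lets n-grams span line boundaries.
  words <- unlist(strsplit(tolower(paste(x, collapse = " ")), "[^[:alnum:]']+"))
  words <- words[nzchar(words)]
  out <- character(0)
  for (n in seq(min_n, max_n)) {
    if (length(words) < n) break
    starts <- seq_len(length(words) - n + 1)
    grams <- vapply(
      starts,
      function(i) paste(words[i:(i + n - 1)], collapse = " "),
      character(1)
    )
    out <- c(out, grams)
  }
  out
}

# With a real file, read it once and pass the result in:
#   lines <- readLines(file.choose())
lines <- c("the cat sat on the mat", "the cat ran")  # stand-in sample data

tokens <- ngram_tokenizer(lines, min_n = 1, max_n = 2)
freq <- sort(table(tokens), decreasing = TRUE)
print(head(freq))

# Assuming the RWeka package (and Java) is installed, the analogous call is:
#   RWeka::NGramTokenizer(lines, RWeka::Weka_control(min = 1, max = 8))
```

The file is read once into lines and the same object is reused, so the text does not need to be repeated in two places.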
