How to filter out all short words (2 or fewer characters) in a corpus?

Given a simple string:

library(tm)

t <- "hello world ww ff a wr gj dkjffdkn kuku"
corpus <- VCorpus(VectorSource(t))

I want to remove every word that is two characters long or shorter. How can I do this with the qdap or tm packages? I know I could use a regular expression, but is there a ready-made function for it?
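For comparison, the regex route mentioned in the question can be sketched in base R without any extra packages. This is a minimal sketch, assuming a "word" means a run of letters, digits or underscores (what `\w` matches), so tokens such as "I'm" would be split at the apostrophe. The variable names `no_short`, `tokens` and `kept` are illustrative only.

```r
t <- "hello world ww ff a wr gj dkjffdkn kuku"

# Option 1: regex. \b\w{1,2}\b matches a whole word of one or two
# word characters; \b on both sides stops it from matching inside
# longer words such as "hello".
no_short <- gsub("\\b\\w{1,2}\\b", "", t, perl = TRUE)
# Deleting the words leaves runs of spaces behind, so squeeze them.
no_short <- trimws(gsub("\\s+", " ", no_short))
print(no_short)
#> [1] "hello world dkjffdkn kuku"

# Option 2: split on whitespace and keep tokens longer than two characters.
tokens <- strsplit(t, "\\s+")[[1]]
kept <- paste(tokens[nchar(tokens) > 2], collapse = " ")
print(kept)
#> [1] "hello world dkjffdkn kuku"
```

Option 2 avoids regex word boundaries entirely but treats punctuation attached to a word as part of its length.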

Best answer, by tmfmnk

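With the qdapRegex package, rm_nchar_words() removes words whose length falls in a given range; "1,2" means words of one or two characters: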

library(qdapRegex)

rm_nchar_words(t, "1,2")

[1] "hello world dkjffdkn kuku"
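The call above works on the character vector t, while the question asks about a tm corpus. A minimal sketch of applying a string-cleaning function to every document in a VCorpus uses tm's tm_map() with content_transformer(). Here drop_short_words() is a hypothetical base-R helper (not part of tm or qdapRegex) that removes words of up to max_len word characters.

```r
library(tm)

# Hypothetical helper (not from tm or qdapRegex): remove whole words of
# 1..max_len word characters, then collapse leftover whitespace.
drop_short_words <- function(x, max_len = 2) {
  pattern <- sprintf("\\b\\w{1,%d}\\b", max_len)
  x <- gsub(pattern, "", x, perl = TRUE)
  trimws(gsub("\\s+", " ", x))
}

t <- "hello world ww ff a wr gj dkjffdkn kuku"
corpus <- VCorpus(VectorSource(t))

# content_transformer() wraps a plain character -> character function
# so that tm_map() can apply it to the text of each document.
corpus <- tm_map(corpus, content_transformer(drop_short_words))

print(content(corpus[[1]]))
#> [1] "hello world dkjffdkn kuku"
```

The same wrapping should also work for the qdapRegex function itself, e.g. tm_map(corpus, content_transformer(rm_nchar_words), "1,2"), since tm_map() forwards extra arguments to the wrapped function.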