I want to remove the special character — from my simple corpus, but it doesn't work in my case. I tried different variations of gsub, and I also tried copying the dash — directly from my R object. I read XML data and convert it into a simple corpus; for this I used tm_map.
If I use
text <- c("Today is the weather nice — I want to go to the beach —")
text_new <- gsub("—", "", text)
The output is
Today is the weather nice — I want to go to the beach —
whereas I'd like my output to be
Today is the weather nice I want to go to the beach
If I define the text as a vector, then it works. But in a corpus, R doesn't recognise the symbol —. How can I detect the long dash?
It could well be that you are searching for a - with your gsub() function, while the text from the PDF contains a long dash or another type of dash that only looks similar. Have you tried opening the R object with the text and copy-pasting the - you want to delete from there into your gsub() function?
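If copy-pasting is unreliable, one way to find out which dash the text actually contains is to look at its Unicode code points and then match by escape rather than by a pasted character. The following is only a sketch in base R: it assumes the text is UTF-8 encoded, and the names dash_codes and remove_dashes are made up for illustration.

```r
# Build the example with an explicit escape so the dash is unambiguous (U+2014 is the em dash).
text <- "Today is the weather nice \u2014 I want to go to the beach \u2014"

# List the code points of all non-ASCII characters in the string.
codes <- utf8ToInt(text)
dash_codes <- sprintf("U+%04X", unique(codes[codes > 127]))
print(dash_codes)

# Hypothetical helper: replace the common long dashes (figure dash, en dash,
# em dash, horizontal bar) with a space, then collapse repeated whitespace.
remove_dashes <- function(x) {
  x <- gsub("[\u2012\u2013\u2014\u2015]", " ", x)
  trimws(gsub("\\s+", " ", x))
}

text_new <- remove_dashes(text)
print(text_new)
```

If the code points printed for your own text differ from the ones in the character class, add them to the pattern. For a tm corpus, the same helper could presumably be applied with tm_map(corpus, content_transformer(remove_dashes)), though I have not tested that against your XML data.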