Removing paragraphs in txt file with R


Using the readLines() function, I have imported a txt file that stores sentences in multiple paragraphs, like this:

sentence1. sentence2. sentence3.

sentence4. sentence5.

sentence6. sentence7.
 

For further analysis, I would like to apply the sentiment_by() function to the imported text. When I do so, I receive sentiment values for each paragraph rather than for the text as a whole. I therefore want to remove the paragraph breaks so that I receive only one sentiment coefficient. To do so, I would need to transform the text so that it looks like this:

sentence1. sentence2. sentence3. sentence4. sentence5. sentence6. sentence7.

If I were to run the sentiment_by() function on this piece of text it would yield one coefficient for the whole text. Is there a way I can transform the text by removing the paragraphs in R before I carry on with the analysis?

There is 1 best solution below

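readLines() returns a character vector with one element per line, so you can strip tabs, carriage returns, and newlines from both ends of each element with trimws() (extend the whitespace pattern if other characters need removing).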

trimmed_text <- trimws(text_var, which = "both", whitespace = "[\t\r\n]")

There are other things you can tweak as shown here.
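
Trimming alone leaves one element per line, so the paragraphs still need to be joined into a single string before the text is scored as a whole. A minimal sketch in base R, assuming the file was read with readLines(); the variable names and the sample text are made up for illustration:

```r
# Assumption: `lines` stands in for the result of readLines("your_file.txt").
# The sample text below is illustrative only.
lines <- c("sentence1. sentence2. sentence3.",
           "",
           "sentence4. sentence5.",
           "",
           "sentence6. sentence7.",
           " ")

# Trim surrounding whitespace from every element, then drop the
# empty elements that marked the paragraph breaks.
lines <- trimws(lines)
lines <- lines[nzchar(lines)]

# Join the remaining paragraphs into one string, separated by a space.
full_text <- paste(lines, collapse = " ")
```

Because full_text is a character vector of length one, passing it to sentiment_by() should then yield a single coefficient for the whole text rather than one per paragraph.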