How to cut all Lines/Characters in R after specific Characters

I am currently taking a course that teaches textual analysis in R. As I am fairly new to R, I have not yet figured out how to remove everything in a string after a specific set of characters.

For example, given the following:

documentName <- "Hello my name is Johann my had is the largest to be deleted X"

My desired outcome is:

documentName <- "Hello my name is Johann"

So far I have tried the following but it is not getting me anywhere.

gsub("(\Johann).*\\","",documentName)

Any hint would be much appreciated.

There are 2 best solutions below

Best answer

Here is one way, capturing all content up to and including Johann and replacing the whole string with that capture group:

x <- "Hello my name is Johann my had is the largest to be deleted"
out <- sub("^(.*\\bJohann)\\b.*$", "\\1", x)
out

[1] "Hello my name is Johann"

Another approach, stripping off all content appearing after Johann:

sub("(?<=\\bJohann)\\s+.*$", "", x, perl=TRUE)
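The two approaches above give the same result on the example, but they can differ when the keyword occurs more than once. The following is a minimal sketch of that difference, assuming a made-up input string x2 that is not part of the original question: the greedy capture keeps everything up to the last Johann, while the lookbehind version cuts after the first one.

```r
# x2 is a hypothetical input with two occurrences of the keyword
x2 <- "Hello Johann and again Johann the rest"

# Greedy capture: .* extends as far as possible, so the cut is after the LAST "Johann"
greedy <- sub("^(.*\\bJohann)\\b.*$", "\\1", x2)

# Lookbehind: sub() replaces the leftmost match, so the cut is after the FIRST "Johann"
first <- sub("(?<=\\bJohann)\\s+.*$", "", x2, perl = TRUE)

greedy  # "Hello Johann and again Johann"
first   # "Hello Johann"
```

Which behavior is preferable depends on the documents being cleaned, so it may be worth checking a sample with repeated keywords before applying either pattern in bulk.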

You could use str_remove() from the stringr package:

library(stringr)
str_remove(documentName, "(?<=Johann).*")
[1] "Hello my name is Johann"

Or adjust your gsub() call to use a Perl-style lookbehind:

gsub("(?<=Johann).*", "", documentName, perl=TRUE)
[1] "Hello my name is Johann"
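If the keyword might contain regex metacharacters, a base R alternative without a pattern may be easier to reason about. This is only a sketch: truncate_after is a hypothetical helper name, not a function from any package. It locates the first literal occurrence with regexpr(fixed = TRUE) and keeps the text up to the end of that match, leaving strings without the keyword unchanged.

```r
# Hypothetical helper: keep everything up to and including the first
# literal occurrence of `keyword`; strings without a match are returned as-is.
truncate_after <- function(text, keyword) {
  # fixed = TRUE treats `keyword` as a literal string, not a regex
  m <- regexpr(keyword, text, fixed = TRUE)
  start <- as.integer(m)            # -1 where there is no match
  len <- attr(m, "match.length")    # length of each match
  # End position of the match, or the full string length when nothing matched
  end <- ifelse(start == -1L, nchar(text), start + len - 1L)
  substr(text, 1, end)
}

documentName <- "Hello my name is Johann my had is the largest to be deleted X"
truncate_after(documentName, "Johann")
# [1] "Hello my name is Johann"
```

Because regexpr(), ifelse(), and substr() are all vectorized, the same helper should also work on a character vector of documents.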