Regex OR condition for text in R


Suppose I have the following texts:

1) "Project:ABC is located near CBA, being too far from city  "
2) "P r o j e c t : PQR is located near RQP, highlights some greenary"

I want to extract the text between the word "Project" and the first "," so that the output is "ABC is located near CBA" for text 1 and "PQR is located near RQP" for text 2. For that I used this regex:

x="Project:ABC is located near CBA, being too far from city  "
sub(".*Project: *(.*?) *, .*", "\\1", x)
Output:
ABC is located near CBA

But for text 2 it doesn't give the proper output. How do I include an OR condition so that both cases are handled? Any suggestions would be helpful. Thanks.

There are 4 best solutions below

1
On BEST ANSWER

Make your regex a bit more flexible: [^:]+:\s*([^,]+),.*

> sub("[^:]+:\\s*([^,]+),.*", "\\1", "P r o j e c t : PQR is located near RQP, highlights some greenary")
[1] "PQR is located near RQP"

and

> sub("[^:]+:\\s*([^,]+),.*", "\\1", "Project:ABC is located near CBA, being too far from city  ")
[1] "ABC is located near CBA"
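
To make the pattern easier to adapt, here is a commented sketch that applies the same regex to both strings at once (sub is vectorized). The variable names x and result are only for illustration.

```r
x <- c("Project:ABC is located near CBA, being too far from city  ",
       "P r o j e c t : PQR is located near RQP, highlights some greenary")

# [^:]+    one or more non-colon characters (the label, spaced out or not)
# :\\s*    the colon and any whitespace after it
# ([^,]+)  capture everything up to the first comma
# ,.*      the comma and the rest of the string
result <- sub("[^:]+:\\s*([^,]+),.*", "\\1", x)
print(result)
```

Because the label is matched as "anything that is not a colon", no explicit OR between the two spellings of "Project" is needed.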
2
On

You can use a regular expression with lookbehind and lookahead assertions.

Using the stringr package on a small example:

Vec <- c("Project:ABC is located near CBA, being too far from city", 
         "P r o j e c t : PQR is located near RQP, highlights some greenary")
library(stringr)
str_extract(Vec, "(?<=:).*(?=,)")
#> [1] "ABC is located near CBA"  " PQR is located near RQP"

If your input is more complex, the regex should be adapted, as it may not be restrictive enough: currently it matches anything between the first : and the last , and it keeps any whitespace that follows the colon.
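
As one possible tightening, the sketch below swaps .* for [^,]* so the match stops at the first comma, and trims the surrounding whitespace. It uses base R (regmatches with perl = TRUE) rather than stringr, and the first input string is a hypothetical variant with an extra comma added to show the difference.

```r
# Hypothetical input: the first string has a second comma added
x <- c("Project:ABC is located near CBA, being too far from city, really",
       "P r o j e c t : PQR is located near RQP, highlights some greenary")

# (?<=:)  preceded by a colon
# [^,]*   any run of non-comma characters (stops at the first comma)
# (?=,)   followed by a comma
m <- regmatches(x, regexpr("(?<=:)[^,]*(?=,)", x, perl = TRUE))

# Drop the leading space left after "Project :"
result <- trimws(m)
print(result)
```

Note that regmatches drops elements that do not match at all, so the result can be shorter than the input when some strings lack a colon or a comma.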

3
On

One option in base R is gsub: match either any characters (.*) up to a : followed by zero or more spaces (\\s*), or (|) a , followed by any other characters, and replace each match with an empty string ("").

gsub(".*:\\s*|,.*", "", Vec)
#[1] "ABC is located near CBA" "PQR is located near RQP"

If we need to match the word "Project" specifically, followed by a ":", the pattern can be built from the word itself:

pat <- paste0(gsub("", "\\\\s*", "Project"), ":\\s*|\\s*,.*")
gsub(pat, "", Vec)
#[1] "ABC is located near CBA" "PQR is located near RQP" "Ganga gnd A3 And 3.."   
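
A sketch of the same idea that builds the spaced-out pattern explicitly with strsplit and paste, so each piece is visible. It also allows whitespace between the final "t" and the colon, which the second sample string contains. The helper name spaced_pattern is hypothetical.

```r
Vec <- c("Project:ABC is located near CBA, being too far from city",
         "P r o j e c t : PQR is located near RQP, highlights some greenary",
         "Project: Ganga gnd A3 And 3.., Plot Bearing / CTS / Survey / Final Plot No.: Sr No")

# Hypothetical helper: join the letters of a word with optional whitespace
spaced_pattern <- function(word) {
  paste(strsplit(word, "")[[1]], collapse = "\\s*")
}

# Either the (possibly spaced) label with its colon, or everything from the first comma on
pat <- paste0(spaced_pattern("Project"), "\\s*:\\s*|\\s*,.*")
result <- gsub(pat, "", Vec)
print(result)
```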

data

Vec <- c("Project:ABC is located near CBA, being too far from city", 
 "P r o j e c t : PQR is located near RQP, highlights some greenary", 
 "Project: Ganga gnd A3 And 3.., Plot Bearing / CTS / Survey / Final Plot No.: Sr No"
 )
0
On

If matching the word "Project" itself is not a concern, you can cut the string at the positions of the ":" and the ",":

> text
[1] "Project:ABC is located near CBA, being too far from city  "
> substr(text,grep(":",strsplit(text,'')[[1]]),grep(",",strsplit(text,'')[[1]]))
[1] ":ABC is located near CBA,"
> substr(text,grep(":",strsplit(text,'')[[1]])+1,grep(",",strsplit(text,'')[[1]])-1)
[1] "ABC is located near CBA"
> text <- "P r o j e c t : PQR is located near RQP, highlights some greenary"
> substr(text,grep(":",strsplit(text,'')[[1]])+1,grep(",",strsplit(text,'')[[1]])-1)
[1] " PQR is located near RQP"

should work fine!
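
A shorter sketch of the same position-based approach, assuming the first ":" comes before the first ",". It uses regexpr with fixed = TRUE to find the positions directly and trimws to drop the leading space seen in the last output above. The names first_colon and first_comma are only for illustration.

```r
text <- "P r o j e c t : PQR is located near RQP, highlights some greenary"

# Position of the first literal colon and the first literal comma
first_colon <- as.integer(regexpr(":", text, fixed = TRUE))
first_comma <- as.integer(regexpr(",", text, fixed = TRUE))

# Take what lies strictly between them and trim surrounding whitespace
result <- trimws(substr(text, first_colon + 1, first_comma - 1))
print(result)
```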