Regex that extracts values between specific characters at beginning and end of string

I have a comment field in a dataset that I need to extract some numbers from. The string looks like this, and I want to extract series120_count = 1 and crossing_success = 2:

x <- "series120_count[1]; crossing_success[2]; tag_comments[small]"

I've tried a few things but can't quite get it right. For example, here is my attempt to isolate series120_count, which doesn't quite work yet:

str_extract(x, "(?<=series120_count)(.+)(?=\\; )")
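For reference, here is a small sketch of why this attempt overshoots. It uses base R's regexpr() with perl = TRUE as a stand-in for stringr's ICU engine; for this pattern the two should behave the same, but that is an assumption. Because .+ is greedy, it runs on to the last "; " in the string. A lazy .+? stops at the first one, but the brackets are still part of the result.

```r
x <- "series120_count[1]; crossing_success[2]; tag_comments[small]"

# The attempt as written: `.+` is greedy, so it runs on to the LAST "; "
greedy <- regmatches(x, regexpr("(?<=series120_count)(.+)(?=\\; )", x, perl = TRUE))

# A lazy `.+?` stops at the FIRST "; " instead, but still keeps the brackets
lazy <- regmatches(x, regexpr("(?<=series120_count)(.+?)(?=\\; )", x, perl = TRUE))

greedy  # "[1]; crossing_success[2]"
lazy    # "[1]"
```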

Ideally, I'd like a pattern that anchors on "series120_count[" at the start and stops when the bracket closes with "]". I'd also like to reuse it for crossing_success by simply swapping "series120_count[" for "crossing_success[".

Best answer, by The fourth bird:

If you want to use the lookbehind assertion for both strings and extract the digits, you can use:

\b(?<=crossing_success\[|series120_count\[)\d+(?=])

The pattern matches:

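  • \b A word boundary. The match always starts right after "[", so this assertion is always satisfied here. On its own it does not stop a partial key match such as "xseries120_count[1]". To get that behavior, put the \b inside the lookbehind: (?<=\bcrossing_success\[|\bseries120_count\[)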
  • (?<=crossing_success\[|series120_count\[) Positive lookbehind, assert one of the alternatives to the left
  • \d+ Match 1+ digits
  • (?=]) Positive lookahead, assert ] to the right
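Here is a quick check of where the \b takes effect. It uses base R's gregexpr() with perl = TRUE (PCRE) rather than stringr's ICU engine. The test string y and its "x"-prefixed key are made up for illustration.

```r
# Made-up test string: the first key has an extra leading "x"
y <- "xseries120_count[7]; series120_count[1]; crossing_success[2]"

# \b before the lookbehind is tested right after "[", where it always holds
outside <- "\\b(?<=crossing_success\\[|series120_count\\[)\\d+(?=])"
# \b inside each branch is tested at the start of the key instead
inside <- "(?<=\\bcrossing_success\\[|\\bseries120_count\\[)\\d+(?=])"

with_outside <- regmatches(y, gregexpr(outside, y, perl = TRUE))[[1]]
with_inside <- regmatches(y, gregexpr(inside, y, perl = TRUE))[[1]]

with_outside  # "7" "1" "2"  (the "x"-prefixed key still matches)
with_inside   # "1" "2"
```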

library(stringr)

x <- "series120_count[1]; crossing_success[2]; tag_comments[small]"
pattern <- "\\b(?<=crossing_success\\[|series120_count\\[)\\d+(?=])"
matches <- str_extract_all(x, pattern)
print(matches)

Output

[[1]]
[1] "1" "2"
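The lookbehind version returns the numbers as unnamed strings in the order they appear in the comment, so the caller has to know which value belongs to which key. If the key should travel with its value, one option is to match each whole key[digits] pair and then split it. The sketch below is a base R alternative (gregexpr() and sub() with perl = TRUE), not something from the original answer.

```r
x <- "series120_count[1]; crossing_success[2]; tag_comments[small]"

# Grab each whole "key[digits]" pair for the two keys of interest
hits <- regmatches(
  x, gregexpr("\\b(?:crossing_success|series120_count)\\[\\d+]", x, perl = TRUE)
)[[1]]

# Key = text before "[", value = digits inside the brackets
values <- setNames(
  as.integer(sub(".*\\[(\\d+)]$", "\\1", hits, perl = TRUE)),
  sub("\\[.*$", "", hits, perl = TRUE)
)

values[["series120_count"]]   # 1
values[["crossing_success"]]  # 2
```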

Alternatively, you can use a capture group and take the digits from group 1:

\b(?:crossing_success|series120_count)\[(\d+)]
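The question also asks for a pattern where only the prefix needs swapping. One way to do that is to wrap the capture-group idea in a function that takes the key as an argument. extract_field() is a hypothetical helper, not part of stringr or base R. It uses base R's regexec() with perl = TRUE and assumes the key contains no regex metacharacters.

```r
x <- "series120_count[1]; crossing_success[2]; tag_comments[small]"

# Hypothetical helper (not part of stringr or base R). Returns the digits inside
# the brackets that follow `key`, as an integer, or NA if there is no such match.
# Assumes `key` contains no regex metacharacters.
extract_field <- function(x, key) {
  pattern <- paste0("\\b", key, "\\[(\\d+)]")
  hits <- regmatches(x, regexec(pattern, x, perl = TRUE))
  # Each hit is c(full_match, group_1), or character(0) when nothing matched
  as.integer(vapply(hits, function(h) if (length(h) == 2) h[2] else NA_character_,
                    character(1)))
}

extract_field(x, "series120_count")   # 1
extract_field(x, "crossing_success")  # 2
extract_field(x, "tag_comments")      # NA, since "small" is not digits
```

Because regexec() is vectorised, the same call should also work on a whole column of comments, returning the first match per element.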