RE2-compatible regex to get only one substring from string with a set of substrings inside of it


I have a string in the format @@@substring1@@@substring2 that comes from a black box.

substring1 may be empty; substring2 is always non-empty. @@@ is a delimiter, and I can change it via the black-box settings. substring1 and substring2 never contain @@@.

I need to get the first non-empty substring from this string, e.g. from @@@substring1@@@substring2 I need substring1, and from @@@@@@substring2 I need substring2.

My black box allows me to process the string with an RE2 regex. I can't use external tools like cut, sed, awk etc. Is it possible to do this with a regex only?

My thoughts are as follows:

The regex @@@([^@]+)

  • produces 1 match with 1 group for @@@@@@substring2, which is what I need
  • produces 2 matches with 1 group each for @@@substring1@@@substring2, which is not what I need: I need only 1 match
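The behaviour described in the two bullets can be reproduced with a short sketch. It uses Python's re module as a stand-in for RE2, on the assumption that both engines treat this particular pattern the same way (it contains only a literal, a negated character class and a capturing group). The helper name captured_groups is made up for illustration.

```python
import re

# The pattern from the question. Python's re stands in for RE2 here;
# this pattern uses no construct that RE2 lacks.
PATTERN = re.compile(r"@@@([^@]+)")


def captured_groups(text):
    """Return group 1 of every non-overlapping match, left to right."""
    return [m.group(1) for m in PATTERN.finditer(text)]


print(captured_groups("@@@@@@substring2"))            # ['substring2']
print(captured_groups("@@@substring1@@@substring2"))  # ['substring1', 'substring2']
```

The second call shows the unwanted extra match: nothing in the pattern stops it from matching again at the second delimiter.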

Lookahead / lookbehind assertions (?=re), (?!re), (?<=re), (?<!re) and \K syntax are not supported in RE2 regex.

There are 3 best solutions below

1
AntonioK On BEST ANSWER

A working RE2-compatible solution based on @InSync's answer:

(?:^@@@|^)@@@([^@]+).*$

  • for @@@substring1@@@substring2 it matches the whole string with just one capturing group ${1} containing substring1
  • for @@@@@@substring2 it matches the whole string with just one capturing group ${1} containing substring2
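A minimal sketch to check both bullets, again with Python's re standing in for RE2 (the pattern uses only anchors, a non-capturing group and a negated class, which both engines support). The function names are hypothetical, and the replacement step assumes the black box can substitute a match with a group reference; the sketch writes that as \1 where the answer writes ${1}. Input is assumed to be a single line, since . does not match a newline by default.

```python
import re

# Anchored pattern from the answer above.
ANCHORED = re.compile(r"(?:^@@@|^)@@@([^@]+).*$")


def first_non_empty(text):
    """Return group 1 of the single match, or None if nothing matches."""
    m = ANCHORED.search(text)
    return m.group(1) if m else None


def replace_with_group(text):
    """Replace the match (the whole string) with the captured group."""
    return ANCHORED.sub(r"\1", text)


for sample in ("@@@substring1@@@substring2", "@@@@@@substring2"):
    print(first_non_empty(sample), replace_with_group(sample))
```

Because the match spans the whole string, a replace-with-group operation leaves only the wanted substring behind, which is convenient if the black box supports replacement rather than extraction.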
2
InSync On

Match the trailing delimiters as well so that substring2 would not be able to match if substring1 matched:

@@@           # Match triple '@'
([^@]+)       # followed by a non-empty sequence of non-'@' characters, which we capture,
(?:@@@|$)     # then another triple '@' or the end of string.

Try it on regex101.com.
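The commented pattern above can be tried as-is in Python with re.VERBOSE, which ignores the whitespace and the # comments. This is only a sketch with Python's re as a stand-in; as far as I know RE2 has no free-spacing flag, so for the black box the pattern would presumably have to be written on one line as @@@([^@]+)(?:@@@|$). The name all_captures is made up for illustration.

```python
import re

# Same pattern as above, kept in commented (verbose) form.
DELIMITED = re.compile(
    r"""
    @@@           # triple '@'
    ([^@]+)       # non-empty run of non-'@' characters (captured)
    (?:@@@|$)     # another triple '@' or the end of the string
    """,
    re.VERBOSE,
)


def all_captures(text):
    """Return group 1 of every non-overlapping match."""
    return [m.group(1) for m in DELIMITED.finditer(text)]


print(all_captures("@@@substring1@@@substring2"))  # ['substring1']
print(all_captures("@@@@@@substring2"))            # ['substring2']
```

Consuming the trailing delimiter is what prevents a second match: after @@@substring1@@@ has been matched, the remaining text substring2 has no leading @@@ left.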

This, of course, relies on a capturing group. If you cannot use capturing groups, then there is no answer.

Also, just for fun, here's a PCRE solution:

^                      # Match at the start of the string
(?(?=@@@(.+?)@@@.+)    #                     if it exists
  @@@\1                # the first substring
|                      # or
  @{6}\K.+             # the second substring (preceded by 6 '@' which we forfeit).
)                      #

Try it on regex101.com.

...and an extension of the first regex above which accepts substrings containing fewer than three consecutive @ (see my explanation for the middle expression here):

^(?:@@@)?@@@
((?:@(?:@(?:[^@]|$)|[^@]|$)|[^\n@])+)
(?:@@@|$)

Try it on regex101.com.
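A small sketch of how the extended pattern behaves on substrings that contain one or two consecutive @. It uses Python's re with re.VERBOSE so the three lines can be kept separate; the sample strings and the helper name extract are invented for illustration and do not come from the question.

```python
import re

# Extended pattern from above, one component per line.
EXTENDED = re.compile(
    r"""
    ^(?:@@@)?@@@                            # one or two leading delimiters
    ((?:@(?:@(?:[^@]|$)|[^@]|$)|[^\n@])+)   # text with at most two '@' in a row
    (?:@@@|$)                               # closing delimiter or end of string
    """,
    re.VERBOSE,
)


def extract(text):
    """Return the captured substring, or None if the pattern does not match."""
    m = EXTENDED.search(text)
    return m.group(1) if m else None


print(extract("@@@a@@b@@@c"))   # a@@b
print(extract("@@@@@@x@y"))     # x@y
```

The original two inputs from the question still yield substring1 and substring2 respectively.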

0
Reilas On

"... I need to get the first substring from this string, e.g. from @@@substring1@@@substring2 I need to get substring1, from @@@@@@substring2 I need to get substring2. ...

... Is it possible to do that with regex only?"

Yes, you can use the following pattern.

@{3,6}(.+?)(?:@|$)

Yours is also correct; you just need to define when to stop the capture.

@@@([^@]+?)(?:@|$)
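Both patterns can be checked against the two inputs from the question with a short sketch (Python's re as a stand-in for RE2; lazy quantifiers such as +? are supported by both). The dictionary keys and the helper name captures are made up for illustration.

```python
import re

# The two patterns from this answer.
PATTERNS = {
    "quantified": r"@{3,6}(.+?)(?:@|$)",
    "negated": r"@@@([^@]+?)(?:@|$)",
}


def captures(pattern, text):
    """Return group 1 of every non-overlapping match of pattern in text."""
    return [m.group(1) for m in re.finditer(pattern, text)]


for name, pattern in PATTERNS.items():
    print(name, captures(pattern, "@@@substring1@@@substring2"))
    print(name, captures(pattern, "@@@@@@substring2"))
```

Each pattern yields exactly one match per input: consuming a single @ after the capture leaves only @@substring2, which is too short to start another match.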