Match regex named group up until optional word


I have these strings, and I want to extract some information from them using a regex.

Vuln : Upgrade pypi://onnx from 1.12.0 to 1.13.0 Final

Vuln : Upgrade gav://com.google.guava:guava from 22.0 to 32.0.0-android for github.com/blah/blah (Need to capture gav://com.google.guava:guava 22.0 32.0.0-android)

Vuln : Upgrade gav://org.apache.avro:avro from 1.11.1 for Android to 1.11.3 Final for github.com/blah/blah (Need to capture gav://org.apache.avro:avro 1.11.1 for Android 1.11.3 Final)

Specifically, I need to capture the library name and the two version strings, for example pypi://onnx, 1.12.0 and 1.13.0 Final. I'm using Splunk, where the capture groups can become variables. All of these strings are dynamic, so they will not always be exactly what is shown above.

I've been having difficulty crafting a regex that stops as soon as the trailing for is encountered, since that part is optional.

This is the one I've tried:

Vuln : Upgrade\s*(?<vulnNameFromTo>.+)\sfrom\s*(?<vulnCurrentVersionFromTo>.+)\sto\s(?<vulnFixVersionFromTo>.+)(?:\sfor\s)?

But the last named capture group grabs everything, including the for and what follows it, which is not what I want.
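
To make the symptom concrete, here is a minimal Python sketch of the attempt above. It assumes Python's `re` module, which spells named groups `(?P<name>...)` rather than `(?<name>...)`; the pattern is otherwise unchanged, and the variable names are only illustrative.

```python
import re

# The attempted pattern, with named groups rewritten in Python's
# (?P<name>...) syntax. Everything else is as in the question.
GREEDY = re.compile(
    r"Vuln : Upgrade\s*(?P<vulnNameFromTo>.+)\sfrom\s*"
    r"(?P<vulnCurrentVersionFromTo>.+)\sto\s"
    r"(?P<vulnFixVersionFromTo>.+)(?:\sfor\s)?"
)

line = ("Vuln : Upgrade gav://com.google.guava:guava "
        "from 22.0 to 32.0.0-android for github.com/blah/blah")

greedy_match = GREEDY.search(line)

# The last .+ is greedy and the trailing (?:\sfor\s)? is optional, so the
# capture runs to the end of the line and the optional group matches nothing.
print(greedy_match.group("vulnFixVersionFromTo"))
```

The first two groups come out as intended here; only the last one overshoots, because nothing after it forces the engine to give characters back.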

Wiktor Stribiżew On BEST ANSWER

You can use

Vuln : Upgrade\s*(?<vulnNameFromTo>.*?)\s+from\s+(?<vulnCurrentVersionFromTo>.*?)\s+to\s+(?<vulnFixVersionFromTo>.+?)(?:\s+for\b.*)?$

See the regex demo.

Details:

  • Vuln : Upgrade - a literal text
  • \s* - zero or more whitespaces
  • (?<vulnNameFromTo>.*?) - Group "vulnNameFromTo": zero or more chars other than line break chars as few as possible
  • \s+from\s+ - from string enclosed with one or more whitespaces
  • (?<vulnCurrentVersionFromTo>.*?) - Group "vulnCurrentVersionFromTo": zero or more chars other than line break chars as few as possible
  • \s+to\s+ - to word enclosed with one or more whitespaces
  • (?<vulnFixVersionFromTo>.+?) - Group "vulnFixVersionFromTo": any one or more chars other than line break chars as few as possible
  • (?:\s+for\b.*)? - an optional sequence of one or more whitespaces, for, a word boundary and then any zero or more chars other than line break chars as many as possible
  • $ - end of string.
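
As a quick check outside Splunk, the pattern can be run against the three sample strings from the question. This is a sketch that assumes Python's `re` module, so the named groups are written `(?P<name>...)`; the `extract` helper and the `samples` list are illustrative names, not part of the answer.

```python
import re

# Same pattern as above, with named groups in Python's (?P<name>...) syntax.
LAZY = re.compile(
    r"Vuln : Upgrade\s*(?P<vulnNameFromTo>.*?)\s+from\s+"
    r"(?P<vulnCurrentVersionFromTo>.*?)\s+to\s+"
    r"(?P<vulnFixVersionFromTo>.+?)(?:\s+for\b.*)?$"
)

# The three sample strings from the question.
samples = [
    "Vuln : Upgrade pypi://onnx from 1.12.0 to 1.13.0 Final",
    "Vuln : Upgrade gav://com.google.guava:guava from 22.0 to "
    "32.0.0-android for github.com/blah/blah",
    "Vuln : Upgrade gav://org.apache.avro:avro from 1.11.1 for Android to "
    "1.11.3 Final for github.com/blah/blah",
]


def extract(line):
    """Return (name, current version, fix version), or None if no match."""
    m = LAZY.search(line)
    if m is None:
        return None
    return m.group(
        "vulnNameFromTo", "vulnCurrentVersionFromTo", "vulnFixVersionFromTo"
    )


for sample in samples:
    print(extract(sample))
```

The lazy `.+?` in the last group only stops early because the pattern is anchored with `$`: the engine has to extend the capture until either the end of the string or an optional ` for ...` tail follows it.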
DuesserBaest On

Try:

^Vuln\h+:\h+Upgrade\h+(?P<pypi>\S+)\h+from\h+(?P<ini_v>[0-9.]+)\h+to\h+(?P<final_v>[0-9.]+\h+Final)

See: regex101

...or try this run-anywhere example SPL:

| makeresults
| eval a=split("Vuln : Upgrade pypi://onnx from 1.12.0 to 1.13.0 Final#Vuln : Upgrade pypi://onnx from 1.12.0 to 1.13.0 Final for github.com/blah/blah","#")
| mvexpand a
| table a
```actual extraction```
| rex field=a "^Vuln\h+:\h+Upgrade\h+(?P<pypi>\S+)\h+from\h+(?P<ini_v>[0-9.]+)\h+to\h+(?P<final_v>[0-9.]+\h+Final)"

Explanation

  • ^Vuln\h+:\h+Upgrade\h+: Static start to string: Vuln : Upgrade
  • (?P<pypi>\S+): grabs the pypi part
  • \h+from\h+: static from
  • (?P<ini_v>[0-9.]+): grabs the initial version
  • \h+to\h+: static to
  • (?P<final_v>[0-9.]+\h+Final): grabs final version
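
For readers who want to try this pattern outside Splunk, here is a rough Python sketch. Python's `re` module does not support `\h`, so it is replaced with `[ \t]` as an approximation (an assumption: PCRE's `\h` also covers other horizontal whitespace characters). The `parse` helper is an illustrative name.

```python
import re

# \h (horizontal whitespace in PCRE) is approximated with [ \t] here,
# because Python's re module has no \h escape.
H = r"[ \t]+"
STRICT = re.compile(
    r"^Vuln" + H + ":" + H + "Upgrade" + H + r"(?P<pypi>\S+)"
    + H + "from" + H + r"(?P<ini_v>[0-9.]+)"
    + H + "to" + H + r"(?P<final_v>[0-9.]+" + H + "Final)"
)


def parse(line):
    """Return the named groups as a dict, or None if the line does not match."""
    m = STRICT.search(line)
    return m.groupdict() if m else None


print(parse("Vuln : Upgrade pypi://onnx from 1.12.0 to 1.13.0 Final"))
print(parse("Vuln : Upgrade pypi://onnx from 1.12.0 to 1.13.0 Final "
            "for github.com/blah/blah"))
# Versions that are not purely digits and dots followed by "Final"
# do not match this pattern.
print(parse("Vuln : Upgrade gav://com.google.guava:guava from 22.0 to "
            "32.0.0-android for github.com/blah/blah"))
```

Because the version groups are restricted to `[0-9.]+` and the fix version must end in `Final`, the trailing `for ...` part is ignored without any optional group. The trade-off is that inputs such as `32.0.0-android` or `1.11.1 for Android` from the question are not matched by this pattern.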