Why can't I capture more than one digit in a substring?


I am writing a regex to extract various fields from log files. I built it with the help of some online tools and it is almost complete. The only problem is that for one field it extracts only one digit instead of the full number. For easier understanding, I have saved it in the demo linked below.

My Regex Demo

Pattern:

/(?=[^P]*(?:ParNew|P.*ParNew|PSYoungGen|DefNew))^(?:).*(?P<ParNew_before_1>\d)K\->(?P<ParNew_after_1>\d+)K\((?P<young_heap_size>\d+)K\), (?P<par_new_duration>\d+\.\d+) secs\] (?P<ParNew_before_2>\d+)K\->(?P<ParNew_after_2>\d+)K\((?P<total_heap_size>\d+)/

String:

146372.273: [GC146372.274: [ParNew: 378633K->88155K(419456K), 0.0313803 secs] 9893391K->9602913K(12478080K), 0.0320299 secs] [Times: user=0.32 sys=0.01, real=0.03 secs]

Current Output:

Full match      `146372.273: [GC146372.274: [ParNew: 378633K->88155K(419456K), 0.0313803 secs] 9893391K->9602913K(12478080`
Group `ParNew_before_1`     `3`
Group `ParNew_after_1`      `88155`
Group `young_heap_size`     `419456`
Group `par_new_duration`    `0.0313803`
Group `ParNew_before_2`     `9893391`
Group `ParNew_after_2`      `9602913`
Group `total_heap_size`     `12478080`

Expected Output:

Full match      `146372.273: [GC146372.274: [ParNew: 378633K->88155K(419456K), 0.0313803 secs] 9893391K->9602913K(12478080`
Group `ParNew_before_1`     `378633`
Group `ParNew_after_1`      `88155`
Group `young_heap_size`     `419456`
Group `par_new_duration`    `0.0313803`
Group `ParNew_before_2`     `9893391`
Group `ParNew_after_2`      `9602913`
Group `total_heap_size`     `12478080`

In the example above, group ParNew_before_1 captures only one digit (3) instead of the whole number 378633.
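To reproduce this outside the online tester, here is a small Python sketch. It assumes Python's re module (whose (?P<name>...) syntax the pattern already uses) and simply runs the pattern above, minus its /.../ delimiters, against the test string:

```python
import re

# Test line copied from the question.
LINE = ("146372.273: [GC146372.274: [ParNew: 378633K->88155K(419456K), "
        "0.0313803 secs] 9893391K->9602913K(12478080K), 0.0320299 secs] "
        "[Times: user=0.32 sys=0.01, real=0.03 secs]")

# The question's pattern, unchanged apart from dropping the /.../ delimiters.
PATTERN = (r"(?=[^P]*(?:ParNew|P.*ParNew|PSYoungGen|DefNew))^(?:).*"
           r"(?P<ParNew_before_1>\d)K\->(?P<ParNew_after_1>\d+)K\((?P<young_heap_size>\d+)K\), "
           r"(?P<par_new_duration>\d+\.\d+) secs\] (?P<ParNew_before_2>\d+)K\->"
           r"(?P<ParNew_after_2>\d+)K\((?P<total_heap_size>\d+)")

m = re.search(PATTERN, LINE)
print(m.group("ParNew_before_1"))  # 3
print(m.group("ParNew_after_1"))   # 88155
```

The single digit comes from the combination of \d (exactly one digit) and the greedy .* in front of it.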

Best answer:

There are three things I'd like to note here:

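  • The lookahead should be placed after ^, so that its pattern is checked only once, at the start of the string
  • \d matches exactly one digit; add + after it (\d+) to match one or more
  • .* is too greedy: it consumes the whole line and then backtracks only until the rest of the pattern fits, which already happens at the last digit (3) before K->, so even \d+ would capture just 3. Use the lazy .*? instead, which stops at the first position where the rest of the pattern matches.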
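The last two points can be checked in isolation. This minimal Python sketch uses a shortened version of the log line (shortened only for brevity) and compares a greedy and a lazy prefix in front of the same \d+ group:

```python
import re

LINE = "[ParNew: 378633K->88155K(419456K), 0.0313803 secs]"

# \d+ alone is not enough: greedy .* backtracks only until the rest of the
# pattern can match, and that already happens at the last digit before "K->".
greedy = re.search(r"^.*(\d+)K->", LINE)
print(greedy.group(1))  # 3

# Lazy .*? stops at the first position where the rest matches,
# so \d+ starts at the first digit of the number.
lazy = re.search(r"^.*?(\d+)K->", LINE)
print(lazy.group(1))    # 378633
```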

Use

^(?=[^P]*(?:ParNew|P.*ParNew|PSYoungGen|DefNew)).*?(?P<ParNew_before_1>\d+)K->(?P<ParNew_after_1>\d+)K\((?P<young_heap_size>\d+)K\), (?P<par_new_duration>\d+\.\d+) secs\] (?P<ParNew_before_2>\d+)K\->(?P<ParNew_after_2>\d+)K\((?P<total_heap_size>\d+)
 ^^^                                           ^  ^                      ^

See this regex demo
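A Python sketch of the corrected pattern, assuming Python's re module and the test string from the question. The only deviation from the pattern above is that the remaining K\-> is written as K->, which matches the same text:

```python
import re

LINE = ("146372.273: [GC146372.274: [ParNew: 378633K->88155K(419456K), "
        "0.0313803 secs] 9893391K->9602913K(12478080K), 0.0320299 secs] "
        "[Times: user=0.32 sys=0.01, real=0.03 secs]")

PATTERN = re.compile(
    r"^(?=[^P]*(?:ParNew|P.*ParNew|PSYoungGen|DefNew))"  # lookahead checked at the start only
    r".*?(?P<ParNew_before_1>\d+)K->(?P<ParNew_after_1>\d+)K\((?P<young_heap_size>\d+)K\), "
    r"(?P<par_new_duration>\d+\.\d+) secs\] "
    r"(?P<ParNew_before_2>\d+)K->(?P<ParNew_after_2>\d+)K\((?P<total_heap_size>\d+)"
)

m = PATTERN.search(LINE)
for name, value in m.groupdict().items():
    print(name, value)
```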

Also, you do not need to escape - when it is outside a character class.
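A quick check of that remark in Python's re (a sketch; the variable names are only for illustration): outside a class, - is an ordinary character, while inside a class an unescaped - between two characters forms a range.

```python
import re

# Outside a character class both spellings match the same literal text.
plain = re.fullmatch(r"K->", "K->")
escaped = re.fullmatch(r"K\->", "K->")
print(bool(plain), bool(escaped))  # True True

# Inside a class, an unescaped "-" between two characters is a range ...
in_range = re.fullmatch(r"[a-c]", "b")
# ... while an escaped one is a literal hyphen: the class is just a, -, c.
literal_b = re.fullmatch(r"[a\-c]", "b")
literal_dash = re.fullmatch(r"[a\-c]", "-")
print(bool(in_range), bool(literal_b), bool(literal_dash))  # True False True
```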

Another answer:

As an aside, when you have a long pattern, do not hesitate to use the x modifier ("free-spacing" mode) and, where useful, the quoting feature \Q...\E (to write spaces and special characters literally without escaping them) to make it more readable:

/
^
(?=
    [^PD\n]* (?>[PD][^\nPD]*)*? \b
    (?: ParNew | PSYoungGen | DefNew )
)
[^\n\d]* (?>\d+[^\n\d]+)*? \b
(?<ParNew_before_1>  \d+      ) K->
(?<ParNew_after_1>   \d+      ) \QK(\E
(?<young_heap_size>  \d+      ) \QK), \E
(?<par_new_duration> \d+\.\d+ ) \Q secs] \E
(?<ParNew_before_2>  \d+      ) K->
(?<ParNew_after_2>   \d+      ) \QK(\E
(?<total_heap_size>  \d+      )
/x
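The same free-spacing idea carries over to Python, where the x modifier is spelled re.VERBOSE (re.X). This is an adaptation rather than a direct port, under two stated assumptions: Python's re has no \Q...\E, so special characters are escaped by hand and literal spaces are written as [ ] (plain whitespace is ignored in verbose mode); and the atomic-group lookahead above is swapped for the simpler lookahead and lazy .*? from the first answer, since (?>...) is only available in Python's re from 3.11 onward.

```python
import re

LINE = ("146372.273: [GC146372.274: [ParNew: 378633K->88155K(419456K), "
        "0.0313803 secs] 9893391K->9602913K(12478080K), 0.0320299 secs] "
        "[Times: user=0.32 sys=0.01, real=0.03 secs]")

GC_LINE = re.compile(r"""
    ^
    (?= [^P]* (?: ParNew | P.*ParNew | PSYoungGen | DefNew ) )  # GC type check
    .*?                                       # lazily skip to the first number
    (?P<ParNew_before_1>  \d+      ) K->
    (?P<ParNew_after_1>   \d+      ) K\(
    (?P<young_heap_size>  \d+      ) K\),[ ]
    (?P<par_new_duration> \d+\.\d+ ) [ ]secs\][ ]
    (?P<ParNew_before_2>  \d+      ) K->
    (?P<ParNew_after_2>   \d+      ) K\(
    (?P<total_heap_size>  \d+      )
""", re.VERBOSE)

m = GC_LINE.search(LINE)
print(m.groupdict())
```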
