Hi, I want to parse a log file using StreamSets. The log looks like this:
Deny tcp src dmz:77.77.77.7/61112 dst dmz:55.55.56.57/139 by access-group "outside_access_in" [0x8b3ecfdc, 0x0]
The log may contain more than two IP addresses, and I'm trying to capture only the 1st and 2nd IP addresses from it. The documentation says that StreamSets uses Java regex patterns.
What I have so far in the Expression Evaluator processor in StreamSets is:
${str:regExCapture(record:value('/Message'),'(\\d+[.]\\d+[.]\\d+[.]\\d+/?\\d*)', 1)}
Any idea how to capture the 2nd IP?
You may use
${str:regExCapture(record:value('/Message'),'^(?:.*?(\\d+(?:[.]\\d+){3}(?:/\\d+)?)){2}', 1)}
See the regex demo.
Details
- ^ - start of string
- (?:.*?(\\d+(?:[.]\\d+){3}(?:/\\d+)?)){2} - two consecutive occurrences of:
  - .*? - any 0+ chars other than line break chars, as few as possible
  - (\\d+(?:[.]\\d+){3}(?:/\\d+)?) - Capturing group 1 (its value will be returned by str:regExCapture since the last argument is set to 1):
    - \\d+ - 1+ digits
    - (?:[.]\\d+){3} - three occurrences of . and 1+ digits
    - (?:/\\d+)? - an optional sequence of / and 1+ digits.

Since the contents of a group are overwritten each time the group matches again within one match operation, Group 1 will only contain the second IP value.
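To check the "last repetition wins" behavior outside StreamSets, the pattern can be tried in plain Java, which uses the same regex engine. This is only a minimal sketch: the class and method names (SecondIpCapture, secondIp) are made up for illustration, and it assumes that str:regExCapture behaves like a find-style match that returns the requested group.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of StreamSets.
final class SecondIpCapture {

    // Same pattern as in the answer. The group inside the {2} repetition
    // keeps only the text from its last (second) iteration.
    private static final Pattern SECOND_IP = Pattern.compile(
            "^(?:.*?(\\d+(?:[.]\\d+){3}(?:/\\d+)?)){2}");

    // Returns the second IP (with its optional /port suffix), or empty
    // if the input contains fewer than two IP-like tokens on its first line.
    static Optional<String> secondIp(String message) {
        Matcher m = SECOND_IP.matcher(message);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

Calling SecondIpCapture.secondIp(...) with the sample log line from the question yields 55.55.56.57/139, and a line with a single IP yields an empty result because the {2} quantifier cannot be satisfied.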
Note that a better (safer, more precise) IP pattern would be
(?:25[0-5]|2[0-4]\\d|[0-1]?\\d?\\d)(?:\\.(25[0-5]|2[0-4]\\d|[0-1]?\\d?\\d)){3}
(see Extract ip addresses from Strings using regex). So, you may also write the command with this pattern substituted for the simpler IP sub-pattern. See another regex demo.
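One way the stricter octet pattern could be combined with the "second occurrence" trick is sketched below in plain Java. The exact command from the original answer is not reproduced here, so this combination is an assumption. The names (StrictSecondIpCapture, OCTET, IP) are hypothetical, and the inner octet group is deliberately made non-capturing so that group 1 stays the whole IP.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of StreamSets.
final class StrictSecondIpCapture {

    // One octet limited to 0-255, as in the stricter pattern quoted above.
    private static final String OCTET = "(?:25[0-5]|2[0-4]\\d|[0-1]?\\d?\\d)";

    // Four octets separated by dots (inner group made non-capturing here).
    private static final String IP = OCTET + "(?:\\." + OCTET + "){3}";

    // Assumed combination: the strict IP plus the optional /port suffix,
    // wrapped in the same two-iteration structure as the simpler pattern.
    private static final Pattern SECOND_IP = Pattern.compile(
            "^(?:.*?(" + IP + "(?:/\\d+)?)){2}");

    // Returns the second IP (with optional /port), or empty if absent.
    static Optional<String> secondIp(String message) {
        Matcher m = SECOND_IP.matcher(message);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

In a StreamSets expression the same regex string would go in as the second argument of str:regExCapture, with the backslashes doubled the same way as in the simpler command.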