Regex to match all multiline exceptions in Rubular for Fluentd


I wrote the following regex, in Rubular format, to match every multiline exception or warning message for a Fluentd parser:

(SLF4J:\s.*|[a-zA-z_]*\..*\.*\s.*\s.*|Caused\sby:\s|\s+at\s.*|\s+\.\.\. (\d)+ more)

However, it also matches fields that I do not need.

I want to match every multiline exception or warning from its first line. In short: a multiline block is read from its beginning until the next line is JSON. A JSON line always starts with the two characters {" together, so when we see a line beginning with {" we stop reading the multiline block.

Either one regex that covers both cases or two separate regexes is fine.
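To make the stopping rule above concrete, here is a minimal Ruby sketch that groups lines without any regex. It is only an illustration of the requirement, not a Fluentd configuration; the method name multiline_blocks is hypothetical.

```ruby
# Hypothetical helper: collect consecutive non-JSON lines into blocks.
# A line that starts with {" is treated as JSON and closes the current block.
def multiline_blocks(text)
  blocks = []
  current = []
  text.each_line(chomp: true) do |line|
    if line.start_with?('{"')
      # JSON line: finish the block collected so far, if any.
      blocks << current.join("\n") unless current.empty?
      current = []
    else
      current << line
    end
  end
  # Flush a trailing block that is not followed by a JSON line.
  blocks << current.join("\n") unless current.empty?
  blocks
end
```

With this rule, a run of SLF4J lines ends up in one block and each stack trace in another, which is the grouping the expected matches further down describe.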

Demo links:

Demo 1: https://rubular.com/r/O26Wm6mc7z51re

Demo 2: https://rubular.com/r/v6Q7iwZqmNDAAx

Test strings:

java.lang.InterruptedException: Timeout while waiting for epoch from quorum
        at org.apache.zookeeper.server.quorum.Leader.getEpochToPropose(Leader.java:1227)
        at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:482)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1284)
        ... 19 more
{"log_timestamp": "2021-02-18T11:33:23.114+0000", "log_level": "WARN", "process_id": "zookeeper#2", "process_name": "zookeeper", "thread_id": 1, "thread_name": "QuorumPeer[myid=2](plain=/0.0.0.0:2181)(secure=disabled)", "action_name": "org.apache.zookeeper.server.quorum.QuorumPeer", "log_message": "PeerState set to LOOKING"}
{"log_timestamp": "2021-02-18T11:33:23.115+0000", "log_level": "WARN", "process_id": "zookeeper#2", "process_name": "zookeeper", "thread_id": 1, "thread_name": "WorkerSender[myid=2]", "action_name": "org.apache.zookeeper.server.quorum.QuorumPeer", "log_message": "Failed to resolve address: zk-2.zk-headless.intam.svc.cluster.local"}
java.net.UnknownHostException: zk-2.zk-headless.intam.svc.cluster.local
        at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
        at java.net.InetAddress.getAllByName(InetAddress.java:1193)
        at java.net.InetAddress.getAllByName(InetAddress.java:1127)
        at java.net.InetAddress.getByName(InetAddress.java:1077)
        at org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer.recreateSocketAddresses(QuorumPeer.java:194)
        at org.apache.zookeeper.server.quorum.QuorumPeer.recreateSocketAddresses(QuorumPeer.java:764)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:699)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:618)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:477)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:456)
        at java.lang.Thread.run(Thread.java:748)
{"log_timestamp": "2021-02-18T11:33:23.115+0000", "log_level": "WARN", "process_id": "zookeeper#2", "process_name": "zookeeper", "thread_id": 1, "thread_name": "WorkerSender[myid=2]", "action_name": "org.apache.zookeeper.server.quorum.QuorumPeer", "log_message": "Failed to resolve address: zk-2.zk-headless.sxc.svc.cluster.local"}

Expected match for demo 1 (https://rubular.com/r/O26Wm6mc7z51re):

java.lang.InterruptedException: Timeout while waiting for epoch from quorum
        at org.apache.zookeeper.server.quorum.Leader.getEpochToPropose(Leader.java:1227)
        at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:482)
        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1284)
        ... 19 more

Expected match for demo 2 (https://rubular.com/r/v6Q7iwZqmNDAAx):

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/spark/jars/logback-classic-1.2.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/spark/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type 

Best answer

You can match both parts with a single pattern that uses a capture group and a backreference:

^(SLF4J:|java\.lang\.InterruptedException:).*(?:\R(?!\1|{).*)*

The pattern matches:

  • ^ Start of the line (in Ruby, ^ matches at the start of every line)
  • (SLF4J:|java\.lang\.InterruptedException:).* Capture either of the alternatives in group 1, then match the rest of the line
  • (?: Non capture group
    • \R(?!\1|{).* Match a newline, assert that the next line does not start with what is captured in group 1 or with {, then match the rest of that line
  • )* Close the group and repeat it zero or more times to match all following lines
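As a quick check of the breakdown above, here is a minimal Ruby sketch (Rubular uses Ruby's regex engine) that runs the pattern over a shortened copy of the test strings. The constant names PATTERN and SAMPLE are illustrative, and the JSON line is abbreviated for readability.

```ruby
# Pattern from the answer, written as a Ruby literal.
# The `{` is escaped here for clarity.
PATTERN = /^(SLF4J:|java\.lang\.InterruptedException:).*(?:\R(?!\1|\{).*)*/

# Shortened copy of the question's test strings (JSON line abbreviated).
SAMPLE = <<~LOG
  java.lang.InterruptedException: Timeout while waiting for epoch from quorum
          at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:482)
          ... 19 more
  {"log_level": "WARN", "log_message": "PeerState set to LOOKING"}
LOG

# String#[] with a Regexp returns the first whole match (or nil).
# Here that is the exception line plus its stack trace, without the JSON line.
puts SAMPLE[PATTERN]
```

Because the negative lookahead also rejects lines that start with the text captured in group 1, consecutive SLF4J: lines are returned as separate matches rather than as one block.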

Regex demo

See the rubular match for the first part and the second part.

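Note that in Java you have to double the backslashes (the { is escaped as well), and the pattern needs Pattern.MULTILINE for ^ to match at the start of each line: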

String regex = "^(SLF4J:|java\\.lang\\.InterruptedException:).*(?:\\R(?!\\1|\\{).*)*";

To avoid running into the next SLF4J line or into a different type of exception (denoted by a dot-separated name at the start of the line), use:

^(?:SLF4J:|\w+(?:\.\w+)+).*(?:\R(?!(?:SLF4J:|\w+(?:\.\w+)+)|{).*)*
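The following Ruby sketch shows how this more general pattern could be applied to a log that contains two different exception types. The constant names GENERAL and LOG_TEXT are illustrative, and the sample is a shortened version of the question's test strings with abbreviated JSON lines.

```ruby
# Generalised pattern from the answer: a block starts with "SLF4J:" or a
# dot-separated name such as java.net.UnknownHostException.
# The `{` is escaped here for clarity.
GENERAL = /^(?:SLF4J:|\w+(?:\.\w+)+).*(?:\R(?!(?:SLF4J:|\w+(?:\.\w+)+)|\{).*)*/

# Shortened sample (JSON lines abbreviated).
LOG_TEXT = <<~LOG
  java.lang.InterruptedException: Timeout while waiting for epoch from quorum
          at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:482)
          ... 19 more
  {"log_level": "WARN", "log_message": "PeerState set to LOOKING"}
  java.net.UnknownHostException: zk-2.zk-headless.intam.svc.cluster.local
          at java.net.InetAddress.getByName(InetAddress.java:1077)
          at java.lang.Thread.run(Thread.java:748)
  {"log_level": "WARN", "log_message": "Failed to resolve address"}
LOG

# The pattern has no capture groups, so scan returns the whole matches.
blocks = LOG_TEXT.scan(GENERAL)
blocks.each_with_index do |block, i|
  puts "--- block #{i + 1} ---"
  puts block
end
```

Each stack trace comes back as its own block, and the JSON lines between them are skipped.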

Regex demo