How to find the time duration between two Splunk events that share a unique key

First Event

17:09:05:362 INFO com.a.b.App - Making a GET Request and req-id: [123456]

Second Event

17:09:06:480 INFO com.a.b.App - Output Status Code: 200 req-id:"123456"

I tried the following, but it is not working:

index="xyz" container="service-name"
| transaction "req-id" startswith="Making a GET Request" endswith="Output Status Code"
| table duration

I want to calculate the duration between these two events for every request. I went over some solutions on Splunk Answers and Stack Overflow, but still can't get the proper result.

2 Answers

Answer from warren

Try doing it with stats instead:

index=ndx sourcetype=srctp 
| rex field=_raw "req\-id\D+(?<req_id>\d+)"
| rex field=_raw "(?<sequence>Making a GET Request)"
| rex field=_raw "(?<sequence>Output Status Code)"
| eval sequence=sequence+";"+_time
| stats values(sequence) as sequence by req_id
| mvexpand sequence
| rex field=sequence "(?<sequence>[^;]+);(?<time>\d+)"
| eval time=strftime(time,"%c")

This will extract the "req-id" into a field named req_id, and the start and end of the sequence into a field named sequence.

Presuming the sample data you shared is correct, when you stats values(sequence) as sequence, it will put the "Making..." entry first and the "Output..." entry second.

Because values() returns its results in lexicographical order, when you mvexpand and then split each value into sequence and time, they'll be in the proper order.

If the sample data is incomplete, you may need to tweak the regexes that populate sequence.
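
This pipeline shows the ordered timeline but stops one step short of an actual duration. A minimal way to finish it, sketched along the same lines (untested against real data, reusing the same placeholder index and sourcetype), is to pull the two timestamps back out of the values()'d field and subtract:

index=ndx sourcetype=srctp
| rex field=_raw "req\-id\D+(?<req_id>\d+)"
| rex field=_raw "(?<sequence>Making a GET Request)"
| rex field=_raw "(?<sequence>Output Status Code)"
| eval sequence=sequence+";"+_time
| stats values(sequence) as sequence by req_id
| eval start=tonumber(mvindex(split(mvindex(sequence,0),";"),1))
| eval end=tonumber(mvindex(split(mvindex(sequence,1),";"),1))
| eval duration=end-start
| table req_id duration

Here mvindex(sequence,0) is the "Making..." entry and mvindex(sequence,1) the "Output..." entry, thanks to the lexicographical ordering described above; splitting on ";" recovers the epoch timestamp that the eval appended.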

Answer from Hakan

It seems you're going with my previously suggested approach.

Now you have two possibilities:

1. SPL

Below is the simplest query, invoking only one rex and assuming the _time field is correctly populated:

index=<your_index> source=<your_source>
("*Making a GET Request*" OR "*Output Status Code*")
| rex field=_raw "req\-id\D+(?<req_id>\d+)"
| stats max(_time) as end, min(_time) as start by req_id
| eval duration = end - start
| table req_id duration

Note that, depending on the amount of data to scan, this query can be resource-intensive for your Splunk cluster.
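
If volume becomes a problem, one mitigation (an illustrative variant of the same query, with an assumed four-hour window) is to bound the time range so fewer index buckets are scanned:

index=<your_index> source=<your_source> earliest=-4h@h latest=now
("*Making a GET Request*" OR "*Output Status Code*")
| rex field=_raw "req\-id\D+(?<req_id>\d+)"
| stats max(_time) as end, min(_time) as start by req_id
| eval duration = end - start
| table req_id duration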

2. Log the response time directly in the API (more efficient)

It seems you are working on an API. You should be able to measure the response time of each call and write it directly to your log.

Then you can exploit it easily in SPL, with no pairing or calculation needed.
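
For instance, if the completion line also carried its own timing, say Output Status Code: 200 req-id:"123456" duration-ms:1118 (duration-ms being a hypothetical field name here), the search reduces to a single extraction plus an aggregation:

index=<your_index> source=<your_source> "Output Status Code"
| rex field=_raw "duration\-ms:(?<duration_ms>\d+)"
| stats count, avg(duration_ms) as avg_ms, perc95(duration_ms) as p95_ms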

It is always preferable to persist data at index time rather than perform systematic calculations at search time.