Async profiler overhead on Zing


Our team monitors the latency of our application using HdrHistograms. When I attach async-profiler to it, all percentiles increase dramatically.

OS: Red Hat Enterprise Linux release 8.1 (Ootpa)

JVM: 11.0.8-zing_20.08.2.0-b2-product-linux-X86_64

This is what happens if I attach the profiler with flags -i 1000 -t: [image]

This is what happens if I attach the profiler with flags -i 100000 -t: [image]

Decreasing sampling frequency obviously decreases overhead, but it still remains large. I have two questions about it:

  1. Are there ways to decrease the profiling overhead other than reducing the sampling frequency? Maybe there are some magical kernel/JVM flags?
  2. Does this overhead materially distort the profile itself?

Thank you

Best answer

The profiling interval is in nanoseconds. You can specify units explicitly, e.g. -i 10ms. In your case, -i 1000 means 1000 nanoseconds, which is not a sane sampling interval: the process will spend its time on continuous sampling instead of useful work, and, of course, the resulting profile will not reflect a realistic picture. Start from the default interval (10ms) and decrease it only if absolutely needed.
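To make the difference between these intervals concrete, it helps to convert each one into a sampling rate. The following is a minimal Python sketch of that arithmetic only. The function name is made up for illustration, and the result is an upper bound for a single thread that stays on CPU the whole time.

```python
NANOS_PER_SECOND = 1_000_000_000


def samples_per_second(interval_ns: int) -> float:
    """Upper bound on samples per second for one thread that is always on CPU."""
    if interval_ns <= 0:
        raise ValueError("interval must be positive")
    return NANOS_PER_SECOND / interval_ns


# The two intervals from the question, plus the default of 10ms.
for label, interval_ns in [
    ("-i 1000", 1_000),
    ("-i 100000", 100_000),
    ("-i 10ms", 10_000_000),
]:
    print(f"{label:>10}: {samples_per_second(interval_ns):,.0f} samples/s")
```

Under these assumptions, -i 1000 asks for up to a million samples per second, -i 100000 for ten thousand, and the default for one hundred.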

I have explained the reasonable range in this answer:

As to the profiling interval, 10 ns is roughly 20-50 CPU instructions. It's literally impossible to take samples at such a rate. The process will do nothing but spend all its time inside the profiler.
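The instruction estimate can be sanity-checked with simple arithmetic. The sketch below assumes a 3 GHz core, which is an illustrative figure and not something stated in the question. At that clock speed 10 ns is 30 cycles, which lands in the quoted 20-50 instruction range if the core retires roughly one instruction per cycle.

```python
def cycles_per_interval(interval_ns: float, clock_ghz: float) -> float:
    """CPU cycles elapsed in one sampling interval (1 GHz == 1 cycle per ns)."""
    return interval_ns * clock_ghz


ASSUMED_CLOCK_GHZ = 3.0  # assumption for illustration, not taken from the question

for interval_ns in (10, 1_000, 10_000_000):
    cycles = cycles_per_interval(interval_ns, ASSUMED_CLOCK_GHZ)
    print(f"{interval_ns:>10} ns -> {cycles:,.0f} cycles")
```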

The default sampling interval in cpu mode is 10ms. This choice is good enough for profiling in production: for an average application the profiling overhead will be negligible, while the number of samples will be enough to collect a meaningful profile.

A 1ms interval is usually fine for benchmarks and for profiling real applications for a short period of time. Lower intervals are rarely useful - maybe only for capturing a profile of a short-running piece of code.
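As a rough way to reason about these recommendations, the share of a thread's CPU time spent sampling can be modeled as the cost of one sample divided by the interval. The sketch below uses a purely hypothetical cost of 10 microseconds per sample. That number is a placeholder, not a measurement of async-profiler, and the real cost depends on stack depth, hardware, and the JVM.

```python
def overhead_fraction(interval_ns: float, cost_per_sample_ns: float) -> float:
    """Fraction of a thread's CPU time spent taking samples, capped at 1.0."""
    if interval_ns <= 0:
        raise ValueError("interval must be positive")
    return min(1.0, cost_per_sample_ns / interval_ns)


HYPOTHETICAL_COST_NS = 10_000  # placeholder value, not a measured figure

for label, interval_ns in [
    ("1000 ns", 1_000),
    ("100000 ns", 100_000),
    ("1 ms", 1_000_000),
    ("10 ms", 10_000_000),
]:
    share = overhead_fraction(interval_ns, HYPOTHETICAL_COST_NS)
    print(f"{label:>10}: {share:.1%} of CPU time in the profiler")
```

With this placeholder cost, the model saturates at 100% for a 1000 ns interval and drops to 0.1% at the default 10 ms. That matches the qualitative picture in the answer, though the exact percentages should not be read as real data.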