async-profiler malloc undefined category


I have set up and am using https://github.com/jvm-profiling-tools/async-profiler, which is extremely useful, but I see a strange thing I cannot explain.

My setup is exactly the one multiple presentations showed it can help with:

  • AKS kubernetes cluster with a nodepool

  • A pod deployed on one node

  • Within the container I have set up openjdk-11 with the debuginfo package

  • The profiling setup is a simple ./profiler start -e malloc PID

  • Since I'm in a virtualised environment, profiling works; the only warning I get is:

      [WARN] Kernel symbols are unavailable due to restrictions. Try
        sysctl kernel.kptr_restrict=0
        sysctl kernel.perf_event_paranoid=1
    

I think those settings are probably not needed for capturing malloc calls.
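For reference, a typical session with this setup looks roughly like the sketch below. The script name, sleep duration, and output path are assumptions, not part of my exact setup:

```shell
# Hedged sketch of a malloc profiling session with async-profiler.
# PID resolution and file paths are assumptions; adjust for your container.
PID=$(pgrep -f java)                 # assumption: a single JVM in the container

./profiler.sh start -e malloc "$PID"                  # begin collecting malloc events
sleep 60                                              # let the workload run for a while
./profiler.sh stop -f /tmp/malloc-flame.html "$PID"   # stop and dump a flame graph
```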

And the question is this: after some profiling time, portions of the captured allocations show "unknown" for the stack trace in the flame graph (see attached pic). Could it be that my setup inside the container is still incomplete, or would I really need those sysctls in place?

The problem is that putting them in place is not trivial with virtualization, since, as I understand it, they effectively affect the underlying node we are running on.
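If those sysctls did turn out to be necessary, one hedged option on AKS (assuming a recent kubectl and that node debug pods are allowed in the cluster) would be to set them from a privileged debug pod on the node; `<node-name>` is a placeholder:

```shell
# These kernel parameters live on the node, not in the app container.
# Sketch: run sysctl on the host via a node debug pod (kubectl 1.20+).
kubectl debug node/<node-name> -it --image=busybox -- \
    chroot /host sysctl -w kernel.kptr_restrict=0 kernel.perf_event_paranoid=1
```

Note this changes the node for all pods scheduled on it, and the values reset if the node is reimaged.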

Flame graphs for allocations

UPDATE

Now that I have restarted profiling after all the main functionality of my microservice has fired at least once, there seem to be no unknown allocations. Stupid question, but can it be that I started profiling before all class loading had happened (since beans are instantiated lazily), and that is why those samples were classified like that?

UPDATE 2

Actually my hypothesis is wrong: I did get one good dump where the unknown classification is minimal.

Shortly after that, the same phenomenon happened again: a reportedly huge amount of captured malloc events are unknown, while top shows no dramatic increase. Can this be due to virtualization, i.e. am I actually capturing events from other containers on the same node? In my container there are no other Java processes, and I'm also specifying the PID directly.

UPDATE 3

So after Andrei provided me the "dwarf stackwalker" build, this looks much better. I only have one question which is still not clear to me. We are profiling malloc events with: ./profile.sh start --cstack dwarf -e malloc PID. So what do I see on these flame graphs: only the number of captured malloc events (which could have been freed in the meantime), or the native memory currently held by all those mallocs?

My current situation is that I see the payara-micro healthcheck and autodeploy holding a significant amount of memory, which is weird and is my first guess for the leak source.

Percentage of autodeploy / Percentage of healthcheck

I also made a jeprof output; does anybody have a guess what "updatewindow/inflate" can point to? Jeprof output


BEST ANSWER

The container environment is not the issue here.

It seems like libc (where the malloc implementation resides) on your system is compiled without frame pointers, so the standard stack walking mechanism in the kernel is unable to find the parent of a malloc frame.
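One hedged way to check whether your libc at least carries DWARF unwind information (the libc path is an assumption; it varies by distro and architecture):

```shell
# Look for the .eh_frame / .eh_frame_hdr sections, which hold the DWARF
# call-frame information a DWARF-based unwinder needs.
# Distro libc builds typically omit frame pointers (-fomit-frame-pointer),
# which is why frame-pointer-based walking fails on malloc frames.
readelf -S /lib/x86_64-linux-gnu/libc.so.6 | grep -E 'eh_frame'
```

If the sections are present, DWARF-based unwinding can recover the missing parent frames even without frame pointers.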

I've recently implemented an alternative stack walking algorithm that relies on DWARF unwinding information. The new version has not yet been released, but you may try to build it from sources. Or, for your convenience, I have prepared a new build here: async-profiler-2.6-dwarf-linux-x64.tar.gz

Then add the --cstack dwarf option, and all malloc stack traces should be in place.
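Building from sources and enabling the new mode would look roughly like this (a sketch; the build prerequisites are a JDK and make, and the branch containing the DWARF walker is an assumption until it is released):

```shell
# Build async-profiler from source and profile with DWARF-based unwinding.
git clone https://github.com/jvm-profiling-tools/async-profiler
cd async-profiler
make                                          # produces profiler.sh and libasyncProfiler.so

# Use the DWARF stack walker instead of frame-pointer walking:
./profiler.sh start --cstack dwarf -e malloc PID
```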