I'm profiling an application of mine - it's a macro system built on top of Java. I'm using hprof to profile some basic examples, and here are the top 20 functions where time is spent:
rank self accum count trace method
1 14.73% 14.73% 453 303755 java.lang.Object.<init>
2 9.75% 24.48% 300 303754 java.lang.Object.<init>
3 7.18% 31.66% 221 303641 java.lang.Object.<init>
4 6.83% 38.49% 210 303621 java.util.ArrayList.<init>
5 6.76% 45.25% 208 303620 java.util.ArrayList.<init>
6 5.95% 51.20% 183 303833 java.lang.Character.toChars
7 4.55% 55.75% 140 303809 java.lang.Object.<init>
8 4.42% 60.18% 136 303791 java.lang.Object.<init>
9 3.77% 63.95% 116 303756 java.lang.Object.<init>
10 3.64% 67.59% 112 300509 java.lang.Object.<init>
11 2.67% 70.25% 82 303789 java.lang.Object.<init>
12 2.54% 72.79% 78 303790 java.lang.Object.<init>
13 1.69% 74.48% 52 303688 java.lang.Object.<init>
14 1.66% 76.14% 51 303644 java.lang.Object.<init>
15 1.46% 77.60% 45 305935 java.lang.Object.<init>
16 1.40% 79.00% 43 303758 java.lang.Object.<init>
17 1.20% 80.20% 37 305324 java.lang.Object.<init>
18 0.98% 81.18% 30 302559 java.lang.Object.<init>
19 0.62% 81.79% 19 300006 java.util.Arrays.copyOf
20 0.52% 82.31% 16 305214 java.lang.Object.<init>
As you can see, most of the time is spent in Object.<init>. This is somewhat obscure to me.
My intuition is that the time is taken by memory allocation. Working in C has left me with a strong sense that dynamic memory allocation (i.e. malloc()) is inefficient. However, in Java the gospel seems to be that the JVM deals efficiently with short-lived objects, and that there is consequently nothing to be gained from patterns such as object pools.
I should add that the most performance-hungry part of the application is the parser, which does indeed create a whole lot of short-lived objects as part of its operation.
What do you think the time spent in Object.<init> is due to? Is it indeed linked to memory allocation? Could I benefit from using an object pool or some other trick to reduce memory allocation?
EDIT:
In reaction to Mike Dunlavey's answer, here is hprof's output as interpreted by JPerfAnal, giving inclusive times.
Method Times by Caller (times inclusive): 3076 ticks
1: java.lang.Object.<init>: 71,26% (2192 inclusive / 2191 exclusive)
2: com.sun.demo.jvmti.hprof.Tracker.ObjectInit: 0,03% (1 inclusive / 0 exclusive)
3: java.lang.Thread.currentThread: 0,03% (1 inclusive / 1 exclusive)
1: parser.ParseData.<init>: 34,33% (1056 inclusive / 0 exclusive)
2: parser.ParseErrors.<init>: 13,98% (430 inclusive / 1 exclusive)
3: java.lang.Object.<init>: 7,18% (221 inclusive / 221 exclusive)
3: java.util.ArrayList.<init>: 6,76% (208 inclusive / 208 exclusive)
2: java.lang.Object.<init>: 13,52% (416 inclusive / 416 exclusive)
2: java.util.ArrayList.<init>: 6,83% (210 inclusive / 0 exclusive)
3: java.util.ArrayList.<init>: 6,83% (210 inclusive / 210 exclusive)
1: parser.Matcher.parse: 34,33% (1056 inclusive / 0 exclusive)
2: parser.ParseData.<init>: 34,33% (1056 inclusive / 0 exclusive)
3: parser.ParseErrors.<init>: 13,98% (430 inclusive / 1 exclusive)
4: java.lang.Object.<init>: 7,18% (221 inclusive / 221 exclusive)
4: java.util.ArrayList.<init>: 6,76% (208 inclusive / 208 exclusive)
3: java.lang.Object.<init>: 13,52% (416 inclusive / 416 exclusive)
3: java.util.ArrayList.<init>: 6,83% (210 inclusive / 0 exclusive)
4: java.util.ArrayList.<init>: 6,83% (210 inclusive / 210 exclusive)
1: java.util.ArrayList.<init>: 28,38% (873 inclusive / 419 exclusive)
2: java.util.AbstractList.<init>: 14,76% (454 inclusive / 0 exclusive)
3: java.util.AbstractCollection.<init>: 14,76% (454 inclusive / 0 exclusive)
4: java.lang.Object.<init>: 14,76% (454 inclusive / 454 exclusive)
(JPerfAnal also generates an inverted tree in which the children are the callers of the parents. I don't reproduce it here for brevity, but suffice it to say that about 40% of Object.<init> calls are made from the initialization of ArrayList, ParseData and ParseErrors.)
Now, that doesn't really change how I think about the issue, or my questions. I could change the algorithm so that it instantiates fewer objects; but for the time being, I'm looking for an orthogonal solution. So: could object pools help me?
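For concreteness, the pattern I have in mind is something like the following sketch. (PooledBuffer is a hypothetical stand-in for one of my short-lived parse objects; the pool is single-threaded and assumes pooled objects can be reset between uses.)

```java
import java.util.ArrayDeque;

// Hypothetical stand-in for a short-lived parse object.
final class PooledBuffer {
    final StringBuilder data = new StringBuilder();

    void reset() { data.setLength(0); }
}

// Minimal single-threaded object pool: reuse instances instead of
// allocating a fresh one for every parse step.
final class BufferPool {
    private final ArrayDeque<PooledBuffer> free = new ArrayDeque<>();

    PooledBuffer acquire() {
        PooledBuffer b = free.poll();
        return (b != null) ? b : new PooledBuffer();
    }

    void release(PooledBuffer b) {
        b.reset();      // state must be cleared before reuse
        free.push(b);
    }
}
```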
1) "self" time is almost always useless because the real problem is in your code that calls those low-level methods - get inclusive time (self + callees).
2) The "accum" column is even more useless because all it does is add up the "self" times.
The other columns are also telling you nothing helpful.
hprof takes stack traces. What you need is for those to be as deep as possible. You don't need a large number of them (contrary to popular misinformation about statistics) - ten are likely more than enough.
However, you do need to look at them and understand them.
Here's some general info about profiling.
EDIT in response to your edit showing JPerfAnal output:
a) It looks like the stack depth is only 4. You need to see stack depth as deep as possible. I don't care for the way they present the output. I mainly just look at the stack samples themselves. Time is not the really important thing. The important thing is what is going on and why.
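(If I recall correctly, hprof samples only 4 stack frames by default, which would explain what you're seeing; the depth option raises that. A sketch of the invocation - adjust the interval and your jar/main class to taste:)

```shell
# Sample CPU with deeper stack traces than the default depth of 4.
java -agentlib:hprof=cpu=samples,depth=20,interval=10 -jar myapp.jar
```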
b) Stack samples actually contain line-level information. So they will point to the precise lines in your routines - such as parser.ParseData.<init>, parser.Matcher.parse, and parser.ParseErrors.<init> - where the problematic calls are taking place. That will tell you exactly which allocations are costing all the time, and then you can decide if there's a better way to do them.

I get the impression that the parser runs off data structure tables. I know you don't want to redesign it, but if you do redesign it you will get far better performance. Basically, if you can express it as a recursive-descent parser, and if the language does not change often, you write the parser directly in code - no tables - and it will be 10-100 times faster.
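To sketch what "directly in code" means: each grammar rule becomes a plain method, with no table lookups in the hot path. This is a toy expression grammar, not your language - every name below is made up for illustration:

```java
// Recursive-descent sketch for:  expr := term (('+'|'-') term)* ; term := digit+
// Each grammar rule is an ordinary method; dispatch is plain Java control flow.
final class RDParser {
    private final String src;
    private int pos;

    RDParser(String src) { this.src = src; }

    int parseExpr() {
        int value = parseTerm();
        while (pos < src.length()
                && (src.charAt(pos) == '+' || src.charAt(pos) == '-')) {
            char op = src.charAt(pos++);
            int rhs = parseTerm();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    private int parseTerm() {
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        return Integer.parseInt(src.substring(start, pos));
    }
}
```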