Stack trace sampling is a very powerful technique for performance troubleshooting. The advantages of stack trace sampling are:
- it doesn't require upfront configuration
- the overhead added by sampling is small and controllable
- it is easy to compare analysis results from different experiments
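Sampling itself is simple: periodically dump the stacks of all live threads and keep them for later analysis. Below is a minimal sketch of such a sampler built on the standard ThreadMXBean API (the class name, the sampling period and the store() hook are illustrative, not taken from any particular tool).

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class StackSampler implements Runnable {

    private final ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                // one sample = stack traces of all live threads
                ThreadInfo[] dump = threadBean.dumpAllThreads(false, false);
                store(dump);
                Thread.sleep(20); // sampling period is what makes the overhead controllable
            }
        } catch (InterruptedException e) {
            // sampling stopped
        }
    }

    private void store(ThreadInfo[] dump) {
        // append the traces to a dump file or an in-memory buffer for later analysis
    }
}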
Unfortunately, the stack trace analysis tooling offered by widespread Java profilers is very limited.
Solving performance problems in complex applications (with a lot of business logic, etc.) is one of my regular challenges. Let's assume I have another misbehaving application on my hands. The first step would be to localize the bottleneck to a specific part of the stack.
Meet the call tree
A call tree is built by digesting a large number of stack traces. Each node in the tree has a frequency: the number of traces passing through that node.
Usually tools let you navigate through a call tree reconstructed from the stack trace population.
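To make the structure concrete, here is a minimal sketch of such a tree (class and field names are mine, not taken from any specific profiler).

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CallTreeNode {

    final String frame;          // e.g. "org.hibernate.internal.SessionImpl.list"
    long hitCount;               // number of traces passing through this node
    final Map<String, CallTreeNode> children = new LinkedHashMap<>();

    CallTreeNode(String frame) {
        this.frame = frame;
    }

    // digest one stack trace, ordered from the root frame (e.g. Thread.run) upwards
    void addTrace(List<String> frames) {
        CallTreeNode node = this;
        node.hitCount++;
        for (String f : frames) {
            node = node.children.computeIfAbsent(f, CallTreeNode::new);
            node.hitCount++;
        }
    }
}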
There is also the flame graph visualization (shown at the top right of the page), which is fancier but is just the same tree.
Looking at these visualizations, what can I see? Not much.
Why? Business logic somewhere in the middle of the call tree produces too many branches. The tree beneath the business logic is blurred beyond the point of usability.
Dissecting the call tree
Applications are built using frameworks. For the sake of this article, I'm using an example based on JBoss, JSF, Seam and Hibernate.
Now, if 13% of the traces in our dump contain JDBC frames, we can conclude that 13% of the time is spent in JDBC / database calls. 13% is a reasonable number, so the database is not to blame here.
Let's go down the stack; Hibernate is the next layer. Now we need to count all traces containing Hibernate classes, excluding the ones containing JDBC. This way we can attribute traces to a particular framework and quickly get a picture of where time is spent at runtime.
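The categorization itself can be expressed in a few lines of Java. The sketch below is my own illustration using a plain substring match; SJK itself uses glob-like matching patterns, shown in the command further down.

import java.util.LinkedHashMap;
import java.util.Map;

class TraceCategorizer {

    // category name -> class name fragment; order matters, the first match wins,
    // so a trace containing both JDBC and Hibernate frames is attributed to JDBC
    private final Map<String, String> categories = new LinkedHashMap<>();

    TraceCategorizer() {
        categories.put("JDBC", ".jdbc");
        categories.put("Hibernate", "org.hibernate");
    }

    String categorize(StackTraceElement[] trace) {
        for (Map.Entry<String, String> category : categories.entrySet()) {
            for (StackTraceElement frame : trace) {
                if (frame.getClassName().contains(category.getValue())) {
                    return category.getKey();
                }
            }
        }
        return "Other";
    }
}

Counting how many traces fall into each category and dividing by the total number of samples gives the per-framework percentages.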
I didn't find any tool that could do this kind of analysis for me, so I built one for myself a few years ago. SJK is my universal Java troubleshooting toolkit.
Below is the command doing the analysis explained above.
sjk ssa -f tracedump.std --categorize -tf **.CoyoteAdapter.service -nc
JDBC=**.jdbc
Hibernate=org.hibernate
"Facelets compile=com.sun.faces.facelets.compiler.Compiler.compile"
"Seam bijection=org.jboss.seam.**.aroundInvoke/!**.proceed"
JSF.execute=com.sun.faces.lifecycle.LifecycleImpl.execute
JSF.render=com.sun.faces.lifecycle.LifecycleImpl.render
Other=**
Below is the output of this command.
Total samples 2732050 100.00%
JDBC 405439 14.84%
Hibernate 802932 29.39%
Facelets compile 395784 14.49%
Seam bijection 385491 14.11%
JSF.execute 290355 10.63%
JSF.render 297868 10.90%
Other 154181 5.64%
Well, we clearly see a large amount of time spent in Hibernate. This is very wrong, so it is the first candidate for investigation. We also see that a lot of CPU is spent on JSF compilation, though pages should be compiled just once and cached (it turned out to be a configuration issue).
Actual application logic falls into the JSF life cycle calls (execute(), render()). It would be possible to introduce an additional category to isolate pure application logic execution time, but looking at the numbers, I would say it is not necessary until the other problems are solved.
Hibernate is our primary suspect, but how do we look inside? Let's look at the method histogram for the traces attributed to Hibernate, trimming away all frames up to the first Hibernate method call.
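Conceptually, the histogram just counts how many of the trimmed traces contain each frame. Here is a simplified sketch of that counting (trace count and percentage only; the class is my illustration, not SJK code), assuming frames are ordered from the bottom of the stack to the top.

import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class MethodHistogram {

    private final Map<String, Long> tracesPerFrame = new HashMap<>();
    private long totalTraces;

    void addTrace(List<String> frames) {
        // find the first (outermost) Hibernate frame, scanning from the bottom of the stack
        int first = -1;
        for (int i = 0; i < frames.size(); i++) {
            if (frames.get(i).startsWith("org.hibernate")) {
                first = i;
                break;
            }
        }
        if (first < 0) {
            return; // no Hibernate frame, the trace is not attributed to Hibernate
        }
        totalTraces++;
        // trim away everything below the first Hibernate frame and count each remaining frame once per trace
        Set<String> seen = new HashSet<>(frames.subList(first, frames.size()));
        for (String frame : seen) {
            tracesPerFrame.merge(frame, 1L, Long::sum);
        }
    }

    double percentage(String frame) {
        return totalTraces == 0 ? 0 : 100.0 * tracesPerFrame.getOrDefault(frame, 0L) / totalTraces;
    }
}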
Below is the SJK command to do this.
sjk ssa -f tracedump.std --histo -tf **!**.jdbc -tt org.hibernate
Here is the top of the histogram produced by this command.
Trc (%) Frm N Term (%) Frame
699506 87% 699506 0 0% org.hibernate.internal.SessionImpl.autoFlushIfRequired(SessionImpl.java:1204)
689370 85% 689370 10 0% org.hibernate.internal.QueryImpl.list(QueryImpl.java:101)
676524 84% 676524 0 0% org.hibernate.event.internal.DefaultAutoFlushEventListener.onAutoFlush(DefaultAutoFlushEventListener.java:58)
675136 84% 675136 0 0% org.hibernate.internal.SessionImpl.list(SessionImpl.java:1261)
573836 71% 573836 4 0% org.hibernate.ejb.QueryImpl.getResultList(QueryImpl.java:264)
550968 68% 550968 1 0% org.hibernate.event.internal.AbstractFlushingEventListener.flushEverythingToExecutions(AbstractFlushingEventListener.java:99)
533892 66% 533892 132 0% org.hibernate.event.internal.AbstractFlushingEventListener.flushEntities(AbstractFlushingEventListener.java:227)
381514 47% 381514 882 0% org.hibernate.event.internal.AbstractVisitor.processEntityPropertyValues(AbstractVisitor.java:76)
271018 33% 271018 0 0% org.hibernate.event.internal.DefaultFlushEntityEventListener.onFlushEntity(DefaultFlushEntityEventListener.java:161)
Here is our suspect. We spend 87% of the Hibernate time in the autoFlushIfRequired() call (and JDBC time is already excluded).
Using a few commands, we have narrowed down one performance bottleneck. Fixing it is another topic, though.
In the case I'm using as an example, the CPU usage of the application was reduced by a factor of 10. A few of the problems found and addressed during that case were:
- Hibernate usage was optimized
- Facelets compilation caching was properly configured
- a workaround for a performance bug in the Seam framework was implemented
- JSF layouts were optimized to reduce the number of Seam injections / outjections
Limitations of this approach
During statistical analysis of stack traces you deal with wall-clock time; you cannot estimate real CPU time using this method. If the CPU on the host is saturated, your numbers will be skewed by threads' idle time due to CPU starvation.
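If you collect the samples yourself (e.g. with a ThreadMXBean sampler like the one sketched earlier), a rough workaround is to keep only samples of threads in the RUNNABLE state. This is only an approximation: a thread blocked in native I/O may also report RUNNABLE.

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class RunnableOnlySampler {

    private final ThreadMXBean bean = ManagementFactory.getThreadMXBean();

    public void sampleOnce() {
        for (ThreadInfo ti : bean.dumpAllThreads(false, false)) {
            // keep only threads that were runnable at sampling time;
            // WAITING / TIMED_WAITING / BLOCKED samples represent idle time, not CPU
            if (ti.getThreadState() == Thread.State.RUNNABLE) {
                process(ti.getStackTrace());
            }
        }
    }

    private void process(StackTraceElement[] trace) {
        // feed the trace into the call tree / histogram
    }
}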
Normally you can get a stack trace only at JVM safepoints. So if some methods are inlined by the JIT compiler, they may never appear in the trace even if they are really busy. In other words, the tip of the stack trace may be skewed by JIT effects. In practice, this has never been an obstacle for me, but you should keep the possibility of such an effect in mind.
What about flame graphs?
Well, despite not being that useful, they look good in presentations. Support for flame graphs was added to SJK recently.
Update
After some time, I've found myself using flame graphs very actively. Yes, for certain situations this type of visualization doesn't make sense, but as a first bird's-eye look at the problem, flame graphs are indispensable.