
Thursday, September 21, 2023

Curse of the JMX

JMX stands for Java Management Extensions. It was introduced as part of Java Enterprise Edition (JEE) and later became an integral part of the JVM.

The JVM exposes a lot of information useful for diagnostic tooling through the JMX interface.

Many popular tools such as VisualVM and Mission Control are heavily based on JMX. Even Java Flight Recorder is exposed for remote connections via JMX.

Middleware and libraries also use JMX to expose custom MBeans with helpful information.

So if you are in the business of JVM monitoring or diagnostic tooling, you cannot avoid dealing with JMX.

JMX is a remote access protocol; it uses TCP sockets and requires some upfront configuration for the JVM to start listening for network connections (though tools such as VisualVM can enable JMX at runtime, provided they have access to the JVM process).

You can find details about JMX agent configuration in the official documentation, but below is a minimal configuration (add the snippet below to the JVM start command).

-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=55555

The JVM will start listening on port 55555. You will be able to use this port in VisualVM and other tools.

The configuration above is minimal: access control and TLS encryption are disabled. You should consult the documentation mentioned above to add security (which is typically required in a real environment).
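With this configuration in place, connecting is a one-liner; the host name below is an assumption for illustration:

```shell
# Connect JConsole to the remote JMX agent (replace the host with your own)
jconsole myhost.example.com:55555
```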

JMX is a capable protocol, but it has some idiosyncrasies due to its JEE lineage. In particular, it has specific requirements for network topology.

JMX is based on the Java RMI protocol. Access to the JMX agent involves a two-step handshake.

On the first step, the client makes a request to the RMI registry and receives a serialized remote interface stub. The JMX agent has a built-in single-object registry, which is exposed on port 55555 in our example.

On the second step, the client accesses the remote interface via the network address embedded in the stub object received on the first step.
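The two steps are visible in the JMX service URL format: the address after /jndi/rmi:// points at the registry (step one), while the RMI endpoint for step two comes from the stub fetched there. The host and port below are examples:

```shell
# The registry address is embedded in the JMX service URL;
# the RMI endpoint itself is resolved from the stub returned by the registry
jconsole service:jmx:rmi:///jndi/rmi://1.2.3.4:55555/jmxrmi
```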

In a trivial network, this is not an issue, but if there is any form of NAT or proxy between the JMX client and the JVM, things are likely to break.

So we have two issues here:

1.    The stub could point to a different port number, which is not whitelisted

2.    The stub may contain an internal IP which is not routable from the client host

The first issue is easily solvable with the com.sun.management.jmxremote.rmi.port property, which can be set to the same value as the registry port (55555 in our example).

The second issue is much more tricky, as the JVM may be totally unaware of the IP visible from outside; even worse, such an IP could be dynamic, so it cannot be configured via the JVM command line.

In this article, I will describe a few recipes for dealing with JMX in the modern container/cloud world. None of them is ideal, but I hope at least one will be useful for you.

Configuring JMX for known external IP address

If you know a routable IP address, the solution is to configure the JVM to put that specific IP inside the remote interface stub. An example of this situation would be running a JVM in a local Docker container.

The JVM parameter -Djava.rmi.server.hostname=<MyHost> can be used to override the IP in remote stubs provided by the JMX agent. This parameter affects all RMI communication, but nowadays RMI is rarely used for anything besides the JMX protocol.

The resulting communication scheme is outlined in the diagram below.

JVM options

-Djava.rmi.server.hostname=1.2.3.4
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.port=55555
-Dcom.sun.management.jmxremote.rmi.port=55555

Communication diagram

Configuring JMX for tunneling

In some situations, the IP address of the JVM host may not even be reachable from the JMX client host. Here are a couple of typical examples:

     You are using SSH to access the internal network through a bastion host.

     The JVM is in a Kubernetes pod.

In both situations you can use port forwarding to establish network connectivity between the JMX client and the JVM.

Again, you need to override the IP in the remote interface stub, but this time you have to set it to 127.0.0.1.

Communication diagram is shown below.

JVM options

-Djava.rmi.server.hostname=127.0.0.1
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.port=55555
-Dcom.sun.management.jmxremote.rmi.port=55555

Communication diagram

In the case of SSH, you can use port forwarding. In Kubernetes, there is also the handy kubectl port-forward command, which allows you to communicate with the pod directly.

You can even chain port-forwarding multiple times.
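As a sketch, both forwarding variants might look like this (host, user, and pod names are placeholders):

```shell
# SSH tunnel through a bastion host: local port 55555 goes to the JVM host
ssh -L 55555:jvm-host:55555 user@bastion

# Kubernetes: forward local port 55555 straight to the pod
kubectl port-forward my-app-pod-0 55555:55555
```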

Though this approach has its own limitations.

     JMX will no longer be available to remote hosts without port forwarding, so this configuration may interfere with monitoring agents running in your cluster and collecting JMX metrics.

     You cannot connect to multiple JVMs using the same JMX port (e.g. pods from a single deployment), as your port on the client host is bound to a particular remote destination. Remapping ports will break JMX.

Using HTTP JMX connector

The root of the problem is the RMI protocol, which is archaic and has not evolved to support modern network topologies. JMX is flexible enough to use alternative transport layers, and one of them is HTTP (via the Jolokia open source project).

This implementation doesn't come out of the box, though. You will have to ship the Jolokia agent jar with your application and introduce it via the JVM command line as a Java agent (see details here).
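Attaching the agent might look like the line below; the jar file name and application jar are assumptions for illustration, while port and host are standard Jolokia agent options:

```shell
# Start the JVM with the Jolokia agent serving JMX over HTTP on port 8778
# (jar names are placeholders; check the Jolokia docs for the exact artifact)
java -javaagent:jolokia-jvm-agent.jar=port=8778,host=0.0.0.0 -jar my-app.jar
```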

The good news is that nowadays tools such as VisualVM and Mission Control fully support the Jolokia JMX transport. Below are a few demo videos for the Jolokia project:

     Jolokia from JMC

     Connect Visual VM to a JVM running in Kubernetes using Jolokia

     Connect Java Mission Control to a JVM in Kubernetes

In addition to classic tools, the Jolokia HTTP endpoint is accessible from client-side JavaScript, so a web client is also possible. See the Hawt.io project, which implements a diagnostic web console for Java on top of Jolokia.

Using SJK JMX proxy

Dealing with JMX over the years, at some point I decided to make a diagnostic tool specifically for JMX connectivity troubleshooting.

It is part of SJK, my jack-of-all-knives solution for JVM diagnostics. The mxping command can help identify which part of the JMX handshake is broken.

While implementing mxping, I realized that I could address the root cause of RMI's network sensitivity by messing with the JMX client code. As I am not eager to patch all the JMX tools around, I have introduced a JMX proxy (mxprx), which can be placed between the JMX client and the remote JVM.

Using the JMX proxy eliminates the issues with the port forwarding scenario mentioned above, as:

     It does not require -Djava.rmi.server.hostname=127.0.0.1 on the JVM side.

     It allows you to remap ports and thus keep multiple ports forwarded at the same time.

Below is a communication diagram using JMX proxy from SJK.

In addition, with the JMX proxy, ad hoc configuration of a JMX endpoint becomes possible without a JVM restart.

The JMX agent can be started and configured at runtime via jcmd, but java.rmi.server.hostname can only be set on the JVM command line. With the JMX proxy we do not rely on java.rmi.server.hostname anymore!

Below are the steps to connect to a JVM in a Kubernetes pod even if JMX was not configured upfront.

1.    Enter the container shell using the kubectl exec command.

2.    In the container, use jcmd ManagementAgent.start to start the JMX agent (see more details here).

3.    Forward port from container to your local host.

4.    Start the JMX proxy on your host, pointing it at localhost:<port forwarded from container>, and provide some outbound port (see more details here).

5.    Now you can connect any JMX-aware tool via the locally running JMX proxy.
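The steps above might look like this (the pod name and PID are placeholders; consult the SJK documentation for the exact JMX proxy options):

```shell
# 1. Enter the container shell
kubectl exec -it my-app-pod-0 -- /bin/sh

# 2. Inside the container, start the JMX agent at runtime
jcmd <pid> ManagementAgent.start jmxremote.port=55555 \
    jmxremote.authenticate=false jmxremote.ssl=false

# 3. Back on your host, forward the agent port from the pod
kubectl port-forward my-app-pod-0 55555:55555

# 4. Start the SJK JMX proxy against localhost:55555 (options per SJK docs),
#    then point your JMX tool at the proxy's outbound port
```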

Conclusion

I have listed four alternative approaches to JMX setup. None of them is universal, unfortunately, and you have to pick the one most suitable for your case.

While JMX is kind of archaic, it is still essential for JVM monitoring, and you are likely to have to deal with it in any serious Java-based system.

I hope someday an HTTP transport will become built-in and default for the JVM, and all this trickery will become a horror story from the old days.

Wednesday, May 30, 2018

SJK is learning new tricks

SJK (or Swiss Java Knife) has been my secret weapon for firefighting various types of performance problems for a long time.

A new version of SJK was released not too long ago, and it contains a bunch of new and powerful features I would like to highlight.

ttop contention monitoring

SJK lives up to its name by bundling a number of tools into a single executable jar. Still, ttop is likely the single most commonly used tool under the SJK roof.

ttop is a kind of top for the threads of a JVM process. Besides the CPU usage counter (provided by the OS) and the allocation rate (tracked by the JVM), new thread contention metrics were introduced in the recent SJK release.

Thread contention metrics are calculated by the JVM, which counts and times Java threads entering the BLOCKED or WAITING states.

If enabled, SJK uses these metrics to display the rate and percentage of time spent in either state.

2018-05-29T14:20:03.382+0300 Process summary 
  process cpu=231.09%
  application cpu=212.78% (user=195.86% sys=16.92%)
  other: cpu=18.31% 
  thread count: 157
  GC time=4.72% (young=4.72%, old=0.00%)
  heap allocation rate 976mb/s
  safe point rate: 6.3 (events/s) avg. safe point pause: 8.24ms
  safe point sync time: 0.07% processing time: 5.09% (wallclock time)
[000180] user=19.40% sys= 0.31% wait=183.6/s(75.77%) block=    0/s( 0.00%) alloc=  110mb/s - hz._hzInstance_2_dev.cached.thread-8
[000094] user=16.92% sys= 0.16% wait=58.50/s(81.54%) block=    0/s( 0.00%) alloc=   94mb/s - hz._hzInstance_3_dev.generic-operation.thread-0
[000057] user=15.05% sys= 0.62% wait=56.91/s(82.35%) block= 0.20/s( 0.01%) alloc=   91mb/s - hz._hzInstance_2_dev.generic-operation.thread-0
[000095] user=15.21% sys= 0.00% wait=55.61/s(82.32%) block= 0.30/s( 0.04%) alloc=   87mb/s - hz._hzInstance_3_dev.generic-operation.thread-1
[000022] user=14.59% sys= 0.00% wait=56.01/s(83.42%) block= 0.30/s( 0.08%) alloc=   86mb/s - hz._hzInstance_1_dev.generic-operation.thread-1
[000058] user=13.97% sys= 0.16% wait=56.91/s(84.13%) block= 0.10/s( 0.02%) alloc=   81mb/s - hz._hzInstance_2_dev.generic-operation.thread-1

An important fact about these metrics: in an ideal world, CPU time + WAITING + BLOCKED should add up to 100%.

In reality, you are likely to see a gap. A few reasons why the equation above may not hold:

  • GC pauses freeze thread execution but are not accounted for by thread contention monitoring,
  • a thread may be waiting for an IO operation, which the JVM does not account as BLOCKED or WAITING state,
  • the system may starve for CPU resources, so a thread waits for a CPU core at the OS level (which is also not accounted for by the JVM).

Contention monitoring is not enabled by default; use the -c flag with the ttop command to enable it.
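A typical invocation might look like this; -p and -o are regular ttop options, and -c enables the contention columns shown above:

```shell
# Monitor threads of process <pid>, ordered by CPU, with contention metrics
sjk ttop -p <pid> -c -o CPU
```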

HTML5 based flame graph

SJK has been able to produce flame graphs for some time already. However, the old flame graphs were generated as SVG with limited interactivity.

The new version offers a new type of flame graph, HTML5-based and interactive. Right in the browser it allows:

  • filtering data by threads,
  • zooming into specific paths or into subtrees containing a specific frame,
  • filtering data by thread state (if state information is available).

The HTML5 report is a 100% self-contained file with no dependencies; you can send it by email and open it on any machine. Here is an example of the new flame graph you can play with right now.

The new flame command is used to generate HTML5 flame graphs.
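A minimal sketch of generating a report; the option names here are assumptions modeled on other SJK commands, so check sjk flame --help for the exact syntax:

```shell
# Build an interactive HTML5 flame graph from a thread sampling dump
sjk flame -f tracedump.std -o flame.html
```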

`jstack` dump support

SJK accepts a number of input data formats for thread sampling data, which is used for flame graphs and other types of performance analysis.

A new format added in version 0.10 is the text thread dump format produced by jstack. The full list of input formats is now:

  • SJK native thread sampling format
  • JVisualVM sampling snapshots (.nps)
  • Java Flight Recorder recording (.jfr)
  • jstack produced text thread dumps

Sunday, January 24, 2016

Flame Graphs Vs. Cold Numbers

Stack trace sampling is a very powerful technique for performance troubleshooting. The advantages of stack trace sampling are:

  • it doesn't require upfront configuration
  • cost added by sampling is small and controllable
  • it is easy to compare analysis result from different experiments

Unfortunately, the stack trace analysis tools offered by widespread Java profilers are very limited.

Solving performance problems in complex applications (with a lot of business logic, etc.) is one of my regular challenges. Let's assume I have another misbehaving application on my hands. The first step would be to localize the bottleneck to a specific part of the stack.

Meet call tree

A call tree is built by digesting a large number of stack traces. Each node in the tree has a frequency: the number of traces passing through that node.

Usually tools allow you to navigate through call tree reconstructed from stack trace population.

There is also the flame graph visualization (shown at the top right of the page), which is fancier but is just the same tree.

Looking at these visualizations, what can I see? Not too much.

Why? Business logic somewhere in the middle of the call tree produces too many branches. The tree beneath the business logic is blurred beyond the point of usability.

Dissecting call tree

The application is built using frameworks. For the sake of this article, I'm using an example based on JBoss, JSF, Seam, and Hibernate.

Now, if 13% of the traces in our dump contain JDBC, we can conclude that 13% of the time is spent in JDBC / database calls. 13% is a reasonable number, so the database is not to blame here.

Let's go down the stack; Hibernate is the next layer. Now we need to count all traces containing Hibernate classes, excluding the ones containing JDBC. This way we can attribute traces to a particular framework and quickly get a picture of where time is spent at runtime.

I didn't find any tool that could do this kind of analysis for me, so I built one for myself a few years ago. SJK is my universal Java troubleshooting toolkit.

Below is the command doing the analysis explained above.

sjk ssa -f tracedump.std  --categorize -tf **.CoyoteAdapter.service -nc
JDBC=**.jdbc 
Hibernate=org.hibernate
"Facelets compile=com.sun.faces.facelets.compiler.Compiler.compile"
"Seam bijection=org.jboss.seam.**.aroundInvoke/!**.proceed"
JSF.execute=com.sun.faces.lifecycle.LifecycleImpl.execute
JSF.render=com.sun.faces.lifecycle.LifecycleImpl.render
Other=**

Below is the output of this command.

Total samples    2732050 100.00%
JDBC              405439  14.84%
Hibernate         802932  29.39%
Facelets compile  395784  14.49%
Seam bijection    385491  14.11%
JSF.execute       290355  10.63%
JSF.render        297868  10.90%
Other             154181   5.64%

Well, we clearly see a large amount of time spent in Hibernate. This is very wrong, so it is the first candidate for investigation. We also see that a lot of CPU is spent on JSF compilation, though pages should be compiled just once and cached (it turned out to be a configuration issue). Actual application logic falls into the JSF life cycle calls (execute(), render()). It would be possible to introduce an additional category to isolate pure application logic execution time, but looking at the numbers, I would say it is not necessary until other problems are solved.

Hibernate is our primary suspect, but how do we look inside? Let's look at the method histogram for traces attributed to Hibernate, trimming away all frames up to the first Hibernate method call.

Below is the command to do this.

sjk ssa -f tracedump.std --histo -tf **!**.jdbc -tt org.hibernate

Here is the top of the histogram produced by the command.

Trc     (%)  Frm  N  Term    (%)  Frame                                                                                                                                                                                  
699506  87%  699506       0   0%  org.hibernate.internal.SessionImpl.autoFlushIfRequired(SessionImpl.java:1204)                                                                                                          
689370  85%  689370      10   0%  org.hibernate.internal.QueryImpl.list(QueryImpl.java:101)                                                                                                                              
676524  84%  676524       0   0%  org.hibernate.event.internal.DefaultAutoFlushEventListener.onAutoFlush(DefaultAutoFlushEventListener.java:58)                                                                          
675136  84%  675136       0   0%  org.hibernate.internal.SessionImpl.list(SessionImpl.java:1261)                                                                                                                         
573836  71%  573836       4   0%  org.hibernate.ejb.QueryImpl.getResultList(QueryImpl.java:264)                                                                                                                          
550968  68%  550968       1   0%  org.hibernate.event.internal.AbstractFlushingEventListener.flushEverythingToExecutions(AbstractFlushingEventListener.java:99)                                                          
533892  66%  533892     132   0%  org.hibernate.event.internal.AbstractFlushingEventListener.flushEntities(AbstractFlushingEventListener.java:227)                                                                       
381514  47%  381514     882   0%  org.hibernate.event.internal.AbstractVisitor.processEntityPropertyValues(AbstractVisitor.java:76)                                                                                      
271018  33%  271018       0   0%  org.hibernate.event.internal.DefaultFlushEntityEventListener.onFlushEntity(DefaultFlushEntityEventListener.java:161)

Here is our suspect: we spend 87% of Hibernate time in the autoFlushIfRequired() call (and JDBC time is already excluded).

Using a few commands, we have narrowed down one performance bottleneck. Fixing it is another topic, though.

In the case I'm using as an example, the CPU usage of the application was reduced by a factor of 10. A few problems found and addressed during that case were:

  • optimization of Hibernate usage
  • facelets compilation caching was properly configured
  • a workaround for a performance bug in the Seam framework was implemented
  • JSF layouts were optimized to reduce the number of Seam injections / outjections

Limitations of this approach

During statistical analysis of stack traces you deal with wallclock time; you cannot guess real CPU time using this method. If the CPU on the host is saturated, your numbers will be skewed by the threads' idle time due to CPU starvation.

Normally you can get a stack trace only at JVM safepoints. So if some methods are inlined by the JIT compiler, they may never appear in a trace even if they are really busy. In other words, the tip of the stack trace may be skewed by JIT effects. In practice, it has never been an obstacle for me, but you should keep in mind the possibility of such an effect.

What about flame graphs?

Well, despite being not that useful, they look good in presentations. Support for flame graphs was added to SJK recently.

Update

After some time, I've found myself using flame graphs very actively. Yes, for certain situations this type of visualization doesn't make sense, but as a first bird's-eye look at a problem, flame graphs are indispensable.