<p class="normal"><b>Curse of the JMX</b> (Alexey Ragozin, 2023-09-21, updated 2023-10-12)</p>
<p class="normal" style="margin-top: 10pt;"><span lang="EN">JMX stands for Java
Management Extensions. It originated in the Java Enterprise Edition (JEE) world
and later became an integral part of the JVM.</span></p>
<p class="normal" style="margin-top: 10pt;"><span lang="EN">The JVM exposes a wealth
of useful diagnostic information through the JMX interface.</span></p>
<p class="normal" style="margin-top: 10pt;"><span lang="EN">Many popular tools, such
as VisualVM and Mission Control, rely heavily on JMX. Even Java Flight
Recorder is exposed for remote connections via JMX.</span></p>
<p class="normal" style="margin-top: 10pt;"><span lang="EN">Middleware and
libraries also use JMX to expose custom MBeans with helpful
information.</span></p>
<p class="normal" style="margin-top: 10pt;"><span lang="EN">So if you are in the
business of JVM monitoring or diagnostic tooling, you cannot avoid dealing with
JMX.</span></p>
<p class="normal" style="margin-top: 10pt;"><span lang="EN">JMX is a remote access
protocol. It uses TCP sockets and requires some upfront configuration for the
JVM to start listening for network connections (although tools such as VisualVM
can enable JMX at runtime, provided they have access to the JVM process).</span></p>
<p class="normal" style="margin-top: 10pt;"><span lang="EN">You can find details
about JMX agent configuration in the <a href="https://docs.oracle.com/en/java/javase/11/management/monitoring-and-management-using-jmx-technology.html#GUID-805517EC-2D33-4D61-81D8-4D0FA770D1B8"><span style="color: #1155cc;">official documentation</span></a>, but below is a minimal
configuration (add these options to the JVM start command).</span></p>
<p class="normal" style="line-height: normal; margin-top: 10pt; mso-pagination: none;"><span lang="EN" style="color: #1c4587; font-family: "Roboto Mono"; mso-bidi-font-family: "Roboto Mono"; mso-fareast-font-family: "Roboto Mono";">-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.port=5555</span></p>
<p class="normal" style="line-height: normal; margin-top: 10pt; mso-pagination: none;"><span lang="EN">The JVM will start listening on port 5555, and you will be able to
use this port in VisualVM and other tools.</span></p>
<p class="normal" style="line-height: normal; margin-top: 10pt; mso-pagination: none;"><span lang="EN">The configuration above is minimal: access control and TLS
encryption are disabled. Consult the documentation mentioned above to
add security, which is typically required in a real environment.</span></p>
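<p class="normal" style="margin-top: 10pt;"><span lang="EN">For reference, here is a minimal Java sketch of what a JMX tool does with such an endpoint. It builds the conventional service:jmx:rmi:///jndi/rmi://host:port/jmxrmi URL and reads one attribute of the platform Runtime MBean. The class JmxClientDemo and its selfTest() helper are hypothetical. selfTest() assumes it is acceptable to start a throwaway, unauthenticated connector inside the same process (registry and RMI server sharing one free port on 127.0.0.1) as a stand-in for the options above, so that the sketch runs without a second JVM.</span></p>

```java
import java.lang.management.ManagementFactory;
import java.net.ServerSocket;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

public class JmxClientDemo {

    /** Connects the way JMX tools do for "host:port" and reads one platform MBean attribute. */
    static String readVmName(String host, int port) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            return (String) mbsc.getAttribute(new ObjectName("java.lang:type=Runtime"), "VmName");
        }
    }

    /**
     * Demo only: an in-process stand-in for the -Dcom.sun.management.jmxremote.* options
     * (registry and RMI server share one port, no authentication, no TLS), then a client round trip.
     */
    static String selfTest() throws Exception {
        System.setProperty("java.rmi.server.hostname", "127.0.0.1"); // keep the demo on loopback
        int port;
        try (ServerSocket probe = new ServerSocket(0)) {
            port = probe.getLocalPort(); // a free port instead of a hard-coded 5555
        }
        Registry registry = LocateRegistry.createRegistry(port);
        JMXConnectorServer server = JMXConnectorServerFactory.newJMXConnectorServer(
                new JMXServiceURL("service:jmx:rmi://127.0.0.1:" + port
                        + "/jndi/rmi://127.0.0.1:" + port + "/jmxrmi"),
                null, ManagementFactory.getPlatformMBeanServer());
        server.start();
        try {
            return readVmName("127.0.0.1", port);
        } finally {
            server.stop();
            UnicastRemoteObject.unexportObject(registry, true);
        }
    }

    public static void main(String[] args) throws Exception {
        // "java JmxClientDemo localhost 5555" talks to a JVM started with the options above;
        // without arguments the in-process self test runs instead.
        System.out.println(args.length == 2
                ? readVmName(args[0], Integer.parseInt(args[1]))
                : selfTest());
    }
}
```

<p class="normal" style="margin-top: 10pt;"><span lang="EN">The java.rmi.server.hostname line in selfTest() is only there to make the demo independent of how the machine resolves its own name. The role of this property is discussed further below.</span></p>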
<p class="normal" style="margin-top: 10pt;"><span lang="EN">JMX is a capable
protocol, but it has some idiosyncrasies due to its JEE lineage. In particular,
it has specific requirements for network topology.</span></p>
<p class="normal" style="margin-top: 10pt;"><span lang="EN">Remote JMX is based on the Java
RMI protocol. Access to the JMX agent involves a two-step handshake.</span></p>
<p class="normal" style="margin-top: 10pt;"><span lang="EN">In the first step, the
client makes a request to the RMI registry and receives a serialized remote
interface stub. The JMX agent has a built-in
single-object registry, which is exposed on port 5555 in our example.</span></p>
<p class="normal" style="margin-top: 10pt;"><span lang="EN">In the second step,
the client accesses the remote interface via the network address embedded in the stub
object received in the first step.</span></p>
<p dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 10pt; text-align: center;"><span face="Arial,sans-serif" style="background-color: transparent; color: black; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;"><img height="331" src="https://lh5.googleusercontent.com/1gZ6SX1MOEijCcGTB2YN0Jh1iIsISGjfLdcGvX_1GSEjs5gYofSztxLtWXqIcteP-zM6ojJDNEEIYKuchvy1WJ4jQxVWwTPYbQBgItPMlPAxQiNA5wk_lxitix0lCIrkmRpP7ZqtXEUpTsMEZ-vbbJk=w400-h331" style="margin-left: 0px; margin-top: 0px;" width="400" /></span></p>
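<p class="normal" style="margin-top: 10pt;"><span lang="EN">The two handshake steps can be performed by hand with the standard RMI API, which makes it easier to see where each connection goes. In this minimal sketch, step 1 looks up the stub bound under "jmxrmi" (the name the JDK agent uses) and step 2 calls RMIServer.getVersion() on it. That remote call travels to the endpoint carried inside the stub, not to the registry address. JmxHandshake is a hypothetical class. Its selfTest() helper is demo-only and exports the RMI server on an ephemeral port, roughly like an agent without a pinned RMI port.</span></p>

```java
import java.lang.management.ManagementFactory;
import java.net.ServerSocket;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;
import javax.management.remote.rmi.RMIServer;

public class JmxHandshake {

    /** Runs both handshake steps by hand against host:registryPort. */
    static String handshake(String host, int registryPort) throws Exception {
        // Step 1: ask the registry for the serialized stub bound under "jmxrmi".
        Registry registry = LocateRegistry.getRegistry(host, registryPort);
        RMIServer stub = (RMIServer) registry.lookup("jmxrmi");
        // The stub carries its own endpoint; its printed form usually shows that host:port.
        System.out.println("Step 1, stub received: " + stub);
        // Step 2: this remote call goes to the endpoint inside the stub, not to host:registryPort.
        String version = stub.getVersion();
        System.out.println("Step 2, server version: " + version);
        return version;
    }

    /** Demo only: registry on a free port, RMI server on an ephemeral port (rmi.port not pinned). */
    static String selfTest() throws Exception {
        System.setProperty("java.rmi.server.hostname", "127.0.0.1"); // keep the demo on loopback
        int port;
        try (ServerSocket probe = new ServerSocket(0)) {
            port = probe.getLocalPort();
        }
        Registry registry = LocateRegistry.createRegistry(port);
        JMXConnectorServer server = JMXConnectorServerFactory.newJMXConnectorServer(
                new JMXServiceURL("service:jmx:rmi:///jndi/rmi://127.0.0.1:" + port + "/jmxrmi"),
                null, ManagementFactory.getPlatformMBeanServer());
        server.start();
        try {
            return handshake("127.0.0.1", port);
        } finally {
            server.stop();
            UnicastRemoteObject.unexportObject(registry, true);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 2) {
            handshake(args[0], Integer.parseInt(args[1]));
        } else {
            selfTest();
        }
    }
}
```

<p class="normal" style="margin-top: 10pt;"><span lang="EN">Pointed at a real JVM (for example "java JmxHandshake myhost 5555"), a failure in step 1 suggests the registry port itself is unreachable. A failure only in step 2 points at the address embedded in the stub.</span></p>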
<p class="normal" style="margin-top: 10pt;"><span lang="EN">In a simple network,
this is not an issue, but if there is any form of NAT or proxy between the JMX
client and the JVM, things are likely to break.</span></p>
<p class="normal" style="margin-top: 10pt;"><span lang="EN">So we have two issues
here:</span></p>
<p class="normal" style="margin-bottom: 0in; margin-left: .5in; margin-right: 0in; margin-top: 10.0pt; margin: 10pt 0in 0in 0.5in; mso-list: l3 level1 lfo3; text-indent: -0.25in;"><span lang="EN"><span style="mso-list: Ignore;">1.<span style="font: 7pt "Times New Roman";">
</span></span></span><span lang="EN">The stub may point to a different
port number, which is not whitelisted.</span></p>
<p class="normal" style="margin-left: 0.5in; mso-list: l3 level1 lfo3; text-indent: -0.25in;"><span lang="EN"><span style="mso-list: Ignore;">2.<span style="font: 7pt "Times New Roman";">
</span></span></span><span lang="EN">The stub may contain an internal
IP address that is not routable from the client host.</span></p>
<p class="normal" style="margin-top: 10pt;"><span lang="EN">The first issue is easily
solved with the </span><span lang="EN" style="color: #1c4587; font-family: "Roboto Mono"; mso-bidi-font-family: "Roboto Mono"; mso-fareast-font-family: "Roboto Mono";">com.sun.management.jmxremote.rmi.port</span><span lang="EN"> property, which can be set to the same value as the registry port (5555 in
our example).</span></p>
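<p class="normal" style="margin-top: 10pt;"><span lang="EN">To check which port a stub actually advertises, in the spirit of a connectivity diagnostic, the following minimal sketch looks up the stub and extracts the port from its printed form. That relies on an assumption: JDK stubs currently print their target as "endpoint:[host:port]", which is an implementation detail and may change. StubPortCheck is a hypothetical class. Its demo() helper starts an in-process connector either with the RMI server pinned to the registry port (the same idea as setting jmxremote.rmi.port equal to jmxremote.port) or on an arbitrary port.</span></p>

```java
import java.lang.management.ManagementFactory;
import java.net.ServerSocket;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

public class StubPortCheck {

    // Assumption: JDK stubs print their target as "...endpoint:[host:port]..." (implementation detail).
    private static final Pattern ENDPOINT = Pattern.compile("endpoint:\\[(.+?):(\\d+)[,\\]]");

    /** Looks up the JMX stub and returns the port it tells clients to use in step 2. */
    static int advertisedPort(String host, int registryPort) throws Exception {
        Object stub = LocateRegistry.getRegistry(host, registryPort).lookup("jmxrmi");
        Matcher m = ENDPOINT.matcher(String.valueOf(stub));
        if (!m.find()) {
            throw new IllegalStateException("Unexpected stub format: " + stub);
        }
        return Integer.parseInt(m.group(2));
    }

    /** Demo only: returns {registryPort, advertisedPort}; an RMI port of 0 means "any free port". */
    static int[] demo(boolean pinRmiPort) throws Exception {
        System.setProperty("java.rmi.server.hostname", "127.0.0.1"); // keep the demo on loopback
        int port;
        try (ServerSocket probe = new ServerSocket(0)) {
            port = probe.getLocalPort();
        }
        int rmiPort = pinRmiPort ? port : 0; // pinned: same idea as jmxremote.rmi.port=jmxremote.port
        Registry registry = LocateRegistry.createRegistry(port);
        JMXConnectorServer server = JMXConnectorServerFactory.newJMXConnectorServer(
                new JMXServiceURL("service:jmx:rmi://127.0.0.1:" + rmiPort
                        + "/jndi/rmi://127.0.0.1:" + port + "/jmxrmi"),
                null, ManagementFactory.getPlatformMBeanServer());
        server.start();
        try {
            return new int[] {port, advertisedPort("127.0.0.1", port)};
        } finally {
            server.stop();
            UnicastRemoteObject.unexportObject(registry, true);
        }
    }

    public static void main(String[] args) throws Exception {
        int[] pinned = demo(true);
        int[] unpinned = demo(false);
        System.out.println("RMI port pinned:   registry " + pinned[0] + ", stub " + pinned[1]);
        System.out.println("RMI port unpinned: registry " + unpinned[0] + ", stub " + unpinned[1]);
    }
}
```
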
<p class="normal" style="margin-top: 10pt;"><span lang="EN">The second issue is much
trickier, as the JVM may be totally unaware of the IP address visible from outside. Even
worse, that IP could be dynamic, so it cannot be configured on the JVM command line.</span></p>
<p class="normal" style="margin-top: 10pt;"><span lang="EN">In this article, I
describe a few recipes for dealing with JMX in the modern container/cloud
world. None of them is ideal, but I hope at least one will be useful for you.</span></p>
<h2 style="margin-top: 10pt;"><a name="_jv75sa85j768"></a><span lang="EN">Configuring
JMX for known external IP address</span></h2>
<p class="normal"><span lang="EN">If you know a routable IP address, the
solution is to configure the JVM to put that specific IP into the remote
interface stub. An example of this situation is running a JVM in a local
Docker container.</span></p>
<p class="normal" style="margin-top: 10pt;"><span lang="EN">The JVM parameter </span><span lang="EN" style="color: #1c4587; font-family: "Roboto Mono"; mso-bidi-font-family: "Roboto Mono"; mso-fareast-font-family: "Roboto Mono";">-Djava.rmi.server.hostname=&lt;MyHost&gt;</span><span lang="EN"> can be used to override the IP in remote stubs provided by the JMX agent. This
parameter affects all RMI communication, but nowadays RMI is rarely used
other than by the JMX protocol.</span></p>
<p class="normal" style="margin-top: 10pt;"><span lang="EN">The resulting communication
scheme is outlined in the diagram below.</span></p>
<div align="center">
<table border="1" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="border-collapse: collapse; border: medium none; mso-border-alt: solid black 1.0pt; mso-border-insideh: 1.0pt solid black; mso-border-insidev: 1.0pt solid black; mso-padding-alt: 0in 5.4pt 0in 5.4pt; mso-table-layout-alt: fixed;">
<tbody><tr style="mso-yfti-firstrow: yes; mso-yfti-irow: 0;">
<td style="border: 1pt solid black; padding: 5pt;" valign="top" width="455">
<p align="center" class="normal" style="line-height: normal; mso-pagination: none; text-align: center;"><span lang="EN">JVM options</span></p>
</td>
</tr>
<tr style="mso-yfti-irow: 1;">
<td style="border-top: none; border: 1pt solid black; mso-border-top-alt: solid black 1.0pt; padding: 5pt; width: 341.25pt;" valign="top" width="455">
<p class="normal" style="line-height: normal; mso-pagination: none;"><span lang="EN" style="color: #980000; font-family: "Roboto Mono Medium"; font-size: 8pt; mso-bidi-font-family: "Roboto Mono Medium"; mso-fareast-font-family: "Roboto Mono Medium";">-Djava.rmi.server.hostname=1.2.3.4</span><br />
<span lang="EN" style="color: #1c4587; font-family: "Roboto Mono"; font-size: 10pt; mso-bidi-font-family: "Roboto Mono"; mso-fareast-font-family: "Roboto Mono";">-Dcom.sun.management.jmxremote</span><br />
<span lang="EN" style="color: #1c4587; font-family: "Roboto Mono"; font-size: 10pt; mso-bidi-font-family: "Roboto Mono"; mso-fareast-font-family: "Roboto Mono";">-Dcom.sun.management.jmxremote.authenticate=false</span><br />
<span lang="EN" style="color: #1c4587; font-family: "Roboto Mono"; font-size: 10pt; mso-bidi-font-family: "Roboto Mono"; mso-fareast-font-family: "Roboto Mono";">-Dcom.sun.management.jmxremote.ssl=false</span><br />
<span lang="EN" style="color: #1c4587; font-family: "Roboto Mono"; font-size: 10pt; mso-bidi-font-family: "Roboto Mono"; mso-fareast-font-family: "Roboto Mono";">-Dcom.sun.management.jmxremote.port=5555</span><br />
<span lang="EN" style="color: #1c4587; font-family: "Roboto Mono"; font-size: 10pt; mso-bidi-font-family: "Roboto Mono"; mso-fareast-font-family: "Roboto Mono";">-Dcom.sun.management.jmxremote.rmi.port=5555</span></p>
</td></tr>
<tr style="mso-yfti-irow: 1;">
<td style="border-top: none; border: 1pt solid black; mso-border-top-alt: solid black 1.0pt; padding: 5pt; width: 341.25pt;" valign="top" width="455">
<p align="center" class="normal" style="line-height: normal; mso-pagination: none; text-align: center;"><span lang="EN">Communication diagram</span></p>
</td></tr>
<tr style="mso-yfti-irow: 1;">
<td style="border-top: none; border: 1pt solid black; mso-border-top-alt: solid black 1.0pt; padding: 5pt; width: 341.25pt;" valign="top" width="455">
<p align="center" class="normal" style="margin-top: 10pt; text-align: center;"><span style="mso-ansi-language: EN-US; mso-no-proof: yes;"></span><span face="Arial,sans-serif" id="docs-internal-guid-7756b081-7fff-2562-eb25-34e661a26e36" style="background-color: transparent; color: black; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;"><img height="144" src="https://lh6.googleusercontent.com/udgLuxt7fCRTxgvKYHoFHNPCNFjm0i5SnnJFUGeQlPXiVtO9ibQ8fj7yOeEQDKuFP-ngHguNXYIFYltXW2Fly57Sw2mKjX7QVQHBUs7Z0Wxjz6aeXB_jnk3xh1NrY8XHzh16GudCdciFR4jOMYdNYdI" style="margin-left: 0px; margin-top: 0px;" width="364" /></span><span lang="EN" style="color: #980000; font-family: "Roboto Mono Medium"; font-size: 8pt; line-height: 115%; mso-bidi-font-family: "Roboto Mono Medium"; mso-fareast-font-family: "Roboto Mono Medium";"></span></p>
</td></tr>
</tbody></table></div>
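<p class="normal" style="margin-top: 10pt;"><span lang="EN">Because java.rmi.server.hostname applies to every RMI stub the JVM creates, its effect can be shown with a plain RMI object instead of the JMX agent. This minimal sketch sets the property programmatically (equivalent to the command-line flag only if it happens before the object is exported), exports a made-up Echo object and returns the stub's printed form, which is expected to contain the advertised host. HostnameOverride, Echo and EchoImpl are hypothetical names, and the sketch never connects to 1.2.3.4.</span></p>

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;

public class HostnameOverride {

    /** Hypothetical remote interface, used only to obtain an RMI stub to inspect. */
    public interface Echo extends Remote {
        String echo(String text) throws RemoteException;
    }

    static final class EchoImpl implements Echo {
        @Override
        public String echo(String text) {
            return text;
        }
    }

    /** Exports an object with the given advertised host and returns the stub's printed form. */
    static String stubFor(String advertisedHost) throws Exception {
        // Same effect as -Djava.rmi.server.hostname=... as long as it is set before exporting.
        System.setProperty("java.rmi.server.hostname", advertisedHost);
        EchoImpl impl = new EchoImpl();
        Remote stub = UnicastRemoteObject.exportObject(impl, 0); // 0 = any free port
        try {
            return stub.toString(); // the embedded endpoint shows the advertised host
        } finally {
            UnicastRemoteObject.unexportObject(impl, true);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(stubFor(args.length > 0 ? args[0] : "1.2.3.4"));
    }
}
```
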
<h2 style="margin-top: 10pt;"><a name="_6xkzys6k1bus"></a><span lang="EN">Configuring
JMX for tunneling </span></h2>
<p class="normal" style="margin-top: 10pt;"><span lang="EN">In some situations, the
IP address of the JVM host may not even be reachable from the JMX client host.
Here are a couple of typical examples:</span></p>
<p class="normal" style="margin-bottom: 0in; margin-left: .5in; margin-right: 0in; margin-top: 10.0pt; margin: 10pt 0in 0in 0.5in; mso-list: l0 level1 lfo4; text-indent: -0.25in;"><span lang="EN"><span style="mso-list: Ignore;">●<span style="font: 7pt "Times New Roman";">
</span></span></span><span lang="EN">You are using SSH to access the
internal network through a bastion host.</span></p>
<p class="normal" style="margin-left: 0.5in; mso-list: l0 level1 lfo4; text-indent: -0.25in;"><span lang="EN"><span style="mso-list: Ignore;">●<span style="font: 7pt "Times New Roman";">
</span></span></span><span lang="EN">The JVM runs in a Kubernetes POD.</span></p>
<p class="normal" style="margin-top: 10pt;"><span lang="EN">In both situations, you
can use port forwarding to establish network connectivity between the JMX client
and the JVM.</span></p>
<p class="normal" style="margin-top: 10pt;"><span lang="EN">Again, you need
to override the IP in the remote service stub, but this time it has to be set to
127.0.0.1, because the client reaches the JVM through a port forwarded to its own loopback interface.</span></p>
<p class="normal" style="margin-bottom: 10.0pt; margin-left: 0in; margin-right: 0in; margin-top: 10.0pt; margin: 10pt 0in;"><span lang="EN">The communication diagram is shown below.</span></p>
<div align="center">
<table border="1" cellpadding="0" cellspacing="0" class="MsoNormalTable" style="border-collapse: collapse; border: medium none; mso-border-alt: solid black 1.0pt; mso-border-insideh: 1.0pt solid black; mso-border-insidev: 1.0pt solid black; mso-padding-alt: 0in 5.4pt 0in 5.4pt; mso-table-layout-alt: fixed;">
<tbody><tr style="mso-yfti-firstrow: yes; mso-yfti-irow: 0;">
<td style="border: 1pt solid black; padding: 5pt;" valign="top" width="457">
<p align="center" class="normal" style="line-height: normal; margin-left: -0.35pt; text-align: center;"><span lang="EN">JVM options</span></p>
</td>
</tr>
<tr style="mso-yfti-irow: 1;">
<td style="border-top: none; border: 1pt solid black; mso-border-top-alt: solid black 1.0pt; padding: 5pt; width: 342.75pt;" valign="top" width="457">
<p class="normal" style="line-height: normal; mso-pagination: none;"><span lang="EN" style="color: #980000; font-family: "Roboto Mono Medium"; font-size: 8pt; mso-bidi-font-family: "Roboto Mono Medium"; mso-fareast-font-family: "Roboto Mono Medium";">-Djava.rmi.server.hostname=127.0.0.1</span><span lang="EN" style="color: #1c4587; font-family: "Roboto Mono"; font-size: 8pt; mso-bidi-font-family: "Roboto Mono"; mso-fareast-font-family: "Roboto Mono";"></span><br>
<span lang="EN" style="color: #1c4587; font-family: "Roboto Mono"; font-size: 10pt; mso-bidi-font-family: "Roboto Mono"; mso-fareast-font-family: "Roboto Mono";">-Dcom.sun.management.jmxremote</span><br>
<span lang="EN" style="color: #1c4587; font-family: "Roboto Mono"; font-size: 10pt; mso-bidi-font-family: "Roboto Mono"; mso-fareast-font-family: "Roboto Mono";">-Dcom.sun.management.jmxremote.authenticate=false</span><br>
<span lang="EN" style="color: #1c4587; font-family: "Roboto Mono"; font-size: 10pt; mso-bidi-font-family: "Roboto Mono"; mso-fareast-font-family: "Roboto Mono";">-Dcom.sun.management.jmxremote.ssl=false</span><br>
<span lang="EN" style="color: #1c4587; font-family: "Roboto Mono"; font-size: 10pt; mso-bidi-font-family: "Roboto Mono"; mso-fareast-font-family: "Roboto Mono";">-Dcom.sun.management.jmxremote.port=5555</span><br>
<span lang="EN" style="color: #1c4587; font-family: "Roboto Mono"; font-size: 10pt; mso-bidi-font-family: "Roboto Mono"; mso-fareast-font-family: "Roboto Mono";">-Dcom.sun.management.jmxremote.rmi.port=5555</span></p>
</td>
</tr>
<tr style="mso-yfti-irow: 2;">
<td style="border-top: none; border: 1pt solid black; mso-border-top-alt: solid black 1.0pt; padding: 5pt; width: 342.75pt;" valign="top" width="457">
<p align="center" class="normal" style="line-height: normal; text-align: center;"><span lang="EN">Communication diagram</span></p>
</td>
</tr>
<tr style="mso-yfti-irow: 3; mso-yfti-lastrow: yes;">
<td style="border-top: none; border: 1pt solid black; mso-border-top-alt: solid black 1.0pt; padding: 5pt; width: 342.75pt;" valign="top" width="457">
<p align="center" class="normal" style="margin-top: 10pt; text-align: center;"><span style="mso-ansi-language: EN-US; mso-no-proof: yes;"></span><span face="Arial,sans-serif" id="docs-internal-guid-c55608b6-7fff-7853-e99f-cf33e1d359e0" style="background-color: transparent; color: black; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;"><img height="140" src="https://lh5.googleusercontent.com/WmmbBqbUBQ8HmK5doXQKMrdvvO7_h4I4FW_wJS5ykSWazRTF4rOo16WOOEUg45Yfn_NYt6YlsAtUclIsItvLOu7tySUvxHfh2YjjV82VPIbcTANR9uwh0FcneHcKaORNtDyky_nX31lBtPxWMDSCCYk" style="margin-left: 0px; margin-top: 0px;" width="419" /></span><span lang="EN" style="color: #980000; font-family: "Roboto Mono Medium"; font-size: 8pt; line-height: 115%; mso-bidi-font-family: "Roboto Mono Medium"; mso-fareast-font-family: "Roboto Mono Medium";"></span></p>
</td>
</tr>
</tbody></table>
</div>
<p class="normal" style="margin-top: 10pt;"><span lang="EN">With SSH, you
can use its built-in port forwarding. <br />
In Kubernetes, there is the handy </span><span lang="EN" style="color: #1c4587; font-family: "Roboto Mono"; mso-bidi-font-family: "Roboto Mono"; mso-fareast-font-family: "Roboto Mono";">kubectl port-forward</span><span lang="EN"> command, which allows you
to communicate with a POD directly.</span></p>
<p class="normal" style="margin-top: 10pt;"><span lang="EN">You can even chain
port forwarding across multiple hops.</span></p>
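<p class="normal" style="margin-top: 10pt;"><span lang="EN">What a tunnel such as ssh -L 5555:localhost:5555 or kubectl port-forward does on the client side can be sketched in a few lines of Java. The sketch accepts connections on a loopback port and copies bytes unchanged to a target. Because the bytes are relayed verbatim, a JMX stub that says 127.0.0.1:5555 only works if the client side also listens on 5555. The class LocalPortForwarder is hypothetical and connects to the target directly instead of going through SSH or the Kubernetes API. Its demo() helper uses a one-line echo server in place of a remote JVM.</span></p>

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class LocalPortForwarder implements Closeable {

    private final ServerSocket listener;

    /** Listens on loopback:localPort (0 picks a free port) and relays each connection to the target. */
    public LocalPortForwarder(int localPort, String targetHost, int targetPort) throws IOException {
        listener = new ServerSocket(localPort, 50, InetAddress.getLoopbackAddress());
        daemon(() -> acceptLoop(targetHost, targetPort));
    }

    public int localPort() {
        return listener.getLocalPort();
    }

    private void acceptLoop(String targetHost, int targetPort) {
        while (!listener.isClosed()) {
            Socket client;
            try {
                client = listener.accept();
            } catch (IOException e) {
                return; // listener closed
            }
            try {
                Socket upstream = new Socket(targetHost, targetPort);
                daemon(() -> relay(client, upstream)); // client -> target
                daemon(() -> relay(upstream, client)); // target -> client
            } catch (IOException e) {
                closeQuietly(client); // target unreachable: drop this connection
            }
        }
    }

    /** Copies bytes one way; when either direction ends, both sockets are closed. */
    private static void relay(Socket from, Socket to) {
        try {
            from.getInputStream().transferTo(to.getOutputStream()); // Java 9+
        } catch (IOException ignored) {
            // reset, or already closed by the opposite relay
        } finally {
            closeQuietly(from);
            closeQuietly(to);
        }
    }

    private static void daemon(Runnable task) {
        Thread t = new Thread(task);
        t.setDaemon(true);
        t.start();
    }

    private static void closeQuietly(Closeable c) {
        try {
            c.close();
        } catch (IOException ignored) {
            // nothing useful to do in a sketch
        }
    }

    @Override
    public void close() throws IOException {
        listener.close();
    }

    /** Demo only: a one-line echo server plays the remote side; returns what came back. */
    static String demo(String message) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket echo = new ServerSocket(0, 50, loopback)) {
            daemon(() -> {
                try (Socket s = echo.accept()) {
                    String line = new BufferedReader(new InputStreamReader(s.getInputStream())).readLine();
                    new PrintWriter(s.getOutputStream(), true).println(line);
                } catch (IOException ignored) {
                    // demo server: give up quietly
                }
            });
            try (LocalPortForwarder fwd = new LocalPortForwarder(0, loopback.getHostAddress(), echo.getLocalPort());
                 Socket s = new Socket(loopback, fwd.localPort())) {
                new PrintWriter(s.getOutputStream(), true).println(message);
                return new BufferedReader(new InputStreamReader(s.getInputStream())).readLine();
            }
        }
    }
}
```
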
<p class="normal" style="margin-top: 10pt;"><span lang="EN">However, this approach
has its own limitations:</span></p>
<p class="normal" style="margin-bottom: 0in; margin-left: .5in; margin-right: 0in; margin-top: 10.0pt; margin: 10pt 0in 0in 0.5in; mso-list: l4 level1 lfo1; text-indent: -0.25in;"><span lang="EN"><span style="mso-list: Ignore;">●<span style="font: 7pt "Times New Roman";">
</span></span></span><span lang="EN">JMX will no longer be available to
remote hosts without port forwarding, so this configuration may
interfere with monitoring agents that run in your cluster and collect JMX
metrics.</span></p>
<p class="normal" style="margin-left: 0.5in; mso-list: l4 level1 lfo1; text-indent: -0.25in;"><span lang="EN"><span style="mso-list: Ignore;">●<span style="font: 7pt "Times New Roman";">
</span></span></span><span lang="EN">You cannot connect at the same time to multiple
JVMs that use the same JMX port (e.g. PODs from a single deployment), as a port
on the client host is bound to a particular remote destination. Remapping ports
breaks JMX, because the stub still points to the original port.</span></p>
<h2 style="margin-top: 10pt;"><a name="_daql31pd27rl"></a><span lang="EN">Using
HTTP JMX connector</span></h2>
<p class="normal" style="margin-top: 10pt;"><span lang="EN">The root of the problem is
the RMI protocol, which is archaic and has not evolved to support modern network
topologies. JMX is flexible enough to use alternative transport layers, and one
of them is HTTP (using the <a href="https://github.com/jolokia/jolokia"><span style="color: #1155cc;">Jolokia</span></a> open source project).</span></p>
<p class="normal" style="margin-top: 10pt;"><span lang="EN">This transport does not come
out of the box, though. You will have to ship the Jolokia agent jar with
your application and attach it as a Java agent via the JVM command line (see details <a href="https://jolokia.org/agent/jvm.html"><span style="color: #1155cc;">here</span></a>).</span></p>
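<p class="normal" style="margin-top: 10pt;"><span lang="EN">Since a Jolokia agent speaks JSON over plain HTTP, no RMI stubs are involved and any HTTP client can talk to it. Below is a minimal Java 11+ sketch that POSTs a "read" request. It assumes the agent is reachable at http://localhost:8778/jolokia/ (the JVM agent's usual default, which may differ in your setup) and that the request uses Jolokia's JSON protocol fields type, mbean and attribute. JolokiaRead is a hypothetical name. The hand-built JSON is only safe for names without quotes or backslashes.</span></p>

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class JolokiaRead {

    /** Builds a Jolokia "read" request body (assumes names contain no quotes or backslashes). */
    static String readRequest(String mbean, String attribute) {
        return "{\"type\":\"read\",\"mbean\":\"" + mbean + "\",\"attribute\":\"" + attribute + "\"}";
    }

    /** POSTs the request to the agent URL and returns the raw JSON response. */
    static String read(String agentUrl, String mbean, String attribute) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(agentUrl))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(readRequest(mbean, attribute)))
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    public static void main(String[] args) throws Exception {
        // Assumed default agent address; pass another URL as the first argument if needed.
        String url = args.length > 0 ? args[0] : "http://localhost:8778/jolokia/";
        System.out.println(read(url, "java.lang:type=Memory", "HeapMemoryUsage"));
    }
}
```

<p class="normal" style="margin-top: 10pt;"><span lang="EN">Behind kubectl port-forward, the URL simply points at whatever local port was forwarded. Unlike with RMI, the local and remote port numbers do not have to match.</span></p>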
<p class="normal" style="margin-top: 10pt;"><span lang="EN">The good news is that
nowadays tools such as VisualVM and Mission Control support the Jolokia JMX
transport. Below are a few demo videos from the Jolokia project:</span></p>
<p class="normal" style="margin-bottom: 0in; margin-left: .5in; margin-right: 0in; margin-top: 10.0pt; margin: 10pt 0in 0in 0.5in; mso-list: l2 level1 lfo5; text-indent: -0.25in;"><span lang="EN"><span style="mso-list: Ignore;">●<span style="font: 7pt "Times New Roman";">
</span></span></span><span lang="EN"><a href="https://www.youtube.com/watch?v=PDf2mqxOeMk"><span style="color: #1155cc;">Jolokia
from JMC</span></a></span></p>
<p class="normal" style="margin-left: 0.5in; mso-list: l2 level1 lfo5; text-indent: -0.25in;"><span lang="EN"><span style="mso-list: Ignore;">●<span style="font: 7pt "Times New Roman";">
</span></span></span><span lang="EN"><a href="https://www.youtube.com/watch?v=ALkMdEPPg1U"><span style="color: #1155cc;">Connect
Visual VM to a JVM running in Kubernetes using Jolokia</span></a></span></p>
<p class="normal" style="margin-left: 0.5in; mso-list: l2 level1 lfo5; text-indent: -0.25in;"><span lang="EN"><span style="mso-list: Ignore;">●<span style="font: 7pt "Times New Roman";">
</span></span></span><span lang="EN"><a href="https://www.youtube.com/watch?v=IkxDErc23lw"><span style="color: #1155cc;">Connect
Java Mission Control to a JVM in Kubernetes</span></a></span></p>
<p class="normal" style="margin-top: 10pt;"><span lang="EN">In addition to classic
tools, the Jolokia HTTP endpoint is accessible from client-side JavaScript, so a web
client is also possible. For example, the <a href="https://hawt.io/"><span style="color: #1155cc;">Hawt.IO</span></a> project implements a diagnostic web console for Java
on top of Jolokia.</span></p>
<h2 style="margin-top: 10pt;"><a name="_9ql72vxtiuvn"></a><span lang="EN">Using
SJK JMX proxy</span></h2>
<p class="normal" style="margin-top: 10pt;"><span lang="EN">Having dealt with JMX over
the years, at some point I decided to make a diagnostic tool specifically
for troubleshooting JMX connectivity.</span></p>
<p class="normal" style="border: medium none; margin-top: 10pt; mso-border-shadow: yes; mso-padding-alt: 31.0pt 31.0pt 31.0pt 31.0pt;"><span lang="EN">It is part of <a href="https://github.com/aragozin/jvm-tools"><span style="color: #1155cc;">SJK</span></a>,
my jack-of-all-knives toolkit for JVM diagnostics. The </span><span lang="EN" style="color: #1c4587; font-family: "Roboto Mono"; mso-bidi-font-family: "Roboto Mono"; mso-fareast-font-family: "Roboto Mono";">mxping</span><span lang="EN">
command can help identify which part of the JMX handshake is broken. </span></p>
<p class="normal" style="border: medium none; margin-top: 10pt; mso-border-shadow: yes; mso-padding-alt: 31.0pt 31.0pt 31.0pt 31.0pt;"><span lang="EN">While implementing </span><span lang="EN" style="color: #1c4587; font-family: "Roboto Mono"; mso-bidi-font-family: "Roboto Mono"; mso-fareast-font-family: "Roboto Mono";">mxping</span><span lang="EN">,
I realized that I could solve the root cause of RMI network sensitivity by
tweaking the JMX client code. As I was not eager to patch every JMX tool out there,
I introduced a JMX proxy (</span><span lang="EN" style="color: #1c4587; font-family: "Roboto Mono"; mso-bidi-font-family: "Roboto Mono"; mso-fareast-font-family: "Roboto Mono";">mxprx</span><span lang="EN">), which can be placed between the JMX
client and the remote JVM.</span></p>
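<p class="normal" style="margin-top: 10pt;"><span lang="EN">To illustrate the general idea of fixing RMI on the client side, here is a minimal sketch that uses a different, standard JDK hook and is not the mxprx implementation. It installs a global RMISocketFactory that ignores the host written into the stub and always connects to one known reachable host, keeping the port from the stub. Limitations of this sketch: the factory can be installed only once per JVM, it affects all RMI traffic in that JVM, and it is bypassed by stubs that carry their own client socket factory (for example, when JMX over TLS is enabled). RedirectingJmxClient is a hypothetical name. Its demo() makes the in-process "server" advertise the unresolvable name unreachable.invalid, standing in for an internal IP.</span></p>

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.net.ServerSocket;
import java.net.Socket;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.RMISocketFactory;
import java.rmi.server.UnicastRemoteObject;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

public class RedirectingJmxClient {

    /** Sends every outgoing RMI connection to one reachable host, keeping the port from the stub. */
    static final class RedirectingSocketFactory extends RMISocketFactory {
        private final String reachableHost;

        RedirectingSocketFactory(String reachableHost) {
            this.reachableHost = reachableHost;
        }

        @Override
        public Socket createSocket(String advertisedHost, int port) throws IOException {
            return new Socket(reachableHost, port); // ignore the host baked into the stub
        }

        @Override
        public ServerSocket createServerSocket(int port) throws IOException {
            return new ServerSocket(port); // server side unchanged
        }
    }

    /** Demo only: the "server" advertises an unresolvable host, yet the client still gets through. */
    static String demo() throws Exception {
        System.setProperty("java.rmi.server.hostname", "unreachable.invalid"); // like an internal IP
        RMISocketFactory.setSocketFactory(new RedirectingSocketFactory("127.0.0.1")); // once per JVM
        int port;
        try (ServerSocket probe = new ServerSocket(0)) {
            port = probe.getLocalPort();
        }
        Registry registry = LocateRegistry.createRegistry(port);
        JMXConnectorServer server = JMXConnectorServerFactory.newJMXConnectorServer(
                new JMXServiceURL("service:jmx:rmi://127.0.0.1:" + port
                        + "/jndi/rmi://127.0.0.1:" + port + "/jmxrmi"),
                null, ManagementFactory.getPlatformMBeanServer());
        server.start();
        try (JMXConnector client = JMXConnectorFactory.connect(
                new JMXServiceURL("service:jmx:rmi:///jndi/rmi://127.0.0.1:" + port + "/jmxrmi"))) {
            return (String) client.getMBeanServerConnection()
                    .getAttribute(new ObjectName("java.lang:type=Runtime"), "VmName");
        } finally {
            server.stop();
            UnicastRemoteObject.unexportObject(registry, true);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```
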
<p class="normal" style="margin-top: 10pt;"><span lang="EN">Using the JMX proxy may
eliminate the issues with the port forwarding scenario mentioned above, as it:</span></p>
<p class="normal" style="margin-bottom: 0in; margin-left: .5in; margin-right: 0in; margin-top: 10.0pt; margin: 10pt 0in 0in 0.5in; mso-list: l5 level1 lfo2; text-indent: -0.25in;"><span lang="EN"><span style="mso-list: Ignore;">●<span style="font: 7pt "Times New Roman";">
</span></span></span><span lang="EN">Does not require </span><span lang="EN" style="color: #1c4587; font-family: "Roboto Mono"; mso-bidi-font-family: "Roboto Mono"; mso-fareast-font-family: "Roboto Mono";">-Djava.rmi.server.hostname=127.0.0.1</span><span lang="EN"> on the JVM side.</span></p>
<p class="normal" style="margin-left: 0.5in; mso-list: l5 level1 lfo2; text-indent: -0.25in;"><span lang="EN"><span style="mso-list: Ignore;">●<span style="font: 7pt "Times New Roman";">
</span></span></span><span lang="EN">Allows you to remap ports, and thus to
keep port forwards to multiple JVMs open at the same time.</span></p>
<p class="normal" style="margin-top: 10pt;"><span lang="EN">Below is a
communication diagram using the JMX proxy from SJK.</span></p>
<p dir="ltr" style="line-height: 1.38; margin-bottom: 0pt; margin-top: 10pt;"><span face="Arial,sans-serif" style="background-color: transparent; color: black; font-size: 11pt; font-style: normal; font-variant: normal; font-weight: 400; text-decoration: none; vertical-align: baseline; white-space: pre;"><img height="226" src="https://lh5.googleusercontent.com/7XHQhv6HJ1MsyWMXgDihjP2hcc4m3RfzBqaIOKaOUdtljTDtYwvDAmuGImGlea12SbINbbjstg4AWVEiS4fT29zzBuK-gvgA2lw0PjpoQemHn-qvt6gWwZGmS87iKpfyGSbXd2ncGJLRehxc7xi2L88=w640-h226" style="margin-left: 0px; margin-top: 0px;" width="640" /></span></p>
<p class="normal" style="margin-top: 10pt;"><span lang="EN">In addition, the JMX
proxy makes ad hoc configuration of a JMX endpoint possible without a JVM
restart.</span></p>
<p class="normal" style="border: medium none; margin-top: 10pt; mso-border-shadow: yes; mso-padding-alt: 31.0pt 31.0pt 31.0pt 31.0pt;"><span lang="EN">The JMX agent can be <a href="https://github.com/aragozin/jvm-tools/blob/master/sjk-core/docs/MPRX.md#example-of-client-side-proxy"><span style="color: #1155cc;">started and configured at runtime</span></a> via </span><span lang="EN" style="color: #1c4587; font-family: "Roboto Mono"; mso-bidi-font-family: "Roboto Mono"; mso-fareast-font-family: "Roboto Mono";">jcmd</span><span lang="EN">,
but </span><span lang="EN" style="color: #1c4587; font-family: "Roboto Mono"; mso-bidi-font-family: "Roboto Mono"; mso-fareast-font-family: "Roboto Mono";">java.rmi.server.hostname</span><span lang="EN"> can only be set on the JVM command line. With the JMX proxy, however, we
no longer rely on </span><span lang="EN" style="color: #1c4587; font-family: "Roboto Mono"; mso-bidi-font-family: "Roboto Mono"; mso-fareast-font-family: "Roboto Mono";">java.rmi.server.hostname</span><span lang="EN">!</span></p>
<p class="normal" style="margin-top: 10pt;"><span lang="EN">Below are the steps to
connect to a JVM in a Kubernetes POD even if JMX was not configured
upfront.</span></p>
<p class="normal" style="border: medium none; margin-bottom: 0in; margin-left: .5in; margin-right: 0in; margin-top: 10.0pt; margin: 10pt 0in 0in 0.5in; mso-border-shadow: yes; mso-list: l1 level1 lfo6; mso-padding-alt: 31.0pt 31.0pt 31.0pt 31.0pt; text-indent: -0.25in;"><span lang="EN"><span style="mso-list: Ignore;">1.<span style="font: 7pt "Times New Roman";">
</span></span></span><span lang="EN">Enter the container shell using
the </span><span lang="EN" style="color: #1c4587; font-family: "Roboto Mono"; mso-bidi-font-family: "Roboto Mono"; mso-fareast-font-family: "Roboto Mono";">kubectl exec</span><span lang="EN"> command.</span></p>
<p class="normal" style="border: medium none; margin-left: 0.5in; mso-border-shadow: yes; mso-list: l1 level1 lfo6; mso-padding-alt: 31.0pt 31.0pt 31.0pt 31.0pt; text-indent: -0.25in;"><span lang="EN"><span style="mso-list: Ignore;">2.<span style="font: 7pt "Times New Roman";">
</span></span></span><span lang="EN">In the container, use </span><span lang="EN" style="color: #1c4587; font-family: "Roboto Mono"; mso-bidi-font-family: "Roboto Mono"; mso-fareast-font-family: "Roboto Mono";">jcmd ManagementAgent.start</span><span lang="EN"> to start the JMX agent (see more details <a href="https://github.com/aragozin/jvm-tools/blob/master/JMX-CONFIG.md#using-jcmd-to-start-jmx-port-without-jvm-restart"><span style="color: #1155cc;">here</span></a>).</span></p>
<p class="normal" style="border: medium none; margin-left: 0.5in; mso-border-shadow: yes; mso-list: l1 level1 lfo6; mso-padding-alt: 31.0pt 31.0pt 31.0pt 31.0pt; text-indent: -0.25in;"><span lang="EN"><span style="mso-list: Ignore;">3.<span style="font: 7pt "Times New Roman";">
</span></span></span><span lang="EN">Forward the JMX port from the container to
your local host.</span></p>
<p class="normal" style="border: medium none; margin-left: 0.5in; mso-border-shadow: yes; mso-list: l1 level1 lfo6; mso-padding-alt: 31.0pt 31.0pt 31.0pt 31.0pt; text-indent: -0.25in;"><span lang="EN"><span style="mso-list: Ignore;">4.<span style="font: 7pt "Times New Roman";">
</span></span></span><span lang="EN">Start the JMX proxy on your host,
pointing it to <i style="mso-bidi-font-style: normal;">localhost:&lt;port
forwarded from container&gt;</i>, and provide a local port for JMX tools to connect to (see more
details <a href="https://github.com/aragozin/jvm-tools/blob/master/sjk-core/docs/MPRX.md#example-of-client-side-proxy"><span style="color: #1155cc;">here</span></a>).</span></p>
<p class="normal" style="border: medium none; margin-left: 0.5in; mso-border-shadow: yes; mso-list: l1 level1 lfo6; mso-padding-alt: 31.0pt 31.0pt 31.0pt 31.0pt; text-indent: -0.25in;"><span lang="EN"><span style="mso-list: Ignore;">5.<span style="font: 7pt "Times New Roman";">
</span></span></span><span lang="EN">Now you can connect any JMX-aware
tool through the locally running JMX proxy.</span></p>
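<p class="normal" style="margin-top: 10pt;"><span lang="EN">If running jcmd by hand is inconvenient, step 2 has a programmatic counterpart in the JDK Attach API. VirtualMachine.startManagementAgent(Properties) accepts the same com.sun.management.jmxremote.* settings as the command line. This is a minimal sketch, assuming it runs on a full JDK (the jdk.attach module) as the same OS user as the target JVM inside the container. RemoteAgentStarter is a hypothetical name. It pins the RMI port to the registry port so that only one port needs forwarding. It does not set java.rmi.server.hostname, which, as noted above, can only be set on the JVM command line and is not needed when the JMX proxy is used.</span></p>

```java
import com.sun.tools.attach.VirtualMachine;
import java.util.Properties;

public class RemoteAgentStarter {

    /** The minimal options from the beginning of the article, with the RMI port pinned to the registry port. */
    static Properties jmxProperties(int port) {
        Properties p = new Properties();
        p.setProperty("com.sun.management.jmxremote.port", Integer.toString(port));
        p.setProperty("com.sun.management.jmxremote.rmi.port", Integer.toString(port));
        p.setProperty("com.sun.management.jmxremote.authenticate", "false");
        p.setProperty("com.sun.management.jmxremote.ssl", "false");
        return p;
    }

    /** Attaches to the JVM with the given pid and starts its remote JMX agent. */
    static void start(String pid, int port) throws Exception {
        VirtualMachine vm = VirtualMachine.attach(pid);
        try {
            vm.startManagementAgent(jmxProperties(port));
        } finally {
            vm.detach();
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: java RemoteAgentStarter <pid> <port>
        start(args[0], Integer.parseInt(args[1]));
    }
}
```
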
<h2 style="margin-top: 10pt;"><a name="_6m9yz1d87o9b"></a><span lang="EN">Conclusion</span></h2>
<p class="normal"><span lang="EN">I have described four alternative approaches to JMX
setup. Unfortunately, none of them is universal, and you have to pick the one that
is most suitable for your case.</span></p>
<p class="normal" style="margin-top: 10pt;"><span lang="EN">While JMX is somewhat
archaic, it is still essential for JVM monitoring, and you are likely to have to
deal with it in any serious Java-based system.</span></p>
<p class="normal" style="margin-top: 10pt;"><span lang="EN">I hope that someday an HTTP
transport will become built-in and default for the JVM, and all this trickery will become a
horror story from the old days.</span></p>
<!--[if gte mso 9]><xml>
<w:LatentStyles DefLockedState="false" DefUnhideWhenUsed="true"
DefSemiHidden="true" DefQFormat="false" DefPriority="99"
LatentStyleCount="267">
<w:LsdException Locked="false" Priority="0" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Normal"/>
<w:LsdException Locked="false" Priority="9" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="heading 1"/>
<w:LsdException Locked="false" Priority="0" QFormat="true" Name="heading 2"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 3"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 4"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 5"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 6"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 7"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 8"/>
<w:LsdException Locked="false" Priority="9" QFormat="true" Name="heading 9"/>
<w:LsdException Locked="false" Priority="39" Name="toc 1"/>
<w:LsdException Locked="false" Priority="39" Name="toc 2"/>
<w:LsdException Locked="false" Priority="39" Name="toc 3"/>
<w:LsdException Locked="false" Priority="39" Name="toc 4"/>
<w:LsdException Locked="false" Priority="39" Name="toc 5"/>
<w:LsdException Locked="false" Priority="39" Name="toc 6"/>
<w:LsdException Locked="false" Priority="39" Name="toc 7"/>
<w:LsdException Locked="false" Priority="39" Name="toc 8"/>
<w:LsdException Locked="false" Priority="39" Name="toc 9"/>
<w:LsdException Locked="false" Priority="35" QFormat="true" Name="caption"/>
<w:LsdException Locked="false" Priority="10" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Title"/>
<w:LsdException Locked="false" Priority="1" Name="Default Paragraph Font"/>
<w:LsdException Locked="false" Priority="11" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtitle"/>
<w:LsdException Locked="false" Priority="22" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Strong"/>
<w:LsdException Locked="false" Priority="20" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Emphasis"/>
<w:LsdException Locked="false" Priority="59" SemiHidden="false"
UnhideWhenUsed="false" Name="Table Grid"/>
<w:LsdException Locked="false" UnhideWhenUsed="false" Name="Placeholder Text"/>
<w:LsdException Locked="false" Priority="1" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="No Spacing"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 1"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 1"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 1"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 1"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 1"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 1"/>
<w:LsdException Locked="false" UnhideWhenUsed="false" Name="Revision"/>
<w:LsdException Locked="false" Priority="34" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="List Paragraph"/>
<w:LsdException Locked="false" Priority="29" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Quote"/>
<w:LsdException Locked="false" Priority="30" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Quote"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 1"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 1"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 1"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 1"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 1"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 1"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 1"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 1"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 2"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 2"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 2"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 2"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 2"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 2"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 2"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 2"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 2"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 2"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 2"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 2"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 2"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 2"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 3"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 3"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 3"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 3"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 3"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 3"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 3"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 3"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 3"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 3"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 3"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 3"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 3"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 3"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 4"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 4"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 4"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 4"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 4"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 4"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 4"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 4"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 4"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 4"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 4"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 4"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 4"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 4"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 5"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 5"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 5"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 5"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 5"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 5"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 5"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 5"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 5"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 5"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 5"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 5"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 5"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 5"/>
<w:LsdException Locked="false" Priority="60" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Shading Accent 6"/>
<w:LsdException Locked="false" Priority="61" SemiHidden="false"
UnhideWhenUsed="false" Name="Light List Accent 6"/>
<w:LsdException Locked="false" Priority="62" SemiHidden="false"
UnhideWhenUsed="false" Name="Light Grid Accent 6"/>
<w:LsdException Locked="false" Priority="63" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 1 Accent 6"/>
<w:LsdException Locked="false" Priority="64" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Shading 2 Accent 6"/>
<w:LsdException Locked="false" Priority="65" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 1 Accent 6"/>
<w:LsdException Locked="false" Priority="66" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium List 2 Accent 6"/>
<w:LsdException Locked="false" Priority="67" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 1 Accent 6"/>
<w:LsdException Locked="false" Priority="68" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 2 Accent 6"/>
<w:LsdException Locked="false" Priority="69" SemiHidden="false"
UnhideWhenUsed="false" Name="Medium Grid 3 Accent 6"/>
<w:LsdException Locked="false" Priority="70" SemiHidden="false"
UnhideWhenUsed="false" Name="Dark List Accent 6"/>
<w:LsdException Locked="false" Priority="71" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Shading Accent 6"/>
<w:LsdException Locked="false" Priority="72" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful List Accent 6"/>
<w:LsdException Locked="false" Priority="73" SemiHidden="false"
UnhideWhenUsed="false" Name="Colorful Grid Accent 6"/>
<w:LsdException Locked="false" Priority="19" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtle Emphasis"/>
<w:LsdException Locked="false" Priority="21" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Emphasis"/>
<w:LsdException Locked="false" Priority="31" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Subtle Reference"/>
<w:LsdException Locked="false" Priority="32" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Intense Reference"/>
<w:LsdException Locked="false" Priority="33" SemiHidden="false"
UnhideWhenUsed="false" QFormat="true" Name="Book Title"/>
<w:LsdException Locked="false" Priority="37" Name="Bibliography"/>
<w:LsdException Locked="false" Priority="39" QFormat="true" Name="TOC Heading"/>
</w:LatentStyles>
</xml><![endif]--><!--[if gte mso 10]>
<style>
/* Style Definitions */
table.MsoNormalTable
{mso-style-name:"Table Normal";
mso-tstyle-rowband-size:0;
mso-tstyle-colband-size:0;
mso-style-noshow:yes;
mso-style-priority:99;
mso-style-qformat:yes;
mso-style-parent:"";
mso-padding-alt:0in 5.4pt 0in 5.4pt;
mso-para-margin:0in;
mso-para-margin-bottom:.0001pt;
line-height:115%;
mso-pagination:widow-orphan;
font-size:11.0pt;
font-family:"Arial","sans-serif";
mso-ansi-language:EN;}
</style>
<![endif]-->Alexey Ragozinhttp://www.blogger.com/profile/13720493857045012756noreply@blogger.com0tag:blogger.com,1999:blog-7735872642513631302.post-52259070217316492232019-03-11T09:08:00.002+00:002024-02-29T00:43:05.334+00:00Lies, darn lies and sampling bias<p>
Sampling profiling is a very powerful technique, widely used across various platforms for identifying hot code (execution bottlenecks).
</p>
<p>
In the Java world, sampling profiling (thread stack sampling, to be precise) is supported by every serious profiler.
</p>
<p>
While powerful and very handy in practice, sampling has a well-known weakness: sampling bias. It is a real problem, though its practical impact is often <a href="https://psy-lob-saw.blogspot.com/2016/02/why-most-sampling-java-profilers-are.html">exaggerated</a>.
</p>
<p>
A picture is worth a thousand words, so let me jump straight to an example.
</p>
<h4>Case 1</h4>
<p>
Below is a simple snippet of code that calculates cryptographic hashes over a bunch of random strings.
</p>
<pre class="brush: java" style="overflow: scroll;">import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.TimeUnit;
public class CryptoBench {
    private static final boolean trackTime = Boolean.getBoolean("trackTime"); // enable with -DtrackTime=true
    public static void main(String[] args) {
        CryptoBench test = new CryptoBench();
        while(true) {
            test.execute();
        }
    }
    public void execute() {
        long N = 5 * 1000 * 1000;
        RandomStringUtils randomStringUtils = new RandomStringUtils(); // helper class from the linked GitHub repository (not shown)
        long ts = 0,tf = 0;
        long timer1 = 0;
        long timer2 = 0;
        long bs = System.nanoTime();
        for (long i = 0; i < N; i++) {
            ts = trackTime ? System.nanoTime() : 0;
            String text = randomStringUtils.generate();
            tf = trackTime ? System.nanoTime() : 0;
            timer1 += tf - ts;
            ts = tf;
            crypt(text);
            tf = trackTime ? System.nanoTime() : 0;
            timer2 += tf - ts;
            ts = tf;
        }
        long bt = System.nanoTime() - bs;
        System.out.print(String.format("Hash rate: %.2f Mm/s", 0.01 * (N * TimeUnit.SECONDS.toNanos(1) / bt / 10000)));
        if (trackTime) {
            System.out.print(String.format(" | Generation: %.1f %%", 0.1 * (1000 * timer1 / (timer1 + timer2))));
            System.out.print(String.format(" | Hashing: %.1f %%", 0.1 * (1000 * timer2 / (timer1 + timer2))));
        }
        System.out.println();
    }
    public String crypt(String str) {
        if (str == null || str.length() == 0) {
            throw new IllegalArgumentException("String to encrypt cannot be null or zero length");
        }
        StringBuilder hexString = new StringBuilder();
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(str.getBytes());
            byte[] hash = md.digest();
            for (byte aHash : hash) {
                if ((0xff & aHash) < 0x10) {
                    hexString.append("0" + Integer.toHexString((0xFF & aHash)));
                } else {
                    hexString.append(Integer.toHexString(0xFF & aHash));
                }
            }
        } catch (NoSuchAlgorithmException e) {
            e.printStackTrace();
        }
        return hexString.toString();
    }
}
</pre>
<a class="github" href="https://github.com/aragozin/proflab/blob/bench/cryptoprof/master/src/main/java/CryptoBench.java">code is available on github</a>
<p>
Now let’s use Visual VM (a profiler bundled with Java 8) and look at how much time is actually spent in the <code>CryptoBench.crypt()</code> method.
</p>
<!--<img src="image1.png" width="95%"/>--> <p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg-NpAyLkcnXhIc8KD1YcQaakoPIO906psoqPHEB1K3bXoIt5GGIvnDcKB8apj61BAaUorQRn2mRtInYOOBIsNPuUDiHTeSzsSEssog9axOUMmt0pHzE5hKIv8CWcTQk28iGwvivsynQCcMV1p8PIOD_P6zku9YSfqmp8WbTFK3JKk6bksdk-3bGK3XMImJ/s618/image1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="444" data-original-width="618" height="461" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg-NpAyLkcnXhIc8KD1YcQaakoPIO906psoqPHEB1K3bXoIt5GGIvnDcKB8apj61BAaUorQRn2mRtInYOOBIsNPuUDiHTeSzsSEssog9axOUMmt0pHzE5hKIv8CWcTQk28iGwvivsynQCcMV1p8PIOD_P6zku9YSfqmp8WbTFK3JKk6bksdk-3bGK3XMImJ/w640-h461/image1.png" width="640" /></a></div>
Something is definitely off in the screenshot above!
<p></p>
<p class="b_red">
<code>CryptoBench.crypt()</code>, the method doing the actual cryptography, is attributed only <b>33%</b> of execution time.<br />
At the same time, <code>CryptoBench.execute()</code> has <b>67%</b> of self time, and that method does nothing besides calling other methods.
</p>
<p class="sarcasm">
Probably I just need a cooler profiler here. /s
</p>
<p>
Let’s use Java Flight Recorder for the very same case.<br />
Below is a screenshot from Mission Control. </p><p></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgEKZLwl1IrAehr-jDts-_8St5aQjwKyY6Odbcj-JPhHPaXzxblO_WzVXLGn8iBKyBJSHkj2VjHdh_SwgsQA0cf36CSuglz0HDJ1FlWLP1RDtsnwwrTPlRvMzbKt9VhR9VIBKGLAL1Dw10shN2Fli1bgX9x3gDsDIXBObhriiFnSWC0EwtUIoF8VEGQZukN/s608/image2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="504" data-original-width="608" height="331" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgEKZLwl1IrAehr-jDts-_8St5aQjwKyY6Odbcj-JPhHPaXzxblO_WzVXLGn8iBKyBJSHkj2VjHdh_SwgsQA0cf36CSuglz0HDJ1FlWLP1RDtsnwwrTPlRvMzbKt9VhR9VIBKGLAL1Dw10shN2Fli1bgX9x3gDsDIXBObhriiFnSWC0EwtUIoF8VEGQZukN/w400-h331/image2.png" width="400" /></a></div><br /><p></p>
<p>
That looks much better!
</p>
<p class="b_red">
<code>CryptoBench.crypt()</code> now accounts for <b>86%</b> of our time budget. The rest of the time is spent in random string generation.<br />
These numbers look more believable to me.
</p>
<p>Wait, wait, wait!</p>
<p>
<code>Integer.toHexString()</code> is taking as much time as the actual MD5 calculation. I cannot believe that.
</p>
<p>
The numbers are better than the ones produced by Visual VM, but they are still fishy.
</p>
<p class="sarcasm">
Flight recorder is not cool enough for that task! We need really cool profiler! /s
</p>
<p>
OK, let me make some sense of this discrepancy between the tools.
</p>
<p>
We were using thread stack sampling in both tools (Visual VM and Flight Recorder). However, these tools capture stack traces differently.
</p>
<p>
Visual VM actually samples thread dumps (via the JVM's thread dump support). A thread dump includes stack traces for every application thread in the JVM, regardless of the thread's state (blocked, sleeping or actually executing code), and the dump is taken atomically. It reflects the instantaneous execution state of the whole JVM (which is important for deadlock/contention analysis). In practice, that implies a short Stop-the-World pause for each dump. A Stop-the-World pause means a <a href="/2012/10/safepoints-in-hotspot-jvm.html">safepoint</a> in the HotSpot JVM, and safepoints bring some nuances.
</p>
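<p>
To make this concrete, below is a minimal sketch of a thread-dump-based sampler built on the standard <code>ThreadMXBean.dumpAllThreads()</code> API. It illustrates the technique and is not Visual VM's actual code; the class name, the sampling interval and the busy worker thread are my own assumptions. Each call returns stack traces for all live threads in one go, and in HotSpot this has traditionally been done as a stop-the-world VM operation, so samples collected this way are subject to the safepoint behaviour described below.
</p>
<pre class="brush: java" style="overflow: scroll;">import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

public class ThreadDumpSampler {

    /**
     * Takes {@code samples} thread dumps, {@code intervalMs} apart, and counts the
     * top stack frame (class.method:line) of every RUNNABLE thread except the caller.
     */
    public static Map&lt;String, Integer> sample(int samples, long intervalMs) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long self = Thread.currentThread().getId();
        Map&lt;String, Integer> histogram = new HashMap&lt;>();
        for (int i = 0; i < samples; i++) {
            // One call returns stack traces of all live threads at once
            for (ThreadInfo ti : mx.dumpAllThreads(false, false)) {
                if (ti.getThreadId() == self || ti.getThreadState() != Thread.State.RUNNABLE) {
                    continue; // only threads that look busy are counted
                }
                StackTraceElement[] stack = ti.getStackTrace();
                if (stack.length == 0) {
                    continue; // threads without any Java frames
                }
                // Aggregate at line granularity
                StackTraceElement top = stack[0];
                String key = top.getClassName() + "." + top.getMethodName() + ":" + top.getLineNumber();
                histogram.merge(key, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return histogram;
    }

    public static void main(String[] args) {
        // Hypothetical CPU-bound workload for the sampler to observe
        Thread worker = new Thread(() -> {
            long x = 0;
            while (true) {
                x = x * 31 + System.nanoTime();
            }
        }, "busy-worker");
        worker.setDaemon(true);
        worker.start();
        sample(200, 10).forEach((frame, count) -> System.out.println(count + "\t" + frame));
    }
}
</pre>
<p>
Note that this sketch counts every RUNNABLE thread, including threads blocked in native I/O, which the JVM also reports as RUNNABLE.
</p>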
<p>
When Visual VM requests a thread dump, the JVM notifies threads to suspend execution, but a thread executing Java code won’t stop immediately (unless it is being interpreted). The thread continues to run until the next safepoint check, where it can suspend itself. Checks cost CPU cycles, so they are sparse in JIT-generated code.
</p>
<p>
Checks are placed inside loops and at method returns. However, checks are omitted for loops considered “fast” by the JIT compiler (typically int-indexed loops). Small methods are aggressively inlined too, which also removes the safepoint check at their return. As a consequence, hot, calculation-intensive code may be compiled by the JIT into a single chunk of machine code that is mostly free of safepoint checks.
</p>
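<p>
Below are two loop shapes that illustrate the difference. This is a sketch under assumptions: as far as I know, C2 has traditionally treated only int-indexed loops with a simple bound and stride as "counted" loops and may drop the back-edge poll from them, while a long-indexed loop (like the one in <code>CryptoBench.execute()</code>) keeps its poll. Newer HotSpot releases refined these rules (for example, loop strip mining, introduced in JDK 10, keeps a poll in an outer loop around long-running counted loops), so actual poll placement depends on the JVM version and flags. The class and method names are hypothetical.
</p>
<pre class="brush: java" style="overflow: scroll;">public class LoopShapes {

    // int index, loop-invariant bound, stride 1: the classic "counted loop".
    // C2 may compile it without a safepoint poll on the back edge.
    public static long sumIntIndexed(int[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        return sum;
    }

    // long index, like the loop in CryptoBench.execute(): historically not a
    // counted loop for C2, so the poll on the back edge is kept.
    public static long sumLongIndexed(int[] data) {
        long sum = 0;
        for (long i = 0; i < data.length; i++) {
            sum += data[(int) i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i % 100;
        }
        // Both loops compute the same value; only the shape of their machine code differs
        System.out.println(sumIntIndexed(data) + " " + sumLongIndexed(data)); // prints "49500000 49500000"
    }
}
</pre>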
<p>
If you are lucky, the thread dump will show you the line invoking the method that contains the hot code. With less luck, the result will be even more misleading.
</p>
<p class="b_red">
So in the Visual VM call tree we see the method <code>CryptoBench.execute()</code> at the top of the stack for <b>66%</b> of samples. If we could see the call tree at line number granularity, it would be the line calling the <code>CryptoBench.crypt()</code> method.
</p>
<p class="sarcasm">
Bad, ugly safepoint bias, I’ve caught you red-handed! /s
</p>
<p>
So, how does Flight Recorder sample stacks, and why are the numbers different?
</p>
<p>
Flight Recorder sampling doesn’t involve full thread dumps. Instead, it freezes threads one by one using OS-provided facilities. Once a thread is frozen, the address of the next instruction to be executed can be read from its saved execution state, and the stack can be walked from there. The instruction address is converted into a line number of Java source code via the machine code to bytecode symbol map, which is generated during JIT compilation. This is how the stack trace is reconstructed.
</p>
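<p>
JDK 11 and later also expose Flight Recorder programmatically through the <code>jdk.jfr</code> API. The sketch below records <code>jdk.ExecutionSample</code> events around a workload and counts the top frame of each sample at line granularity; the class name, the 10 ms sampling period and the workload are my assumptions, and samples without a usable stack trace are simply skipped here.
</p>
<pre class="brush: java" style="overflow: scroll;">import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedFrame;
import jdk.jfr.consumer.RecordedStackTrace;
import jdk.jfr.consumer.RecordingFile;

public class JfrTopFrames {

    /** Runs the workload under a JFR recording and counts the top frame of each execution sample. */
    public static Map&lt;String, Integer> profile(Runnable workload) {
        try {
            Path file = Files.createTempFile("sampling-bias", ".jfr");
            try (Recording recording = new Recording()) {
                // jdk.ExecutionSample: periodic stack samples of threads running Java code
                recording.enable("jdk.ExecutionSample").withPeriod(Duration.ofMillis(10));
                recording.start();
                workload.run();
                recording.stop();
                recording.dump(file);
            }
            Map&lt;String, Integer> histogram = new HashMap&lt;>();
            for (RecordedEvent event : RecordingFile.readAllEvents(file)) {
                if (!"jdk.ExecutionSample".equals(event.getEventType().getName())) {
                    continue;
                }
                RecordedStackTrace trace = event.getStackTrace();
                if (trace == null || trace.getFrames().isEmpty()) {
                    continue; // nothing to attribute this sample to
                }
                RecordedFrame top = trace.getFrames().get(0);
                String key = top.getMethod().getType().getName() + "."
                        + top.getMethod().getName() + ":" + top.getLineNumber();
                histogram.merge(key, 1, Integer::sum);
            }
            Files.deleteIfExists(file);
            return histogram;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical workload: two seconds of floating point math on the main thread
        Map&lt;String, Integer> histogram = profile(() -> {
            long end = System.nanoTime() + 2_000_000_000L;
            double x = 0;
            while (System.nanoTime() < end) {
                x += Math.sqrt(x + 1);
            }
            System.out.println("checksum " + x);
        });
        histogram.forEach((frame, count) -> System.out.println(count + "\t" + frame));
    }
}
</pre>
<p>
To get the more detailed symbol maps discussed below, the JVM would have to be started with <code class="args">-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints</code>; these are command line flags and cannot be switched on from inside the running program.
</p>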
<p>
In the case of Flight Recorder, safepoint bias does not apply. Yet the results still look inaccurate. Why?
</p>
<p>
Below is another session with Flight Recorder for the very same code.
</p>
<!--<img src="image3.png" width="95%"/>--> <div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgqME6_Z67lzTlsfUTC3sMLVxCZIFLZlFfneLEyciCbg402TnkKBj-gKALT2Ts52yIE805qem0Cj6MEVIGEaKYviq8yaiKw5WLlgG43Nc2kDzv1Jgdyqh3hmHzpMzcvHtdcGrkKPJUXJ2lglZtlebBItM5BwmQeMLE2IX3eAmsJVw7ZbP-nXaQW6Elo8nb/s649/image3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="649" data-original-width="611" height="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgqME6_Z67lzTlsfUTC3sMLVxCZIFLZlFfneLEyciCbg402TnkKBj-gKALT2Ts52yIE805qem0Cj6MEVIGEaKYviq8yaiKw5WLlgG43Nc2kDzv1Jgdyqh3hmHzpMzcvHtdcGrkKPJUXJ2lglZtlebBItM5BwmQeMLE2IX3eAmsJVw7ZbP-nXaQW6Elo8nb/w376-h400/image3.png" width="376" /></a></div><br />
<p>
The picture is different now.
</p>
<p class="b_red">
<code>Integer.toHexString()</code> is just <b>2.25%</b> of our execution budget, which is more trustworthy in my eyes.
</p>
<p>
Flight Recorder has to resolve memory addresses back to bytecode instructions (which are further translated into Java source lines). The mapping generated by the JIT compiler is used for that purpose.
</p>
<p>
However, the compiler assumes that a thread’s stack trace can be observed only at safepoints. By default, only safepoint checks are mapped to bytecode instruction indexes. Flight Recorder takes the execution address from the stack, then finds the next address mapped to Java code in the symbol table. With aggressive inlining, Flight Recorder can map an address to a completely wrong point in the code.
</p>
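<p>
The lookup described above can be modelled in a few lines. This is a deliberately simplified, hypothetical model, not HotSpot's real debug info format: the code offsets and labels are made up, and a sampled offset resolves to the first mapped offset at or after it. With a safepoint-only map (where, in this model, only call sites and polls are mapped), a sample taken inside inlined digest code lands on the next call site, which is one way a method like <code>Integer.toHexString()</code> can end up looking much hotter than it is.
</p>
<pre class="brush: java" style="overflow: scroll;">import java.util.Arrays;

public class SymbolMapLookup {

    private final int[] offsets;   // sorted code offsets that carry debug info
    private final String[] labels; // source location recorded for each offset

    public SymbolMapLookup(int[] offsets, String[] labels) {
        this.offsets = offsets;
        this.labels = labels;
    }

    /** Resolves a sampled code offset to the first mapped offset at or after it. */
    public String resolve(int pc) {
        int idx = Arrays.binarySearch(offsets, pc);
        if (idx < 0) {
            idx = -idx - 1; // insertion point = index of the next larger offset
        }
        return idx < offsets.length ? labels[idx] : "(unmapped)";
    }

    // Hypothetical layout of one compiled blob (offsets are made up):
    //   0..99     execute() loop body
    //   100..899  inlined MessageDigest code
    //   900       call site of Integer.toHexString()
    //   1000      loop back edge with safepoint poll

    /** Default model: only safepoint-like points (calls, polls) are mapped. */
    public static SymbolMapLookup safepointOnly() {
        return new SymbolMapLookup(
                new int[] {900, 1000},
                new String[] {"Integer.toHexString() call", "execute() loop back edge"});
    }

    /** Model of -XX:+DebugNonSafepoints: extra entries for code between safepoints. */
    public static SymbolMapLookup detailed() {
        return new SymbolMapLookup(
                new int[] {99, 899, 900, 1000},
                new String[] {"execute() loop body", "MessageDigest.digest()",
                        "Integer.toHexString() call", "execute() loop back edge"});
    }

    public static void main(String[] args) {
        int pc = 500; // a sample taken in the middle of the inlined digest code
        System.out.println(safepointOnly().resolve(pc)); // Integer.toHexString() call
        System.out.println(detailed().resolve(pc));      // MessageDigest.digest()
    }
}
</pre>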
<p>
So although the sampling itself is not biased by safepoints, the symbol map generated by the JIT compiler is.
</p>
<p>
In the second session, I used two JVM options to force the JIT compiler to generate more detailed symbol maps. The options are below.
</p>
<pre class="args" style="overflow: auto;">-XX:+UnlockDiagnosticVMOptions -XX:+DebugNonSafepoints
</pre>
<p>
A more detailed symbol map, free of safepoint bias, allows Flight Recorder to produce more accurate stack traces.
</p>
<p>
In our mental model, code is executed line by line (bytecode instruction by bytecode instruction). But the compiler lumps a bunch of methods together and generates a single blob of machine code, aggressively reordering operations along the way to make the code faster.<br />
Our mental model of line-by-line execution is totally broken by compiler optimization.
</p>
<p>
In practice, though, artifacts of operation reordering are not as striking as safepoint bias.
</p>
<p class="sarcasm">
So Java Flight Recorder is cool and Visual VM is not. Should I draw this conclusion?
</p>
<p>
Let me present a counter example.
</p>
<h4>Case 2</h4>
<p>
Below are profiling reports from a different case.
</p>
<p>
Now I’m using flame graphs generated from data captured by Visual VM and Flight Recorder (with <code class="args">-XX:+DebugNonSafepoints</code>).
</p>
<p>
Visual VM report </p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEilMzeUv0VOLJuLVNpJ6iBxFqpa6qOtScIqDkja8MhKopTDQPSVnOmKJT-LxWZl1L5PqZMpOwhCdoSloETBZ22TDvqgDANpsn7c-nI-uw-AXK4RntEofXUQ4_3haSDAwtehGCfZRZHi_go13VvObb04NKXBLl5YwBBza90AZ3qHYVK1CIsHAJEwYA4iCaVn/s1200/sjk_flame.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="296" data-original-width="1200" height="99" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEilMzeUv0VOLJuLVNpJ6iBxFqpa6qOtScIqDkja8MhKopTDQPSVnOmKJT-LxWZl1L5PqZMpOwhCdoSloETBZ22TDvqgDANpsn7c-nI-uw-AXK4RntEofXUQ4_3haSDAwtehGCfZRZHi_go13VvObb04NKXBLl5YwBBza90AZ3qHYVK1CIsHAJEwYA4iCaVn/w400-h99/sjk_flame.png" width="400" /></a></div><p><!--<img src="sjk_flame.png" width="95%"/>-->
</p>
<p>
Flight Recorder report </p><p class="b_red"></p><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjnkDGrDSm5gliRMF2rfZpf22THuOFNs0hoCwTqXZ0XWlXvqKojtfDoDUxEvfhOsP-9p97XHOV8IjdG4WZkNBRJgklCkX6unNkW3DFZk7JckkpJ4Al4gNdMFbu_9eKMNwA-VcGte5uICog9vp-sZOII04jqAVKrcoVRn53Ty-PQmAhvVuvjXHaNb37LQ0x2/s1200/jfr_flame.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="232" data-original-width="1200" height="78" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjnkDGrDSm5gliRMF2rfZpf22THuOFNs0hoCwTqXZ0XWlXvqKojtfDoDUxEvfhOsP-9p97XHOV8IjdG4WZkNBRJgklCkX6unNkW3DFZk7JckkpJ4Al4gNdMFbu_9eKMNwA-VcGte5uICog9vp-sZOII04jqAVKrcoVRn53Ty-PQmAhvVuvjXHaNb37LQ0x2/w400-h78/jfr_flame.png" width="400" /></a></div>Both graphs show <code>InflaterInputStream</code> to be the bottleneck. However, Visual VM estimates its share of time at <b>98%</b>, while Flight Recorder puts it at just <b>47%</b>.
<p></p>
<p>Who is right?</p>
<p class="b_red">
The correct answer is <b>92%</b> (approximated using differential analysis).
</p>
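<p>
The post does not spell out the procedure behind this number. One common form of differential analysis, assumed here, is to time the same work with and without the suspected component (for example, with the component replaced by a cheap stand-in) and attribute the difference to it. The arithmetic is trivial; the class name is hypothetical and the numbers in <code>main</code> are illustrative only, not measurements from this case.
</p>
<pre class="brush: java" style="overflow: scroll;">public class DifferentialShare {

    /**
     * Estimates the fraction of run time attributable to a component from two
     * measurements of the same work: with the component, and without it.
     */
    public static double share(double secondsWith, double secondsWithout) {
        if (secondsWith <= 0 || secondsWithout < 0 || secondsWithout > secondsWith) {
            throw new IllegalArgumentException("expected 0 <= without <= with, and with > 0");
        }
        return 1.0 - secondsWithout / secondsWith;
    }

    public static void main(String[] args) {
        // Illustrative numbers: 2.0 s with the component, 0.5 s without it
        System.out.println(share(2.0, 0.5)); // prints 0.75
    }
}
</pre>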
<p class="sarcasm">
My heart is broken! Flight Recorder is not a silver bullet. /s
</p>
<p>
What went wrong?
</p>
<p>
In this example, the hot spot was related to the JNI overhead of calling native code in zlib. It seems that Flight Recorder was unable to reconstruct stack traces for certain samples taken outside of Java code, and dropped those samples. The sample population was biased by native code execution, and that bias played against Flight Recorder in this case.
</p>
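<p>
A workload of the same kind is easy to build with <code>java.util.zip</code>, which delegates compression and decompression to zlib through native methods. The sketch below is a hypothetical reproducer, not the code profiled above; the class name, data size, alphabet and buffer size are arbitrary choices. Reading through a small buffer makes the run dominated by many short native inflate calls, the kind of execution where stack samples may be lost.
</p>
<pre class="brush: java" style="overflow: scroll;">import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Random;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

public class InflateBench {

    /** Compresses data with zlib (java.util.zip.Deflater under the hood). */
    public static byte[] compress(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bos)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    /** Decompresses through a small buffer, so time goes into many native inflate calls. */
    public static long inflateAll(byte[] compressed) {
        long total = 0;
        byte[] buf = new byte[512];
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(compressed))) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    /** Pseudo-random text over a four-letter alphabet, so it compresses reasonably well. */
    public static byte[] sampleData(int size, long seed) {
        Random rnd = new Random(seed);
        byte[] data = new byte[size];
        for (int i = 0; i < size; i++) {
            data[i] = (byte) ('a' + rnd.nextInt(4));
        }
        return data;
    }

    public static void main(String[] args) {
        byte[] compressed = compress(sampleData(4 << 20, 42));
        while (true) { // run until killed, so a profiler can be attached
            long start = System.nanoTime();
            long bytes = 0;
            for (int i = 0; i < 10; i++) {
                bytes += inflateAll(compressed);
            }
            double seconds = (System.nanoTime() - start) / 1e9;
            System.out.printf("Inflate rate: %.1f MiB/s%n", bytes / seconds / (1 << 20));
        }
    }
}
</pre>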
<h4>Conclusion</h4>
<p>
Both profilers do what they are intended to do. Some sort of bias is natural for almost any kind of sampling.
</p>
<p>
Each sampling profiler can be characterized by three aspects.
</p>
<ul>
<li>
<b>Blind spots bias</b> – which samples are excluded from the data set collected by the profiler.
</li>
<li>
<b>Attractor bias</b> – how samples are attracted to specific discrete points (e.g. safepoints).
</li>
<li>
<b>Resolution</b> – the unit of code to which profiling data is aggregated (e.g. method, line number, etc.).
</li>
</ul>
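<p>
Attractor bias can be illustrated with a toy simulation (a model, not a measurement). Assume each loop iteration takes 100 time units: 90 units of inlined <code>crypt()</code> code without any safepoint poll, followed by 10 units of <code>execute()</code> code ending with a poll on the loop back edge. An unbiased sampler attributes each sample to where the thread really is, while a safepoint-biased sampler only observes the thread at the next poll. The layout, names and numbers are hypothetical, and real code usually contains more polls, so real-world distortion is milder than in this extreme case.
</p>
<pre class="brush: java" style="overflow: scroll;">import java.util.Random;

public class AttractorBiasDemo {

    static final int PERIOD = 100;       // time units per loop iteration
    static final int CRYPT_UNITS = 90;   // units 0..89: inlined crypt() code, no polls
    static final int[] POLLS = {99};     // units 90..99: execute(), poll on the back edge

    static String methodAt(int unit) {
        return unit < CRYPT_UNITS ? "crypt" : "execute";
    }

    /** The first poll at or after the given unit (wrapping into the next iteration). */
    static int nextPoll(int unit) {
        for (int poll : POLLS) {
            if (poll >= unit) {
                return poll;
            }
        }
        return POLLS[0];
    }

    /** Fraction of samples attributed to crypt() by an unbiased or a safepoint-biased sampler. */
    public static double cryptShare(int samples, boolean safepointBiased, long seed) {
        Random rnd = new Random(seed);
        int crypt = 0;
        for (int i = 0; i < samples; i++) {
            int unit = rnd.nextInt(PERIOD);                         // where the thread really is
            int observed = safepointBiased ? nextPoll(unit) : unit; // where the sampler sees it
            if (methodAt(observed).equals("crypt")) {
                crypt++;
            }
        }
        return (double) crypt / samples;
    }

    public static void main(String[] args) {
        System.out.printf("unbiased:         crypt=%.1f%%%n", 100 * cryptShare(100_000, false, 1));
        System.out.printf("safepoint-biased: crypt=%.1f%%%n", 100 * cryptShare(100_000, true, 1));
    }
}
</pre>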
<p>Below is a summary table for the sampling methods mentioned in this article.</p>
<table>
<tbody><tr>
<td></td>
<th>Blind spot</th>
<th>Attractor</th>
<th>Resolution</th>
</tr>
<tr>
<th>JVM Thread Dump Sampling</th>
<td>non-java threads</td>
<td>safepoint bias</td>
<td>java frames only</td>
</tr>
<tr>
<th>Java Flight Recorder</th>
<td>non-java code execution</td>
<td>CPU pipeline bias <br />+ code to source mapping skew</td>
<td>java frames only</td>
</tr>
<tr>
<th>Java Flight Recorder <br /> + <code>DebugNonSafepoints</code></th>
<td>non-java code execution</td>
<td>CPU pipeline bias <br />+ code to source mapping skew</td>
<td>java frames only</td>
</tr>
</tbody></table>
<style>
p code {
font-size: 1.3em;
color: #238;
font-weight: bold;
}
p code.args {
color: #556;
}
a.github {
display: block;
font-size: 0.8em;
margin-top: -1em;
text-align: right;
}
.b_red b {
color: #700;
}
p.sarcasm {
font-style: italic;
color: #888;
font-weight: bold;
}
pre.args {
font-size: 1.3em;
color: #556;
font-weight: bold;
}
table {
font-size: 0.8em;
}
table code {
font-size: 1.3em;
color: #556;
font-weight: bold;
}
</style>
<link href="https://alexgorbatchev.com/pub/sh/3.0.83/styles/shCore.css" rel="stylesheet" type="text/css"></link>
<link href="https://alexgorbatchev.com/pub/sh/3.0.83/styles/shThemeDefault.css" rel="stylesheet" type="text/css"></link>
<script src="https://alexgorbatchev.com/pub/sh/3.0.83/scripts/XRegExp.js" type="text/javascript"></script>
<script src="https://alexgorbatchev.com/pub/sh/3.0.83/scripts/shCore.js" type="text/javascript"></script>
<script src="https://alexgorbatchev.com/pub/sh/3.0.83/scripts/shAutoloader.js" type="text/javascript"></script>
<script src="https://alexgorbatchev.com/pub/sh/3.0.83/scripts/shBrushXml.js" type="text/javascript"></script>
<script src="https://alexgorbatchev.com/pub/sh/3.0.83/scripts/shBrushJava.js" type="text/javascript"></script>
<script type="text/javascript">SyntaxHighlighter.all()</script>
<br /><br />Alexey Ragozinhttp://www.blogger.com/profile/13720493857045012756noreply@blogger.com4tag:blogger.com,1999:blog-7735872642513631302.post-76601372714228467802018-05-30T00:57:00.000+01:002018-11-30T03:43:48.270+00:00SJK is learning new tricks<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZeoLeL0p1Di6X_Rc74BUOq4OFb9bvFlf7Uh9jZ3UCX00HkBaOdnrXIG9I-XQxiiSFtPZDDJvA8a0tA7m_d1Z7d1DuyoSkXlSe9R49OEQk_qwjKZdZH_rQt1fhB3xS92ld_P3KeLzfhEF-/s1600/flame-pic.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjZeoLeL0p1Di6X_Rc74BUOq4OFb9bvFlf7Uh9jZ3UCX00HkBaOdnrXIG9I-XQxiiSFtPZDDJvA8a0tA7m_d1Z7d1DuyoSkXlSe9R49OEQk_qwjKZdZH_rQt1fhB3xS92ld_P3KeLzfhEF-/s200/flame-pic.png" width="200" height="141" data-original-width="619" data-original-height="436" /></a></div>
<p><a href="https://github.com/aragozin/jvm-tools">SJK</a> (Swiss Java Knife) has been my secret weapon for firefighting various types of performance problems for a long time.</p>
<p>A new version of SJK was released not too long ago, and it contains a bunch of new and powerful features I would like to highlight.</p>
<h2><a href="https://github.com/aragozin/jvm-tools#ttop">ttop</a> contention monitoring</h2>
<p>SJK lives up to its name by bundling a number of tools into a single executable jar.
Of these, <code>ttop</code> is likely the single most commonly used tool under the SJK roof.</p>
<p><code>ttop</code> is a kind of <code>top</code> for the threads of a JVM process.
Besides the CPU usage counter (provided by the OS) and the allocation rate (tracked by the JVM),
the recent SJK release introduces new <strong>thread contention metrics</strong>.</p>
<p>Thread contention metrics are calculated by the JVM,
which counts and times how often Java threads enter the <strong>BLOCKED</strong> or <strong>WAITING</strong> state. </p>
<p>If enabled, SJK uses these metrics to display the rate of entering each state and the percentage of time spent in it.</p>
<pre class="brush: java" style="overflow:scroll"><code>2018-05-29T14:20:03.382+0300 Process summary
process cpu=231.09%
application cpu=212.78% (user=195.86% sys=16.92%)
other: cpu=18.31%
thread count: 157
GC time=4.72% (young=4.72%, old=0.00%)
heap allocation rate 976mb/s
safe point rate: 6.3 (events/s) avg. safe point pause: 8.24ms
safe point sync time: 0.07% processing time: 5.09% (wallclock time)
[000180] user=19.40% sys= 0.31% wait=183.6/s(75.77%) block= 0/s( 0.00%) alloc= 110mb/s - hz._hzInstance_2_dev.cached.thread-8
[000094] user=16.92% sys= 0.16% wait=58.50/s(81.54%) block= 0/s( 0.00%) alloc= 94mb/s - hz._hzInstance_3_dev.generic-operation.thread-0
[000057] user=15.05% sys= 0.62% wait=56.91/s(82.35%) block= 0.20/s( 0.01%) alloc= 91mb/s - hz._hzInstance_2_dev.generic-operation.thread-0
[000095] user=15.21% sys= 0.00% wait=55.61/s(82.32%) block= 0.30/s( 0.04%) alloc= 87mb/s - hz._hzInstance_3_dev.generic-operation.thread-1
[000022] user=14.59% sys= 0.00% wait=56.01/s(83.42%) block= 0.30/s( 0.08%) alloc= 86mb/s - hz._hzInstance_1_dev.generic-operation.thread-1
[000058] user=13.97% sys= 0.16% wait=56.91/s(84.13%) block= 0.10/s( 0.02%) alloc= 81mb/s - hz._hzInstance_2_dev.generic-operation.thread-1
</code></pre>
<p>An important fact about these metrics: in an ideal world, <strong>CPU time</strong> + <strong>WAITING</strong> + <strong>BLOCKED</strong> should add up to 100%. </p>
<p>In reality, you are likely to see a gap. A few reasons why the equation above does not hold:</p>
<ul>
<li>GC pauses freeze thread execution, but are not accounted for by thread contention monitoring,</li>
<li>a thread may be waiting for an IO operation, but this is not accounted as the <strong>BLOCKED</strong> or <strong>WAITING</strong> state by the JVM,</li>
<li>the system may be starved of CPU, with the thread waiting for a CPU core at the OS level (which is also not accounted for by the JVM).</li>
</ul>
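<p>As a worked example of this equation, take thread <code>[000094]</code> from the output above: user 16.92% + sys 0.16% + wait 81.54% + block 0.00% adds up to 98.62%, leaving 1.38% of wall-clock time unaccounted for. The helper below is a hypothetical illustration (it is not part of SJK) that computes this gap from the percentages printed by <code>ttop</code>.</p>
<pre class="brush: java" style="overflow:scroll"><code>// Hypothetical helper, not part of SJK: the share of wall-clock time that is
// neither CPU time nor JVM-tracked WAITING/BLOCKED time.
final class ThreadTimeGap {
    private ThreadTimeGap() {
    }

    /** All arguments are percentages of wall-clock time, as printed by ttop -c. */
    static double unaccountedPct(double userPct, double sysPct, double waitPct, double blockPct) {
        double accounted = userPct + sysPct + waitPct + blockPct;
        // Whatever remains is typically GC pauses, IO, or waiting for a CPU core
        return 100.0 - accounted;
    }
}
</code></pre>
<p>For thread <code>[000180]</code> the same calculation gives 100 - (19.40 + 0.31 + 75.77 + 0.00) = 4.52%.</p>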
<p>Contention monitoring is not enabled by default; use the <code>-c</code> flag with the <code>ttop</code> command to enable it.</p>
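<p>The counters behind these metrics are available through the standard <code>java.lang.management.ThreadMXBean</code> API. The sketch below is not SJK's implementation; it samples a single thread (the <code>ContentionProbe</code> class is hypothetical) and turns the changes in waited/blocked counts and times (reported in milliseconds) over one interval into rates and percentages, similar in spirit to the <code>wait=</code> and <code>block=</code> columns above.</p>
<pre class="brush: java" style="overflow:scroll"><code>import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical sketch: contention statistics for one thread over an interval.
final class ContentionProbe {
    private final ThreadMXBean mx = ManagementFactory.getThreadMXBean();

    ContentionProbe() {
        // Wait/block times are reported as -1 until contention monitoring is enabled
        if (mx.isThreadContentionMonitoringSupported()) {
            mx.setThreadContentionMonitoringEnabled(true);
        }
    }

    /** Returns {waits per second, wait %, blocks per second, block %} for thread tid. */
    double[] sample(long tid, long intervalMs) {
        ThreadInfo a = mx.getThreadInfo(tid);
        long t0 = System.nanoTime();
        try {
            Thread.sleep(intervalMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        ThreadInfo b = mx.getThreadInfo(tid);
        double wallMs = (System.nanoTime() - t0) / 1e6;
        if (a == null || b == null) {
            throw new IllegalStateException("thread " + tid + " is not alive");
        }
        double waits = b.getWaitedCount() - a.getWaitedCount();
        double blocks = b.getBlockedCount() - a.getBlockedCount();
        double waitMs = b.getWaitedTime() - a.getWaitedTime();
        double blockMs = b.getBlockedTime() - a.getBlockedTime();
        return new double[] {
            1000 * waits / wallMs, 100 * waitMs / wallMs,
            1000 * blocks / wallMs, 100 * blockMs / wallMs
        };
    }
}
</code></pre>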
<h2>HTML5 based flame graph</h2>
<p><a href="https://github.com/aragozin/jvm-tools">SJK</a> has been able to produce flame graphs for some time already.
However, the old flame graphs were generated as <a href="https://en.wikipedia.org/wiki/Scalable_Vector_Graphic">svg</a> with limited interactivity. </p>
<p>The new version offers a new type of flame graph, which is based on HTML5 and is <a href="https://github.com/aragozin/jvm-tools/blob/master/sjk-hflame/docs/flame_graph_ui.md">interactive</a>. Right in the browser, it allows:</p>
<ul>
<li>filtering data by threads,</li>
<li>zooming into specific paths or by the presence of a specific frame,</li>
<li>filtering data by thread state (if state information is available).</li>
</ul>
<p>The HTML5 report is a 100% self-contained file with no dependencies,
so it can be sent by email and opened on any machine.
Here is <a href="https://training.ragozin.info/collateral/flame_demo.html">an example</a> of the new flame graph you can play with right now.</p>
<p>The new <a href="https://github.com/aragozin/jvm-tools/blob/master/sjk-hflame/docs/FLAME.md"><code>flame</code></a> command is used to generate HTML5 flame graphs.</p>
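<p>Conceptually, a flame graph aggregates many stack trace samples: identical call paths are merged, and the width of each frame is proportional to the number of samples it appears in. The sketch below illustrates only that aggregation step, using the plain-text "collapsed stack" convention popularized by Brendan Gregg's FlameGraph scripts (root frame first, frames separated by <code>;</code>). It is not SJK's sampling format or code, and <code>StackFolder</code> is a hypothetical name.</p>
<pre class="brush: java" style="overflow:scroll"><code>import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: folds stack samples into "root;...;leaf" -&gt; sample count.
final class StackFolder {
    private final Map&lt;String, Integer&gt; counts = new TreeMap&lt;&gt;();

    /** Stack traces list the leaf frame first, so walk them backwards. */
    void addSample(StackTraceElement[] stack) {
        StringBuilder path = new StringBuilder();
        for (int i = stack.length - 1; i &gt;= 0; i--) {
            if (path.length() &gt; 0) {
                path.append(';');
            }
            path.append(stack[i].getClassName()).append('.').append(stack[i].getMethodName());
        }
        counts.merge(path.toString(), 1, Integer::sum);
    }

    /** Takes one sample of every live thread in this JVM. */
    void sampleAllThreads() {
        for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
            if (stack.length &gt; 0) {
                addSample(stack);
            }
        }
    }

    Map&lt;String, Integer&gt; folded() {
        return Collections.unmodifiableMap(counts);
    }
}
</code></pre>
<p>Calling <code>sampleAllThreads()</code> periodically and writing each entry of <code>folded()</code> as "path count" yields input in the collapsed format; a renderer then draws one box per distinct path prefix.</p>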
<h2><code>jstack</code> dump support</h2>
<p>SJK accepts a number of input formats for thread sampling data,
which is used for flame graphs and other types of performance analysis. </p>
<p>A new format added in version 0.10 is the text thread dump format produced by <code>jstack</code>.
The full list of input formats is now:</p>
<ul>
<li>SJK native thread sampling format</li>
<li>JVisualVM sampling snapshots (.nps)</li>
<li>Java Flight Recorder recording (.jfr)</li>
<li>jstack produced text thread dumps</li>
</ul>
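<p>To give an idea of what consuming <code>jstack</code> output involves, here is a simplified parser sketch (hypothetical, not SJK's parser). It assumes the usual layout, where each thread section starts with a line beginning with the quoted thread name and each frame is a line of the form <code>at package.Class.method(Source.java:NN)</code>. Lock lines such as <code>- locked</code> are skipped, threads are keyed by name for simplicity, and names containing quotes are not handled.</p>
<pre class="brush: java" style="overflow:scroll"><code>import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: extracts per-thread frames (leaf first) from a jstack dump.
final class JstackFrames {
    private static final Pattern HEADER = Pattern.compile("^\"([^\"]*)\"");
    private static final Pattern FRAME = Pattern.compile("^\\s+at\\s+([^(]+)\\(");

    private JstackFrames() {
    }

    static Map&lt;String, List&lt;String&gt;&gt; parse(String dump) {
        Map&lt;String, List&lt;String&gt;&gt; threads = new LinkedHashMap&lt;&gt;();
        List&lt;String&gt; current = null;
        for (String line : dump.split("\\R")) {
            Matcher header = HEADER.matcher(line);
            if (header.find()) {
                // A new thread section begins
                current = new ArrayList&lt;&gt;();
                threads.put(header.group(1), current);
                continue;
            }
            Matcher frame = FRAME.matcher(line);
            if (current != null &amp;&amp; frame.find()) {
                current.add(frame.group(1).trim()); // e.g. "java.lang.Thread.sleep"
            }
        }
        return threads;
    }
}
</code></pre>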
Alexey Ragozinhttp://www.blogger.com/profile/13720493857045012756noreply@blogger.com0tag:blogger.com,1999:blog-7735872642513631302.post-13979104266400687552017-06-15T19:01:00.000+01:002017-06-15T19:01:58.366+01:00HeapUnit - Test your Java heap content<p>There are usually a number of tests you would like to run for each build to make sure that what your code does makes sense. Typically, such tests focus on the business functions of your code.</p>
<p>However, on rare occasions, you would really like to test certain non-functional aspects. Memory/resource leaks would be a good example.</p>
<h2>How would you test memory leak?</h2>
<p>This is quite a challenge, right?</p>
<p>You can use a debugger or a profiler to inspect the internal state of your system. However, that approach implies manual testing.</p>
<p>You can write a test which stresses your system, provoking an <code>OutOfMemoryError</code> that fails the test if the code has a defect. That generally works, though adding a stress test to a mostly functional automatic test pack may not be the best idea. That approach may also not work for other kinds of resource leaks.</p>
<p>You can exploit <a href="https://docs.oracle.com/javase/7/docs/api/index.html?java/lang/ref/WeakReference.html">weak</a> or <a href="https://docs.oracle.com/javase/7/docs/api/index.html?java/lang/ref/PhantomReference.html">phantom</a> references to trace garbage collector work. This approach makes the test more lightweight compared to fully fledged stress testing, but it is not applicable in many cases, e.g. you may not have a reference to the objects suspected of leaking.</p>
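<p>A minimal sketch of the weak reference technique, assuming you can obtain a <code>WeakReference</code> to the suspect object while it is still reachable. <code>System.gc()</code> is only a hint, so the helper (a hypothetical name) retries a bounded number of times; on a JVM started with <code>-XX:+DisableExplicitGC</code> the hint is ignored and this check would not work as written.</p>
<pre class="brush: java" style="overflow:scroll"><code>import java.lang.ref.WeakReference;

// Hypothetical test helper: checks whether an object has become garbage.
final class GcProbe {
    private GcProbe() {
    }

    /** Returns true if the referent was cleared by GC within the given number of attempts. */
    static boolean isCollected(WeakReference&lt;?&gt; ref, int attempts) {
        for (int i = 0; i &lt; attempts &amp;&amp; ref.get() != null; i++) {
            System.gc(); // only a hint, the JVM may ignore it
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }
}
</code></pre>
<p>In a leak test you would create the reference right after allocating the object, release it through the code under test, and then assert <code>isCollected(ref, 50)</code>. A failing assertion tells you that something still holds the object, but not what.</p>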
<p>For some time, I have been actively practising <a href="/2015/02/programatic-heapdump-analysis.html">automated inspection of JVM heap dumps</a> for diagnostic purposes.
A JVM can easily produce its own heap dump (using the JVM attach interface), and that dump can be <a href="https://github.com/aragozin/jvm-tools/tree/master/hprof-heap">inspected via an API</a> to assert certain invariants (e.g. the number of live instances of a particular type). Why not use it for resource leak testing and similar cases?</p>
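<p>For reference, a HotSpot JVM can also dump its own heap through the <code>com.sun.management.HotSpotDiagnosticMXBean</code> platform MXBean, without the attach API. This is a minimal sketch, not necessarily how HeapUnit does it; the class and file names are illustrative.</p>
<pre class="brush: java" style="overflow:scroll"><code>import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: the running JVM dumps its own heap to a temp file.
final class SelfHeapDump {
    private SelfHeapDump() {
    }

    /** Dumps live (reachable) objects of this JVM into a fresh temp directory. */
    static Path dumpLiveHeap() {
        HotSpotDiagnosticMXBean diag =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            // dumpHeap refuses to overwrite an existing file, hence the new directory;
            // recent JDKs also require the .hprof suffix
            Path file = Files.createTempDirectory("heapdump").resolve("self.hprof");
            diag.dumpHeap(file.toString(), true);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
</code></pre>
<p>The resulting <code>.hprof</code> file can then be opened with a heap dump parser, such as the hprof-heap API linked above, to check invariants.</p>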
<h2>Resurrecting object from dump</h2>
<p><a href="https://github.com/aragozin/jvm-tools/tree/master/hprof-heap">Heap dump API</a> allows you to inspect fields of dumped objects; there is also <a href="https://github.com/aragozin/jvm-tools/blob/master/hprof-heap/HEAPPATH.md">heap path</a> notation for writing sophisticated selectors. However, you cannot invoke methods, not even <code>toString()</code> or <code>equals()</code>, on objects from a dump. For quantitative analysis, this is OK. But for asserting the complex conditions typical of test scenarios, dealing with real Java objects may be much more convenient.</p>
<p>A heap dump doesn't contain full class information. But if the dump is produced by the JVM we are running in, we can rely on class metadata available through reflection.</p>
<p>The <a href="http://objenesis.org/">Objenesis</a> library and Java reflection are used to convert instance data from the heap dump back into normal Java objects.</p>
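<p>A much simplified sketch of rehydration, just to show the idea: field values (as they might be read from a dump) are copied into a fresh instance via reflection. Unlike Objenesis, which can create instances without running any constructor, this hypothetical helper assumes a no-arg constructor; it matches fields by simple name, and <code>SamplePoint</code> is only a demo type.</p>
<pre class="brush: java" style="overflow:scroll"><code>import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;

// Hypothetical sketch of turning field values back into a Java object.
final class Rehydrator {
    private Rehydrator() {
    }

    static &lt;T&gt; T rehydrate(Class&lt;T&gt; type, Map&lt;String, Object&gt; fieldValues) {
        try {
            // Objenesis would bypass constructors; here a no-arg constructor is assumed
            Constructor&lt;T&gt; ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T instance = ctor.newInstance();
            for (Class&lt;?&gt; c = type; c != null; c = c.getSuperclass()) {
                for (Field f : c.getDeclaredFields()) {
                    if (Modifier.isStatic(f.getModifiers()) || !fieldValues.containsKey(f.getName())) {
                        continue;
                    }
                    f.setAccessible(true);
                    f.set(instance, fieldValues.get(f.getName())); // unboxes for primitive fields
                }
            }
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot rehydrate " + type.getName(), e);
        }
    }
}

// Demo type used in the usage example
final class SamplePoint {
    int x;
    int y;
    String label;
}
</code></pre>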
<p>In the end, using <a href="https://github.com/aragozin/heapunit">HeapUnit</a> is fairly simple. Using its API, you can:</p>
<ul>
<li>take a heap dump,</li>
<li>select instances of certain types from the dump by class or <a href="https://github.com/aragozin/jvm-tools/blob/master/hprof-heap/HEAPPATH.md">heap path</a> notation,</li>
<li>inspect an instance's fields using symbolic names,</li>
<li>or rehydrate an instance into a Java object.</li>
</ul>
<h2>Example</h2>
<p>Below is a simple example listing the Socket objects in the JVM:</p>
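<pre class="brush: java" style="overflow:scroll"><code>import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketImpl;
import org.junit.Test;
// HeapImage, HeapInstance and HeapUnit come from the heapunit artifact (see Maven coordinates below)

public class SocketLeakTest {

    // Assumed helper: loopback address for the given port
    private static InetSocketAddress sock(int port) {
        return new InetSocketAddress("127.0.0.1", port);
    }

    @Test
    public void printSockets() throws IOException {
        ServerSocket ss = new ServerSocket();
        ss.bind(sock(5000));
        Socket s1 = new Socket();
        Socket s2 = new Socket();
        s1.connect(sock(5000));
        s2.connect(sock(5000));
        ss.close();
        s1.close();
        // s2 remains unclosed
        HeapImage hi = HeapUnit.captureHeap();
        for (HeapInstance i : hi.instances(SocketImpl.class)) {
            // fd field in SocketImpl class is nullified when socket gets closed
            boolean open = i.value("fd") != null;
            System.out.println(i.rehydrate() + (open ? " - open" : " - closed"));
        }
    }
}
</code></pre>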
<p>The <a href="https://github.com/aragozin/heapunit">HeapUnit</a> library is available in the Maven Central repo. You can add it to your project using the Maven coordinates below.</p>
<pre class="brush: xml" style="overflow:scroll"><code>&lt;dependency&gt;
    &lt;groupId&gt;org.gridkit.heapunit&lt;/groupId&gt;
    &lt;artifactId&gt;heapunit&lt;/artifactId&gt;
    &lt;version&gt;0.2&lt;/version&gt;
&lt;/dependency&gt;
</code></pre>
Alexey Ragozinhttp://www.blogger.com/profile/13720493857045012756noreply@blogger.com0tag:blogger.com,1999:blog-7735872642513631302.post-79221148140270805402016-10-25T04:04:00.000+01:002016-10-25T04:04:16.622+01:00HotSpot JVM garbage collection options cheat sheet (v4) <p>After <a href="/2013/11/hotspot-jvm-garbage-collection-options.html">three years</a>, I have decided to update my GC cheat sheet.</p>
<p>The new version finally includes G1 options; thankfully, there are not very many of
them. There are also a few useful options introduced to CMS, including parallel <em>initial mark</em> and initiating concurrent cycles by timer.</p>
<p>Finally, I made separate cheat sheet versions for Java 7 and Java 8.</p>
<p>Below are links to the PDF versions:</p>
<ul>
<li><a href="https://raw.githubusercontent.com/aragozin/sketchbook/download/Java%208%20-%20GC%20cheatsheet.pdf">Hotspot JVM GC options cheat sheet for Java 8</a></li>
<li><a href="https://raw.githubusercontent.com/aragozin/sketchbook/download/Java%207%20-%20GC%20cheatsheet.pdf">Hotspot JVM GC options cheat sheet for Java 7</a></li>
</ul>
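<p>When working through the cheat sheet, it helps to check which options are actually in effect for a running JVM. Below is a small sketch using the <code>com.sun.management.HotSpotDiagnosticMXBean</code> available on HotSpot-based JVMs; <code>GcOptions</code> is a hypothetical name, and <code>getVMOption</code> throws <code>IllegalArgumentException</code> for options the running JVM does not know (for example, CMS flags on JDK versions where CMS has been removed).</p>
<pre class="brush: java" style="overflow:scroll"><code>import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reports the effective value and origin of HotSpot options.
final class GcOptions {
    private GcOptions() {
    }

    /** Maps each option name to "value (origin)", or to "n/a" if this JVM does not know it. */
    static Map&lt;String, String&gt; read(String... names) {
        HotSpotDiagnosticMXBean diag =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        Map&lt;String, String&gt; result = new LinkedHashMap&lt;&gt;();
        for (String name : names) {
            try {
                VMOption opt = diag.getVMOption(name);
                result.put(name, opt.getValue() + " (" + opt.getOrigin() + ")");
            } catch (IllegalArgumentException e) {
                result.put(name, "n/a"); // option not supported by this JVM version
            }
        }
        return result;
    }
}
</code></pre>
<p>For example, <code>GcOptions.read("UseG1GC", "UseConcMarkSweepGC", "MaxGCPauseMillis")</code> shows whether a value came from the command line or was chosen by default or ergonomics.</p>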
<h4>Java 8 GC cheat sheet</h4>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJFKhh_G17YRPnVPIQPjKbP6rQl3Wldi8YdktTDRqmlZN6sJJCNyTHS6IxZK5aUI861GZrO-QanYBNlUmpDG0tq5w-Wobs-jnQ1yzBTIxdOYsCbeUBwswCWF4YbYoraTGAEOz9i_RGfOLZ/s1600/Java+8+-+GC+cheatsheet.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJFKhh_G17YRPnVPIQPjKbP6rQl3Wldi8YdktTDRqmlZN6sJJCNyTHS6IxZK5aUI861GZrO-QanYBNlUmpDG0tq5w-Wobs-jnQ1yzBTIxdOYsCbeUBwswCWF4YbYoraTGAEOz9i_RGfOLZ/s400/Java+8+-+GC+cheatsheet.png" width="301" height="400" /></a></div>
<h4>Java 7 GC cheat sheet</h4>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgt9wCf_07KwvxWOvH9CiXDGq1NKghhFGEJ7WVqzlxRGVwWPA8TkTOKHU6oMY0ZIkUZ60IPXCwXVJetnWCFmFycPjkljriFddiYuDSt3DS3Ti5BMPqvyzBZ3kc3dfsKHcSCpRK091PJLFBb/s1600/Java+7+-+GC+cheatsheet.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgt9wCf_07KwvxWOvH9CiXDGq1NKghhFGEJ7WVqzlxRGVwWPA8TkTOKHU6oMY0ZIkUZ60IPXCwXVJetnWCFmFycPjkljriFddiYuDSt3DS3Ti5BMPqvyzBZ3kc3dfsKHcSCpRK091PJLFBb/s400/Java+7+-+GC+cheatsheet.png" width="301" height="400" /></a></div>
Alexey Ragozinhttp://www.blogger.com/profile/13720493857045012756noreply@blogger.com5tag:blogger.com,1999:blog-7735872642513631302.post-5987305357208062822016-09-16T23:27:00.000+01:002016-09-28T23:33:46.744+01:00How to measure object size in Java?<p>You define fields, their names and types, in the source of a Java class, but it is the JVM that decides how they will be stored in physical memory.</p>
<p>Sometimes you want to know exactly how much memory a Java object occupies. Answering this question is surprisingly complicated. </p>
<h4>Challenge</h4>
<ul>
<li>Pointer size and Java object header size vary.</li>
<li>The JVM could be built for a 32 or 64 bit architecture.
On 64 bit architectures, the JVM may or may not use compressed pointers (<code>-XX:+UseCompressedOops</code>).</li>
<li>Object padding may be different (<code>-XX:ObjectAlignmentInBytes=X</code>).</li>
<li>Different field types may have different alignment rules.</li>
<li>The JVM may reorder fields in the object layout as it likes.</li>
</ul>
<p>The figure below illustrates how the JVM may rearrange fields in memory.</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxkV6BOR2-Xfk9_aLG00y2JbV8_CsZrUiztcKOlVrimxbqOA_fgPt9hZB2iL4KHnoc7zQmSkIgc4b1jcc1udVVCwrDX40uIafdvUcLJKpAXlQSpTrsxfE03_GfiuywvLIHYYs7pi8g5MMi/s1600/java+object+layout.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxkV6BOR2-Xfk9_aLG00y2JbV8_CsZrUiztcKOlVrimxbqOA_fgPt9hZB2iL4KHnoc7zQmSkIgc4b1jcc1udVVCwrDX40uIafdvUcLJKpAXlQSpTrsxfE03_GfiuywvLIHYYs7pi8g5MMi/s1600/java+object+layout.png" /></a></div>
<h4>Guessing object layout</h4>
<p>You can scan class fields via reflection and try to guess the layout chosen by the JVM, taking into account the platform pointer size and other factors. </p>
<p>... at least you can try.</p>
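<p>Here is a rough sketch of such guessing, under loudly stated assumptions: a 64-bit HotSpot JVM with compressed references and compressed class pointers (a 12-byte header and 4-byte references) and the default 8-byte alignment. It simply sums field sizes and rounds up, ignoring any padding the JVM may insert between fields, which is exactly why the result is only a guess. <code>LayoutGuess</code> is a hypothetical name.</p>
<pre class="brush: java" style="overflow:scroll"><code>import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Assumes 64-bit HotSpot with compressed oops and compressed class pointers.
final class LayoutGuess {
    static final int HEADER = 12;   // mark word (8) + compressed class pointer (4)
    static final int REFERENCE = 4; // compressed oop
    static final int ALIGNMENT = 8; // default -XX:ObjectAlignmentInBytes

    private LayoutGuess() {
    }

    static long shallowSize(Class&lt;?&gt; type) {
        long size = HEADER;
        // Instance fields of the class and all its superclasses
        for (Class&lt;?&gt; c = type; c != null; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (!Modifier.isStatic(f.getModifiers())) {
                    size += sizeOf(f.getType());
                }
            }
        }
        // Round up to the object alignment; gaps between fields are ignored
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    private static int sizeOf(Class&lt;?&gt; t) {
        if (t == long.class || t == double.class) return 8;
        if (t == int.class || t == float.class) return 4;
        if (t == short.class || t == char.class) return 2;
        if (t == byte.class || t == boolean.class) return 1;
        return REFERENCE;
    }
}
</code></pre>
<p>Under those assumptions this estimates <code>java.lang.Integer</code> (one <code>int</code> field) at 16 bytes and <code>java.lang.Long</code> (one <code>long</code> field) at 24 bytes; as soon as any assumption does not hold, the numbers are off.</p>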
<h4>Using the Unsafe</h4>
<p><a href="https://dzone.com/articles/understanding-sunmiscunsafe">sun.misc.Unsafe</a> is an internal helper class used by JDK code. You should not use it, but you can (with some help from reflection). Unsafe is popular among people doing weird things with the JVM. </p>
<p>Unsafe lets you query information about the physical layout of a Java object (e.g. field offsets). However, it does not tell you the real size of an object in memory directly; you still have to do some error-prone math to calculate it.</p>
<p><a href="https://github.com/hazelcast/hazelcast/blob/master/hazelcast/src/main/java/com/hazelcast/util/JVMUtil.java">Here is an example of such code</a>.</p>
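<p>A minimal sketch of that math (not the linked Hazelcast code): obtain the <code>theUnsafe</code> singleton via reflection, find the instance field that ends last, and round its end offset up to an assumed 8-byte alignment. The reference size is taken from <code>arrayIndexScale(Object[].class)</code>; <code>UnsafeSizer</code> is a hypothetical name, and since <code>sun.misc.Unsafe</code> memory-access methods are deprecated in recent JDKs, expect compiler or runtime warnings.</p>
<pre class="brush: java" style="overflow:scroll"><code>import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import sun.misc.Unsafe;

// Hypothetical sketch: estimates shallow object size from Unsafe field offsets.
@SuppressWarnings({"deprecation", "removal"})
final class UnsafeSizer {
    private static final Unsafe UNSAFE = loadUnsafe();

    private UnsafeSizer() {
    }

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    /** End of the last instance field, rounded up to an assumed 8-byte alignment. */
    static long shallowSize(Class&lt;?&gt; type) {
        long end = -1;
        for (Class&lt;?&gt; c = type; c != null; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (!Modifier.isStatic(f.getModifiers())) {
                    end = Math.max(end, UNSAFE.objectFieldOffset(f) + sizeOf(f.getType()));
                }
            }
        }
        if (end &lt; 0) {
            throw new IllegalArgumentException(type + " has no instance fields");
        }
        return (end + 7) / 8 * 8;
    }

    private static int sizeOf(Class&lt;?&gt; t) {
        if (t == long.class || t == double.class) return 8;
        if (t == int.class || t == float.class) return 4;
        if (t == short.class || t == char.class) return 2;
        if (t == byte.class || t == boolean.class) return 1;
        return UNSAFE.arrayIndexScale(Object[].class); // actual reference size
    }
}
</code></pre>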
<h4>Instrumentation agent</h4>
<p><a href="https://docs.oracle.com/javase/7/docs/api/java/lang/instrument/Instrumentation.html">java.lang.instrument.Instrumentation</a> is an API for profilers and other performance tools. You need to install an agent into the JVM to get an instance of it. It has a handy <code>getObjectSize(...)</code> method, which reports the object's size as estimated by the JVM implementation itself.</p>
<p>The <a href="https://github.com/jbellis/jamm">jamm</a> library exploits this option. You need special JVM start options (<code>-javaagent</code>) though.</p>
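<p>A minimal sketch of such an agent (the class name is hypothetical). Packaged in a jar whose manifest contains a <code>Premain-Class</code> attribute naming the class, and started with <code>-javaagent:path/to/agent.jar</code>, the JVM calls <code>premain</code> before <code>main</code> and passes in the <code>Instrumentation</code> instance. Note that <code>getObjectSize</code> is documented as an implementation-specific approximation of the object's own (shallow) size; referenced objects are not included.</p>
<pre class="brush: java" style="overflow:scroll"><code>import java.lang.instrument.Instrumentation;

// Hypothetical agent; when packaging the agent jar, declare it public in its own
// ObjectSizeAgent.java and add "Premain-Class: ObjectSizeAgent" to the manifest.
final class ObjectSizeAgent {
    private static volatile Instrumentation instrumentation;

    private ObjectSizeAgent() {
    }

    /** Called by the JVM before main() when started with -javaagent:agent.jar. */
    public static void premain(String agentArgs, Instrumentation inst) {
        instrumentation = inst;
    }

    static boolean isInstalled() {
        return instrumentation != null;
    }

    /** Implementation-specific approximation of the object's shallow size in bytes. */
    static long sizeOf(Object o) {
        Instrumentation inst = instrumentation;
        if (inst == null) {
            throw new IllegalStateException("agent not installed, start the JVM with -javaagent");
        }
        return inst.getObjectSize(o);
    }
}
</code></pre>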
<h4>Threading MBean</h4>
<p>The Threading MBean in the JVM has a handy per-thread allocation counter. Using this counter, you can measure object size by allocating a new instance and checking the delta of the counter. The snippet below does just that.</p>
<pre class="brush: java" style="overflow:scroll"><code>import java.lang.management.ManagementFactory;

public class MemMeter {

    // Looked up once; the com.sun.management extension exposes per-thread allocation counters
    private static final com.sun.management.ThreadMXBean THREAD_MX =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    // Allocation caused by the measurement itself, calibrated with an empty task
    private static final long OFFSET = measure(new Runnable() {
        @Override
        public void run() {
        }
    });

    /**
     * @return amount of memory allocated while executing provided {@link Runnable}
     */
    public static long measure(Runnable x) {
        long now = getCurrentThreadAllocatedBytes();
        x.run();
        long diff = getCurrentThreadAllocatedBytes() - now;
        // While OFFSET itself is being initialized, it still reads as 0 here
        return diff - OFFSET;
    }

    private static long getCurrentThreadAllocatedBytes() {
        return THREAD_MX.getThreadAllocatedBytes(Thread.currentThread().getId());
    }
}
</code></pre>
<p>Below is a simple usage example:</p>
<pre class="brush: java" style="overflow:scroll"><code>System.out.println("size of java.lang.Object is "
+ MemMeter.measure(new Runnable() {
Object x;
@Override
public void run() {
x = new Object();
}
}));
</code></pre>
<p>However, this approach requires you to create a new instance of an object to measure its size. That may be an obstacle.</p>
<h4>jmap</h4>
<p><code>jmap</code> is one of the JDK tools. With the <code>jmap -histo PID</code> command, you can print a histogram of your heap objects.</p>
<pre style="overflow:scroll"><code> num     #instances         #bytes  class name
---------------------------------------------
   1:       1413317      111961288  [C
   2:        272969       39059504  &lt;constMethodKlass&gt;
   3:       1013137       24315288  java.lang.String
   4:        245685       22715744  [I
   5:        272969       19670848  &lt;methodKlass&gt;
   6:        206682       17868464  [B
   7:         29355       17722320  &lt;constantPoolKlass&gt;
   8:        659710       15833040  java.util.HashMap$Entry
   9:         29355       12580904  &lt;instanceKlassKlass&gt;
  10:        105637       12545112  [Ljava.util.HashMap$Entry;
  11:        170894       11797400  [Ljava.lang.Object;
</code></pre>
<p>For regular objects, you can divide the byte size by the instance count to get the size of an individual instance of a class. This does not work for arrays, though, because arrays of the same type differ in length and therefore in size.</p>
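<p>As a worked example with the histogram above: <code>java.lang.String</code> takes 24315288 bytes in 1013137 instances, which is exactly 24 bytes per instance, and <code>java.util.HashMap$Entry</code> comes out at 15833040 / 659710 = 24 bytes as well. For <code>[C</code> (char arrays) the division does not produce a whole number. The hypothetical helper below does that division for a single histogram row, assuming the whitespace-separated <code>jmap -histo</code> layout shown above.</p>
<pre class="brush: java" style="overflow:scroll"><code>// Hypothetical helper; assumes rows like "   3:   1013137   24315288  java.lang.String".
final class HistoRow {
    final long instances;
    final long bytes;
    final String className;

    private HistoRow(long instances, long bytes, String className) {
        this.instances = instances;
        this.bytes = bytes;
        this.className = className;
    }

    static HistoRow parse(String line) {
        String[] cols = line.trim().split("\\s+");
        if (cols.length &lt; 4 || !cols[0].endsWith(":")) {
            throw new IllegalArgumentException("not a histogram row: " + line);
        }
        return new HistoRow(Long.parseLong(cols[1]), Long.parseLong(cols[2]), cols[3]);
    }

    /** Average bytes per instance: exact for plain objects, only an average for arrays. */
    double averageSize() {
        return (double) bytes / instances;
    }
}
</code></pre>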
<h4>Java Object Layout tool</h4>
<p>The <a href="http://openjdk.java.net/projects/code-tools/jol/">Java Object Layout</a> (JOL) tool uses a number of different approaches for introspecting the physical layout of Java objects in memory.</p>
<link href='http://alexgorbatchev.com/pub/sh/3.0.83/styles/shCore.css' rel='stylesheet' type='text/css' />
<link href='http://alexgorbatchev.com/pub/sh/3.0.83/styles/shThemeDefault.css' rel='stylesheet' type='text/css' />
<script src='http://alexgorbatchev.com/pub/sh/3.0.83/scripts/XRegExp.js' type='text/javascript'></script>
<script src='http://alexgorbatchev.com/pub/sh/3.0.83/scripts/shCore.js' type='text/javascript'></script>
<script src='http://alexgorbatchev.com/pub/sh/3.0.83/scripts/shAutoloader.js' type='text/javascript'></script>
<script src='http://alexgorbatchev.com/pub/sh/3.0.83/scripts/shBrushXml.js' type='text/javascript'></script>
<script src='http://alexgorbatchev.com/pub/sh/3.0.83/scripts/shBrushJava.js' type='text/javascript'></script>
<script type="text/javascript">
SyntaxHighlighter.all()
</script>
Alexey Ragozinhttp://www.blogger.com/profile/13720493857045012756noreply@blogger.com2tag:blogger.com,1999:blog-7735872642513631302.post-86239463333295996862016-07-21T22:53:00.000+01:002016-07-21T22:53:16.736+01:00Rust, JNI, Java<div><a imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhC_jNtTKrm6x7zCRcNKl33eIvcUDo3uGNCEL0JEG3ujQnLnqK7scUYK7oKT4h5cGkQPs_5YnU1BjK1-Vb47ZFI60oAknnwgpEmif99HdQFYUnLOt5w-F0Kzk7eSs6Dy4KHlX17LUSoCZcl/s1600/rust.png" /></a></div>
<p>Recently, I needed to make some calls
to <code>kernel32.dll</code> from my Java code:
just a few system calls on the Windows platform,
as simple as it sounds.
I also wanted to keep the resulting binary
as small as possible.</p>
<p>The latter requirement added a fair challenge to the task.</p>
<p>How do you call platform code from Java?</p>
<h4>JNI - Java Native Interface</h4>
<p>JNI is built into the JVM and is part of the Java standard.
Sounds good, but there is a catch.
To call native code from Java via JNI,
you have to write some native code yourself (e.g. in C).
That is, JNI requires glue code (aka bindings) between
native calls and Java methods.</p>
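<p>On the Java side, a JNI binding is a class with <code>native</code> methods plus a call that loads the library; the glue code must then export a function whose name is derived from the class and method names (the "short name" scheme of the JNI specification: <code>Java_</code>, the mangled fully qualified class name, an underscore, and the mangled method name). The sketch below shows both on the Java side. The class, the <code>myjni</code> library name and the method are hypothetical, and the fallback to <code>ProcessHandle</code> (Java 9+) only exists so that the sketch runs without the dll.</p>
<pre class="brush: java" style="overflow:scroll"><code>// Hypothetical Java half of a JNI binding (library name "myjni" is illustrative).
final class Kernel32 {
    private static final boolean NATIVE_LOADED = tryLoad();

    private Kernel32() {
    }

    private static boolean tryLoad() {
        try {
            System.loadLibrary("myjni"); // myjni.dll on Windows, libmyjni.so on Linux
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false;
        }
    }

    // Implemented by glue code exporting the symbol Java_Kernel32_getCurrentProcessId
    private static native int getCurrentProcessId();

    static long pid() {
        // Fallback so the sketch also runs without the native library
        return NATIVE_LOADED ? getCurrentProcessId() : ProcessHandle.current().pid();
    }

    /** JNI "short name" of a native method, following the JNI name-mangling rules. */
    static String jniSymbol(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }

    private static String mangle(String s) {
        StringBuilder sb = new StringBuilder();
        for (char ch : s.toCharArray()) {
            if (ch == '.' || ch == '/') {
                sb.append('_'); // package separator
            } else if (ch == '_') {
                sb.append("_1");
            } else if (ch == ';') {
                sb.append("_2");
            } else if (ch == '[') {
                sb.append("_3");
            } else if (ch &lt; 128 &amp;&amp; Character.isLetterOrDigit(ch)) {
                sb.append(ch);
            } else {
                sb.append(String.format("_0%04x", (int) ch)); // e.g. '$' becomes _00024
            }
        }
        return sb.toString();
    }
}
</code></pre>
<p>Whatever language the glue is written in (C, Free Pascal or Rust), the exported function must carry that name and, for a static native method, take the <code>JNIEnv</code> pointer and the <code>jclass</code> as its first two parameters.</p>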
<p>Hmm... do we have any alternatives?</p>
<h4>JNA - Java Native Access</h4>
<p><a href="https://github.com/java-native-access/jna">JNA</a> is an alternative to JNI. You can call native code from Java with no glue code. Cool, but what is the cost?</p>
<p>The JNA jar is <strong>1.1 MiB</strong> in size. An extra megabyte just to make a couple of simple calls to the Windows kernel is a no-go.</p>
<h4>Back to JNI</h4>
<p>OK, I need to write some glue code for JNI.
Which language should I choose?</p>
<p>C/C++? No, just no.
The C/C++ tool chain (compiler, headers, build tools) is an abomination,
especially on Windows. Please, I literally need
half a screen of code compiled into a dll.
I do not want 10 GiB worth of Visual Studio
polluting my desktop.</p>
<p>A die-hard Java guy speaking :) </p>
<h4>Free Pascal</h4>
<p>Pascal is an ancient language.
It was the programming language of my youth.
MS DOS, Turbo Pascal ... colors were so bright in those days.</p>
<p>Twenty years later, I was surprised to find Pascal in pretty good shape.
<a href="http://www.freepascal.org">Free Pascal</a> has an impressive list of supported platforms.
The Pascal compiler is lightning fast.
Produced binaries have no dependency on libc / msvcrt.</p>
<p>Using Free Pascal, I got my kernel32-to-JNI dll down to <strong>33 KiB</strong>.
That sounds much, much better. </p>
<p>Can we do better?</p>
<h4>Rust</h4>
<p><a href="https://www.rust-lang.org">Rust</a> is the new kid on the language block.
It has a strong ambition to replace C/C++ as a systems
level language. It gives you all the power of C
plus memory safety, a modernized build system and
language-level modules (crates).</p>
<p>Sounds promising, so let's try <a href="https://www.rust-lang.org">Rust</a>
for a little JNI glue dll.</p>
<p>Running</p>
<p><code>rustc -C debuginfo=0 --crate-type dylib myjni.rs</code></p>
<p>results in a disappointing <strong>2.5 MiB</strong> binary.</p>
<p>A Rust <code>dylib</code> is a dll intended to be used by other Rust code,
so it exposes a lot of language-specific metadata.
<code>cdylib</code> is a new crate type introduced in <a href="https://blog.rust-lang.org/2016/07/07/Rust-1.10.html">Rust 1.10</a>,
which is more suitable for JNI bindings.</p>
<p>Command line </p>
<p><code>rustc -C lto -C debuginfo=0 --crate-type cdylib myjni.rs</code></p>
<p>produced a <strong>1.6 MiB</strong> binary.
The <code>-C lto</code> option instructs the compiler to perform link-time optimization.
For some reason, <code>cdylib</code> would not compile without the <code>lto</code> option for me.</p>
<p>OK, the direction is right, but we need to go much further. Let's try more compiler options.</p>
<p>Command line </p>
<p><code>rustc -C opt-level=3 -C lto -C debuginfo=0 --crate-type cdylib myjni.rs</code></p>
<p>produced a <strong>200 KiB</strong> binary.
Optimization allows the compiler to throw away a big portion of the standard library, which I will never
need for my simple JNI binding.</p>
<p>Still, a large portion of the standard library remains.</p>
<p>In Rust, you can turn off the standard library entirely (e.g. to run on bare metal). </p>
<p>Normally you would need at least memory management,
but for a simple JNI binding you can get away with
stack allocation only.</p>
<p>At the moment, using Rust with the <a href="https://doc.rust-lang.org/book/using-rust-without-the-standard-library.html">no_std</a> option
requires a nightly build of the compiler.
I also had to rewrite some portions of the kernel32 and JNI
declarations to avoid depending on <code>libc</code> types.</p>
<p><code>rustc -C opt-level=3 -C panic=abort -C lto -C debuginfo=0 --crate-type cdylib myjni.rs</code></p>
<p>Binary size is <strong>22.5 KiB</strong>.</p>
<p>Cool, we have beaten Free Pascal.</p>
<p>One more tweak: run <code>strip -s</code> on the resulting dll, and the final binary size is <strong>16.9 KiB</strong>.</p>
<p>Honestly, 16.9 KiB for a couple of calls is still
overkill. But I'm not desperate enough
to try assembly for JNI bindings, at least not today.</p>
<h4>Conclusion</h4>
<p><strong><a href="http://www.freepascal.org">Free Pascal</a></strong>. IMHO, Free Pascal is a good
choice if you need simple JNI bindings.
As a bonus, Free Pascal on Linux has no dependency
on the platform's dynamic libraries, so you can
build binaries that work across Linux distros.</p>
<p><strong><a href="https://www.rust-lang.org">Rust</a></strong>. I believe Rust has great
potential. Rust has a unique memory safety model,
yet it lets you get as close to bare metal as C does.
Among other features, Rust has really promising
<a href="https://blog.rust-lang.org/2016/05/13/rustup.html">cross compiling capabilities</a>,
which gives it a very strong position in the embedded / IoT space. </p>
<p>Yet, Rust needs to become more stable.
The <a href="https://doc.rust-lang.org/book/using-rust-without-the-standard-library.html">no_std feature</a> is not available in
the latest (1.10) stable release. <code>cdylib</code> is not supported
by the latest stable cargo tool.
The Rust tool chain on Windows depends
either on MS Visual Studio or <a href="http://www.mingw.org/wiki/msys">MSys</a>.
The resulting binaries are slightly incompatible
with each other (Oracle JVM is built with
Visual Studio, so using <a href="http://www.mingw.org/wiki/msys">MSys</a>-built JNI bindings
leads to process crashes in certain cases).</p>
Alexey Ragozinhttp://www.blogger.com/profile/13720493857045012756noreply@blogger.com3tag:blogger.com,1999:blog-7735872642513631302.post-90095662637309808242016-03-16T21:49:00.000+00:002016-09-28T23:40:48.844+01:00Finalizers and References in Java<p>Automatic memory management (garbage collection) is one of the essential aspects of the Java platform. Garbage collection relieves developers from the pain of memory management and protects them from a whole range of memory-related issues. However, working with external resources (e.g. files and sockets) from Java becomes tricky, because the garbage collector alone is not enough to manage such resources.</p>
<p>Originally, Java had a <a href="https://en.wikipedia.org/wiki/Finalizer">finalizer</a> facility. Later, special <a href="https://docs.oracle.com/javase/7/docs/api/index.html?java/lang/ref/Reference.html">reference classes</a> were added to deal with the same problem.</p>
<p>If we have some external resource which should be deallocated explicitly (a common case with native libraries), this task could be solved using either a finalizer or a <a href="https://docs.oracle.com/javase/7/docs/api/index.html?java/lang/ref/PhantomReference.html">phantom reference</a>. What is the difference?</p>
<h3>Finalizer approach</h3>
<p>The code below implements resource housekeeping using a Java finalizer.</p>
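<pre class="brush: java" style="overflow:scroll"><code>import java.util.concurrent.atomic.AtomicLong;
// Common interface for resource handles (minimal version, assumed)
public interface ResourceFacade {
void dispose();
}
public class Resource implements ResourceFacade {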
public static AtomicLong GLOBAL_ALLOCATED = new AtomicLong();
public static AtomicLong GLOBAL_RELEASED = new AtomicLong();
int[] data = new int[1 << 10];
protected boolean disposed;
public Resource() {
GLOBAL_ALLOCATED.incrementAndGet();
}
public synchronized void dispose() {
if (!disposed) {
disposed = true;
releaseResources();
}
}
protected void releaseResources() {
GLOBAL_RELEASED.incrementAndGet();
}
}
public class FinalizerHandle extends Resource {
@Override
protected void finalize() { // Object.finalize() is deprecated since Java 9
dispose();
}
}
public class FinalizedResourceFactory {
public static ResourceFacade newResource() {
return new FinalizerHandle();
}
}
</code></pre>
<h3>Phantom reference approach</h3>
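<pre class="brush: java" style="overflow:scroll"><code>import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
public class PhantomHandle implements ResourceFacade {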
private final Resource resource;
public PhantomHandle(Resource resource) {
this.resource = resource;
}
public void dispose() {
resource.dispose();
}
Resource getResource() {
return resource;
}
}
public class PhantomResourceRef extends PhantomReference&lt;PhantomHandle&gt; {
private Resource resource;
public PhantomResourceRef(PhantomHandle referent, ReferenceQueue&lt;? super PhantomHandle&gt; q) {
super(referent, q);
this.resource = referent.getResource();
}
public void dispose() {
Resource r = resource;
if (r != null) {
r.dispose();
}
}
}
public class PhantomResourceFactory {
private static Set&lt;Resource&gt; GLOBAL_RESOURCES = Collections.synchronizedSet(new HashSet&lt;Resource&gt;());
private static ResourceDisposalQueue REF_QUEUE = new ResourceDisposalQueue();
private static ResourceDisposalThread REF_THREAD = new ResourceDisposalThread(REF_QUEUE);
public static ResourceFacade newResource() {
ReferedResource resource = new ReferedResource();
GLOBAL_RESOURCES.add(resource);
PhantomHandle handle = new PhantomHandle(resource);
PhantomResourceRef ref = new PhantomResourceRef(handle, REF_QUEUE);
resource.setPhantomReference(ref);
return handle;
}
private static class ReferedResource extends Resource {
@SuppressWarnings("unused")
private PhantomResourceRef handle;
void setPhantomReference(PhantomResourceRef ref) {
this.handle = ref;
}
@Override
public synchronized void dispose() {
handle = null;
GLOBAL_RESOURCES.remove(this);
super.dispose();
}
}
private static class ResourceDisposalQueue extends ReferenceQueue&lt;PhantomHandle&gt; {
}
private static class ResourceDisposalThread extends Thread {
private ResourceDisposalQueue queue;
public ResourceDisposalThread(ResourceDisposalQueue queue) {
this.queue = queue;
setDaemon(true);
setName("ReferenceDisposalThread");
start();
}
@Override
public void run() {
while(true) {
try {
PhantomResourceRef ref = (PhantomResourceRef) queue.remove();
ref.dispose();
ref.clear();
} catch (InterruptedException e) {
// ignore
}
}
}
}
}
</code></pre>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4Nfpk6e5lTSJDpLN3PvsJITfPMLcPENMyEaJ-d3mqMGO65W8QuVKF0tGK_1_-3ev_KCdi1LNNSV8eBqC6U4HXXE3h3fa8ZJt-ormQUn1ppamegCs0gEsctjMGJbnMfjcRnxdmbh-dph7e/s1600/phantom_usage.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4Nfpk6e5lTSJDpLN3PvsJITfPMLcPENMyEaJ-d3mqMGO65W8QuVKF0tGK_1_-3ev_KCdi1LNNSV8eBqC6U4HXXE3h3fa8ZJt-ormQUn1ppamegCs0gEsctjMGJbnMfjcRnxdmbh-dph7e/s200/phantom_usage.png" /></a></div>
<p>Implementing the same task using a <a href="https://docs.oracle.com/javase/7/docs/api/index.html?java/lang/ref/PhantomReference.html">phantom reference</a> requires more boilerplate. We need a separate thread to handle the reference queue; in addition, we need to keep strong references to the allocated reference objects, because a reference object that is itself unreachable is never enqueued.</p>
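<p>Since Java 9, the standard library packages essentially this pattern as <code>java.lang.ref.Cleaner</code>: registered objects are tracked with phantom references, cleaning actions run on the cleaner's own daemon thread, and <code>Cleanable.clean()</code> allows explicit disposal, after which the action will not run again. A minimal sketch; <code>NativeBuffer</code> and its <code>RELEASED</code> counter are hypothetical stand-ins for a real external resource.</p>
<pre class="brush: java" style="overflow:scroll"><code>import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical resource handle using java.lang.ref.Cleaner (Java 9+).
final class NativeBuffer implements AutoCloseable {
    static final AtomicLong RELEASED = new AtomicLong();
    private static final Cleaner CLEANER = Cleaner.create();

    // The action must not capture the NativeBuffer itself, or the buffer would never
    // become phantom reachable; a static nested class makes that explicit.
    private static final class Release implements Runnable {
        @Override
        public void run() {
            RELEASED.incrementAndGet(); // free the external resource here
        }
    }

    private final Cleaner.Cleanable cleanable = CLEANER.register(this, new Release());

    /** Explicit disposal; the action runs at most once, whether by close() or by GC. */
    @Override
    public void close() {
        cleanable.clean();
    }
}
</code></pre>
<p>Keeping the <code>Cleanable</code> in a field and calling <code>clean()</code> from <code>close()</code> gives the same hybrid (manual plus automatic) behaviour that is discussed later in this article.</p>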
<h3>How finalizers work in Java</h3>
<p>Under the hood, finalizers work very similarly to our <a href="https://docs.oracle.com/javase/7/docs/api/index.html?java/lang/ref/PhantomReference.html">phantom reference</a> implementation, though the JVM hides the boilerplate from us. </p>
<p>Each time an instance of a class with a finalizer is created, the JVM creates an instance of the <a href="http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b27/java/lang/ref/FinalReference.java/">FinalReference</a> class to track it. Once the object becomes unreachable, its <a href="http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b27/java/lang/ref/FinalReference.java/">FinalReference</a> is triggered and added to the global final reference queue, which is processed by the system finalizer thread.</p>
<p>So the finalizer and phantom reference approaches work very similarly. Why should you bother with phantom references?</p>
<h3>Comparing GC impact</h3>
<p>Let's run a simple test: a resource object is allocated and added to a queue; once the queue size hits a limit, the oldest reference is evicted and thrown away. For this test, we will monitor reference processing via GC logs.</p>
<p><strong>Running finalizer based implementation.</strong></p>
<pre><code>[GC [ParNew[ ... [FinalReference, 5718 refs, 0.0063374 secs] ...
Released: 6937 In use: 59498
</code></pre>
<p><strong>Running phantom based implementation.</strong></p>
<pre><code>[GC [ParNew[ ... [PhantomReference, 5532 refs, 0.0037622 secs] ...
Released: 5468 In use: 38897
</code></pre>
<p>As you can see, once an object becomes unreachable, it needs to be handled in the GC reference processing phase. Reference processing is a part of the Stop-the-World pause. If too many references become eligible for processing between collections, it may prolong the Stop-the-World pause significantly.</p>
<p>In the case above, there is not much difference between finalizers and phantom references. But let's change the workflow a little: now we explicitly dispose 99% of handles and rely on GC for only 1% of references (i.e. semiautomatic resource management).</p>
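<p>The workload can be sketched roughly as follows. This is a hypothetical reconstruction for illustration, not the code from the example repository: handles go through a bounded FIFO queue, and each evicted handle is either disposed explicitly (with probability <code>explicitDisposeRatio</code>) or simply dropped for the GC to discover. The first experiment corresponds to a ratio of 0, the one described above to 0.99.</p>
<pre class="brush: java" style="overflow:scroll"><code>import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Random;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch of the eviction workload used to compare the two approaches.
final class EvictionWorkload {
    private EvictionWorkload() {
    }

    /** Returns the number of handles disposed explicitly; the rest are left to GC. */
    static &lt;T&gt; long run(Supplier&lt;T&gt; factory, Consumer&lt;T&gt; dispose,
                        int queueLimit, int iterations, double explicitDisposeRatio, long seed) {
        Deque&lt;T&gt; queue = new ArrayDeque&lt;&gt;();
        Random rnd = new Random(seed);
        long disposed = 0;
        for (int i = 0; i &lt; iterations; i++) {
            queue.addLast(factory.get());
            if (queue.size() &gt; queueLimit) {
                T oldest = queue.removeFirst();
                if (rnd.nextDouble() &lt; explicitDisposeRatio) {
                    dispose.accept(oldest); // released by hand
                    disposed++;
                }
                // otherwise the handle simply becomes unreachable
            }
        }
        return disposed;
    }
}
</code></pre>
<p>With the classes above, the factory would be <code>FinalizedResourceFactory::newResource</code> or <code>PhantomResourceFactory::newResource</code>, and the dispose action <code>ResourceFacade::dispose</code>.</p>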
<p><strong>Running finalizer based implementation.</strong></p>
<pre><code>[GC [ParNew[ ... [FinalReference, 6295 refs, 0.0070033 secs] ...
Released: 6707 In use: 1457
</code></pre>
<p><strong>Running phantom based implementation.</strong></p>
<pre><code>[GC [ParNew[ ... [PhantomReference, 625 refs, 0.0001551 secs] ...
Released: 21682 In use: 1217
</code></pre>
<p>For the finalizer-based implementation, there is no difference: explicit resource disposal doesn't help reduce GC overhead. But with phantom references, we can see that GC does not need to handle explicitly disposed references (the number of references processed by GC drops by roughly an order of magnitude, from 5532 to 625).</p>
<p>Why is this happening? When a resource handle is disposed, we drop our reference to the phantom reference object. Once the phantom reference itself is unreachable, it will never be queued for processing by GC, which saves time in the reference processing phase. It is quite the opposite with final references: once created, a final reference is strongly referenced by the JVM until it has been processed by the finalizer thread.</p>
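<p>To make this mechanism concrete, here is a minimal, single-threaded sketch of hybrid resource management with phantom references. It is not the code from the article's repository: <code>ResourceTracker</code>, <code>Handle</code>, <code>open()</code>, <code>drainQueue()</code> and <code>release()</code> are hypothetical names, and <code>release()</code> is only a placeholder for whatever actually frees the resource. The key point is that <code>dispose()</code> removes the phantom reference from the strongly held set, so the reference object itself becomes garbage and GC reference processing never sees it.</p>

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/**
 * Hypothetical sketch of hybrid (explicit + GC driven) resource housekeeping.
 * Not thread-safe: a real implementation would need synchronization.
 */
class ResourceTracker {

    /** Phantom reference remembering which (hypothetical) resource to free. */
    private static final class Cleanup extends PhantomReference<Object> {
        final long resourceId;

        Cleanup(Object handle, ReferenceQueue<Object> queue, long resourceId) {
            super(handle, queue);
            this.resourceId = resourceId;
        }
    }

    /** Handle given to user code; it points to its Cleanup, never the other way round. */
    final class Handle {
        private Cleanup cleanup;

        /** Explicit disposal: free the resource now and forget the phantom reference. */
        void dispose() {
            if (cleanup == null) {
                return; // already disposed
            }
            live.remove(cleanup);   // no longer held by the tracker...
            cleanup.clear();        // ...and cleared, so GC has nothing to enqueue
            release(cleanup.resourceId);
            releasedExplicitly++;
            cleanup = null;         // the Cleanup object is now plain garbage
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Phantom references must stay strongly reachable until processed,
    // otherwise they would be collected without ever being enqueued.
    private final Set<Cleanup> live =
            Collections.newSetFromMap(new IdentityHashMap<Cleanup, Boolean>());
    int releasedExplicitly;
    int releasedByGc;

    Handle open(long resourceId) {
        Handle handle = new Handle();
        handle.cleanup = new Cleanup(handle, queue, resourceId);
        live.add(handle.cleanup);
        return handle;
    }

    /** Frees resources whose handles were dropped without dispose(). */
    int drainQueue() {
        int released = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            Cleanup c = (Cleanup) ref;
            if (live.remove(c)) {
                release(c.resourceId);
                releasedByGc++;
                released++;
            }
        }
        return released;
    }

    int liveCount() {
        return live.size();
    }

    private void release(long resourceId) {
        // Placeholder: close the underlying native resource here.
    }
}
```

<p>Handles that are simply dropped are still reclaimed, but only after GC discovers their phantom references and some thread calls <code>drainQueue()</code>. Explicitly disposed handles never reach that path at all.</p>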
<h3>Conclusion</h3>
<p>Using phantom references for resource housekeeping requires more work compared to the plain finalizer approach.
But with phantom references you have far more granular control over the whole process and can implement a number of optimizations, such as hybrid (manual + automatic) resource management.</p>
<p>Full source code used for this article is available at <a href="https://github.com/aragozin/example-finalization">https://github.com/aragozin/example-finalization</a>. </p>
<link href='http://alexgorbatchev.com/pub/sh/3.0.83/styles/shCore.css' rel='stylesheet' type='text/css' />
<link href='http://alexgorbatchev.com/pub/sh/3.0.83/styles/shThemeDefault.css' rel='stylesheet' type='text/css' />
<script src='http://alexgorbatchev.com/pub/sh/3.0.83/scripts/XRegExp.js' type='text/javascript'></script>
<script src='http://alexgorbatchev.com/pub/sh/3.0.83/scripts/shCore.js' type='text/javascript'></script>
<script src='http://alexgorbatchev.com/pub/sh/3.0.83/scripts/shAutoloader.js' type='text/javascript'></script>
<script src='http://alexgorbatchev.com/pub/sh/3.0.83/scripts/shBrushXml.js' type='text/javascript'></script>
<script src='http://alexgorbatchev.com/pub/sh/3.0.83/scripts/shBrushJava.js' type='text/javascript'></script>
<script type="text/javascript">SyntaxHighlighter.all()</script>
Alexey Ragozinhttp://www.blogger.com/profile/13720493857045012756noreply@blogger.com1tag:blogger.com,1999:blog-7735872642513631302.post-50321418769851994332016-01-24T18:40:00.000+00:002018-05-26T10:05:49.213+01:00Flame Graphs Vs. Cold Numbers<div class="separator" style="clear: both; text-align: center;"><a href="http://gridkit.github.io/other/jboss-flame-graph.svg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgJt-yY50T09tPENKKkhzEQe-3cNAkNiExm_-3mPfOkCEyGc2dhfBnzN62p13Q3i_wRmPUzMrcFKKobhbH8QkuUr1bDlKSm9fqpkXOlpTDj9qrM0t_uSt9j1I7reoVZUey_G8xww4C_ip9N/s400/jboss.png" /></a></div>
<p>Stack trace sampling is a very powerful technique
for performance troubleshooting.
The advantages of stack trace sampling are:</p>
<ul>
<li>it doesn't require upfront configuration</li>
<li>cost added by sampling is small and controllable</li>
<li>it is easy to compare analysis
results from different experiments</li>
</ul>
<p>Unfortunately, the stack trace analysis tools offered
by widespread Java profilers are very limited.</p>
<p>Solving performance problems in complex applications
(with a lot of business logic, etc.) is one
of my regular challenges.
Let's assume I have another misbehaving
application on my hands.
The first step would be to localize the bottleneck
to a specific part of the stack.</p>
<h3>Meet call tree</h3>
<p>A call tree is built by digesting a large number of stack traces. Each node in the tree has a frequency: the number
of traces passing through this node.</p>
<p>Tools usually let you navigate through the
call tree reconstructed from the stack trace population.</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpeRQpdPUYg9uMrxvgAjnyJLJxCKB8YnDI5YOsVmDn6i4HBfryGIZBZa8QMu2DSrLyqyeSDp9N47ckvIpJaFmvNPhCZM3-LLhinQOd_IP3xJazdVohNWTECpQFZbWkDIXDpgGHjKqRdOkl/s1600/stree.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjpeRQpdPUYg9uMrxvgAjnyJLJxCKB8YnDI5YOsVmDn6i4HBfryGIZBZa8QMu2DSrLyqyeSDp9N47ckvIpJaFmvNPhCZM3-LLhinQOd_IP3xJazdVohNWTECpQFZbWkDIXDpgGHjKqRdOkl/s400/stree.png" /></a></div>
<p>There is also the <a href="https://github.com/brendangregg/FlameGraph">flame graph</a> visualization
(shown at the top right of the page), which
is fancier but is just the same tree.</p>
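<p>As a rough illustration of what building a call tree means, the sketch below merges stack traces (each given root-first as a list of frame names) into a tree in which every node counts the traces passing through it. <code>CallTree</code> is a hypothetical class for this example, not an API of any profiler mentioned here.</p>

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical call tree: each node counts the sampled traces passing through it. */
class CallTree {
    final String frame;
    long count;
    final Map<String, CallTree> children = new LinkedHashMap<>();

    CallTree(String frame) {
        this.frame = frame;
    }

    /** Merges one trace into the tree; frames are ordered from thread root to stack tip. */
    void add(List<String> trace) {
        count++;
        CallTree node = this;
        for (String f : trace) {
            node = node.children.computeIfAbsent(f, CallTree::new);
            node.count++;
        }
    }

    /** Renders the tree indented by depth, with each node's share of all traces. */
    String render() {
        StringBuilder out = new StringBuilder();
        render(count, "", out);
        return out.toString();
    }

    private void render(long total, String indent, StringBuilder out) {
        for (CallTree child : children.values()) {
            out.append(String.format(Locale.ROOT, "%s%s %d (%.1f%%)%n",
                    indent, child.frame, child.count, 100.0 * child.count / total));
            child.render(total, indent + "  ", out);
        }
    }
}
```

<p>A flame graph draws exactly these counts, using box width instead of a number, which is why both views blur in the same way once business logic fans out into many branches.</p>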
<p>Looking at these visualizations, what can I see? Not much.</p>
<p>Why? Business logic somewhere in the middle of the call tree
produces too many branches.
The tree beneath the business logic
is blurred beyond the point of usability.</p>
<h3>Dissecting call tree</h3>
<p>Applications are built using frameworks.
For the sake of this article, I'm using an example
based on <a href="http://www.jboss.org/">JBoss</a>, <a href="https://en.wikipedia.org/wiki/JavaServer_Faces">JSF</a>, <a href="http://seamframework.org/">Seam</a>, <a href="http://hibernate.org/">Hibernate</a>. </p>
<p>Now, if 13% of traces in our dump contain
JDBC calls, we can conclude that 13% of the time
is spent in JDBC / database calls. <br />
13% is a reasonable number, so the database is not to blame here.</p>
<p>Let's go down the stack: <a href="http://hibernate.org/">Hibernate</a> is the next layer.
Now we need to count all traces
containing <a href="http://hibernate.org/">Hibernate</a> classes, excluding
the ones containing JDBC.
This way we can attribute traces to a
particular framework and quickly get a picture of
where time is spent at runtime.</p>
<p>I couldn't find any tool that could do this kind
of analysis for me, so I built one myself
a few years ago. <a href="https://github.com/aragozin/jvm-tools">SJK</a> is my universal Java
troubleshooting toolkit.</p>
<p>Below is the command performing the analysis explained above.</p>
<pre style="overflow:scroll"><code>sjk ssa -f tracedump.std --categorize -tf **.CoyoteAdapter.service -nc
JDBC=**.jdbc
Hibernate=org.hibernate
"Facelets compile=com.sun.faces.facelets.compiler.Compiler.compile"
"Seam bijection=org.jboss.seam.**.aroundInvoke/!**.proceed"
JSF.execute=com.sun.faces.lifecycle.LifecycleImpl.execute
JSF.render=com.sun.faces.lifecycle.LifecycleImpl.render
Other=**
</code></pre>
<p>Below is the output of this command.</p>
<pre style="overflow:scroll"><code>Total samples 2732050 100.00%
JDBC 405439 14.84%
Hibernate 802932 29.39%
Facelets compile 395784 14.49%
Seam bijection 385491 14.11%
JSF.execute 290355 10.63%
JSF.render 297868 10.90%
Other 154181 5.64%
</code></pre>
<p>Well, we clearly see a large amount of time spent
in <a href="http://hibernate.org/">Hibernate</a>.
This is very wrong, so it is the first candidate
for investigation.
We also see that a lot of CPU is spent
on Facelets compilation, though pages should
be compiled just once and cached
(it turned out to be a configuration issue).
Actual application logic falls into the JSF lifecycle calls (<code>execute()</code>, <code>render()</code>).
It would be possible to introduce an additional
category to isolate pure application logic
execution time, but looking at the numbers,
I would say it is not necessary until
the other problems are solved.</p>
<p><a href="http://hibernate.org/">Hibernate</a> is our primary suspect;
how do we look inside?
Let's look at the method histogram for traces
attributed to <a href="http://hibernate.org/">Hibernate</a>, trimming away
all frames up to the first <a href="http://hibernate.org/">Hibernate</a> method call.</p>
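<p>A hypothetical sketch of such a trimmed method histogram follows. For every trace containing no excluded frame, it drops all frames before the first frame matching the layer of interest, then counts, per frame, how many traces contain it and how many traces end in it. <code>FrameHistogram</code> is an illustrative name; it approximates the idea behind SJK's <code>--histo</code> output and does not reproduce its exact column semantics.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical trimmed method histogram over root-first traces.
 * Per frame: {number of traces containing it, number of traces ending in it}.
 */
class FrameHistogram {
    private final String exclude;
    private final String trimTo;
    final Map<String, long[]> stats = new HashMap<>();
    long traces;

    FrameHistogram(String exclude, String trimTo) {
        this.exclude = exclude;
        this.trimTo = trimTo;
    }

    void add(List<String> trace) {
        if (trace.stream().anyMatch(f -> f.contains(exclude))) {
            return; // e.g. skip traces that reach JDBC
        }
        int start = -1;
        for (int i = 0; i < trace.size(); i++) {
            if (trace.get(i).contains(trimTo)) {
                start = i;
                break;
            }
        }
        if (start < 0) {
            return; // trace never enters the layer of interest
        }
        traces++;
        List<String> tail = trace.subList(start, trace.size());
        for (String f : new HashSet<>(tail)) { // count recursive frames once per trace
            stats.computeIfAbsent(f, k -> new long[2])[0]++;
        }
        stats.get(tail.get(tail.size() - 1))[1]++;
    }

    /** Frames ordered by the number of traces containing them, most frequent first. */
    List<String> top() {
        List<String> frames = new ArrayList<>(stats.keySet());
        frames.sort((a, b) -> Long.compare(stats.get(b)[0], stats.get(a)[0]));
        return frames;
    }
}
```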
<p>Below is the command to do this.</p>
<pre style="overflow:scroll"><code>sjk ssa -f tracedump.std --histo -tf **!**.jdbc -tt org.hibernate
</code></pre>
<p>Here is the top of the histogram produced by the command:</p>
<pre style="overflow:scroll"><code>Trc (%) Frm N Term (%) Frame
699506 87% 699506 0 0% org.hibernate.internal.SessionImpl.autoFlushIfRequired(SessionImpl.java:1204)
689370 85% 689370 10 0% org.hibernate.internal.QueryImpl.list(QueryImpl.java:101)
676524 84% 676524 0 0% org.hibernate.event.internal.DefaultAutoFlushEventListener.onAutoFlush(DefaultAutoFlushEventListener.java:58)
675136 84% 675136 0 0% org.hibernate.internal.SessionImpl.list(SessionImpl.java:1261)
573836 71% 573836 4 0% org.hibernate.ejb.QueryImpl.getResultList(QueryImpl.java:264)
550968 68% 550968 1 0% org.hibernate.event.internal.AbstractFlushingEventListener.flushEverythingToExecutions(AbstractFlushingEventListener.java:99)
533892 66% 533892 132 0% org.hibernate.event.internal.AbstractFlushingEventListener.flushEntities(AbstractFlushingEventListener.java:227)
381514 47% 381514 882 0% org.hibernate.event.internal.AbstractVisitor.processEntityPropertyValues(AbstractVisitor.java:76)
271018 33% 271018 0 0% org.hibernate.event.internal.DefaultFlushEntityEventListener.onFlushEntity(DefaultFlushEntityEventListener.java:161)
</code></pre>
<p>Here is our suspect. We spend 87% of <a href="http://hibernate.org/">Hibernate</a>
time in the <code>autoFlushIfRequired()</code> call
(and JDBC time is already excluded).</p>
<p>Using a few commands, we have narrowed down one performance bottleneck. Fixing it is another topic, though.</p>
<p>In the case I'm using as an example,
the application's CPU usage was reduced tenfold.
A few of the problems found and addressed during that case were:</p>
<ul>
<li>optimization of <a href="http://hibernate.org/">Hibernate</a> usage</li>
<li>Facelets compilation caching was properly configured</li>
<li>a workaround for a performance bug in the
Seam framework was implemented</li>
<li>JSF layouts were optimized to reduce
the number of Seam injections / outjections</li>
</ul>
<h3>Limitations of this approach</h3>
<p>When statistically analyzing stack traces, you are dealing with wall-clock time; you cannot infer real CPU time using this method. If the CPU on the host is saturated, your numbers will be skewed by thread idle time caused by CPU starvation.</p>
<p>Normally, you can get a stack trace only at JVM <a href="/2012/10/safepoints-in-hotspot-jvm.html">safepoints</a>.
So if some methods are inlined by the JIT compiler,
they may never appear at the tip of a trace even
if they are really busy.
In other words, the tip of the stack trace may be skewed
by JIT effects.
In practice, it was never an obstacle for me,
but you should keep in mind the possibility
of such an effect.</p>
<h3>What about flame graphs?</h3>
<p>Well, despite not being so useful,
they look good in presentations.
Support for flame graphs was added to <a href="https://github.com/aragozin/jvm-tools">SJK</a> recently.</p>
<div style="background-color: #FFC; border: solid; border-width: 2px; border-color: #822; margin: 0; padding: 0.5em">
<h3>Update</h3>
<p>After some time, I've found myself using flame graphs very actively. Yes, in certain situations this type of visualization doesn't make sense, but as a first bird's-eye look at the problem, flame graphs are indispensable.</p>
</div>Alexey Ragozinhttp://www.blogger.com/profile/13720493857045012756noreply@blogger.com0tag:blogger.com,1999:blog-7735872642513631302.post-91450144251295365432015-10-12T20:53:00.000+01:002015-10-12T20:53:42.797+01:00Does Linux hate Java?<div class="separator" style="clear: both; text-align: center;"><a imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiuhGUtaLbQijAiHrM5hzIfN_kbdkG0SQXbeSwJTB-ZtEEeJ4JUOD0ddJk_XFcXepuFqip7yQtby49m-aam9lvBSv11lV82-Fevn6IM79sXuAl7enerRBF5uAgzh4a6sL1xp_5HqCdXST7u/s1600/angry_linux.png" /></a></div>
<p>
Recently, I discovered a fancy bug affecting a few versions of the Linux kernel. Without any warning, the JVM just hangs in a GC pause forever. The root cause is improper memory access in kernel code. <a href="https://groups.google.com/forum/#!topic/mechanical-sympathy/QbmpZxp6C64">This post by Gil Tene</a> gives a good technical explanation with deep emotional coloring.
</p>
<p>
While this bug is not JVM specific, there are few other multithreaded processes you would find on a typical Linux box.
</p>
<p>
This recent bug reminded me of a few other cases where Linux screws Java badly.
</p>
<h4>Transparent huge pages</h4>
<p>
The <a href="https://www.kernel.org/doc/Documentation/vm/transhuge.txt">Transparent huge pages</a> feature was introduced in version 2.6.38 of the kernel. While it was intended to improve performance, a lot of people report negative effects related to this feature, especially for memory-intensive processes such as the JVM and some database engines.
</p>
<ul>
<li><a href="https://blogs.oracle.com/linux/entry/performance_issues_with_transparent_huge">Oracle - Performance Issues with Transparent Huge Pages</a></li>
<li><a href="http://structureddata.org/2012/06/18/linux-6-transparent-huge-pages-and-hadoop-workloads/">Transparent Huge Pages and Hadoop workloads</a></li>
<li><a href="https://dzone.com/articles/why-tokudb-hates-transparent">Why TokuDB Hates Transparent Huge Pages</a></li>
</ul>
<h4>Leap seconds bug</h4>
<p>
The famous <a href="http://www.datastax.com/dev/blog/linux-cassandra-and-saturdays-leap-second-problem">leap second bug</a> in Linux caused a whole plague across data centers in 2012. Java and MySQL were affected most badly. What do Java and MySQL have in common? Both use threads extensively.
</p>
<p>
So, Linux, could you be a little more gentle with Java, please ;)
</p>
Alexey Ragozinhttp://www.blogger.com/profile/13720493857045012756noreply@blogger.comtag:blogger.com,1999:blog-7735872642513631302.post-26985405012418682852015-08-04T05:52:00.000+01:002015-08-04T05:52:31.904+01:00SJK - missing link in Java profiling tool chain<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi09aFa2gPI0fABc2o8EqFXQyr2yYbN19eoV92iyEEirNtW8N5KffJ0-TEnyDa7z53dNfS3MVus4YlCtDPGICALSdgv13r4yu6a2F3IFbXL4JQsUGFAGjgGKjufv_fI_vuBmbDsFm4QXo9N/s1600/link.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi09aFa2gPI0fABc2o8EqFXQyr2yYbN19eoV92iyEEirNtW8N5KffJ0-TEnyDa7z53dNfS3MVus4YlCtDPGICALSdgv13r4yu6a2F3IFbXL4JQsUGFAGjgGKjufv_fI_vuBmbDsFm4QXo9N/s1600/link.png" /></a></div>
<p>Sometimes it just happens. You have a bloated Java application on your hands and it does not perform well. You may have built this application yourself or just inherited it as it is now. It doesn't matter; the thing is, you do not have the slightest idea what is wrong here.</p>
<p>The Java ecosystem has an abundance of diagnostic tools (thanks to the interfaces exposed by the JVM itself), but they are mostly focused on specific, narrow kinds of problems. Despite calling themselves intuitive, they assume you have a lot of background knowledge about the JVM and profiling techniques. Honestly, even a seasoned Java developer (I'm speaking for myself here) can feel lost the first time looking at <a href="https://www.ej-technologies.com/products/jprofiler/overview.html">JProfiler</a>, <a href="https://www.yourkit.com/">YourKit</a> or <a href="http://www.oracle.com/technetwork/java/javaseproducts/mission-control/java-mission-control-1998576.html">Mission Control</a>.</p>
<p>If you have a performance problem on your hands, the first thing you need to do is classify the problem: is it in Java, in the database or somewhere else? Is it a CPU or a memory kind of problem? Once you know what kind of problem you have, you can choose the next diagnostic approach consciously.</p>
<h4>Are we CPU bound?</h4>
<p>One of the first things you would naturally do is check the CPU usage of your process. The OS can show you process CPU usage, which is useful, but the next question is which threads are consuming it. The OS can show you per-thread usage too; you can even get OS IDs for your Java threads using <a href="http://docs.oracle.com/javase/7/docs/technotes/tools/share/jstack.html">jstack</a> and correlate them ... manually (sick).</p>
<p>A simple tool showing CPU usage per Java thread is something I had badly wanted for years.</p>
<p>Surprisingly, all the information is already in the JMX Threading MBean. All that is left is to do some trivial math and report per-thread CPU usage. So I just did it, and the <a href="https://github.com/aragozin/jvm-tools/blob/master/sjk-core/COMMANDS.md#ttop-command">ttop</a> command became the first in the <a href="https://github.com/aragozin/jvm-tools">SJK</a> tool set.</p>
<p>Besides CPU usage, JMX has another invaluable metric: a per-thread allocation counter.</p>
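<p>The "trivial math" is easy to sketch. The example below samples the local JVM's Threading MBean twice and turns the deltas into per-thread CPU percentages and allocation rates, reporting system time as total CPU time minus user time. It assumes a HotSpot-based JVM, where <code>com.sun.management.ThreadMXBean</code> provides <code>getThreadAllocatedBytes()</code>. <code>ThreadTop</code> is a hypothetical name, this is not SJK's code, and a real tool would query a remote MBean server instead of the local one.</p>

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical ttop-like sketch over the local JVM's Threading MBean. */
class ThreadTop {

    /** Per live thread: {cpu ns, user ns, allocated bytes}. */
    static Map<Long, long[]> sample(com.sun.management.ThreadMXBean mx) {
        Map<Long, long[]> result = new HashMap<>();
        for (long id : mx.getAllThreadIds()) {
            long cpu = mx.getThreadCpuTime(id);          // -1 if unsupported or thread is dead
            long user = mx.getThreadUserTime(id);
            long alloc = mx.getThreadAllocatedBytes(id); // -1 if allocation counting is off
            if (cpu >= 0 && user >= 0 && alloc >= 0) {
                result.put(id, new long[] {cpu, user, alloc});
            }
        }
        return result;
    }

    static String report(long intervalMs) throws InterruptedException {
        com.sun.management.ThreadMXBean mx =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long t0 = System.nanoTime();
        Map<Long, long[]> before = sample(mx);
        Thread.sleep(intervalMs);
        Map<Long, long[]> after = sample(mx);
        double wallNs = System.nanoTime() - t0;

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Long, long[]> e : after.entrySet()) {
            long[] b = before.get(e.getKey());
            if (b == null) {
                continue; // thread started during the interval
            }
            long[] a = e.getValue();
            ThreadInfo info = mx.getThreadInfo(e.getKey());
            String name = info == null ? "?" : info.getThreadName();
            double user = 100.0 * (a[1] - b[1]) / wallNs;
            double sys = 100.0 * ((a[0] - b[0]) - (a[1] - b[1])) / wallNs;
            double allocPerSec = (a[2] - b[2]) * 1e9 / wallNs;
            sb.append(String.format(Locale.ROOT, "[%06d] user=%6.2f%% sys=%6.2f%% alloc=%10.0fb/s - %s%n",
                    e.getKey(), user, sys, allocPerSec, name));
        }
        return sb.toString();
    }
}
```

<p>Calling <code>System.out.print(ThreadTop.report(1000))</code> prints one line per thread for a one-second interval.</p>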
<p>Collecting information from JMX is safe and can be done on a live application instance (if you do not have a JMX port open, SJK can connect using the process ID).</p>
<p>Below is an example of <a href="https://github.com/aragozin/jvm-tools/blob/master/sjk-core/COMMANDS.md#ttop-command">ttop</a> command output.</p>
<pre style="overflow:scroll"><code>2014-10-01T19:27:22.825+0400 Process summary
process cpu=101.80%
application cpu=100.50% (user=86.21% sys=14.29%)
other: cpu=1.30%
GC cpu=0.00% (young=0.00%, old=0.00%)
heap allocation rate 123mb/s
safe point rate: 1.5 (events/s) avg. safe point pause: 0.14ms
safe point sync time: 0.00% processing time: 0.02% (wallclock time)
[000037] user=83.66% sys=14.02% alloc= 121mb/s - Proxy:ExtendTcpProxyService1:TcpAcceptor:TcpProcessor
[000075] user= 0.97% sys= 0.08% alloc= 411kb/s - RMI TCP Connection(35)-10.139.200.51
[000029] user= 0.61% sys=-0.00% alloc= 697kb/s - Invocation:Management
[000073] user= 0.49% sys=-0.01% alloc= 343kb/s - RMI TCP Connection(33)-10.128.46.114
[000023] user= 0.24% sys=-0.01% alloc= 10kb/s - PacketPublisher
[000022] user= 0.00% sys= 0.10% alloc= 11kb/s - PacketReceiver
[000072] user= 0.00% sys= 0.07% alloc= 22kb/s - RMI TCP Connection(31)-10.139.207.76
[000056] user= 0.00% sys= 0.05% alloc= 20kb/s - RMI TCP Connection(25)-10.139.207.76
[000026] user= 0.12% sys=-0.07% alloc= 2217b/s - Cluster|Member(Id=18, Timestamp=2014-10-01 15:58:3 ...
[000076] user= 0.00% sys= 0.04% alloc= 6657b/s - JMX server connection timeout 76
[000021] user= 0.00% sys= 0.03% alloc= 526b/s - PacketListener1P
[000034] user= 0.00% sys= 0.02% alloc= 1537b/s - Proxy:ExtendTcpProxyService1
[000049] user= 0.00% sys= 0.02% alloc= 6011b/s - JMX server connection timeout 49
[000032] user= 0.00% sys= 0.01% alloc= 0b/s - DistributedCache
</code></pre>
<p>Besides CPU and allocation, it also collects "true" GC usage and safepoint statistics. The latter two metrics are not available via JMX, so they are available only for process ID connections.</p>
<p>The CPU usage picture will give you good insight into what to do next: should you profile your Java hot spots, or is all the time spent waiting for results from the DB?</p>
<h4>Garbage analysis</h4>
<p>Another common class of Java problems is related to garbage collection. If this is the case, GC logs are the first place to look. </p>
<p>Do you have them enabled? If not, it is not a big deal: you can enable GC logging on a running JVM process using the <a href="http://docs.oracle.com/javase/7/docs/technotes/tools/share/jinfo.html">jinfo</a> command. You can also use <a href="https://github.com/aragozin/jvm-tools">SJK</a>'s <a href="https://github.com/aragozin/jvm-tools/blob/master/sjk-core/COMMANDS.md#gc-command">gc</a> command to peek at the GC activity of your Java process (it is not as complete as GC logs, though).</p>
<p>If GC logs confirm that GC is causing you problems, the next step is to identify where that garbage comes from.</p>
<p>Commercial profilers are good at memory profiling, but this kind of analysis slows down the target application dramatically. </p>
<p><a href="http://www.oracle.com/technetwork/java/javaseproducts/mission-control/java-mission-control-1998576.html">Mission Control</a> stands out from the pack here: it can profile by sampling TLAB allocation failures. This technique is cheap and generally produces good results, though it is inherently biased and may mislead you sometimes.</p>
<p>For a long time, <a href="http://docs.oracle.com/javase/7/docs/technotes/tools/share/jmap.html">jmap</a> and its class histogram were my main memory profiling instrument. A class histogram is simple and accurate.</p>
<p>In the <a href="https://github.com/aragozin/jvm-tools">SJK</a> toolset, I have augmented the vanilla <a href="http://docs.oracle.com/javase/7/docs/technotes/tools/share/jmap.html">jmap</a> command a little to make it more useful (SJK's <a href="https://github.com/aragozin/jvm-tools/blob/master/sjk-core/COMMANDS.md#hh-command">hh</a> command).</p>
<p>Beware that <a href="http://docs.oracle.com/javase/7/docs/technotes/tools/share/jmap.html">jmap</a> (and thus the <a href="https://github.com/aragozin/jvm-tools/blob/master/sjk-core/COMMANDS.md#hh-command">hh</a> command) requires a Stop-the-World pause on the target JVM while the heap is being walked, so it may not be a good idea to execute it against a live application under load.</p>
<p><strong>Dead heap histogram</strong> is calculated as the difference between the object population before and after a forced GC (using the <a href="http://docs.oracle.com/javase/7/docs/technotes/tools/share/jmap.html">jmap</a> class histogram command under the hood).</p>
<p><strong>Dead young heap histogram</strong> forces a full GC, then waits 10 seconds (by default), then produces a dead object histogram using the technique described above. Thus you see a summary of freshly allocated garbage.</p>
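<p>The histogram arithmetic itself is simple. Below is an illustrative sketch, assuming the two class histograms (per class: instance count and total bytes) have already been captured and parsed; one way to capture them by hand is <code>jmap -histo</code> followed by <code>jmap -histo:live</code>, since the latter forces a full GC before counting. <code>DeadHistogram</code> is a hypothetical name, and parsing of jmap output is not shown.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: per-class difference between two class histograms. */
class DeadHistogram {

    /**
     * Inputs map class name to {instances, bytes}. Returns what disappeared
     * between the snapshots; classes whose population did not shrink are skipped.
     */
    static Map<String, long[]> diff(Map<String, long[]> before, Map<String, long[]> after) {
        Map<String, long[]> dead = new HashMap<>();
        for (Map.Entry<String, long[]> e : before.entrySet()) {
            long[] b = e.getValue();
            long[] a = after.getOrDefault(e.getKey(), new long[] {0, 0});
            long instances = b[0] - a[0];
            long bytes = b[1] - a[1];
            if (instances > 0) {
                dead.put(e.getKey(), new long[] {instances, Math.max(bytes, 0)});
            }
        }
        return dead;
    }

    /** Class names ordered by reclaimed bytes, largest first. */
    static List<String> byBytes(Map<String, long[]> dead) {
        List<String> names = new ArrayList<>(dead.keySet());
        names.sort((x, y) -> Long.compare(dead.get(y)[1], dead.get(x)[1]));
        return names;
    }
}
```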
<p>These methods cannot tell you where in your code that garbage was allocated (that is a job for <a href="http://www.oracle.com/technetwork/java/javaseproducts/mission-control/java-mission-control-1998576.html">Mission Control</a> et al.). However, if you know what your top garbage objects are, you may already know where they are allocated.</p>
<p><a href="https://github.com/aragozin/jvm-tools">SJK</a> has a <a href="https://github.com/aragozin/jvm-tools/blob/master/sjk-core/COMMANDS.md">few more tools</a>, but these two, <a href="https://github.com/aragozin/jvm-tools/blob/master/sjk-core/COMMANDS.md#ttop-command">ttop</a> and <a href="https://github.com/aragozin/jvm-tools/blob/master/sjk-core/COMMANDS.md#hh-command">hh</a>, are always on the front lines when I need to tackle another performance related problem.</p>
Alexey Ragozinhttp://www.blogger.com/profile/13720493857045012756noreply@blogger.com1tag:blogger.com,1999:blog-7735872642513631302.post-15171388874620410042015-02-23T12:14:00.001+00:002021-09-16T05:47:46.586+01:00So, you have dumped 150 GiB of JVM heap, now what?<p>A 150 GiB JVM heap dump is lying on my hard drive,
and I need to analyze a specific problem detected in that process.</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrPurvmYb9NbioeV50qXfxg0vuptOxY4vdwnIbymQWUP0t5zVbGC-zaJ5ozd-x7hVLFHrlnwlwG9_QPWaWD_O5rMCo2hsyJQ5c55yv7tgZJIfwm401rrfLNdnX-HbCaWXlbkARxSkOGmBJ/s1600/heapdump.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhrPurvmYb9NbioeV50qXfxg0vuptOxY4vdwnIbymQWUP0t5zVbGC-zaJ5ozd-x7hVLFHrlnwlwG9_QPWaWD_O5rMCo2hsyJQ5c55yv7tgZJIfwm401rrfLNdnX-HbCaWXlbkARxSkOGmBJ/s200/heapdump.jpg" /></a></div>
<p>This is a dump of a proprietary hybrid of an in-memory RDBMS
and a CEP system that I'm responsible for.
All data are stored in the Java heap, so the heap size of some
installations is huge (a 400 GiB heap is the largest to date).</p>
<p>The problem of analyzing huge heap dumps had been on my radar
for some time, so I wasn't unprepared.</p>
<p>To be honest, I haven't tried to open this file
in Eclipse Memory Analyzer, but I doubt it could handle it.</p>
<p>For me, for some time, the most useful feature of heap analyzers
has been JavaScript based queries. Clicking through millions of
objects is not fun. It is much better to walk an object graph
with code, not with a mouse.</p>
<p>A heap dump is just a serialized graph of objects;
my goal is to extract specific information from this graph.
I do not really need a fancy UI; an API to the heap graph would be even better.</p>
<p>How can I analyze a heap dump programmatically?</p>
<p>I started my research with the NetBeans profiler (that was a year ago).
NetBeans is open source and has a visual heap dump analyzer
(the same component is also used in JVisualVM). It turns out
that the heap dump processing code is a separate module, and the API
it provides is suitable for custom analysis logic.</p>
<p>The NetBeans heap analyzer has a critical limitation, though.
It uses a temporary file to keep an internal index of the heap dump.
This file is typically around 25% of the size of the heap dump itself.
But most importantly, it takes time to build this file
before any query to the heap graph is possible.</p>
<p>After taking a better look, I decided I could remove this temporary file.
I forked the library (<a href="https://github.com/aragozin/heaplib/tree/master/hprof-heap">my fork is available at GitHub</a>).
Some functions were lost together with the temporary file
(e.g. backward reference traversal),
but they are not needed for my kind of tasks.</p>
<p>Another important change to the original library
was implementing HeapPath.
<br/>
<a href="https://github.com/aragozin/heaplib/blob/master/hprof-heap/HEAPPATH.md">HeapPath</a> is an expression language for object graphs.
It is useful both as a generic predicate language in graph traversal
algorithms and as a simple tool to extract data from an object dump.
HeapPath automatically converts strings, primitives and a few other
simple types from heap dump structures to normal objects.</p>
<p>This library has proved itself very useful in our daily job.
One of its applications was a memory reporting tool for our
database/CEP system, which automatically reports the actual memory
consumption of every relational transformation node
(there could be a few hundred nodes in a single instance).</p>
<p>For interactive exploration, API + Java is not the best set of tools, though.
But it lets me do my job (and 150 GiB of dump leaves me no alternatives).</p>
<p>Perhaps I should add some JVM scripting language to the mix ...</p>
<p>BTW: a single pass through 150 GiB takes about 5 minutes.
Meaningful analysis usually employs multiple passes,
but processing times are fairly reasonable even
for that heap size.</p>
Alexey Ragozinhttp://www.blogger.com/profile/13720493857045012756noreply@blogger.com19tag:blogger.com,1999:blog-7735872642513631302.post-81340022178380704482015-02-11T21:19:00.001+00:002015-02-11T21:25:06.465+00:00Binary search - is it still most optimal?<p>If you have a sorted collection of elements,
how would you find the index of a specific value?
<br/>
"Binary search" is likely to be your answer.
<br/>
Algorithm theory teaches us that binary search
is the optimal algorithm for this task, with log(N) complexity.
<br/>
Well, a hash table can do better if you need to find a key by exact match.
In many cases, though, you have reasons to keep your collection sorted, not hashed.</p>
<p>At my job, I'm working on a sophisticated in-memory database
tailored for streaming data processing. We have a lot of places
where we deal with sorted collections of integers (data row references, etc).</p>
<p>Algorithm theory is good, but in reality there are things
like <b>cache hierarchy</b>, <b>branch prediction</b> and <b>superscalar execution</b>
which may skew performance in edge cases.</p>
<p>The question is: where lies the border between reality ruled
by CPU quirks and the lawful space of classic algorithm theory?</p>
<p>When in doubt, do an experiment.</p>
<p>The experiment is simple: I generate a large number of sorted arrays
of 32-bit integers. Then I search for a random key in a random array multiple times.
In each experiment, the average array size is fixed.
A large number of arrays is used to ensure cold memory access.
The average search time is measured.</p>
<p>All code is written in Java and measured using the <a href="http://openjdk.java.net/projects/code-tools/jmh/">JMH</a> tool.</p>
<p>The participants are:</p>
<ul>
<li>Binary search - <a href="http://docs.oracle.com/javase/7/docs/api/java/util/Arrays.html#binarySearch(int[],%20int)"><code>java.util.Arrays.binarySearch()</code></a></li>
<li>Linear search - a simple loop over the array until the key is found</li>
<li>Linear search 2 - a loop over every second element of the array; once a greater key is found, index <code>i - 1</code> is checked too</li>
</ul>
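<p>For reference, here is a reconstruction of the three participants in plain Java. These are written from the one-line descriptions above, not taken from the benchmark code (the JMH harness is omitted). Both linear variants return the same result as <code>Arrays.binarySearch()</code> for sorted arrays without duplicates, including the negative "insertion point" encoding when the key is absent.</p>

```java
import java.util.Arrays;

/**
 * Reconstructed search variants. Each returns the index of key, or
 * -(insertionPoint) - 1 when the key is absent, like Arrays.binarySearch().
 */
final class SortedSearch {

    static int binarySearch(int[] a, int key) {
        return Arrays.binarySearch(a, key);
    }

    /** Scans elements one by one until one is >= key. */
    static int linearSearch(int[] a, int key) {
        int i = 0;
        while (i < a.length && a[i] < key) {
            i++;
        }
        return (i < a.length && a[i] == key) ? i : -(i + 1);
    }

    /** Inspects every second element; once one is >= key, also checks index i - 1. */
    static int linearSearch2(int[] a, int key) {
        int i = 1;
        while (i < a.length && a[i] < key) {
            i += 2;
        }
        // Here every element at index <= i - 2 is known to be < key.
        int j = i - 1;
        if (j < a.length && a[j] >= key) {
            return a[j] == key ? j : -(j + 1);
        }
        if (i < a.length) { // a[i] >= key and a[i - 1] < key
            return a[i] == key ? i : -(i + 1);
        }
        return -(a.length + 1); // key is greater than every element
    }
}
```

<p>Stepping by two halves the number of loop iterations and branches, while the memory access pattern stays purely sequential, which is exactly the kind of access that hardware prefetchers handle well.</p>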
<p>The X axis is the average array length.
<br/>
The Y axis is the average time of a single search in microseconds.
<br/>
Measurements were done on 3 different types of CPU.
</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhyoCzoAky60LloVXPP67AWtY7aKlD1kG7rPhnjynQyJ3ffarj1bg7wL0GY7sBXUpwCChQLBE_9NMxMUJWnE926xFUUiZqxqYsp6kOeD9zR3O717DfiOAd3R_3TjbU_eh-LwyNviV0fkufA/s1600/bin-chart-1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhyoCzoAky60LloVXPP67AWtY7aKlD1kG7rPhnjynQyJ3ffarj1bg7wL0GY7sBXUpwCChQLBE_9NMxMUJWnE926xFUUiZqxqYsp6kOeD9zR3O717DfiOAd3R_3TjbU_eh-LwyNviV0fkufA/s400/bin-chart-1.png" /></a></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgc4w_-iV_koxkXyG_UQbGLksyi7VsGzGYqtk7nkE-zrfTQBLuNmgWLwoyfrAbiz0zwyoLkrfPekC_ALnDZXNKZo9-QO6KrYG4KdZpEE5_8Yv-TP-KjlvyQfVjLWAgROodxjdkS_p69HaH/s1600/bin-chart-2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgc4w_-iV_koxkXyG_UQbGLksyi7VsGzGYqtk7nkE-zrfTQBLuNmgWLwoyfrAbiz0zwyoLkrfPekC_ALnDZXNKZo9-QO6KrYG4KdZpEE5_8Yv-TP-KjlvyQfVjLWAgROodxjdkS_p69HaH/s400/bin-chart-2.png" /></a></div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0h7jOLRO7sP4Jof_GgONuihYHwjKHzfMpLRnG_IMBH5v4ePOAYHDPPhLxjgopIE4HYE-yiqpKBCUK8PEjwBalY2AhiFaFeuwHW6ZQz2P069NYA_eqb4e51bDqyfWqw5eMDLG39V_x_Oii/s1600/bin-chart-3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0h7jOLRO7sP4Jof_GgONuihYHwjKHzfMpLRnG_IMBH5v4ePOAYHDPPhLxjgopIE4HYE-yiqpKBCUK8PEjwBalY2AhiFaFeuwHW6ZQz2P069NYA_eqb4e51bDqyfWqw5eMDLG39V_x_Oii/s400/bin-chart-3.png" /></a></div>
<p>Results speak for themselves.</p>
<p>I was a little surprised, as I was expecting
binary search to outperform linear search at lengths of 32 or 64,
but it seems that modern processors are very good
at optimizing linear memory access.</p>
<p>Given that 8 - 128 is a practical range for B-tree-like
structures, I will likely reconsider some of the
data structures used in our database.</p>
Alexey Ragozinhttp://www.blogger.com/profile/13720493857045012756noreply@blogger.com0tag:blogger.com,1999:blog-7735872642513631302.post-83226075690468280202014-03-01T13:00:00.000+00:002014-03-14T03:35:54.015+00:00Tech Talk: "Casual" mass parallel data processing in Java<p>
On March 1st, I spoke at the <a href="http://www.belarusjug.org/events/nosql-meetup">NoSQL day</a> meetup in Minsk, Belarus.
</p>
<p>
<b>"Casual" mass parallel data processing in Java</b> may sound like a weird topic. Nevertheless, sometimes you have to get the job done, and setting up computation grid infrastructure may not be the shortest path.
</p>
<p>
Below is the slide deck from the event.
<br/>
<iframe src="http://www.slideshare.net/slideshow/embed_code/32299175" width="427" height="356" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px 1px 0; margin-bottom:5px; max-width: 100%;" allowfullscreen> </iframe>
</p>Alexey Ragozinhttp://www.blogger.com/profile/13720493857045012756noreply@blogger.com0Minsk, Belarus53.9 27.56666670000004253.6005025 26.921219700000041 54.1994975 28.212113700000042tag:blogger.com,1999:blog-7735872642513631302.post-66800190654075649002013-12-17T19:00:00.000+00:002013-12-23T06:19:59.113+00:00TechTalk: Java Garbage Collection - Theory and Practice<p>
Below are the slide decks from an open event held at the Moscow Technology Center of Deutsche Bank.
</p>
<p>
The topic of the event was garbage collection in the JVM.
</p>
<p>Part 1 by Alexey Ragozin
<iframe src="http://www.slideshare.net/slideshow/embed_code/29377749" width="427" height="356" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC;border-width:1px 1px 0;margin-bottom:5px" allowfullscreen> </iframe>
</p>
<p>Part 2 by Alexander Ashitkin
<iframe src="http://www.slideshare.net/slideshow/embed_code/29377811" width="427" height="356" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC;border-width:1px 1px 0;margin-bottom:5px" allowfullscreen> </iframe>
</p>Alexey Ragozinhttp://www.blogger.com/profile/13720493857045012756noreply@blogger.com0tag:blogger.com,1999:blog-7735872642513631302.post-16861339864483167192013-12-12T15:00:00.000+00:002014-03-14T03:55:41.494+00:00TechTalk: Virtualizing Java in Java<p>
On 12th December, I spoke at <a href="http://jug.ru">JUG</a> in Saint Petersburg, Russia.
</p>
<p>
It was a long talk about using <a href="/2013/01/remote-code-execution-in-java-made.html">NanoCloud</a>.
</p>
<p>
Below is the video
<br/>
<iframe width="427" height="300" src="//www.youtube.com/embed/F9uAJ4o5zls" frameborder="0" allowfullscreen></iframe>
<iframe width="427" height="300" src="//www.youtube.com/embed/EcoJrYJczqc" frameborder="0" allowfullscreen></iframe>
<br/>
and the slide deck from the event
<br/>
<iframe src="http://www.slideshare.net/slideshow/embed_code/29217944" width="427" height="356" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px 1px 0; margin-bottom:5px; max-width: 100%;" allowfullscreen> </iframe>
</p>Alexey Ragozinhttp://www.blogger.com/profile/13720493857045012756noreply@blogger.com0Saint Petersburg, Russia60.076238300000007 30.12138290000007159.058059800000009 27.539595900000073 61.094416800000005 32.70316990000007tag:blogger.com,1999:blog-7735872642513631302.post-20915939779014385062013-12-05T21:10:00.000+00:002013-12-06T09:22:38.705+00:00Coherence SIG - Filtering 100M objects in cache<p>Today I spoke at the Coherence SIG event in London.</p>
<p>
My topic was "Filtering 100M objects. What can go wrong?". It was the story of solving a particular problem and the obstacles we encountered. One notable thing about this project: our team used a Performance Test Driven Development approach.
</p>
<p>
We started with the simplest solution, then focused on the problems identified by testing.
</p>
<p>The slide deck from the presentation is below.</p>
<iframe src="http://www.slideshare.net/slideshow/embed_code/28955745" width="427" height="356" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC;border-width:1px 1px 0;margin-bottom:5px" allowfullscreen> </iframe>
Alexey Ragozinhttp://www.blogger.com/profile/13720493857045012756noreply@blogger.com0tag:blogger.com,1999:blog-7735872642513631302.post-27706693804131467462013-11-14T22:01:00.000+00:002014-02-27T16:15:19.557+00:00Coherence 101 - Soothing the Guardian<p><a href="http://docs.oracle.com/middleware/1212/coherence/COHDG/api_guardian.htm">Guardian</a> was introduced in Oracle Coherence 3.5 as a uniform and reliable means to detect and report various stalls and hangs on data grid members.
In addition to monitoring internal components of Coherence, Guardian has an API accessible to application developers.</p>
<p>While the out-of-the-box Guardian does its job pretty well, there are a few aspects you can improve. </p>
<p>There are three techniques for working with Coherence Guardian. You can choose to employ all of them or just a few.</p>
<h4>Guardian heartbeats</h4>
<p>Guardian uses a heartbeat mechanism to detect thread stalls.
Internally, Coherence code explicitly issues heartbeats at appropriate points.
Application code can use a similar technique if long execution time is expected.
CacheStores are a good example of this.</p>
<ul>
<li><a href="http://docs.oracle.com/cd/E24290_01/coh.371/e22843/com/tangosol/net/GuardSupport.html#heartbeat__"><code>GuardSupport.heartbeat()</code></a> – sends a normal heartbeat</li>
<li><a href="http://docs.oracle.com/cd/E24290_01/coh.371/e22843/com/tangosol/net/GuardSupport.html#heartbeat_long_"><code>GuardSupport.heartbeat(long)</code></a> – allows you to pass the expected time until the next heartbeat (e.g. if you expect an SQL query to take several minutes, you can prevent a log warning by passing a reasonably long timeout before executing the SQL statement)</li>
</ul>
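<p>To make the heartbeat idea concrete, below is a minimal, self-contained model in plain Java. It is not the Coherence API: the class <code>HeartbeatMonitor</code> and its methods are hypothetical, and the current time is passed in explicitly to keep the sketch deterministic. A normal heartbeat pushes the deadline out by the default timeout, a heartbeat with an explicit duration plays the role of <code>GuardSupport.heartbeat(long)</code>, and an overdue task has its recovery callback invoked, roughly like a soft timeout.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical, simplified model of a heartbeat-based stall detector.
 * It only mimics the idea behind GuardSupport.heartbeat() / heartbeat(long);
 * it is not the Coherence Guardian implementation.
 */
final class HeartbeatMonitor {

    /** Per-task state: deadline for the next heartbeat and a recovery action. */
    private static final class Slot {
        final long defaultTimeoutMillis;
        final Runnable onRecover;
        volatile long deadlineMillis;

        Slot(long defaultTimeoutMillis, Runnable onRecover, long nowMillis) {
            this.defaultTimeoutMillis = defaultTimeoutMillis;
            this.onRecover = onRecover;
            this.deadlineMillis = nowMillis + defaultTimeoutMillis;
        }
    }

    private final Map<String, Slot> slots = new ConcurrentHashMap<>();

    void register(String task, long timeoutMillis, Runnable onRecover, long nowMillis) {
        slots.put(task, new Slot(timeoutMillis, onRecover, nowMillis));
    }

    /** Normal heartbeat: the next one is expected within the default timeout. */
    void heartbeat(String task, long nowMillis) {
        Slot s = slots.get(task);
        if (s != null) {
            s.deadlineMillis = nowMillis + s.defaultTimeoutMillis;
        }
    }

    /** Heartbeat announcing that the next one may take up to expectedMillis. */
    void heartbeat(String task, long expectedMillis, long nowMillis) {
        Slot s = slots.get(task);
        if (s != null) {
            s.deadlineMillis = nowMillis + expectedMillis;
        }
    }

    /** Called periodically by a watchdog; runs recovery for every overdue task. */
    List<String> check(long nowMillis) {
        List<String> overdue = new ArrayList<>();
        for (Map.Entry<String, Slot> e : slots.entrySet()) {
            if (nowMillis > e.getValue().deadlineMillis) {
                overdue.add(e.getKey());
                e.getValue().onRecover.run();
            }
        }
        return overdue;
    }
}
```

<p>For example, registering a task with a 1 second timeout and calling <code>heartbeat(task, 300_000, now)</code> right before a slow SQL statement keeps <code>check(...)</code> quiet for the next five minutes. Unlike the real Guardian, this model never escalates from recovery to termination.</p>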
<h4>Implementing guardable</h4>
<p>Normally the guardian tries to <em>"recover"</em> a thread if no heartbeats were received during the timeout (either specified in configuration or by the last <code>heartbeat(...)</code> call).
<br/>
This behavior can be overridden, though.
An application can register its own <a href="http://docs.oracle.com/cd/E24290_01/coh.371/e22843/com/tangosol/net/Guardable.html">Guardable</a> and temporarily disable monitoring of the current thread.
Below is a code snippet which wraps cache loader operations in a <a href="http://docs.oracle.com/cd/E24290_01/coh.371/e22843/com/tangosol/net/Guardable.html">Guardable</a>, preventing thread interruption
(the default way to <em>"recover"</em> a worker thread).</p>
<pre style="overflow:scroll"><code>// Both classes are static nested classes of an enclosing class.
// Imports: java.util.Collection, java.util.Collections, java.util.Map,
// java.util.concurrent.TimeUnit, com.tangosol.net.GuardContext,
// com.tangosol.net.GuardSupport, com.tangosol.net.Guardable,
// com.tangosol.net.cache.CacheLoader
public static class GuardianAwareCacheLoader implements CacheLoader {

    private final CacheLoader loader;

    public GuardianAwareCacheLoader(CacheLoader loader) {
        this.loader = loader;
    }

    @Override
    public Object load(Object key) {
        GuardContext ctx = GuardSupport.getThreadContext();
        if (ctx != null) {
            // guard this call with a Guardable that knows which key is being loaded
            KeyLoaderGuard guard = new KeyLoaderGuard(Collections.singleton(key));
            GuardContext klg = ctx.getGuardian().guard(guard);
            GuardSupport.setThreadContext(klg);
            // disable current context (same as in loadAll())
            ctx.heartbeat(TimeUnit.DAYS.toMillis(365));
        }
        try {
            return loader.load(key);
        }
        finally {
            if (ctx != null) {
                // restore the original context and release our guard
                GuardContext klg = GuardSupport.getThreadContext();
                GuardSupport.setThreadContext(ctx);
                klg.release();
                // reenable current context
                ctx.heartbeat();
            }
        }
    }

    @Override
    @SuppressWarnings({ "rawtypes", "unchecked" })
    public Map loadAll(Collection keys) {
        GuardContext ctx = GuardSupport.getThreadContext();
        if (ctx != null) {
            KeyLoaderGuard guard = new KeyLoaderGuard(keys);
            GuardContext klg = ctx.getGuardian().guard(guard);
            GuardSupport.setThreadContext(klg);
            // disable current context
            ctx.heartbeat(TimeUnit.DAYS.toMillis(365));
        }
        try {
            return loader.loadAll(keys);
        }
        finally {
            if (ctx != null) {
                GuardContext klg = GuardSupport.getThreadContext();
                GuardSupport.setThreadContext(ctx);
                klg.release();
                // reenable current context
                ctx.heartbeat();
            }
        }
    }
}

public static class KeyLoaderGuard implements Guardable {

    final Collection&lt;Object&gt; keys;
    GuardContext context;

    public KeyLoaderGuard(Collection&lt;Object&gt; keys) {
        this.keys = keys;
    }

    @Override
    public GuardContext getContext() {
        return context;
    }

    @Override
    public void setContext(GuardContext context) {
        this.context = context;
    }

    @Override
    public void recover() {
        // soft timeout: log and heartbeat instead of interrupting the thread
        System.out.println("got RECOVER signal");
        context.heartbeat();
    }

    @Override
    public void terminate() {
        // hard timeout: only logged here
        System.out.println("got TERMINATE signal");
    }

    @Override
    public String toString() {
        // shows up in Guardian log messages, so include the keys being loaded
        return "KeyLoaderGuard:" + keys;
    }
}
</code></pre>
<p>Using a custom <a href="http://docs.oracle.com/cd/E24290_01/coh.371/e22843/com/tangosol/net/Guardable.html">Guardable</a> provides the following advantages:</p>
<ul>
<li>Additional context information is available and is logged for a custom <a href="http://docs.oracle.com/cd/E24290_01/coh.371/e22843/com/tangosol/net/Guardable.html">Guardable</a> (e.g. the SQL statement causing problems).</li>
<li>Custom code can choose how to react to a timeout. You can choose to continue or try to cancel the request somehow (e.g. by closing the JDBC connection).</li>
</ul>
<h4>Custom service failure policy</h4>
<p><a href="http://docs.oracle.com/cd/E24290_01/coh.371/e22843/com/tangosol/net/ServiceFailurePolicy.html">Service failure policy</a> is responsible for reacting to guardian timeouts and critical service failures.
The reaction is configurable, but for standalone Coherence processes I prefer to override this policy.</p>
<p>Below is an example of a service failure policy that I find more reasonable for dedicated Coherence nodes.</p>
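<pre style="overflow:scroll"><code>import com.tangosol.net.Cluster;
import com.tangosol.net.Guardable;
import com.tangosol.net.Service;
import com.tangosol.net.ServiceFailurePolicy;
// Logging API assumed to be log4j 1.x; ThreadUtil is a project helper (not shown)
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;

public class ServiceFailureHandler implements ServiceFailurePolicy {

    private final static Logger LOGGER = LogManager.getLogger(ServiceFailureHandler.class);

    @Override
    public void onGuardableRecovery(Guardable guardable, Service service) {
        // soft timeout: a single log line (no thread dump), then let the Guardable recover
        LOGGER.warn("Soft timeout detected. Service: " + service.getInfo().getServiceName() + " Task: " + guardable);
        guardable.recover();
    }

    @Override
    public void onGuardableTerminate(Guardable guardable, Service service) {
        LOGGER.error("Hard timeout detected. Service: " + service.getInfo().getServiceName()
                + " Task: " + guardable + ". Node will be terminated.");
        halt();
    }

    @Override
    public void onServiceFailed(Cluster cluster) {
        LOGGER.error("Service failure detected. Node will be terminated.");
        halt();
    }

    private static void halt() {
        try {
            // one thread dump right before termination, then flush logs and streams
            ThreadUtil.logThreadDump(LOGGER);
            LogManager.shutdown();
            System.out.flush();
            System.err.flush();
        } finally {
            // halt() skips shutdown hooks, so an external watchdog can restart the node quickly
            Runtime.getRuntime().halt(1);
        }
    }
}
</code></pre>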
<p>Compared to the standard policy, it has the following advantages:</p>
<ul>
<li>In case of service failure, the process is terminated quickly (without waiting for shutdown hooks, etc.).
In my case, the process is then restarted immediately by an external watchdog.</li>
<li>"Soft timeouts" do not pollute the log with thread dumps.
A single thread dump is logged just before the process terminates (which is especially important if you implement a custom <a href="http://docs.oracle.com/cd/E24290_01/coh.371/e22843/com/tangosol/net/Guardable.html">Guardable</a>).</li>
</ul>
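<p>The policy above relies on a <code>ThreadUtil.logThreadDump(...)</code> helper that is not shown. A possible stand-in is sketched here with the standard <code>java.lang.management</code> API; the class name <code>ThreadDumps</code> is hypothetical. It formats all live threads into a string that can be handed to whatever logger the application uses.</p>

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/**
 * Hypothetical stand-in for a "log thread dump" helper.
 * Produces a jstack-like text dump of all live threads.
 */
final class ThreadDumps {
    static String dumpAllThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // include lock information only where the JVM supports it
        ThreadInfo[] infos = mx.dumpAllThreads(
                mx.isObjectMonitorUsageSupported(), mx.isSynchronizerUsageSupported());
        for (ThreadInfo ti : infos) {
            sb.append('"').append(ti.getThreadName()).append("\" ")
              .append(ti.getThreadState()).append('\n');
            // format frames manually: ThreadInfo.toString() truncates deep stacks
            for (StackTraceElement e : ti.getStackTrace()) {
                sb.append("\tat ").append(e).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

<p>Inside <code>halt()</code>, the dump could then be written with something like <code>LOGGER.error(ThreadDumps.dumpAllThreads())</code> before the logging system is shut down.</p>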
<h4>Conclusion</h4>
<p>Integrating your application with Coherence <a href="http://docs.oracle.com/middleware/1212/coherence/COHDG/api_guardian.htm">Guardian</a> doesn't require much code, but it can make your logs clearer and troubleshooting less painful.
While it will not make your application work faster, it can save hours of digging through logs.</p>Alexey Ragozinhttp://www.blogger.com/profile/13720493857045012756noreply@blogger.com0tag:blogger.com,1999:blog-7735872642513631302.post-89003103137650216262013-11-06T20:38:00.001+00:002016-10-25T04:08:34.465+01:00HotSpot JVM garbage collection options cheat sheet (v3) <div>
<h4><a href="/2016/10/hotspot-jvm-garbage-collection-options.html">Updated version is available!</a></h4>
</div>
<p>
Two years ago I published a <a href="/2011/09/hotspot-jvm-garbage-collection-options.html">cheat sheet for garbage collection options in HotSpot JVM</a>.
</p>
<p>
Recently I decided to give that work a refresh, and today I'm publishing the first HotSpot JVM options ref card, covering generic GC options and CMS tuning. (G1 has gained plenty of tuning options over the last two years, so it will get a dedicated ref card.)
</p>
<p>Content-wise, GC log rotation options have been added and a few esoteric CMS diagnostic options have been removed.</p>
<p><a href="https://dl.dropboxusercontent.com/u/1704203/HotSpot%20JVM%20GC%20options%20cheatsheet%20-%20A4%201%2B2.pdf">Two page PDF version</a></p>
<p><a href="https://dl.dropboxusercontent.com/u/1704203/HotSpot%20JVM%20GC%20options%20cheatsheet%20-%20A3%201%2B2.pdf">Single page PDF version</a></p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgMHX0jXO_FsDkIMh8aibqvpw253DL5cKEnC1-ORAG7I-cWaU_BHgBnUvt37gfaFjAV_r1ASfp63jD1fThpZBfCR0ZZIWaBFCzB-dQvmBaGbXWHPO8xm9kdQ9RZYjR9WuYsxzbI_GTM6ZuP/s1600/HotSpot+JVM+GC+options+cheatsheet+-+A3+1+2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgMHX0jXO_FsDkIMh8aibqvpw253DL5cKEnC1-ORAG7I-cWaU_BHgBnUvt37gfaFjAV_r1ASfp63jD1fThpZBfCR0ZZIWaBFCzB-dQvmBaGbXWHPO8xm9kdQ9RZYjR9WuYsxzbI_GTM6ZuP/s1600/HotSpot+JVM+GC+options+cheatsheet+-+A3+1+2.png" width="90%"/></a></div>
Alexey Ragozinhttp://www.blogger.com/profile/13720493857045012756noreply@blogger.com7tag:blogger.com,1999:blog-7735872642513631302.post-43668091761565171732013-10-29T20:00:00.000+00:002013-10-31T02:37:48.412+00:00JVM deep dive at HighLoad++ 2013 (Moscow)<p>
Today I spoke at HighLoad++ 2013 in Moscow. I gave two presentations covering the deep internals of the JVM: one about JIT compilation and the other about pauseless garbage collection algorithms.
</p>
<p>Slide decks are below (in Russian)</p>
<div style="margin-bottom:5px"> <strong> <a href="https://www.slideshare.net/aragozin/jit-java" title="JIT compilation in the Java virtual machine" target="_blank">JIT compilation in the Java virtual machine (HighLoad++ 2013)</a> </strong></div>
<iframe src="http://www.slideshare.net/slideshow/embed_code/27759912" width="427" height="356" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC;border-width:1px 1px 0;margin-bottom:5px" allowfullscreen> </iframe>
<div style="margin-bottom:5px"> <strong> <a href="https://www.slideshare.net/aragozin/c-java" title="Pauseless garbage collection in Java (HighLoad++ 2013)" target="_blank">Pauseless garbage collection in Java (HighLoad++ 2013)</a> </strong></div>
<iframe src="http://www.slideshare.net/slideshow/embed_code/27760442" width="427" height="356" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC;border-width:1px 1px 0;margin-bottom:5px" allowfullscreen> </iframe> Alexey Ragozinhttp://www.blogger.com/profile/13720493857045012756noreply@blogger.com5tag:blogger.com,1999:blog-7735872642513631302.post-41601479362621584972013-10-25T20:00:00.000+01:002013-10-31T02:30:42.258+00:00Performance Test Driven Development (CEE SECR 2013 Moscow)<p>
Today I spoke at <a href="http://2013.secr.ru/lang/en/program/agenda">CEE SECR 2013 in Moscow</a>.
</p>
<p>Below is the slide deck from the presentation (in Russian).</p>
<div style="margin-bottom:5px"> <strong> <a href="https://www.slideshare.net/aragozin/cee-secr2013performancetestdrivendevelopment" title="Performance Test Driven Development (CEE SECR 2013 Moscow)" target="_blank">Performance Test Driven Development</a></strong></div>
<iframe src="http://www.slideshare.net/slideshow/embed_code/27759366" width="427" height="356" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC;border-width:1px 1px 0;margin-bottom:5px" allowfullscreen> </iframe> Alexey Ragozinhttp://www.blogger.com/profile/13720493857045012756noreply@blogger.com0tag:blogger.com,1999:blog-7735872642513631302.post-162867532861397352013-09-10T19:00:00.000+01:002013-09-10T19:00:05.858+01:00Coherence 101 - EntryProcessor traffic amplification<p>The Oracle Coherence data grid has a powerful tool for in-place data manipulation: <a href="http://docs.oracle.com/html/E22843_01/com/tangosol/util/InvocableMap.EntryProcessor.html">EntryProcessor</a>.
Using an entry processor, you can get reasonable atomicity guarantees without locks or transactions
(and without the drastic performance penalties associated with them).</p>
<p>One good example of an entry processor is the built-in <a href="http://docs.oracle.com/html/E22843_01/com/tangosol/util/processor/ConditionalPut.html">ConditionalPut</a> processor,
which verifies a certain condition before overwriting the value. This, in turn, can be used to
implement optimistic locking and other patterns.</p>
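<p>Coherence specifics aside, the optimistic locking pattern that a conditional put enables can be shown with a local analogy: an atomic per-entry "check version, then overwrite" built on <code>ConcurrentHashMap.compute</code>. This is a sketch under that substitution; <code>VersionedStore</code> and its methods are hypothetical and do not use or reproduce the Coherence API.</p>

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical local analogy of a version-checked conditional put. */
final class VersionedStore {

    static final class Versioned {
        final long version;
        final String value;

        Versioned(long version, String value) {
            this.version = version;
            this.value = value;
        }
    }

    private final ConcurrentMap<String, Versioned> map = new ConcurrentHashMap<>();

    /** Overwrites the entry only if it still has expectedVersion (0 = absent). */
    boolean putIfVersion(String key, long expectedVersion, String newValue) {
        boolean[] applied = {false};
        // compute() runs atomically per key, so check and write cannot interleave
        map.compute(key, (k, current) -> {
            long currentVersion = current == null ? 0 : current.version;
            if (currentVersion != expectedVersion) {
                return current; // condition failed: keep the existing value
            }
            applied[0] = true;
            return new Versioned(currentVersion + 1, newValue);
        });
        return applied[0];
    }

    Versioned get(String key) {
        return map.get(key);
    }
}
```

<p>A writer reads the entry together with its version, computes a new value and calls <code>putIfVersion</code>; if another writer got there first, the call returns false and the writer retries with the fresh version.</p>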
<p><a href="http://docs.oracle.com/html/E22843_01/com/tangosol/util/processor/ConditionalPut.html">ConditionalPut</a> accepts only one value, but a <a href="http://docs.oracle.com/html/E22843_01/com/tangosol/util/processor/ConditionalPutAll.html">ConditionalPutAll</a> processor is also available.
<a href="http://docs.oracle.com/html/E22843_01/com/tangosol/util/processor/ConditionalPutAll.html">ConditionalPutAll</a> accepts a map of keys to values. Using it, we can update multiple cache entries with a single call to the <a href="http://docs.oracle.com/html/E22843_01/com/tangosol/net/NamedCache.html">NamedCache</a> API. </p>
<p>But there is one caveat.</p>
<p>We have placed values for <strong><em>all keys</em></strong> in a <strong><em>single map instance</em></strong> inside the entry processor object.
On the other hand, in a distributed cache <strong><em>keys are distributed</em></strong> across different processes.
<br/>
How do the right values get to the right keys? </p>
<p>The answer is simple: every node owning at least one of the keys to be updated receives a copy of the whole map of values.
<br/>
In other words, in a mid-size cluster (e.g. 20 nodes) you may actually transfer up to <strong><em>20 times more data over the network</em></strong> than really needed.</p>
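<p>A back-of-the-envelope estimate shows where the amplification factor comes from. The sketch below (hypothetical names, plain Java) assumes the serialized map is roughly the number of entries times a per-entry size, and that the whole map is shipped to every node owning at least one key; serialization overhead and backup traffic are ignored.</p>

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical traffic estimate for a whole-map entry processor vs. per-node subsets. */
final class TrafficEstimate {

    /** Bytes sent when the whole key/value map goes to every owning node. */
    static <K> long wholeMapBytes(Set<K> keys, Function<K, String> owner, long bytesPerEntry) {
        Set<String> owners = new HashSet<>();
        for (K k : keys) {
            owners.add(owner.apply(k));
        }
        // cast first to avoid int overflow for large clusters/key sets
        return (long) owners.size() * keys.size() * bytesPerEntry;
    }

    /** Bytes sent when each node receives only the entries it owns. */
    static long splitBytes(Set<?> keys, long bytesPerEntry) {
        return keys.size() * bytesPerEntry;
    }
}
```

<p>With 1,000 keys of about 100 bytes spread evenly over 20 nodes, the whole-map approach sends about 2,000,000 bytes versus 100,000 bytes when each node receives only its own entries, a factor of 20.</p>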
<p>Modern networks are quite good, and you may not notice this traffic amplification effect for some time (as long as your network bandwidth can handle it).
But once traffic reaches the network limit, things start to break apart. </p>
<p>The Coherence TCMP protocol is very aggressive at grabbing as much network bandwidth as it can, so other communication protocols will likely perish first.
<br/>
JDBC connections are a likely victim of bandwidth shortage.
<br/>
Coherence*Extend connections may also suffer (they use TCP), and proxy nodes may start to fail in unusual ways (e.g. with OutOfMemoryError due to transmission backlog overflow).</p>
<p>This problem may be hard to diagnose. TCP is much more vulnerable to bandwidth shortage, so you will be distracted by TCP communication problems
while the root cause is excessive TCMP cluster traffic. </p>
<p>Monitoring TCMP statistics (available via MBeans) can give you insight into TCMP network bandwidth consumption and network health, and help you find the root cause.</p>
<p><em>Isolating TCMP traffic on a separate switch is also good practice, BTW.</em></p>
<h4>But how to fix it?</h4>
<h5>Manual data splitting</h5>
<p>A simple solution is to split the key set by owning node and then invoke the entry processor for each subset individually. The Coherence API allows you to find the node owning a particular key.
<br/>
This approach is far from ideal, though:</p>
<ul>
<li>it will not work for Extend clients,</li>
<li>you either have to process all subsets sequentially or use threads to make several parallel calls to the Coherence API,</li>
<li>splitting the key set complicates application logic.</li>
</ul>
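<p>The core of manual splitting is grouping the key/value map by owner. In the sketch below the owner lookup is a plain <code>Function</code>, an assumption made to keep it self-contained; in a real Coherence client it could presumably be backed by <code>PartitionedService.getKeyOwner(key)</code>. <code>KeySplitter</code> is a hypothetical name.</p>

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper: groups a key/value map by the node owning each key. */
final class KeySplitter {
    static <K, V, N> Map<N, Map<K, V>> splitByOwner(
            Map<K, V> values, Function<? super K, ? extends N> owner) {
        Map<N, Map<K, V>> byOwner = new HashMap<>();
        for (Map.Entry<K, V> e : values.entrySet()) {
            // one submap per owner; each submap is sent only to that owner
            byOwner.computeIfAbsent(owner.apply(e.getKey()), n -> new HashMap<>())
                   .put(e.getKey(), e.getValue());
        }
        return byOwner;
    }
}
```

<p>Each resulting submap would then be wrapped in its own ConditionalPutAll and invoked for just its key set, sequentially or from a thread pool, which is exactly the extra complexity listed above.</p>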
<h5>Triggers</h5>
<p>Another option is moving your logic from the entry processor to a trigger and replacing <code>invokeAll()</code> with <code>putAll()</code> (<code>putAll()</code> does not suffer from traffic amplification).
This solution is fairly good and fast, but it has certain drawbacks too:</p>
<ul>
<li>it is less transparent (<code>put()</code> is not <em>just</em> <code>put()</code> now),</li>
<li>a trigger is configured once for all cache operations (not just one <code>putAll()</code> call),</li>
<li>you can only have one trigger and it should handle all your data update needs.</li>
</ul>
<h5>Synthetic data keys</h5>
<p>Finally, you can use <a href="https://github.com/gridkit/gridkit-coherence-toolkit/blob/master/cohkit/src/main/java/org/gridkit/coherence/misc/store/DataSplittingProcessor.java">DataSplittingProcessor</a> from the <a href="https://github.com/gridkit/gridkit-coherence-toolkit">CohKit</a> project.
This utility class uses virtual cache keys to transfer the data associated with each key, then uses the backing map API to access the real entries.</p>
<p>This solution has its PROs and CONs too:</p>
<ul>
<li>good drop-in replacement for <a href="http://docs.oracle.com/html/E22843_01/com/tangosol/util/processor/ConditionalPutAll.html">ConditionalPutAll</a> and alike,</li>
<li>it is prone to deadlocks when running concurrently with other bulk updates (this is partially mitigated by sorting keys before locking).</li>
</ul>
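<p>The deadlock mitigation mentioned above, locking keys in a consistent sorted order, can be illustrated locally. This sketch uses per-key <code>ReentrantLock</code>s and is not the <code>DataSplittingProcessor</code> code; <code>OrderedLocker</code> is a hypothetical name. Because every caller acquires locks in ascending key order, two bulk updates going through it cannot each hold a lock the other is waiting for. Updates that bypass this ordering can still deadlock, which is why sorting only partially mitigates the problem.</p>

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical per-key locker that always acquires locks in sorted key order. */
final class OrderedLocker<K extends Comparable<K>> {
    private final ConcurrentMap<K, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Locks all keys in ascending order and returns them in acquisition order. */
    List<K> lockAll(Collection<K> keys) {
        List<K> acquired = new ArrayList<>();
        for (K key : new TreeSet<>(keys)) { // sorted and de-duplicated
            locks.computeIfAbsent(key, k -> new ReentrantLock()).lock();
            acquired.add(key);
        }
        return acquired;
    }

    /** Releases locks in reverse order of acquisition. */
    void unlockAll(List<K> acquired) {
        for (int i = acquired.size() - 1; i >= 0; i--) {
            locks.get(acquired.get(i)).unlock();
        }
    }
}
```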
<h4>Choosing right solution</h4>
<p>In practice, I have used all three techniques listed above. </p>
<p>Sometimes triggers fit the overall cache design quite well.
<br/>
Sometimes a manual data split has its advantages.
<br/>
And sometimes <a href="https://github.com/gridkit/gridkit-coherence-toolkit/blob/master/cohkit/src/main/java/org/gridkit/coherence/misc/store/DataSplittingProcessor.java">DataSplittingProcessor</a> is just the right remedy for existing entry processors.</p>
Alexey Ragozinhttp://www.blogger.com/profile/13720493857045012756noreply@blogger.com3tag:blogger.com,1999:blog-7735872642513631302.post-33341243316352816522013-09-09T04:30:00.000+01:002013-09-09T09:32:54.429+01:00SJK (JVM diagnostic/troubleshooting tools) is learning new tricks.<div>
<p><a href="https://github.com/aragozin/jvm-tools#swiss-java-knife">SJK</a> is a small command-line tool implementing a number of helpful commands for JVM troubleshooting. Internally, SJK uses the same diagnostic APIs as standard JDK tools (e.g. jps, jstack, jmap, jconsole).</p>
<p>
Recently I've made a few noteworthy additions to the SJK package and would like to announce them here.
</p>
<h4>Memory allocation rates for Java threads</h4>
<p>
The <code>ttop</code> command now displays memory allocation per thread and cumulative memory allocation for the whole JVM process.
<br/>
Memory allocation rate is key information for GC tuning; in the past I used the GC log to derive these numbers. In contrast, per-thread allocation counters give you more precise information in real time.
<br/>
The process allocation rate is calculated by aggregating thread allocation rates.
</p>
<p>
<a href="https://github.com/aragozin/jvm-tools/edit/master/sjk-core/COMMANDS.md#ttop-command">more details about ttop</a>
</p>
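<p>Per-thread allocation counters are exposed on HotSpot through the <code>com.sun.management.ThreadMXBean</code> extension, which is HotSpot/OpenJDK-specific rather than part of the standard <code>java.lang.management</code> API. The sketch below (hypothetical class <code>AllocationProbe</code>, not SJK's code) reads these counters and turns two samples into a rate. Its "process" total only covers currently live threads, so bytes allocated by threads that have already exited are not included.</p>

```java
import java.lang.management.ManagementFactory;

/** Hypothetical probe for HotSpot per-thread allocation counters. */
final class AllocationProbe {
    // assumes a HotSpot/OpenJDK JVM where this cast succeeds
    private static final com.sun.management.ThreadMXBean MX =
            (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    /** Cumulative bytes allocated by the given thread, or -1 if unavailable. */
    static long allocatedBytes(long threadId) {
        if (!MX.isThreadAllocatedMemorySupported() || !MX.isThreadAllocatedMemoryEnabled()) {
            return -1;
        }
        return MX.getThreadAllocatedBytes(threadId);
    }

    /** Sum of the counters of all currently live threads (dead threads drop out). */
    static long liveThreadsAllocatedBytes() {
        long total = 0;
        for (long b : MX.getThreadAllocatedBytes(MX.getAllThreadIds())) {
            if (b > 0) {
                total += b; // -1 marks threads that died or are unsupported
            }
        }
        return total;
    }

    /** Allocation rate in bytes per second between two samples of one counter. */
    static double ratePerSecond(long bytesBefore, long bytesAfter, long elapsedNanos) {
        return (bytesAfter - bytesBefore) * 1e9 / elapsedNanos;
    }
}
```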
<h4>Support for remote JMX connections</h4>
<p>Historically, SJK used the PID to connect to the JVM's MBean server. Using the PID does not require you to explicitly enable JMX on the JVM's command line, and it offers OS-level security.
<br/>
Sometimes you already have a JMX port up and running (e.g. for other monitoring tools), and connecting by host and port is more convenient.
<br/>
Now all JVM-based commands (<code>ttop</code>, <code>gcrep</code>, <code>mx</code>, <code>mxdump</code>) support socket-based JMX connections (with optional user/password security).
</p>
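<p>For reference, a socket-based JMX connection with optional credentials can be opened with the standard <code>javax.management.remote</code> API, as sketched below. The URL template is the one commonly used by the JVM's built-in agent (enabled with <code>-Dcom.sun.management.jmxremote.port</code>); other agents may expose a different URL. <code>RemoteJmx</code> is a hypothetical helper, not part of SJK.</p>

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Hypothetical helper for host:port JMX connections with optional credentials. */
final class RemoteJmx {

    /** Service URL for the built-in RMI connector of a JVM listening on host:port. */
    static JMXServiceURL url(String host, int port) throws IOException {
        return new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
    }

    /** Connection environment; credentials are added only if a user is given. */
    static Map<String, Object> env(String user, String password) {
        Map<String, Object> env = new HashMap<>();
        if (user != null) {
            env.put(JMXConnector.CREDENTIALS, new String[] {user, password});
        }
        return env;
    }

    /** Opens a connection; the caller is responsible for closing it. */
    static JMXConnector connect(String host, int port, String user, String password)
            throws IOException {
        return JMXConnectorFactory.connect(url(host, port), env(user, password));
    }
}
```

<p><code>JMXConnector</code> is <code>Closeable</code>, so the connection can be used in a try-with-resources block and queried through <code>getMBeanServerConnection()</code>.</p>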
<h4>Invoking arbitrary MBean operation</h4>
<p>
A new command (<code>mx</code>) allows you to get/set arbitrary MBean attributes and call arbitrary MBean operations.
<br/>
This one is particularly useful for scripting (I couldn't find a way to invoke an operation on a custom MBean from the command line, so I added it to SJK).
</p>
<p>
<a href="https://github.com/aragozin/jvm-tools/edit/master/sjk-core/COMMANDS.md#mx-command">more details about mx</a>
</p>
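<p>Under the hood, getting an attribute or invoking an operation boils down to two calls on <code>MBeanServerConnection</code>. A minimal sketch (hypothetical <code>MBeanOps</code> helper, not SJK's implementation) that works against either the local platform MBean server or a remote connection:</p>

```java
import java.io.IOException;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

/** Hypothetical helper: generic MBean attribute read and operation call. */
final class MBeanOps {

    static Object get(MBeanServerConnection conn, String name, String attribute)
            throws JMException, IOException {
        return conn.getAttribute(new ObjectName(name), attribute);
    }

    /** signature holds the fully qualified parameter type names of the operation. */
    static Object invoke(MBeanServerConnection conn, String name, String operation,
                         Object[] args, String[] signature) throws JMException, IOException {
        return conn.invoke(new ObjectName(name), operation, args, signature);
    }
}
```

<p>For example, <code>MBeanOps.invoke(conn, "java.lang:type=Memory", "gc", new Object[0], new String[0])</code> triggers a GC through the standard Memory MBean; custom MBeans work the same way once their ObjectName and operation signature are known.</p>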
<p>
Code and binaries are available at GitHub
<br/>
<a href="https://github.com/aragozin/jvm-tools">https://github.com/aragozin/jvm-tools</a>
</p>
</div>
Alexey Ragozinhttp://www.blogger.com/profile/13720493857045012756noreply@blogger.com0tag:blogger.com,1999:blog-7735872642513631302.post-19392255243964007482013-07-28T18:00:00.000+01:002013-07-28T18:00:00.022+01:00Java GC in Numbers - Compressed OOPs<div>
<p>
Compressed OOPs (OOP – ordinary object pointer) is a technique for reducing the size of Java objects in 64-bit environments. The HotSpot wiki has a <a href="https://wikis.oracle.com/display/HotSpotInternals/CompressedOops">good article explaining the details</a>. The downside of this technique is that addresses must be uncompressed before accessing memory referenced by compressed OOPs. The instruction set (e.g. x86) may support such an addressing mode directly, but the additional arithmetic still affects the CPU's processing pipeline.
</p>
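<p>The decoding step can be written down as simple arithmetic. This sketch assumes HotSpot's usual 8-byte object alignment (a shift of 3); <code>CompressedOops</code> is a hypothetical illustration, not JVM code. A 32-bit narrow reference decodes to base + (narrow shifted left by 3), which also explains why 32-bit narrow OOPs can address up to 32 GiB of heap.</p>

```java
/** Hypothetical illustration of compressed OOP decoding arithmetic. */
final class CompressedOops {

    /** Decodes a 32-bit narrow reference: base + (unsigned narrow << shift). */
    static long decode(long heapBase, int narrowOop, int shift) {
        // treat the 32-bit value as unsigned before widening and shifting
        return heapBase + ((narrowOop & 0xFFFF_FFFFL) << shift);
    }

    /** Largest heap span addressable by 32-bit narrow references for a given shift. */
    static long maxAddressableBytes(int shift) {
        return (1L << 32) << shift;
    }
}
```

<p>When the whole heap fits below 4 GiB, HotSpot can use a zero base and a zero shift, so the narrow value is the address itself and no decoding arithmetic is needed.</p>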
<p>
Young GC involves a lot of reference walking, so its duration is expected to be affected by OOP compression.
</p>
<p>
In this article, I’m comparing young GC pause times for the 64-bit HotSpot JVM with and without OOP compression. The methodology from the <a href="/2013/06/java-gc-in-numbers-parallel-young.html">previous article</a> is used, and the benchmark code is available on <a href="https://github.com/aragozin/jvm-tools/tree/master/ygc-bench">github</a>. There is one caveat, though. With compressed OOPs the size of an object is smaller, so the same amount of heap can accommodate more objects. The benchmark autoscales the number of entries to fill the heap based on entry footprint and old space size; thus, with a fixed old space size, experiments with compression enabled have to deal with a slightly larger number of objects (entry footprints are 288 uncompressed and 246 compressed).
</p>
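<p>When running this kind of comparison, it is worth confirming which mode the JVM actually uses, since the default depends on heap size. A small sketch, assuming a 64-bit HotSpot/OpenJDK JVM that exposes the <code>com.sun.management.HotSpotDiagnosticMXBean</code> extension (<code>OopsMode</code> is a hypothetical name):</p>

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical runtime check of the UseCompressedOops flag on a 64-bit HotSpot JVM. */
final class OopsMode {
    static boolean compressedOopsEnabled() {
        HotSpotDiagnosticMXBean hs =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // getVMOption throws IllegalArgumentException if the flag does not exist (e.g. 32-bit JVM)
        return Boolean.parseBoolean(hs.getVMOption("UseCompressedOops").getValue());
    }
}
```

<p>Logging this value at benchmark start makes it easy to spot runs where an explicit <code>-XX:-UseCompressedOops</code> was forgotten or where a large heap silently disabled compression.</p>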
<p>Chart below shows absolute young GC pause times.</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhs9VSdBpdSHQhNNZd1NkkeqtyG2mVGQRrxa1SZmayxT3rrvgJqnhiQtcIbrVKTTM0GYRsUWuWYmu1PmJwKQIcD90bHUVcABtyp2W6O4XgmgEDK5vSR6iCSYEbfPqT6oVNEud4XQP-8hwpP/s1600/coops_j7_abs.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhs9VSdBpdSHQhNNZd1NkkeqtyG2mVGQRrxa1SZmayxT3rrvgJqnhiQtcIbrVKTTM0GYRsUWuWYmu1PmJwKQIcD90bHUVcABtyp2W6O4XgmgEDK5vSR6iCSYEbfPqT6oVNEud4XQP-8hwpP/s400/coops_j7_abs.png" /></a></div>
<p>As you can see, the compressed case is consistently slower, which is not a surprise.</p>
<p>
Another chart shows the relative difference between the two cases (mean compressed GC pause / mean uncompressed GC pause for the same configuration).
</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4czU1ixFkJC_FQWPbWxofJ3tJkfBraDjEzq982GKylHVFkH5iSs9RS8lLrWxqoHVcyoZSKchFtoxhFJJOSm0lnkGIax-US4aLztm8mEgy9uPBDXWoOOqYXLun2jqkchUSufADKCrsHv2s/s1600/coops_j7_rel.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4czU1ixFkJC_FQWPbWxofJ3tJkfBraDjEzq982GKylHVFkH5iSs9RS8lLrWxqoHVcyoZSKchFtoxhFJJOSm0lnkGIax-US4aLztm8mEgy9uPBDXWoOOqYXLun2jqkchUSufADKCrsHv2s/s400/coops_j7_rel.png" /></a></div>
<p>
The fluctuating line suggests that I should probably increase the number of runs for each data point. But let’s try to draw some conclusions from what we have.
</p>
<p>
For heaps below 4 GiB, the JVM uses a special strategy (32-bit addresses can be used without uncompressing in this case). This difference is visible in the chart (note that the point with 4 GiB of old space means that the total heap size is above 4 GiB, so this optimization is inapplicable).
</p>
<p>
Above 4 GiB we see a 10-30% increase in pause times. You should also not forget that the compressed case has to deal with 17% more data.
</p>
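<p>The 17% figure follows directly from the entry footprints quoted earlier: with a fixed old space, the number of entries scales inversely with footprint. A worked version of that arithmetic (the helper name is hypothetical):</p>

```java
/** Hypothetical helper reproducing the footprint arithmetic from the benchmark setup. */
final class FootprintMath {

    /** Relative increase in entry count when the per-entry footprint shrinks. */
    static double extraEntriesRatio(double uncompressedFootprint, double compressedFootprint) {
        return uncompressedFootprint / compressedFootprint - 1.0;
    }

    /** Number of whole entries that fit into a given old space size. */
    static long entriesFor(long oldSpaceBytes, long footprint) {
        return oldSpaceBytes / footprint;
    }
}
```

<p>For a 1 GiB old space this gives 3,728,270 entries uncompressed versus 4,364,804 compressed, about 17% more.</p>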
<h4>Conclusions</h4>
<p>
Using compressed OOPs affects young GC pause time, which is not a surprise (especially given the increased amount of data). Using compression for heaps below 4 GiB seems to be a total win; for larger heaps it seems to be a reasonable price for increased capacity.
</p>
<p>
But the main conclusion is that the experiment has not revealed any surprises, neither good nor bad. This may not be very exciting, but it is useful information anyway.
</p>
</div>
Alexey Ragozinhttp://www.blogger.com/profile/13720493857045012756noreply@blogger.com5tag:blogger.com,1999:blog-7735872642513631302.post-83897979491430308982013-07-18T21:03:00.000+01:002013-07-21T09:12:50.790+01:00Coherence SIG: Performance Test Driven Development<p>Today I spoke at <a href="http://www.ukoug.org/events/ukoug-coherence-sig-jul/">Oracle Coherence SIG in London</a>.</p>
<p>Below you can find the slide deck from my presentation.</p>
<iframe src="http://www.slideshare.net/slideshow/embed_code/24392336" width="476" height="400" frameborder="0" marginwidth="0" marginheight="0" scrolling="no"></iframe>
<a href="http://www.slideshare.net/aragozin/coherence-sig-performance-test-driven-development">http://www.slideshare.net/aragozin/coherence-sig-performance-test-driven-development</a>
Alexey Ragozinhttp://www.blogger.com/profile/13720493857045012756noreply@blogger.com0