Comments on "Safepoints in HotSpot JVM" by Alexey Ragozin

keshav upadhyaya (2016-10-31 14:02):
Thanks for your quick updates.
There is no profiler attached to the Java process.
I don't see any log of general thread dumps, which means they are internal thread dumps, as you mentioned.
Any other suggestions what to look for next?
Thanks, Keshav

Alexey Ragozin (2016-10-29 15:04):
I would guess it is the result of a profiler. Thread dumps are widely used by profilers and sometimes by monitoring tools. Java code could also trigger a thread dump of its own process.

keshav upadhyaya (2016-10-28 12:13):
Hi, I am getting millions of messages like the following:

          vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
  54.104: ThreadDump [ 153 2 3 ] [ 0 0 0 0 0 ] 0
  54.104: ThreadDump [ 153 3 4 ] [ 0 0 0 0 0 ] 0
  54.104: ThreadDump [ 153 1 6 ] [ 0 0 0 0 0 ] 0
  54.104: ThreadDump [ 153 2 2 ] [ 0 0 0 0 0 ] 0
  54.105: ThreadDump [ 153 0 2 ] [ 0 0 0 0 0 ] 0
  54.105: ThreadDump [ 153 1 6 ] [ 0 0 0 0 0 ] 0

What could be the reason for ThreadDump activity at a safepoint?

Anonymous (2013-11-14 15:51):
Running vmstat for 5 hours has given the following result; here is a part of the output:

  swap      free     re   mf    pi   po
  40336468  4025208  383  5473  465  59
  40336132  4025732  383  5477  465  59
  40336020  4025732  383  5478  465  59
  40335940  4025752  383  5479  465  59
  40335860  4025776  383  5479  465  59
  40335776  4025796  383  5480  465  59
  40335696  4025816  383  5481  465  59
  40335584  4025816  383  5482  464  59
  40335504  4025836  383  5483  464  59
  40335420  4025856  383  5484  464  59

Can we infer something from this output?

Anonymous (2013-11-14 13:34):
Hi Alexey, as suggested by you, I have posted the question on stackoverflow.com; here is the link:
http://stackoverflow.com/questions/19979030/gc-is-taking-mammoth-time
Also, here are the startup parameters. PrintGCDetails is not enabled, and enabling it will take time, as this is a production server.

  -Xms6144m -Xmx6144m -XX:MaxPermSize=256m -Djava.security.policy=/bea/wlserver_10.3/server/lib/weblogic.policy -Dweblogic.ProductionModeEnabled=true -da -Dplatform.home=/bea/wlserver_10.3 -Dwls.home=/bea/wlserver_10.3/server -Dweblogic.home=/bea/wlserver_10.3/server -Dweblogic.management.discover=true -Dwlw.iterativeDev=false -Dwlw.testConsole=false -Dwlw.logErrorsToConsole=false -Dweblogic.ext.dirs=/bea/patch_wls1036/profiles/default/sysext_manifest_classpath -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true -Duser.timezone=GMT -Dfile.encoding=UTF-8 -Duser.language=en -Duser.country=US -Dweblogic.wsee.wstx.wsat.deployed=false -XX:+DisableExplicitGC

Alexey Ragozin (2013-11-14 12:02):
Hi, to be able to give you reasonable advice, I need:
- your JVM start parameters
- an excerpt from your GC logs with at least -XX:+PrintGCDetails enabled
I would also suggest you post the question on stackoverflow.com (and post the link here), as it is a better platform for that kind of question.
Regards, Alexey

Anonymous (2013-11-14 10:17):
Hi Alexey, your articles are very informative. Recently I have faced a situation where GC is taking a mammoth amount of time and I am not sure what the reason can be. Here is the output of the jstat -gc command:

  S0C      S1C      S0U      S1U  EC         EU        OC         OU         PC        PU        YGC   YGCT       FGC  FGCT       GCT
  77440.0  73088.0  22896.4  0.0  1946624.0  222690.4  4194304.0  3638965.1  262144.0  216641.1  1093  11258.452  3    10031.493  21289.944

lpcd2001 (2013-10-03 07:08):
Hi Alexey, thanks for the feedback. We did not always turn on -XX:+PrintSafepointStatistics because the output is so obscure. Our test program that recreates the stall just uses log4j and does not make calls to JNI, but it's great to know about the -XX:+PrintJNIGCStalls option.

Alexey Ragozin (2013-10-03 06:52):
Have you enabled -XX:+PrintSafepointStatistics?
Sometimes I've seen the JVM spend too much time trying to enter a safepoint. Safepoint initiation time is accounted to GC pause time.
Other suspects are native threads taking the GC lock via JNI (-XX:+PrintJNIGCStalls may help to identify whether this is the case).
Regards, Alexey

lpcd2001 (2013-09-24 06:05):
Very informative article, thank you. We have seen stalls due to IO overload inside Linux. When this happens, GC log entries show user_time ~0, sys_time ~0, and real_time of at least 1 second. We are able to recreate this type of stall in the lab too. It turns out that deferred writes appending to a file can be blocked for a long time when the write is blocked by a journal commit, or when dirty_ratio is exceeded. We straced the Java process and could correlate some, but not all, of the stalls to GC threads writing to the gc.log file. If GC threads do not have to park Java threads running in kernel mode, we are stumped about what else could have caused the stall (where user_time ~0, sys_time ~0). Any other data/traces you would recommend to help us understand the issue better? Many thanks.

Alexey Ragozin (2013-04-29 10:46):
No, I didn't track down the exact bug. A slight change of code solved the issue in my case.
Yep, 2221291 is a nasty one.

Primus (2013-04-29 08:34):
We switched to 1.6.0_43; at the time that happened we had 1.6.0_31. One of the reasons was bug 2221291. Can you tell me the bug ID for the problem related to JIT?

Alexey Ragozin (2013-04-24 19:35):
First, enable safepoint logging: -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1
This will allow you to understand whether a safepoint is the culprit.
The last problem I saw with slow safepoints was a bug in the JIT combined with weird application code.
Trying the latest JVM is another step.

Primus (2013-04-24 13:27):
Thank you for an eye-opening article on safepoints. Do you know if there is any way to identify the reason for a huge pause of hundreds of seconds that does not appear to be related to GC activity?

  Total time for which application threads were stopped: 0.0020916 seconds
  Total time for which application threads were stopped: 0.0677614 seconds
  Total time for which application threads were stopped: 0.0016208 seconds
  Total time for which application threads were stopped: 195.2580105 seconds
  Total time for which application threads were stopped: 0.0313111 seconds
  Total time for which application threads were stopped: 0.0005465 seconds
  Total time for which application threads were stopped: 0.0006269 seconds

Alexey Ragozin (2012-10-10 09:17):
Using page faults for a read barrier ("quickly trap moved/GC'ed addresses") would be prohibitively expensive. The Azul JVM does not use page faults for its read barrier, though it does use this technique for defragmenting the physical memory associated with large objects.
Azul uses custom page mapping to facilitate a software read barrier, but that technique does not rely on page faults.
Or at least it was that way the last time I worked with Azul.

Ashwin Jayaprakash (2012-10-01 19:15):
Nice article Alexey. Do you know what part of the JDK code really does this: "JVM unmaps page with that address provoking page fault on application thread"?
I think the Azul JVM also used to do this to quickly trap moved/GC'ed addresses.