Term Stop-the-World pause is usually associated with garbage collection. Indeed GC is a major contributor to STW pauses, but not the only one.
In HotSpot JVM Stop-the-World pause mechanism is called safepoint. During safepoint all threads running java code are suspended. Threads running native code may continue to run as long as they do not interact with JVM (attempt to access Java objects via JNI, call Java method or return from native to java, will suspend thread until end of safepoint).
Stopping all threads are required to ensure what safepoint initiator have exclusive access to JVM data structures and can do crazy things like moving objects in heap or replacing code of method which is currently running (On-Stack-Replacement).
How safepoints work?
Safepoint protocol in HotSpot JVM is collaborative. Each application thread checks safepoint status and park itself in safe state in safepoint is required.
For compiled code, JIT inserts safepoint checks in code at certain points (usually, after return from calls or at back jump of loop). For interpreted code, JVM have two byte code dispatch tables and if safepoint is required, JVM switches tables to enable safepoint check.
Safepoint status check itself is implemented in very cunning way. Normal memory variable check would require expensive memory barriers. Though, safepoint check is implemented as memory reads a barrier. Then safepoint is required, JVM unmaps page with that address provoking page fault on application thread (which is handled by JVM’s handler). This way, HotSpot maintains its JITed code CPU pipeline friendly, yet ensures correct memory semantic (page unmap is forcing memory barrier to processing cores).
When safepoints are used?
Below are few reasons for HotSpot JVM to initiate a safepoint:
- Garbage collection pauses
- Code deoptimization
- Flushing code cache
- Class redefinition (e.g. hot swap or instrumentation)
- Biased lock revocation
- Various debug operation (e.g. deadlock check or stacktrace dump)
Trouble shooting safepoints
Normally safepoints just work. Thus, you can care less about them (most of them, except GC ones, are extremely quick). But if something can break it will break eventually, so here is useful diagnostic:
- -XX:+PrintGCApplicationStoppedTime – this will actually report pause time for all safepoints (GC related or not). Unfortunately output from this option lacks timestamps, but it is still useful to narrow down problem to safepoints.
- -XX:+PrintSafepointStatistics –XX:PrintSafepointStatisticsCount=1 – this two options will force JVM to report reason and timings after each safepoint (it will be reported to stdout, not GC log).
· How does JVM handle locks – quick info about biased locking