Thursday, June 2, 2011

Understanding GC pauses in JVM, HotSpot's CMS collector.

Concurrent Mark Sweep (CMS) is one of HotSpot JVM low pause garbage collectors. CMS can do most of its work for reclaiming memory concurrently with application (without stopping it). But still it requires few stop-the-world pauses to make its work. This article will explain nature of these pauses and how to minimize them.

Basics of concurrent mark sweep

HotSpot’s CMS is a generational collector, it means that heap is separated into young and old (tenured) space and these spaces are collected independently. For young space collection usual HotSpot’s copy collector is use (see previous article about HotSpot’s young space collector).  To enable of using CMS collector you have to specify ‑XX: +UseConcMarkSweepGC in JVM’s command line.
Concurrent Mark Sweep is used only to collect old space. CMS collection cycle has following phases:
  • Initial mark – this is stop-the-world phase while CMS is collecting root references.
  •  Concurrent mark – this phase is done concurrently with application, garbage collector traverses though object graph in old space marking live objects.
  • Concurrent pre clean – this is another concurrent phase, basically it is another mark phase which will try to account references changed during previous mark phase. Main reason for this phase is reduce time of stop-the-world remark phase.
  • Remark – once concurrent mark is finished, garbage collector need one more stop-the-world pause to account references which have been changed during concurrent mark phase.
  • Concurrent sweep – garbage collector will scan through whole old space and reclaim space occupied by unreachable objects.
  • Concurrent reset – after CMS cycle is finished, some structures have to be reset before next cycle can start.
Unlike most other garbage collectors, CMS does not do compaction of heap space. Instead of moving objects to make unoccupied space continuous, CMS keeps lists of all fragments of free memory. This way CMS is avoiding cost associated with relocating of live objects (and relocating of objects is expensive operation which require stop-the-world pause), but as down size of this heap space is prone to fragmentation. To minimize risk of fragmentation CMS is doing statistical analysis of object’s sizes and have separate free lists for objects of different sizes.

Length of CMS pauses

CMS itself has only two pauses, but your application will also experience pauses of young space collector which is working in conjunction with CMS. See previous article about pauses of young space collector.

Initial mark

During   initial mark CMS should collect all root references to start marking of old space. This includes:
  • References from thread stacks,
  • References from young space.
References from stacks are usually collected very quickly (less than 1ms), but time to collect references from young space depends on size of objects in young space. Normally initial mark starts right after young space collection, so Eden space is empty and only live objects are in one of survivor space. Survivor space is usually small and initial mark after young space collection often takes less than millisecond. But if initial mark is started when Eden is full it may take quite long (usually longer than young space collection itself).
Once CMS collection is triggered, JVM may wait some time for young collection to happen before it will start initial marking. JVM configuration option –XX:CMSWaitDuration=<t> can be used to set how long CMS will wait for young space collection before start of initial marking. If you want to avoid long initial marking pauses, you should configure this time to be longer than typical period of young collections in your application.

Remark

Most of marking is done in parallel with application, but it may not be accurate because application may modify object graph during marking. When concurrent marking is finished; garbage collector should stop application and repeat marking to be sure that all reachable objects marked as alive. But collector doesn’t have to traverse through whole object graph; it should traverse only reference modified since start of marking (actually since start pre clean phase). Card table (see card marking write barrier) is used to identify modified portions of memory in old space, but thread stacks and young space should be scanned once again.
 Usually most time of remark phase is spent of scanning young space. This time will be much shorter if we collect garbage in young space before starting of remark. We can instruct JVM to always force young space collection before CMS remark. Use JVM parameter –XX:+CMSScavengeBeforeRemark to enable this option.
Even is young space is empty, remark phase still have to scan through modified references in old space, this usually takes time close to normal young collection pause (due scanning of old space done during young collection is similar to scanning required for remark).

When CMS collection starts?

Unlike stop-the-world old space collectors, CMS collection cycle should start before old space become full. CMS collection is triggered when amount of free memory in old space falls below certain threshold (this threshold can be chosen by JVM based of runtime statistics or set via parameters) and actual start of CMS collection cycle may be delayed until next young collection.
Normally objects are allocated in old space only during young space collection (which may promote some objects to old space). So CMS cycle usually starts right after young space collection, which is good because init mark pause will be very small.
But in certain cases object may be allocated directly in old space and CMS cycle could start while Eden has lots of objects. In this case initial mark can be 10-100 times slower which is bad. Usually this is happening due to allocation of very large objects (few megabyte arrays).  To avoid these long pauses you should configure reasonable –XX:CMSWaitDuration.

Configuring fixed threshold for CMS start

You can set fixed threshold for olds space occupation for triggering CMS cycle by using JVM options ‑XX:+UseCMSInitiatingOccupancyOnly ‑XX:CMSInitiatingOccupancyFraction=70 (this will force CMS cycle to start when more than 70% of old space is used).

Explicitly invoking CMS cycle

You can also configure JVM to start CMS cycle by invocation of System.gc() by ‑XX:+ExplicitGCInvokesConcurrent command line option.

Full GC with CMS

If CMS cannot free enough in old space, JVM may fallback to compacting collector. Compacting collector will force stop-the-world pause so it can be considered emergency case. Normally you would like to avoid full GC and long stop-the-world pause associated with it. Full GC may happen either if CMS is not fast enough for dealing with garbage (or collection cycle has been started too late) or due to fragmentation of old space (there is no large enough continuous space for object to be allocated). Also it is possible that you just didn’t give JVM enough memory and after full GC it will through OutOfMemoryExpection anyway.

Permanent generation collection

One of reasons why CMS may end up in full GC is garbage in permanent space. By default CMS does not reclaim unused space in permanent space. If your application is using multiple class loaders and/or reflection you may need to enable collecting of garbage in permanent space. JVM option ‑XX:+CMSClassUnloadingEnabled will allow CMS collector to clean permanent space. Remember that objects in permanent space may have references to normal old space thus even if permanent space is not full itself, references from perm to old space may keep some dead objects unreachable for CMS if class unloading is not enabled.

Utilizing multiple cores

CMS has multiple phases. Some of them are concurrent; others are stop-the-world pauses but may be executed in parallel to compressed application freeze time.
‑XX:+CMSConcurrentMTEnabled – allows CMS to use multiple cores for concurrent phase.
‑XX:ConcGCThreads=<n> – specifies number of thread for concurrent phases.
‑XX:ParallelGCThreads=<n> – specifies number of thread for parallel work during stop-the-world pauses (by default it equals to number of physical cores).
‑XX:+UseParNewGC – instructs JVM to use parallel collector for young space collections in conjunction with CMS.

See also

HotSpot JVM garbage collection options cheat sheet 
Java GC, HotSpot's CMS and heap fragmentation
Other articles about garbage collection in this blog

9 comments:

  1. Hi Alexey,

    ‑XX:ConcGCThreads= is not supported by using -XX: command option, instead, it's set programmatically by using -XX:ParallelCMSThreads or -XX:ParallelMarkingThread=

    -Baihao Yuan

    ReplyDelete
  2. Hi,

    Which JVM version are you talking about?

    -XX:ParallelCMSThreads -XX:ParallelMarkingThread
    are not recognized by JVM 1.7 and 1.6 (at least since u24)

    -Alexey

    ReplyDelete
  3. hi, Alexey
    "But in certain cases object may be allocated directly in old space and CMS cycle could start while Eden has lots of objects. " Does it mean that CMS gc will trigger even olds space occupation is less than 70%
    ? ‑XX:+UseCMSInitiatingOccupancyOnly ‑XX:CMSInitiatingOccupancyFraction=70

    ReplyDelete
  4. Direct allocation in old space would increase old space occupation and may exceed threshold. In Java 7 and below, concurrent cycle could be also triggered by allocation in Perm gen, which also has occupancy threshold.

    ReplyDelete
  5. hi, Alexey, thank you for your help . Could you help me to analyze the reason of CMS gc,which is triggered by minor gc ?
    This is my jvm command :
    "CommandLine flags: -XX:CICompilerCount=4 -XX:+CMSClassUnloadingEnabled -XX:CMSFullGCsBeforeCompaction=2 -XX:CMSInitiatingOccupancyFraction=65 -XX:+CMSParallelRemarkEnabled -XX:+CMSScavengeBeforeRemark -XX:CMSWaitDuration=60000 -XX:ErrorFile=/opt/logs/mobile/adsms/adsms.vmerr.log -XX:+ExplicitGCInvokesConcurrent -XX:HeapDumpPath=/opt/logs/mobile/adsms/adsms.heaperr.log -XX:InitialHeapSize=12582912000 -XX:MaxHeapSize=12582912000 -XX:MaxTenuringThreshold=15 -XX:NewRatio=2 -XX:OldPLABSize=16 -XX:+ParallelRefProcEnabled -XX:PermSize=134217728 -XX:PretenureSizeThreshold=309857600 -XX:+PrintClassHistogramAfterFullGC -XX:+PrintClassHistogramBeforeFullGC -XX:+PrintFlagsFinal -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintJNIGCStalls -XX:+PrintReferenceGC -XX:+PrintTenuringDistribution -XX:StringTableSize=30011 -XX:SurvivorRatio=6 -XX:+TieredCompilation -XX:-UseBiasedLocking -XX:+UseCMSCompactAtFullCollection -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC -XX:+UseParNewGC"

    to be continued

    ReplyDelete
  6. this is my gc log :
    "2017-05-13T10:51:22.981+0800: 161814.005: [GC2017-05-13T10:51:22.981+0800: 161814.005: [Class Histogram (before full gc):
    2017-05-13T10:51:25.395+0800: 161816.419: [CMS[YG occupancy: 2702192 K (3584000 K)]2017-05-13T10:51:28.026+0800: 161819.051: [weak refs processing2017-05-13T10:51:28.026+0800: 161819.051: [SoftReference, 21 refs, 0.0008800 secs]2017-05-13T10:51:28.027+0800: 161819.052: [WeakReference, 961 refs, 0.0005750 secs]2017-05-13T10:51:28.028+0800: 161819.052: [FinalReference, 241 refs, 0.0006460 secs]2017-05-13T10:51:28.028+0800: 161819.053: [PhantomReference, 0 refs, 3 refs, 0.0010360 secs]2017-05-13T10:51:28.030+0800: 161819.054: [JNI Weak Reference, 0.0003030 secs], 0.0035570 secs]2017-05-13T10:51:28.030+0800: 161819.054: [class unloading, 0.0266950 secs]2017-05-13T10:51:28.057+0800: 161819.081: [scrub symbol table, 0.0060380 secs]2017-05-13T10:51:28.063+0800: 161819.087: [scrub string table, 0.0009950 secs]: 4142448K->4135557K(8192000K), 3.1914830 secs]2017-05-13T10:51:28.586+0800: 161819.611: [Class Histogram (after full gc):
    Heap after GC invocations=5935 (full 10):
    par new generation total 3584000K, used 2702192K [0x000000050a000000, 0x0000000604000000, 0x0000000604000000)
    eden space 3072000K, 87% used [0x000000050a000000, 0x00000005ae7ad0e8, 0x00000005c5800000)
    from space 512000K, 1% used [0x00000005c5800000, 0x00000005c5f2ef48, 0x00000005e4c00000)
    to space 512000K, 0% used [0x00000005e4c00000, 0x00000005e4c00000, 0x0000000604000000)
    concurrent mark-sweep generation total 8192000K, used 4135557K [0x0000000604000000, 0x00000007f8000000, 0x00000007f8000000)
    concurrent-mark-sweep perm gen total 131072K, used 68094K [0x00000007f8000000, 0x0000000800000000, 0x0000000800000000)
    }
    {Heap before GC invocations=5935 (full 10):
    par new generation total 3584000K, used 2702192K [0x000000050a000000, 0x0000000604000000, 0x0000000604000000)
    eden space 3072000K, 87% used [0x000000050a000000, 0x00000005ae7ad0e8, 0x00000005c5800000)
    from space 512000K, 1% used [0x00000005c5800000, 0x00000005c5f2ef48, 0x00000005e4c00000)
    to space 512000K, 0% used [0x00000005e4c00000, 0x00000005e4c00000, 0x0000000604000000)
    concurrent mark-sweep generation total 8192000K, used 4135557K [0x0000000604000000, 0x00000007f8000000, 0x00000007f8000000)
    concurrent-mark-sweep perm gen total 131072K, used 68094K [0x00000007f8000000, 0x0000000800000000, 0x0000000800000000)
    2017-05-13T10:51:30.778+0800: 161821.803: [Full GC2017-05-13T10:51:30.778+0800: 161821.803: [Class Histogram (before full gc):
    2017-05-13T10:51:33.004+0800: 161824.028: [CMS2017-05-13T10:51:33.822+0800: 161824.846: [SoftReference, 1437 refs, 0.0002190 secs]2017-05-13T10:51:33.822+0800: 161824.847: [WeakReference, 4480 refs, 0.0002930 secs]2017-05-13T10:51:33.822+0800: 161824.847: [FinalReference, 32 refs, 0.0000160 secs]2017-05-13T10:51:33.822+0800: 161824.847: [PhantomReference, 0 refs, 24 refs, 0.0000120 secs]2017-05-13T10:51:33.822+0800: 161824.847: [JNI Weak Reference, 0.0003360 secs]: 4135557K->4137966K(8192000K), 3.8480810 secs]2017-05-13T10:51:36.852+0800: 161827.877: [Class Histogram (after full gc):
    6837749K->4137966K(11776000K), [CMS Perm : 68094K->68052K(131072K)], 6.6932460 secs] [Times: user=6.49 sys=0.00, real=6.70 secs
    "
    CMS threshold is 65%.In my opinion, CMS will be triggered Only if old space occupation is greater than 8192000K* 65%= 5324800K .From the gc log ,when CMS gc was triggered ,old space occupation is 4135557K . What's the reason ? Thank you very much !

    ReplyDelete
  7. You have -XX:+ExplicitGCInvokesConcurrent option enabled. This GC may have been triggered by System.gc(). RMI is common culprit here, do you have -Dsun.rmi.dgc.server.gcInterval option?

    ReplyDelete
  8. No, I don't set the -Dsun.rmi.dgc.server.gcInterval option。Full gc happened once a day and we don't use RMI function。 I suspect The Fullgc was caused by old space fragmentation. Application was allocate a big array (about 1G bytes) when the fullgc happened.

    ReplyDelete
    Replies
    1. Fragmentation would cause a Full GC, but in your logs it looks like concurrent GC. If you are using JMX it is using RMI under hood.

      Delete