Wednesday, July 6, 2011

HotSpot JVM garbage collection options cheat sheet

In this article I have collected a list of options related to GC tuning in the JVM. This is not a comprehensive list; I have only included options that I use in practice (or at least understand why I might want to use them).

HotSpot GC collectors

HotSpot JVM may use one of 6 combinations of garbage collection algorithms listed below.
Young collector                 Old collector                            JVM option
Serial (DefNew)                 Serial Mark-Sweep-Compact                -XX:+UseSerialGC
Parallel scavenge (PSYoungGen)  Serial Mark-Sweep-Compact (PSOldGen)     -XX:+UseParallelGC
Parallel scavenge (PSYoungGen)  Parallel Mark-Sweep-Compact (ParOldGen)  -XX:+UseParallelOldGC
Serial (DefNew)                 Concurrent Mark Sweep                    -XX:+UseConcMarkSweepGC -XX:-UseParNewGC
Parallel (ParNew)               Concurrent Mark Sweep                    -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
G1                              G1                                       -XX:+UseG1GC
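
To check which of the combinations above a running JVM actually picked, one option is to ask the standard java.lang.management API for the registered collectors. This is only a small sketch; the class name ActiveCollectors is made up for illustration, and the names returned are implementation specific (they are not guaranteed to match the labels used in the table).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ActiveCollectors {
    /** Returns the names of the collectors the running JVM reports through JMX. */
    public static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Usually one entry for the young collector and one for the old collector.
        System.out.println(names());
    }
}
```

Running it with different collector options from the table should change the printed names.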

GC logging options

JVM option
Description
General options
-verbose:gc or -XX:+PrintGC
Print basic GC info
-XX:+PrintGCDetails
Print more detailed GC info
-XX:+PrintGCTimeStamps
Print timestamps for each GC event (seconds count from start of JVM)
-Xloggc:<file>
Redirects GC output to file instead of console
-XX:+PrintTenuringDistribution
Print detailed demography of young space after each collection
-XX:+PrintTLAB
Print TLAB allocation statistics
-XX:+PrintGCApplicationStoppedTime
Print pause summary after each stop-the-world pause
-XX:+PrintGCApplicationConcurrentTime
Print how long the application ran between stop-the-world pauses
-XX:+HeapDumpAfterFullGC
Creates heap dump file after full GC
-XX:+HeapDumpBeforeFullGC
Creates heap dump file before full GC
-XX:+HeapDumpOnOutOfMemoryError
Creates heap dump in out-of-memory condition
-XX:HeapDumpPath=<path>
Specifies path to save heap dumps
CMS specific options
-XX:PrintCMSStatistics=<n>
Print additional CMS statistics if n >= 1
-XX:+PrintCMSInitiationStatistics
Print CMS initiation details
-XX:PrintFLSStatistics=2
Print additional info concerning free lists
-XX:PrintFLSCensus=2
Print additional info concerning free lists
-XX:+CMSDumpAtPromotionFailure
Dump useful information about the state of the CMS old generation upon a promotion failure.
-XX:+CMSPrintChunksInDump
In a CMS dump enabled by option above, include more detailed information about the free chunks.
-XX:+CMSPrintObjectsInDump
In a CMS dump enabled by option above, include more detailed information about the allocated objects.

JVM sizing options

JVM option
Description
-Xms<size> -Xmx<size>
or
-XX:InitialHeapSize=<size>
-XX:MaxHeapSize=<size>
Initial and max size of heap space (young space + tenured space). Permanent space does not count toward this size.
-XX:NewSize=<size> 
-XX:MaxNewSize=<size>
Initial and max size of young space.
-XX:NewRatio=<ratio>
Alternative way to specify young space size. Sets the ratio of tenured to young space (e.g. -XX:NewRatio=2 means that young space will be 2 times smaller than tenured space).
-XX:SurvivorRatio=<ratio>
Sets the ratio of Eden space size to the size of a single survivor space (e.g. -XX:NewSize=64m -XX:SurvivorRatio=6 means that each survivor space will be 8m and Eden will be 48m).
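
The arithmetic behind -XX:NewRatio and -XX:SurvivorRatio can be sketched as follows. This is a simplified model, assuming young space consists of Eden plus two survivor spaces and ignoring any alignment or adaptive resizing the JVM applies; the class and method names are hypothetical.

```java
public class YoungSpaceSizing {
    /** Young space implied by NewRatio: tenured space is newRatio times larger than young. */
    public static long youngSize(long heapSize, int newRatio) {
        return heapSize / (newRatio + 1);
    }

    /** One survivor space: young = Eden + 2 survivors and Eden = survivorRatio * survivor. */
    public static long survivorSize(long youngSize, int survivorRatio) {
        return youngSize / (survivorRatio + 2);
    }

    /** Eden is whatever remains of young space after both survivor spaces. */
    public static long edenSize(long youngSize, int survivorRatio) {
        return youngSize - 2 * survivorSize(youngSize, survivorRatio);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long young = 64 * mb; // -XX:NewSize=64m
        System.out.println("survivor = " + survivorSize(young, 6) / mb + "m"); // survivor = 8m
        System.out.println("eden     = " + edenSize(young, 6) / mb + "m");     // eden     = 48m
    }
}
```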
-XX:PermSize=<size> 
-XX:MaxPermSize=<size>
Initial and max size of permanent space.
-Xss<size> or
-XX:ThreadStackSize=<size>
Sets size of stack area dedicated to each thread. Thread stacks do not count toward heap size.
-XX:MaxDirectMemorySize=<value>
Maximum total size of direct (off-heap) buffers that can be allocated through java.nio
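
To see how the sizing options above end up being applied, the memory pools of a running JVM can be listed through the standard java.lang.management API. A minimal sketch, with a made-up class name; pool names differ between collectors and JVM versions, and a max of -1 means the limit is undefined.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class MemoryPools {
    /** One line per memory pool: name, type (heap or non-heap), committed and max bytes. */
    public static String describe() {
        StringBuilder out = new StringBuilder();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            out.append(String.format("%-30s %-16s committed=%d max=%d%n",
                    pool.getName(), pool.getType(), usage.getCommitted(), usage.getMax()));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
        // Upper bound of the Java heap as seen by the application (roughly -Xmx).
        System.out.println("max heap = " + Runtime.getRuntime().maxMemory());
    }
}
```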

Young collection tuning

JVM option
Description
-XX:InitialTenuringThreshold=<n>
Initial value for tenuring threshold (number of collections before object will be promoted to tenured space).
-XX:MaxTenuringThreshold=<n>
Max value for tenuring threshold.
-XX:PretenureSizeThreshold=<size>
Max object size allowed to be allocated in young space (larger objects will be allocated directly in old space). Thread local allocation bypasses this check, so if the TLAB is large enough, an object exceeding the size threshold may still be allocated in young space.
-XX:+AlwaysTenure
Promote all objects surviving young collection immediately to tenured space (equivalent of -XX:MaxTenuringThreshold=0)
-XX:+NeverTenure
Objects from young space will never get promoted to tenured space while survivor space is large enough to keep them.
Thread local allocation blocks
-XX:+UseTLAB
Use thread local allocation blocks in young space. Enabled by default.
-XX:+ResizeTLAB
Allow JVM to adaptively resize TLAB for threads.
-XX:TLABSize=<size>
Initial size of TLAB for thread
-XX:MinTLABSize=<size>
Minimal allowed size of TLAB

CMS tuning options

JVM option
Description
Controlling initial mark phase
-XX:+UseCMSInitiatingOccupancyOnly
Only use occupancy as a criterion for starting a CMS collection.
-XX:CMSInitiatingOccupancyFraction=<n>
Percentage of CMS generation occupancy at which to start a CMS collection cycle. A negative value means that CMSTriggerRatio is used.
-XX:CMSBootstrapOccupancy=<n>
Percentage of CMS generation occupancy at which to initiate CMS collection for bootstrapping collection stats.
-XX:CMSTriggerRatio=<n>
Percentage of MinHeapFreeRatio in CMS generation that is allocated before a CMS collection cycle commences.
-XX:CMSTriggerPermRatio=<n>
Percentage of MinHeapFreeRatio in the CMS perm generation that is allocated before a CMS collection cycle commences, that also collects the perm generation.
-XX:CMSWaitDuration=<timeout>
Once CMS collection is triggered, it will wait for next young collection to perform initial mark right after. This parameter specifies how long CMS can wait for young collection.
Controlling remark phase
-XX:+CMSScavengeBeforeRemark
Force young collection before remark phase.
-XX:CMSScheduleRemarkEdenSizeThreshold=<size>
If Eden used is below this value, don't try to schedule remark
-XX:CMSScheduleRemarkEdenPenetration=<n>
The Eden occupancy % at which to try and schedule remark pause
-XX:CMSScheduleRemarkSamplingRatio=<n>
Start sampling Eden top at least before young generation occupancy reaches 1/n of the size at which we plan to schedule remark
Parallel execution
-XX:+UseParNewGC
Use parallel algorithm for young space collection.
-XX:+CMSConcurrentMTEnabled
Use multiple threads for concurrent phases.
-XX:ConcGCThreads=<n>
Number of parallel threads used for concurrent phase.
-XX:ParallelGCThreads=<n>
Number of parallel threads used for stop-the-world phases.
CMS incremental mode
-XX:+CMSIncrementalMode
Enable incremental CMS mode. Incremental mode is meant for servers with a small number of CPUs.
Miscellaneous options
-XX:+CMSClassUnloadingEnabled
If not enabled, CMS will not clean permanent space. You should always enable it in multiple class loader environments such as JEE or OSGi.
-XX:+ExplicitGCInvokesConcurrent
Let System.gc() trigger concurrent collection instead of full GC.
-XX:+ExplicitGCInvokesConcurrentAndUnloadsClasses
Same as above but also triggers permanent space collection.

Miscellaneous GC options

JVM option
Description
-XX:+DisableExplicitGC
JVM will ignore application calls to System.gc()
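
When in doubt about which value of an option is actually in effect, it can be read back at runtime. The sketch below assumes a HotSpot JVM that exposes com.sun.management.HotSpotDiagnosticMXBean; the class name EffectiveFlags and the choice of flags are only for illustration, and a flag that the running JVM does not know is reported as "n/a".

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class EffectiveFlags {
    /** Returns the current value of a -XX flag, or "n/a" if this JVM does not know it. */
    public static String read(String flagName) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "n/a";
            }
            return bean.getVMOption(flagName).getValue();
        } catch (RuntimeException e) {
            // Unknown flag, or the diagnostic bean is not available on this JVM.
            return "n/a";
        }
    }

    public static void main(String[] args) {
        String[] flags = {"DisableExplicitGC", "MaxTenuringThreshold", "ParallelGCThreads", "UseTLAB"};
        for (String flag : flags) {
            System.out.println(flag + " = " + read(flag));
        }
    }
}
```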

Tuesday, June 28, 2011

How to tame java GC pauses? Surviving 16GiB heap and greater.

Memory is cheap and abundant on modern servers. Unfortunately, there is a serious obstacle to using these memory resources to the full in Java programs: garbage collector pauses are a serious threat for a JVM with a large heap. There are very few good sources of information about practical tuning of Java GC, and unfortunately they seem to be relevant only for heap sizes of 512MiB - 2GiB. Recently I have spent a good amount of time investigating performance of various JVMs with a 32GiB heap. In this article I would like to provide practical guidelines for tuning HotSpot JVM for large heap sizes.

Full article is available at Javalobby http://java.dzone.com/articles/how-tame-java-gc-pauses

Thursday, June 2, 2011

Understanding GC pauses in JVM, HotSpot's CMS collector.

Concurrent Mark Sweep (CMS) is one of the HotSpot JVM’s low-pause garbage collectors. CMS can do most of its memory reclamation work concurrently with the application (without stopping it), but it still requires a few stop-the-world pauses to do its work. This article explains the nature of these pauses and how to minimize them.

Basics of concurrent mark sweep

HotSpot’s CMS is a generational collector: the heap is separated into young and old (tenured) space, and these spaces are collected independently. For young space collection the usual HotSpot copy collector is used (see previous article about HotSpot’s young space collector). To enable the CMS collector you have to specify -XX:+UseConcMarkSweepGC on the JVM command line.
Concurrent Mark Sweep is used only to collect old space. A CMS collection cycle has the following phases:
  • Initial mark – this is a stop-the-world phase during which CMS collects root references.
  • Concurrent mark – this phase is done concurrently with the application; the garbage collector traverses the object graph in old space, marking live objects.
  • Concurrent pre clean – this is another concurrent phase; basically it is another mark phase which tries to account for references changed during the previous mark phase. The main reason for this phase is to reduce the time of the stop-the-world remark phase.
  • Remark – once concurrent mark is finished, the garbage collector needs one more stop-the-world pause to account for references which have been changed during the concurrent mark phase.
  • Concurrent sweep – the garbage collector scans through the whole old space and reclaims space occupied by unreachable objects.
  • Concurrent reset – after the CMS cycle is finished, some structures have to be reset before the next cycle can start.
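
The phases listed above can be written down as a small model that records which of them stop the application. This is purely illustrative (the class CmsCycle and its members are hypothetical names, not JVM APIs); it only restates the list in code.

```java
import java.util.ArrayList;
import java.util.List;

public class CmsCycle {
    /** Phases of one CMS cycle, in the order they run. */
    public enum Phase {
        INITIAL_MARK(true),
        CONCURRENT_MARK(false),
        CONCURRENT_PRECLEAN(false),
        REMARK(true),
        CONCURRENT_SWEEP(false),
        CONCURRENT_RESET(false);

        public final boolean stopTheWorld;

        Phase(boolean stopTheWorld) {
            this.stopTheWorld = stopTheWorld;
        }
    }

    /** The phases during which application threads are paused. */
    public static List<Phase> stopTheWorldPhases() {
        List<Phase> result = new ArrayList<>();
        for (Phase phase : Phase.values()) {
            if (phase.stopTheWorld) {
                result.add(phase);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (Phase phase : Phase.values()) {
            System.out.println(phase + (phase.stopTheWorld ? " (stop-the-world)" : " (concurrent)"));
        }
    }
}
```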
Unlike most other garbage collectors, CMS does not compact the heap. Instead of moving objects to make free space contiguous, CMS keeps lists of all fragments of free memory. This way CMS avoids the cost of relocating live objects (relocating objects is an expensive operation which requires a stop-the-world pause), but the downside is that old space is prone to fragmentation. To minimize the risk of fragmentation, CMS performs statistical analysis of object sizes and keeps separate free lists for objects of different sizes.
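
To make the fragmentation problem concrete, here is a toy model of a non-compacting space that tracks free chunks in per-size free lists. It is not how CMS is implemented (the real data structures are considerably more elaborate, and this sketch never merges neighbouring chunks); all names are invented for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;

public class FreeListSketch {
    // chunk size -> start addresses of free chunks of exactly that size
    private final TreeMap<Integer, Deque<Integer>> freeLists = new TreeMap<>();

    /** Returns a chunk to the free lists; neighbouring chunks are not merged. */
    public void free(int address, int size) {
        freeLists.computeIfAbsent(size, s -> new ArrayDeque<>()).push(address);
    }

    /** Takes the smallest free chunk that fits, or returns -1 if no chunk is large enough. */
    public int allocate(int size) {
        Map.Entry<Integer, Deque<Integer>> entry = freeLists.ceilingEntry(size);
        if (entry == null) {
            return -1;
        }
        int chunkSize = entry.getKey();
        Deque<Integer> list = entry.getValue();
        int address = list.pop();
        if (list.isEmpty()) {
            freeLists.remove(chunkSize);
        }
        if (chunkSize > size) {
            free(address + size, chunkSize - size); // keep the unused tail as a smaller chunk
        }
        return address;
    }

    /** Total free memory across all lists. */
    public int totalFree() {
        int total = 0;
        for (Map.Entry<Integer, Deque<Integer>> e : freeLists.entrySet()) {
            total += e.getKey() * e.getValue().size();
        }
        return total;
    }

    public static void main(String[] args) {
        FreeListSketch space = new FreeListSketch();
        space.free(0, 4);
        space.free(8, 4);
        space.free(16, 4);
        // 12 units are free in total, but no single chunk can hold 8 units.
        System.out.println("free = " + space.totalFree() + ", allocate(8) = " + space.allocate(8));
    }
}
```

The main method shows the failure mode discussed later in this article: enough free memory in total, but no contiguous chunk large enough for the request.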

Length of CMS pauses

CMS itself has only two pauses, but your application will also experience pauses of young space collector which is working in conjunction with CMS. See previous article about pauses of young space collector.

Initial mark

During initial mark, CMS must collect all root references to start marking old space. This includes:
  • References from thread stacks,
  • References from young space.
References from stacks are usually collected very quickly (less than 1ms), but the time to collect references from young space depends on the amount of objects in young space. Normally initial mark starts right after a young space collection, so Eden is empty and the only live objects are in one of the survivor spaces. Survivor space is usually small, so initial mark after a young space collection often takes less than a millisecond. But if initial mark is started when Eden is full, it may take quite long (usually longer than the young space collection itself).
Once a CMS collection is triggered, the JVM may wait some time for a young collection to happen before it starts initial marking. The JVM option -XX:CMSWaitDuration=<t> can be used to set how long CMS will wait for a young space collection before starting initial marking. If you want to avoid long initial marking pauses, you should configure this time to be longer than the typical period between young collections in your application.
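
One rough way to estimate the typical period between young collections is to divide JVM uptime by the collection count reported through JMX. The sketch below does this for every registered collector; the class name is made up, which collector is the young one has to be judged from its name, and the result is only an average over the whole run. I am assuming here that -XX:CMSWaitDuration is specified in milliseconds, so the numbers are directly comparable.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class YoungGcInterval {
    /** Average time between collections, or -1 if no collection has been counted yet. */
    public static long averageIntervalMillis(long uptimeMillis, long collectionCount) {
        return collectionCount <= 0 ? -1 : uptimeMillis / collectionCount;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long interval = averageIntervalMillis(uptime, gc.getCollectionCount());
            System.out.println(gc.getName() + ": " + interval + " ms between collections on average");
        }
    }
}
```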

Remark

Most of the marking is done in parallel with the application, but it may not be accurate because the application may modify the object graph during marking. When concurrent marking is finished, the garbage collector has to stop the application and repeat marking to be sure that all reachable objects are marked as alive. But the collector doesn’t have to traverse the whole object graph; it only has to traverse references modified since the start of marking (actually since the start of the pre clean phase). The card table (see card marking write barrier) is used to identify modified portions of memory in old space, but thread stacks and young space have to be scanned once again.
Usually most of the remark phase is spent scanning young space. This time will be much shorter if we collect garbage in young space before starting remark. We can instruct the JVM to always force a young space collection before CMS remark. Use the JVM parameter -XX:+CMSScavengeBeforeRemark to enable this option.
Even if young space is empty, the remark phase still has to scan through modified references in old space. This usually takes time close to a normal young collection pause (because the scanning of old space done during a young collection is similar to the scanning required for remark).
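
The idea of the card table mentioned above can be illustrated with a toy model: memory is divided into fixed-size cards, every reference write marks its card as dirty, and remark only needs to rescan dirty cards. The 512-byte card size is my assumption about HotSpot, and the class is a simplification for illustration, not the JVM's actual write barrier.

```java
public class CardTableSketch {
    private static final int CARD_SHIFT = 9; // 512-byte cards (assumed)
    private final boolean[] dirty;

    public CardTableSketch(long spaceBytes) {
        long cardSize = 1L << CARD_SHIFT;
        dirty = new boolean[(int) ((spaceBytes + cardSize - 1) >>> CARD_SHIFT)];
    }

    /** Write barrier: called whenever a reference field at the given offset is updated. */
    public void recordWrite(long offset) {
        dirty[(int) (offset >>> CARD_SHIFT)] = true;
    }

    public boolean isDirty(int card) {
        return dirty[card];
    }

    /** Number of cards that would have to be rescanned. */
    public int dirtyCards() {
        int count = 0;
        for (boolean d : dirty) {
            if (d) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        CardTableSketch table = new CardTableSketch(4096);
        table.recordWrite(1024);
        table.recordWrite(1030); // same card as the previous write
        System.out.println("cards to rescan: " + table.dirtyCards()); // cards to rescan: 1
    }
}
```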

When does CMS collection start?

Unlike stop-the-world old space collectors, a CMS collection cycle should start before old space becomes full. CMS collection is triggered when the amount of free memory in old space falls below a certain threshold (this threshold can be chosen by the JVM based on runtime statistics or set via parameters), and the actual start of the CMS collection cycle may be delayed until the next young collection.
Normally objects are allocated in old space only during young space collection (which may promote some objects to old space). So a CMS cycle usually starts right after a young space collection, which is good because the initial mark pause will be very short.
But in certain cases an object may be allocated directly in old space, and the CMS cycle could start while Eden holds lots of objects. In this case initial mark can be 10-100 times slower, which is bad. Usually this happens due to allocation of very large objects (arrays of a few megabytes). To avoid these long pauses you should configure a reasonable -XX:CMSWaitDuration.

Configuring fixed threshold for CMS start

You can set a fixed old space occupancy threshold for triggering a CMS cycle by using the JVM options -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 (this will force the CMS cycle to start when more than 70% of old space is used).
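
The fixed-threshold rule boils down to a simple comparison, sketched below. This only restates the rule from the text ("more than 70% of old space is used"); the class and method names are hypothetical and the JVM's real trigger logic takes more factors into account.

```java
public class CmsTrigger {
    /** True when old space usage exceeds the given occupancy percentage. */
    public static boolean shouldStartCycle(long usedBytes, long capacityBytes, int occupancyPercent) {
        // Compare used/capacity with percent/100 without floating point.
        return usedBytes * 100 > capacityBytes * (long) occupancyPercent;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024L * 1024L;
        // Old space of 10 GiB with -XX:CMSInitiatingOccupancyFraction=70
        System.out.println(shouldStartCycle(6 * gb, 10 * gb, 70)); // false
        System.out.println(shouldStartCycle(8 * gb, 10 * gb, 70)); // true
    }
}
```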

Explicitly invoking CMS cycle

You can also configure the JVM to start a CMS cycle when System.gc() is invoked, using the -XX:+ExplicitGCInvokesConcurrent command line option.

Full GC with CMS

If CMS cannot free enough memory in old space, the JVM may fall back to a compacting collector. The compacting collector forces a stop-the-world pause, so it can be considered an emergency case. Normally you would like to avoid full GC and the long stop-the-world pause associated with it. Full GC may happen either if CMS is not fast enough at dealing with garbage (or the collection cycle has been started too late) or due to fragmentation of old space (there is no contiguous free chunk large enough for the object being allocated). It is also possible that you just didn’t give the JVM enough memory, and after full GC it will throw OutOfMemoryError anyway.

Permanent generation collection

One of the reasons why CMS may end up in full GC is garbage in permanent space. By default CMS does not reclaim unused space in permanent space. If your application is using multiple class loaders and/or reflection, you may need to enable collection of garbage in permanent space. The JVM option -XX:+CMSClassUnloadingEnabled will allow the CMS collector to clean permanent space. Remember that objects in permanent space may hold references to objects in old space; thus, even if permanent space is not full itself, references from perm to old space may prevent CMS from reclaiming some dead objects if class unloading is not enabled.

Utilizing multiple cores

CMS has multiple phases. Some of them are concurrent; others are stop-the-world pauses but may be executed in parallel to reduce application freeze time.
-XX:+CMSConcurrentMTEnabled – allows CMS to use multiple cores for concurrent phases.
-XX:ConcGCThreads=<n> – specifies the number of threads for concurrent phases.
-XX:ParallelGCThreads=<n> – specifies the number of threads for parallel work during stop-the-world pauses (by default it is derived from the number of available CPUs).
-XX:+UseParNewGC – instructs the JVM to use the parallel collector for young space collections in conjunction with CMS.

See also

HotSpot JVM garbage collection options cheat sheet 
Java GC, HotSpot's CMS and heap fragmentation
Other articles about garbage collection in this blog