Thursday, October 20, 2011

CMS heap fragmentation. Follow up 1

I recently published an article, Java GC, HotSpot's CMS and heap fragmentation. The post has led to a number of interesting discussions, and here I want to present some hard numbers from my experiments to back up those discussions.

Experiment details

Application

The test application is synthetic. It allocates byte arrays of random size and eventually releases them (following a fairly complex pattern). Though the application is rather simple, it shows heap fragmentation symptoms similar to those of the real application. The application has 2 threads, each consuming 100% of a CPU core. The object allocation sequence is deterministic, so I use the number of young collections as the time axis on the diagrams.
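To give an idea of the allocation pattern, here is a minimal sketch of such an allocator (this is not the actual test code; the class name, slot count and array sizes are illustrative assumptions, not the parameters used in the study):

import java.util.Random;

public class FragmentationStress {

    // Illustrative parameters: chosen so the total live set stays roughly around 1 GiB
    static final int SLOTS = 65000;        // live arrays per thread
    static final int MIN_SIZE = 1024;      // 1 KiB
    static final int MAX_SIZE = 16 * 1024; // 16 KiB

    public static void main(String[] args) {
        for (int t = 0; t < 2; t++) {      // 2 busy threads, as in the experiment
            new Thread(new Runnable() {
                public void run() {
                    churn(new Random(42)); // fixed seed keeps each thread deterministic
                }
            }).start();
        }
    }

    static void churn(Random rnd) {
        byte[][] live = new byte[SLOTS][];
        while (true) {
            // Overwrite a random slot: the old array becomes garbage while its
            // replacement has a different size, which gradually fragments old space.
            int slot = rnd.nextInt(SLOTS);
            live[slot] = new byte[MIN_SIZE + rnd.nextInt(MAX_SIZE - MIN_SIZE)];
        }
    }
}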

JVM settings

A 64-bit HotSpot JVM was used in the experiments. The baseline configuration is below:
-server
-Xmx1700m -Xms1700m -Xmn300m
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-XX:+UseConcMarkSweepGC
-XX:-UseCompressedOops
-XX:+PrintPromotionFailure
-XX:CMSInitiatingOccupancyFraction=30
-XX:+UseCMSInitiatingOccupancyOnly
-XX:ParallelGCThreads=2
-XX:ConcGCThreads=1
-XX:PrintFLSStatistics=1
The JVM process was limited to 2 cores (using taskset) to simulate high CPU load (the real application consumes all cores at peak load, but for the test application I wanted to keep the thread count small so that its behavior stays deterministic and reproducible).
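For reference, pinning the JVM to two cores with taskset looks roughly like this (core IDs and the main class placeholder are illustrative):
taskset -c 0,1 java -server -Xmx1700m -Xms1700m -Xmn300m <remaining options above> <main class>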
The live data set of the application is about 1GiB (which leaves about 400MiB of old space as headroom for the CMS collector). These numbers are scaled down from the real application setup.
For test runs with a reduced old PLAB size, the following options were added to the JVM:
-XX:OldPLABSize=16
-XX:-ResizeOldPLAB

Diagrams

The diagrams show the maximum free chunk size and the total amount of free memory in old space on the Y axis (in heap words; 1 heap word = 8 bytes). The X axis shows the number of young collections since JVM start. These numbers come from the free list space (FLS) statistics printed into the GC log by -XX:PrintFLSStatistics=1.
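The numbers were pulled out of the GC log with a hand-made script (see the comments below). That script is not reproduced here; the following is only a minimal parsing sketch, assuming the log contains the usual "Total Free Space:" and "Max Chunk Size:" lines produced by -XX:PrintFLSStatistics=1 (check the exact labels against your own logs):

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlsStatsExtractor {

    // Assumed labels from -XX:PrintFLSStatistics=1 output (values are in heap words);
    // statistics blocks may be printed more than once per collection, so a real
    // parser also has to track which generation and GC phase they belong to.
    private static final Pattern TOTAL_FREE = Pattern.compile("Total Free Space:\\s*(\\d+)");
    private static final Pattern MAX_CHUNK  = Pattern.compile("Max\\s+Chunk Size:\\s*(\\d+)");

    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        long totalFree = -1;
        String line;
        while ((line = in.readLine()) != null) {
            Matcher t = TOTAL_FREE.matcher(line);
            if (t.find()) {
                totalFree = Long.parseLong(t.group(1));
                continue;
            }
            Matcher m = MAX_CHUNK.matcher(line);
            if (m.find() && totalFree >= 0) {
                // emit CSV: total free space, max chunk size (both in heap words)
                System.out.println(totalFree + "," + m.group(1));
                totalFree = -1;
            }
        }
        in.close();
    }
}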

Fragmentation in different HotSpot revisions


As you can see, something is really broken in HotSpot 6u24. Upgrading to 6u25 or above is a good idea; see CR-6999988 for more details.
In any case, in both revisions, reducing the old PLAB size considerably reduces fragmentation.

Adding more memory

Let's see how adding more memory to the JVM affects fragmentation. In the next experiment the heap size is increased by 200 MiB without increasing the live data set.
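Assuming the extra 200 MiB goes entirely to old space (with the young generation kept at 300 MiB), the sizing for this run would be something like:
-Xmx1900m -Xms1900m -Xmn300m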

As you can see, increasing the CMS headroom drastically decreases fragmentation (though the maximum available chunk size still shrinks slowly over time).

Adding more CPU cores

As I said before, the test application consumes all available cores, so the concurrent GC thread has to compete with the application for CPU cycles. In this test, I increased the number of cores available to the JVM process to three. No other parameters of the experiment were changed.

The results are a little surprising. Even though the average amount of free memory is lower in the 3-core test than in the 2-core test, the fragmentation issue goes away!
In other words, if your application is starved of CPU, that may lead to increased fragmentation of CMS space and an eventual Full GC freeze, even though the amount of free memory reported in GC logs and via JMX shows no sign of a problem.
I really wish CMS free list space statistics were exposed via JMX. As things stand, I have to reconsider my baseline JVM health monitoring approach for production environments :)
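For illustration, here is a minimal sketch of what the standard JMX memory pool MBeans can tell you about the old generation today (the "Old Gen" name check is an assumption; HotSpot typically reports the CMS old space as "CMS Old Gen"). Note what is missing: there is no maximum-free-chunk metric, so fragmentation stays invisible at this level.

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class OldGenProbe {
    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // HotSpot usually names the CMS old space "CMS Old Gen"
            if (pool.getName().contains("Old Gen")) {
                MemoryUsage u = pool.getUsage();
                long free = u.getCommitted() - u.getUsed();
                System.out.println(pool.getName() + ": used=" + u.getUsed()
                        + " committed=" + u.getCommitted() + " free=" + free);
                // JMX reports how much old space is free, but not how fragmented
                // it is (no max chunk size), so a looming promotion failure
                // can hide behind a healthy-looking free memory figure.
            }
        }
    }
}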

Conclusions

  • HotSpot releases prior to 6u25 have a severe issue leading to increased fragmentation.
  • Increasing CMS headroom (the gap between the live data set and the old space size) drastically reduces fragmentation.
  • CPU shortage may lead to memory fragmentation; this problem cannot be detected by normal monitoring (only by parsing GC logs produced with the options above).
  • Limiting OldPLABSize showed less fragmentation than the default PLAB settings in all experiments (though it has a negative performance impact).
Though the synthetic CMS fragmentation tests have produced some interesting data, they are still just synthetic tests. Real systems are much less predictable and, unfortunately, not so easy to test.
So my main advice to everyone who has to deal with GC issues is: use thoughtful monitoring in your production environment.

17 comments:

  1. Hello, thank you very much for sharing this. But is there anything wrong with the titles of the graphs in the "Adding more memory" and "Adding more CPU cores" chapters? Should they say JDK6u24 rather than JDK6u27?


    1. "Live data set of application is about 1GiB (which leaves about 400MiB in old space as head room for CMS collector)"
      Doesn't this mean the parameter should be set to
      -XX:CMSInitiatingOccupancyFraction=70 rather than 30?

  2. "Adding more memory" and "Adding more CPU cores" both use JDK6u27.

    For JDK6u24 the best remedy is to upgrade to u27. Fortunately I can do this, so I was interested in improving the JDK6u27 baseline.

    For each case I'm showing 2 diagrams.
    First - comparing JDK6u27 with default PLAB size vs. reduced PLAB.
    Second - comparing the baseline JDK6u27 with default PLAB size (from the first experiment) vs. the same JDK6u27 with default PLAB size but with new resources (more CPU or larger -Xmx).

    This may be a little confusing, but it shows you
    - the benefit of adding a resource vs. not adding it
    - the benefit of reducing PLAB under the new resource constraints

  3. What tool do you use to graph the gc log results?

  4. A hand-made script in Java to extract the numbers from the GC logs, and Excel for the charts.

    1. I am interested in the scripts you used to extract the numbers... if they are open to share :)

    2. Take a look at the following 2 files:
      https://gridkit.googlecode.com/svn/branches/aragozin-sandbox/src/main/java/org/gridkit/litter/fragmenter/GcLogParser.java
      https://gridkit.googlecode.com/svn/branches/aragozin-sandbox/src/main/java/org/gridkit/litter/processing/GCFrag.java
      Unfortunately I cannot remember the exact setup I was using for this study.

  5. Hi, Alexey
    Could you please advise - I see unexpected continuous CMS collections (JDK 1.6.17). I can't explain them, as they start far before the configured CMSInitiatingOccupancyFraction (UseCMSInitiatingOccupancyOnly is set). Could it be related to heap fragmentation?

    1. Heap fragmentation may cause a stop-the-world GC. You are talking about concurrent CMS collections, right?
      GC logs would help anyway.

    2. Yes, I'm talking about usual CMS collections; they just go one after another continuously. I just have no idea what the cause could be. I will try verbose GC logging to gather something more concrete.

      By the way - thank you very much for the useful articles.

    3. Hi, you could try the gcrep tool from my last article. It can connect to a live JVM and dump useful GC information via JMX.
      The jar is available at http://code.google.com/p/gridkit/downloads/list

  6. Hi Alexey
    I have tuned my JVM as I had concurrent mode failures and the performance of the app was heavily impacted.

    Now the performance issue has gone away, but I can still see fragmentation in the max chunk size from the FLS stats.

    I am using the GC settings below:
    -verbose:gc -XX:+PrintGCDetails
    -XX:+PrintGCTimeStamps
    -XX:+PrintTenuringDistribution
    -XX:MaxPermSize=512m
    -XX:NewSize=512m
    -XX:+UseConcMarkSweepGC
    -XX:PrintFLSStatistics=1
    -XX:ParallelGCThreads=2
    -XX:CMSInitiatingOccupancyFraction=65
    -XX:+UseCMSInitiatingOccupancyOnly

    With this tuning, promotion failure is still happening once a week. Do I need to consider PLAB resizing, or is there a better way to reduce the fragmentation?

    regards

    Azum

    1. Hi,

      A week is long enough. I would consider restarting your app once a week.

      As you can see in the article, the best way to fight fragmentation is to add more memory.

      Reducing PLAB might help, but it may also contribute to longer young GC pauses.

      Regards,
      Alexey

    2. I'm curious why you advocate a full application restart. Couldn't a simple cron job, run once a week, that does something like:

      jmap -histo:live {pid of app}

      do the trick? This should trigger a full GC (not a CMS GC), which will defragment the heap.

    3. Yes, jmap -histo:live will do the trick, but ...

      - besides fragmentation, the application may have leaks, libraries may have leaks, the JVM may have bugs, the malloc heap gets fragmented, etc.
      - sometimes I have observed another Full GC a few minutes after the forced GC; I haven't investigated why it happens

      So if service disruption is imminent anyway, why not refresh the whole JVM? (Startup will take longer than a full GC, but you should have failover in place anyway.)

  7. Thanks Alexey - I will try to get the memory added.

    Very useful article - many thanks.

    regards
    Azum
