My patch mentioned in this post (RFE-7068625) for the JVM garbage collector was accepted into the HotSpot JDK code base and is available starting from version 7u40 of the HotSpot JVM from Oracle.
This was a reason for me to redo some of my GC benchmarking experiments. I have already mentioned ParGCCardsPerStrideChunk in the article related to the patch. This time, I decided to study the effect of this option more closely.
The parallel copy collector (ParNew), responsible for young collections in CMS, uses the ParGCCardsPerStrideChunk value to control the granularity of tasks distributed between worker threads. The old space is broken into strides of equal size, and each worker is responsible for processing a subset of strides (finding dirty cards, finding old-to-young references, copying young objects, etc.). The time to process each stride may vary greatly, so workers may steal work from each other. For that reason, the number of strides should be greater than the number of workers.
By default, ParGCCardsPerStrideChunk = 256 (a card is 512 bytes, so that is 128KiB of heap space per stride), which means that a 28GiB heap would be broken into 224 thousand strides. Given that the number of parallel GC threads is usually 4 orders of magnitude smaller, this is probably too many.
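The stride arithmetic above can be checked with a quick calculation (a sketch, assuming the default 512-byte HotSpot card size):

```shell
heap_bytes=$((28 * 1024 * 1024 * 1024))           # 28 GiB heap
card_bytes=512                                     # HotSpot card size
cards_per_stride=256                               # default ParGCCardsPerStrideChunk
stride_bytes=$((card_bytes * cards_per_stride))    # 128 KiB per stride
strides=$((heap_bytes / stride_bytes))
echo "$strides strides"                            # 229376, i.e. ~224 thousand
```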
Synthetic benchmark
First, I have rerun the GC benchmark from the previous article using 2k, 4k and 8k for this option. HotSpot JVM 7u3 was used in the experiment.
It seems that the default value (256 cards per stride) is too small even for moderately sized heaps. I decided to continue my experiments with a stride size of 4k, as it shows the most consistent improvement across the whole range of heap sizes.
The benchmark above is synthetic and very simple. The next step is to choose a more realistic use case. As usual, my choice is to use an Oracle Coherence storage node as my guinea pig.
Benchmarking Coherence storage node
In this experiment I'm filling a cache node with objects (about 70% of old space filled with live objects), then putting it under a mixed read/write load and measuring young GC pauses of the JVM. The experiment was conducted with two different heap sizes (28 GiB and 14 GiB); the young space in both cases was limited to 128MiB, and compressed pointers were enabled.
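The setup described above corresponds roughly to flags like these (a sketch, not the exact command used in the experiment):

```shell
# 28 GiB heap, 128 MiB young generation, CMS collector, compressed oops
java -Xms28g -Xmx28g \
     -Xmn128m \
     -XX:+UseConcMarkSweepGC \
     -XX:+UseCompressedOops \
     ...
```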
Coherence node with 28GiB of heap
| JVM                          | Avg. pause (s) | Improvement |
| 7u3                          | 0.0697         | 0           |
| 7u3, stride=4k               | 0.045          | 35.4%       |
| Patched OpenJDK 7            | 0.0546         | 21.7%       |
| Patched OpenJDK 7, stride=4k | 0.0284         | 59.3%       |
Coherence node with 14GiB of heap
| JVM            | Avg. pause (s) | Improvement |
| 7u3            | 0.05           | 0           |
| 7u3, stride=4k | 0.0322         | 35.6%       |
This test is close enough to a real-life Coherence work profile, and such an improvement in GC pause time has practical importance. I have also included a JVM built from OpenJDK trunk with the RFE-7068625 patch enabled in the 28 GiB test; as expected, the effect of the patch is cumulative with stride size tuning.
Stock JVMs from Oracle are supported
The good news is that you do not have to wait for the next version of the JVM: the ParGCCardsPerStrideChunk option is available in all Java 7 HotSpot JVMs and the most recent Java 6 JVMs. But this option is classified as diagnostic, so you have to unlock diagnostic options to use it.
-XX:+UnlockDiagnosticVMOptions
-XX:ParGCCardsPerStrideChunk=4096
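A full launch line with these flags might look as follows (MyApp is a placeholder for your own main class; note that -XX:+UnlockDiagnosticVMOptions must precede the diagnostic option on the command line):

```shell
# Unlock diagnostic options, then set the stride size to 4096 cards
# (2 MiB of heap per stride with the default 512-byte card size)
java -XX:+UnlockDiagnosticVMOptions \
     -XX:ParGCCardsPerStrideChunk=4096 \
     -XX:+UseConcMarkSweepGC \
     MyApp
```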
This is great. I assume you will post your findings in the OpenJDK mailing list?
Best,
Ismael
Well, as a part of the patch review, the JDK guys did some testing with different stride sizes. So this should not be news to them.
Why not make 4096 the default when committing your patch, then?
DeleteWhy would total heap size matter if you constrain young gen to 128MB?
This is counter-intuitive, but young pause time depends on old space size.
One reason is explained here (http://blog.ragozin.info/2011/06/understanding-gc-pauses-in-jvm-hotspots.html).
Another reason is reduced cache locality of various internal JVM data structures.
That makes sense. Good explanation, thanks!
Hi Alexey,
Yesterday I tested 7u5 on Solaris 10/SPARC with ParGCCardsPerStrideChunk=2048 and found that newgen pauses could be decreased by ~20% with the proposed options (footprint ~1.3G).
| Config (stride) | Avg. pause diff, % |
| 4096            | -22.03             |
| 2048            | -23.81             |
| 1024            | -17.05             |
Hi Alexey,
First, many thanks for your great blog - very useful indeed!
According to Oracle, the patch has been merged into hs24. Am I correct in thinking this translates into JDK 1.7u12 or higher?
Thanks,
Dany
I'm still tracking it. If 1.7u12 is going to be based on hs24, the patch is likely to be there.
I did not quite understand, as I did not have enough time or background, but are you saying that for large heaps (> 15 GB in our case) these options are guaranteed to improve GC pauses? How do you choose the stride size, and what is the tradeoff?
java -version
java version "1.7.0_21"
Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)
In theory, if the stride size is too big, work distribution could be uneven, reducing effective parallelism.
In practice, 4k seems good enough; increasing it yields only marginal gains.
Now that we are planning a 300+ GiB heap deployment, I plan to try an 8k stride size for that case.
BTW, upgrading to 1.7.0_40 should yield better results, as it includes the patch mentioned in the post.
Awesome, thanks - upgrading my JDK now.