Wednesday, June 6, 2012

Story of relentless CMS

Recently, a comment on another article on this blog led me to a noteworthy issue with the CMS (Concurrent Mark Sweep) garbage collector.
The problem appeared after a minor release of an application. The JVM was configured to use CMS and had been working fine, but after the change its behavior was different. Normally CMS starts a collection cycle when old space usage reaches a certain threshold, so you see the familiar sawtooth pattern of heap usage.
Most of the time CMS stays idle, only occasionally running a collection. But after the release, the heap usage diagram changed to something like this.

For some reason, CMS no longer waited for the heap usage threshold and ran one old space GC cycle right after another. The heap usage diagram may not look bad by itself, but continuous background GC means that at least one core is constantly busy marking and sweeping old space. In addition, sweeping through memory hurts CPU cache utilization, which degrades performance further.
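A continuously running CMS is easy to overlook in a heap chart, but it shows up clearly in the collector counters. The following Java sketch samples the standard GarbageCollectorMXBean counters of the JVM it runs in, so it would have to run inside the affected application (for example from a background thread). The class name GcActivityMonitor and the 5-second interval are hypothetical choices. On a HotSpot JVM with CMS, the old space collector is usually reported as ConcurrentMarkSweep; how a single concurrent cycle maps onto the count may differ between JVM versions, so treat the numbers as a relative signal rather than an exact cycle count.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

/**
 * Samples cumulative collection counts of every collector the JVM exposes.
 * A count that keeps rising between samples while the application is quiet
 * hints at back-to-back old space cycles.
 */
final class GcActivityMonitor {

    /** Returns collector name -> cumulative collection count at this moment. */
    static Map<String, Long> snapshotCounts() {
        Map<String, Long> counts = new HashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() returns -1 if the count is undefined for this collector.
            counts.put(gc.getName(), gc.getCollectionCount());
        }
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        long intervalMillis = 5_000; // hypothetical sampling interval
        int samples = args.length > 0 ? Integer.parseInt(args[0]) : 3;
        Map<String, Long> previous = snapshotCounts();
        for (int i = 0; i < samples; i++) {
            Thread.sleep(intervalMillis);
            Map<String, Long> current = snapshotCounts();
            for (Map.Entry<String, Long> e : current.entrySet()) {
                if (e.getValue() < 0) {
                    continue; // count not available for this collector
                }
                long delta = e.getValue() - previous.getOrDefault(e.getKey(), 0L);
                System.out.printf("%s: +%d collections in the last %d ms%n",
                        e.getKey(), delta, intervalMillis);
            }
            previous = current;
        }
    }
}
```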
My first guess, that the -XX:+UseCMSInitiatingOccupancyOnly flag was missing, was wrong (without that flag, the JVM may adjust the CMS initiation threshold at runtime according to internal heuristics); the CMS setup was fine.
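To double-check a setup like this, a running application can list the flags it was started with. A minimal sketch, assuming the flags were passed explicitly on the command line: values left at their defaults do not appear in RuntimeMXBean.getInputArguments(). The helper cmsRelatedFlags is hypothetical and uses a plain substring match. Note that -XX:+UseCMSInitiatingOccupancyOnly is normally paired with -XX:CMSInitiatingOccupancyFraction=percent, which sets the threshold itself.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class CmsFlagCheck {

    /** Keeps only arguments that look related to CMS or permanent space sizing. */
    static List<String> cmsRelatedFlags(List<String> jvmArgs) {
        List<String> result = new ArrayList<>();
        for (String arg : jvmArgs) {
            // Substring match: "PermSize" also catches -XX:MaxPermSize.
            if (arg.contains("CMS") || arg.contains("ConcMarkSweep") || arg.contains("PermSize")) {
                result.add(arg);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Arguments passed to this JVM; defaults that were never set explicitly are absent.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        List<String> cmsFlags = cmsRelatedFlags(jvmArgs);
        if (cmsFlags.isEmpty()) {
            System.out.println("No CMS-related flags were passed explicitly.");
        }
        for (String flag : cmsFlags) {
            System.out.println(flag);
        }
        boolean occupancyOnly = jvmArgs.contains("-XX:+UseCMSInitiatingOccupancyOnly");
        System.out.println("UseCMSInitiatingOccupancyOnly explicitly enabled: " + occupancyOnly);
    }
}
```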
After scanning through the options, the -XX:+CMSClassUnloadingEnabled flag drew my attention. By default, CMS does not collect permanent space; you need that flag to enable it. Permanent space is a special memory area used for Java class objects and some JVM data structures; it is not part of the application heap and is sized separately. In particular, this means that permanent space has its own memory usage, which is not correlated with old space usage.
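The separation between heap and permanent space is visible through the standard memory pool MXBeans. This sketch (the class name MemoryPoolReport is hypothetical) prints every pool with its own usage figures and its occupancy relative to the currently committed size. On a Java 6/7 HotSpot JVM running CMS, the pools of interest are typically named "CMS Old Gen" and "CMS Perm Gen"; newer JVMs replaced permanent space with Metaspace, so treat the exact pool names as an assumption.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

final class MemoryPoolReport {

    /** Percentage of the committed size that is in use (0 if nothing is committed). */
    static double occupancyPercent(long used, long committed) {
        return committed <= 0 ? 0.0 : 100.0 * used / committed;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // Heap pools (eden, survivor, old) and non-heap pools (perm gen, code cache)
            // each report their own used/committed/max values; max is -1 when undefined.
            System.out.printf("%-25s %-16s used=%,d committed=%,d max=%,d (%.1f%%)%n",
                    pool.getName(), pool.getType(), usage.getUsed(), usage.getCommitted(),
                    usage.getMax(), occupancyPercent(usage.getUsed(), usage.getCommitted()));
        }
    }
}
```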
So, if CMS collection of permanent space is enabled, a GC cycle is triggered when either old space or permanent space reaches its usage threshold. This turned out to be the problem: permanent space was a little too small for the application, so CMS kept trying to collect it relentlessly.
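The triggering rule can be written down as a small model. This is a simplified sketch, not HotSpot's actual implementation: it assumes fixed thresholds (as with -XX:+UseCMSInitiatingOccupancyOnly), measures occupancy against current capacity, and the class and method names are hypothetical. It also shows why the cycles never stop: a collection can only free dead objects, so if live data in permanent space alone is above the threshold, the condition is still true right after a cycle ends and the next one starts.

```java
/**
 * Simplified model of when a CMS cycle starts. Not HotSpot's real code:
 * runtime heuristics are ignored and thresholds are taken as fixed.
 */
final class CmsTriggerModel {

    /** Used space as a percentage of current capacity. */
    static double occupancy(long used, long capacity) {
        return capacity <= 0 ? 0.0 : 100.0 * used / capacity;
    }

    static boolean shouldStartCycle(long oldUsed, long oldCapacity, double oldFraction,
                                    long permUsed, long permCapacity, double permFraction,
                                    boolean classUnloadingEnabled) {
        boolean oldOverThreshold = occupancy(oldUsed, oldCapacity) > oldFraction;
        // Permanent space only participates when CMSClassUnloadingEnabled is on.
        boolean permOverThreshold = classUnloadingEnabled
                && occupancy(permUsed, permCapacity) > permFraction;
        return oldOverThreshold || permOverThreshold;
    }

    public static void main(String[] args) {
        // Old space half full, permanent space 95% full, both thresholds at 80%.
        System.out.println(shouldStartCycle(512, 1024, 80, 95, 100, 80, true));  // true
        System.out.println(shouldStartCycle(512, 1024, 80, 95, 100, 80, false)); // false
    }
}
```

With these numbers old space is only half full, yet a cycle starts as soon as class unloading is enabled, because permanent space is at 95%.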
Increasing the permanent space size (-XX:PermSize=size) solved the issue.
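The same reasoning explains why a larger permanent space breaks the loop: once capacity is large enough that live data stays under the initiating fraction, permanent space no longer triggers cycles on its own. Below is a back-of-the-envelope helper, assuming the simplified model in which a cycle starts when used / capacity exceeds the fraction; the class name, method name, and the 60 MB / 80% figures are made up for illustration.

```java
final class PermSizing {

    /**
     * Smallest capacity for which the given amount of live data stays at or
     * below the initiating fraction: capacity >= live * 100 / fraction.
     * Result is in the same unit as live.
     */
    static long minCapacityBelowThreshold(long live, double fractionPercent) {
        return (long) Math.ceil(live * 100.0 / fractionPercent);
    }

    public static void main(String[] args) {
        // Hypothetical figures: 60 MB of live class data and an 80% threshold.
        long liveMb = 60;
        System.out.println(minCapacityBelowThreshold(liveMb, 80) + " MB"); // 75 MB
    }
}
```

In practice you would leave extra headroom on top of this minimum, since class loading can grow permanent space usage over time.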
An alternative approach could be to use a different threshold for permanent space than for old space (e.g. -XX:CMSInitiatingPermOccupancyFraction=percent). It may also make sense to turn off permanent space collection altogether; many applications simply do not need it (it was called “permanent” for a reason, after all).