Sunday, July 28, 2013

Java GC in Numbers - Compressed OOPs

Compressed OOPs (OOP – ordinary object pointer) is a technique that reduces the size of Java objects in 64 bit environments. The HotSpot wiki has a good article explaining the details. The downside of this technique is that addresses have to be decoded before memory referenced by a compressed OOP can be accessed. The instruction set (e.g. x86) may support such addressing directly, but the additional arithmetic still affects the CPU's processing pipeline.
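
To make that extra arithmetic concrete, here is a minimal sketch of the decoding step in plain Java. The class and constant names are hypothetical; the real decoding is emitted by the JIT compiler as machine code, this sketch only illustrates the base-plus-shifted-offset arithmetic.

```java
// Illustrative only: real decoding is performed by JIT-generated machine code.
// A compressed OOP stores a 32 bit value; the 64 bit address is reconstructed
// as heapBase + (narrowOop << objectAlignmentShift).
public class CompressedOopDecode {

    static final int OBJECT_ALIGNMENT_SHIFT = 3; // default 8-byte object alignment

    static long decode(long heapBase, int narrowOop) {
        // zero-extend the 32 bit value, scale it by the alignment, add the heap base
        return heapBase + ((narrowOop & 0xFFFFFFFFL) << OBJECT_ALIGNMENT_SHIFT);
    }

    public static void main(String[] args) {
        long heapBase = 0x800000000L;  // hypothetical heap base chosen by the JVM
        int narrowOop = 0x00A01234;    // hypothetical compressed reference
        System.out.printf("decoded address: 0x%x%n", decode(heapBase, narrowOop));
    }
}
```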

Young GC involves a lot of reference walking, so its pause time is expected to be affected by OOP compression.

In this article, I’m comparing young GC pause times of the 64 bit HotSpot JVM with and without OOP compression. The methodology from the previous article is used, and the benchmark code is available at github. There is one caveat though. With compressed OOPs the size of an object is smaller, so the same amount of heap can accommodate more objects. The benchmark automatically scales the number of entries to fill the heap, based on entry footprint and old space size, so with a fixed old space size the experiments with compression enabled have to deal with a slightly larger number of objects (entry footprints are 288 bytes uncompressed and 246 bytes compressed).
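
As a rough illustration, the autoscaling boils down to dividing the old space size by the per-entry footprint. The sketch below uses hypothetical names and is not taken from the actual benchmark code; only the footprint numbers come from the article.

```java
// Hypothetical sketch of the entry-count scaling described above.
public class EntryCountScaling {

    static final long FOOTPRINT_UNCOMPRESSED = 288; // bytes per entry, from the article
    static final long FOOTPRINT_COMPRESSED   = 246; // bytes per entry, from the article

    static long entryCount(long oldSpaceBytes, long entryFootprint) {
        return oldSpaceBytes / entryFootprint;
    }

    public static void main(String[] args) {
        long oldSpace = 4L << 30; // e.g. 4 GiB of old space
        System.out.println("uncompressed entries: " + entryCount(oldSpace, FOOTPRINT_UNCOMPRESSED));
        System.out.println("compressed entries:   " + entryCount(oldSpace, FOOTPRINT_COMPRESSED));
        // 288.0 / 246.0 ≈ 1.17, i.e. the compressed run holds about 17% more entries
    }
}
```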

The chart below shows absolute young GC pause times.

As you can see, the compressed case is consistently slower, which is not a surprise.

Another chart shows the relative difference between the two cases (mean GC pause with compression / mean GC pause without compression for the same heap size).

The fluctuating line suggests that I should probably increase the number of runs for each data point. But let’s try to draw some conclusions from what we have.

For heaps below 4GiB the JVM uses a special strategy (a 32 bit address can be used directly, without decoding). This difference is visible on the chart (please note that the point with 4GiB of old space means the total heap size is above 4GiB, so this optimization is inapplicable).
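
A minimal sketch of that difference, again with hypothetical names: when the whole heap fits below 4GiB the 32 bit value is already a usable address, otherwise the base-plus-shift decoding from the earlier sketch is needed.

```java
// Hypothetical illustration of the two decoding strategies mentioned above.
public class OopDecodingModes {

    // Heap fits entirely below 4 GiB: the 32 bit value is already the address.
    static long decodeUnscaled(int narrowOop) {
        return narrowOop & 0xFFFFFFFFL;                      // no base, no shift
    }

    // General case for larger heaps: scale by object alignment and add the heap base.
    static long decodeHeapBased(long heapBase, int narrowOop) {
        return heapBase + ((narrowOop & 0xFFFFFFFFL) << 3);  // base + shifted offset
    }
}
```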

Above 4GiB we see a 10-30% increase in pause times. You should also not forget that the compressed case has to deal with roughly 17% more data (288 / 246 ≈ 1.17).

Conclusions

Using compressed OOPs affects young GC pause times, which is not a surprise (especially taking into account the increased amount of data). Using compression for heaps below 4GiB seems to be a total win; for larger heaps it seems to be a reasonable price for the increased capacity.

But the main conclusion is that the experiment has not revealed any surprises, neither bad nor good ones. This may not be very exciting, but it is useful information anyway.

5 comments:

  1. What hardware is this test running on? I get significantly shorter pauses on an i7 (with another test, ofc.). What is the estimated promotion rate of the benchmark?

    regards,
    Rüdiger

    1. The test box was equipped with two AMD CPUs, each with 12 cores x 2 hardware threads (48 hardware threads in total).

      They are not the fastest ones. Another factor is the benchmark itself, which is quite aggressive. On real applications the numbers are usually 2-3 times lower.

      BTW you can run the benchmark on your own hardware.
      Just read the instructions on github (link in the article). It will calculate and report the average GC pause itself (no need for log scraping).

  2. I think the scale is 10x too high? Anyway, a very valuable read, like all your GC/VM stuff :-)

    1. I get pretty much exactly 10 times lower numbers on an i7 (not the same test), so I was wondering ..
      However, retesting on an older dual socket AMD I got numbers in a similar area. Seems like gaming RAM can be a major performance boost for Java applications :-)
