Sunday, July 28, 2013

Java GC in Numbers - Compressed OOPs

Compressed OOPs (OOP stands for ordinary object pointer) is a technique for reducing the size of Java objects in 64-bit environments. The HotSpot wiki has a good article explaining the details. The downside of this technique is that addresses have to be decompressed before the memory referenced by a compressed OOP can be accessed. An instruction set (e.g. x86) may support such addressing directly, but the additional arithmetic still affects the CPU's processing pipeline.
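The extra arithmetic is small: a 32-bit compressed reference becomes a native address by shifting it by the object alignment (3 bits for the default 8-byte alignment) and adding the heap base. A minimal sketch of the decode step; the constants here are illustrative, not HotSpot's actual values:

```java
public class CompressedOopDecode {
    // Illustrative constants: HotSpot uses the actual heap base address and
    // a shift derived from object alignment (8-byte alignment -> shift of 3).
    static final long HEAP_BASE = 0x0000_0008_0000_0000L; // hypothetical heap base
    static final int SHIFT = 3;                           // log2(8-byte alignment)

    // Decode a 32-bit compressed oop into a 64-bit native address.
    static long decode(int compressedOop) {
        // Mask to treat the int as an unsigned 32-bit value before shifting.
        return HEAP_BASE + ((compressedOop & 0xFFFF_FFFFL) << SHIFT);
    }

    public static void main(String[] args) {
        // Compressed value 0x10 maps to heap base + 0x10 * 8 = base + 0x80.
        System.out.println(Long.toHexString(decode(0x10)));
    }
}
```

This add-and-shift is exactly the work a young GC has to repeat for every reference it walks.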

Young GC involves a lot of reference walking, so its duration is expected to be affected by OOP compression.

In this article, I’m comparing young GC pause times for the 64-bit HotSpot JVM with and without OOP compression. The methodology from the previous article is used, and the benchmark code is available at github. There is one caveat though. With compressed OOPs the size of an object is smaller, so the same amount of heap can accommodate more objects. The benchmark autoscales the number of entries to fill the heap, based on entry footprint and old space size, so with a fixed old space size the experiments with compression enabled have to deal with a slightly larger number of objects (entry footprints are 288 bytes uncompressed and 246 bytes compressed).
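The scaling itself is simple arithmetic: divide the old space budget by the per-entry footprint. With the measured footprints (288 vs 246 bytes) the compressed runs end up with roughly 17% more entries. A sketch of the calculation (method and constant names are mine, not the benchmark's):

```java
public class EntryScaling {
    // Entry footprints measured by the benchmark, in bytes.
    static final int FOOTPRINT_UNCOMPRESSED = 288;
    static final int FOOTPRINT_COMPRESSED = 246;

    // Number of entries that fit into a given old space budget.
    static long entries(long oldSpaceBytes, int entryFootprint) {
        return oldSpaceBytes / entryFootprint;
    }

    public static void main(String[] args) {
        long oldSpace = 4L << 30; // 4 GiB of old space
        long plain = entries(oldSpace, FOOTPRINT_UNCOMPRESSED);
        long comp = entries(oldSpace, FOOTPRINT_COMPRESSED);
        // The compressed runs hold ~17% more entries for the same old space.
        System.out.printf("%d vs %d (+%.0f%%)%n",
                plain, comp, 100.0 * (comp - plain) / plain);
    }
}
```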

The chart below shows absolute young GC pause times.

As you can see, the compressed case is consistently slower, which is not a surprise.

Another chart shows the relative difference between the two cases (mean compressed GC pause divided by mean uncompressed GC pause for the same setup).

The fluctuating line suggests that I should probably increase the number of runs for each data point. But let’s try to draw some conclusions from what we have.

For heaps below 4 GiB the JVM uses a special strategy (32-bit addresses can be used without decompression in this case). This difference is visible on the chart (please note that the point with 4 GiB of old space means the total heap size is above 4 GiB, so this optimization is inapplicable).

Above 4 GiB we see a 10-30% increase in pause times. You should also not forget that the compressed case has to deal with 17% more data.

Conclusions

Using compressed OOPs affects young GC pause time, which is not a surprise (especially taking into account the increased amount of data). Using compression for heaps below 4 GiB seems to be a total win; for larger heaps it seems to be a reasonable price for the increased capacity.

But the main conclusion is that the experiment has not revealed any surprises, neither bad nor good ones. This may not be very exciting, but it is useful information anyway.

Thursday, July 11, 2013

Coherence 101, Filters performance and indexing

In this post, I would like to share some knowledge about optimizing indexes in Oracle Coherence.

Normally you should not abuse the querying features of your data grid and, hence, you are unlikely to ever need to tune indexing/querying (besides choosing which indexes to create). But sometimes you really need to squeeze as much performance as you can out of your filter-based operations. If that is your case, the few tricks described below may be helpful.

The extractor used to add an index should be "equal" to the extractor used in the filter

You are probably aware of this fact, but it is of critical importance and repeating it one more time will not do any harm. All query planning in Coherence relies on matching (via the equals() method) the extractors used in the index and in the filter.

Typical mistakes you could do here:

  • Using semantically equivalent, but different, types of extractors (e.g. ReflectionExtractor and ChainedExtractor may extract exactly the same attribute, but they will not be equal in the Java sense).
  • Using custom extractor classes without implementing equals() and hashCode().
  • Mixing reflection-based and POF-based extractors.

In all the cases above, your code will work, but the index will not be used.
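The mechanism behind this is easy to model: indexes are kept in a map keyed by extractor, so a filter finds its index only if its extractor equals() the one passed to addIndex. A self-contained illustration with a simplified extractor class standing in for ReflectionExtractor (this is not Coherence code, just the equality principle):

```java
import java.util.HashMap;
import java.util.Map;

public class ExtractorMatching {
    // Minimal extractor: equality is based on the extracted attribute name,
    // mirroring what ReflectionExtractor does. Without equals()/hashCode(),
    // two instances built from the same method name would never match.
    static final class Extractor {
        final String method;

        Extractor(String method) {
            this.method = method;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Extractor && ((Extractor) o).method.equals(method);
        }

        @Override
        public int hashCode() {
            return method.hashCode();
        }
    }

    public static void main(String[] args) {
        // Simplified index registry: extractor -> index.
        Map<Extractor, String> indexRegistry = new HashMap<>();
        indexRegistry.put(new Extractor("getTicker"), "ticker index");

        // A filter built with an equal extractor finds the index; an extractor
        // of a different type or without equals() would miss it, forcing the
        // query to deserialize and scan every entry instead.
        System.out.println(indexRegistry.get(new Extractor("getTicker")));
    }
}
```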

Indexing low-cardinality attributes

Sometimes your query may include a criterion on a low-cardinality attribute. Not indexing this attribute will cause deserialization of all candidate entries just to check the attribute value.

Deserialization is something you really want to avoid in a Coherence cluster under heavy load. Besides consuming CPU, deserialization produces a lot of garbage, risking bringing your GC out of balance.

Adding an index may bring another risk, though. If you put your predicates in the wrong order, such an index may only slow the query down.

Below are the results of a simple benchmark. I was using 2 Coherence storage nodes and a data set of 1,000,000 entries. The ticker predicate matches 1,000 objects and the side predicate matches 500,000. EqualsFilter and AndFilter were used to build the query, and the execution time of a count aggregator was measured.

The tests were run on my laptop, so the absolute numbers are not important (and not statistically sound, to be honest).

Without indexes
  • side & ticker -- 5780 ms
  • ticker & side -- 5687 ms
Index by ticker
  • side & ticker -- 66 ms
  • ticker & side -- 66 ms
Both ticker and side indexed
  • side & ticker -- 496 ms
  • ticker & side -- 10 ms

As you can see, if you are unlucky and your query predicates are not in the right order, adding an index may actually harm query performance.
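The effect can be modelled with plain sets: AndFilter applies its predicates left to right, each one narrowing the candidate key set via the inverted index, so leading with the 500,000-key side match drags a huge intermediate set into the second step. A rough, self-contained model (set sizes as in the benchmark; the names are mine):

```java
import java.util.HashSet;
import java.util.Set;

public class PredicateOrder {
    // Model of an AndFilter over inverted indexes: start from the first
    // predicate's key set and narrow it with the second one. The cost of
    // the second step is proportional to the size of the first key set.
    static Set<Integer> apply(Set<Integer> first, Set<Integer> second) {
        Set<Integer> candidates = new HashSet<>(first); // copy the whole first key set
        candidates.retainAll(second);                   // then intersect
        return candidates;
    }

    public static void main(String[] args) {
        // Keys as in the benchmark: 1,000 ticker matches, 500,000 side matches.
        Set<Integer> ticker = new HashSet<>();
        for (int i = 0; i < 1000; i++) ticker.add(i);
        Set<Integer> side = new HashSet<>();
        for (int i = 0; i < 1_000_000; i += 2) side.add(i);

        // Same answer either way, but "side first" copies and walks a
        // 500,000-key candidate set instead of a 1,000-key one.
        System.out.println(apply(ticker, side).size());
        System.out.println(apply(side, ticker).size());
    }
}
```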

There is a trick to protect you in this case. NoIndexFilter is a filter wrapper which disables the inverted index lookup for the nested filter. The forward map of the index remains accessible, so testing the attribute value will not require deserialization.

Both ticker and side indexed
  • no_index(side) & ticker -- 17 ms
  • ticker & no_index(side) -- 18 ms

As you can see, it takes some toll on the "good" query, but negates the effect of the "wrong order of predicates". You can also see that it is still 3 times faster than the case where "side" was not indexed.
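The core idea of such a wrapper is small enough to sketch. Below, SimpleFilter and SimpleIndexAware are simplified stand-ins for Coherence's Filter and IndexAwareFilter contracts (the real interfaces live in com.tangosol.util and com.tangosol.util.filter); this is an illustration of the idea, not a drop-in implementation:

```java
import java.util.Map;
import java.util.Set;

public class NoIndexSketch {
    // Stand-in for Coherence's Filter: per-entry evaluation.
    interface SimpleFilter {
        boolean evaluate(Object value);
    }

    // Stand-in for IndexAwareFilter: may narrow the candidate key set using
    // available indexes, returning the filter still to be evaluated per entry.
    interface SimpleIndexAware extends SimpleFilter {
        SimpleFilter applyIndex(Map<Object, Object> indexes, Set<Object> candidateKeys);
    }

    // The wrapper delegates per-entry evaluation but opts out of inverted
    // index lookup: applyIndex leaves the candidate key set untouched.
    static final class NoIndexFilterSketch implements SimpleIndexAware {
        private final SimpleFilter inner;

        NoIndexFilterSketch(SimpleFilter inner) {
            this.inner = inner;
        }

        @Override
        public boolean evaluate(Object value) {
            return inner.evaluate(value); // attribute test against the forward map value
        }

        @Override
        public SimpleFilter applyIndex(Map<Object, Object> indexes, Set<Object> candidateKeys) {
            return this; // no candidate reduction; every surviving key gets tested
        }
    }
}
```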

Exploiting composite indexes

You can make the query above even faster if you really need to.

Normally, with Coherence, you do not use composite indexes (instead you index attributes individually). Creating a composite index is possible, but you will have to use specially composed queries to exploit it.

The code to add a composite index will look like this:

// extractors for the attributes covered by the composite index
ValueExtractor[] ve = {
    new ReflectionExtractor("getTicker"),
    new ReflectionExtractor("getSide")
};
// MultiExtractor produces a list of extracted values per entry
MultiExtractor me = new MultiExtractor(ve);
// false - index is not ordered, null - no custom comparator
cache.addIndex(me, false, null);

and a filter exploiting it will look like this:

// must be equal to the extractor used when the index was added
ValueExtractor[] ve = {
    new ReflectionExtractor("getTicker"),
    new ReflectionExtractor("getSide")
};
MultiExtractor me = new MultiExtractor(ve);
// MultiExtractor yields a list of values, so compare against a list
EqualsFilter composite = new EqualsFilter(me, Arrays.asList(ticker, side));

Below are the results compared with the traditional index/query.

Without indexes
  • ticker & side -- 5687 ms
  • composite -- 5998 ms
All Indexes
  • ticker & side -- 11 ms
  • composite -- 3 ms

A composite index is awkward to use, but, if it matches your case, you can get a significant performance gain.
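The speed-up has a simple explanation: with individual indexes the query must intersect two inverted-index key sets (1,000 and 500,000 keys here), while the composite index resolves the (ticker, side) pair with a single map lookup. A self-contained model of the two lookup strategies (scaled down to 100,000 keys; all names are mine):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CompositeIndexModel {
    // Individual inverted indexes: attribute value -> matching keys.
    static Map<String, Set<Integer>> byTicker = new HashMap<>();
    static Map<String, Set<Integer>> bySide = new HashMap<>();
    // Composite index: (ticker, side) pair -> matching keys.
    static Map<List<String>, Set<Integer>> composite = new HashMap<>();

    // Build all indexes over n synthetic trade keys: the ticker changes
    // every 1000 keys, the side alternates BUY/SELL.
    static void build(int n) {
        for (int key = 0; key < n; key++) {
            String ticker = "T" + (key / 1000);
            String side = (key % 2 == 0) ? "BUY" : "SELL";
            byTicker.computeIfAbsent(ticker, t -> new HashSet<>()).add(key);
            bySide.computeIfAbsent(side, s -> new HashSet<>()).add(key);
            composite.computeIfAbsent(Arrays.asList(ticker, side), p -> new HashSet<>()).add(key);
        }
    }

    public static void main(String[] args) {
        build(100_000);

        // Individual indexes: copy a 1,000-key set, intersect with a 50,000-key set.
        Set<Integer> viaPair = new HashSet<>(byTicker.get("T0"));
        viaPair.retainAll(bySide.get("BUY"));

        // Composite index: a single hash lookup, no intersection at all.
        Set<Integer> viaComposite = composite.get(Arrays.asList("T0", "BUY"));

        System.out.println(viaPair.size() + " " + viaComposite.size());
    }
}
```

This is also why the composite query only helps when the filter supplies all indexed attributes as one exact-match pair: any other shape falls back to scans or intersections.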

A few more links

That is it for this post. You can also take a look at my slide deck from one of the London Coherence SIGs; it explains a few more advanced topics about indexes in Oracle Coherence.