Thursday, July 11, 2013

Coherence 101, Filter performance and indexing

In this post, I would like to share some knowledge about optimizing indexes in Oracle Coherence.

Normally you should not abuse the querying features of your data grid, so you are unlikely to ever need to tune indexing or querying (beyond choosing which indexes to create). But sometimes you really need to squeeze as much performance as you can from your filter-based operations. If that is your case, the few tricks described below may be helpful.

Extractor used to add index should be "equal" to extractor used in filter

You are probably aware of this fact, but it is of critical importance, and repeating it one more time will not do any harm. All query planning in Coherence relies on matching the extractor used in the index against the extractor used in the filter (using the equals() method).

Typical mistakes you could make here:

  • Using semantically equivalent, but different types of extractors (e.g. ReflectionExtractor and ChainedExtractor may extract exactly the same attribute, but they will not be equal in the Java sense).
  • Using custom extractor classes without implementing equals() and hashCode().
  • Mixing reflection-based and POF-based extractors.

In all the cases above, your code will work, but the index will not be used.
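The effect of a missing equals()/hashCode() can be pictured with a small plain-Java model. This is only a sketch under the assumption that indexes are looked up in a map keyed by the extractor instance; NaiveExtractor, ProperExtractor and IndexLookupDemo are hypothetical names and are not part of the Coherence API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical extractor that does NOT override equals()/hashCode().
class NaiveExtractor {
    final String method;
    NaiveExtractor(String method) { this.method = method; }
}

// Hypothetical extractor with value-based equality.
class ProperExtractor {
    final String method;
    ProperExtractor(String method) { this.method = method; }

    @Override
    public boolean equals(Object o) {
        return o instanceof ProperExtractor
                && method.equals(((ProperExtractor) o).method);
    }

    @Override
    public int hashCode() { return Objects.hash(method); }
}

class IndexLookupDemo {
    // Simplified model of an index registry: the extractor itself is the map key.
    static boolean indexFound(Object usedForIndex, Object usedInFilter) {
        Map<Object, String> indexes = new HashMap<>();
        indexes.put(usedForIndex, "index data");
        return indexes.containsKey(usedInFilter);
    }

    public static void main(String[] args) {
        // Two separate instances without equals(): the index is not found.
        System.out.println(indexFound(new NaiveExtractor("getTicker"),
                                      new NaiveExtractor("getTicker")));   // false
        // Two separate instances with value-based equals(): the index is found.
        System.out.println(indexFound(new ProperExtractor("getTicker"),
                                      new ProperExtractor("getTicker")));  // true
    }
}
```

In this model the query still runs in both cases; it simply does not find the index in the first one, which mirrors the silent fallback described above.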

Indexing low-cardinality attributes

Sometimes your query may include a criterion on a low-cardinality attribute. If this attribute is not indexed, all candidate entries have to be deserialized to check the attribute value.

Deserialization is something you really want to avoid in a Coherence cluster under heavy load. Besides consuming CPU, deserialization produces a lot of garbage, which risks throwing your GC out of balance.

Adding an index may bring another risk, though. If you put your predicates in the wrong order, such an index may only slow the query down.

Below are the results of a simple benchmark. I was using 2 Coherence storage nodes and a data set of 1,000,000 objects. The ticker predicate matches 1,000 objects, and the side predicate matches 500,000. EqualsFilter and AndFilter were used to build the query. The execution time of a count aggregator was measured.

Tests were run on my laptop, so absolute numbers are not important (and, to be honest, not statistically sound).

Without indexes
  • side & ticker -- 5780 ms
  • ticker & side -- 5687 ms
Index by ticker
  • side & ticker -- 66 ms
  • ticker & side -- 66 ms
Both ticker and side indexed
  • side & ticker -- 496 ms
  • ticker & side -- 10 ms

As you can see, if you are unlucky and the predicates in your query are not in the right order, adding an index may actually harm query performance.

There is a trick to protect you in this case. NoIndexFilter is a filter wrapper that disables the inverted index lookup for the nested filter. The forward map of the index remains accessible, so testing the attribute value will not require deserialization.

Both ticker and side indexed
  • no_index(side) & ticker -- 17 ms
  • ticker & no_index(side) -- 18 ms

As you can see, it takes some toll on the "good" query, but negates the effect of the wrong predicate order. You can also see that it is still more than 3 times faster than the case where "side" was not indexed.
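The difference between the two maps of an index can be sketched in plain Java. This is a simplified model, not Coherence code: AttributeIndex and ForwardMapDemo are hypothetical names, and the data set is made up for illustration. The query narrows candidates through the inverted map of the selective attribute (ticker) and then tests the low-cardinality attribute (side) per candidate through the forward map, which is roughly the access pattern that "ticker & no_index(side)" aims for.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified, hypothetical model of a single attribute index with its two maps.
class AttributeIndex {
    final Map<Integer, String> forward = new HashMap<>();        // key -> attribute value
    final Map<String, Set<Integer>> inverted = new HashMap<>();  // attribute value -> keys

    void put(int key, String value) {
        forward.put(key, value);
        inverted.computeIfAbsent(value, v -> new HashSet<>()).add(key);
    }
}

class ForwardMapDemo {
    // Narrow by the inverted ticker lookup, then check side through the forward map.
    static int count(AttributeIndex tickerIdx, AttributeIndex sideIdx,
                     String ticker, String side) {
        Set<Integer> candidates =
                tickerIdx.inverted.getOrDefault(ticker, Collections.<Integer>emptySet());
        int matches = 0;
        for (int key : candidates) {
            // No value is deserialized here: the attribute comes from the index.
            if (side.equals(sideIdx.forward.get(key))) {
                matches++;
            }
        }
        return matches;
    }

    static int demo() {
        AttributeIndex tickerIdx = new AttributeIndex();
        AttributeIndex sideIdx = new AttributeIndex();
        for (int key = 0; key < 10_000; key++) {
            tickerIdx.put(key, "T" + (key % 100));                      // 100 keys per ticker
            sideIdx.put(key, (key / 100) % 2 == 0 ? "BUY" : "SELL");    // half of all keys per side
        }
        return count(tickerIdx, sideIdx, "T7", "BUY");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 50
    }
}
```

In the sketch only the 100 candidates for one ticker are ever touched; the large key set of the side attribute is never materialized.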

Exploiting composite indexes

You can make the query above even faster if you really need to.

Normally, with Coherence, you do not use composite indexes (instead, you index attributes individually). Creating a composite index is possible, but you will have to use specially composed queries to exploit it.

The code to add a composite index will look like

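import com.tangosol.util.ValueExtractor;
import com.tangosol.util.extractor.MultiExtractor;
import com.tangosol.util.extractor.ReflectionExtractor;

// cache is an existing NamedCache; the index is keyed by the [ticker, side] pair
ValueExtractor[] ve = {
    new ReflectionExtractor("getTicker"),
    new ReflectionExtractor("getSide")
};
MultiExtractor me = new MultiExtractor(ve);
cache.addIndex(me, false, null); // unordered index, no comparator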

and a filter exploiting it will look like

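import java.util.Arrays;
import com.tangosol.util.ValueExtractor;
import com.tangosol.util.extractor.MultiExtractor;
import com.tangosol.util.extractor.ReflectionExtractor;
import com.tangosol.util.filter.EqualsFilter;

ValueExtractor[] ve = {
    new ReflectionExtractor("getTicker"),
    new ReflectionExtractor("getSide")
};
MultiExtractor me = new MultiExtractor(ve); // must be equal to the extractor passed to addIndex()
EqualsFilter composite = new EqualsFilter(me, Arrays.asList(ticker, side));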

Below are the results compared with the traditional index/query.

Without indexes
  • ticker & side -- 5687 ms
  • composite -- 5998 ms
All Indexes
  • ticker & side -- 11 ms
  • composite -- 3 ms

A composite index is awkward to use, but if it matches your case, you can get a significant performance gain.

A few more links

That is it for this post. You can also take a look at my slide deck from one of the London Coherence SIGs; it explains a few more advanced topics about indexes in Oracle Coherence.
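One way to picture why the composite query is cheap is a plain-Java model in which the inverted index is keyed by the whole [ticker, side] list, so the conjunction is answered by a single map lookup. This is a sketch under that assumption only; CompositeIndexDemo and its sample data are hypothetical and do not show how Coherence stores the index internally.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CompositeIndexDemo {
    // Simplified model of a composite inverted index: [ticker, side] -> keys.
    final Map<List<String>, Set<Integer>> inverted = new HashMap<>();

    void put(int key, String ticker, String side) {
        inverted.computeIfAbsent(Arrays.asList(ticker, side), k -> new HashSet<>()).add(key);
    }

    // A single hash lookup answers "ticker = ? AND side = ?".
    Set<Integer> lookup(String ticker, String side) {
        return inverted.getOrDefault(Arrays.asList(ticker, side),
                                     Collections.<Integer>emptySet());
    }

    static int demo() {
        CompositeIndexDemo index = new CompositeIndexDemo();
        index.put(1, "ORCL", "BUY");
        index.put(2, "ORCL", "SELL");
        index.put(3, "ORCL", "BUY");
        index.put(4, "IBM", "BUY");
        return index.lookup("ORCL", "BUY").size();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 2
    }
}
```

The lookup key is built with Arrays.asList(ticker, side), the same shape as the value passed to EqualsFilter in the filter above, which is why the attribute order in the query has to match the extractor order used for the index.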

2 comments:

  1. I've added an index that uses a custom extractor that extends AbstractExtractor and overrides only the extract method to return a List of Strings. Then I have a ContainsFilter which uses the same custom extractor and looks for the occurrence of a single String in the List of Strings. It does not look like my index is being used. What am I doing wrong? Also, is there some debugging I can switch on to see which indices are used?

  2. Does your custom extractor implement "equals" and "hashCode"?
    The extractor instance itself is used as a key to find the index to use.
    Take a look at http://docs.oracle.com/cd/E24290_01/coh.371/e22843/com/tangosol/util/aggregator/QueryRecorder.html, it should help you to get execution plan information for a query.
