Read-through is a technique which allows a cache to automatically populate entries from an external data source upon a cache miss. Oracle Coherence supports this technique via the read-write backing map and application-provided cache loaders (you can read more in the Coherence documentation).
CacheLoader/CacheStore vs. BinaryEntryStore
Your cache loader/store plug-in may implement either the CacheLoader/CacheStore interfaces or the BinaryEntryStore interface. BinaryEntryStore has the following key advantages:
Why does Coherence do load() before store()?
Assume that we are working with a key which does not exist in the cache. If you just put(…) the new key via the named cache interface, Coherence works as expected: it adds the object to the cache and calls store(…) on the cache store plug-in. But if you use an entry processor and call setValue(…) for an entry which is not in the cache – surprise, surprise – Coherence will first load(…) the key and then store(…) the new value.
The reason is simple: setValue(…) has to return the previous value as the result of the operation. Use the other version of the method – setValue(value, false) – to avoid the unnecessary load(…) call. By the way, putAll(…) should be preferred over put(…) for the same reason – putAll(…) is not required to return previous values.
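As an illustration, here is a minimal entry processor sketch (the class name and value handling are made up for this example) which overwrites an entry without requesting the previous value, so a missing key does not trigger a read-through load():

import com.tangosol.util.InvocableMap;
import com.tangosol.util.processor.AbstractProcessor;

// Hypothetical processor: blindly replaces the value of the target entry.
// A real deployment would also need proper serialization support (POF, etc.).
public class BlindPutProcessor extends AbstractProcessor {

    private final Object newValue;

    public BlindPutProcessor(Object newValue) {
        this.newValue = newValue;
    }

    public Object process(InvocableMap.Entry entry) {
        // entry.setValue(newValue) would have to return the previous value,
        // forcing load() for an absent key; the two-argument form does not.
        entry.setValue(newValue, false); // false = regular (non-synthetic) update, store() is still invoked
        return null;
    }
}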
load() vs. loadAll() methods
Assume that your cache loader uses SQL to fetch data from an RDBMS. It is clear that a single SQL select retrieving N entries at once (e.g. using in (…) in the where clause) is better than N subsequent SQL selects, each fetching only one entry.
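For illustration, a rough cache loader sketch (the table name, columns and DataSource wiring are hypothetical) whose loadAll() issues a single select with an in (…) clause instead of N single-row selects might look like this:

import com.tangosol.net.cache.CacheLoader;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import javax.sql.DataSource;

public class JdbcBulkCacheLoader implements CacheLoader {

    private final DataSource dataSource; // assumed to be provided by the application

    public JdbcBulkCacheLoader(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public Object load(Object key) {
        // Single-key lookup just delegates to the bulk path.
        return loadAll(Collections.singleton(key)).get(key);
    }

    public Map loadAll(Collection keys) {
        Map result = new HashMap();
        if (keys.isEmpty()) {
            return result;
        }
        // Build the "?, ?, ?" placeholder list for the in (...) clause.
        StringBuilder placeholders = new StringBuilder();
        for (int i = 0; i < keys.size(); i++) {
            placeholders.append(i == 0 ? "?" : ", ?");
        }
        String sql = "select id, payload from my_table where id in (" + placeholders + ")";
        try {
            Connection con = dataSource.getConnection();
            try {
                PreparedStatement ps = con.prepareStatement(sql);
                int i = 1;
                for (Object key : keys) {
                    ps.setObject(i++, key);
                }
                ResultSet rs = ps.executeQuery();
                while (rs.next()) {
                    result.put(rs.getObject("id"), rs.getString("payload"));
                }
            } finally {
                con.close();
            }
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
        return result;
    }
}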
Prior to Coherence 3.7, the read-write backing map implementation used the sequential approach (making bulk cache preloading with read-through impractical). In Coherence 3.7 this was fixed (but you should use at least version 3.7.1.3, since earlier versions have known bugs related to read-through).
So, in 3.7 getAll() will use loadAll() under the hood (but remember that your key set will be split by partition, distributed across storage members, and each storage member will process read-through in a partition-parallel fashion).
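A minimal usage sketch (the cache name is a placeholder): warming up a read-through cache in bulk via getAll(), which on 3.7+ lets each storage member service its slice of keys with a single loadAll() call.

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import java.util.Map;
import java.util.Set;

public class CacheWarmUp {
    // Bulk read-through: the key set is split by partition and each storage
    // member calls loadAll() for its slice instead of load() per key.
    public static Map warmUp(String cacheName, Set keys) {
        NamedCache cache = CacheFactory.getCache(cacheName);
        return cache.getAll(keys);
    }
}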
But will it work with aggregators and entry processors invoked over a collection of keys? Not so fast…
By the way, if you are stuck with 3.6 or earlier, you can read about a workaround here.
Aggregator warm up
Assume that you know the key set you want to aggregate using Coherence distributed aggregation, but many of these keys may not be in the cache (i.e. not yet loaded). Read-through is enabled.
The instance of your aggregator started on a storage node will receive a set of BinaryEntry objects from Coherence. But it does not mean that all these entries are present in the cache; Coherence will not try to preload the working set for the aggregator. Actually, the aggregator may decide to ignore data not in the cache (see the isPresent() method). But if it calls any kind of “get” method on an entry, Coherence will load the value via the cache loader plug-in. The problem is that it will be done in a sequential manner, so this may take A LOT of time.
Can we work around this? Sure.
The simplest workaround is to call getAll() before invoking the aggregator (but that kills the idea of distributed aggregation). A smarter way is to dig through the internal cache layers and load entries via a call to the read-write backing map. The snippet below can be used for effective preloading of a set of entries in aggregators and entry processors.
public static void preloadValuesViaReadThrough(Set<BinaryEntry> entries) {
    CacheMap backingMap = null;
    Set<Object> keys = new HashSet<Object>();
    for (BinaryEntry entry : entries) {
        if (backingMap == null) {
            // All entries belong to the same backing map, grab it from the first one.
            backingMap = (CacheMap) entry.getBackingMapContext().getBackingMap();
        }
        if (!entry.isPresent()) {
            // Collect binary keys of entries that are not in the cache yet.
            keys.add(entry.getBinaryKey());
        }
    }
    if (backingMap != null && !keys.isEmpty()) {
        // Bulk read-through: the backing map loads all missing keys at once.
        backingMap.getAll(keys);
    }
}
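For example, an aggregator could call this helper before touching any values. The sketch below is hypothetical (the counting logic is made up, it assumes the helper above is visible as a static method, and a real aggregator would need serialization support); it only shows where the bulk preload fits.

import com.tangosol.util.BinaryEntry;
import com.tangosol.util.InvocableMap;
import java.util.Set;

public class PreloadingCountAggregator implements InvocableMap.EntryAggregator {

    public Object aggregate(Set entries) {
        // Bulk read-through first, instead of faulting entries in one by one.
        preloadValuesViaReadThrough((Set<BinaryEntry>) entries);
        int count = 0;
        for (Object o : entries) {
            InvocableMap.Entry entry = (InvocableMap.Entry) o;
            if (entry.getValue() != null) {
                count++;
            }
        }
        return Integer.valueOf(count);
    }
}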
Aggregation, expiry and past expiry entry resurrection
If you are using a read-write backing map in combination with expiry, you may be prone to the following effect.
Assume that your cache has been idle for some time and some of the cache entries are already past their expiry. Now you issue an aggregator over all cache data (in my case it was a regular housekeeping job interested only in live cache data). Filters in Coherence can match only cached data (they never trigger read-through), but surprisingly, the operation described above starts storming the DB with read-through requests!
What has happened?
Lazy expiry
The local cache (acting as the internal map of the read-write backing map) performs expiry passively. If you are not touching it, it cannot expire anything. But as soon as you call any of its methods, an expiry check is triggered and entries may be physically removed from the cache at that point.
Key index of partitioned cache service
The partitioned cache service has an internal structure called the “key index” – it is simply a set of all keys in the local backing map. When you issue a filter-based operation, Coherence calculates the key set first (using the filter), then performs the operation (e.g. aggregation) over the known set of keys. The set of all keys is passed to the filter, which may then decide which keys to process (it can consult indexes at this point) and whether further filtering by value is required. AlwaysFilter is very simple; it does not require any value filtering, so Coherence just passes the whole “key index” content as input for the aggregation without consulting the backing map.
Together
A lot of entries in the cache are past expiry, but they are still in the cache because it is idle and the local cache has had no opportunity to perform an expiry check. An aggregator with AlwaysFilter is issued, and each Coherence storage member will perform the aggregation against all keys currently in its “key index” (including keys past their expiry). Access to the first entry from the aggregator will trigger an expiry check in the backing map, effectively wiping out expired entries. But the aggregator instance has already started and its entry set already contains these keys. By processing recently expired entries, which are in its entry set, the aggregator will trigger read-through, resurrecting them (and of course it will be doing it one by one – read: SLOW).
How to prevent this?
Well, my conditions are a little exotic. You will probably never hit exactly this problem, but an understanding of such effects may still be helpful in related cases.
The workaround is dead simple – call size() on the cache just before issuing the aggregator. size() will hit the backing map, which will get a chance to process expiry, and by the moment the aggregator arrives, dead entries will have been removed from the “key index”, so no unexpected read-through will happen.
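A minimal sketch of this workaround (the cache name and the Count aggregator are just placeholders for my housekeeping job):

import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.aggregator.Count;
import com.tangosol.util.filter.AlwaysFilter;

public class HousekeepingJob {
    public static Object aggregateLiveData(String cacheName) {
        NamedCache cache = CacheFactory.getCache(cacheName);
        // size() touches the backing maps, letting expiry remove dead entries
        // before the aggregator snapshots the "key index".
        cache.size();
        return cache.aggregate(AlwaysFilter.INSTANCE, new Count());
    }
}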
Conclusion
Life is full of surprises when it comes to complex distributed systems. Keep your eyes open ;)