Tuesday, December 4, 2012

Coherence 101, Beware of cache listeners

The cache events facility is a very useful feature of Oracle Coherence. For example, continuous queries and near caches are built on top of the cache event system.
Unfortunately, it is also easy to abuse. In particular, cache listeners scale poorly unless you are very careful.
Please note: this article covers only the partitioned cache topology (distributed cache scheme).

Client side map listeners

UPDATE: I was very wrong in my previous description of client-side synchronous map listeners. The section below was rewritten to give a more accurate picture.

Client-side map listeners are usually added via the NamedCache API. They typically receive events from caches hosted on remote JVMs (storage nodes). But regardless of whether a cache event is produced on a remote or the local JVM, Coherence delivers it to listeners using a dedicated event dispatch thread (or the service thread itself for listeners marked as synchronous).

Each cache service has only one event dispatch thread, and it can easily become a bottleneck, limiting the speed of cache event processing on the client.

A few tips to mitigate this design aspect are below.

  • Do not do anything time-consuming in the listener itself; offload processing to another thread instead.
  • Be careful with synchronization: avoid lock contention in listener code.
  • When an event hits your listener, its data is still in binary form. To avoid the deserialization cost, do not access the key or value in the event dispatch thread; instead, pass a reference to the map event object to your own processing thread (or thread pool).

The last tip may not be intuitive, but deserialization of map events in Coherence's event dispatch thread often becomes the bottleneck that limits the event processing rate.
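
The hand-off described in the tips above can be sketched with plain JDK classes. This is only an illustration and not Coherence code: LazyEvent is a hypothetical stand-in for a map event whose value stays in binary form until it is first accessed, OffloadingListener is a hypothetical listener, and the thread names "dispatch" and "worker-N" are made up for the demo. The single-threaded executor plays the role of the event dispatch thread.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

final class OffloadDemo {

    // Hypothetical stand-in for a map event: the value is decoded lazily,
    // on whichever thread calls getValue() first.
    static final class LazyEvent {
        private final byte[] binaryValue;
        private String value;
        volatile String decodedBy; // name of the thread that paid the decoding cost

        LazyEvent(byte[] binaryValue) {
            this.binaryValue = binaryValue;
        }

        synchronized String getValue() {
            if (value == null) {
                value = new String(binaryValue, StandardCharsets.UTF_8);
                decodedBy = Thread.currentThread().getName();
            }
            return value;
        }
    }

    // Hypothetical listener: on the dispatch thread it only queues the event.
    static final class OffloadingListener {
        private final ExecutorService workers;
        private final Consumer<String> handler;

        OffloadingListener(ExecutorService workers, Consumer<String> handler) {
            this.workers = workers;
            this.handler = handler;
        }

        // Called on the dispatch thread; the key and value are not touched here.
        Future<?> onEvent(LazyEvent event) {
            return workers.submit(() -> handler.accept(event.getValue()));
        }
    }

    // Runs one event through the listener and reports which thread decoded it.
    static String run() {
        ExecutorService dispatch =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "dispatch"));
        AtomicInteger workerId = new AtomicInteger();
        ExecutorService workers = Executors.newFixedThreadPool(
                4, r -> new Thread(r, "worker-" + workerId.incrementAndGet()));
        try {
            OffloadingListener listener = new OffloadingListener(workers, v -> { });
            LazyEvent event = new LazyEvent("new-value".getBytes(StandardCharsets.UTF_8));
            // The dispatch thread invokes the listener, which returns immediately.
            Callable<Future<?>> deliver = () -> listener.onEvent(event);
            Future<?> handled = dispatch.submit(deliver).get();
            handled.get(); // wait for the worker to finish processing
            return event.decodedBy;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            dispatch.shutdown();
            workers.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("value decoded on thread: " + run());
    }
}
```

In this sketch the decoding cost lands on a worker thread, so the single dispatch thread stays free to deliver the next event. With a real Coherence listener the same idea would apply to the map event object passed to the listener callbacks.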

Synchronous and normal map listeners

There is a marker interface, SynchronousListener, in the com.tangosol.util package.
You can implement it in your map listener. This does not make event delivery to your listener synchronous with the cache operation (as you might think); instead, it affects which thread your listener is invoked in.

Normal listeners are invoked in the event dispatch thread.

“Synchronous” listeners are invoked in the service thread.

What are the differences?

  • Imagine you have a near cache and you use an entry processor to update an entry. If the cache event were processed in the event dispatch thread, data in the near cache could remain stale for a short time: the entry processor call has already returned, but the event has not been processed yet.
    Using a synchronous listener solves this, because the event is guaranteed to be processed before the response message from the entry processor invocation.
  • Time-consuming custom map listeners can slow down the event dispatch thread, increasing event delays. Coherence's built-in facilities such as near caches and CQC are not affected, because they use synchronous listeners internally; you can consider it an extra level of protection from a misbehaving developer :)

But let me stress it again: for any type of listener, events are delivered asynchronously relative to other cluster nodes.

Backing map listeners

Backing map listeners are used less often (but abused more frequently). They are usually configured via the XML cache configuration and work on the storage side.
On the storage side, Coherence can use a pool of worker threads to perform operations in parallel. You may assume that your backing map listener would also be invoked in parallel …
… but that is wrong. A backing map listener processes one map event at a time for a given cache, regardless of the thread pool size.
I was also surprised when I first saw this behavior. It is not a fundamental limitation of Coherence, but all out-of-the-box backing map variants use a cache-wide global lock to dispatch map events. Even for a partitioned backing map, Coherence uses the ObservableSplittingBackingMap wrapper, which again uses a global lock.
So, if you are using backing map listeners, be aware of that limitation. The live object pattern also relies on backing map listeners and is thus limited by the same scalability constraint.
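
The effect of dispatching under one lock can be reproduced with a small JDK-only sketch. It is not Coherence code: LockedDispatcher is a hypothetical stand-in for an observable backing map that fires events while holding a single per-cache lock, and the pool size and event count are arbitrary values chosen for the demo.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class GlobalLockDispatchDemo {

    // Hypothetical stand-in for a backing map that dispatches events
    // to its listener while holding one lock for the whole cache.
    static final class LockedDispatcher {
        private final Object lock = new Object();
        private final Runnable listener;

        LockedDispatcher(Runnable listener) {
            this.listener = listener;
        }

        void fireEvent() {
            synchronized (lock) {
                listener.run();
            }
        }
    }

    // Fires events from a pool of worker threads and returns the highest
    // number of listener invocations that were ever in progress at once.
    static int maxConcurrentListenerCalls(int workerThreads, int events) {
        AtomicInteger inListener = new AtomicInteger();
        AtomicInteger maxSeen = new AtomicInteger();
        LockedDispatcher dispatcher = new LockedDispatcher(() -> {
            int now = inListener.incrementAndGet();
            maxSeen.accumulateAndGet(now, Math::max);
            try {
                Thread.sleep(1); // simulate a little listener work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            inListener.decrementAndGet();
        });

        ExecutorService pool = Executors.newFixedThreadPool(workerThreads);
        for (int i = 0; i < events; i++) {
            pool.execute(dispatcher::fireEvent);
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timed out waiting for events");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return maxSeen.get();
    }

    public static void main(String[] args) {
        System.out.println("max concurrent listener calls: "
                + maxConcurrentListenerCalls(8, 50));
    }
}
```

Even with eight worker threads, the listener in this sketch never runs more than one invocation at a time, because every call has to pass through the same lock. Adding threads to the pool does not raise listener throughput.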

Map triggers

Fortunately, map triggers work as part of the cache update transaction at the cache service level. In other words, a map trigger will not harm performance more than an entry processor does.
One possible workaround for the backing map listener concurrency issue is to invoke the map listener from a map trigger.

9 comments:

  1. Your diagram for the synchronous listeners shows the event processing on the listener executing synchronously within the put.
    Isn't the put on the service thread and not the event dispatch thread so the put should be able to complete asynchronously with the update event processing on the clients?

  2. That is why it is called "synchronous": a PUT is not complete until all "synchronous" listeners have completed.
    It doesn't mean that the worker thread executing the PUT is waiting for them. But the client that requested the PUT will not get an ACK until the "synchronous" listeners have been executed.
    This feature allows, for example, keeping strong consistency guarantees for near caches.

    Replies
    1. I have found myself to be wrong. If the PUT and the listener are on different cache nodes, the cache event will be processed asynchronously anyway.

      Article was updated.

  3. Hi Alexey,
    I have two questions:
    1. Did you mean ObservableSplittingBackingMap in the diagram for backing map listeners?
    2. You said "Backing map listener could process one map event at time for given cache, regardless of thread pool size". Does it mean only one map event will be handled by a cache service (which can have many caches under it) at one time? I would expect there to be a single event dispatch thread per cache service.

    Replies
    1. 1. Sort of :) , I had "backing map implementing ObservableMap" in mind.
      2. Backing map event listeners are called from the cache service thread pool (if enabled). It is possible for events from DIFFERENT caches to be processed concurrently.
      The event dispatch thread is used only for cache listeners (not backing map listeners).

  4. Hi Alexey,

    We have two storage-enabled nodes which are not in the application's heap. Cache operations get slower over time and in the end, after about 6 days, the application stops getting service from the Coherence cache. But my listeners are all good and healthy even while it is struggling. There are no memory, CPU or networking problems. Could this be a problem with the worker thread count?

    Replies
    1. You should try raising your problem at the Coherence support forum (https://forums.oracle.com/forums/forum.jspa?forumID=480&start=0). It has a very helpful community.

  5. You could use a pool of worker threads that the backing map listener hands processing off to, to prevent stalling the service thread.

    Replies
    1. Yes, but this would make your listeners asynchronous.
      Using map triggers instead of backing map listeners is, IMHO, a better approach. Triggers have documented semantics and are executed concurrently.
