Alexey Ragozin: 2012

Tuesday, December 4, 2012

Coherence 101, Beware of cache listeners

Cache events facility is a quite useful feature of Oracle Coherence. For example, continuous queries and near cache features are build on top of cache event system.

Unfortunately it could be also abused easily. In particular, they are noticeably bad at scale unless you are very careful.

Please note. This article is covering only partitioned cache topology (distributed cache scheme).

Client side map listeners

UPDATE: I was very wrong in my previous description of client side synchronous map listeners. Section below was rewritten to reflect more accurate picture.

Client side map listeners are usually added via NamedCache API. They typically receive events from caches hosted on remote JVMs (storage nodes). But regardless of whenever cache event is produced at remote or local JVM, Coherence will deliver it to listeners using dedicated event dispatch thread (or service thread itself for listeners marked as synchronous).

Each cache service has only one event dispatch thread, and it could easily become a bottle neck, limiting speed of cache event processing on client.

Few tips to mitigate this design aspect are below.

Do not do anything time consuming in listener itself, offload processing to other thread instead.
Be careful with synchronization – avoid lock contention in listener code.
When event hits your listener, its data are still in binary form. To avoid deserialization cost, do not access key or value in event dispatch thread, instead pass reference to map event object to own processing thread (or thread pool).

Last advice may not be intuitive, but deserialization of map event in Coherence’s event dispatch thread often becomes a bottleneck slowing down event processing rate.

Synchronous and normal map listeners

There is a marker interface SynchronousListener in com.tangosol.util package.
You could implement it in your map listener. But this wouldn’t make map event delivery to your listener synchronous with cache operation (as you may think), instead it would affect in which thread your listener is invoked.

Normal listeners are invoked in event dispatch thread.

“Synchronous” listeners would be invoked in service thread

What are the differences?

Imagine you have near cache and you are using entry processor to update entry. If cache event would be processed in event dispatch thread, data in near cache may remain stale for short time between entry processor call have returned, but event is not processed yet.
Using of synchronous listeners would solve this, because event would be guaranteed to be processed before processing response message from entry processor invocation.
Time consuming custom map listeners could slow down event dispatch thread increasing event delays. This would affect Coherence build-in facilities such as near caches and CQC would be affected because they use synchronous listeners internally - you can consider it extra level of protection from misbehaving developer :)

But let me stress it again, for any type of listener events are delivered asynchronously relative to other cluster nodes.

Backing map listeners

Backing map listeners are used less often (but being abused more frequently). Backing map listeners are usually configured via XML cache configuration and work on storage side.

On storage side, Coherence could use pool of worker threads to perform operations in parallel. You may assume that you backing map listener would also be invoked in parallel …
… but that is wrong. Backing map listener could process one map event at time for given cache, regardless of thread pool size.

First time, I was also surprised by such behavior. This is not fundamental limitation of Coherence, but all out-of-box variations of backing map use cache global lock to dispatch map event. Even for partitioned backing map Coherence will use ObservableSplittingBackingMap wrapper which is, again, using global lock.

So, if you are using backing map listeners, be aware of that limitation. Live object pattern also relay on backing mapping listener and thus limited by this scalability constraint.

Map triggers

Fortunately map triggers work as a part of cache update transaction on cache service level. In other words map trigger would not harm performance more than entry processors do.

One possible workaround for baking map listeners concurrency issue could be invocation of map listener from map trigger.

Monday, October 1, 2012

Safepoints in HotSpot JVM

Term Stop-the-World pause is usually associated with garbage collection. Indeed GC is a major contributor to STW pauses, but not the only one.

Safepoints

In HotSpot JVM Stop-the-World pause mechanism is called safepoint. During safepoint all threads running java code are suspended. Threads running native code may continue to run as long as they do not interact with JVM (attempt to access Java objects via JNI, call Java method or return from native to java, will suspend thread until end of safepoint).

Stopping all threads are required to ensure what safepoint initiator have exclusive access to JVM data structures and can do crazy things like moving objects in heap or replacing code of method which is currently running (On-Stack-Replacement).

How safepoints work?

Safepoint protocol in HotSpot JVM is collaborative. Each application thread checks safepoint status and park itself in safe state in safepoint is required.

For compiled code, JIT inserts safepoint checks in code at certain points (usually, after return from calls or at back jump of loop). For interpreted code, JVM have two byte code dispatch tables and if safepoint is required, JVM switches tables to enable safepoint check.

Safepoint status check itself is implemented in very cunning way. Normal memory variable check would require expensive memory barriers. Though, safepoint check is implemented as memory reads a barrier. Then safepoint is required, JVM unmaps page with that address provoking page fault on application thread (which is handled by JVM’s handler). This way, HotSpot maintains its JITed code CPU pipeline friendly, yet ensures correct memory semantic (page unmap is forcing memory barrier to processing cores).

When safepoints are used?

Below are few reasons for HotSpot JVM to initiate a safepoint:

Garbage collection pauses
Code deoptimization
Flushing code cache
Class redefinition (e.g. hot swap or instrumentation)
Biased lock revocation
Various debug operation (e.g. deadlock check or stacktrace dump)

Trouble shooting safepoints

Normally safepoints just work. Thus, you can care less about them (most of them, except GC ones, are extremely quick). But if something can break it will break eventually, so here is useful diagnostic:

-XX:+PrintGCApplicationStoppedTime – this will actually report pause time for all safepoints (GC related or not). Unfortunately output from this option lacks timestamps, but it is still useful to narrow down problem to safepoints.
-XX:+PrintSafepointStatistics –XX:PrintSafepointStatisticsCount=1 – this two options will force JVM to report reason and timings after each safepoint (it will be reported to stdout, not GC log).

References

· How does JVM handle locks – quick info about biased locking

· HotSpot JVM thread management

Thursday, September 6, 2012

Back to school, tech meet-ups in Moscow this September

I’m regularly hosting meeting of technology enthusiasts in Moscow, and it is time to announce couple of events for this month. But wait, we are in Moscow so let me speak Russian.

Сегодня, я хочу сделать анонс очередных двух круглых столов IT энтузиастов в Москве.

13ое сентября

Вячеслав Воробьёв расскажет нам о проблеме автоматического планирования.

Проблема всплыла за кружкой пива после последней нашей встречи посвящённой автоматизации деплоймента и Вячеслав любезно согласился сделать ликбез посвящённый академическому подходу к проблеме.<

27ое сентября

Data persistence, lies, darn lies and delusions - флейм на тему СУБД от вашего покорного слуги.

Я хочу вспомнить о фундаментальных основах систем хранения данных: механизмы изоляции транзакций, WAL и undo логи, двух фазный коммит и paxos, BTrees и LSM-Trees – а потом хорошенько потролить NoSQL, SQL и вообще всё что как-то хранит данные.

Зарегистрироваться на круглые столы и подписаться на анонс мероприятий можно по ссылке.

http://aragozin.timepad.ru/

Wednesday, June 6, 2012

Story of relentless CMS

Recently, a comment on other article of this blog has led me to noteworthy issue with CMS (Concurrent Mark Sweep) garbage collector.

Problem has appeared after minor release of application. JVM was configured to use CMS and it was working fine, but after a change its behavior has changed. Normally CMS is doing collection cycle when old space usage meets certain threshold, so you can see famous saw of heap usage.

Most time CMS is staying idle, just occasionally doing a collection. But after release, heap usage diagram have changed to something like this.

For some reason CMS is not waiting for heap usage threshold anymore and doing one old space GC cycle right after another. Heap usage diagram may not look bad by itself, but continuous background GC means that at least one core is constantly occupied by marking and sweeping old space. In addition, sweeping over memory also impacts cache utilization in CPUs causing additional impact to performance.

My first guess about –XX:+UseCMSInitiatingOccupancyOnly flag being missed was wrong (you know without that flag JVM could adjust CMS initiation threshold at runtime according to internal heuristics), CMS setup was fine.

After scanning through options, –XX:+CMSClassUnloadingEnabled flag has drawn my attention. By default CMS will not collect permanent space; you should use that flag to enable it. Permanent space is a special memory space used by Java class objects and some JVM data structures; it is not part of application heap and being sized separately. It means, in particular, that permanent space has its own memory usage which is not correlated with old space memory usage.

So, if CMS for permanent space is enabled, GC cycle will be triggered if either old space or permanent space has reached usage threshold. This turned out to be a problem. Permanent space was a little too small for application, so CMS were trying to collect it relentlessly.

Increasing permanent space size (-XX:PermSize=size) has solved an issue.

Alternative approach could be using different threshold for old and permanent space (i.e. –XX: CMSInitiatingPermOccupancyFraction=percent). Also it may make a sense to turn off permanent space collection at all, many applications just do not need it (it was called “permanent” for reason after all).

Thursday, May 31, 2012

Tech talk at London Coherence SIG: Database Backed Cache, Tips, Tricks and Patterns

Today I was speaking at London Coherence SIG. Below you can find slides from my presentation.

London Coherence SIGs are never boring, but this one was especially interesting.

We had two presentations from Randy Stafford (Oracle), Groovy presentation from Jonathan "Gridman" Knight, yet another Coherence transaction framework (and counting) from David Whitmarsh, "lore" about read-write-backing-map from Phil Wheeler and other interesting stuff.

I'm really glad, I was able to get to this event.

Four flavours of "Out of Memory" in HotSpot JVM

If you are working with Java, I bet you are well aware about OutOfMemoryError. But did you know that there are 4 different conditions when OutOfMemoryError is being thrown in Oracle’s HotSpot JVM? These conditions can be distinguished by message provided by exception.

OutOfMemoryError: Java heap space

This is probably most common case for out of memory error. It indicates that heap space is not large enough to hold object created by application. To be precise, this exception is thrown if, after last full GC cycle, free space in heap is below 2% (can be configured via –XX:GCHeapFreeLimit=p JVM option).

Normally you either have to increase JVM heap size or fix memory leak in your application to remedy this.

OutOfMemoryError: GC overhead limit exceeded

This problem is trickier. It means that GC is spending “too much” time cleaning memory. Portion of free space may be above 1%, but process is spending 50 times more time managing memory than actually executing application code. Threshold can be tweaked via -XX:GCTimeLimit=p JVM option, default is 2% (1/50). Usually if you will increase heap size problem will go away. But this error may also indicate that heap is badly configured, i.e. young space is too small for application.

OutOfMemoryError: PermGen space

As you may know, HotSpot JVM is using special memory space for certain internal structures – permanent space. Despite its name object from permanent space could be collected (name is just a historic artifact). But you may run out of memory in perm space same way as you can in normal heap. Thing which may affect permanent space usage are

loading/generating classes and creating with class loaders,
reflection,
calling String.intern() method.

Permanent space does not count to application heap size. If you need to resize it, use –XX:MaxPermSize=s JVM option.

OutOfMemoryError: Direct buffer memory

All previous types of OOME were indicating that GC failed to free memory for new Java object (i.e. they are thrown as result of GC cycle). This last type of OOME is different. Since Java 1.5, JVM have API to manage memory outside of heap. Out of heap memory is not a subject for garbage collection, but is also limited. Capacity of off heap memory pool is configured via –XX:MaxDirectMemorySize=s JVM option.

My statement about off heap memory not being garbage collected is only partially true. Off heap memory blocks are deallocated in ByteBuffer class finalizer so garbage collection of heap is also driving reclamation of associated off-heap memory.

Friday, May 25, 2012

Tech talk at DUMP IT (Ekaterinburg), Garbage collection in JVM

Today I was speaking at Ekaterinburg at DUMP-IT.
Below are slides from presentation (in russian).

Thursday, May 17, 2012

Tech meet up, distributeted caching and data grid, Moscow 17 May

I would like to announce tech meet up devoted to topic of caching in distributed systems and data grid technology. Event will be held at Moscow on May 17.

Slides from event:

Main talk by Max Alexejev

Bonus presentation by me

Monday, May 14, 2012

Asymmetric read/write path – a trend for scalable architecture

A few years ago I was blogging about architecture using composition of different storage technologies for queering and persistence of data. Rationale behind this was further explained in other my post.

Below is a sketch of such composition:

Unlike classical Von Neumann's memory hierarchy, this composition is asymmetric in terms of read and write paths.

I’m glad to see similar ideas implemented in commercial products marking a trend. In this post I want draw your attention to two very interesting middleware products having principle of composition of specialized storages in their core.

CloudTran

CloudTran is very ambitious product promising performance and scalability for a wide class of application without much extra effort.

CloudTran leverages Oracle Coherence and its integration with EclipseLink to build scalable applications using JPA for persistence. Coherence + EclipseLink (TopLink Grid) is already capable of executing JPQL queries in cache instead of database using rich querying capabilities of Coherence. Missing piece in this tandem was transaction support.

CloudTran is filling this gap adding specialized component for managing transactions. Durability of CloudTran’s transactions is provided by write-ahead disk log (many RDBMSes are using same technique). But unlike RDBMS, CloudTran’s log is not limited to single disk/server, it is distributed (same way as data in Coherence grid) and can benefit from throughput of dozens of disks in cluster.

Full picture of CouldTran based solution is triangle of technologies:

Coherence for fast data retrieval,

CloudTran transaction log for fast and durable transaction persistence,

backend database (relational or NoSQL) is a system of record and long term storage.

Backend database being updated asynchronously is on critical path for neither read nor write, thus database is not limiting application performance. On other side data in Coherence are updated synchronously, so application logic can enjoy strong consistency and ACID transactions (which is huge, eventual constancy is a lot of pain for typical enterprise application with sophisticated data model).

Datomic

Datomic is another young and interesting product promising combination of ACID and scalability. Datomic is also featuring triangle of technologies, but it has own implementation for both in-memory database/cache and transaction persistence (that component is called transactor in Datomic). For system of record you can use either RDBMS or NoSQL storage (Amazon Dynamo). Datomic is offering own API (and own unique approach) for working with data. Datomic is highly influenced by functional paradigm, which would probably make porting existing applications to Datomic non trivial, but for new projects idea of simple but scalable platform featuring ACID data manipulation may be attractive.

Cool toys for enterprise developers

I’m very glad to see such innovative products addressing scalability in not-so-fancy class of enterprise application (of cause both products are not limited to enterprise). IMHO there are enough clones of Google’s BigTable and Amazon’s dynamo in this world already. People working on inventories, reservation systems and other enterprisy stuff also need cool distributed toys.

I’m a little skeptical about future for these products (goals they have set for themselves are just too challenging), but I sincerely which luck to both projects.

Please prove my skepticism wrong ;)

Wednesday, April 25, 2012

Using HotSpot Attach API for fun and profit

JConsole and its successor JVisualVM are awesome tools, but most of information used by these tools is available via JMX for any JMX capable tool (though JVisualVM extensively use other APIs too). JMX and especially set of standard JVM MBeans are awesome too. You can get a lot diagnostic from stock JVM for free.

Damn, would JMX have standard restful HTTP protocol it would be best thing since sliced bread! :)

But despite their awesomeness, there are few things which regularly frustrate me. Here they are

I often have SSH only access to server box (other ports are blocked). JMX is using java RMI and RMI is very unfriendly to constrained networks. While it is possible to setup RMI tunneling via SSH it is a real pain (and for chained SSH environment it is hardly possible at all).

I want to see portion of CPU consumed by threads. You know like top but for threads in JVM. Why this simple thing is not in JConsole?

I still have to parse GC logs. Heap usage diagrams are nice, but I need numbers. Visual GC is cool at diagramming, though.

What I would really like is a command line tool which could get me that information from running JVM.

With Java 1.6 and above it is actually fairly simple to achieve, thank to attach API (which allows you to execute agent on other JVM by process ID). Some digging into JConsole code and woala … my wish come true.

jtop

jtop connects to JVM by PID and periodically dumps thread CPU utilization to a console. Just type

java -jar jtop.jar <PID>

and you get thread list with CPU details

2012-04-25T00:38:07.799+0400 CPU usage 
  process cpu=2.96%
  application: cpu=2.02% (user=1.24% sys=0.78%)
  other: cpu=0.93% 
[000001] user=0.00% sys=0.00% - main
[000002] user=0.00% sys=0.00% - Reference Handler
[000003] user=0.00% sys=0.00% - Finalizer
[000004] user=0.00% sys=0.00% - Signal Dispatcher
[000005] user=0.00% sys=0.00% - Attach Listener
[000018] user=0.00% sys=0.00% - CLI Requests Server
[000021] user=0.00% sys=0.00% - AWT-EventQueue-0
[000022] user=0.00% sys=0.00% - AWT-Shutdown
[000023] user=0.00% sys=0.00% - Inactive RequestProcessor thread [Was:Explorer Builder Processor/com.sun.tools.visualvm.core.explorer.ExplorerModelBuilder$1]
[000026] user=0.00% sys=0.00% - TimerQueue
[000028] user=0.00% sys=0.00% - Inactive RequestProcessor thread [Was:TimedSoftReference/org.openide.util.TimedSoftReference]
[000032] user=0.31% sys=0.78% - Timer-1
[000036] user=0.00% sys=0.00% - RMI TCP Accept-0
[000037] user=0.93% sys=0.00% - RMI TCP Connection(2)-192.168.0.105
[000038] user=0.00% sys=0.00% - RMI Scheduler(0)
[000039] user=0.00% sys=0.00% - JMX server connection timeout 39
[000040] user=0.00% sys=0.00% - JMX server connection timeout 40

gcrep

Like jtop, gcrep connects to running JVM, but instead of thread usage, it collects GC information and mimic GC log.

java -jar gcrep.jar <PID>

[GC: G1 Young Generation#30 time: 54ms mem: G1 Survivor: 2048k+0k->2048k G1 Eden: 93184k-93184k->0k G1 Old Gen: 72253k+134k->72387k]
[GC: G1 Young Generation#31 time: 51ms interval: 9648ms mem: G1 Survivor: 2048k+0k->2048k[rate:0.00kb/s] G1 Eden: 93184k-93184k->0k[rate:-9658.37kb/s] G1 Old Gen: 72387k+0k->72387k[rate:0.00kb/s]]

Unlike normal GC logs, here I have detailed information for each space, deltas and approximate allocation rate. Of cause, you can get same thing for GC log with just few hundreds of awk script lines, provided GC log output is not screwed by race condition between concurrent phases (which is happen sometime).

You can grab jars from code.goole.com/p/gridkit

Sources are available at GitHub https://github.com/aragozin/jvm-tools

Both of my tools are very rough at the moment. They just fit my personal needs, but you can easily modify code and may be send a patch back to me.

Friday, April 13, 2012

Coherence 101, few things about read through you may want to know

Read-through is a technique which allows cache to automatically populate entry for external data source up on cache miss. Oracle Coherence supports this technique via read-write-backing-map and application provided cache loaders (you can read more in Coherence documentation).

CacheLoader/CacheStore vs. BinaryEntryStore

You cache loader/store plug-in may either implement CacheLoader/CacheStore interface or BinaryEntryStore interface. BinaryEntryStore have following key advantages:

It can work with binary objects, which allows you to avoid unneeded serialization/deserialziation in some case.

It is possible to distinguish inserts vs. updates using BinaryEntryStore. BinaryEntryinterface provides you access to both new and previous version of value, this may be very useful.

Why Coherence is doing load() before store()?

Assume that we working with key which does not exist in cache. If you just put(…) new key via named cache interface, Coherence would work as expected. It will add object to a cache and call store(…) in cache store plug-in. But if you will use entry processor and setValue(…) for entry which is not in cache – surprise, surprise – Coherence will first load(…) key and then store(…) new value.

Reason is simple, setValue(…) should return pervious value as result of operation. Use other version of method – setValue(value, false) to avoid unnecessary load(…) call. BTW way putAll(…) should be preferred over put(…) for same reason – putAll(…) is not required to return previous value.

load() vs. loadAll() methods

Assume that your cache loader using SQL to fetch data from RDBMS. It is clear what single SQL select retrieving N entries (e.g. using in (…) in where clause) at once is better than N subsequent SQL selects each fetching only one entry.

Prior to Coherence 3.7, read-write backing map implementation were using sequential approach (making bulk cache preloading with read-though impractical). In Coherence 3.7 this was fixed (but you should use at least 3.7.1.3 version, earlier versions have known bugs related to read-through).

So, in 3.7 getAll() will use loadAll() under hood (but remember that your key set will be split to partitions, distributed across storage members and each storage member will process read-though in partition-parallel fashion).

But will it work with aggregators and entry processors invoked over collection of keys? – not so fast …

BTW If you stick with 3.6 or earlier you can read about work around here.

Aggregator warm up

Assume that you know key set you want to aggregate using Coherence distributed aggregation, but some many of these keys may not be in cache (i.g. not-yet-loaded). Read-though is enabled.

Instance of your aggregator started on storage node will receive set of BinaryEntrys from Coherence. But it does mean that all these entries are present in cache, Coherence will not try to preload working set for aggregator. Actually aggregator may decide to ignore data-not-in-cache (see isPresent() method). But if it call any kind of “get” methods on entry, Coherence will load value via cache loader plug-in. Problem is – it will be done in sequential manner, so this may take A LOT of time.

Can we work this around? - Sure.

Simplest workaround is call getAll() before invoking aggregator (but it kills idea of distributed aggregation). A smarter way is dig though internal cache layers and load entries via call to read-write-backing-map. Snippet below can be used for effective preloading for set of entries in aggregators and entry processors.

public static void preloadValuesViaReadThrough(Set<BinaryEntry> entries) {
 CacheMap backingMap = null;
 Set<Object> keys = new HashSet<Object>();
 for (BinaryEntry entry : entries) {
  if (backingMap == null) {
   backingMap = (CacheMap) entry.getBackingMapContext().getBackingMap();
  }
  if (!entry.isPresent()) {
   keys.add(entry.getBinaryKey());
  }
 }
 backingMap.getAll(keys);
}

Aggregation, expiry and past expiry entry resurrection

If you are using read-write backing map in combination with expiry, you may be prone to following effect.

Assume that your cache is idle for some time and some of cache entries are already past their expiry. Now you are issuing an aggregator over all cache data (in my case it was a regular housekeeping job interested only in live cache data). Filters in Coherence can match only cache data (they never trigger read-though), but surprisingly, operation described above starts storming DB with read-through requests!

What has happen?

Lazy expiry

Local cache (acting as internal map for read-write backing map) is doing expiry passively. If you are not touching it, it cannot expire anything. But if you call any of its method, expiry check will be triggered and entries may be physically removed for cache at this point.

Key index of partitioned cache service

Partitioned cache service has internal structure called “key index” – it is simply a set of all keys in local backing map. When you issuing a filter based operation, Coherence calculates key set first (using filter), then perform operation (e.g. aggregation) over know set of keys. A set of all keys are passed to filter, then filter may decide which keys to process (it can consult with indexes at this point) and whenever further filtering by value is required. AlwaysFilter is very simple; it does not require any value filtering, so Coherence just passing whole “key index” content as input for aggregation without consulting with backing map.

Together

A lot of entries in cache are past expiry, but they are still in cache because it is idle and local cache has no opportunity to perform expiry check. Aggregator with AlwaysFilter is issued, and Coherence storage member will perform aggregation against all keys currently in “key index” (including key past their expiry). Access to first entry from aggregator will trigger expiry check in backing map, effectively wiping out expired entries. But aggregator instance is already started and its entry set already has these keys. By processing recently expired entries, which are in its entry set, aggregator will be triggering read-though resurrecting them (and of cause it would be doing it one by one – read SLOW).

How to prevent this?

Well, my conditions are little exotics. You probably never hit exactly this problem, but still understanding of such effects may be helpful for related cases.

Workaround is dead simple – call size() on cache just before issuing an aggregator. size() will hit backing map, it will have a chance to process expiry, and by the moment of aggregator arrival dead entries will be removed from “key index” thus no unexpected read-though would happen.

Conclusion

Live is full of surprises when it comes to complex distributed systems. Keep your eyes open ;)

Tuesday, April 10, 2012

Coherence, managing multiple Extend connections

Coherence*Extend is a protocol which is used for non-members of cluster to get access to Coherence services. Extend is using TCP connection to one of cluster members (which should host proxy service) and use this member as a relay. Normally client process is creating single TCP connection. This connection is shared between allNamedCache instances and threads of client process. To be more precise, it is creating single TCP connection per remote service, but normal you have just one remote cache service (and in some case another remote invocation service).

Of cause TCP connection will failover automatically is case of proxy or network problems, so for most practical cases it is ok to share single TCP connection per process. Proxy member of cluster is acting as a relay (in most cases it doesn’t even deserialize data passing through), so single client process is unlikely to overload proxy process … unless you are using invocation services. In case of invocation service proxy is performing logic on behalf of client and it can be arbitrary complex, so it may be desirable to spread requests across multiple proxy processes.

Here is simple trick, we should create as many remote services as connections we want to have. There is a slight problem, you cannot have same cache name associated with different remote services at the same time … unless you are using multiple cache factories.

Below is snippet of code which is manages creating cache factories and mangling service names to create separate set of Extend connections perExtendConnection instance.

public class ExtendConnection {

    private static Logger LOGGER = Logger.getLogger(ExtendConnection.class.getName());
    
    private static AtomicInteger CONNECTION_COUNTER = new AtomicInteger();

    private int connectionId = CONNECTION_COUNTER.incrementAndGet();
    private ConfigurableCacheFactory cacheFactory;
    private ConcurrentMap<Service, Service> activeServices = new ConcurrentHashMap<Service, Service>(4, 0.5f, 2);
    
    public ExtendConnection(String configFile) {
        cacheFactory = initPrivateCacheFactory(configFile);
    }

    private DefaultConfigurableCacheFactory initPrivateCacheFactory(String configFile) {
        LOGGER.info("New Extend connection #" + connectionId + " is going to be created, config: " + configFile);

        XmlElement xml = XmlHelper.loadFileOrResource(configFile, "Coherence cache configuration for Extend connection #" + connectionId);
        // transforming configuration
        XmlElement schemes = xml.getSafeElement("caching-schemes");
        for(Object o: schemes.getElementList()) {
            XmlElement scheme = (XmlElement) o;
            if (isRemoteScheme(scheme)) {
                String name = scheme.getSafeElement("service-name").getString();
                if (name != null) {
                    String nname = name + "-" + connectionId;
                    scheme.getElement("service-name").setString(nname);
                }
            }
        }
        
        DefaultConfigurableCacheFactory factory = new DefaultConfigurableCacheFactory(xml);
        return factory;
    }
    
    
    private boolean isRemoteScheme(XmlElement scheme) {
        String name = scheme.getName();
        return "remote-cache-scheme".equals(name) || "remote-invocation-scheme".equals(name);
    }


    public NamedCache getCache(String name) {
        NamedCache cache = cacheFactory.ensureCache(name, null);
        Service service = cache.getCacheService();
        activeServices.putIfAbsent(service, service);
        return cache;
    }

    public InvocationService getInvocationService(String serviceName) {
        InvocationService service = (InvocationService) cacheFactory.ensureService(serviceName + "-" + connectionId);
        activeServices.putIfAbsent(service, service);
        return service;
    }

    /**
     * Warning: this method is not concurrency safe, you may get to trouble if you are accessing caches of services via this connection during shutdown.
     */
    public void disconnect() {
        for(Service service:  new ArrayList<Service>(activeServices.keySet())) {
            try {
                if (service.isRunning()) {
                    service.stop();
                }
            }
            catch(Exception e) {
                LOGGER.log(Level.WARNING, "Exception during remote service shutdown", e);
            }
        }
    }
}

Each instance of class above manages physical TCP Extend connection (or multiple if you have multiple remote services in configuration). To create multiple connections just create multiple instances, but make sure that you are not leaking them. Extend connections will not be closed automatically by GC, so you should pool them carefully.

This technique is also useful if you want to keep connections to several different clusters at the same time.

Monday, April 9, 2012

Tech talk at CloudCamp Moscow, HPC - could point of view

Today I was speaking at CloudCamp Moscow.
Below is slidedeck.

Monday, April 2, 2012

Tech talk at RIT++ Moscow, Open source search solutions

Today I was speaking at RIT++ at Moscow.
Below are slides at russian.

Wednesday, March 28, 2012

Secret HotSpot option improving GC pauses on large heaps

my Patch mentioned in this post (RFE-7068625) for JVM garbage collector was accepted into HotSpot JDK code base and available starting from 7u40 version of HotSport JVM from Oracle.

This was a reason for me to redo some of my GC benchmarking experiments. I have already mentioned ParGCCardsPerStrideChunk in article related to patch. This time, I decided study effect of this option more closely.

Parallel copy collector (ParNew), responsible for young collection in CMS, use ParGCCardsPerStrideChunk value to control granularity of tasks distributed between worker threads. Old space is broken into strides of equal size and each worker responsible for processing (find dirty pages, find old to young references, copy young objects etc) a subset of strides. Time to process each stride may vary greatly, so workers may steal work from each other. For that reason number of strides should be greater than number of workers.

By default ParGCCardsPerStrideChunk =256 (card is 512 bytes, so it would be 128KiB of heap space per stride) which means that 28GiB heap would be broken into 224 thousands of strides. Provided that number of parallel GC threads is usually 4 orders of magnitude less, this is probably too many.

Synthetic benchmark

First, I have run GC benchmark from previous article using 2k, 4k and 8K for this option. HotSpot JVM 7u3 was used in experiment.

It seems that default value (256 cards per strides) is too small even for moderate size heaps. I decided to continue my experiments with stride size 4k as it shows most consistent improvement across whole range of heap sizes.

Benchmark above is synthetic and very simple. Next step is to choose more realistic use case. I usual, my choice is to use Oracle Coherence storage node as my guinea pig.

Benchmarking Coherence storage node

In this experiment I’m filling cache node with objects (object 70% of old space filled with live objects), then put it under mixed read/write load and measuring young GC pauses of JVM. Experiment was conducted with two different heap sizes (28 GiB and 14 GiB), young space for both cases was limited by 128MiB, compressed pointers were enabled.

Coherence node with 28GiB of heap

JVM	Avg. pause	Improvement
7u3	0.0697	0
7u3, stride=4k	0.045	35.4%
Patched OpenJDK 7	0.0546	21.7%
Patched OpenJDK 7, stride=4k	0.0284	59.3%

Coherence node with 14GiB of heap

JVM	Avg. pause	Improvement
7u3	0.05	0
7u3, stride=4k	0.0322	35.6%

This test is close enough to real live Coherence work profile and such improvement of GC pause time has practical importance. I have also included JVM built from OpenJDK trunk with enabled RFE-7068625 patch for 28 GiB test, as expected effect of patch is cumulative with stride size tuning.

Stock JVMs from Oracle are supported

Good news is that you do not have to wait for next version of JVM, ParGCCardsPerStrideChunk option is available in all Java 7 HotSpot JVMs and most recent Java 6 JVMs. But this option is classified as diagnostic so you should enable diagnostic options to use it.

-XX:+UnlockDiagnosticVMOptions

-XX:ParGCCardsPerStrideChunk=4096

Tuesday, March 13, 2012

Using Thrift in Coherence

Coherence provides you two built-in options for serialization format for your object: Java serialization and POF. But you are not limited to this option. You can totally different way of serialization using custom Serializer.

Why use alternative serialization?

If you think that Thrift or Protobuf would be better in speed or size compared to POF, that is probably not true. I did a benchmark using this framework, POF was scoring slightly better than both Thrift and Protobuf. In addition, POF can extract attributes without deserialization of whole object.

Only serious reason I can think of – you already alternative serialization implemented for you object and do not want support multiple format. If it is your can, using alternative serialization in Coherence is perfectly justified.

Catch

So you already have, serialization format for your domain objects you are happy this. But besides domain objects, you custom serializer should also support: standard java types (including collections), internal Coherence classes and custom entry processors, aggregations, filter use by your application (if any). So, custom serializer is not a practical option.

Hybrid POF + Thrift serializer

Solution is simple; use your alternative format for domain objects and POF for anything else. Here is example using Thrift.

pof-config.xml

<pof-config>
 
    <user-type-list>
 
        <!-- Include definitions required by Coherence -->
        <include>coherence-pof-config.xml</include>
 
        <!--
            You should declare type ID for each thrift class you are going to use in Coherence
        -->
        <user-type>
            <type-id>1000</type-id>
            <class-name>org.gridkit.sample.MyObject</class-name>
            <serializer>
                <class-name>org.gridkit.coherence.utils.thift.ThriftPofSerializer</class-name>
            </serializer>
        </user-type>
 
        ...
 
        <!-- Usual POF declarion for application non-thrift classes -->
 
        <user-type>
            <type-id>1100</type-id>
            <class-name>org.gridkit.coherence.sample.SampleEntryProcessor</class-name>
        </user-type>
           
    </user-type-list>
        
</pof-config>

ThriftPofSerializer.java

public class ThriftPofSerializer implements PofSerializer {

 private Constructor<?> constructor;
 private TSerializer serializer;
 private TDeserializer deserializer;
 
 
 public ThriftPofSerializer(int typeId, Class type) {
  try {
   this.constructor = type.getConstructor();
   this.constructor.setAccessible(true);
   this.serializer = new TSerializer();
   this.deserializer = new TDeserializer();
  } catch (Exception e) {
   throw new RuntimeException(e);
  }
 }

 @Override
 @SuppressWarnings("rawtypes")
 public void serialize(PofWriter out, Object obj) throws IOException {
  TBase tobj = (TBase) obj;
  byte[] data;
  try {
   data = serializer.serialize(tobj);
  } catch (TException e) {
   throw new IOException(e);
  }
  out.writeBinary(0, new Binary(data));
 }

 @Override
 @SuppressWarnings("rawtypes")
 public Object deserialize(PofReader in) throws IOException {
  try {
   byte[] data = in.readByteArray(0);
   TBase stub = (TBase) constructor.newInstance();
   deserializer.deserialize(stub, data);
   return stub;
  } catch (Exception e) {
   throw new IOException(e);
  }
 }
}

If some Thrift class in used by other Thrift classes but do not put in Coherence individually, you can omit it in pof-config.xml.

Pages