Wednesday, April 22, 2009

Cache for read and cache for write

Caches are everywhere. We have multiple layers of caches in the CPU, we cache disk blocks in memory, and we have all sorts of caches in our applications. When you have performance problems, you usually have only two viable options (without throwing in more hardware, of course): reduce algorithmic complexity (use indexes, hashes, etc.) or cache your hot data. Most other optimizations are limited to gains of a few dozen percent, while caching or indexing, done right, can improve performance tenfold.

When we say caching, we usually have read operations in mind. Things like in-memory data grids (IMDG) extend our vision a little. An in-memory data grid makes it possible to perform complex transaction processing in memory without touching disk or the DB. But memory is volatile, and in most cases an IMDG is just a facade in front of an old, slow, but reliable RDBMS. And here, between the lightning-fast IMDG and the RDBMS, there is a place for a class of product not yet known to the market - a transactional write cache.

But let me start from a distance and explain my opinion step by step.

Architecture 1. Just caching
[figure: architecture-1]
Read, average time – most operations are served by the cache.
Read, max time – on a cache miss we still access the DB (the cache does not reduce max operation time).
Write – write-through strategy; write time is close to the DB operation time (the cache does not speed up writes).

We use the data grid to cache read access to the database. The cache reduces the average time of read operations (but max read time is still bounded by the DB read time). For some applications, reducing only the average time is not enough; they also impose strict limits on max operation time. So we need to evolve the architecture further.
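
To make the trade-off concrete, here is a minimal Java sketch of this pattern. The ConcurrentHashMap stands in for a real cache or grid, and the DataStore interface is a hypothetical stand-in for the database access layer (not any real product's API):

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the database access layer.
interface DataStore {
    String read(String key);
    void write(String key, String value);
}

// Architecture 1: a cache in front of the DB.
class ReadCache {
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    private final DataStore db;

    ReadCache(DataStore db) { this.db = db; }

    // Average read time is the cache hit time; a miss still pays full DB latency.
    String read(String key) {
        return cache.computeIfAbsent(key, db::read);
    }

    // Write-through: the caller waits for the DB, so writes are not accelerated.
    void write(String key, String value) {
        db.write(key, value);  // synchronous DB write
        cache.put(key, value); // keep the cache consistent
    }
}
```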

Architecture 2. In-memory data storage.

[figure: architecture-2]

Read – all operations are served by the in-memory data grid.
Write – write-through strategy; write time is close to the DB operation time (the cache does not speed up writes).

The scheme is very similar to the previous case, but now we keep all data in the cache (no more cache misses). Implementing such a cache is much more complex, but there are good products on the market (both commercial and open source) that can do this hard work for you. The problem with read performance is solved, but for write operations the database is still a bottleneck. If your business is low-latency transaction processing, this architecture is still not good enough. Both read and write operations should be lightning fast; you just can't afford to touch the database at all.
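
In code, the only difference from the previous sketch is a bulk preload at startup, so reads never fall through to the DB. A sketch, reusing the hypothetical DataStore interface from above; how the snapshot is produced (e.g. a bulk SELECT) is assumed:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Architecture 2: the grid holds the full data set, so reads never miss.
class InMemoryStore {
    private final ConcurrentHashMap<String, String> data = new ConcurrentHashMap<>();
    private final DataStore db;

    // Cold start: the full DB contents are loaded before serving traffic.
    InMemoryStore(DataStore db, Map<String, String> snapshot) {
        this.db = db;
        data.putAll(snapshot);
    }

    String read(String key) {
        return data.get(key); // always served from memory, no fallback to the DB
    }

    // Still write-through: write latency is still bounded by the DB.
    void write(String key, String value) {
        db.write(key, value);
        data.put(key, value);
    }
}
```
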
Let’s see what we can do next to improve performance.

Architecture 3. In-memory data processing with asynchronous DB writes.

[figure: architecture-3]

Read – all operations are served by the in-memory data grid.
Write – all writes are also served by the in-memory data grid; a write-behind strategy pushes changes to the DB.

Now both read and write operations are served in memory, and changes are written to the database asynchronously in the background. There is a possible lag between the state of the data in the grid and in the database, but each of them is internally consistent. Everything looks very good until you start thinking about disaster recovery. And here you have a problem: there are transactions that have been processed in the grid but haven't yet been written to the database. In case of disaster, these transactions will be lost.
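
A minimal write-behind sketch (again reusing the hypothetical DataStore): the caller's write completes after a memory update and an enqueue, and a background thread drains the queue to the DB. The pending queue is exactly the window of transactions a disaster would lose:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Architecture 3: write-behind. Writes return after a memory update;
// a background thread flushes queued changes to the DB.
class WriteBehindStore {
    record Change(String key, String value) {}

    private final ConcurrentHashMap<String, String> data = new ConcurrentHashMap<>();
    private final BlockingQueue<Change> pending = new LinkedBlockingQueue<>();
    private final DataStore db;

    WriteBehindStore(DataStore db) {
        this.db = db;
        Thread flusher = new Thread(this::drain, "db-flusher");
        flusher.setDaemon(true);
        flusher.start();
    }

    String read(String key) { return data.get(key); }

    // Fast write: no DB round trip on the caller's path.
    void write(String key, String value) {
        data.put(key, value);
        pending.add(new Change(key, value));
    }

    // Everything still sitting in 'pending' is lost if the whole grid dies.
    private void drain() {
        try {
            while (true) {
                Change c = pending.take();
                db.write(c.key(), c.value());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```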

Yes, IMDG products usually provide some level of fault tolerance; they can usually survive losing one or more servers in the grid (rebalancing load from the fallen servers to the surviving ones), but none of them can tolerate a restart of the whole grid. And no, this is just not enough: you cannot afford to lose a single transaction, even if the whole datacenter is down. In-memory data grids can offer us really fast read access to data (including indexing and querying), but storage with fast and reliable persistent writes is out of their scope.

Hey! They are in-memory data grids, and they are really good at what they are intended to do. You can't expect them to do everything. :)

Here I want to conclude: we need another class of middleware product, one that offers storage with fast and reliable persistent writes. Such persistent storage would be complementary to IMDG products and would support their “cold start” recovery scenario.

Architecture 4. IMDG + disk-based write cache.

[figure: architecture-4]

Read – all operations are served by the in-memory data grid.
Write – write-through to the disk storage; a write-behind strategy moves changes from the disk storage to the DB.

Read operations are served in memory. Write operations are served by disk-based persistent storage and are later asynchronously written to the database. The disk cache guarantees that after a restart, all transactions will be recovered and will end up in the database. The cost of read operations is the same as in architectures 2 and 3; the cost of write operations is bounded by the disk storage.
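
Put together, the write path could look like the following sketch. It extends the write-behind sketch above with a synchronous append to a local durable log; DurableLog is an illustrative interface for the specialized disk storage discussed next, not any real product's API:

```java
import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative contract for the specialized disk storage: append must not
// return before the change is safely on disk.
interface DurableLog {
    void append(String key, String value) throws IOException;
}

// Architecture 4: write-through to a local durable log, write-behind to the DB.
class DurableStore {
    record Change(String key, String value) {}

    private final ConcurrentHashMap<String, String> data = new ConcurrentHashMap<>();
    private final BlockingQueue<Change> pending = new LinkedBlockingQueue<>();
    private final DurableLog log;
    private final DataStore db;

    DurableStore(DurableLog log, DataStore db) {
        this.log = log;
        this.db = db;
        Thread flusher = new Thread(this::drain, "db-flusher");
        flusher.setDaemon(true);
        flusher.start();
    }

    String read(String key) { return data.get(key); }

    // Write latency is bounded by one local disk flush, not a DB round trip,
    // and a completed write survives even a whole-grid restart.
    void write(String key, String value) throws IOException {
        log.append(key, value);              // durable first
        data.put(key, value);                // then visible in memory
        pending.add(new Change(key, value)); // the DB catches up in the background
    }

    private void drain() {
        try {
            while (true) {
                Change c = pending.take();
                db.write(c.key(), c.value());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```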

What benefits do we get by introducing another layer? Why is it better than writing directly to the DB?

Well, the disk storage is a very specialized component (much simpler than an RDBMS). It has to implement only a simple set of operations: transactional writes and bulk reads (the latter required for recovery only). The implementation of such storage can be very simple, yet efficient. The latency of a write transaction in such storage will be very close to the latency of the disk operation itself.
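
A minimal sketch of such a storage component, assuming a single append-only log file: each write is flushed and forced to the physical disk before the call returns, and recovery is one sequential replay of the log. (A production version would at least need record checksums to detect a torn final record, and compaction once changes have reached the DB.)

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.Closeable;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.BiConsumer;

// An append-only log that could serve as the DurableLog sketched above.
class WriteAheadLog implements Closeable {
    private final DataOutputStream out;
    private final FileChannel channel;

    WriteAheadLog(Path logFile) throws IOException {
        FileOutputStream fos = new FileOutputStream(logFile.toFile(), true); // append
        this.channel = fos.getChannel();
        this.out = new DataOutputStream(new BufferedOutputStream(fos));
    }

    // Transactional write: the call returns only after the record has been
    // forced to the physical disk, so its latency is one disk flush.
    synchronized void append(String key, String value) throws IOException {
        out.writeUTF(key);
        out.writeUTF(value);
        out.flush();
        channel.force(false);
    }

    // Bulk read, needed only for recovery: replay every record in order
    // (e.g. into the grid and into the pending-writes queue).
    static void replay(Path logFile, BiConsumer<String, String> apply) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(logFile)))) {
            while (true) {
                String key;
                try {
                    key = in.readUTF();
                } catch (EOFException endOfLog) {
                    break;
                }
                apply.accept(key, in.readUTF());
            }
        }
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}
```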

Ok, I think it is clear now that specialized disk storage is more efficient than a database, but there are still open questions.

Why should it be implemented as a separate component; shouldn't we integrate it with the data grid (IMDG products usually already provide some disk persistence for overflow)?

It took me an attempt to implement such a disk storage solution as an IMDG extension to understand that it should be a separate component, or maybe even a separate product.

An IMDG and disk storage address different problems, and they use very different approaches. Disks have greater capacity than memory: an IMDG often distributes data across a network of nodes to increase the capacity of the grid, but this does not make sense for disk storage (even if the capacity of one disk is not enough, disk arrays or a SAN can address the capacity problem). The number of nodes (processes) participating in an IMDG and in disk storage is also different.

It is hard to give strong arguments here, but I'm quite confident in this opinion. The devil is in the details.

Retrospective.

This was not the first time I had worked with an architecture of type 3. I started wondering how we managed without a dedicated persistent storage layer before, and I was a little surprised to realize that messaging middleware can, to some extent, fill the gap of a write cache.

[figure: architecture-5]

Read – all operations are served by the data grid.
Write – changes are published on the message bus and later written to the database and the data grid; changes become visible only after some delay.

In this case, the time of a write transaction is tied to the time of sending a message (the messaging middleware must guarantee durability of the message). This is somewhat comparable to the disk storage (the messaging middleware has to use a disk-based log to provide message durability). But there is also a considerable lag between the end of the write transaction and the changes becoming visible to the system.
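
A sketch of this write path using the plain JMS API (any broker would do; the queue name and message layout are illustrative): the send returns once the broker has persisted the message, and the DB and grid updates happen later in consumers of the same queue.

```java
import javax.jms.Connection;
import javax.jms.DeliveryMode;
import javax.jms.JMSException;
import javax.jms.MapMessage;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;

// Retrospective architecture: a durable message plays the role of the write cache.
class MessagingWritePath {
    private final Session session;
    private final MessageProducer producer;

    MessagingWritePath(Connection connection, String queueName) throws JMSException {
        this.session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Queue queue = session.createQueue(queueName);
        this.producer = session.createProducer(queue);
        // PERSISTENT delivery forces the broker to log the message to disk
        // before the send completes.
        producer.setDeliveryMode(DeliveryMode.PERSISTENT);
    }

    // Returns once the broker holds the message durably; the change becomes
    // visible in the grid only after a consumer applies it - the lag noted above.
    void write(String key, String value) throws JMSException {
        MapMessage msg = session.createMapMessage();
        msg.setString("key", key);
        msg.setString("value", value);
        producer.send(msg);
    }
}
```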

This architecture has somewhat worse performance characteristics than architecture 4 (both the messaging middleware and a disk-based write cache have to use the disk, but the write cache is potentially more efficient). But it is built on already available, reliable middleware, and it outperforms architecture 2.

The only serious drawback of this architecture compared to 4 is cold start time (the system has to recover its state either from the database or from the messaging server, and both ways are considerably slower than just reading state from disk).