Monday, February 23, 2015

So, you have dumped 150 GiB of JVM heap, now what?

A 150 GiB JVM heap dump is lying on my hard drive, and I need to analyze a specific problem detected in that process.

It is a dump of a proprietary hybrid of an in-memory RDBMS and a CEP system that I'm responsible for. All data is stored in the Java heap, so heap sizes of some installations are huge (a 400 GiB heap is the largest to date).

The problem of analyzing huge heap dumps had been on my radar for some time, so I wasn't unprepared.

To be honest, I haven't tried opening this file in Eclipse Memory Analyzer, but I doubt it could handle it.

For some time, the most useful feature of heap analyzers for me has been JavaScript-based queries. Clicking through millions of objects is not fun. It is much better to walk the object graph with code, not with a mouse.

A heap dump is just a serialized graph of objects, and my goal is to extract specific information from that graph. I do not really need a fancy UI; an API to the heap graph would suit me even better.

How can I analyze a heap dump programmatically?

I started my research with the NetBeans profiler (about a year ago). NetBeans is open source and has a visual heap dump analyzer (the same component is also used in JVisualVM). It turns out that the heap dump processing code is a separate module, and the API it provides is suitable for custom analysis logic.

The NetBeans heap analyzer has a critical limitation, though. It uses a temporary file to keep an internal index of the heap dump. This file is typically around 25% of the size of the heap dump itself. Most importantly, it takes time to build this file before any query to the heap graph is possible.
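
To give an idea of what such programmatic analysis looks like, here is a minimal sketch against the NetBeans heap dump API (package org.netbeans.lib.profiler.heap). Method names are written from memory and may differ between versions of the module; the class being inspected is just an example.

    import java.io.File;
    import org.netbeans.lib.profiler.heap.Heap;
    import org.netbeans.lib.profiler.heap.HeapFactory;
    import org.netbeans.lib.profiler.heap.Instance;
    import org.netbeans.lib.profiler.heap.JavaClass;

    public class HeapDumpQuery {

        public static void main(String[] args) throws Exception {
            // open the .hprof file; with the stock NetBeans library this is
            // the point where the temporary index file gets built
            Heap heap = HeapFactory.createHeap(new File(args[0]));

            // pick a class and walk all of its instances
            JavaClass jc = heap.getJavaClassByName("java.util.HashMap");
            long instances = 0;
            long totalSize = 0;
            for (Object o : jc.getInstances()) {   // raw List in older versions of the API
                Instance i = (Instance) o;
                instances++;
                totalSize += i.getSize();          // shallow size of the instance
            }
            System.out.println(jc.getName() + ": " + instances
                    + " instances, " + totalSize + " bytes (shallow)");
        }
    }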

After taking a better look, I decided I could remove this temporary file, so I forked the library (my fork is available on GitHub). Some functionality was lost together with the temporary file (e.g. backward reference traversal), but it is not needed for my kind of tasks.

Another important change to the original library was the implementation of HeapPath.
HeapPath is an expression language for the object graph. It is useful both as a generic predicate language in graph traversal algorithms and as a simple tool to extract data from an object dump. HeapPath automatically converts strings, primitives and a few other simple types from heap dump structures into normal Java objects.
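
As an illustration, a HeapPath-style extraction might look roughly like the sketch below. The HeapWalker helper and the exact path syntax are my best recollection of the fork's API and should be treated as assumptions; the application class in the path is purely hypothetical.

    import org.gridkit.jvmtool.heapdump.HeapWalker;   // helper from the forked library (name assumed)
    import org.netbeans.lib.profiler.heap.Heap;
    import org.netbeans.lib.profiler.heap.Instance;

    public class HeapPathExample {

        static void dumpNodeNames(Heap heap) {
            // walk every instance of a (hypothetical) application class and
            // pull a nested String field out of it via a heap path expression
            for (Object o : heap.getJavaClassByName("com.example.cep.TransformationNode").getInstances()) {
                Instance node = (Instance) o;
                // "config.name" navigates node.config.name and converts the
                // java.lang.String structure in the dump to a real String
                String name = (String) HeapWalker.valueOf(node, "config.name");
                System.out.println(name);
            }
        }
    }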

The library has proved very useful in our daily work. One of its applications was a memory reporting tool for our database/CEP system, which automatically reports the actual memory consumption of every relational transformation node (there can be a few hundred nodes in a single instance).

For interactive exploration, an API plus Java is not the best set of tools, though. But it lets me do my job (and 150 GiB of dump leaves me no alternative).

Should I add some JVM scripting language to the mix ...

BTW: a single pass through 150 GiB takes about 5 minutes. A meaningful analysis usually requires multiple passes, but processing times are fairly reasonable even for that heap size.

Wednesday, February 11, 2015

Binary search - is it still optimal?

If you have a sorted collection of elements, how would you find the index of a specific value?
"Binary search" is likely to be your answer.
Algorithm theory teaches us that binary search is the optimal algorithm for this task, with O(log N) complexity.
Well, a hash table can do better if you only need to find a key by exact match. In many cases, though, you have reasons to keep your collection sorted, not hashed.

At my job I'm working on a sophisticated in-memory database tailored for streaming data processing. There are many places where we deal with sorted collections of integers (data row references, etc.).

Algorithm theory is good, but in reality there are things like the cache hierarchy, branch prediction and superscalar execution which may skew performance in edge cases.

The question is: where does the border lie between the reality ruled by CPU quirks and the lawful space of classic algorithm theory?

If in doubt - do an experiment.

The experiment is simple: I generate a large number of sorted arrays of 32 bit integers, then I search for a random key in a random array multiple times. In each experiment the average size of the arrays is fixed. A large number of arrays is used to ensure cold memory access. The average time of a single search is measured.

All code is written in Java and measured using the JMH tool.
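
For reference, the harness was roughly of the following shape (a simplified sketch, not the exact benchmark code; parameter values are illustrative). The search variants listed below plug in as additional @Benchmark methods.

    import java.util.Arrays;
    import java.util.Random;
    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.annotations.Param;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.Setup;
    import org.openjdk.jmh.annotations.State;

    @State(Scope.Benchmark)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public class SortedSearchBenchmark {

        @Param({"16", "64", "256"})      // array length, illustrative values
        public int length;

        public int arrayCount = 1 << 16; // many arrays, so a random one is unlikely to be in cache

        public int[][] arrays;
        public Random rnd = new Random(42);

        @Setup
        public void generate() {
            arrays = new int[arrayCount][];
            for (int n = 0; n < arrayCount; n++) {
                int[] a = new int[length];
                for (int i = 0; i < a.length; i++) {
                    a[i] = rnd.nextInt();
                }
                Arrays.sort(a);
                arrays[n] = a;
            }
        }

        @Benchmark
        public int binarySearch() {
            // random array + random key; key generation is included in the
            // measured time in this simplified sketch
            int[] a = arrays[rnd.nextInt(arrayCount)];
            return Arrays.binarySearch(a, rnd.nextInt());
        }
    }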

The participants are (sketched in code after the list):

  • Binary search - java.util.Arrays.binarySearch()
  • Linear search - a simple loop over the array until the key, or a larger element, is found
  • Linear search 2 - a loop over every second element of the array; if an element greater than the key is found, index i - 1 is checked too
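
Below is a sketch of the two linear variants, written to match the descriptions above rather than copied from the benchmark; unlike Arrays.binarySearch() they simply return -1 when the key is absent.

    public class LinearSearch {

        // Linear search: scan until the key or a larger element is found.
        static int linearSearch(int[] a, int key) {
            for (int i = 0; i < a.length; i++) {
                if (a[i] == key) {
                    return i;
                }
                if (a[i] > key) {
                    return -1;   // array is sorted, the key cannot appear later
                }
            }
            return -1;
        }

        // Linear search 2: probe every second element; once an element that is
        // not smaller than the key is hit, the key can only be there or at i - 1.
        static int linearSearch2(int[] a, int key) {
            for (int i = 0; i < a.length; i += 2) {
                if (a[i] >= key) {
                    if (a[i] == key) {
                        return i;
                    }
                    if (i > 0 && a[i - 1] == key) {
                        return i - 1;
                    }
                    return -1;
                }
            }
            // even-length arrays: the last element was skipped by the stride of 2
            int last = a.length - 1;
            if (last >= 0 && (a.length & 1) == 0 && a[last] == key) {
                return last;
            }
            return -1;
        }
    }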

The X axis is the average array length.
The Y axis is the average time of a single search in microseconds.
Measurements have been done on 3 different types of CPU.

Results speak for themselves.

I was a little surprised, as I was expecting binary search to outperform linear search at a length of 32 or 64, but it seems that modern processors are very good at optimizing linear memory access.

Given that 8 - 128 is the practical node-size range for BTree-like structures, I will likely reconsider some of the data structures used in our database.