A 150 GiB JVM heap dump is lying on my hard drive, and I need to analyze a specific problem detected in that process.
This is a dump of a proprietary hybrid of an in-memory RDBMS and a CEP system that I am responsible for. All data is stored in the Java heap, so the heap size of some installations is huge (a 400 GiB heap is the largest to date).
The problem of analyzing huge heap dumps has been on my radar for some time, so I wasn't unprepared.
To be honest, I haven't tried to open this file in Eclipse Memory Analyzer, but I doubt it could handle it.
For some time, the most useful feature of heap analyzers for me has been JavaScript-based queries. Clicking through millions of objects is not fun. It is much better to walk the object graph with code, not with a mouse.
A heap dump is just a serialized graph of objects, and my goal is to extract specific information from this graph. I do not really need a fancy UI; an API to the heap graph would be even better.
How can I analyze a heap dump programmatically?
I started my research with the NetBeans profiler (that was a year ago). NetBeans is open source and has a visual heap dump analyzer (the same component is also used in JVisualVM). It turns out that the heap dump processing code is a separate module, and the API it provides is suitable for custom analysis logic.
The NetBeans heap analyzer has a critical limitation, though. It uses a temporary file to keep an internal index of the heap dump. This file is typically around 25% of the size of the heap dump itself. More importantly, it takes time to build this file before any query to the heap graph is possible.
After taking a closer look, I decided I could remove this temporary file. I have forked the library (my fork is available on GitHub). Some functions were lost together with the temporary file (e.g. backward reference traversal), but they are not needed for my kind of tasks.
Another important change to the original library was the implementation of HeapPath. HeapPath is an expression language for the object graph. It is useful both as a generic predicate language in graph traversal algorithms and as a simple tool to extract data from an object dump. HeapPath automatically converts strings, primitives and a few other simple types from heap dump structures to normal objects.
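To give a feel for what a path expression over an object graph does, here is a minimal sketch. It is not HeapPath itself and uses neither its syntax nor the API of the forked library. `Node`, `PathSketch` and the dot-and-`*` path notation are hypothetical names invented for this illustration, and the "heap" is just a few in-memory objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for an instance in a heap dump: a type name plus named fields. */
final class Node {
    final String type;
    final Map<String, Object> fields = new LinkedHashMap<>();

    Node(String type) {
        this.type = type;
    }

    Node set(String name, Object value) {
        fields.put(name, value);
        return this;
    }
}

/** Hypothetical path evaluator; the notation is made up and is NOT HeapPath syntax. */
final class PathSketch {

    /** Follows a dot-separated field path; the step "*" fans out over list elements. */
    static List<Object> walk(Object root, String path) {
        List<Object> current = new ArrayList<>();
        current.add(root);
        for (String step : path.split("\\.")) {
            List<Object> next = new ArrayList<>();
            for (Object o : current) {
                if (step.equals("*")) {
                    if (o instanceof List) {
                        next.addAll((List<?>) o);
                    }
                } else if (o instanceof Node) {
                    Object value = ((Node) o).fields.get(step);
                    if (value != null) {
                        next.add(value);
                    }
                }
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        Node table = new Node("Table")
            .set("name", "orders")
            .set("rows", Arrays.asList(
                new Node("Row").set("id", 1),
                new Node("Row").set("id", 2)));

        System.out.println(walk(table, "rows.*.id")); // prints [1, 2]
    }
}
```

The same evaluator serves both purposes mentioned above: a non-empty result can act as a predicate during traversal, and the returned values are already plain Java objects.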
This library has proved very useful in our daily work. One of its applications was a memory reporting tool for our database/CEP system, which automatically reports the actual memory consumption of every relational transformation node (there could be a few hundred nodes in a single instance).
For interactive exploration, an API plus Java is not the best set of tools, though. But it lets me do my job (and a 150 GiB dump leaves me no alternatives).
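A per-node memory report of this kind only needs forward references, which is why losing backward traversal does not hurt. The sketch below illustrates the idea on a toy graph. It is not the actual reporting tool: `Obj`, `MemoryReportSketch` and the node names are hypothetical, and a real implementation would work on heap dump instances rather than live objects. Note the simplification: an object shared by two nodes is counted for both.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy object: a shallow size in bytes and outgoing (forward) references. */
final class Obj {
    final long shallowSize;
    final List<Obj> refs = new ArrayList<>();

    Obj(long shallowSize) {
        this.shallowSize = shallowSize;
    }

    Obj ref(Obj target) {
        refs.add(target);
        return this;
    }
}

/** Hypothetical report: total size reachable from each named root node. */
final class MemoryReportSketch {

    /** Sums shallow sizes of everything reachable from root, visiting each object once. */
    static long reachableSize(Obj root) {
        Set<Obj> seen = Collections.newSetFromMap(new IdentityHashMap<Obj, Boolean>());
        Deque<Obj> stack = new ArrayDeque<>();
        seen.add(root);
        stack.push(root);
        long total = 0;
        while (!stack.isEmpty()) {
            Obj o = stack.pop();
            total += o.shallowSize;
            for (Obj r : o.refs) {
                if (seen.add(r)) { // true only on first visit, so cycles terminate
                    stack.push(r);
                }
            }
        }
        return total;
    }

    static Map<String, Long> report(Map<String, Obj> nodes) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Obj> e : nodes.entrySet()) {
            result.put(e.getKey(), reachableSize(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Obj shared = new Obj(100);
        Obj buffer = new Obj(24).ref(shared);
        Obj filter = new Obj(16).ref(buffer);
        buffer.ref(filter); // back pointer, forms a cycle
        Obj join = new Obj(16).ref(shared);

        Map<String, Obj> nodes = new LinkedHashMap<>();
        nodes.put("filter", filter);
        nodes.put("join", join);
        System.out.println(report(nodes)); // prints {filter=140, join=116}
    }
}
```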
Maybe I should add some JVM scripting language to the mix ...
BTW: A single pass through 150 GiB takes about 5 minutes. Meaningful analysis usually requires multiple passes, but processing times are fairly reasonable even for that heap size.
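As a back-of-the-envelope check, those figures correspond to a scan rate of roughly 512 MiB/s. The small sketch below does the arithmetic and estimates the cost of a multi-pass analysis. `ScanEstimate` and the pass count of 4 are hypothetical, chosen only for illustration, and the result is only as precise as the "about 5 minutes" it is derived from.

```java
/** Rough scan-time arithmetic; inputs are approximate figures, not measurements. */
final class ScanEstimate {

    /** Average read throughput in MiB/s for one full pass over the dump. */
    static double throughputMiBPerSec(double dumpGiB, double secondsPerPass) {
        return dumpGiB * 1024.0 / secondsPerPass;
    }

    /** Total wall-clock minutes if every pass costs about the same. */
    static double totalMinutes(int passes, double minutesPerPass) {
        return passes * minutesPerPass;
    }

    public static void main(String[] args) {
        System.out.println(throughputMiBPerSec(150, 5 * 60)); // prints 512.0
        System.out.println(totalMinutes(4, 5));               // prints 20.0
    }
}
```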