Friday, September 16, 2016

How to measure object size in Java?

You define fields, their names and types, in source of Java class, but it is JVM the one who decides how they will be stored in physical memory.

Sometimes you want to know exactly how much Java object weights in Java. Answering this question is surprisingly complicated.

Challenge

  • Pointer size and Java object header size varies.
  • JVM could be build for 32 or 64 bit architecture. On 64 bit architectures JVM may or may not use compressed pointers (-XX:+UseCompressedOops).
  • Object padding may be different (-XX:ObjectAlignmentInBytes=X).
  • Different field types may have different alignment rules.
  • JVM may reorder fields in object layout as it likes.

Figure below illustrates how JVM may rearrange fields in memory.

Guessing object layout

You can scrap class fields via reflection and try to guess layout chosen by JVM taking into account platform pointer size and other factors.

... at least you can try.

Using the Unsafe

sun.misc.Unsafe is internal helper class used by JVM code. You should not use it, but you can (with some help from reflection). Unsafe is popular among people doing weird things with JVM.

Unsafe can let you query information about physical layout of Java object. Though, it would not tell you directly real size of object in memory. You would still have to do some error-prone math to calculate object's size.

Here is example of such code.

Instrumentation agent

java.lang.instrument.Instrumentation is an API for profilers and other performance tools. You need to install agent into JVM to get instance of this class. This class has handy getObjectSize(...) method which would tell you real object size.

There is library jamm which exploit this aproach. You should use special JVM start options though.

Threading MBean

Threading MBean in JVM has a handy allocation counter. Using this counter you can easily measure object size by allocating new instance and checking delta of counter. Snippet below is doing just that.

import java.lang.management.ManagementFactory;

public class MemMeter {

    private static long OFFSET = measure(new Runnable() {
        @Override
        public void run() {
        }
    });

    /**
     * @return amount of memory allocated while executing provided {@link Runnable}
     */
    public static long measure(Runnable x) {
       long now = getCurrentThreadAllocatedBytes();
       x.run();
       long diff = getCurrentThreadAllocatedBytes() - now;
       return diff - OFFSET;
    }

    @SuppressWarnings("restriction")
    private static long getCurrentThreadAllocatedBytes() {
        return ((com.sun.management.ThreadMXBean)ManagementFactory.getThreadMXBean()).getThreadAllocatedBytes(Thread.currentThread().getId());
    }
}

Below is simple usage example

System.out.println("size of java.lang.Object is " 
+ MemMeter.measure(new Runnable() {

    Object x;

    @Override
    public void run() {
        x = new Object();
    }
}));

Though, this approach require you to create new instance of object to measure its size. That may be an obstacle.

jmap

jmap is a one of JDK tools. With jmap -histo PID command you can print histogram of your heap objects.

num     #instances         #bytes  class name
---------------------------------------------
  1:       1413317      111961288  [C
  2:        272969       39059504  <constMethodKlass>
  3:       1013137       24315288  java.lang.String
  4:        245685       22715744  [I
  5:        272969       19670848  <methodKlass>
  6:        206682       17868464  [B
  7:         29355       17722320  <constantPoolKlass>
  8:        659710       15833040  java.util.HashMap$Entry
  9:         29355       12580904  <instanceKlassKlass>
 10:        105637       12545112  [Ljava.util.HashMap$Entry;
 11:        170894       11797400  [Ljava.lang.Object;

For objects, you can divide byte size by instance count to get individual instance size for class. This would not work for arrays, though.

Java Object Layout tool

Java Object Layout tool is using number of different approaches for introspecting physical layout of Java object in memory.

Thursday, July 21, 2016

Rust, JNI, Java

Recently, I had a necessity to do some calls to kernel32.dll from my Java code. Just a few system calls on Windows platform, as simple as it sounds. Plus I wanted to keep resulting size of binary as small as possible.

Later requirement has added a fair challenge to that task.

How to call platform code for Java?

JNI - Java Native Interface

JNI is built in JVM and is part of Java standard. Sounds good, there is a catch though. To call native code from Java via JNI, you have to write native code (e.g. using C language). That is it, JNI requires some glue code (aka bindings) between native calls and Java methods.

m... do we have other alternatives?

JNA - Java Native Access

JNA is an alternative to JNI. You can call native code from Java, no glue code. Cool, what is the cost?

JNA jar has size of 1.1 MiB. Extra megabyte just to do couple of simple calls to Windows kernel - not a deal.

Back to JNI

Ok, I need to write some glue code for JNI. What language to choose?

C/C++ - no, just no. C/C++ tool chain, compiler, headers, build tools, is an abomination, especially on Windows. Please, I just need literally half screen of code compiled to dll binary. I do not want 10 GiB worth Visual Studio to pollute my desktop.

Die hard Java guy is speaking :)

Free Pascal

Pascal is an ancient language. It was programming language of my youth. MS DOS, Turbo Pascal ... colors were so bright these days.

Twenty years later, I was surprised to find Pascal in pretty good shape. Free Pascal has impressive list of supported platforms. Pascal compiler is lighting fast. Produced binaries have no dependency on libc / msvcrt.

Using Free Pascal I get my kernel32-to-JNI dll with size of 33 KiB. That sounds much, much better.

Can we do better?

Rust

Rust is a new kid in a language block. It has a strong ambition to replace C/C++ as system level language. It gives you all powers of C plus memory safety, modernized build system, language level modules (crates).

Sounds promising, let's try Rust for little JNI glue dll.

Calling

rustc -C debuginfo=0 --crate-type dylib myjni.rs

result is disappointing 2.5 MiB binary.

Rust dylib is a dll which can be used by other Rust code, so it is exposing a lot of language specific metadata. cdylib is a new packaging introduced in Rust 1.10, which is more suitable for JNI bindings.

Command line

rustc -C lto -C debuginfo=0 --crate-type cdylib myjni.rs

has produced 1.6 MiB binary. -C lto option instructs compiler to do "link time optimization". For some reason cdylib was not compiling without lto option for me.

Ok, direction is right, but we need to move much further. Let's try more compiler options.

Command line

rustc -C opt-level=3 -C lto -C debuginfo=0 --crate-type cdylib myjni.rs

has produced 200 KiB binary. Optimization allow compiler to throw away a big portion of standard library which I will never need for my simple JNI binding.

Though, a large portion of standard library is still there.

In Rust you can fully turn off standard library (e.g. to run on bare metal).

Normally you would need at least memory management, but for simple JNI binding you can get away using stack allocation only.

At the moment, using Rust with no_std option requires nightly build of compiler. I have also rewrite some portion of kernel32 and JNI declarations to avoid dependency on libc types.

rustc -C opt-level=3 -C panic=abort -C lto -C debuginfo=0 --crate-type cdylib myjni.rs

Binary size is 22.5 KiB.

Cool, we have beaten Free Pascal.

One more tweak, execute strip -s on resulting dll and final binary size is 16.9 KiB.

Honestly, 16.9 KiB for couple of calls is still overkill. But, I'm not desperate enough to try assembly for JNI binding, at least not today.

Conclusion

Free Pascal IMHO, Free Pascal a good choice if you need simple JNI bindings. As a bonus, Free Pascal on Linux has no dependency on platform's dynamic libraries, so you can build cross-Linux-distro binaries.

Rust. I believe Rust have a great potential. Rust has unique memory safety model yet it let you to get as close to bare metal as C does. Besides other features, Rust has really promising cross compiling capabilities, which gives it a very strong position in embedded / IoT space.

Yet, Rust needs to get more stable. no_std feature is not available in latest (1.10) stable. cdynlib is not supported by latest stable cargo tool. Rust tool chain on Windows depends either on MS Visual Studio or MSys. Resulting binaries are slightly incompatible to each other (Oracle JMV is build with Visual Studio, so using MSys built JNI bindings leads to process crash in certain cases).

Wednesday, March 16, 2016

Finalizers and References in Java

Automatic memory management (garbage collection) is one of essential aspects of Java platform. Garbage collection relieves developers from pain of memory management and protects them from whole range of memory related issues. Though, working with external resources (e.g. files and socket) from Java becomes tricky, because garbage collector alone is not enough to manage such resources.

Originally Java had finalizers facility. Later special reference classes were added to deal with same problem.

If we have some external resource which should be deallocated explicitly (common case with native libraries), this task could be solved either using finalizer or phantom reference. What is the difference?

Finalizer approach

Code below is implementing resource housekeeping using Java finalizer.

public class Resource implements ResourceFacade {

    public static AtomicLong GLOBAL_ALLOCATED = new AtomicLong(); 
    public static AtomicLong GLOBAL_RELEASED = new AtomicLong(); 

    int[] data = new int[1 << 10];
    protected boolean disposed;

    public Resource() {
        GLOBAL_ALLOCATED.incrementAndGet();
    }

    public synchronized void dispose() {
        if (!disposed) {
            disposed = true;
            releaseResources();
        }
    }

    protected void releaseResources() {
        GLOBAL_RELEASED.incrementAndGet();
    }    
}

public class FinalizerHandle extends Resource {

    protected void finalize() {
        dispose();
    }
}

public class FinalizedResourceFactory {

    public static ResourceFacade newResource() {
        return new FinalizerHandle();
    }    
}

Phantom reference approach

public class PhantomHandle implements ResourceFacade {

    private final Resource resource;

    public PhantomHandle(Resource resource) {
        this.resource = resource;
    }

    public void dispose() {
        resource.dispose();
    }    

    Resource getResource() {
        return resource;
    }
}

public class PhantomResourceRef extends PhantomReference<PhantomHandle> {

    private Resource resource;

    public PhantomResourceRef(PhantomHandle referent, ReferenceQueue<? super PhantomHandle> q) {
        super(referent, q);
        this.resource = referent.getResource();
    }

    public void dispose() {
        Resource r = resource;
        if (r != null) {
            r.dispose();
        }        
    }    
}

public class PhantomResourceFactory {

    private static Set<Resource> GLOBAL_RESOURCES = Collections.synchronizedSet(new HashSet<Resource>());
    private static ResourceDisposalQueue REF_QUEUE = new ResourceDisposalQueue();
    private static ResourceDisposalThread REF_THREAD = new ResourceDisposalThread(REF_QUEUE);

    public static ResourceFacade newResource() {
        ReferedResource resource = new ReferedResource();
        GLOBAL_RESOURCES.add(resource);
        PhantomHandle handle = new PhantomHandle(resource);
        PhantomResourceRef ref = new PhantomResourceRef(handle, REF_QUEUE);
        resource.setPhantomReference(ref);
        return handle;
    }

    private static class ReferedResource extends Resource {

        @SuppressWarnings("unused")
        private PhantomResourceRef handle;

        void setPhantomReference(PhantomResourceRef ref) {
            this.handle = ref;
        }

        @Override
        public synchronized void dispose() {
            handle = null;
            GLOBAL_RESOURCES.remove(this);
            super.dispose();
        }
    }

    private static class ResourceDisposalQueue extends ReferenceQueue<PhantomHandle> {

    }

    private static class ResourceDisposalThread extends Thread {

        private ResourceDisposalQueue queue;

        public ResourceDisposalThread(ResourceDisposalQueue queue) {
            this.queue = queue;
            setDaemon(true);
            setName("ReferenceDisposalThread");
            start();
        }

        @Override
        public void run() {
            while(true) {
                try {
                    PhantomResourceRef ref = (PhantomResourceRef) queue.remove();
                    ref.dispose();
                    ref.clear();
                } catch (InterruptedException e) {
                    // ignore
                }
            }
        }
    }
}

Implementing same task using phantom reference requires more boilerplate. We need separate thread to handle reference queue, in addition, we need to keep strong references to allocated reference objects.

How finilaizers work in Java

Under the hood, finilizers work very similarly to our phantom reference implementation, though, JVM is hiding boilerplate from us.

Each time instance of object with finalizer is created, JVM creates instance of FinalReference class to track it. Once object becomes unreachable, FinalReference is triggered and added to global final reference queue, which is being processed by system finalizer thread.

So finalizes and phantom reference approach work very similar. Why should you bother with phantom references?

Comparing GC impact

Let's have simple test: resource object is allocated then added to the queue, once queue size hits limit oldest reference is evicted and thrown away. For this test we will monitor reference processing via GC logs.

Running finalizer based implementation.

[GC [ParNew[ ... [FinalReference, 5718 refs, 0.0063374 secs] ... 
Released: 6937 In use: 59498

Running phantom based implementation.

[GC [ParNew[ ... [PhantomReference, 5532 refs, 0.0037622 secs] ... 
Released: 5468 In use: 38897

As you can see, once object becomes unreachable, it needs to be handled in GC reference processing phase. Reference processing is a part of Stop-the-World pause. If, between collections, too many references becomes eligible for processing it may prolong Stop-the-World pause significantly.

In case above, there is no much difference between finalizers and phantom references. But let's change workflow a little. Now we would explicitly dispose 99% of handles and rely on GC only for 1% of references (i.e. semiautomatic resource management).

Running finalizer based implementation.

[GC [ParNew[ ... [FinalReference, 6295 refs, 0.0070033 secs] ...
Released: 6707 In use: 1457

Running phantom based implementation.

[GC [ParNew[ ... [PhantomReference, 625 refs, 0.0001551 secs] ... 
Released: 21682 In use: 1217

For finalizer based implementation there is no difference. Explicit resource disposal doesn't help reduce GC overhead. But with phantoms, we can see what GC do not need to handle explicitly disposed references (so number of references process by GC is reduced by order of magnitude).

Why this is happening? When resource handle is disposed we drop reference to phantom reference object. Once phantom reference is unreachable, it would never be queued for processing by GC, thus saving time in reference processing phase. It is quite opposite with final references, once created it will be strong referenced by JVM until being processed by finalizer thread.

Conclusion

Using phantom references for resources housekeeping requires more work compared to plain finalizer approach. But using phantom references you have far more granular control over whole process and implement number of optimizations such as hybrid (manual + automatic) resource management.

Full source code used for this article is available at https://github.com/aragozin/example-finalization.