Wednesday, March 28, 2012

Secret HotSpot option improving GC pauses on large heaps

Recently, my patch (RFE-7068625) for the JVM garbage collector was accepted into the HotSpot JDK code base.

This was a reason for me to redo some of my GC benchmarking experiments. I have already mentioned ParGCCardsPerStrideChunk in the article related to that patch. This time, I decided to study the effect of this option more closely.

The parallel copy collector (ParNew), responsible for young collections in CMS, uses the ParGCCardsPerStrideChunk value to control the granularity of tasks distributed between worker threads. The old space is broken into strides of equal size, and each worker is responsible for processing (finding dirty cards, finding old-to-young references, copying young objects, etc.) a subset of strides. The time to process each stride may vary greatly, so workers may steal work from each other. For that reason, the number of strides should be greater than the number of workers.

By default ParGCCardsPerStrideChunk=256 (a card is 512 bytes, so that is 128KiB of heap space per stride), which means that a 28GiB heap would be broken into 224 thousand strides. Given that the number of parallel GC threads is usually 4 orders of magnitude smaller, this is probably too many.
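The arithmetic can be double-checked with a few lines of Java (512 bytes per card is the HotSpot constant; the stride sizes below are the default and one of the tuned values):

```java
public class StrideMath {
    public static void main(String[] args) {
        long cardSize = 512;                          // bytes of heap covered by one card
        long cardsPerStride = 256;                    // default ParGCCardsPerStrideChunk
        long strideBytes = cardSize * cardsPerStride; // 128 KiB of heap per stride
        long heapBytes = 28L * 1024 * 1024 * 1024;    // 28 GiB old space

        long strides = heapBytes / strideBytes;
        System.out.println(strideBytes / 1024 + " KiB per stride, " + strides + " strides");
        // Default of 256 cards: 229376 (~224k) strides for 28 GiB.
        // With 4096 cards (2 MiB strides) the same heap yields only 14336 strides.
        System.out.println(heapBytes / (cardSize * 4096) + " strides at 4096 cards");
    }
}
```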

Synthetic benchmark

First, I ran the GC benchmark from the previous article using 2k, 4k and 8k for this option. HotSpot JVM 7u3 was used in the experiment.

It seems that the default value (256 cards per stride) is too small even for moderate-size heaps. I decided to continue my experiments with a stride size of 4k, as it shows the most consistent improvement across the whole range of heap sizes.

The benchmark above is synthetic and very simple. The next step is to choose a more realistic use case. As usual, my choice is to use an Oracle Coherence storage node as my guinea pig.

Benchmarking Coherence storage node

In this experiment I am filling a cache node with objects (about 70% of old space filled with live objects), then putting it under mixed read/write load and measuring young GC pauses of the JVM. The experiment was conducted with two different heap sizes (28 GiB and 14 GiB); young space in both cases was limited to 128MiB, and compressed pointers were enabled.
Coherence node with 28GiB of heap

JVM                          | Avg. pause [s] | Improvement
7u3                          | 0.0697         | 0 (baseline)
7u3, stride=4k               | 0.045          | 35.4%
Patched OpenJDK 7            | 0.0546         | 21.7%
Patched OpenJDK 7, stride=4k | 0.0284         | 59.3%

Coherence node with 14GiB of heap

JVM                          | Avg. pause [s] | Improvement
7u3                          | 0.05           | 0 (baseline)
7u3, stride=4k               | 0.0322         | 35.6%
This test is close enough to a real-life Coherence work profile, and such an improvement in GC pause time has practical importance. I have also included a JVM built from OpenJDK trunk with the RFE-7068625 patch enabled in the 28 GiB test; as expected, the effect of the patch is cumulative with stride size tuning.

Stock JVMs from Oracle are supported

The good news is that you do not have to wait for the next version of the JVM: the ParGCCardsPerStrideChunk option is available in all Java 7 HotSpot JVMs and in the most recent Java 6 JVMs. But this option is classified as diagnostic, so you have to unlock diagnostic options to use it.
-XX:+UnlockDiagnosticVMOptions
-XX:ParGCCardsPerStrideChunk=4096
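Putting it together, a storage-node launch line might look like the sketch below (the heap sizes and jar name are illustrative placeholders, not from the benchmark setup):

```shell
java -server -Xms28g -Xmx28g -Xmn128m \
     -XX:+UseConcMarkSweepGC \
     -XX:+UnlockDiagnosticVMOptions \
     -XX:ParGCCardsPerStrideChunk=4096 \
     -jar storage-node.jar
```

Note that -XX:+UnlockDiagnosticVMOptions must appear before -XX:ParGCCardsPerStrideChunk on the command line, otherwise the JVM will reject the diagnostic flag as unknown.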

Tuesday, March 13, 2012

Using Thrift in Coherence

Coherence provides two built-in options for the serialization format of your objects: Java serialization and POF. But you are not limited to these options. You can use a totally different serialization format by providing a custom Serializer.

Why use alternative serialization?

If you think that Thrift or Protobuf would be better in speed or size compared to POF, that is probably not true. I did a benchmark using these frameworks; POF scored slightly better than both Thrift and Protobuf. In addition, POF can extract attributes without deserializing the whole object.
The only serious reason I can think of: you already have alternative serialization implemented for your objects and do not want to support multiple formats. If that is your case, using alternative serialization in Coherence is perfectly justified.

Catch

So you already have a serialization format for your domain objects and you are happy with it. But besides domain objects, your custom serializer would also have to support standard Java types (including collections), internal Coherence classes, and the custom entry processors, aggregators and filters used by your application (if any). So a fully custom serializer is not a practical option.

Hybrid POF + Thrift serializer

The solution is simple: use your alternative format for domain objects and POF for everything else. Here is an example using Thrift.
pof-config.xml
<pof-config>
 
    <user-type-list>
 
        <!-- Include definitions required by Coherence -->
        <include>coherence-pof-config.xml</include>
 
        <!--
            You should declare type ID for each thrift class you are going to use in Coherence
        -->
        <user-type>
            <type-id>1000</type-id>
            <class-name>org.gridkit.sample.MyObject</class-name>
            <serializer>
                <class-name>org.gridkit.coherence.utils.thift.ThriftPofSerializer</class-name>
            </serializer>
        </user-type>
 
        ...
 
        <!-- Usual POF declaration for application non-thrift classes -->
 
        <user-type>
            <type-id>1100</type-id>
            <class-name>org.gridkit.coherence.sample.SampleEntryProcessor</class-name>
        </user-type>
           
    </user-type-list>
        
</pof-config>
ThriftPofSerializer.java
import java.io.IOException;
import java.lang.reflect.Constructor;

import org.apache.thrift.TBase;
import org.apache.thrift.TDeserializer;
import org.apache.thrift.TException;
import org.apache.thrift.TSerializer;

import com.tangosol.io.pof.PofReader;
import com.tangosol.io.pof.PofSerializer;
import com.tangosol.io.pof.PofWriter;
import com.tangosol.util.Binary;

public class ThriftPofSerializer implements PofSerializer {

    private Constructor<?> constructor;
    private TSerializer serializer;
    private TDeserializer deserializer;

    public ThriftPofSerializer(int typeId, Class<?> type) {
        try {
            this.constructor = type.getConstructor();
            this.constructor.setAccessible(true);
            this.serializer = new TSerializer();
            this.deserializer = new TDeserializer();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    @SuppressWarnings("rawtypes")
    public void serialize(PofWriter out, Object obj) throws IOException {
        TBase tobj = (TBase) obj;
        byte[] data;
        try {
            data = serializer.serialize(tobj);
        } catch (TException e) {
            throw new IOException(e);
        }
        out.writeBinary(0, new Binary(data));
    }

    @Override
    @SuppressWarnings("rawtypes")
    public Object deserialize(PofReader in) throws IOException {
        try {
            byte[] data = in.readByteArray(0);
            TBase stub = (TBase) constructor.newInstance();
            deserializer.deserialize(stub, data);
            return stub;
        } catch (Exception e) {
            throw new IOException(e);
        }
    }
}
If some Thrift class is used by other Thrift classes but is never put into Coherence individually, you can omit it from pof-config.xml.

Wednesday, March 7, 2012

Coherence. How to get rid of domain classes in grid classpath?

A Coherence data grid works with objects (storing, querying, aggregating, etc.). Java objects are native to Coherence, but .NET and C++ objects are also supported. Usually this is a good thing, but sometimes it may cause you problems.

Typically, Coherence is deployed as a dedicated storage cluster (a few JVMs across a few servers contributing memory resources) with application processes connecting either as storage-disabled members or as Coherence*Extend clients. It is also possible (and fairly common) for one storage cluster to be used by multiple applications.

The idea of having separate release/deploy cycles for storage nodes and applications looks very attractive. But there is a trick. Classes for objects stored in a Coherence distributed cache should be present in the classpath of storage nodes. Bummer.

Well, while this statement reflects the experience of many Coherence users, it is not technically true. Let me elaborate:
- Coherence storage nodes store the binary form of keys and values in memory,
- queries (and indexes) may trigger deserialization of objects on the server side, but if you stick with POF extractors, objects won't be deserialized,
- entry processors and aggregators may force objects to be deserialized on the server side, but you can avoid that by using the BinaryEntry API.

So if you are careful, you can get rid of domain classes in the classpath of storage nodes. This is huge for complex Coherence-based applications: you can now keep the grid online while deploying application releases. Of course, custom entry processors, aggregators, value extractors, etc. still have to be available in the classpath if you use them, but that kind of code tends to be much more stable.

Ok, in theory this is achievable, but in practice it is very hard. Sticking with the binary API to manipulate Java objects is awkward (and not always efficient, due to gaps between the object and binary APIs).

Here is a middle-ground solution - the partially serialized object.


The key idea is to use different serializers on the client and server side. From the same binary representation, an object is fully deserialized on the client side, but on the server side only the outer layer and a few fields are; most of the object's data remains a binary blob. This way, we do not need domain classes on the server side.

The trick is possible thanks to PofReader.readRemainder() / PofWriter.writeRemainder(). These methods allow parsing just the start of a POF stream and keeping its remainder unparsed. At the same time, the POF stream stays intact, so POF extractors can access any attribute of the object.

When this technique can be used?

When designing a Coherence-based solution, I do my best to keep it modular. Usually there is a layer around Coherence offering an application-specific, yet generic, service. At least one reason to do it this way: the service can be mocked for testing. Domain objects are rarely stored directly in Coherence; instead they are wrapped in envelopes. The example below illustrates versioned storage: the envelope is used to annotate data with a timestamp. The envelope is a part of the service, while the payload of the envelope is not.

Code example

package example;

import java.io.IOException;

import com.tangosol.io.pof.PofReader;
import com.tangosol.io.pof.PofSerializer;
import com.tangosol.io.pof.PofWriter;
import com.tangosol.util.Binary;

public class Envelop {

    public static final int TIMESTAMP_POF = 1;
    public static final int DELETED_POF   = 2;
    public static final int PAYLOAD_POF   = 20;

    protected long timestamp;
    protected boolean deleted;
    protected Object payload;
    protected Binary binaryPayload;
    transient boolean serverMode;

    /** TO BE USED WITH SERIALIZER */
    protected Envelop(long timestamp, boolean deleted, Object payload, Binary binaryPayload, boolean serverMode) {
        this.timestamp = timestamp;
        this.deleted = deleted;
        this.payload = payload;
        this.binaryPayload = binaryPayload;
        this.serverMode = serverMode;
    }

    /** Constructor used on client side */
    public Envelop(Object payload, long timestamp, boolean deleted) {
        this.payload = payload;
        this.timestamp = timestamp;
        this.deleted = deleted;
        this.serverMode = false;
    }

    public Object getPayload() {
        return payload;
    }

    public Binary getBinaryPayload() {
        return binaryPayload;
    }

    public long getTimestamp() {
        return timestamp;
    }

    public void setTimestamp(long timestamp) {
        this.timestamp = timestamp;
    }

    public boolean isDeleted() {
        return deleted;
    }

    public void setDeleted(boolean deleted) {
        this.deleted = deleted;
    }

    public static class ServerSerializer implements PofSerializer {

        @Override
        public Object deserialize(PofReader in) throws IOException {
            long timestamp = in.readLong(TIMESTAMP_POF);
            boolean deleted = in.readBoolean(DELETED_POF);
            Binary data = in.readRemainder();
            return new Envelop(timestamp, deleted, null, data, true);
        }

        @Override
        public void serialize(PofWriter out, Object o) throws IOException {
            Envelop dv = (Envelop) o;
            if (!dv.serverMode) {
                throw new IllegalArgumentException("Object is in client mode, but server serializer is used. Something wrong with POF config!");
            }
            out.writeLong(TIMESTAMP_POF, dv.getTimestamp());
            out.writeBoolean(DELETED_POF, dv.isDeleted());
            out.writeRemainder(dv.getBinaryPayload());
        }
    }

    public static class ClientSerializer implements PofSerializer {

        @Override
        public Object deserialize(PofReader in) throws IOException {
            long timestamp = in.readLong(TIMESTAMP_POF);
            boolean deleted = in.readBoolean(DELETED_POF);
            Object payload = in.readObject(PAYLOAD_POF);
            Binary data = in.readRemainder();
            return new Envelop(timestamp, deleted, payload, data, false);
        }

        @Override
        public void serialize(PofWriter out, Object o) throws IOException {
            Envelop dv = (Envelop) o;
            if (dv.serverMode) {
                throw new IllegalArgumentException("Object is in server mode, but client serializer is used. Something wrong with POF config!");
            }
            out.writeLong(TIMESTAMP_POF, dv.getTimestamp());
            out.writeBoolean(DELETED_POF, dv.isDeleted());
            out.writeObject(PAYLOAD_POF, dv.getPayload());
            out.writeRemainder(dv.getBinaryPayload());
        }
    }
}

Summary

Using this technique, it is possible to exclude application-specific classes from the classpath of cluster member JVMs. If you are using .NET or C++, you can even avoid implementing domain objects in Java at all, yet still be able to do complex operations using POF extractors.