next up previous contents
Next: Benchmark Results for Multidimensional Up: Object Serialization for Marshalling Previous: Datatypes in an MPI-like   Contents


Adding Serialization to the API

In this section we discuss the other option for representing complex data buffers in the Java API of [#!MPIPOSITION!#]: the introduction of an MPJ.OBJECT datatype.

It is natural to assume that the elements of buffers passed to send and other output operations are objects whose classes implement the Serializable interface. There are at least two ways one might communicate object types in the MPI interface:

  1. Use the standard ObjectOutputStream to convert the object buffers to byte vectors, and communicate these byte vectors using the same method as for primitive byte buffers (for example, this might involve a native method call to C MPI functions). At the destination, use the standard ObjectInputStream to rebuild the objects.
  2. Replace naive use of serialization streams with more specialized code that uses platform-specific knowledge to communicate data fields efficiently. For example, one might modify the standard writeObject in such a way that a native method creates an MPI derived datatype structure describing the layout of data in the object, and this buffer descriptor could be passed to a native MPI_Send function.
In the second case our implementation is responsible for prepending a suitable type descriptor to the message, so that objects can be reconstructed at the receiving end before data is copied to them.
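
As an illustration of the first scheme, the following is a minimal sketch of the marshalling step only, assuming the buffer is an array of Serializable objects. The class and method names (StreamMarshalSketch, toBytes, fromBytes) are hypothetical and are not part of mpiJava; the byte vector returned by toBytes stands for whatever would then be handed to the primitive byte-buffer send.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical helper class, for illustration only (not the mpiJava code).
class StreamMarshalSketch {

    // Serialize count elements of buf, starting at offset, into one byte vector.
    static byte[] toBytes(Object[] buf, int offset, int count) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                for (int i = 0; i < count; i++) {
                    out.writeObject(buf[offset + i]);
                }
            } // closing the stream flushes everything into bytes
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rebuild count objects from data and store them in buf, starting at offset.
    static void fromBytes(byte[] data, Object[] buf, int offset, int count) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < count; i++) {
                buf[offset + i] = in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object[] send = { "alpha", 42, new double[] {1.5, 2.5} };
        byte[] wire = toBytes(send, 0, send.length);   // would be sent as bytes
        Object[] recv = new Object[send.length];
        fromBytes(wire, recv, 0, recv.length);         // done at the destination
        System.out.println(recv[0] + " " + recv[1] + " " + ((double[]) recv[2])[1]);
    }
}
```

Note that the receiver cannot know the length of the byte vector in advance, which is the source of the extra header message discussed later in this section.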

The first implementation scheme is more straightforward, and this approach will be considered in the remainder of this section. We discuss an implementation based on the mpiJava wrappers, combining standard JDK object serialization methods with a JNI interface to native MPI. Benchmark results presented in the next section suggest that something like the second approach (or some suitable combination of the two) deserves serious consideration, hence section 5.5 describes one realization of this scheme.

The original version of mpiJava was a direct Java wrapper for standard MPI. Apart from adopting an object-oriented framework, it added only a modest amount of code to the underlying C implementation of MPI. Derived datatype constructors, for example, simply called the datatype constructors of the underlying implementation and returned a Java object containing a representation of the C handle. A send operation or a wait operation, say, dispatched a single C MPI call. Even when exploiting standard JDK object serialization and a native MPI package, uniform support for the MPJ.OBJECT basic type complicates the wrapper code significantly.

In the new version of the wrapper, every send, receive, or collective communication operation tests whether the base type of the datatype argument describing a buffer is OBJECT. If it is not, that is, if the buffer element type is a primitive type, the native MPI operation is called directly, as in the old version. If the buffer is an array of objects, special actions must be taken in the wrapper. If the buffer is a send buffer, the objects must be serialized. We also support MPI-like derived datatypes as described in the previous section. On grounds of uniformity, these should be definable with base type OBJECT, just as for primitive elements. The message is then some subset of the array of objects passed in the buffer argument, selected according to the displacement sequence of the derived datatype. In the implementation, a method

  byte [] Object_Serialize(Object   buf,
                           int      offset,
                           int      count,
                           Datatype type)
takes the send buffer and descriptor, and returns a byte vector containing the serialized data. At the receiving end a corresponding Object_Deserialize method is called. Making the Java wrapper responsible for handling derived datatypes when the base type is OBJECT requires additional fields in the Java-side Datatype class. In particular, the Java object explicitly maintains the displacement sequence as an array of integers.
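
The selection of elements by a displacement sequence can be sketched as follows. This is not the mpiJava implementation: the ObjectType class is a hypothetical stand-in for the extra Datatype fields, and the sketch assumes that successive copies of the type are shifted by an extent measured in array elements, in the manner of MPI derived datatypes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch, for illustration only.
class DisplacementSerializeSketch {

    // Stand-in for the Java-side Datatype fields (names are assumptions).
    static final class ObjectType {
        final int[] displacements; // element offsets within one copy of the type
        final int extent;          // stride, in elements, between successive copies

        ObjectType(int[] displacements, int extent) {
            this.displacements = displacements;
            this.extent = extent;
        }
    }

    // Serialize the elements selected by count copies of type, starting at offset.
    static byte[] objectSerialize(Object[] buf, int offset, int count, ObjectType type) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                for (int i = 0; i < count; i++) {
                    for (int d : type.displacements) {
                        out.writeObject(buf[offset + i * type.extent + d]);
                    }
                }
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Store deserialized objects into the same positions of the receive buffer.
    static void objectDeserialize(byte[] data, Object[] buf, int offset, int count,
                                  ObjectType type) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < count; i++) {
                for (int d : type.displacements) {
                    buf[offset + i * type.extent + d] = in.readObject();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With displacements {0, 2}, extent 3 and count 2, for example, the sketch selects buffer elements 0, 2, 3 and 5; positions of the receive buffer that the datatype does not select are left untouched.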

A further set of changes to the implementation arises because the size of the serialized data is not known in advance, and cannot be computed at the receiving end from type information available there. Before the serialized data is sent, the size of the data must be communicated to the receiver, so that a byte receive buffer can be allocated. We therefore send two physical messages: a header containing size information, followed by the data (footnote 5.2). This, in turn, complicates the implementation of the various wait and test methods on communication request objects, and of the start methods on persistent communication requests, and ends up requiring extra fields in the Java Request class. Comparable changes are needed in the collective communication wrappers. A gather operation involving object types, for example, is implemented as an MPI_GATHER operation to collect all message lengths, followed by an MPI_GATHERV to collect the possibly different-sized data vectors.
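
The header-then-data protocol can be sketched without any MPI calls by modelling the channel as a queue in which each entry is one physical message. Everything here is an assumption made for illustration: the class name, the 4-byte integer header, and the queue standing in for the native send and receive. The displacements method shows the prefix sum over gathered lengths that a root process would need before a variable-length gather.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch, for illustration only; no real MPI calls are made.
class HeaderThenDataSketch {

    // Send two physical messages: a 4-byte length header, then the data.
    static void send(Queue<byte[]> channel, byte[] serialized) {
        channel.add(ByteBuffer.allocate(4).putInt(serialized.length).array());
        channel.add(serialized);
    }

    // Receive the header first, allocate the buffer, then receive the data.
    static byte[] receive(Queue<byte[]> channel) {
        int length = ByteBuffer.wrap(channel.remove()).getInt();
        byte[] data = new byte[length];       // size is only known from the header
        byte[] payload = channel.remove();
        System.arraycopy(payload, 0, data, 0, length);
        return data;
    }

    // Given the lengths collected from all processes, compute the offset at
    // which each process's data starts in the root's receive buffer.
    static int[] displacements(int[] lengths) {
        int[] displs = new int[lengths.length];
        for (int i = 1; i < lengths.length; i++) {
            displs[i] = displs[i - 1] + lengths[i - 1];
        }
        return displs;
    }

    public static void main(String[] args) {
        Queue<byte[]> channel = new ArrayDeque<>();
        send(channel, new byte[] {10, 20, 30});
        System.out.println(receive(channel).length);
    }
}
```

Because one logical message becomes two physical ones, any non-blocking variant has to track both, which is consistent with the extra request state described above.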


Bryan Carpenter 2004-06-09