
Datatypes in an MPI-like API for Java

The MPI standard is explicitly object-based. The C++ binding specified in the MPI-2 standard collects these objects into suitable class hierarchies and defines most of the library functions as class member functions. The Java API proposed in [MPIPOSITION] follows this model, and lifts its class hierarchy directly from the C++ binding of MPI.

In our Java version, a class MPI with only static members acts as a module containing global services, such as initialization of the message-passing layer, and many global constants, including a default communicator COMM_WORLD. (It has been pointed out that if multiple MPI threads are allowed in the same Java VM, the default communicator cannot be obtained from a static variable. The final version of the API may change this convention.) The communicator class Comm is the single most important class in MPI. All communication functions are members of Comm or its subclasses. Another class relevant to the discussion below is Datatype, which describes the type of the elements in the message buffers passed to send, receive, and other communication functions. Various basic datatypes are predefined in the package. These mainly correspond to the primitive types of Java, shown in figure 4.1 in section 4.2.

The methods corresponding to standard send and receive operations of MPI are members of Comm with interfaces

        void send(Object buf, int offset, int count,
                  Datatype datatype, int dst, int tag)

        Status recv(Object buf, int offset, int count,
                    Datatype datatype, int src, int tag)
In both cases the actual argument corresponding to buf must be a Java array whose element type is compatible with the datatype argument. If the specified type corresponds to a primitive type, the buffer must be a one-dimensional array. Multidimensional arrays can be communicated directly if an object type is specified, because each component array can be treated as an object. Communication of object types implies some form of serialization and deserialization. This could be the built-in serialization provided in current Java environments, or (as we discuss at length in section 5.5) it could be some specialized serialization tuned for message-passing.
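As a rough illustration of what sending an object-typed buffer involves, the sketch below marshals a section of an object array using the built-in Java serialization and restores it on the "receiving" side. The class ObjectBufferMarshal and its methods are hypothetical helpers invented for this example; they are not part of the proposed API, and no actual message passing takes place.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical helper (an assumption for illustration, not the proposed API):
// marshals elements of an object-typed buffer with built-in serialization.
class ObjectBufferMarshal {

    // Serialize `count` elements of `buf`, starting at subscript `offset`.
    static byte[] marshal(Object[] buf, int offset, int count) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (int i = 0; i < count; i++) {
                out.writeObject(buf[offset + i]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Deserialize `count` elements from `data` into `buf`, starting at `offset`.
    static void unmarshal(byte[] data, Object[] buf, int offset, int count) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < count; i++) {
                buf[offset + i] = in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A two-dimensional array is an array of row objects, so it can serve
        // as a buffer whose elements are of object type.
        int[][] matrix = { {1, 2, 3}, {4, 5, 6} };
        byte[] wire = marshal(matrix, 0, matrix.length);

        int[][] received = new int[2][];
        unmarshal(wire, received, 0, received.length);
        System.out.println(Arrays.deepToString(received));
    }
}
```

The rows arrive as fresh copies rather than shared references, which is the behaviour one would expect from a message-passing operation.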

Besides object types, the draft Java binding proposal retains a model of MPI derived datatypes. In C or Fortran bindings of MPI, derived datatypes have two roles. One is to allow messages to contain mixed types. The other is to allow noncontiguous data to be transmitted. The first role involves using the MPI_TYPE_STRUCT derived datatype constructor, which allows one to describe the physical layout of, say, a C struct containing mixed types. This will not work in Java, because Java does not expose the low-level layout of its objects. In C or Fortran, MPI_TYPE_STRUCT also allows one to incorporate displacements computed as differences between absolute addresses, so that parts of a single message can come from separately declared arrays and other variables. Again, there is no very natural way to do this in Java. (But effects similar to these uses of MPI_TYPE_STRUCT can be achieved by using MPJ.OBJECT as the buffer type, and relying on object serialization.)

We conclude that in the Java binding the first role of derived datatypes should probably be abandoned: derived types can only include elements of a single basic type. This leaves description of noncontiguous buffers as the remaining role for derived datatypes. Every derived datatype constructible in the Java binding has a uniquely defined base type. This is one of the 9 basic types enumerated in figure 4.1. A derived datatype is an object that specifies two things: a base type and a sequence of integer displacements. In contrast to the C and Fortran bindings, the displacements can be interpreted in terms of subscripts in the buffer array argument, rather than as byte displacements.
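The following sketch shows one way to read "a base type and a sequence of integer displacements" when the displacements are array subscripts. It fixes the base type to int and gathers or scatters the selected elements of a buffer. The class IntDisplacementType and its pack and unpack methods are hypothetical names chosen for this illustration, not part of the draft binding.

```java
import java.util.Arrays;

// Illustrative only: a derived type over int buffers, modelled as a sequence
// of subscript displacements. All names here are assumptions.
class IntDisplacementType {
    private final int[] displacements;

    IntDisplacementType(int... displacements) {
        this.displacements = displacements.clone();
    }

    // Gather the elements selected by one copy of the type, relative to
    // subscript `offset` of `buf`, into a contiguous array.
    int[] pack(int[] buf, int offset) {
        int[] packed = new int[displacements.length];
        for (int i = 0; i < displacements.length; i++) {
            packed[i] = buf[offset + displacements[i]];
        }
        return packed;
    }

    // Scatter contiguous values back into `buf` at the selected subscripts.
    void unpack(int[] packed, int[] buf, int offset) {
        for (int i = 0; i < displacements.length; i++) {
            buf[offset + displacements[i]] = packed[i];
        }
    }

    public static void main(String[] args) {
        int[] buf = {10, 11, 12, 13, 14, 15, 16, 17};
        IntDisplacementType type = new IntDisplacementType(0, 2, 5);

        // With offset 1 the type selects buf[1], buf[3] and buf[6].
        System.out.println(Arrays.toString(type.pack(buf, 1)));
    }
}
```

Because the displacements are subscripts rather than byte offsets, no knowledge of the memory layout of the array is needed.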

For example, the type constructor indexed is a member of Datatype with interface

  Datatype indexed(int [] arrayOfBlocklengths,
                   int [] arrayOfDisplacements)
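This is a binding of the standard MPI operation MPI_TYPE_INDEXED. It constructs a new datatype representing replication of the original datatype (to which the method is applied) into a sequence of blocks. Each block can contain a different number of copies and have a different displacement. The base type of the new datatype will be the same as the base type of the original type. If the displacement sequence of the original type was

\[
\{ \mathit{disp}_0, \mathit{disp}_1, \ldots, \mathit{disp}_{n-1} \}
\]

with extent ex, and B is the arrayOfBlocklengths argument and D is the arrayOfDisplacements argument, the new datatype will have displacement sequence

\[
\begin{array}{l}
\{\, \mathit{disp}_0 + D[0] \cdot ex, \ldots, \mathit{disp}_{n-1} + D[0] \cdot ex, \\
\;\; \mathit{disp}_0 + (D[0] + 1) \cdot ex, \ldots, \mathit{disp}_{n-1} + (D[0] + 1) \cdot ex, \\
\;\; \ldots, \\
\;\; \mathit{disp}_0 + (D[0] + B[0] - 1) \cdot ex, \ldots, \mathit{disp}_{n-1} + (D[0] + B[0] - 1) \cdot ex, \\
\;\; \ldots, \\
\;\; \mathit{disp}_0 + D[c-1] \cdot ex, \ldots, \mathit{disp}_{n-1} + D[c-1] \cdot ex, \\
\;\; \ldots, \\
\;\; \mathit{disp}_0 + (D[c-1] + B[c-1] - 1) \cdot ex, \ldots, \mathit{disp}_{n-1} + (D[c-1] + B[c-1] - 1) \cdot ex \,\}
\end{array}
\]

Here, c is the number of blocks.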
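The displacement sequence produced by an indexed constructor can be computed with three nested loops: over blocks, over copies within a block, and over the displacements of the original type. The sketch below follows the standard definition of MPI_TYPE_INDEXED and assumes that the extent is measured in array elements, consistent with subscript displacements. The class IndexedDisplacements and its static method are hypothetical and stand in for whatever the binding does internally.

```java
import java.util.Arrays;

// Hypothetical helper: computes the displacement sequence of the type built
// by an indexed constructor. Assumes the extent `ex` is counted in elements.
class IndexedDisplacements {

    static int[] indexed(int[] disp, int ex, int[] blocklengths, int[] displacements) {
        if (blocklengths.length != displacements.length) {
            throw new IllegalArgumentException("B and D must have the same length");
        }
        int copies = 0;
        for (int b : blocklengths) {
            copies += b;
        }
        int[] result = new int[copies * disp.length];
        int k = 0;
        for (int i = 0; i < blocklengths.length; i++) {      // block i of c
            for (int j = 0; j < blocklengths[i]; j++) {      // copy j within block i
                for (int d : disp) {                         // original displacements
                    result[k++] = d + (displacements[i] + j) * ex;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Original type: two elements at subscripts 0 and 1, extent 3.
        int[] disp = {0, 1};
        int[] b = {1, 2};   // block lengths
        int[] d = {4, 0};   // block displacements, in units of the extent
        System.out.println(Arrays.toString(indexed(disp, 3, b, d)));
    }
}
```

For a basic type, which has the single displacement 0 and extent 1, the result reduces to the list of subscripts covered by the blocks.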

In Java the derived datatype constructed by indexed has a potentially useful role. It allows one to send (or receive) messages containing values scattered arbitrarily in some one-dimensional array. The draft proposal incorporates versions of other type constructors from MPI, including MPI_TYPE_VECTOR for strided sections. We note, though, that the value of providing strided sections is reduced because Java has no natural mapping between elements of its multidimensional arrays and elements of equivalent one-dimensional arrays. This thwarts one common use of strided sections, for representing portions of multidimensional arrays.
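To make the point about strided sections concrete, the sketch below computes the subscript displacements of a vector-style type applied to a basic type, and uses them to pick out a column of a matrix that the programmer has stored by hand in a one-dimensional array. The class VectorDisplacements and its methods are hypothetical, and the row-major flattening is an assumption of the example, not something Java provides.

```java
import java.util.Arrays;

// Hypothetical helper: displacements of a strided (vector-style) type over a
// basic type, expressed as subscripts into a one-dimensional buffer.
class VectorDisplacements {

    // `count` blocks of `blocklength` elements, with block starts `stride` apart.
    static int[] vector(int count, int blocklength, int stride) {
        int[] result = new int[count * blocklength];
        int k = 0;
        for (int i = 0; i < count; i++) {
            for (int j = 0; j < blocklength; j++) {
                result[k++] = i * stride + j;
            }
        }
        return result;
    }

    // Gather the elements of `buf` selected by `disp`, relative to `offset`.
    static int[] gather(int[] buf, int offset, int[] disp) {
        int[] out = new int[disp.length];
        for (int i = 0; i < disp.length; i++) {
            out[i] = buf[offset + disp[i]];
        }
        return out;
    }

    public static void main(String[] args) {
        int rows = 3, cols = 4;

        // A 3 x 4 matrix flattened by hand in row-major order.
        int[] flat = new int[rows * cols];
        for (int i = 0; i < flat.length; i++) {
            flat[i] = i;
        }

        // Column 1 is a strided section: one element per row, stride = cols.
        int[] column = gather(flat, 1, vector(rows, 1, cols));
        System.out.println(Arrays.toString(column));

        // A Java int[][] is an array of separate row objects, which may even
        // differ in length, so no single stride describes its columns.
        int[][] nested = new int[rows][cols];
        nested[1] = new int[7];
    }
}
```

The strided description only works for the hand-flattened array; for the nested array each row would have to be treated as a separate object.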

Bryan Carpenter 2004-06-09