

Goals and Requirements

[Figure 5.1: An HPJava communication stack.]

The mpjdev API is designed so that it can be implemented portably on network platforms and efficiently on parallel hardware. Unlike MPI, which is intended for application developers, mpjdev is meant for library developers. Application-level communication libraries such as the Java version of Adlib (or MPJ [13]) may be implemented on top of mpjdev. The mpjdev API itself might be implemented on top of Java sockets in a portable network implementation or, on HPC platforms, through a JNI (Java Native Interface) layer over a subset of MPI. The positioning of the mpjdev API is illustrated in Figure 5.1. Not all of the communication stack in this figure is currently implemented. The Java version of Adlib, the pure-Java implementation for SMPs, and the native MPI implementation have been developed and are included in the current HPJava or mpiJava releases. The rest of the stack may be filled in later. Detailed API information is given in Section 5.2.

An important requirement is to support communication of all intrinsic Java types, including primitive types and objects. The library should transfer data between the Java program and the network while keeping the overheads of the Java Native Interface as low as practical. From the development of our earlier library, mpiJava, we learned that communication overheads are a key factor in performance. For the mpjdev library, one important design decision was made to reduce this overhead. Communication protocols are usually type specific: different types of data must be sent separately. To avoid many small sends, mpjdev keeps all message data in a single buffer, a Java byte[] array in the pure-Java versions or a C char[] array in the JNI-based versions. All the different Java primitive types can therefore be stored in one buffer and sent together instead of in many small separate sends. Java class types are treated as a special case. Primitive types and class types can be written to the same buffer, but the data may end up in two different messages, one for primitive data and the other for serialized Java objects. To support Java objects efficiently, mpjdev maintains the serialized objects in a separate Java byte[] array.
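The buffering idea can be illustrated with a minimal sketch in standard Java. This is not the mpjdev API: the class name `MixedBufferSketch`, its methods, and the "element count followed by elements" section layout are all assumptions made for illustration. It packs values of two primitive types into one byte[] array and keeps serialized objects in a second, separate byte[] array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;

// Hypothetical illustration only; names and layout are not taken from mpjdev.
class MixedBufferSketch {

    // Pack an int section and a double section into a single byte[] array.
    // Each section is written as an element count followed by the elements.
    static byte[] packPrimitives(int[] ints, double[] doubles) {
        ByteBuffer buf = ByteBuffer.allocate(
                2 * Integer.BYTES
                + ints.length * Integer.BYTES
                + doubles.length * Double.BYTES);
        buf.putInt(ints.length);
        for (int v : ints) buf.putInt(v);
        buf.putInt(doubles.length);
        for (double v : doubles) buf.putDouble(v);
        return buf.array();
    }

    // Read the next int section from the buffer's current position.
    static int[] readInts(ByteBuffer buf) {
        int[] out = new int[buf.getInt()];
        for (int i = 0; i < out.length; i++) out[i] = buf.getInt();
        return out;
    }

    // Read the next double section from the buffer's current position.
    static double[] readDoubles(ByteBuffer buf) {
        double[] out = new double[buf.getInt()];
        for (int i = 0; i < out.length; i++) out[i] = buf.getDouble();
        return out;
    }

    // Objects go through standard Java serialization into their own byte[] array.
    static byte[] serializeObjects(Object[] objects) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(objects);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Inverse of serializeObjects.
    static Object[] deserializeObjects(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Object[]) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A sender in this sketch would transmit the array returned by `packPrimitives` as one message and, only if objects are present, the array returned by `serializeObjects` as a second message. The receiver wraps the first array with `ByteBuffer.wrap` and reads the sections back in the order they were written.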

Currently there are three different implementations. The initial version of mpjdev targeted HPC platforms through a JNI interface to a subset of MPI. For SMPs, and for debugging on a single processor, we later implemented a pure-Java, multithreaded version, which assumes that SPMD processes are mapped to Java threads. We also developed a more optimized, system-specific mpjdev for the IBM SP system using the Low-level Application Programming Interface (LAPI). This chapter also describes a proposed pure-Java version of mpjdev that uses Java sockets. This would provide a more portable network implementation of HPJava (without layering on, say, MPICH or LAM).
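To make the idea of mapping SPMD processes to Java threads concrete, the following is a small sketch under stated assumptions. It does not reproduce the multithreaded mpjdev implementation: the class `SpmdThreadsSketch`, the method `runRing`, and the use of one in-memory queue per rank as an inbox are hypothetical. Every thread runs the same program body, distinguished only by its rank, and passes its rank to its right-hand neighbour in a ring.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: SPMD "processes" as threads inside one JVM. Not mpjdev code.
class SpmdThreadsSketch {

    // Starts `size` threads running the same body. Each rank sends its rank to
    // the next rank in a ring and receives one value from the previous rank.
    // Returns the value received by each rank.
    static int[] runRing(int size) {
        // One inbox per rank stands in for the communication layer.
        List<BlockingQueue<Integer>> inboxes = new ArrayList<>();
        for (int r = 0; r < size; r++) inboxes.add(new LinkedBlockingQueue<>());

        int[] received = new int[size];
        Thread[] workers = new Thread[size];
        for (int r = 0; r < size; r++) {
            final int rank = r;
            workers[r] = new Thread(() -> {
                try {
                    inboxes.get((rank + 1) % size).put(rank);   // "send" to right neighbour
                    received[rank] = inboxes.get(rank).take();  // "receive" from left neighbour
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[r].start();
        }
        for (Thread t : workers) {
            try {
                t.join(); // join also makes each worker's write to `received` visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return received;
    }
}
```

Because all ranks share one address space, a "send" in this sketch is just a queue insertion, which is why such a version is convenient for debugging on a single processor.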

Our mpjdev layer is similar to the abstract device interface (ADI) of MPICH, which is used as the lower-level communication layer in the MPICH implementation of MPI. This interface handles just the communication between processes. Message information is divided into two parts: the message envelope and the data. The message envelope is relatively small and contains message information such as the tag, communicator, and length.

There are various differences between mpjdev and the ADI. One is that mpjdev stores the message information in the same buffer as the data and sends them together, whereas the ADI message envelope has its own buffer, and the ADI data may or may not be delivered at the same time as the envelope. Another is that mpjdev is better suited to handling different types of data in a single message. The ADI does not have particularly good ways to handle different data types in the same buffer.
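The first difference, envelope and data travelling in one buffer, can be sketched as follows. The layout is an assumption chosen for illustration (three 4-byte integers for tag, communicator context, and payload length, followed by the payload) and is not the actual mpjdev wire format. The class `EnvelopeSketch` and its methods are likewise hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical framing: [tag][contextId][length][payload bytes...]. Not the mpjdev format.
class EnvelopeSketch {

    static final int HEADER_BYTES = 3 * Integer.BYTES;

    // Write the envelope fields and the payload into a single byte[] array,
    // so that one send delivers both.
    static byte[] frame(int tag, int contextId, byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_BYTES + payload.length);
        buf.putInt(tag).putInt(contextId).putInt(payload.length).put(payload);
        return buf.array();
    }

    // Envelope fields are read at fixed offsets from the start of the message.
    static int tagOf(byte[] message) {
        return ByteBuffer.wrap(message).getInt(0);
    }

    static int contextOf(byte[] message) {
        return ByteBuffer.wrap(message).getInt(Integer.BYTES);
    }

    // The payload starts immediately after the header; its length comes from the envelope.
    static byte[] payloadOf(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message);
        byte[] payload = new byte[buf.getInt(2 * Integer.BYTES)];
        buf.position(HEADER_BYTES);
        buf.get(payload);
        return payload;
    }
}
```

With this kind of framing the receiver can match on the tag and context as soon as the message arrives, without waiting for a separate data transfer.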


Bryan Carpenter 2004-06-09