To increase the flexibility of the system, and to make it more attractive to programmers accustomed to the direct message-passing style, an interface to MPI is clearly desirable. In HPF, with its global-thread-of-control model, a proper interface to the underlying message-passing platform is practical only through the extrinsic procedure mechanism. Unlike HPF, the HPJava environment is based on the HPspmd model, which is designed to facilitate direct access to SPMD library interfaces. Processors can either work simultaneously on globally subscripted distributed arrays, or independently execute complex procedures on their own local data. The transition between these two phases is relatively seamless.
We use a simple example extracted from an N-body algorithm to illustrate the usage of the HPJava binding for MPI. Figure 3.4 is a simplified, pure data-parallel version of the force computation in an N-body program. There are three distributed arrays in the program: a, b and f. Distributed array a holds a fixed copy of the particle positions, while b holds a circulating copy of the positions.
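HPJava's distributed-array syntax is not plain Java, but the circular-shift pattern at the heart of the data-parallel version can be sketched in ordinary, sequential Java. The sketch below is hypothetical: the `force` term is a stand-in pairwise interaction, b circulates past the fixed copy a one position per step, and after n steps every pair of particles has interacted exactly once.

```java
import java.util.Arrays;

public class CShiftSketch {
    // Stand-in pairwise interaction: antisymmetric 1/d^2 term, zero for self.
    static double force(double x, double y) {
        double d = x - y;
        return (d == 0.0) ? 0.0 : Math.signum(d) / (d * d);
    }

    // Sequential analogue of the data-parallel loop: b circulates past a.
    static double[] computeForces(double[] a) {
        int n = a.length;
        double[] b = a.clone();          // circulating copy of the positions
        double[] f = new double[n];      // force accumulator
        for (int step = 0; step < n; step++) {
            for (int i = 0; i < n; i++)  // interact with current alignment of b
                f[i] += force(a[i], b[i]);
            double last = b[n - 1];      // circular shift of b by one (cshift analogue)
            System.arraycopy(b, 0, b, 1, n - 1);
            b[0] = last;
        }
        return f;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(computeForces(new double[]{0.0, 1.0, 3.0, 6.0})));
    }
}
```

Because the interaction term is antisymmetric, the accumulated forces sum to zero, which is a convenient sanity check on the shift schedule.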
Now suppose we wanted to combine features of this HPJava version with an MPI version, while partially keeping the convenient data-parallel array syntax. Figure 3.5 shows the HPJava/MPI version of N-body. Process group arrangements and initialization of the distributed arrays are handled by HPJava, while the shift operation is performed by calling the mpiJava point-to-point communication routine MPI.Sendrecv_replace between processors. The local variables a_block, b_block and f_block in the program are not distributed arrays; they are assigned values from the distributed arrays a, b and f according to each process's position in the process grid. An inquiry function dat() returns a sequential Java array containing the locally held elements of a distributed array. The local array on each processor has size N/P, where N is the total number of particles and P is the number of processors. These local arrays are processed serially on each node, but concurrently across all the nodes.
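The blockwise communication pattern of the hybrid version can also be sketched without actual MPI calls. In the hypothetical simulation below, an array of P blocks stands in for the P processes, and an in-memory rotation of whole blocks stands in for the ring of MPI.Sendrecv_replace calls; each "rank" computes all pairwise terms between its fixed block and the currently aligned circulating block.

```java
import java.util.Arrays;

public class BlockShiftSketch {
    // Stand-in pairwise interaction: antisymmetric 1/d^2 term, zero for self.
    static double force(double x, double y) {
        double d = x - y;
        return (d == 0.0) ? 0.0 : Math.signum(d) / (d * d);
    }

    // P ranks, each owning a block of N/P positions. The b-blocks rotate
    // around a ring; Sendrecv_replace is modelled as a rotation of references.
    static double[][] computeForces(double[][] aBlocks) {
        int p = aBlocks.length;
        double[][] bBlocks = new double[p][];
        double[][] fBlocks = new double[p][];
        for (int r = 0; r < p; r++) {
            bBlocks[r] = aBlocks[r].clone();       // circulating copy
            fBlocks[r] = new double[aBlocks[r].length];
        }
        for (int step = 0; step < p; step++) {
            for (int r = 0; r < p; r++)            // each rank computes locally
                for (int i = 0; i < aBlocks[r].length; i++)
                    for (int j = 0; j < bBlocks[r].length; j++)
                        fBlocks[r][i] += force(aBlocks[r][i], bBlocks[r][j]);
            double[] last = bBlocks[p - 1];        // shift whole blocks by one rank
            System.arraycopy(bBlocks, 0, bBlocks, 1, p - 1);
            bBlocks[0] = last;
        }
        return fBlocks;
    }

    public static void main(String[] args) {
        double[][] f = computeForces(new double[][]{{0.0, 1.0}, {3.0, 6.0}});
        System.out.println(Arrays.deepToString(f));
    }
}
```

After P block shifts every particle has interacted with every block exactly once, so the total is the same all-pairs computation as the element-wise cshift version, but with far fewer communication steps.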
Comparing this HPJava/MPI version with the data-parallel version, notice that both programs send the same total amount of data, but the HPJava/MPI version performs P shifts of whole blocks of size N/P, while the pure data-parallel version uses a cshift call that performs one circular shift on every step, N shifts in all. This adds extra communication start-up overheads. In addition, the pure HPJava version requires more copying operations (N times) than the mpiJava version does (P times), where typically N >> P.
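The start-up-cost argument can be made concrete with hypothetical values of N and P (the numbers below are illustrative, not taken from any measurement). Both schedules move the same number of elements per processor; they differ only in how many messages carry them.

```java
public class ShiftCost {
    public static void main(String[] args) {
        long n = 10_000;         // hypothetical total number of particles
        long p = 16;             // hypothetical number of processors
        long block = n / p;      // local block size N/P

        // Pure data-parallel version: N cshift steps, each moving one
        // element per processor -> N start-ups, N elements per processor.
        long dpStartups = n;
        long dpElements = n;

        // HPJava/MPI version: P Sendrecv_replace steps, each moving a whole
        // block of N/P elements -> P start-ups, the same N elements in total.
        long mpiStartups = p;
        long mpiElements = p * block;

        System.out.println("start-ups: " + dpStartups + " vs " + mpiStartups);
        System.out.println("elements:  " + dpElements + " vs " + mpiElements);
    }
}
```

With these numbers the two versions each move 10,000 elements per processor, but the data-parallel schedule pays 10,000 message start-ups against only 16 for the blockwise schedule.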
This example leaves non-trivial issues unresolved: in general, what is the mapping from distributed-array elements to local-data-segment elements? What is the mapping between HPJava process groups and MPI groups? The complete specification of HPJava will address these issues; eventually a better-integrated message-passing API may be desirable.