Task-Parallelism in HPJava
Some parts of a large parallel program cannot be written
efficiently in the pure data-parallel style, which uses overall constructs
to process all elements of distributed arrays homogeneously.
For efficiency, a process sometimes has to execute a procedure that combines
just its locally held array elements in a non-trivial way.
The HPJava environment is designed to facilitate direct access to SPMD library
interfaces, and it provides constructs for both data-parallel and
task-parallel programming. Different processors can either work simultaneously
on data in globally subscripted arrays, or independently execute more complex
procedures on their own local data. The transition between these phases is
intended to be relatively seamless.
As an example of the HPJava binding for MPI, we will consider a fragment
from a parallel N-body classical mechanics problem.
As the name suggests, this problem is concerned with
the dynamics of a set of N interacting bodies. The total force on each
body includes a contribution from all the other bodies in the system.
The size of this contribution depends on the position, $x$, of
the body experiencing the force, and the position, $y$, of the body
exerting it. If the individual contribution is given by
force($x$, $y$), the net force on body $i$ is
$$\sum_{j \neq i} \mathrm{force}(a_i, a_j)$$
where now $a_i$ is the position of the $i$th body.
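The net-force sum described above can be written directly as a double loop over bodies. The following is a minimal sequential sketch in plain Java, not HPJava code. The class name, the one-dimensional positions, and the inverse-square law inside `force` are assumptions made only for illustration, since the text leaves the pairwise force unspecified.

```java
final class DirectNBody {
    // Hypothetical pairwise law (an assumption): force on a body at x due to
    // a body at y, attractive, with magnitude 1/d^2 in one dimension.
    // Assumes the two positions are distinct.
    static double force(double x, double y) {
        double d = y - x;
        return d / (Math.abs(d) * d * d);
    }

    // Net force on every body: sum of contributions from all other bodies.
    static double[] netForces(double[] a) {
        int n = a.length;
        double[] f = new double[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (j != i) {
                    f[i] += force(a[i], a[j]);
                }
            }
        }
        return f;
    }
}
```

Because the assumed law is antisymmetric, the net forces over all bodies sum to zero up to rounding, which gives a simple sanity check for any parallel variant.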
A simplified pure
data-parallel version of the force computation in an N-body program is illustrated
in Figure 3.8. There are three distributed arrays in the
program: a, b and f. We repeatedly rotate a copy, b,
of the position vector, a, accumulating contributions to the force
as we go. The trouble is that this involves N small shifts.
Calling out to the communication library so many times (and copying a whole
array so many times) is likely to produce an inefficient program.
Figure 3.8: HPJava data parallel version of the N-body force computation.
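To make the rotate-and-accumulate pattern concrete, here is a minimal sequential sketch in plain Java that imitates it with ordinary arrays. It is not the HPJava code of the figure: the class name and the inverse-square `force` law are assumptions, and the cyclic shift is written out by hand rather than delegated to a communication library. The sketch performs N-1 shifts because it skips the self-interaction term.

```java
final class RotatingNBody {
    // Hypothetical pairwise law (an assumption): 1/d^2 attraction in 1-D.
    static double force(double x, double y) {
        double d = y - x;
        return d / (Math.abs(d) * d * d);
    }

    static double[] netForces(double[] a) {
        int n = a.length;
        double[] b = a.clone();        // rotating copy of the positions
        double[] tmp = new double[n];  // shift target
        double[] f = new double[n];
        for (int s = 1; s < n; s++) {
            // One small cyclic shift: element i receives its right neighbour.
            for (int i = 0; i < n; i++) {
                tmp[i] = b[(i + 1) % n];
            }
            // Whole-array copy back into b, repeated on every step.
            System.arraycopy(tmp, 0, b, 0, n);
            // After s shifts, b[i] holds the position of body (i + s) mod n.
            for (int i = 0; i < n; i++) {
                f[i] += force(a[i], b[i]);
            }
        }
        return f;
    }
}
```

Each pass moves every element by one place and copies the full array, which is the per-step overhead the text identifies as the likely source of inefficiency.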
One way to express the algorithm is in a direct SPMD message-passing
style. Example code is given in
Figure 3.9. In this HPJava/MPI version
of N-body, HPJava manages the process group arrangement and the initialization
of the distributed arrays. In place of the shift operation of
Figure 3.8, we use Sendrecv_replace(),
a point-to-point communication method from the mpiJava
binding of MPI.
The local variables a_block, b_block and f_block in the
program are not distributed arrays.
They are assigned by calls to the inquiry function
dat(), which returns a sequential Java array containing the locally
held elements of a distributed array.
This HPJava/MPI version transfers the N data items using P shifts of whole
blocks of size B, instead of the N small
shifts of the pure data-parallel version. This reduces communication between nodes.
The HPJava/MPI version also requires fewer copying operations (P rather
than N), where typically $P \ll N$.
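The block-rotation scheme can be illustrated without MPI by simulating the P processes inside a single Java program. The sketch below is an assumption-laden stand-in, not the code of Figure 3.9: the class name, the `force` law, and the replacement of Sendrecv_replace() by a rotation of array references are all illustrative choices. It assumes P exactly divides N.

```java
final class BlockRingNBody {
    // Hypothetical pairwise law (an assumption): 1/d^2 attraction in 1-D.
    static double force(double x, double y) {
        double d = y - x;
        return d / (Math.abs(d) * d * d);
    }

    static double[] netForces(double[] a, int p) {
        int n = a.length;
        if (p <= 0 || n % p != 0) {
            throw new IllegalArgumentException("P must exactly divide N");
        }
        int bsz = n / p;                       // block size B
        double[][] aBlock = new double[p][];   // fixed local positions
        double[][] bBlock = new double[p][];   // travelling copies
        double[][] fBlock = new double[p][bsz];
        for (int r = 0; r < p; r++) {
            aBlock[r] = java.util.Arrays.copyOfRange(a, r * bsz, (r + 1) * bsz);
            bBlock[r] = aBlock[r].clone();
        }
        for (int s = 0; s < p; s++) {
            // Local phase: each simulated rank combines its own block
            // with the block it currently holds.
            for (int r = 0; r < p; r++) {
                for (int i = 0; i < bsz; i++) {
                    for (int j = 0; j < bsz; j++) {
                        if (s == 0 && i == j) continue;  // skip self-interaction
                        fBlock[r][i] += force(aBlock[r][i], bBlock[r][j]);
                    }
                }
            }
            // Ring shift of whole blocks (stands in for the message exchange):
            // rank r takes over the block held by rank r+1, the last rank
            // takes the block held by rank 0.
            double[] first = bBlock[0];
            for (int r = 0; r < p - 1; r++) {
                bBlock[r] = bBlock[r + 1];
            }
            bBlock[p - 1] = first;
        }
        // Gather the per-rank results into one array for inspection.
        double[] f = new double[n];
        for (int r = 0; r < p; r++) {
            System.arraycopy(fBlock[r], 0, f, r * bsz, bsz);
        }
        return f;
    }
}
```

With N = 4 and P = 2, for example, the loop performs two block shifts, whereas an element-wise rotation would need a shift for every body.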
This example leaves some issues unresolved. In general, what is
the mapping from distributed-array elements to local-data-segment
elements?
The example assumes each processor holds identically sized blocks of data (P
exactly divides N). For a general distributed array or section, the
local segment may be some strided subset of the vector returned by dat().
The complete specification of HPJava addresses these
issues. There is also an issue about the mapping between HPJava process groups
and MPI groups. We need an MPI-like library that is better integrated with
HPJava constructs.
We envisage an API tentatively called OOMPH (Object-oriented Message
Passing for HPJava).
The details have not been worked out. OOMPH would be built on mpjdev, and would be fully
interoperable with HPJava Adlib.
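The mapping question raised above has a simple answer in the special case the example assumes, namely a block distribution in which P exactly divides N. The following plain Java sketch spells out that case and shows what a strided local segment would look like. All names here, including the `base` and `stride` parameters, are hypothetical and are not part of the HPJava or mpiJava APIs.

```java
final class BlockMapping {
    // Rank that owns global index g when every rank holds a block of size b.
    static int owner(int g, int b) {
        return g / b;
    }

    // Position of global index g inside its owner's local block.
    static int localIndex(int g, int b) {
        return g % b;
    }

    // Inverse mapping: global index of element `local` on rank `rank`.
    static int globalIndex(int rank, int local, int b) {
        return rank * b + local;
    }

    // For a section, element k might sit at a strided offset in the local
    // vector. `base` and `stride` are assumed to be supplied by the caller.
    static int stridedOffset(int base, int stride, int k) {
        return base + stride * k;
    }
}
```

For instance, with block size 4, global index 5 belongs to rank 1 at local position 1, and `globalIndex(1, 1, 4)` recovers 5.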
Figure 3.9: Version of the N-body force computation using reduction to Java array.
Bryan Carpenter
2004-06-09