

Task-Parallelism in HPJava

Some parts of a large parallel program cannot be written efficiently in the pure data-parallel style, in which overall constructs process all elements of distributed arrays homogeneously. Sometimes, for efficiency, a process must execute a procedure that combines just its locally held array elements in a non-trivial way. The HPJava environment is designed to facilitate direct access to SPMD library interfaces, and it provides constructs for both data-parallel and task-parallel programming. Different processors can either work simultaneously on data in globally subscripted arrays, or independently execute more complex procedures on their own local data. The transition between these phases is intended to be relatively seamless.

As an example of the HPJava binding for MPI, we consider a fragment from a parallel N-body classical mechanics problem. As the name suggests, this problem is concerned with the dynamics of a set of N interacting bodies. The total force on each body includes a contribution from every other body in the system. The size of this contribution depends on the position, $x$, of the body experiencing the force, and the position, $y$, of the body exerting it. If the individual contribution is given by force($x, y$), the net force on body $i$ is

\[ \sum_{j} \mathrm{force}(a_i, a_j) \]

where $a_j$ is the position of the $j$th body. A simplified pure data-parallel version of the force computation in an N-body program is illustrated in Figure 3.8. The program uses three distributed arrays, a, b and f. We repeatedly rotate a copy, b, of the position vector, a, accumulating contributions to the force as we go. The trouble is that this involves N small shifts. Calling out to the communication library so many times (and copying a whole array so many times) is likely to produce an inefficient program.
Figure 3.8: HPJava data parallel version of the N-body force computation.
\begin{figure}
\small
\begin{verbatim}
Procs1 p = new Procs1(P) ;
on(p) ...
...0) ;
HPspmd.copy(b, tmp) ;
}
}\end{verbatim}
\normalsize
\end{figure}
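The rotation scheme described above can be mimicked on a single machine. The following is a minimal sketch in plain sequential Java, not HPJava: ordinary arrays stand in for the distributed arrays a, b and f, and a loop stands in for the cyclic shift. The class name RotatingForce, the methods direct and byRotation, and the linear-spring choice for force are all assumptions made for illustration; none of them comes from the original program.

```java
import java.util.Arrays;

class RotatingForce {
    // Hypothetical pair interaction (an assumption): a linear spring,
    // chosen so that force(x, x) == 0 and the self term is harmless.
    static double force(double x, double y) {
        return y - x;
    }

    // Reference result: direct double loop over all pairs (i, j).
    static double[] direct(double[] a) {
        int n = a.length;
        double[] f = new double[n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                f[i] += force(a[i], a[j]);
        return f;
    }

    // Rotation scheme: N accumulate-then-shift steps on a copy b of a.
    static double[] byRotation(double[] a) {
        int n = a.length;
        double[] f = new double[n];
        double[] b = a.clone();
        double[] tmp = new double[n];
        for (int s = 0; s < n; s++) {
            // Element i interacts with whatever currently sits opposite it.
            for (int i = 0; i < n; i++)
                f[i] += force(a[i], b[i]);
            // Cyclic shift by one place into tmp, then copy the whole array back.
            for (int i = 0; i < n; i++)
                tmp[i] = b[(i + 1) % n];
            System.arraycopy(tmp, 0, b, 0, n);
        }
        return f;
    }

    public static void main(String[] args) {
        double[] a = {0.0, 1.0, 3.0, 6.0};
        // The two lines printed should show the same force vector.
        System.out.println(Arrays.toString(direct(a)));
        System.out.println(Arrays.toString(byRotation(a)));
    }
}
```

Each of the N iterations performs one single-element shift and one whole-array copy, which is the cost the text identifies as the source of inefficiency.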
One way to express the algorithm is in a direct SPMD message-passing style. Example code is given in Figure 3.9. In this HPJava/MPI version of N-body, HPJava manages process group arrangements and initialization of the distributed arrays. In place of the shift operation in Figure 3.8 we use Sendrecv_replace(), a point-to-point communication routine from the mpiJava binding of MPI. The local variables a_block, b_block and f_block are not distributed arrays: each is assigned from the inquiry method dat(), which returns a sequential Java array containing the locally held elements of a distributed array. To move the N data items, this version performs P shifts of whole blocks of size B, instead of the N small shifts of the pure data-parallel version. This reduces communication between nodes. It also requires fewer copying operations (P rather than N), where typically $P \ll N$.

This example leaves some issues unresolved. In general, what is the mapping from distributed-array elements to local-data-segment elements? The example assumes that each processor holds an identically sized block of data (P exactly divides N). For a general distributed array or section, the local segment may be some strided subset of the vector returned by dat(). The complete specification of HPJava addresses these issues.

There is also an issue about the mapping between HPJava process groups and MPI groups. We need an MPI-like library that is better integrated with HPJava constructs. We envisage an API tentatively called OOMPH (Object-oriented Message Passing for HPJava). The details have not been worked out. OOMPH would be built on mpjdev, and would be fully interoperable with HPJava Adlib.
Figure 3.9: Version of the N-body force computation using reduction to Java array.
\begin{figure}
\small
\begin{verbatim}
Procs1 p = new Procs1(P) ;
on(p) ...
... right, 99, left, 99) ;
}
}\end{verbatim}
\normalsize
\end{figure}
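The block-rotation idea behind Figure 3.9 can also be checked without MPI. The sketch below simulates P processes inside one sequential Java program: each simulated rank owns one block of a, b and f, and a reference rotation stands in for the Sendrecv_replace() exchange. The class BlockRotation, the method byBlocks, the rotation direction and the linear-spring force are assumptions made for illustration; no mpiJava or HPJava calls are used.

```java
import java.util.Arrays;

class BlockRotation {
    // Hypothetical pair interaction (an assumption): a linear spring.
    static double force(double x, double y) {
        return y - x;
    }

    // Simulates p ranks, each holding a block of n/p positions.
    static double[] byBlocks(double[] a, int p) {
        int n = a.length;
        if (n % p != 0)
            throw new IllegalArgumentException("P must divide N exactly");
        int blk = n / p;
        double[][] aBlock = new double[p][];
        double[][] bBlock = new double[p][];
        double[][] fBlock = new double[p][blk];
        for (int r = 0; r < p; r++) {
            aBlock[r] = Arrays.copyOfRange(a, r * blk, (r + 1) * blk);
            bBlock[r] = aBlock[r].clone();
        }
        for (int s = 0; s < p; s++) {
            // Purely local work: all pairs between the rank's own block
            // and the visiting block.
            for (int r = 0; r < p; r++)
                for (int i = 0; i < blk; i++)
                    for (int j = 0; j < blk; j++)
                        fBlock[r][i] += force(aBlock[r][i], bBlock[r][j]);
            // Whole-block exchange: rank r takes over the block of rank r + 1
            // (a stand-in for the send/receive-replace step).
            double[] first = bBlock[0];
            for (int r = 0; r < p - 1; r++)
                bBlock[r] = bBlock[r + 1];
            bBlock[p - 1] = first;
        }
        // Gather the per-rank results into one vector for inspection.
        double[] f = new double[n];
        for (int r = 0; r < p; r++)
            System.arraycopy(fBlock[r], 0, f, r * blk, blk);
        return f;
    }

    public static void main(String[] args) {
        double[] a = {0.0, 1.0, 3.0, 6.0, 10.0, 15.0};
        System.out.println(Arrays.toString(byBlocks(a, 3)));
    }
}
```

Only P exchanges take place, each moving a block of B = N/P elements, while the B*B pair interactions per step stay local to a rank. The divisibility check makes explicit the assumption that P exactly divides N.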

Bryan Carpenter 2004-06-09