
An Example

We will consider a fragment from a parallel N-body classical mechanics problem. As the name suggests, this problem is concerned with the dynamics of a set of N interacting bodies. The total force on each body includes a contribution from all the other bodies in the system. The size of this contribution depends on the position, $x$, of the body experiencing the force, and the position, $y$, of the body exerting it. If the individual contribution is given by $\mbox{\tt force(}x, y\mbox{\tt )}$, the net force on body $i$ is

\begin{displaymath}
\sum_{j} \mbox{\tt force(}a_i, a_j\mbox{\tt )}
\end{displaymath}

where now $a_j$ is the position of the $j$th body. The total force can be computed in parallel by the program given in Figure 7.1. We repeatedly rotate a copy, b, of the position vector, a, using cshift. Every element in b thus passes by every element in the fixed vector a, and contributions to the force are accumulated as we go. The approach is similar to the pipelined matrix multiplication in Figure 3.15.

Figure 7.1: Data parallel version of the N-body force computation.

[Code listing not reproduced in this extraction. Each iteration of the loop ends with a cshift of the moving copy (shift argument -1), followed by HPutil.copy(b, tmp).]
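Since the listing itself is not reproduced above, the following minimal sketch in plain sequential Java (not HPJava) may help to fix the rotate-and-accumulate idea. The force() method is a hypothetical stand-in for the real force law, chosen so that force(x, x) is zero and the self-pairing in the first pass is harmless; the array size and initial positions are arbitrary.

\begin{verbatim}
// Sequential Java sketch of the rotate-and-accumulate pattern (not HPJava).
// force() is a hypothetical stand-in for the force law.
public class RotateNBody {

    static float force(float x, float y) {
        return (x == y) ? 0.0f : 1.0f / (y - x) ;   // stand-in only
    }

    public static void main(String[] args) {
        final int N = 8 ;
        float[] a = new float[N] ;                  // positions
        for (int i = 0 ; i < N ; i++) a[i] = i ;
        float[] f = new float[N] ;                  // accumulated forces
        float[] b = a.clone() ;                     // moving copy of a

        for (int s = 0 ; s < N ; s++) {
            for (int i = 0 ; i < N ; i++)
                f[i] += force(a[i], b[i]) ;         // combine fixed a with moving b

            // Rotate b one element to the right (the role played by cshift
            // in Figure 7.1).
            float last = b[N - 1] ;
            System.arraycopy(b, 0, b, 1, N - 1) ;
            b[0] = last ;
        }

        for (int i = 0 ; i < N ; i++)
            System.out.println("f[" + i + "] = " + f[i]) ;
    }
}
\end{verbatim}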

The trouble is that this involves $N$ small shifts (Figure 7.2). Calling out to the communication library so many times (and copying a whole array so many times) is likely to produce an inefficient program.

Figure 7.2: Simple ``data parallel'' N-body force computation. The array b is shifted one element to the right in each iteration. Arrows define element pairs combined in the iteration.
\includegraphics[width=5in]{dataParNBody.eps}

In fact we can achieve an equivalent effect--passing the value of every element by every other--if we do just $P$ iterations of an outer loop, with each iteration shifting the whole block of locally held elements of the moving copy to the neighbouring process (Figure 7.3).

Figure 7.3: Efficient N-body force computation. The array b is shifted one block to the right in each iteration. Arrows define element pairs combined in the iteration.
\includegraphics[width=5in]{taskParNBody.eps}
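The blocked scheme can also be spelled out as a small sequential Java sketch (again not HPJava, and not the code of any of the figures). Here force() is the same hypothetical stand-in used in the earlier sketch, and the values of P and B are arbitrary, with N assumed to be P * B.

\begin{verbatim}
// Sequential Java sketch of the blocked rotation of Figure 7.3.  Each
// element of a is combined with the current local block of the moving
// copy b, then b is rotated by a whole block of B elements, so only P
// rotations are needed instead of N.
public class BlockedNBody {

    static float force(float x, float y) {
        return (x == y) ? 0.0f : 1.0f / (y - x) ;   // stand-in only
    }

    public static void main(String[] args) {
        final int P = 4, B = 2, N = P * B ;
        float[] a = new float[N], f = new float[N] ;
        for (int i = 0 ; i < N ; i++) a[i] = i ;
        float[] b = a.clone() ;                     // moving copy of a

        for (int s = 0 ; s < P ; s++) {
            for (int i = 0 ; i < N ; i++) {
                int start = (i / B) * B ;           // start of i's block
                for (int k = 0 ; k < B ; k++)
                    f[i] += force(a[i], b[start + k]) ;
            }
            // Rotate b one whole block to the right (a single communication
            // per process in the SPMD version).
            float[] tmp = new float[N] ;
            System.arraycopy(b, 0, tmp, B, N - B) ;
            System.arraycopy(b, N - B, tmp, 0, B) ;
            b = tmp ;
        }

        for (int i = 0 ; i < N ; i++)
            System.out.println("f[" + i + "] = " + f[i]) ;
    }
}
\end{verbatim}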

We can express the second approach straightforwardly enough in the language defined so far, but the price is that we have to change the way the distributed arrays are represented in the program.

One legitimate way to express the algorithm is in a direct SPMD message-passing style. Example code is given in Figure 7.4. The local block size is B, so the value of N is $\mbox{\tt P} \times \mbox{\tt B}$. To be concrete, we have used the methods Rank() and Sendrecv_replace() from the mpiJava binding of MPI (discussed briefly in section 2.7). Figure 7.4 is a valid SPMD Java program and therefore a valid HPJava program. A runnable version of this code can be found in the hpjdk release under hpjdk/examples/PPHPJ/EgMpiNBody.hpj. Unlike most examples in this report, this code will only run under the multi-process model (see section 2.7), because currently there is no multithreaded version of the mpiJava mpi package.

So this works, but unfortunately it lives in a different universe of data structures and communication functions from the parallel-array, collective-communication oriented algorithms we have seen so far. We need to build some bridge between these two extreme styles of parallel programming.

Figure 7.4: MPI version of the N-body force computation.

[Code listing not reproduced in this extraction. Each iteration ends with a call of the form Sendrecv_replace(..., MPI.FLOAT, right, 0, left, 0), exchanging the local block of the moving copy with the neighbouring processes.]
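Because Figure 7.4 is not reproduced in full here, the following is a rough mpiJava sketch of what such a program might look like. It is an illustrative reconstruction rather than the listing from the hpjdk release: the block size B, the variable names, the initialisation, and the exact argument order assumed for Sendrecv_replace are all assumptions, and force() is the same hypothetical stand-in used in the sequential sketches.

\begin{verbatim}
// Hedged mpiJava sketch of the block-rotation force computation (an
// illustrative reconstruction, not the code of Figure 7.4).  Each process
// holds a block of B positions; in every iteration it combines its fixed
// block a with the visiting block b, then passes b on to its right
// neighbour with Sendrecv_replace.
import mpi.* ;

public class MpiNBodySketch {

    static float force(float x, float y) {
        return (x == y) ? 0.0f : 1.0f / (y - x) ;   // stand-in only
    }

    public static void main(String[] args) throws MPIException {
        MPI.Init(args) ;

        final int B = 4 ;                           // local block size (assumed)
        int P     = MPI.COMM_WORLD.Size() ;
        int me    = MPI.COMM_WORLD.Rank() ;
        int left  = (me - 1 + P) % P ;
        int right = (me + 1) % P ;

        float[] a = new float[B], b = new float[B], f = new float[B] ;
        for (int j = 0 ; j < B ; j++) {
            a[j] = me * B + j ;                     // arbitrary initial positions
            b[j] = a[j] ;
        }

        for (int s = 0 ; s < P ; s++) {
            for (int j = 0 ; j < B ; j++)
                for (int k = 0 ; k < B ; k++)
                    f[j] += force(a[j], b[k]) ;

            // Pass the visiting block to the right neighbour and receive a
            // new one from the left (assumed argument order: buf, offset,
            // count, type, dest, sendtag, source, recvtag).
            MPI.COMM_WORLD.Sendrecv_replace(b, 0, B, MPI.FLOAT,
                                            right, 0, left, 0) ;
        }

        MPI.Finalize() ;
    }
}
\end{verbatim}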

One approach--perhaps not the most obvious, but quite natural in HPJava--is to explicitly split the index space of the original one-dimensional arrays across two dimensions: a distributed dimension representing the process dimension itself, and a sequential dimension explicitly representing the local block. The code is given in Figure 7.5. This code relies on the fact that the Dimension class representing a process dimension is actually a subclass of Range, so a process dimension can be used directly as a distributed range of an array.

Figure 7.5: Efficient HPJava version of the N-body force computation.

[Code listing not reproduced in this extraction. As in Figure 7.1, each outer iteration ends with a shift of the moving copy, here along the process dimension, followed by HPutil.copy(b, tmp).]

Although each process holds only one element along the dimension distributed over d, the rules of HPJava still force us to subscript explicitly in the associated array dimension. This leads to some extra verbosity, but this style of programming has some attractive features:

