
# An Example

We will consider a fragment from a parallel N-body classical mechanics problem. As the name suggests, this problem is concerned with the dynamics of a set of N interacting bodies. The total force on each body includes a contribution from all the other bodies in the system. The size of this contribution depends on the position, $x_i$, of the body experiencing the force, and the position, $x_j$, of the body exerting it. If the individual contribution is given by $f(x_i, x_j)$, the net force on body $i$ is

$$F_i = \sum_{j \neq i} f(x_i, x_j)$$

where now $x_j$ is the position of the $j$th body. The total force can be computed in parallel by the program given in Figure 7.1. We repeatedly rotate a copy, b, of the position vector, a, using cshift. Every element in b thus passes by every element in the fixed vector a, and contributions to the force are accumulated as we go. The approach is similar to the pipelined matrix multiplication in Figure 3.15.
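The rotate-and-accumulate pattern can be sketched in plain sequential Java, simulating the effect of the parallel cshift calls. This is an illustrative model, not the HPJava program of Figure 7.1: the pair force f here is an arbitrary placeholder, and a real N-body code would use the physical force law.

```java
// Sequential simulation of the cshift-based force accumulation:
// rotate a moving copy b of the positions one place at a time, so
// every element of b passes by every element of the fixed vector a.
public class ShiftNBody {

    // Placeholder pair force f(x_i, x_j); illustrative only.
    static double f(double xi, double xj) { return xj - xi; }

    static double[] forces(double[] a) {
        int n = a.length;
        double[] force = new double[n];
        double[] b = a.clone();              // moving copy of positions
        for (int s = 1; s < n; s++) {        // n - 1 unit rotations
            b = cshift(b);                   // rotate b by one place
            for (int i = 0; i < n; i++)
                force[i] += f(a[i], b[i]);   // accumulate contribution
        }
        return force;
    }

    // Circular shift by one place (what each cshift call achieves).
    static double[] cshift(double[] b) {
        int n = b.length;
        double[] r = new double[n];
        for (int i = 0; i < n; i++)
            r[i] = b[(i + 1) % n];
        return r;
    }
}
```

After the s-th rotation, b[i] holds the position of body (i + s) mod n, so over all n - 1 rotations each body accumulates exactly one contribution from every other body.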

The trouble is that this involves $N$ small shifts (Figure 7.2). Calling out to the communication library so many times (and copying a whole array so many times) is likely to produce an inefficient program.

In fact we can achieve an equivalent effect--passing the value of every element by every other--if we do just $P$ iterations of an outer loop, where $P$ is the number of processes, with each iteration shifting the whole block of locally held elements of the moving copy to the neighbouring process (Figure 7.3).
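The blocked variant can be simulated the same way. In this sketch (again plain Java, not the HPJava code; the pair force is the same placeholder as before) the N positions are held as P blocks of B elements, and the outer loop performs only P block shifts, with all pairing of local elements done between shifts.

```java
// Sequential simulation of the blocked algorithm: P whole-block shifts
// instead of N unit shifts. a[p][i] is element i of the block held by
// "process" p; b is the moving copy, rotated a block at a time.
public class BlockShiftNBody {

    // Placeholder pair force f(x_i, x_j); illustrative only.
    static double f(double xi, double xj) { return xj - xi; }

    static double[][] forces(double[][] a) {
        int P = a.length, B = a[0].length;
        double[][] force = new double[P][B];
        double[][] b = new double[P][];
        for (int p = 0; p < P; p++) b[p] = a[p].clone();
        for (int s = 0; s < P; s++) {          // only P outer iterations
            for (int p = 0; p < P; p++)
                for (int i = 0; i < B; i++)
                    for (int j = 0; j < B; j++) {
                        // skip the self-interaction on the unshifted pass
                        if (s == 0 && i == j) continue;
                        force[p][i] += f(a[p][i], b[p][j]);
                    }
            // shift whole blocks: "process" p receives the block from p+1
            double[][] nb = new double[P][];
            for (int p = 0; p < P; p++) nb[p] = b[(p + 1) % P];
            b = nb;
        }
        return force;
    }
}
```

On pass s, block p of the moving copy holds the positions originally owned by process (p + s) mod P, so every local element is paired with every element of the system while the communication step runs only P times.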

We can express the second approach straightforwardly enough in the language defined so far, but the price is that we have to change the way the distributed arrays are represented in the program.

One legitimate way to express the algorithm is in a direct SPMD message-passing style. Example code is given in Figure 7.4. The local block size is B, so the value of N is $B \times P$, where $P$ is the number of processes. For the sake of being concrete, we have used the methods Rank() and Sendrecv_replace() from the mpiJava binding of MPI (discussed briefly in section 2.7). Figure 7.4 is a valid SPMD Java program and therefore a valid HPJava program. You can find a runnable version of this code in the hpjdk release under hpjdk/examples/PPHPJ/EgMpiNBody.hpj. Unlike most examples in this report, this code will only run under the multi-process model (see section 2.7)--currently there is no multithreaded version of the mpiJava mpi package available.

So this works, but unfortunately it lives in a different universe of data structures and communication functions from the parallel-array, collective-communication oriented algorithms we have seen so far. We need to build some bridge between these two extreme styles of parallel programming.

One approach--perhaps not the most obvious, but quite natural in HPJava--is to explicitly split the index space of the original one-dimensional arrays across two dimensions: a distributed dimension representing the process dimension itself, and a sequential dimension explicitly representing the local block. The code is given in Figure 7.5. This code relies on the fact that the Dimension class representing a process dimension is actually a subclass of Range, so a process dimension can be used directly as a distributed range of an array.
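Schematically, and eliding the full force accumulation of Figure 7.5, the dimension-split declarations look something like the following HPJava fragment. This is a hedged sketch of the declaration style only: the constant P and the initialization code are assumed, and the treatment of the self-interaction term is omitted.

```
Procs1 p = new Procs1(P) ;
on(p) {
  Dimension d = p.dim(0) ;   // a Dimension is a Range, so it can serve
                             // as the distributed range of an array

  // N = P * B elements: a distributed dimension of extent P,
  // and a sequential dimension of extent B holding the local block
  float [[-,*]] a     = new float [[d, B]] ;
  float [[-,*]] b     = new float [[d, B]] ;
  float [[-,*]] force = new float [[d, B]] ;

  overall(i = d for :)
    for (int j = 0 ; j < B ; j++)
      force [i, j] = 0.0F ;  // explicit subscripts in both dimensions
}
```

The overall construct iterates over the process dimension, so each process handles the single distributed index it owns, while the inner loop runs over the explicit local block.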

Although there is only one element of d associated with each process, the rules of HPJava force us to explicitly subscript in the associated array dimension. This leads to some extra verbosity, but this style of programming has some attractive features:

• As a practical matter, the fact we are dealing with true HPJava distributed arrays means we can continue to employ concise calls to collective library functions like cshift(), instead of relatively clumsy functions like Sendrecv_replace().
• More esoterically, the program follows the canonical HPspmd style, described briefly in section 5.4. As defined in that section, all variables are coherent. The elements of the arrays in the program of Figure 7.4 are not coherent, because they take different values in each process, although their home group is the set of all processes. Respecting the canonical style may not have immediate practical advantages, but it is somehow aesthetically pleasing. In this style there is no need for incoherent functions like Rank() to get the local process id--instead one uses global index values associated with overall constructs.

Bryan Carpenter 2003-04-15