We will consider a fragment from a parallel N-body classical
mechanics problem. As the name suggests, this problem is concerned with
the dynamics of a set of N interacting bodies. The total force on each
body includes a contribution from all the other bodies in the system.
The size of this contribution depends on the position, $x_i$, of
the body experiencing the force, and the position, $x_j$, of the body
exerting it. If the individual contribution is given by
$f(x_i, x_j)$, the net force on body $i$ is
$$F_i = \sum_{j \neq i} f(x_i, x_j).$$
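As a concrete illustration, the all-pairs sum can be written directly in plain Java (a sketch, not the report's HPJava code; the class and method names are our own, and the pairwise law $f$ is taken to be a simple inverse-square attraction in one dimension purely for illustration, since the text does not fix a particular force law):

```java
// Sketch of the all-pairs N-body force sum in one dimension.
// The pairwise term f(x_i, x_j) used here is an illustrative
// inverse-square attraction toward body j; any pairwise law fits
// the same double loop.
public class AllPairsForce {
    static double[] netForces(double[] x) {
        int n = x.length;
        double[] force = new double[n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i != j) {
                    double r = x[j] - x[i];
                    // contribution f(x_i, x_j): pull body i toward body j
                    force[i] += Math.signum(r) / (r * r);
                }
            }
        }
        return force;
    }

    public static void main(String[] args) {
        double[] f = netForces(new double[] {0.0, 1.0, 3.0});
        for (double v : f) System.out.println(v);
    }
}
```

Note the cost is $O(N^2)$ pairwise evaluations, which is exactly why the distribution of the inner loop over processes matters in what follows.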
The trouble is that this involves many small shifts (Figure 7.2). Calling out to the communication library so many times (and copying a whole array so many times) is likely to produce an inefficient program.
In fact we can achieve an equivalent effect--passing the value of every element past every other--if we do just $P$ iterations of an outer loop, where $P$ is the number of processes, with each iteration shifting the whole block of locally held elements of the moving copy to the neighbouring process (Figure 7.3).
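The block-shift scheme can be sketched in plain Java by simulating the processes in a single address space (hypothetical helper names, not the report's code): each of the $P$ outer iterations rotates the moving copy by one whole block of $B$ elements, so after $P$ iterations every block has visited every process.

```java
// Single-address-space simulation of the block-shift scheme:
// P "processes" each hold a block of B elements of a moving copy.
// One shift rotates the whole copy by B positions, mimicking each
// process passing its block to a neighbour.
public class BlockShift {
    static int[] shiftBlock(int[] moving, int b) {
        int n = moving.length;
        int[] out = new int[n];
        for (int i = 0; i < n; i++) out[i] = moving[(i + b) % n];
        return out;
    }

    public static void main(String[] args) {
        int p = 4, b = 3, n = p * b;   // P processes, block size B, N = B * P
        int[] moving = new int[n];
        for (int i = 0; i < n; i++) moving[i] = i;
        // Track which elements "process 0" (slots 0..B-1) has seen.
        boolean[] seen = new boolean[n];
        for (int iter = 0; iter < p; iter++) {
            for (int i = 0; i < b; i++) seen[moving[i]] = true;
            moving = shiftBlock(moving, b);
        }
        // After P iterations, process 0 has seen every element once.
        for (boolean s : seen) System.out.println(s);
    }
}
```

The point of the simulation is the counting argument: $P$ shifts of size $B$ replace $N = B \times P$ shifts of size one, so each process makes only $P$ calls to the communication library, each moving a whole block.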
We can express the second approach straightforwardly enough in the language defined so far, but the price is that we have to change the way the distributed arrays are represented in the program.
One legitimate way to express the algorithm is in a direct SPMD message-passing style. Example code is given in Figure 7.4. The local block size is B, so the value of N is $B \times P$, where $P$ is the number of processes. To be concrete, we have used the methods Rank() and Sendrecv_replace() from the mpiJava binding of MPI (discussed briefly in section 2.7). Figure 7.4 is a valid SPMD Java program and therefore a valid HPJava program. You can find a runnable version of this code in the hpjdk release under hpjdk/examples/PPHPJ/EgMpiNBody.hpj. Unlike most examples in this report, this code will only run under the multi-process model (see section 2.7)--currently there is no multithreaded version of the mpiJava mpi package available.
So this works, but unfortunately it lives in a different universe of data structures and communication functions from the parallel-array, collective-communication-oriented algorithms we have seen so far. We need to build some bridge between these two extreme styles of parallel programming.
One approach--perhaps not the most obvious, but quite natural in HPJava--is to explicitly split the index space of the original one-dimensional arrays across two dimensions: a distributed dimension representing the process dimension itself, and a sequential dimension explicitly representing the local block. The code is given in Figure 7.5. This code relies on the fact that the Dimension class representing a process dimension is actually a subclass of Range, so a process dimension can be used directly as a distributed range of an array.
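The index-space split described above amounts to a simple arithmetic correspondence between a global index and a (process, local) coordinate pair. The following plain-Java sketch (our own helper names, not part of HPJava) makes that correspondence explicit for a block decomposition with local block size B:

```java
// Mapping between a one-dimensional global index space of size
// N = B * P and the two-dimensional view used in the text: a
// distributed coordinate (which process holds the element) and a
// sequential coordinate (its offset within the local block).
public class BlockMap {
    static int process(int global, int b)        { return global / b; }
    static int local(int global, int b)          { return global % b; }
    static int global(int proc, int local, int b) { return proc * b + local; }

    public static void main(String[] args) {
        int b = 4;   // local block size B (illustrative value)
        int g = 10;
        // Element 10 lives on process 2, at local offset 2.
        System.out.println(process(g, b) + " " + local(g, b));
    }
}
```

In HPJava proper this mapping is not written by hand: using the process Dimension directly as a distributed Range makes the compiler maintain the distributed coordinate, while the sequential dimension indexes the local block.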
Although there is only one element of d associated with each process, the rules of HPJava force us to explicitly subscript in the associated array dimension. This leads to some extra verbosity, but this style of programming has some attractive features: