Other Distribution Formats

HPJava follows HPF in allowing several different distribution formats
for the dimensions of its distributed arrays. The new formats are
provided without further extension to the syntax of the language.
Instead the `Range` class hierarchy is extended. The full hierarchy
is shown in Figure 3.1.

The `BlockRange` subclass is very familiar by now. The
`Dimension` class is also familiar, although previously it wasn't
presented as a range class. Later examples will demonstrate how
it can be convenient to use process dimension objects as array ranges.
`CyclicRange` and `BlockCyclicRange` are directly analogous
with the cyclic and block-cyclic distribution formats available in
HPF.

Cyclic distributions are sometimes favoured because they can lead to
better *load balancing* than the simple block distributions
introduced so far. Some algorithms (for example dense matrix
algorithms) don't have the kind of locality that favours block
distribution for stencil updates, but they do involve phases where
parallel computations are limited to subsections of the whole
array. In block distributions these sections may map to
only a fraction of the available processes, leaving the remaining
processes idle. Here is a contrived example

Procs2 p = new Procs2(2, 3) ; on(p) { Range x = new BlockRange(N, p.dim(0)) ; Range y = new BlockRange(N, p.dim(1)) ; float [[,]] a = new float [[x, y]] ; overall(i = x for 0 : N / 2 - 1) overall(j = y for 0 : N / 2 - 1) a [i, j] = complicatedFunction(i`, j`) ; }As shown in Figure 3.2, this leads to a very poor distribution of workload. The process with coordinates does nearly all the work. The process at has a few elements to work on, and all the other processes are idle.

In *cyclic* distribution format, the
index space is mapped to the process dimension in wraparound
fashion. If we change our example to

Procs2 p = new Procs2(2, 3) ; on(p) { Range x = new CyclicRange(N, p.dim(0)) ; Range y = new CyclicRange(N, p.dim(1)) ; float [[,]] a = new float [[x, y]] ; overall(i = x for 0 : N / 2 - 1) overall(j = y for 0 : N / 2 - 1) a [i, j] = complicatedFunction(i`, j`) ; }Figure 3.3 shows that the imbalance is much less extreme. Notice that nothing changed in the program apart from the choice of range constructors. This is an attractive feature that HPJava shares with HPF. If HPJava programs are written in a sufficiently pure data parallel style (using overall loops and collective array operations) it is often possible to change the distribution format of arrays dramatically but leave much of the code that processes them invariant. HPJava does not guarantee this property in the same way that HPF does, and fully optimized HPJava programs are unlikely to be so easily redistributed. But it is still a useful feature.

As a more graphic example of how cyclic distribution
can improve load balancing, consider the Mandelbrot set code of Figure
3.4. Points away from the set are generally eliminated in
a few iterations, whereas those close to the set or inside it take much
more computation--points are assumed to be inside the set, and the
corresponding element of the array is set to 1, when the number of
iterations reaches `CUTOFF`. In the case `N` = 64 (with `CUTOFF` = 100), the block decomposition of the set is shown in Figure
3.5. The middle column of processors will do
most of the work, because they hold most of the set. Changing the
class of the ranges to `CyclicRange` gives the much more even
distribution shown in Figure 3.6.

Block cyclic distribution format is a generalization of cyclic
distribution that is used in some libraries for parallel
linear algebra^{3.1}. It will not be discussed further here.

The `ExtBlockRange` subclass represents block-distributed ranges
extended with *ghost regions*.