Other Distribution Formats

HPJava follows HPF in allowing several different distribution formats
for the dimensions of its distributed arrays. The new formats are
provided without further extension to the syntax of the language.
Instead the `Range` class hierarchy is extended. The full hierarchy
is shown in Figure 3.1.

The `BlockRange` subclass should be quite familiar by now. The
`Dimension` class is also familiar, although previously it wasn't
presented as a range class--later examples will demonstrate how
it can be convenient to use process dimensions directly as array ranges.
`CyclicRange` is directly analogous to the cyclic distribution
format available in HPF.

Cyclic distributions are sometimes favored because they can lead to
better *load balancing* than the simple block distributions
introduced so far. Some algorithms (for example dense matrix
algorithms) don't have the kind of locality that favours block
distribution for stencil updates, but they do involve phases where
parallel computations are limited to subsections of the whole
array. In block distributions these sections may map to
only a fraction of the available processes, leaving the remaining
processes idle. Here is a contrived example

The point here is that the

In *cyclic* distribution format, the
index space is mapped to the process dimension in wraparound
fashion. If we change our example to

Figure 3.3 shows that the imbalance is not nearly as extreme. Notice that nothing changed in the program apart from the choice of range constructors. This is an attractive feature that HPJava shares with HPF. If HPJava programs are written in a sufficiently pure data parallel style (using

As a more graphic example of how cyclic distribution can improve load balancing, consider the Mandelbrot set code of Figure 3.4.

The Mandelbrot set is defined in terms of points in a plane (actually
the plane of complex numbers). Mathematically, the set is a subset of
the points in the plane--those points for which a certain iteration
does *not* converge. Figure 3.4 runs this iteration at each
point of a grid embedded in the plane. If the iteration of the while loop
terminates normally, the current point is *not* in the set. If the
while loop would carry on indefinitely, the current point *is* in the set.
In practice one approximates the set as being those points for which the
while loop iterates a sufficiently large number of times--`CUTOFF`
should be some moderately large integer.

We represent the set as a distributed array of `char`. On completion
points outside the set have value `' '`

and point inside the
set have value `'#'`

. This representation is chosen just because
it makes a recognizable display when the array is printed out with
`Adlib.printf()`.

Points away from the set are generally eliminated in
a few iterations, whereas those inside the set or close to it take much
more computation. In the case `N` = 64 (with `CUTOFF` = 100), the block decomposition of the set is shown in Figure
3.5. The middle column of processors will do
most of the work, because they hold most of the set. Changing the
class of the ranges to `CyclicRange` gives the much more even
distribution shown in Figure 3.6.

We are not suggesting, by the way, that Mandelbrot computation is a serious application of HPJava. It is just a meant as a simple illustration of an issue about distribution format and load balancing.

The `ExtBlockRange` subclass represents block-distributed ranges
extended with *ghost regions*.