next up previous contents index
Next: Ghost Regions Up: More on Mapping Arrays Previous: More on Mapping Arrays   Contents   Index


Other Distribution Formats

HPJava follows HPF in allowing several different distribution formats for the dimensions of its distributed arrays. The new formats are provided without further extension to the syntax of the language. Instead the Range class hierarchy is extended. The full hierarchy is shown in Figure 3.1.

Figure 3.1: The Range hierachy of HPJava.
\includegraphics{ranges.eps}

The BlockRange subclass should be quite familiar by now. The Dimension class is also familiar, although previously it wasn't presented as a range class--later examples will demonstrate how it can be convenient to use process dimensions directly as array ranges. CyclicRange is directly analogous to the cyclic distribution format available in HPF.

Cyclic distributions are sometimes favored because they can lead to better load balancing than the simple block distributions introduced so far. Some algorithms (for example dense matrix algorithms) don't have the kind of locality that favours block distribution for stencil updates, but they do involve phases where parallel computations are limited to subsections of the whole array. In block distributions these sections may map to only a fraction of the available processes, leaving the remaining processes idle. Here is a contrived example

\begin{displaymath}
\begin{minipage}[t]{\linewidth}\small\begin{verbatim}Proc...
... = complicatedFunction(i\lq , j\lq ) ;
}\end{verbatim}\end{minipage}\end{displaymath}

The point here is that the overall constructs only traverse half the ranges of the array they process. As shown in Figure 3.2, this leads to a very poor distribution of workload. The process with coordinates $(0,0)$ does nearly all the work. The process at $(0,1)$ has a few elements to work on, and all the other processes are idle.

Figure 3.2: Work distribution for an example with block distribution.
\includegraphics{blockBalance.eps}

In cyclic distribution format, the index space is mapped to the process dimension in wraparound fashion. If we change our example to

\begin{displaymath}
\begin{minipage}[t]{\linewidth}\small\begin{verbatim}Proc...
... = complicatedFunction(i\lq , j\lq ) ;
}\end{verbatim}\end{minipage}\end{displaymath}

Figure 3.3 shows that the imbalance is not nearly as extreme. Notice that nothing changed in the program apart from the choice of range constructors. This is an attractive feature that HPJava shares with HPF. If HPJava programs are written in a sufficiently pure data parallel style (using overall loops and collective array operations) it is often possible to change the distribution format of arrays dramatically while leaving much of the code that processes them unchanged. HPJava does not guarantee this property in the same way that HPF does, and fully optimized HPJava programs are unlikely to be so easily redistributed. But it is still a useful feature.

Figure 3.3: Work distribution for an example with cyclic distribution.
\includegraphics{cyclicBalance.eps}

As a more graphic example of how cyclic distribution can improve load balancing, consider the Mandelbrot set code of Figure 3.4.

The Mandelbrot set is defined in terms of points in a plane (actually the plane of complex numbers). Mathematically, the set is a subset of the points in the plane--those points for which a certain iteration does not converge. Figure 3.4 runs this iteration at each point of a grid embedded in the plane. If the iteration of the while loop terminates normally, the current point is not in the set. If the while loop would carry on indefinitely, the current point is in the set. In practice one approximates the set as being those points for which the while loop iterates a sufficiently large number of times--CUTOFF should be some moderately large integer.

We represent the set as a distributed array of char. On completion points outside the set have value ' ' and point inside the set have value '#'. This representation is chosen just because it makes a recognizable display when the array is printed out with Adlib.printf().

Points away from the set are generally eliminated in a few iterations, whereas those inside the set or close to it take much more computation. In the case N = 64 (with CUTOFF = 100), the block decomposition of the set is shown in Figure 3.5. The middle column of processors will do most of the work, because they hold most of the set. Changing the class of the ranges to CyclicRange gives the much more even distribution shown in Figure 3.6.

We are not suggesting, by the way, that Mandelbrot computation is a serious application of HPJava. It is just a meant as a simple illustration of an issue about distribution format and load balancing.

Figure 3.4: Mandelbrot set computation.

\begin{displaymath}
\begin{minipage}[t]{\linewidth}\small\begin{verbatim}Proc...
...ntf(''%c %'' + N + ''N'', set) ;
}\end{verbatim}\end{minipage}\end{displaymath}

Figure 3.5: Blockwise decomposition of the Mandelbrot set (black region).
\includegraphics[width=3in]{blockMandelbrot.eps}

Figure 3.6: Cyclic decomposition of the Mandelbrot set.
\includegraphics[width=3in]{cyclicMandelbrot.eps}

The ExtBlockRange subclass represents block-distributed ranges extended with ghost regions.


next up previous contents index
Next: Ghost Regions Up: More on Mapping Arrays Previous: More on Mapping Arrays   Contents   Index
Bryan Carpenter 2003-04-15