next up previous contents
Next: ``Array syntax'' in Java. Up: Data parallellism in Java Previous: Data parallellism in Java   Contents


Parallel arrays and collective communication

At the run-time level, a class library implementation of the HPF model is likely to include

Our first experiments with a Java binding only touch the surface of the full HPF semantics, but they provide some hints about a general framework. The interface given here borrows from the C++ class library, Adlib, developed by one of us [6].

A distributed array is parametrized by a member of the Array class. In C++ Array would naturally be a template for a container class. In Java, generic container classes are problematic. Without the template mechanism, the obvious options are that a container holds items of type Object, the base class for all non-primitive types, or that a separate container class is provided for each allowed type of element. The first option doesn't allow for array elements of primitive type, and prevents compile-time type-checking (reminiscent of using void* in C). The second approach presumably involves restricting elements to the finite set of primitive types (int, float, ...)6. For now we have side-stepped the issue by leaving the data elements out of the Array class. Array defines the shape and distribution of an array, but space for elements is allocated in a separately declared vector of the appropriate type7.

The constructor for an Array defines its shape and distribution format. This is expressed through two auxilliary classes: the Procs and Range classes. The Procs class corresponds directly to the HPF processor arrangement. It maps the set of physical processes on which the program is executing to a multi-dimensional grid. A Range describes a single dimension of an HPF array. It embodies an array extent (the size of the array in the dimension concerned), and a mapping of the subscript range to a dimension of a Procs grid.

In our pilot implementation any parallel Java application is written as a class extending the library class Node. The Node class maintains some global information and provides various collective operations on arrays as member functions. The code for the ``main program'' goes in the run member of the application class8.

A simplified version of the code for the ``Life'' demo is given in figure 6.

Figure 6: Simplified code of the Life demo program.
\begin{figure}\footnotesize\begin{verbatim}public class Life extends Node impl...
...] = 1; break;
default: w[i] = 0; break;
}
}
}
}
}\end{verbatim}\end{figure}
The object p represents a 2 by 2 process grid. The Procs constructor takes the current Node object as an argument, from which it obtains information on the available physical processes. In this simplified example we assume that the program executes on exactly four processors.

The objects x and y represent index ranges of size N distributed over the first and second dimensions of the grid p. The default distribution format is blockwise. Cyclic distribution format can also be specified by using a range object of class CRange, which is derived from Range (the pilot implementation does not provide any more general distribution or alignment options).

The object r represents the shape and distribution of a two dimensional array. This ``template'' is be shared by several distributed arrays--it does not contain a data vector. The data vectors that hold the local segments of arrays are created separately using the inquiry function seg, which returns the number of locally held elements. In the example the elements of the main data array are held in w. The extra arrays cn_, cp_, ..., cnn, ... will be used to hold arrays of neighbour values9.

The ``forall loop'' initializing w should be read as something like

  forall(i in range x, j in range y)
    w(i, j) = fun(i, j)
where fun is some function of the global indices defining the initial state of the Life board. The members forall, test, next update internal state of r so that r.sub() returns the local subscript for the current iteration, and r.idx(0) and r.idx(1) return the global index values for the current iteration. We are using r as an iterator class10.

The main loop uses cyclic shift operations to obtain neighbours, communicating data where necessary. The shift operation is a member of the Node class. It should be overloaded to accept data vectors of any primitive type--here the array elements are bytes.

Finally w is updated in terms of its neighbours.

Note some characteristic features of the data-parallel style of programming:

It may be unclear that this framework has the same power as HPF. If the distributed array model is extended to the full HPF model, as it can be, and the shift operation is augmented by some more powerful collective operations including gather and scatter operations parametrized by subscript arrays, we claim that it does. The proof lies in the observation that this is, as advertised, a simplified model of the kind of run-time library that various HPF translators target. This is not to say that programming in this style is always as straightforward as the example given here, or as comprehensible as the corresponding HPF program (if it was, there would be no need for HPF).


next up previous contents
Next: ``Array syntax'' in Java. Up: Data parallellism in Java Previous: Data parallellism in Java   Contents
Bryan Carpenter 2002-07-11