As explained in the previous section, the underlying distributed array model is equivalent to the HPF array model. As a matter of detail, distributed array mapping is described in terms of a slightly different set of basic concepts. HPF describes the decomposition of an array through alignment to some template, which is in turn distributed over a processor arrangement. The analogous concepts in our parametrization of the distributed array are the distributed range (or simply range) and the process group (or simply group). A distributed range is akin a single dimension of an HPF template--it defines a map from an integer global subscript range into a particular dimension of a process group. A process group is equivalent to an HPF processor arrangement, or to a certain subset of such an arrangement. Switching from templates to ranges and groups is a change of parametrization only. In itself it does not change the set of allowed ways to decompose an array. The new primitives fit better with our distributed control constructs, and correspond more directly to components of our run-time array descriptor. Ranges and groups are treated as proper objects in the extended language. They are values that can be stored in variables or passed to procedures. The group and ranges describing a particular distributed array are accessible through inquiry functions.
To motivate the discussion of HPJava, we will refer to figure
2, which gives a parallel implementation of Choleski
decomposition in the extended language. In pseudocode, the sequential
algorithm is
In HPJava a base class Group describes a general group of processes. It has subclasses Procs1, Procs2, ..., that represent one-dimensional process grids, two-dimensional process grids, and so on. In the example p is defined as a one-dimensional grid of extent NP. The on construct in the example acts like a conditional, excluding processors outside the group p. A distributed range, base class Range, defines a range of integer global subscripts, and specifies how they are mapped into a process grid dimension. In the example, the range x is initalized to a cyclically distributed range of extent N. CyclicRange is one of several subclasses of Range that define different distribution formats.
Now a and b are declared to be distributed arrays. In HPJava the type-signatures and constructors of distributed arrays use double brackets to distinguish them from ordinary Java arrays. If a particular dimension of an array has a distributed range, the corresponding slot in the type signature of the array should include a # symbol. Because b has no range distributed over the active process group (p) it is defined to be replicated across this group. The mapping of a and b is equivalent to the HPF declarations
!HPF$ PROCESSORS p(np)
!HPF$ TEMPLATE t(n)
!HPF$ DISTRIBUTE t(CYCLIC) ONTO p
REAL a(n, n), b(n)
!HPF$ ALIGN a(i, *) WITH t(i)
!HPF$ ALIGN b(*) WITH t(*)
with range x taking over the role of the one-dimensional
template t.
Subscripting operations on distributed arrays are subject to a strict
restriction. An access to an array element such as a [s, k] is
legal, but only if the local process holds the element in
question. The language provides syntax to alleviate the inconvenience
of this restriction. The idea of a location is introduced. It
can be viewed as an abstract element, or ``slot'', of a distributed
range. Any location is mapped to a particular slice of a process
grid. Locations are used to parametrize a new distributed control
construct called the at construct. This works like on,
except that its body is executed only on processes that hold the
specified location. Locations can also be used directly as array
subscripts, in place on integers (locations used as array subscripts
must be elements of the corresponding ranges of the array). The array
access above can be safely written in the context
Location l = x [k] ;
at(l)
... a [s, l] ...
(the first dimension of a is sequential, so we don't have to
worry about the SPMD constraint for subscript s). In the main
example, this syntax is used to ensure that the first block of code
inside the loop only executes on the processor holding column k.
The example involves one communication operation. This is taken from the Adlib library: the function remap copies the elements of one distributed array or section to another of the same shape. The two arrays can have any, unrelated decompositions. Because b has replicated mapping, remap copies identical values to all processors--ie it implements a broadcast of the values in the array section a [[k + 1 : , k]]. The syntax for array sections in HPJava is almost identical to the syntax of sections in Fortran 90. Subscript triplets work in the same way as in Fortran 90.
The last and most important distributed control construct in the language is called over. It is used to access all locally held locations in a particular range, and can therefore be used to access all locally held elements of arrays parametrized by that range. The over construct implements a distributed parallel loop. Its parameter is a member of the special class Index which is a subclass of Location. The idx member of Range can be used inside parallel loops to yield arithmetic expressions that depend on global index values. In the example the over construct is used to iterate over all columns of the matrix to the right of column k.
As promised, the Choleski example has introduced essentially all the important language ideas in HPJava. Further extensions are minor, or consist in adding new subclasses of Range or Group, rather than syntax extensions. Figure 3 gives a parallel implementation of red-black relaxation in the same language. To support this important stencil-update paradigm, ghost regions are allowed on distributed arrays [25]. In our case the width of these regions is specified in a special form of the BlockRange constructor. The ghost regions are explicitly brought up to date using the library function writeHalo.
Note that the new range constructor and writeHalo function are library features (respectively from the base HPJava runtime and the Adlib communication library), not new language extensions. One new piece of syntax is involved: the addition and subtraction operators are overloaded so that integer offsets can be added or subtracted to locations, yielding new, shifted, locations. This kind of shifted access only works if the subscripted array has suitable ghost extensions.