The at mechanism of the previous section is often useful, but in practice good parallel algorithms do not spend much time assigning to isolated elements of distributed arrays. A more urgent requirement is a mechanism for parallel access to distributed array elements.
The last and most important distributed control construct in the language is called over. It implements a distributed parallel loop. Conceptually it is quite similar to the FORALL construct of Fortran, except that the over construct specifies exactly where its parallel iterations are to be performed. The argument of over is a member of the special class Index. This class is a subclass of Location, so it is syntactically correct to use an index as an array subscript (the effect of such subscripting is only well-defined inside an over construct parametrised by the index in question). Here is an example of a pair of nested over loops:
float [[#,#]] a = new float [[x, y]], b = new float [[x, y]] ; ... Index i, j ; over(i = x | :) over(j = y | :) a [i, j] = 2 * b [i, j] ;The body of an over construct executes, conceptually in parallel, for every location in the range of its index (or some subrange if a non-trivial triplet is specified). An individual ``iteration'' executes on just those processors holding the location associated with the iteration. The net effect of the example above should be reasonably clear. It assigns twice the value of each element of b to the corresponding element of a. Because of the rules about where an individual iteration iterates, the body of an over can usually only combine elements of arrays that have some simple alignment relation relative to one another. The idx member of range can be used in parallel updates to yield expressions that depend on global index values.
With the over construct we can give some useful examples of parallel programs.
Figure 1 gives a parallel implementation of Cholesky decomposition in the extended language. The first dimension of a is sequential (``collapsed'' in HPF parlance). The second dimension is distributed (cyclically, to improve load-balancing). This a column-oriented decomposition. The example involves one new operation from the standard library. The function remap copies the elements of one distributed array or section to another of the same shape. The two arrays can have any, unrelated decompositions. In the current example remap is used to implement a broadcast. Because b has no range distributed over p, it implicitly has replicated mapping; remap accordingly copies identical values to all processors. This example also illustrates construction of sections of distributed arrays, and use of non-trivial triplets in the over construct.
Figure 2 gives a parallel implementation of red-black relaxation in the extended language. To support this important stencil-update paradigm, ghost regions are allowed on distributed arrays. Ghost regions are extensions of the locally held block of a distributed array, used to cache values of elements held on adjacent processors. In our case the width of these regions is specified in a special form of the BlockRange constructor. The ghost regions are explicitly brought up to date using the library function writeHalo. Its arguments are an array with suitable extensions and a vector defining in each dimension the width of the halo that must actually be updated.
Note that the new range constructor and writeHalo function are library features, not new language extensions. One new piece of syntax is needed: the addition and subtraction operators are overloaded so that integer offsets can be added or subtracted to locations, yielding new, shifted, locations. This kind of shifted access is illegal if it implies access to off-processor data. It only works if the subscripted array has suitable ghost extensions.
We have covered most of the important language features
we propose to implement. Two additional features that are quite important in
practice but have not been discussed are subranges and subgroups. A subrange is simply a range which is a regular section of
some other range, created by syntax like
x [0 : 49].
Subranges can be used to create distributed
arrays with general HPF-like alignments.
A subgroup is some
slice of a process array, formed by restricting process coordinates in
one or more dimensions to single values.
Subgroups formally describe the state of the active process group
inside at and over constructs.
For a more complete description of a slightly earlier version of
the proposed language, see .