Next: Other features Up: HPJava: data parallel extensions Previous: Higher-level ranges and locations

# Distributed loops

Good parallel algorithms don't usually expend many lines of code assigning to isolated elements of distributed arrays. The at mechanism of the previous section is often useful, but a more pressing need is a mechanism for parallel access to distributed array elements. The last and most important distributed control construct in the language is called over. It implements a distributed parallel loop. Conceptually it is quite similar to the FORALL construct of Fortran, except that the over construct specifies exactly where its parallel iterations are to be performed. The argument of over is a member of the special class Index. The class Index is a subclass of Location, so it is syntactically correct to use an index as an array subscript7. Here is an example of a pair of nested over loops:

  float [[#,#]] a = new float [[x, y]],
b = new float [[x, y]] ;
...
Index i, j ;
over(i = x | :)
over(j = y | :)
a [i, j] = 2 * b [i, j] ;

The body of an over construct executes, conceptually in parallel, for every location in the range of its index (or some subrange if a non-trivial triplet is specified)8. An individual iteration'' executes on just those processors holding the location associated with the iteration. In a particular iteration, the location component of the index (the base class object) is equal to that location. The net effect of the example above should be reasonably clear. It assigns twice the value of each element of b to the corresponding element of a. Because of the rules about where an individual iteration iterates, the body of an over can usually only combine elements of arrays that have some simple alignment relation relative to one another. The idx member of range can be used in parallel updates to give expressions that depend on global index values.

With the over construct we can give some useful examples of parallel programs. Figure 2 is the famous Jacobi iteration for a two dimensionsional Laplace equation. We have used cyclic shift to implement nearest neighbour communications9.

Copying whole arrays into temporaries is not an efficient way of accessing nearest neighbours in an array. Because this is such a common pattern of communication, the standard library supports ghost regions. Distributed arrays can be created in such a way that the segment stored locally is extended with some halo. This halo caches values stored in the segments of adjacent processes. The cached values are explicitly bought up to date by the library operation writeHalo.

An optimized version of the Jacobi program is give in figure 3. This version only involves a singe array temporary. A new constructor for BlockRange is provided. This allows the width of the ghost extensions to be specified. The arguments of writeHalo itself are an array with suitable extensions and two vectors. The first defines in each dimension the width of the halo that must actually be updated, and the second defines the treatment at the ends of the range--in this case the ghost edges are updated with cyclic wraparound. The new constructor and new writeHalo function are simply standard library extensions. One new piece of syntax is needed: the addition and subtraction operators are overloaded so that integer offsets can be added or subtracted to Location objects, yielding new, shifted, locations. The usual access rules apply--this kind of shifted access is illegal if it implies access to off-processor data. It only works if the subscripted array has suitable ghost extensions.

Next: Other features Up: HPJava: data parallel extensions Previous: Higher-level ranges and locations
Bryan Carpenter 2002-07-11