The *at* mechanism of the previous section is often useful, but in
practice good parallel algorithms do not spend much time assigning to
isolated elements of distributed arrays. A more urgent requirement
is a mechanism for *parallel* access to distributed array
elements.

The last and most important distributed control construct in the language
is called *over*.
It implements a distributed parallel loop. Conceptually it is quite
similar to the `FORALL` construct of Fortran, except that the
*over* construct specifies exactly where its parallel iterations
are to be performed.
The argument of *over* is a member of the special class `Index`.
This class is a subclass of `Location`, so it is syntactically
correct to use an index as an array subscript (the effect
of such subscripting is only well-defined inside an *over* construct
parametrised by the index in question). Here is an example of
a pair of nested *over* loops:

float [[#,#]] a = new float [[x, y]], b = new float [[x, y]] ; ... Index i, j ; over(i = x | :) over(j = y | :) a [i, j] = 2 * b [i, j] ;The body of an

With the *over* construct we can give some useful
examples of parallel programs.

Figure 1 gives a parallel implementation
of Cholesky decomposition in the extended language.
The first dimension of `a` is sequential (``collapsed''
in HPF parlance). The second dimension is distributed (cyclically,
to improve load-balancing). This a column-oriented decomposition.
The example involves one new operation from
the standard library. The function `remap` copies the elements
of one distributed array or section to another of the same shape.
The two arrays can have any, unrelated decompositions.
In the current example `remap` is used to implement a broadcast.
Because `b` has no range distributed over `p`,
it implicitly has *replicated* mapping; `remap` accordingly
copies identical values to all processors. This example also
illustrates construction of sections of distributed arrays, and use of
non-trivial triplets in the *over* construct.

Figure 2 gives a parallel implementation of red-black
relaxation in the extended language. To support this important
stencil-update paradigm, *ghost regions* are allowed on
distributed arrays. Ghost regions are extensions of the locally held
block of a distributed array, used to cache values of elements held on
adjacent processors. In our case the width of these regions is
specified in a special form of the `BlockRange` constructor. The
ghost regions are explicitly brought up to date using the library
function `writeHalo`. Its arguments are an array with suitable
extensions and a vector defining in each dimension the width of the
halo that must actually be updated.

Note that the new range constructor and `writeHalo` function are *library* features, not new language extensions. One new piece of
syntax is needed: the addition and subtraction operators are
overloaded so that integer offsets can be added or subtracted to
locations, yielding new, shifted, locations. This kind of shifted
access is illegal if it implies access to off-processor data. It only
works if the subscripted array has suitable ghost extensions.

We have covered most of the important *language* features
we propose to implement. Two additional features that are quite important in
practice but have not been discussed are *subranges* and *subgroups*. A subrange is simply a range which is a regular section of
some other range, created by syntax like `x [0 : 49]`

.
Subranges can be used to create distributed
arrays with general HPF-like alignments.
A *subgroup* is some
slice of a process array, formed by restricting process coordinates in
one or more dimensions to single values.
Subgroups formally describe the state of the active process group
inside *at* and *over* constructs.
For a more complete description of a slightly earlier version of
the proposed language, see [3].