Distributed arrays

Some or all of the dimensions of a multi-dimensional array can be
declared to be *distributed ranges*. In general a distributed
range is represented by an object of class `Range`. A `Range`
object defines a range of integer subscripts, and defines how they are mapped
into a process array dimension. In fact the `Dimension`
class introduced in the previous section is a subclass of `Range`. In this case the integer range is just the range of coordinate
values associated with the dimension. Each value in the range
is mapped, of course, to the process (or slice of processes) with
that coordinate. This kind of range is also called a *primitive range*.
More complex subclasses of `Range` implement more
elaborate maps from integer ranges to process dimensions.
Some of these will be introduced in later sections. For now we concentrate
on arrays
constructed with `Dimension` objects as their distributed ranges.

The syntax of section 2 is extended in the following way to support distributed arrays

- A distributed range object may appear in place of an integer extent in the
``constructor'' of the array (the expression following the
`new`keyword). - If a particular dimension of the array has a distributed range, the
corresponding slot in the type signature of the array should include
a
`#`symbol. - In general the constructor of the distributed
array must be followed by an
`on`clause, specifying the process group over which the array is distributed. Distributed ranges of the array must be distributed over distinct dimensions of this group^{6}.

float [[#,#,]] a = new float [[x, y, 100]] on p ;defines

The distribute array `a` is equivalent in terms of storage
to four local arrays defined by

float [] b = new float [100] ;But because

float [[#,#,]] a = new float [[x, y, 100]] on p, b = new float [[x, y, 100]] on p ; Adlib.shift(a, b, -1, 0, CYCL) ;The

There is a catch in this.
When subscripting the distributed dimensions of an array it
is *simply disallowed* to use subscripts that refer to
off-processor elements. While this:

int i = x.crd(), j = y.crd() ; a [i, j, 20] = a [i, j, 21] ;is allowed, this:

int i = x.crd(), j = y.crd() ; a [i, j, 20] = b [(i + 1) % 2, j, 20] ;is forbidden. The second example could apparently be implemented using a nearest neighbour communication, quite similar to the

If subscripting distributed dimensions is so restricted, why are the
`i`, `j` subscripts on the arrays needed at all? In the
examples of this section these subscripts are only allowed one value on each
processor.
Well, the inconvience of specifying the subscripts will be reduced by
language constructs introduced later, and the fact that only one subscript
value is local is a special feature of the *primitive ranges*
used here. The higher level distributed ranges introduced later
map multiple elements to individual processes. Subscripting will no
longer look so redundant.

We finish this section with a fairly complex example using
the notation established so far. The algorithm of figure
1 implements multiplication of two `N` `N` matrices. One dimension of each of the two matrices is
block-distributed over the `P` processors of `p`, so `N` is
equal to `P` `B` where `B` is the (constant)
local block size.

The matrices are represented as three dimensional arrays, with their
distributed dimensions explicitly split into a distributed range of
extent `P` and a local sequential range of extent `B`. In
later sections we will see how to represent this distribution format
with a single block-distributed `Range` object. Even with that
facility available, the representation used here may still be more natural
for algorithms like the current one, where the block structure is
an integral to the algorithm. The undistributed dimensions of
the matrices are just sequential ranges of extent `N`. The
operation of the algorithm for `P` is visualized in figure
2. There are two phases. Between the phases the data
in `b` is exchanged by the `shift` operation^{7}.