Some or all of the dimensions of a multi-dimensional array can be declared to be distributed ranges. In general a distributed range is represented by an object of class Range. A Range object defines a range of integer subscripts, and defines how they are mapped into a process array dimension. In fact the Dimension class introduced in the previous section is a subclass of Range. In this case the integer range is just the range of coordinate values associated with the dimension. Each value in the range is mapped, of course, to the process (or slice of processes) with that coordinate. This kind of range is also called a primitive range. More complex subclasses of Range implement more elaborate maps from integer ranges to process dimensions. Some of these will be introduced in later sections. For now we concentrate on arrays constructed with Dimension objects as their distributed ranges.
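For concreteness, the declarations below assume a process grid and its dimensions have been set up along the lines of the previous section. The class name Procs2 and the accessor dim are placeholders for whatever constructs that section introduced:

    Procs2 p = new Procs2(2, 2) ;   // a 2 by 2 grid of processes
    Dimension x = p.dim(0),         // first process dimension, a primitive range
              y = p.dim(1) ;        // second process dimension

Each Dimension here is a primitive range: its integer subscripts are exactly the coordinate values 0 and 1 of the corresponding process dimension.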
The syntax of section 2 is extended in the following way to support distributed arrays:

    float [[#,#,]] a = new float [[x, y, 100]] on p ;

This defines a as a 2 by 2 by 100 array of floating-point numbers. Because the first two dimensions of the array are distributed ranges (dimensions of p), a is actually realized as four segments of 100 elements, one in each of the processes of p. The process in p with coordinates i, j holds the section a [[i, j, :]].
The distributed array a is equivalent in terms of storage to four local arrays defined by

    float [] b = new float [100] ;

But because a is declared as a collective object we can apply collective operations to it. The Adlib functions introduced in section 2 apply equally well to distributed arrays, but now they imply inter-processor communication.
For example, given

    float [[#,#,]] a = new float [[x, y, 100]] on p,
                  b = new float [[x, y, 100]] on p ;

    Adlib.shift(a, b, -1, 0, CYCL) ;

the shift operation causes the local values of a to be overwritten with the values of b from a processor adjacent in the x dimension.
There is a catch in this. When subscripting the distributed dimensions of an array, it is simply disallowed to use subscripts that refer to off-processor elements. While

    int i = x.crd(), j = y.crd() ;

    a [i, j, 20] = a [i, j, 21] ;

is allowed,

    int i = x.crd(), j = y.crd() ;

    a [i, j, 20] = b [(i + 1) % 2, j, 20] ;

is forbidden. The second example could apparently be implemented using a nearest-neighbour communication, quite similar to the shift example above. But Java-Ad imposes a strict policy distinguishing it from many data-parallel languages: while library functions may introduce communications, language primitives such as array subscripting never imply communication.
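The effect the forbidden assignment aims for can be obtained legally by going through the communication library instead: first use shift to bring the remote elements of b into a local temporary, then subscript on-processor. A sketch, reusing the declarations above (with the same sign convention for the shift amount as in the earlier example; over two processes a cyclic shift in either direction reaches the neighbour):

    float [[#,#,]] t = new float [[x, y, 100]] on p ;

    Adlib.shift(t, b, -1, 0, CYCL) ;   // t now holds b from the adjacent process in x

    int i = x.crd(), j = y.crd() ;
    a [i, j, 20] = t [i, j, 20] ;      // purely local subscripting

All communication happens inside the library call; the subscripting itself touches only local storage, in line with the stated policy.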
If subscripting distributed dimensions is so restricted, why are the i, j subscripts on the arrays needed at all? In the examples of this section these subscripts can take only one value on each processor. The inconvenience of specifying them will be reduced by language constructs introduced later, and the fact that only one subscript value is local is a special feature of the primitive ranges used here. The higher-level distributed ranges introduced later map multiple elements to individual processes, and subscripting will no longer look so redundant.
We finish this section with a fairly complex example using the notation established so far. The algorithm of figure 1 implements multiplication of two N × N matrices. One dimension of each of the two matrices is block-distributed over the P processors of p, so N is equal to P × B, where B is the (constant) local block size.
The matrices are represented as three-dimensional arrays, with their distributed dimensions explicitly split into a distributed range of extent P and a local sequential range of extent B. In later sections we will see how to represent this distribution format with a single block-distributed Range object. Even with that facility available, the representation used here may still be more natural for algorithms like the current one, where the block structure is integral to the algorithm. The undistributed dimensions of the matrices are just sequential ranges of extent N. The operation of the algorithm is visualized in figure 2. There are two phases. Between the phases the data in b is exchanged by the shift operation.
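Figure 1 is not reproduced here, but the kind of pipelined computation the text describes can be sketched as follows. The rows of a are block-distributed, the columns of b are block-distributed in the same way, and each process accumulates its block of rows of the result c; after computing against the locally held block of b, the blocks of b are rotated with shift and the next phase proceeds. Every name other than Adlib.shift is illustrative, p is assumed to be a one-dimensional grid of P processes, and the block index j assumes each shift brings the block from the neighbour with the next higher coordinate:

    Dimension d = p.dim(0) ;

    float [[#,,]] a = new float [[d, B, N]] on p ;   // rows block-distributed
    float [[#,,]] b = new float [[d, N, B]] on p ;   // columns block-distributed
    float [[#,,]] c = new float [[d, B, N]] on p ;   // result, rows block-distributed
    float [[#,,]] t = new float [[d, N, B]] on p ;   // temporary for rotating b

    int i = d.crd() ;
    for(int phase = 0 ; phase < P ; phase++) {
      int j = (i + phase) % P ;            // index of the block of b held locally now
      for(int l = 0 ; l < B ; l++)         // local row of a
        for(int m = 0 ; m < B ; m++) {     // local column of b
          float s = 0 ;
          for(int k = 0 ; k < N ; k++)
            s += a [i, l, k] * b [i, k, m] ;
          c [i, l, j * B + m] = s ;
        }
      Adlib.shift(t, b, 1, 0, CYCL) ;      // rotate the blocks of b among the processes
      float [[#,,]] u = b ; b = t ; t = u ;   // swap so b names the newly received block
    }

With P = 2 this reduces to the two phases mentioned in the text: one pass over the locally held block of b, one shift, and one pass over the received block.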