next up previous
Next: The on construct and Up: HPJava: data parallel extensions Previous: Process arrays

Distributed arrays

Some or all of the dimensions of a multi-dimensional array can be declared to be distributed ranges. In general a distributed range is represented by an object of class Range. A Range object defines a range of integer subscripts, and defines how they are mapped into a process array dimension. In fact the Dimension class introduced in the previous section is a subclass of Range. In this case the integer range is just the range of coordinate values associated with the dimension. Each value in the range is mapped, of course, to the process (or slice of processes) with that coordinate. This kind of range is also called a primitive range. More complex subclasses of Range implement more elaborate maps from integer ranges to process dimensions. Some of these will be introduced in later sections. For now we concentrate on arrays constructed with Dimension objects as their distributed ranges.

The syntax of section 2 is extended in the following way to support distributed arrays

Assume p, x and y are declared as in the previous section, then
  float [[#,#,]] a = new float [[x, y, 100]] on p ;
defines a as a 2 by 2 by 100 array of floating point numbers. Because the first two dimensions of the array are distributed ranges--dimensions of p--a is actually realized as four segments of 100 elements, one in each of the processes of p. The process in p with coordinates i, j holds the section a [[i, j, :]].

The distributed array a is equivalent in terms of storage to four local arrays defined by

  float [] b = new float [100] ;
But because a is declared as a collective object we can apply collective operations to it. The HPJlib functions introduced in section 2 apply equally well to distributed arrays, but now they imply inter-processor communication.
  float [[#,#,]] a = new float [[x, y, 100]] on p,
                 b = new float [[x, y, 100]] on p ;

  HPJlib.shift(a, b, -1, 0, CYCL) ;
The shift operation causes the local values of a to be overwritten with values of b from a processor adjacent in the x dimension.

There is a catch in this. When subscripting the distributed dimensions of an array it is simply disallowed to use subscripts that refer to off-processor elements. While this:

  int i = x.crd(), j = y.crd() ;

  a [i, j, 20] = a [i, j, 21] ;
is allowed, this:
  int i = x.crd(), j = y.crd() ;

  a [i, j, 20] = b [(i + 1) % 2, j, 20] ;
is forbidden. The second example could apparently be implemented using a nearest neighbour communication, quite similar to the shift example above. But our language imposes an strict policy distinguishing it from most data parallel languages: while library functions may introduce communications, language primitives such as array subscripting never imply communication.

If subscripting distributed dimensions is so restricted, why are the i, j subscripts on the arrays needed at all? In the examples of this section these subscripts are only allowed one value on each processor. Well, the inconvience of specifying the subscripts will be reduced by language constructs introduced later, and the fact that only one subscript value is local is a special feature of the primitive ranges used here. The higher level distributed ranges introduced later map multiple elements to individual processes. Subscripting will no longer look so redundant.

next up previous
Next: The on construct and Up: HPJava: data parallel extensions Previous: Process arrays
Bryan Carpenter 2002-07-11