Allowing collapsed array dimensions means that an array can be
distributed over a process grid having smaller rank than the array
itself. Conversely, it is also allowed to distribute an array over a
process grid whose rank is larger than the rank of the array.
Replication and collapsing can both occur in a single array.
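For instance, the following sketch (using the distributed array syntax of earlier examples; the grid extent P and array extent N are assumed) declares an array whose second dimension is collapsed and which is simultaneously replicated over one dimension of the process grid:

```
Procs2 p = new Procs2(P, P) ;
on(p) {
  Range x = new BlockRange(N, p.dim(0)) ;

  // First dimension distributed over p.dim(0); second dimension
  // collapsed (sequential).  Because p.dim(1) appears nowhere in
  // the creation expression, the whole array is replicated over
  // that dimension of the grid.
  float [[-,*]] a = new float [[x, N]] ;
}
```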
In the last section we gave a ``pipelined'' matrix multiplication algorithm. A simpler and potentially more efficient implementation is possible if the operand arrays have carefully chosen replicated/collapsed distributions. The program is given in Figure 3.16. As illustrated in Figure 3.17, the rows of a are replicated in the process dimension associated with y. Similarly, the columns of b are replicated in the dimension associated with x. Hence all operands of the inner scalar product are already in place for the computation--no communication is needed.
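Since Figure 3.16 is not reproduced here, the following sketch conveys the flavor of such a program (the extents N and P are assumed; the figure itself may differ in detail):

```
Procs2 p = new Procs2(P, P) ;
on(p) {
  Range x = new BlockRange(N, p.dim(0)) ;
  Range y = new BlockRange(N, p.dim(1)) ;

  float [[-,-]] c = new float [[x, y]] ;
  float [[-,*]] a = new float [[x, N]] ;   // rows replicated over p.dim(1)
  float [[*,-]] b = new float [[N, y]] ;   // columns replicated over p.dim(0)

  // ... initialize a, b ...

  overall(i = x for :)
    overall(j = y for :) {
      float sum = 0 ;
      for(int k = 0 ; k < N ; k++)
        sum += a [i, k] * b [k, j] ;
      c [i, j] = sum ;                     // all operands local
    }
}
```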
We would be very lucky to come across three arrays with such a special alignment relation (distribution format relative to one another). The Adlib library provides an important function called remap(), which takes a pair of distributed arrays as arguments. The two arrays must have the same shape and element type, but they can have unrelated distribution formats. The elements of the source array are copied to the destination array. In particular, if the destination array has a replicated mapping, the values of the source array are broadcast appropriately.
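As a small illustration of the broadcast behavior (the names p, x and the extent N are assumed from the preceding sketches):

```
on(p) {
  float [[-]] u = new float [[x]] ;    // block-distributed source
  float [[*]] v = new float [[N]] ;    // collapsed, hence replicated
                                       // over the whole active group

  // ... fill in u ...

  Adlib.remap(v, u) ;                  // every process receives a
                                       // complete copy of u in v
}
```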
Figure 3.18 shows how we can use remap() to adapt the program in Figure 3.16 and create a general-purpose matrix multiplication routine. Besides the remap() function, this example introduces two inquiries, grp() and rng(), that are defined for any distributed array. The inquiry grp() returns the distribution group of the array, and the inquiry rng(r) returns the rth range of the array. The argument r must lie in the range 0, ..., R - 1, where R is the rank (dimensionality) of the array.
The example also illustrates the most general form of the distributed array creation expression. In all earlier examples, arrays were distributed over the whole of the active process group, defined by an enclosing on construct. In general, an on clause attached to the array creation expression itself can specify that the array is distributed over some subset of the active group. This makes it possible to create an array outside the on construct that will process its elements. Through communication functions like remap(), values can then be exchanged between arrays distributed over different process grids. A sketch example is given in Figure 3.19.
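The idea behind such an example might be sketched as follows (two one-dimensional grids p and q and the extent N are assumed; Figure 3.19 itself may differ in detail):

```
Procs1 p = new Procs1(P) ;
Procs1 q = new Procs1(Q) ;

Range x = new BlockRange(N, p.dim(0)) ;
Range y = new BlockRange(N, q.dim(0)) ;

float [[-]] a = new float [[x]] on p ;   // lives on grid p
float [[-]] b = new float [[y]] on q ;   // lives on grid q

on(p) {
  // ... compute values of a ...
}

Adlib.remap(b, a) ;    // copy a (on grid p) into b (on grid q)

on(q) {
  // ... continue processing with b ...
}
```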
To allow for this kind of situation, where arguments may be distributed over distinct process groups--not necessarily the active process group--the generic matrix multiplication routine of Figure 3.18 includes on p clauses in the constructors of its temporary arrays, and explicitly restricts control with an on(p) construct before processing. As we will see in the next section, this refinement also allows the arguments of matmul to be arbitrary array sections.
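A routine along these lines might look as follows (a sketch only: the code of Figure 3.18 may differ in detail, and the size() inquiry on Range is an assumption here):

```
void matmul(float [[-,-]] c, float [[-,-]] a, float [[-,-]] b) {

  Group p = c.grp() ;                    // distribution group of c
  Range x = c.rng(0), y = c.rng(1) ;
  int N = a.rng(1).size() ;

  // Temporaries aligned with c, created on c's group:
  float [[-,*]] ta = new float [[x, N]] on p ;
  float [[*,-]] tb = new float [[N, y]] on p ;

  Adlib.remap(ta, a) ;                   // bring operands into alignment
  Adlib.remap(tb, b) ;

  on(p)                                  // restrict control to c's group
    overall(i = x for :)
      overall(j = y for :) {
        float sum = 0 ;
        for(int k = 0 ; k < N ; k++)
          sum += ta [i, k] * tb [k, j] ;
        c [i, j] = sum ;
      }
}
```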
A special case of the pattern of multiple grids singles out one
process of the program as a ``control'' processor, responsible
for things like I/O. This can be done by creating a Procs0