Distributed Array Restriction

Library functions operating on distributed arrays often
specify certain *alignment relations* between their array
arguments. In HPJava it is natural to define two arrays to
be *aligned* if they have the same distribution group and
all their ranges are aligned^{4.3}.
The Adlib method `dotProduct()`, for example, takes two distributed
array arguments. These arguments must be aligned.

Occasionally it happens that two arrays we want to pass as arguments to
a library function are *essentially* aligned, but one is replicated
over a particular process dimension and the other isn't. It may be
intuitively obvious that all the data needed by the function is in the
right place, but still we cannot call the function--the ranges may
match, but the replicated array has a larger distribution group. By
the definition given above the arrays are not *identically*
aligned.

One possibility is to relax the definition of argument alignment to take account of this situation. But experience suggests that the simple definition of alignment given above is easy to understand, and the specification and implementation of library functions are simplest if thay are based on this definition.

A minor extension to the HPJava language takes care of this situation. The restriction operation introduced for groups in the previous section can also be applied to an array. It returns a new array object--akin to an array section--which has the same ranges as the parent array, but has its group restricted by the specified location. Applied to a replicated array, it returns an array object referencing only the copies of the elements held in the restricted group.

Figure 4.6 is a generalization of the matrix multiplication
program in Figure 3.16 to the case where the arrays
are suitably distributed over a 3-dimensional process grid. Note that
array `c` is replicated over the process dimension of `z`, `a` is replicated over the dimension of `y`, and `b` is
replicated over the dimension of `x`. The sequential inner loop of
Figure 3.16 is replaced by a call to `dotProduct()`
which directly forms the inner product of two sections with distributed
range `z`.

If we didn't know about array restriction we would probably try
to write the loop body as

The trouble is that according to the rules of the previous section the first argument of

This is the first example we have given of a call to a collective
library function *inside* the parallel `overall` construct. The
library, Adlib, supports this kind of ``nested parallelism'' provided
a few precautions are taken. These will be explained
in section 6.

Incidentally, this example illustrates some interesting principles, but it is not supposed to be a practical implementation of matrix multiplication. The overhead of making separate collective communication calls will far outweigh the notional advantage of their parallel execution.