Library functions operating on distributed arrays often specify certain alignment relations between their array arguments. In HPJava it is natural to define two arrays to be aligned if they have the same distribution group and all their ranges are aligned (footnote 4.3). The Adlib method dotProduct(), for example, takes two distributed array arguments, and these arguments must be aligned.
Occasionally two arrays we want to pass as arguments to a library function are essentially aligned, but one is replicated over a particular process dimension and the other isn't. It may be intuitively obvious that all the data needed by the function is in the right place, but still we cannot call the function: the ranges may match, but the replicated array has a larger distribution group. By the definition given above the arrays are not identically aligned.
One possibility is to relax the definition of argument alignment to take account of this situation. But experience suggests that the simple definition of alignment given above is easy to understand, and the specification and implementation of library functions are simplest if they are based on this definition.
A minor extension to the HPJava language takes care of this situation. The restriction operation introduced for groups in the previous section can also be applied to an array. It returns a new array object (akin to an array section) which has the same ranges as the parent array, but has its group restricted by the specified location. Applied to a replicated array, it returns an array object referencing only the copies of the elements held in the restricted group.
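The effect of restriction on the alignment test can be illustrated with a small model in plain Java. This is only a sketch under stated assumptions, not HPJava syntax and not the HPJava runtime: the classes Group and DistArray, the FREE marker, and the methods restrict() and aligned() are all hypothetical names invented for the illustration. A group is modelled as a slice of a 3-dimensional process grid in which each grid dimension is either fixed at one coordinate or left free, and two arrays count as aligned when their groups and ranges are identical.

```java
import java.util.Arrays;

// Toy model only: hypothetical classes, not part of HPJava or Adlib.
class RestrictionSketch {
    public static void main(String[] args) {
        final int X = 0, Y = 1;   // grid dimensions (Z would be 2)
        int i = 1, j = 2;         // example process coordinates

        // A row section replicated over y: fixed in x, spans y and z.
        DistArray rowOfA = new DistArray(new Group(i, Group.FREE, Group.FREE), "z");
        // A column section replicated over x: fixed in y, spans x and z.
        DistArray colOfB = new DistArray(new Group(Group.FREE, j, Group.FREE), "z");

        // Same range, different groups: not aligned.
        System.out.println(DistArray.aligned(rowOfA, colOfB));          // prints false

        // Restricting each array in its replicated dimension makes the groups equal.
        System.out.println(DistArray.aligned(rowOfA.restrict(Y, j),
                                             colOfB.restrict(X, i)));   // prints true
    }
}

/** A slice of the process grid: each dimension is FREE or fixed at a coordinate. */
final class Group {
    static final int FREE = -1;
    final int[] fixed;

    Group(int... fixed) { this.fixed = fixed.clone(); }

    /** Returns a new group with grid dimension dim fixed at coord. */
    Group restrict(int dim, int coord) {
        int[] f = fixed.clone();
        f[dim] = coord;
        return new Group(f);
    }

    boolean sameAs(Group other) { return Arrays.equals(fixed, other.fixed); }
}

/** A distributed array reduced to what the alignment test looks at. */
final class DistArray {
    final Group group;
    final String[] ranges;   // one range name per array dimension

    DistArray(Group group, String... ranges) {
        this.group = group;
        this.ranges = ranges.clone();
    }

    /** Same ranges as the parent, group restricted by the given location. */
    DistArray restrict(int dim, int coord) {
        return new DistArray(group.restrict(dim, coord), ranges);
    }

    /** Aligned means: same distribution group and same ranges. */
    static boolean aligned(DistArray p, DistArray q) {
        return p.group.sameAs(q.group) && Arrays.equals(p.ranges, q.ranges);
    }
}
```

In this model the restricted objects share the ranges of their parents, so only the group comparison changes, which mirrors the description of array restriction given above.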
Figure 4.6 is a generalization of the matrix multiplication program in Figure 3.16 to the case where the arrays are suitably distributed over a 3-dimensional process grid. Note that array c is replicated over the process dimension of z, a is replicated over the dimension of y, and b is replicated over the dimension of x. The sequential inner loop of Figure 3.16 is replaced by a call to dotProduct() which directly forms the inner product of two sections with distributed range z.
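For reference, the computation performed by such a program can be sketched sequentially in plain Java. This is not the code of Figure 4.6 and involves no distribution at all; the class MatMulSketch and its helpers dotProduct() and column() are hypothetical local stand-ins, and dotProduct() here is an ordinary method, not Adlib.dotProduct(). The sketch only shows the structure in which each element of c is formed by one inner product of a row of a with a column of b.

```java
import java.util.Arrays;

// Sequential stand-in: hypothetical helpers, no HPJava or Adlib involved.
class MatMulSketch {
    /** Inner product of two equal-length vectors. */
    static double dotProduct(double[] u, double[] v) {
        if (u.length != v.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double sum = 0.0;
        for (int k = 0; k < u.length; k++) {
            sum += u[k] * v[k];
        }
        return sum;
    }

    /** Copies column j of matrix m into a new vector. */
    static double[] column(double[][] m, int j) {
        double[] col = new double[m.length];
        for (int k = 0; k < m.length; k++) {
            col[k] = m[k][j];
        }
        return col;
    }

    /** c[i][j] is the inner product of row i of a and column j of b. */
    static double[][] multiply(double[][] a, double[][] b) {
        double[][] c = new double[a.length][b[0].length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < b[0].length; j++) {
                c[i][j] = dotProduct(a[i], column(b, j));
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        // prints [[19.0, 22.0], [43.0, 50.0]]
        System.out.println(Arrays.deepToString(multiply(a, b)));
    }
}
```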
If we didn't know about array restriction we would probably try to write the loop body as
This is the first example we have given of a call to a collective library function inside the parallel overall construct. The library, Adlib, supports this kind of "nested parallelism" provided a few precautions are taken. These will be explained in section 6.
Incidentally, this example illustrates some interesting principles, but it is not supposed to be a practical implementation of matrix multiplication. The overhead of making separate collective communication calls will far outweigh the notional advantage of their parallel execution.