That is quite neat, but unfortunately it isn't the end of the story. The examples of the previous section will only work properly if N is an exact multiple of the number of processes, P, so that the block size, B, is identical in all processes. In general we do not want to limit ourselves to this case.
A few of the possibilities for mapping a 50-element, one-dimensional
array to 4 processes are illustrated in Figure 7.9.
They presume the declarations:
At this point one might feel inclined to abandon the idea of dimension splitting. These examples doesn't seem to fit at all with the idea of dividing a distributed dimension of extent N into P blocks of constant size B, which is what is needed if dimension splitting is to work.
We could give up and go home, but we won't. Note that for a given range there is a fixed bound on the number of array elements held by any process. We can redefine the symbol B to refer to this constant bound. The operational assumption is that all processes allocate enough space to hold this bounding number of elements, although not all processes necessarily use all slots. For the dimension-split array, the extent of the the sequential dimension is the constant number, B, of locally allocated slots. The elements of the original array are embedded somehow in these slots.
In Figure 7.9 the most likely value
for B in the first two examples is
.
For the third example the amount of space allocated will presumably
be enough to hold the parent distributed array--most likely B is
7.1.
The embedding of actual elements is determined by a new
method, localBlock(). This is a member of the Range class.
It has no arguments, and returns
an object of class hpjava.lang.Block, declared as:
(Unfortunately there is an overlap of terminology between the blocks of an arbitrary range, and the distribution format of one particular kind of range, viz. BlockRange. The Block class and the localBlock() method are in no way specifically tied to the ``block-wise'' distribution format embodied in BlockRange. Equivalent methods are defined for any range.)
One of the most important applications of the localBlock() method is in the translation scheme for the overall construct, and this is a fairly natural way to illustrate its use.
Consider this fragment of HPJava:
At first sight we have just replaced one overall construct with another. But the range d can be assumed to be a process dimension, so the role of the new overall construct is essentially a formality--k only has one location in each process, and the new overall doesn't yield a local loop. Although the k subscript of as is required by the rules of the language, it is essentially does nothing, because there is only one locally held location. In fact the only real effect of the reduced construct is to change the active process group inside its body.
So effectively, this transformation has reduced the overall construct to a sequential local for loop. Subscripting with a distributed index has essentially been reduced to subscripting into the sequential local array as[[k, :]]. The subscript expression and the global index expression on the right hand side have been reduced to expressions linear in the loop index. Such expressions can be translated efficiently by a compiler using a strength reduction optimization (replacing the linear expressions with an incremented accumulation variable).
Because it involves reducing an overall construct to a kind of lower-level overall, we sometimes call this general scheme recursive translation. It's an interesting device formally, but note the actual HPJava translator doesn't work this way--it translates directly to Java, following the rules in Appendix A.
Note that technically the localBlock() inquiry is incoherent (see section 5.4). This can be fixed by adding an argument that specified the local coordinate, similar to the argument of the coherent (though otherwise lower-level) block() method that will be introduced in section 7.7. This argument was omitted here to simplify usage, and also to allow the unique result of localBlock() to be computed once and cached inside the Range object. Effectively it uses the incoherent crd() inquiry internally.