next up previous contents index
Next: Ghost regions and dimension Up: Low level SPMD programming Previous: Dimension Splitting   Contents   Index

Block Parameters

That is quite neat, but unfortunately it isn't the end of the story. The examples of the previous section will only work properly if N is an exact multiple of the number of processes, P, so that the block size, B, is identical in all processes. In general we do not want to limit ourselves to this case.

A few of the possibilities for mapping a 50-element, one-dimensional array to 4 processes are illustrated in Figure 7.9. They presume the declarations:

\begin{displaymath}
\begin{minipage}[t]{\linewidth}\small\begin{verbatim}Proc...
...ocs1(4) ;
Dimension d = p.dim(0) ;\end{verbatim}\end{minipage}\end{displaymath}

In the first case the distributed array a is divided into four contiguous blocks of sizes $(13,13,13,11)$. In the second case the blocking is different--$(13,13,12,12)$--and the formula for computing the global index value is quite different. The third case illustrates that a might actually be some section of an array. In this example the blocking is $(13,12,13,12)$ and we must take into account that the subscripting into the local segment of the array is strided. Also there is an offset of the first element, which in some processes is zero and in others is one.

Figure 7.9: Example embeddings of array elements in local blocks

\begin{displaymath}
\begin{minipage}[t]{\linewidth}\small\begin{verbatim}Rang...
...
float [[-]] a = new float [[x]] ;\end{verbatim}\end{minipage}\end{displaymath}

\includegraphics[width=4.9in]{mapBlock.eps}

\begin{displaymath}
\begin{minipage}[t]{\linewidth}\small\begin{verbatim}Rang...
...
float [[-]] a = new float [[x]] ;\end{verbatim}\end{minipage}\end{displaymath}

\includegraphics[width=4.9in]{mapCyclic.eps}

\begin{displaymath}
\begin{minipage}[t]{\linewidth}\small\begin{verbatim}Rang...
... float [[-]] a = b [[0 : 99 : 2]] ;\end{verbatim}\end{minipage}\end{displaymath}

\includegraphics[width=4.9in]{mapStride.eps}


At this point one might feel inclined to abandon the idea of dimension splitting. These examples doesn't seem to fit at all with the idea of dividing a distributed dimension of extent N into P blocks of constant size B, which is what is needed if dimension splitting is to work.

We could give up and go home, but we won't. Note that for a given range there is a fixed bound on the number of array elements held by any process. We can redefine the symbol B to refer to this constant bound. The operational assumption is that all processes allocate enough space to hold this bounding number of elements, although not all processes necessarily use all slots. For the dimension-split array, the extent of the the sequential dimension is the constant number, B, of locally allocated slots. The elements of the original array are embedded somehow in these slots.

In Figure 7.9 the most likely value for B in the first two examples is $\lceil 50 / 4 \rceil = 13$. For the third example the amount of space allocated will presumably be enough to hold the parent distributed array--most likely B is $\lceil 100 / 4 \rceil = 25$7.1.

The embedding of actual elements is determined by a new method, localBlock(). This is a member of the Range class. It has no arguments, and returns an object of class hpjava.lang.Block, declared as:

\begin{displaymath}
\begin{minipage}[t]{\linewidth}\small\begin{verbatim}clas...
... glb_bas ;
public int glb_stp ;
}\end{verbatim}\end{minipage}\end{displaymath}

The value of count specifies the number of actual array elements in the selected block, and the pair sub_bas, sub_stp define a base and step for the local subscript values associated with those elements. The pair glb_bas, glb_stp define a base and step for the global index values associated with the elements.

(Unfortunately there is an overlap of terminology between the blocks of an arbitrary range, and the distribution format of one particular kind of range, viz. BlockRange. The Block class and the localBlock() method are in no way specifically tied to the ``block-wise'' distribution format embodied in BlockRange. Equivalent methods are defined for any range.)

One of the most important applications of the localBlock() method is in the translation scheme for the overall construct, and this is a fairly natural way to illustrate its use.

Consider this fragment of HPJava:

\begin{displaymath}
\begin{minipage}[t]{\linewidth}\small\begin{verbatim}floa...
...(i = x for :)
a [i] = (float) i\lq  ;\end{verbatim}\end{minipage}\end{displaymath}

By applying dimension splitting to the array a, the overall construct can be ``translated'' as illustrated in Figure 7.10.

Figure 7.10: Recursive translation of a simple overall construct.
SOURCE:

\begin{displaymath}
\begin{minipage}[t]{\linewidth}\small\begin{verbatim}floa...
...(i = x for :)
a [i] = (float) i\lq  ;\end{verbatim}\end{minipage}\end{displaymath}

TRANSLATION:

\begin{displaymath}
\begin{minipage}[t]{\linewidth}\small\begin{verbatim}floa...
...t) (b.glb_bas + b.glb_stp * l) ;
}\end{verbatim}\end{minipage}\end{displaymath}

At first sight we have just replaced one overall construct with another. But the range d can be assumed to be a process dimension, so the role of the new overall construct is essentially a formality--k only has one location in each process, and the new overall doesn't yield a local loop. Although the k subscript of as is required by the rules of the language, it is essentially does nothing, because there is only one locally held location. In fact the only real effect of the reduced construct is to change the active process group inside its body.

So effectively, this transformation has reduced the overall construct to a sequential local for loop. Subscripting with a distributed index has essentially been reduced to subscripting into the sequential local array as[[k, :]]. The subscript expression and the global index expression on the right hand side have been reduced to expressions linear in the loop index. Such expressions can be translated efficiently by a compiler using a strength reduction optimization (replacing the linear expressions with an incremented accumulation variable).

Because it involves reducing an overall construct to a kind of lower-level overall, we sometimes call this general scheme recursive translation. It's an interesting device formally, but note the actual HPJava translator doesn't work this way--it translates directly to Java, following the rules in Appendix A.

Note that technically the localBlock() inquiry is incoherent (see section 5.4). This can be fixed by adding an argument that specified the local coordinate, similar to the argument of the coherent (though otherwise lower-level) block() method that will be introduced in section 7.7. This argument was omitted here to simplify usage, and also to allow the unique result of localBlock() to be computed once and cached inside the Range object. Effectively it uses the incoherent crd() inquiry internally.



Subsections
next up previous contents index
Next: Ghost regions and dimension Up: Low level SPMD programming Previous: Dimension Splitting   Contents   Index
Bryan Carpenter 2003-04-15