next up previous contents index
Next: Non-local blocks Up: An extended example: prefix Previous: Cyclic distribution formats   Contents   Index


Optimization for ``general'' distribution formats

We can combine the procedures given in the preceding sections to produce a single optimized prefix procedure that works for any distribution format by using the format() inquiry on Range. The code is given in Figure 7.20.

The inquiry format() returns the constant DIST_DIMENSION if the range is a process dimension and DIST_CYCLIC if it is a cyclically distributed range (or subrange). In these two cases we use the naive algorithm. In all other cases we use the improved blockPrefix().

Notice that blockPrefix() will work OK for a collapsed range (corresponding to a sequential array). It is permitted to do dimension splitting on a collapsed array. The range of the resulting distributed dimension is a ``degenerate'' internal process dimension of size 1. Everything will work, although it would probably be more efficient to test for DIST_COLLAPSED and handle this in a separate, optimized subroutine.

Figure 7.20: Optimized parallel prefix for any distribution format.

\begin{displaymath}
\begin{minipage}[t]{\linewidth}\small\begin{verbatim}stat...
...efault :
blockPrefix(a) ;
}
}
}\end{verbatim}\end{minipage}\end{displaymath}

In the interests of simplifying the presentation, we left a bug in the algorithm of section 7.6.1. It will fail in the case where the original array range is a subrange with negative alignment stride. To deal with this case the ``recursive'' call to compute global prefixes could be replaced with a call that computes the global suffix--partial sums of elements that start at the topmost element and increment downwards So stage 2 in Figure 7.17 could be replaced with something like7.6:

\begin{displaymath}
\begin{minipage}[t]{\linewidth}\small\begin{verbatim}if(x...
... 0)
prefix(t) ;
else
suffix(t) ;\end{verbatim}\end{minipage}\end{displaymath}

The suffix computation procedure can use essentially the same algorithms as the prefixes, with some loop orders and shift directions reversed. There is no need to change the individual block processing to take account of negative alignment strides: the localBlock() inquiry does this automatically. The sub_stp field will be negative in this case.


next up previous contents index
Next: Non-local blocks Up: An extended example: prefix Previous: Cyclic distribution formats   Contents   Index
Bryan Carpenter 2003-04-15