next up previous contents index
Next: Cyclic distribution formats Up: An extended example: prefix Previous: An extended example: prefix   Contents   Index


Optimization for block distribution formats

For the most straightforward, block-based distribution formats--these include arrays parameterized BlockRange, ExtBlockRange and IrregRange, there is a fairly straightforward optimization. We can do prefix computation within individual blocks, then do a global exclusive prefix combining the sums of the blocks. Finally we add the global prefix back to the incomplete prefixes within the blocks. This is illustrated in Figures 7.17, 7.18.

Figure 7.17: Parallel prefix optimized for block-wise distribution formats.

\begin{displaymath}
\begin{minipage}[t]{\linewidth}\small\begin{verbatim}stat...
...l ;as [k, sub] += sum ;
}
}
}\end{verbatim}\end{minipage}\end{displaymath}

Figure 7.18: Prefix optimized for block distributions: illustration for N = 8, P = 3.
\includegraphics[width=5in]{blockPrefix.eps}

This version has a hope to be reasonably efficient if $N \gg P$, so arithmetic costs have a chance to dominate communication cost. It still does about $2N$ total additions operations instead the $N$ operations for the sequential code. But in principle if $P > 2$ it should be possible to compensate for this factor, and gain some parallel speedup.

The code given here will work for the ranges mentioned above, and subranges of these7.5. It will not work for cyclic distributions.


next up previous contents index
Next: Cyclic distribution formats Up: An extended example: prefix Previous: An extended example: prefix   Contents   Index
Bryan Carpenter 2003-04-15