In this section we will give a detailed example of how dimension splitting can be combined with the new inquiries on the Range class to write optimized parallel code that works for input arrays with general distribution formats.
A prefix computation (also sometimes called a ``scan'' operation) takes an array as input, and outputs an array containing the set of partial sums of the elements of that array. If the input array is a and the result array is r, we want the outcome to be:
![\begin{eqnarray*}
\mbox{\tt r [} 0 \mbox{\tt ]} & = & 0 \\
\mbox{\tt r [} i \...
... \ldots + \mbox{\tt a [} i - 1 \mbox{\tt ]}
; \quad 1 \le i < N
\end{eqnarray*}](img307.png)
We can give a straightforward data-parallel algorithm for this
computation using a doubling technique. Possible code is given
in Figure 7.15. The algorithm is very simple.
In a given iteration, the current value in the result array is shifted
by an amount that doubles between iterations. The shifted array is
added into the current array. The algorithm takes
iterations to complete. Its operation is illustrated in Figure
7.16.
As often happens, the pure data parallel program is concise and quite
readable. However it isn't very efficient. It requires a total of
floating point additions, whereas the naive sequential
algorithm only needs
. So we can only expect useful parallel speedup
if
. The pure data parallel version
needs optimization.