In this section, the Range class hierarchy is extended. The full HPJava Range
hierarchy is illustrated in Figure 3.8.
In the previous section we saw the BlockRange
distribution format and objects of class Dimension. The
Dimension class is familiar, although it was not previously
presented as a range class. CyclicRange corresponds to
the cyclic distribution format available in HPF.
CyclicRange distributions can result in better load
balancing than simple BlockRange distributions. For
example, dense matrix algorithms lack the locality that
favors the block distributions used in stencil updates, but they do involve
phases where parallel computations are restricted to subsections of
the whole array. With a block distribution, these subsections may
correspond to only a fraction of the available processes, leaving the
other processes idle. Consider an artificial example in which the
overall constructs only traverse
half the ranges of the array they process. As shown in Figure
3.9, this leads to a very poor distribution of
workload: each traversed half-range lies within a contiguous set of
blocks, so most of the work falls on the few processes holding them.
By changing the range constructors from BlockRange to CyclicRange,
we obtain a much better distribution of workload. Figure 3.10
shows that the imbalance is not nearly as extreme.
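To make the load-balancing argument concrete, here is a small Java sketch that counts how many loop iterations each process of a 2-by-2 grid would execute. It assumes the loops traverse the lower half of both ranges of an N-by-N array, and that a block distribution uses blocks of ceil(N/P) elements. The class and method names are hypothetical, and the code models only index ownership, not HPJava's runtime.

```java
// Hypothetical sketch: count how many iterations of a doubly nested loop over
// the lower half of each range land on each process of a p-by-p grid.
final class Workload {
    // Block distribution, assuming a block size of ceil(n / p).
    static int blockOwner(int g, int n, int p) {
        return g / ((n + p - 1) / p);
    }

    // Cyclic distribution: global index g lives on process g mod p.
    static int cyclicOwner(int g, int p) {
        return g % p;
    }

    // count[pi][pj] = number of iterations executed by process (pi, pj).
    static int[][] work(int n, int p, boolean cyclic) {
        int[][] count = new int[p][p];
        for (int i = 0; i < n / 2; i++) {
            for (int j = 0; j < n / 2; j++) {
                // Only the ownership rule differs between the two cases;
                // the loop itself is unchanged.
                int pi = cyclic ? cyclicOwner(i, p) : blockOwner(i, n, p);
                int pj = cyclic ? cyclicOwner(j, p) : blockOwner(j, n, p);
                count[pi][pj]++;
            }
        }
        return count;
    }
}
```

For N = 8 on a 2-by-2 grid, the block version puts all 16 iterations on process (0, 0), while the cyclic version gives each process 4. Only the ownership rule changes, which mirrors the point that the program differs only in its choice of range constructor.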
Notice that nothing changed in the program apart from the choice of
range constructors. This is an attractive feature that HPJava shares
with HPF. If HPJava programs are written in a sufficiently pure data-parallel
style (using overall loops and collective array
operations), it is often possible to change the distribution format of
arrays dramatically while leaving much of the code that processes them
unchanged.
The ExtBlockRange distribution describes a BlockRange
distribution extended with ghost regions. A ghost region is
extra space ``around the edges'' of the locally held block of
multiarray elements. These extra locations can cache some of the
element values that properly belong to adjacent processes. With ghost
regions, the inner loop of a stencil update can be written simply,
since the edges of the block need no special treatment when accessing
neighboring elements. Shifted indices locate the proper values cached
in the ghost region. This is a very important feature in real codes,
so HPJava has a specific extension to support it.
For ghost regions, we relax the rule that the subscript of a
multiarray must be a distributed index. As special syntax, a
subscript of the form i + expr or i - expr is permitted, where i is a
distributed index bound by an overall or at control construct and
expr is an integer expression, generally a small constant. This is
called a shifted index. A shifted index accesses the element displaced
by expr from the location that i alone would select. If the shift puts
the location outside the local block plus its surrounding ghost region,
an exception occurs.
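The following Java sketch models one process's local storage for a one-dimensional block with ghost regions. It assumes (as an illustration, not HPJava's documented layout) that the ghost cells sit directly before and after the owned elements in one flat array. The hypothetical `shifted(k, shift)` method plays the role of a subscript i + shift: it succeeds anywhere in the block plus ghosts and throws otherwise, and the three-point average shows why edge elements need no special case.

```java
// Hypothetical model of one process's segment of a 1-D array with ghost
// regions; not HPJava's actual runtime representation.
final class GhostSegment {
    final int count;     // elements owned by this process
    final int wLo, wHi;  // ghost widths below and above the owned block
    final double[] data; // layout: [wLo ghosts | count owned | wHi ghosts]

    GhostSegment(int count, int wLo, int wHi) {
        this.count = count;
        this.wLo = wLo;
        this.wHi = wHi;
        this.data = new double[wLo + count + wHi];
    }

    // Models a[i + shift], where i is the k-th locally owned index.
    // Shifts that stay inside block plus ghosts are legal; others throw.
    double shifted(int k, int shift) {
        int pos = wLo + k + shift;
        if (pos < 0 || pos >= data.length) {
            throw new IndexOutOfBoundsException(
                "shift " + shift + " falls outside block plus ghost region");
        }
        return data[pos];
    }

    void setOwned(int k, double v) { data[wLo + k] = v; }

    // Raw storage slot, ghost cells included (used here to fake a halo fill).
    void setRaw(int pos, double v) { data[pos] = v; }

    // Three-point average over every owned element (needs wLo, wHi >= 1):
    // edge elements need no special case because shifts of -1 and +1
    // reach into the ghost cells.
    double[] average() {
        double[] out = new double[count];
        for (int k = 0; k < count; k++) {
            out[k] = 0.5 * (shifted(k, -1) + shifted(k, +1));
        }
        return out;
    }
}
```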
Ghost regions are not magic. The values cached around the edges of a
local block can only be made consistent with the values held in blocks on
adjacent processes by a suitable communication. A library function
called writeHalo updates the cached values in the ghost regions
with proper element values from neighboring processes.
Figure 3.11 is a version of the Laplace equation solver using
red-black relaxation with ghost regions. The last two arguments of the
ExtBlockRange constructor define the widths of the ghost
regions added to the bottom and top (respectively) of the local block.
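A minimal sketch of red-black relaxation with ghost regions, simulating P processes as rows of a Java array and solving the one-dimensional Laplace equation with fixed end values (a simplification of the two-dimensional program in Figure 3.11). Ghost widths of 1 at the bottom and top are assumed, since a three-point stencil needs only its immediate neighbors, and the block size ceil(N/P) is also an assumption. The `writeHalo` method here is a hypothetical stand-in that copies neighboring edge values into ghost cells; it is not the Adlib signature. The halo is refreshed before each color sweep because points of one color read only points of the other color.

```java
// Hypothetical simulation (not HPJava or Adlib code): red-black relaxation of
// the 1-D Laplace equation on an array block-distributed over p simulated
// processes, each block padded with one ghost cell below and one above.
final class RedBlack1D {
    final int n, p, b;       // global size, process count, block size
    final double[][] local;  // local[q] = {ghost, owned elements..., ghost}

    RedBlack1D(double[] global, int p) {
        this.n = global.length;
        this.p = p;
        this.b = (n + p - 1) / p;           // assumed block size ceil(n / p)
        this.local = new double[p][b + 2];  // ghost widths 1 (bottom), 1 (top)
        for (int g = 0; g < n; g++) {
            local[g / b][1 + g % b] = global[g];
        }
    }

    // Elements owned by simulated process q (trailing blocks may be short).
    int owned(int q) {
        return Math.max(0, Math.min(b, n - q * b));
    }

    // Stand-in for a halo update: copy each neighbor's edge element into
    // the adjacent ghost cell so the cached values are consistent.
    void writeHalo() {
        for (int q = 0; q < p; q++) {
            if (q > 0 && owned(q - 1) > 0) {
                local[q][0] = local[q - 1][owned(q - 1)];
            }
            if (q < p - 1 && owned(q + 1) > 0) {
                local[q][owned(q) + 1] = local[q + 1][1];
            }
        }
    }

    // Set every interior point of one color (0 = even, 1 = odd global index)
    // to the average of its neighbors, reading across block edges via ghosts.
    void sweep(int color) {
        writeHalo();  // neighbors have the other color, so refresh them first
        for (int q = 0; q < p; q++) {
            for (int k = 0; k < owned(q); k++) {
                int g = q * b + k;  // global index
                if (g == 0 || g == n - 1 || g % 2 != color) continue;
                local[q][k + 1] = 0.5 * (local[q][k] + local[q][k + 2]);
            }
        }
    }

    // Collect the owned elements back into one global array.
    double[] gather() {
        double[] out = new double[n];
        for (int g = 0; g < n; g++) {
            out[g] = local[g / b][1 + g % b];
        }
        return out;
    }
}
```

Starting from zeros with end values 0 and 7 on eight points split over three simulated processes, repeated red and black sweeps converge to the linear solution u[g] = g.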
CollapsedRange is a range that is not distributed, i.e. all
elements of the range are mapped to a single process. For example, an
array can be distributed over the process grid q with its first
dimension collapsed. Figure 3.12 illustrates the
situation for the case N = 8.
However, if an array is declared in this way with a collapsed range,
one still has the awkward restriction that a subscript must be a
distributed index: one can't exploit the implicit locality of a
collapsed range, because the compiler treats CollapsedRange as
just another distributed range class.
To resolve this common problem, HPJava has a language
extension that adds the idea of sequential dimensions. If
the type signature of a distributed array has the asterisk symbol,
*, in a specific dimension, that dimension is implicitly
collapsed and can be subscripted with an integer expression, as in an
ordinary multiarray. For example, consider an array with one sequential
and one distributed dimension. The at construct is retained to deal
with the distributed dimension, but no distributed indices are needed
in the sequential dimension. The array constructor is passed integer
extent expressions for sequential dimensions. All operations usually
applicable to distributed arrays also apply to arrays with
sequential dimensions. As we saw in the previous
section, it is also possible for all dimensions to be sequential, in
which case we recover sequential multidimensional arrays.
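To illustrate what a sequential dimension buys, here is a hypothetical Java model of one process's share of an array whose first dimension is sequential and whose second is block-distributed (a block size of ceil(N/P) is assumed). The sequential subscript is an ordinary `int`; only the distributed subscript is checked and translated to a local offset, which is roughly the job the at construct does in the text.

```java
// Hypothetical model (not HPJava's runtime): one process's share of an array
// whose first dimension is sequential and whose second is block-distributed.
final class SeqDistArray {
    final int lo;            // first global column held by this process
    final int count;         // number of columns held
    final double[][] local;  // local[i][c]: row i, c-th held column

    SeqDistArray(int rows, int cols, int procs, int rank) {
        int b = (cols + procs - 1) / procs;  // assumed block size ceil(cols / procs)
        this.lo = Math.min(cols, rank * b);
        this.count = Math.max(0, Math.min(b, cols - lo));
        this.local = new double[rows][count]; // every row is held in full
    }

    // True if global column j lives on this process.
    boolean owns(int j) {
        return j >= lo && j < lo + count;
    }

    // Row i is a plain int subscript (sequential dimension); column j is a
    // global index that must be owned here and is translated to a local offset.
    double get(int i, int j) {
        return local[i][offset(j)];
    }

    void set(int i, int j, double v) {
        local[i][offset(j)] = v;
    }

    private int offset(int j) {
        if (!owns(j)) {
            throw new IllegalArgumentException("column " + j + " is not held here");
        }
        return j - lo;
    }
}
```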
On a two-dimensional process grid, suppose a one-dimensional
distributed array b with BlockRange distribution format is created,
with the array dimension distributed over the first dimension of the
process grid and no array dimension distributed over the second. The
array b is then replicated over the second process dimension:
independent copies of the whole array are created at each coordinate
where replication occurs.
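Replication and collapsing can both occur in a single array. For
example, an array c may have a sequential dimension while being
created on the grid p. In that case the range of c is sequential, and
the array is replicated over both dimensions of p.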
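The hypothetical Java helpers below list which coordinates (x, y) of a p-by-q process grid hold a copy of a given element, for an array like b (block-distributed over the first grid dimension, with ceil(N/P) blocks assumed) and an array like c (assumed here to be one-dimensional, with its dimension sequential). They model only placement, not storage or communication.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers: which coordinates (x, y) of a p-by-q process grid
// hold a copy of a given array element.
final class Replication {
    // Like b: one dimension of extent n, block-distributed over grid
    // dimension 0 (assumed block size ceil(n / p)); nothing is distributed
    // over grid dimension 1, so the element is replicated along it.
    static List<int[]> holdersOfB(int k, int n, int p, int q) {
        int x = k / ((n + p - 1) / p);
        List<int[]> out = new ArrayList<>();
        for (int y = 0; y < q; y++) {
            out.add(new int[] {x, y});
        }
        return out;
    }

    // Like c: its dimension is sequential, so every element is held by
    // every process on the grid.
    static List<int[]> holdersOfC(int p, int q) {
        List<int[]> out = new ArrayList<>();
        for (int x = 0; x < p; x++) {
            for (int y = 0; y < q; y++) {
                out.add(new int[] {x, y});
            }
        }
        return out;
    }
}
```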
In the matrix multiplication program of Figure 3.13, the rows of a are
replicated in the process dimension associated with y. Similarly, the
columns of b are replicated in the dimension associated with
x. Hence all arguments for each inner scalar product are
already in place for the computation, and no communication is needed.
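A Java simulation of the alignment just described, assuming the result c is block-distributed over both grid dimensions with ceil-sized blocks. Each simulated process (px, py) copies out only the rows of a and columns of b that the replicated layout would already give it, then computes its block of c from that local data alone. The class name is hypothetical.

```java
import java.util.Arrays;

// Hypothetical simulation: process (px, py) owns a block of c, holds the
// matching full rows of a (replicated along py) and the matching full
// columns of b (replicated along px), and needs nothing else.
final class AlignedMatMul {
    static double[][] multiply(double[][] a, double[][] b, int p, int q) {
        int n = a.length, inner = b.length, m = b[0].length;
        int rb = (n + p - 1) / p, cb = (m + q - 1) / q;  // assumed block sizes
        double[][] c = new double[n][m];
        for (int px = 0; px < p; px++) {
            for (int py = 0; py < q; py++) {
                int i0 = Math.min(n, px * rb), i1 = Math.min(n, i0 + rb);
                int j0 = Math.min(m, py * cb), j1 = Math.min(m, j0 + cb);
                // Local data of process (px, py).
                double[][] aLoc = Arrays.copyOfRange(a, i0, i1);
                double[][] bLoc = new double[inner][j1 - j0];
                for (int k = 0; k < inner; k++) {
                    for (int j = j0; j < j1; j++) {
                        bLoc[k][j - j0] = b[k][j];
                    }
                }
                // Every inner product uses only aLoc and bLoc.
                for (int i = i0; i < i1; i++) {
                    for (int j = j0; j < j1; j++) {
                        double s = 0;
                        for (int k = 0; k < inner; k++) {
                            s += aLoc[i - i0][k] * bLoc[k][j - j0];
                        }
                        c[i][j] = s;
                    }
                }
            }
        }
        return c;
    }
}
```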
We would be very lucky to come across three arrays with such a
special alignment relation (distribution format relative to one
another). There is an important function in the Adlib communication
library called remap, which takes a pair of arrays as
arguments. These must have the same shape and type, but they can have
unrelated distribution formats. The elements of the source array are
copied to the destination array. In particular, if the destination
array has a replicated mapping, the values in the source array are
broadcast appropriately.
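The data movement performed by remap can be sketched in Java for the one-dimensional case, with processes simulated as rows of an array. This is not Adlib's implementation, which communicates between real processes; it only shows where each element ends up when block-distributed data are remapped to a cyclic layout, or to a fully replicated layout where the copy amounts to a broadcast. The block size ceil(N/P) and all names are assumptions.

```java
import java.util.Arrays;

// Hypothetical simulation of remap's data movement between 1-D layouts;
// "processes" are just rows of a Java array.
final class RemapSketch {
    // Split a global array into p blocks (assumed block size ceil(n / p)).
    static double[][] toBlock(double[] g, int p) {
        int b = (g.length + p - 1) / p;
        double[][] loc = new double[p][];
        for (int q = 0; q < p; q++) {
            int lo = Math.min(g.length, q * b);
            int hi = Math.min(g.length, lo + b);
            loc[q] = Arrays.copyOfRange(g, lo, hi);
        }
        return loc;
    }

    // Block -> cyclic over the same p processes: global index g moves from
    // block owner g / b (offset g % b) to process g % p (offset g / p).
    static double[][] blockToCyclic(double[][] src, int n) {
        int p = src.length, b = (n + p - 1) / p;
        double[][] dst = new double[p][];
        for (int q = 0; q < p; q++) {
            dst[q] = new double[(n - q + p - 1) / p];  // count of g with g % p == q
        }
        for (int g = 0; g < n; g++) {
            dst[g % p][g / p] = src[g / b][g % b];
        }
        return dst;
    }

    // Block -> fully replicated: every process receives every element,
    // i.e. the source values are broadcast.
    static double[][] blockToReplicated(double[][] src, int n) {
        int p = src.length, b = (n + p - 1) / p;
        double[][] dst = new double[p][n];
        for (int g = 0; g < n; g++) {
            for (int q = 0; q < p; q++) {
                dst[q][g] = src[g / b][g % b];
            }
        }
        return dst;
    }
}
```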
We can use remap
to adapt the program in Figure 3.13 and create
a general-purpose matrix multiplication routine. Besides the
remap function, this example introduces two inquiry
methods, grp() and rng(), which are defined for any
multiarray. The inquiry grp() returns the distribution
group of the array, and the inquiry rng(r) returns the range
of dimension r of the array. In general, an on clause attached
to an array constructor itself can specify that the array is
distributed over some subset of the active group. This allows one to
create an array outside the on construct that will process its
elements.