The Range class hierarchy is extended. The full HPJava Range hierarchy is illustrated in Figure 3.8.
The BlockRange distribution format and objects of class Dimension are familiar, although Dimension was not presented previously as a range class.
CyclicRange corresponds to the cyclic distribution format available in HPF. CyclicRange distributions can result in better load balancing than simple BlockRange distributions. For example, dense matrix algorithms don't have the locality that favors the block distributions used in stencil updates, but do involve phases where parallel computations are restricted to subsections of the whole array. In block distribution format, these sections may correspond to only a fraction of the available processes, leading to a very poor distribution of workload.
As an artificial example, suppose the overall constructs only traverse half the ranges of the array they process. As shown in Figure 3.9, this leads to a very poor distribution of workload. A single process does nearly all the work, one other process has a few elements to work on, and all the other processes are idle.
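The imbalance described above can be checked with a small calculation. The following sketch is plain Java, not HPJava syntax, and its sizes are assumptions chosen only for illustration: an 8 x 8 array on a 2 x 3 process grid, with loops that visit the first half of each range. The class and method names are hypothetical. It counts how many visited elements each process owns under a block and under a cyclic distribution.

```java
import java.util.Arrays;

final class DistributionLoadSketch {
    // Owner of global index g when n elements are split into contiguous
    // blocks of size ceil(n / p) over p processes.
    static int blockOwner(int g, int n, int p) {
        int blockSize = (n + p - 1) / p;
        return g / blockSize;
    }

    // Owner of global index g when elements are dealt out round-robin.
    static int cyclicOwner(int g, int p) {
        return g % p;
    }

    // Count the elements each process of a p0 x p1 grid owns when the
    // loops visit only indices 0 .. hi-1 of an n x n array.
    static int[][] workload(int n, int p0, int p1, int hi, boolean cyclic) {
        int[][] count = new int[p0][p1];
        for (int i = 0; i < hi; i++) {
            for (int j = 0; j < hi; j++) {
                int r = cyclic ? cyclicOwner(i, p0) : blockOwner(i, n, p0);
                int c = cyclic ? cyclicOwner(j, p1) : blockOwner(j, n, p1);
                count[r][c]++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumed sizes: 8 x 8 array, 2 x 3 grid, half of each range visited.
        System.out.println("block:  " + Arrays.deepToString(workload(8, 2, 3, 4, false)));
        System.out.println("cyclic: " + Arrays.deepToString(workload(8, 2, 3, 4, true)));
        // block:  [[12, 4, 0], [0, 0, 0]]
        // cyclic: [[4, 2, 2], [4, 2, 2]]
    }
}
```

Under these assumed sizes the block format gives one process 12 of the 16 visited elements and leaves four processes idle, while the cyclic format gives every process some work.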
If the ranges are instead created with CyclicRange, the workload is distributed much better. Figure 3.10 shows that the imbalance is not nearly as extreme. Notice that nothing changed in the program apart from the choice of range constructors. This is an attractive feature that HPJava shares with HPF. If HPJava programs are written in a sufficiently pure data parallel style (using overall loops and collective array operations), it is often possible to change the distribution format of arrays dramatically while leaving much of the code that processes them unchanged.
The ExtBlockRange distribution describes a BlockRange distribution extended with ghost regions. A ghost region is extra space "around the edges" of the locally held block of multiarray elements. These extra locations can cache some of the element values properly belonging to adjacent processes. With ghost regions, the inner loop of a stencil update algorithm can be written in a simple way, since the edges of the block need no special treatment when accessing neighboring elements: shifted indices locate the proper values cached in the ghost region. This feature is very important in real codes, so HPJava has a specific extension to make it possible.
For ghost regions, we relax the rule that the subscript of a multiarray should be a distributed index. As special syntax, an expression of the form name ± expression is a legal subscript if name is a distributed index declared in an overall or at control construct and expression is an integer expression, generally a small constant. This is called a shifted index. Its significance is that an element displaced from the original location will be accessed. If the shift puts the location outside the local block plus surrounding ghost region, an exception will occur.
Ghost regions are not magic. The values cached around the edges of a local block can only be made consistent with the values held in blocks on adjacent processes by a suitable communication. A library function called writeHalo updates the cached values in the ghost regions with the proper element values from neighboring processes. Figure 3.11 shows a solution of the Laplace equation using red-black relaxation with ghost regions. The last two arguments of the ExtBlockRange constructor define the widths of the ghost regions added to the bottom and top (respectively) of the local block.
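The role of ghost regions can be illustrated with a one-dimensional simulation. The sketch below is plain Java rather than HPJava, and everything in it is an assumption made for illustration: the class name, the ghost width of 1, the requirement that the process count divides the array length, and the helper exchangeGhosts, which is a hypothetical stand-in for a halo update and not the Adlib function itself. Each block is stored with one extra cell at both ends, so the three-point update reads its neighbors without special cases at block edges.

```java
import java.util.Arrays;

final class GhostRegionSketch {
    static final int GHOST = 1; // assumed ghost width on each side of a block

    // Split a global array into p equal blocks (assumes p divides the
    // length), storing each block with GHOST extra cells at both ends.
    static double[][] distribute(double[] global, int p) {
        int block = global.length / p;
        double[][] local = new double[p][block + 2 * GHOST];
        for (int r = 0; r < p; r++)
            for (int k = 0; k < block; k++)
                local[r][GHOST + k] = global[r * block + k];
        return local;
    }

    // Hypothetical stand-in for a halo update: copy the edge values owned
    // by neighboring blocks into the local ghost cells.
    static void exchangeGhosts(double[][] local) {
        int p = local.length;
        int block = local[0].length - 2 * GHOST;
        for (int r = 0; r < p; r++) {
            if (r > 0)     local[r][GHOST - 1] = local[r - 1][GHOST + block - 1];
            if (r < p - 1) local[r][GHOST + block] = local[r + 1][GHOST];
        }
    }

    // Three-point average. Interior elements read positions i - 1 and
    // i + 1 directly; at block edges those positions are ghost cells.
    static double[] smooth(double[] global, int p) {
        int n = global.length, block = n / p;
        double[][] local = distribute(global, p);
        exchangeGhosts(local);
        double[] out = new double[n];
        for (int r = 0; r < p; r++)
            for (int k = 0; k < block; k++) {
                int g = r * block + k, i = GHOST + k;
                out[g] = (g == 0 || g == n - 1)
                        ? local[r][i] // global boundary values are kept fixed
                        : 0.5 * (local[r][i - 1] + local[r][i + 1]);
            }
        return out;
    }

    public static void main(String[] args) {
        double[] u = {0, 1, 4, 9, 16, 25, 36, 49};
        System.out.println(Arrays.toString(smooth(u, 4)));
        // [0.0, 2.0, 5.0, 10.0, 17.0, 26.0, 37.0, 49.0]
    }
}
```

If exchangeGhosts is skipped, the ghost cells keep their initial zeros and the values at block edges come out wrong, which is the sense in which ghost regions are not magic.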
CollapsedRange is a range that is not distributed, i.e. all elements of the range are mapped to a single process. The example creates an array distributed over the processes in q, with the first dimension collapsed. Figure 3.12 illustrates the situation. However, CollapsedRange is treated as just another distributed range class, so a subscript in a collapsed dimension must still be a distributed index. To resolve this common problem, HPJava has a language extension adding the idea of sequential dimensions. If the type signature of a distributed array has the asterisk symbol, *, in a specific dimension, that dimension is implicitly collapsed and can be subscripted with an integer expression, as in an ordinary multiarray.
With a sequential dimension, the at construct is retained to deal with the distributed dimension, but there is no need for distributed indices in the sequential dimension. The array constructor is passed integer extent expressions for sequential dimensions. All operations usually applicable to distributed arrays are also applicable to arrays with sequential dimensions. As we saw in the previous section, it is also possible for all dimensions to be sequential, in which case we recover sequential multidimensional arrays.
On a two-dimensional process grid, suppose a one-dimensional distributed array b with BlockRange distribution format is created, and the array dimension is distributed over the first dimension of the process grid, but no array dimension is distributed over the second. The array b is then replicated over the second process dimension. Independent copies of the whole array are created at each coordinate where replication occurs.
Replication and collapsing can both occur in a single array. For example, the dimension of an array c may be sequential while the array is replicated over both dimensions of the process grid.
In the matrix multiplication example, the rows of a are replicated in the process dimension associated with y. Similarly the columns of b are replicated in the dimension associated with x. Hence all arguments for the inner scalar product are already in place for the computation, and no communication is needed.
We would be very lucky to come across three arrays with such a special alignment relation (distribution format relative to one another). There is an important function in the Adlib communication library called remap, which takes a pair of arrays as arguments. These must have the same shape and type, but they can have unrelated distribution formats. The elements of the source array are copied to the destination array. In particular, if the destination array has a replicated mapping, the values in the source array are broadcast appropriately.
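The effect of such a redistribution can be mimicked in a few lines. The sketch below is plain Java and does not use the Adlib API: the class and method names are hypothetical, each process is modeled as one row of a two-dimensional Java array, and the six-element array over three processes is an assumed example. It copies a block-distributed source into a cyclic destination and into a replicated destination, locating every element through its global index.

```java
import java.util.Arrays;

final class RemapSketch {
    // Block size used for a block distribution of n elements over p processes.
    static int blockSize(int n, int p) {
        return (n + p - 1) / p;
    }

    // Build the per-process storage of a block-distributed array.
    static int[][] scatterBlock(int[] global, int p) {
        int n = global.length, b = blockSize(n, p);
        int[][] local = new int[p][b];
        for (int g = 0; g < n; g++) local[g / b][g % b] = global[g];
        return local;
    }

    // Copy a block-distributed source into a cyclic destination:
    // global index g lives at src[g / b][g % b] and moves to dst[g % p][g / p].
    static int[][] remapToCyclic(int[][] src, int n) {
        int p = src.length, b = blockSize(n, p);
        int[][] dst = new int[p][blockSize(n, p)];
        for (int g = 0; g < n; g++) dst[g % p][g / p] = src[g / b][g % b];
        return dst;
    }

    // Copy into a replicated destination: every process receives all
    // n values, which amounts to a broadcast of the source.
    static int[][] remapToReplicated(int[][] src, int n) {
        int p = src.length, b = blockSize(n, p);
        int[][] dst = new int[p][n];
        for (int r = 0; r < p; r++)
            for (int g = 0; g < n; g++) dst[r][g] = src[g / b][g % b];
        return dst;
    }

    public static void main(String[] args) {
        int[] global = {10, 11, 12, 13, 14, 15};
        int[][] block = scatterBlock(global, 3);
        System.out.println(Arrays.deepToString(block));
        // [[10, 11], [12, 13], [14, 15]]
        System.out.println(Arrays.deepToString(remapToCyclic(block, 6)));
        // [[10, 13], [11, 14], [12, 15]]
        System.out.println(Arrays.deepToString(remapToReplicated(block, 6)));
    }
}
```

In a real distributed setting each of these copies would involve communication between processes; here the rows of a single Java array stand in for the separate local memories.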
We can use remap to adapt the program in Figure 3.13 and create a general purpose matrix multiplication routine. Besides the remap function, this example introduces the two inquiry methods grp() and rng(), which are defined for any multiarray. The inquiry grp() returns the distribution group of the array, and the inquiry rng() returns the r-th range of the array, where the integer argument r lies in the range 0, ..., R - 1 and R is the rank (dimensionality) of the array.
Figure 3.15 also introduces the most general form of multiarray constructors. So far, we have seen arrays distributed over the whole of the active process group, as defined by an enclosing on construct. In general, an on clause attached to an array constructor itself can specify that the array is distributed over some subset of the active group. This allows one to create an array outside the on construct that will process its elements.