In a distributed array with ghost regions, the memory for the locally held block of elements is allocated with extra space around the edges. These extra locations can be used to cache some of the element values properly belonging to adjacent processors. The inner loop of algorithms involving stencil updates can then written very simply. In accessing neighbouring elements, the edges of the block don't need special treatment. Rather than throwing an exception for an out of range subscript, shifted indices find the proper values cached in the ghost region. This is such an important technique in real codes that HPJava has a special extension to make it possible.
This is one place where the rule that the subscript of a distributed
array must be a bound location is relaxed.
In a special, slightly
idiosyncratic, piece of syntax, the following expression is
regarded as a bound location
overall(i = x for 1 : N - 2) overall(j = y for 1 : N - 2) a [i, j] = 0.25 * (a [i - 1, j] + a [i + 1, j] + a [i, j - 1] + a [i, j + 1]) ;is allowed if the array a has suitable ghost extensions3.2.
Ghost regions are not magic. The extra locations around the edge of a block can only be made consistent with the values inside blocks on adjacent processes by a suitable communication. A library function called writeHalo updates the values cached in the ghost regions with proper element values from neigbouring processes.
Figure 3.7 is a version of the Laplace program that uses ghost regions. The ellipses stand for code that is unchanged from Figure 2.7. The last two arguments of the ExtBlockRange constructor define the widths of the ghost regions added to the bottom and top (respectively) of the local block. In this version we still introduced one temporary array, called b. The reason is is that in Jacobi relaxation one is supposed to express all the new values in terms of values from the previous iteration. The library function copy copies elements between two aligned arrays (copy does not implement communication; if it is passed non-aligned arrays, and exception occurs).
As a matter of fact, if we removed the temporary, b, and assigned a elements on-the-fly in terms of the current, partially updated a array, nothing very bad would happen. The algorithm may even converge faster because it is locally using the more efficient Gauss-Siedel relaxation scheme. Figure 3.8 is an implementation of the ``red-black'' scheme (a true Gauss-Siedel scheme), in which all even sites are updated, then all odd sites are updated in a separate phase. There is no need to introduce the temporary array b. This example illustrates the use of a stepped triplet in an overall construct.
The ``footprint'' of the stencil update can be more general. Figure 3.9 shows an implementation of Conway's Game of Life. The local update involves dependences on diagonal neighbours. This example also shows the most general form of writeHalo library function, which allows one to specify exactly how much of the available ghost areas are to be updated (this can be less than the total ghost area allocated for the array) and to specify a ``mode'' of updating the ghost cells at the extremes of the whole array. By specifying CYCLIC mode, cyclic boundary conditions are automatically implemented.
As a final example, Figure 3.10 is a Monte Carlo simulation of the well-known Ising model from condensed matter physics. It combines cyclic boundary conditions with red-black updating scheme. Random numbers are generated here using the Random class from the standard Java library. Random streams are created with different values in each process using some expression that depends on the local coordinate value. The method used here is certainly too naive for a reliable simulation, but it illustrates the principle.