In a distributed array with ghost regions, the memory for the locally held block of elements is allocated with extra space around the edges. These extra locations can be used to cache some of the element values properly belonging to adjacent processors. The inner loop of algorithms involving stencil updates can then be written very simply. In accessing neighbouring elements, the edges of the block don't need special treatment. Rather than throwing an exception for an out of range subscript, shifted indices find the proper values cached in the ghost region. This is such an important technique in real codes that HPJava has a special extension to make it possible.
This is one place where the rule that the subscript of a distributed
array must be a distributed index is relaxed.
In a special piece of syntax, the following expression
Ghost regions are not magic. The values cached around the edges of a local block can only be made consistent with the values held in blocks on adjacent processes by a suitable communication. The library function called Adlib.writeHalo() updates the cached values in the ghost regions with proper element values from neigbouring processes.
Figure 3.7 is a version of the Laplace program that uses ghost regions. The omitted code is unchanged from Figure 2.8. The last two arguments of the ExtBlockRange constructor define the widths of the ghost regions added to the bottom and top (respectively) of the local block. In the example the ``halo area'' has width 1 on all sides. The initial state of the multiarray a is illustrated for the case where the process grid is 3 by 2 and N is 8 in Figure 3.8. In this figure the uninitialized ghost cells are represented by the zeroes around the edges of the segments.
The communication method Adlib.writeHalo() copies element values from neighboring processes into the ghost regions. The state of the array after the call to this method is visualized in Figure 3.9. The elements modified in the algorithm itself are highlighted in Figure 3.10 (remember that elements at the extreme edges of the ``physical'' part of the array hold the fixed boundary conditions). This figure illustrates that after the halo is written, all modified elements are surrounded by up-to-date local copies of the neighbouring elements.
In Figure 3.7 we still introduced one temporary array, called b. The reason is that in Jacobi relaxation one is supposed to express all the new values in terms of values from the previous iteration.
As a matter of fact, if we removed the temporary, b, and reassigned the a elements on-the-fly in terms of the other partially updated a elements, nothing very bad would happen. The algorithm may even converge faster because it is locally using the more efficient Gauss-Siedel relaxation scheme. Figure 3.11 is an implementation of the ``red-black'' scheme (a true Gauss-Siedel scheme), in which all even sites are updated, then all odd sites are updated in a separate phase. There is no need to introduce the temporary array b. This example illustrates the use of a stepped triplet in an overall construct.
The ``footprint'' of the stencil update can be more general. Figure 3.12 shows an implementation of the well-known cellular automaton, Conway's Game of Life. The local update involves dependences on diagonal neighbours. This example also shows the most general form of writeHalo() library function, which allows one to specify exactly how much of the available ghost areas are to be updated (this can be less than the total ghost area allocated for the array) and to specify a ``mode'' of updating the ghost cells at the extremes of the whole array. By specifying CYCLIC mode, cyclic boundary conditions are automatically implemented.
As a final example, Figure 3.13 is a Monte Carlo simulation of the well-known Ising model from condensed matter physics. It combines cyclic boundary conditions with red-black updating scheme. Random numbers are generated here using the Random class from the standard Java package java.util. Random streams are created with different values in each process using some expression that depends on the local coordinate value. (The method used here is probably too naive for a reliable simulation, but as usual it illustrates a principle.)