An Application

The *Multigrid* [10] method is a fast algorithm for solution of
linear and nonlinear problems using
*restrict* and *interpolate* operations between current
grids (*fine grid*) and restricted grids (*coarse grid*).
As applied to basic relaxation methods for PDEs, it hugely accelerates
elimination of the residual by restricting a smoothed version of the error
term to a coarse grid, computing a correction term on the coarse grid, then
interpolating this correction back to the original fine grid where it is used to
improve the original approximation to the solution.
Multigrid methods can be developed as a *V-cycle* method for simple linear
iterative methods. As we can see in Figure 6.1,
there are three characteristic phases in a V-cycle method;
*pre-relaxation*, *multigrid*, and *post-relaxation*.
The pre-relaxation phase makes the error smooth by performing a relaxation
method.
The multigrid phase restricts the current problem to a subset of the grid
points and solves a restricted problem for the correction term.
The post-relaxation phase perform some steps of the relaxation method
again after interpolating results back to the original grid.

As an example we take red-black relaxation for the Laplace equation as our
relaxation method.
The *relax* operation, the *restrict* operation, and
the *interpolate* operation are three critical parts of
a solution of 2D Poisson equation using a multigrid method.
Domain decomposition, which assigns part of the grid to
each processor, is used for parallelization. Boundary values of the subgrids
are exchanged with nearest neighbors before each calculation.
Red-black relaxation on array `uf` is illustrated in
Figure 6.2.
In this red-black relaxation, the values of the subgrids boundary are
exchanged by using the `writeHalo` method. We assume that all
distributed arrays in our examples were created with ghost regions.

The implementation makes use of the integer global loop index `i``
(read ``i-primed'') associated with distributed index `i`. This
value is used in computing the lower bound of the inner overall. The modulo
2 operation including `i`` and `it`, in conjunction with the
loop stride of 2, ensures that sites of a single parity are updated
in a given iteration `it`, and that this parity is swapped in
successive iterations.
This is a short way to write the algorithm, although it might be more
efficient to split into several `overall` constructs dealing with sites of
different party.

The *restrict* operation (Figure 6.3)
computes the residual and restricts it to a coarser grid.
The residual is computed only at points of the fine grid
matching points in the coarse grid (hence the strides of 2 in the
overall constructs).
A work-array, `tf`, aligned with `uf`, is used to hold the
residual values. We then use a *remap* operation to copy these
values to the coarse grid.

The important `remap` operation copies one distributed array to
another of the same shape which may have unrelated distribution format.
We have made no assumptions of any relationship between the
distribution of the fine grid array `uf` and the coarse grid array
`uc`. In our code a subset of the elements of the work-array `tf` is copied into a subset of the array in coarse grid. Behavior of
the restrict operation is illustrated in Figure 6.4.

The interpolate operation (Figure 6.5) is the opposite of
restriction. It sends the correction computed on the coarse grid
to the finer grid, and corrects current values on that grid. The `remap` and the array section expression are used to copy correction,
`uc`, from coarse grid to matching point of a work-array, `tf`,
on the fine grid. After the copying, we need to update boundary values using
the `writeHalo` to get most up-to-date values. By the two nested
overall constructs, it corrects current values of grid with the
work-array `tf`. The first overall deals with fine grid points on
horizontal links of the coarse grid and the second deals with those
of vertical links--this is sufficient because it turns out that
the correction is only needed on fine grid sites of a single parity.
Behavior of the interpolate operation is illustrated in Figure 6.6.