In the last section we introduced the basic HPJava syntax with
a couple of simple examples. In this section we will
discuss a full application of HPJava--a multigrid solver.
The particular solver was adapted from an existing Fortran
program (called PDE2), taken from the Genesis parallel benchmark
suite [1]. The whole of this program has been
ported to HPJava (it is about 800 lines), but in this article we
will only consider a few critical routines. The point is
to further illustrate the power of HPJava collective communication
operations. We will see that writeHalo and one other operation
(remap) suffice to code our routines compactly and
quite efficiently, provided these operations can operate on arbitrary
distributed arrays, including distributed array sections. In the next
section we will discuss implementation of the collectives.

Figure 5:
An example of multigrid iteration.


The Multigrid [3] method is a fast algorithm for the solution of
linear and nonlinear problems using
restrict and interpolate operations between current
grids (fine grid) and restricted grids (coarse grid).
As applied to basic relaxation methods for PDEs, it hugely accelerates
elimination of the residual by restricting a smoothed version of the error
term to a coarse grid, computing a correction term on the coarse grid, then
interpolating this term back to the original fine grid where it is used to
improve the original approximation to the solution.
Multigrid methods can be organized as a V-cycle built on simple linear
iterative methods. As Figure 5 shows,
a V-cycle has three characteristic phases:
pre-relaxation, multigrid, and post-relaxation.
The pre-relaxation phase makes the error smooth by performing a relaxation
method.
The multigrid phase restricts the current problem to a subset of the grid
points and solves a restricted problem.
The post-relaxation phase performs some steps of the relaxation method
again after interpolating results back to the original grid.
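
In code, one V-cycle of a two-level scheme might be organized as below. This
is only a schematic outline, not the actual PDE2 driver: the names nPre,
nPost and solveCoarse, and the exact argument lists, are assumptions, while
relax, restr and interp correspond to the routines sketched later in this
section.

    // Schematic two-level V-cycle built from the routines sketched below.
    // nPre, nPost and solveCoarse are hypothetical; uf/ff live on the fine
    // grid (extent npf) and uc/fc on the coarse grid (extent npc).

    relax(nPre, npf, uf) ;          // pre-relaxation: smooth the error on the fine grid

    restr(npc, npf, fc, uf, ff) ;   // restrict the smoothed residual to the coarse grid

    solveCoarse(npc, uc, fc) ;      // compute the correction uc (starting from zero),
                                    // e.g. by recursion or by further relaxation sweeps

    interp(npc, npf, uc, uf) ;      // interpolate the correction back and add it in

    relax(nPost, npf, uf) ;         // post-relaxation on the fine grid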

Figure 6:
Red-black relaxation on array uf.

As an example we take red-black relaxation for the Laplace equation as our
relaxation method.
The relax operation, the restrict operation, and
the interpolate operation are the three main components of
a multigrid solution of the 2D Poisson equation.
Domain decomposition, which assigns part of the grid to
each processor, is used for parallelization. Boundary values of the subgrids
are exchanged with nearest neighbours after each calculation.
This operation is illustrated in Figure 6.
In this red-black relaxation, the boundary values of the subgrids are
exchanged using the writeHalo method. We assume that all
distributed arrays in our examples were created with ghost regions.
The implementation makes use of the integer global loop index i`
(read ``i-primed'') associated with distributed index i. This
value is used in computing the lower bound of the inner overall. The modulo
2 operation involving i` and it, in conjunction with the
loop stride of 2, ensures that sites of a single parity are updated
in a given iteration it, and moreover this parity is swapped in
successive iterations.
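
A sketch of such a relaxation routine, consistent with the description above
(the names relax, niter and np are illustrative assumptions; uf is the
fine-grid array, created over ranges with ghost extensions), is:

    static void relax(int niter, int np, double [[-,-]] uf) {

      Range xf = uf.rng(0), yf = uf.rng(1) ;

      for(int it = 1 ; it <= niter ; it++) {

        Adlib.writeHalo(uf) ;        // refresh ghost regions before each half-sweep

        // Lower bound of the inner overall uses the global index i` and it;
        // with the stride of 2 this selects sites of a single parity, and
        // the parity alternates as it is incremented.
        overall(i = xf for 1 : np - 2)
          overall(j = yf for 1 + (i` + it) % 2 : np - 2 : 2) {
            uf [i, j] = 0.25 * (uf [i - 1, j] + uf [i + 1, j] +
                                uf [i, j - 1] + uf [i, j + 1]) ;
          }
      }
    }

Because writeHalo is called once per half-sweep, all the shifted accesses in
the body of the overall constructs read locally held ghost copies.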

Figure 7:
Restrict operation.

The restrict operation (Figure 7)
computes the residual and restricts it to a coarser grid.
The residual is computed only at points of the fine grid that have
matching points in the coarse grid (hence the strides of 2 in the
overall constructs).
A work-array, tf, aligned with uf, is used to hold the
residual values. We then use a remap operation to copy these
values to the coarse grid.
The important remap operation copies one distributed array to
another of the same shape, which may have an unrelated distribution format.
Note that we have made no assumptions about any relationship between the
distribution of the fine grid array uf and the coarse grid array
uc. In our code a subset of the elements of the work-array
tf is copied into a subset of the coarse grid array. Behavior of
the restrict operation is illustrated in Figure 8.
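
A sketch of the restrict routine consistent with this description (the names
restr, fc, ff, npc and npf are illustrative assumptions; uf and tf are as
above) is:

    static void restr(int npc, int npf,
                      double [[-,-]] fc, double [[-,-]] uf, double [[-,-]] ff) {

      Range xf = uf.rng(0), yf = uf.rng(1) ;
      double [[-,-]] tf = new double [[xf, yf]] ;   // work-array aligned with uf

      Adlib.writeHalo(uf) ;      // make sure ghost values of uf are current

      // Residual of the 5-point stencil, computed only at fine-grid points
      // that coincide with coarse-grid points (note the strides of 2).
      // Any grid-spacing factors are assumed folded into ff.
      overall(i = xf for 2 : npf - 3 : 2)
        overall(j = yf for 2 : npf - 3 : 2) {
          tf [i, j] = ff [i, j] - 4.0 * uf [i, j] +
                      uf [i - 1, j] + uf [i + 1, j] +
                      uf [i, j - 1] + uf [i, j + 1] ;
        }

      // Copy the computed residuals into the interior of the coarse grid.
      // remap handles the (possibly unrelated) distribution formats.
      Adlib.remap(fc [[1 : npc - 2, 1 : npc - 2]],
                  tf [[2 : npf - 3 : 2, 2 : npf - 3 : 2]]) ;
    }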

Figure 8:
Illustration of restrict operation


Figure 9:
Interpolate operation.

Figure 10:
Illustration of interpolate operation


The interpolate operation (Figure 9) is the inverse of
the restrict operation. It sends the correction computed on the coarse grid
to the finer grid, and corrects current values on that grid. A
remap with an array section expression copies the correction,
uc, from the coarse grid to the matching points of a work-array, tf,
on the fine grid. After the copy, boundary values are updated using
writeHalo so that the ghost regions hold up-to-date values. Two nested
overall constructs then correct the current values of the fine grid using the
work-array tf. The first overall deals with fine grid points on
horizontal links of the coarse grid and the second deals with those
of vertical links--this is sufficient because it turns out that
the correction is only needed on fine grid sites of a single parity.
Behavior of the interpolate operation is illustrated in Figure 10.
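
A sketch consistent with this description (again with illustrative names
interp, npc and npf; uc is the coarse-grid correction and uf the fine-grid
solution) is:

    static void interp(int npc, int npf, double [[-,-]] uc, double [[-,-]] uf) {

      Range xf = uf.rng(0), yf = uf.rng(1) ;
      double [[-,-]] tf = new double [[xf, yf]] ;   // work-array aligned with uf

      // Copy the coarse-grid correction to the matching fine-grid points.
      Adlib.remap(tf [[2 : npf - 3 : 2, 2 : npf - 3 : 2]],
                  uc [[1 : npc - 2, 1 : npc - 2]]) ;

      Adlib.writeHalo(tf) ;   // correction values needed from neighbouring subgrids

      // Fine-grid points on horizontal links of the coarse grid...
      overall(i = xf for 2 : npf - 3 : 2)
        overall(j = yf for 1 : npf - 2 : 2) {
          uf [i, j] = uf [i, j] + 0.5 * (tf [i, j - 1] + tf [i, j + 1]) ;
        }

      // ...and on vertical links.  Together these cover the single parity
      // of fine-grid sites on which the correction is needed.
      overall(i = xf for 1 : npf - 2 : 2)
        overall(j = yf for 2 : npf - 3 : 2) {
          uf [i, j] = uf [i, j] + 0.5 * (tf [i - 1, j] + tf [i + 1, j]) ;
        }
    }

The writeHalo on tf is needed because the averaged correction values may
reside on a neighbouring processor's subgrid.
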
The examples here rely on two high-level
Adlib communication schedules that deal explicitly with distributed arrays:
the remap and writeHalo methods.
The remap operation can be applied to arrays of various ranks and element types.
Any section of an array with any allowed distribution format can be used.
Supported element types include Java primitive and Object types.
A general API for the remap function is

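In outline, with one overload for each array rank (qualifiers and the
enclosing class are omitted in this sketch):

    void remap (T [[]] dst, T [[]] src) ;
    void remap (T [[-]] dst, T [[-]] src) ;
    void remap (T [[-,-]] dst, T [[-,-]] src) ;
    void remap (T [[-,-,-]] dst, T [[-,-,-]] src) ;
      ...
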
where T is a Java primitive or Object type. The arguments
here are zero-dimensional, one-dimensional, two-dimensional, and so on.
We will also summarize these in the shorthand interface:

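In sketch form:

    void remap (T # dst, T # src) ;
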
where the signature T # means any distributed array with elements
of type T. (This form is not supported by the current HPJava
compiler, though something like it will be introduced in the future.
This more concise signature does not incorporate the constraint that
dst and src have the same rank--that would have to be tested
at run-time.)

Adlib was developed as a C++ library to support HPF translation
in the PCRC [15] and earlier projects [11].
Initially HPJava used a JNI wrapper interface to the C++ kernel
of the PCRC library. We are in the process of converting this to
pure Java, extending it to support Java object types, and to target
Java communication platforms.

Besides remap and writeHalo, Adlib includes a family of
related regular collective communication operations (shifts, skews,
transposes and so on). It incorporates a set of collective gather
and scatter operations for more irregular communications, and a set
of reduction operations based on the corresponding Fortran 90 array
intrinsics. Reduction operations take one or more distributed arrays
as input. They combine the elements to produce one or more scalar values,
or arrays of lower rank.
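
For instance, assuming the Fortran-derived name sum carries over to the Java
binding of Adlib, summing the elements of a distributed array a might look
like:

    double total = Adlib.sum(a) ;   // sum of all elements of the distributed array a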