

An Application

In the last section we introduced the basic HPJava syntax with a couple of simple examples. In this section we will discuss a full application of HPJava: a multigrid solver. The particular solver was adapted from an existing Fortran program (called PDE2) taken from the Genesis parallel benchmark suite [1]. The whole program, about 800 lines, has been ported to HPJava, but in this article we will only consider a few critical routines. The point is to further illustrate the power of the HPJava collective communication operations. We will see that writeHalo and one other operation (remap) suffice to code our routines compactly and quite efficiently, provided these operations can operate on arbitrary distributed arrays, including distributed array sections. In the next section we will discuss the implementation of the collectives.
Figure 5: An example of multigrid iteration.
The multigrid method [3] is a fast algorithm for the solution of linear and nonlinear problems, using restrict and interpolate operations between the current grid (the fine grid) and a restricted grid (the coarse grid). As applied to basic relaxation methods for PDEs, it hugely accelerates elimination of the residual by restricting a smoothed version of the error term to a coarse grid, computing a correction term on the coarse grid, then interpolating this term back to the original fine grid, where it is used to improve the original approximation to the solution. Multigrid methods can be organized as a V-cycle built around a simple linear iterative method. As we can see in Figure 5, there are three characteristic phases in a V-cycle: pre-relaxation, multigrid, and post-relaxation. The pre-relaxation phase smooths the error by performing some sweeps of the relaxation method. The multigrid phase restricts the current problem to a subset of the grid points and solves the restricted problem. The post-relaxation phase performs some further relaxation sweeps after interpolating the results back to the original grid.
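As a rough illustration of this structure only (it is not taken from the PDE2 source), a driver for one V-cycle might be organized as follows in HPJava. The routine names anticipate the relax, restr and interp routines discussed below, but their argument lists and the handling of the coarse-grid problem here are assumptions.

  // Schematic V-cycle, assuming hypothetical signatures for the relax,
  // restr and interp routines discussed later in this section.  uf and ff
  // live on the fine grid, uc on the coarse grid; all arrays are assumed
  // to have been created with ghost regions.
  static void vcycle(int level, int itmax, int npf, int npc,
                     double [[-,-]] uf, double [[-,-]] ff,
                     double [[-,-]] uc) {

      relax(itmax, npf, uf) ;            // pre-relaxation: smooth the error

      if (level > 0) {
          restr(npc, npf, uc, uf, ff) ;  // restrict the residual to the coarse grid

          // ... compute the coarse-grid correction, for example by further
          //     relaxation or by a recursive V-cycle on a still coarser grid ...

          interp(npf, uc, uf) ;          // interpolate the correction back onto uf
      }

      relax(itmax, npf, uf) ;            // post-relaxation
  }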
Figure 6: Red-black relaxation on array uf.
  static void relax(int itmax, int npf...
              ...j - 1] + uf [i, j + 1]);
          }
      }
  }
As an example we take red-black relaxation for the Laplace equation as our relaxation method. The relax operation, the restrict operation, and the interpolate operation are the three main parts of a solution of the 2D Poisson equation using a multigrid method. Domain decomposition, which assigns part of the grid to each processor, is used for parallelization. Boundary values of the subgrids are exchanged with nearest neighbours after each calculation; this operation is illustrated in Figure 6, where the exchange is performed by the writeHalo method. We assume that all distributed arrays in our examples were created with ghost regions. The implementation makes use of the integer global loop index i` (read "i-primed") associated with the distributed index i. This value is used in computing the lower bound of the inner overall. The modulo 2 operation involving i` and it, in conjunction with the loop stride of 2, ensures that sites of a single parity are updated in a given iteration it, and moreover that this parity is swapped in successive iterations.
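Because the body of relax is elided in Figure 6, the following is a minimal sketch of how such a routine can be written in HPJava, reconstructed from the description above. It assumes pure Laplace relaxation with no source term; the sweep count, the loop bounds and any arguments beyond those visible in the figure are guesses rather than the actual PDE2 code.

  static void relax(int itmax, int npf, double [[-,-]] uf) {

      Range xf = uf.rng(0), yf = uf.rng(1) ;

      for (int it = 1 ; it <= itmax ; it++) {

          // refresh the ghost regions so the shifted subscripts below
          // read up-to-date neighbour values
          Adlib.writeHalo(uf) ;

          // the lower bound 1 + (i` + it) % 2 and the stride of 2 select
          // sites of a single parity, which swaps on successive iterations
          overall (i = xf for 1 : npf - 2)
              overall (j = yf for 1 + (i` + it) % 2 : npf - 2 : 2) {

                  uf [i, j] = 0.25 * (uf [i - 1, j] + uf [i + 1, j] +
                                      uf [i, j - 1] + uf [i, j + 1]) ;
              }
      }
  }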
Figure 7: Restrict operation.
  static void restr(int npc, int npf, ...
              ... : nf - 2 : 2, 2 : nf - 2 : 2]]);
  }
The restrict operation (Figure 7) computes the residual and restricts it to a coarser grid. The residual is computed only at points of the fine grid that have matching points in the coarse grid (hence the strides of 2 in the overall constructs). A work array, tf, aligned with uf, is used to hold the residual values. We then use a remap operation to copy these values to the coarse grid. The important remap operation allows one distributed array to be copied to another of the same shape that may have a completely unrelated distribution format. Note that we have made no assumptions about any relationship between the distribution of the fine-grid array uf and the coarse-grid array uc. In our code a subset of the elements of the work array tf is copied into a subset of the coarse-grid array. The behavior of the restrict operation is illustrated in Figure 8.
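Since the body of restr is also elided in Figure 7, here is a minimal sketch of the pattern the text describes. The argument list, the sign and scaling of the residual, and the exact section bounds (which presume the coarse grid is roughly half the size of the fine grid) are assumptions for illustration only.

  static void restr(int npc, int npf, double [[-,-]] uc,
                    double [[-,-]] uf, double [[-,-]] ff) {

      // ff is assumed here to be the fine-grid right-hand side
      Range xf = uf.rng(0), yf = uf.rng(1) ;

      // work array aligned with uf, holding residuals at the fine-grid
      // points that have matching points on the coarse grid
      double [[-,-]] tf = new double [[xf, yf]] ;

      Adlib.writeHalo(uf) ;

      overall (i = xf for 2 : npf - 2 : 2)
          overall (j = yf for 2 : npf - 2 : 2) {

              // residual of the five-point stencil (conventions assumed)
              tf [i, j] = ff [i, j] - 4.0 * uf [i, j] +
                          uf [i - 1, j] + uf [i + 1, j] +
                          uf [i, j - 1] + uf [i, j + 1] ;
          }

      // copy the residual values into the interior of the coarse grid;
      // the two sections have the same shape but possibly unrelated
      // distribution formats
      Adlib.remap(uc [[1 : npc - 2, 1 : npc - 2]],
                  tf [[2 : npf - 2 : 2, 2 : npf - 2 : 2]]) ;
  }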
Figure 8: Illustration of restrict operation.
Figure 9: Interpolate operation.
  static void interp(int npf, double[[...
              ... (tf [i, j - 1] + tf [i, j + 1]);
  }
Figure 10: Illustration of interpolate operation.
The interpolate operation (Figure 9) is the opposite of restriction. It sends the correction computed on the coarse grid to the finer grid, and corrects the current values on that grid. A remap operation with array section expressions is used to copy the correction, uc, from the coarse grid to the matching points of a work array, tf, on the fine grid. After the copy, we update the boundary values using writeHalo so that the ghost regions hold the most up-to-date values. Two nested overall constructs then correct the current values of the grid using the work array tf. The first overall deals with fine-grid points on horizontal links of the coarse grid and the second deals with those on vertical links; this is sufficient because it turns out that the correction is only needed on fine-grid sites of a single parity. The behavior of the interpolate operation is illustrated in Figure 10.

The examples here rely on two high-level Adlib communication schedules that deal explicitly with distributed arrays: the remap and writeHalo methods. The remap operation can be applied to arrays of various ranks and element types. Any section of an array with any allowed distribution format can be used. Supported element types include Java primitive and Object types. A general API for the remap function is

  void remap (T [[]] dst, T [[]] src) ;
  void remap (T [[-]] dst, T [[-]] src) ;
  void remap (T [[-,-]] dst, T [[-,-]] src) ;
  void remap (T [[-,-,-]] dst, T [[-,-,-]] src) ;
  ...

where T is a Java primitive or Object type. The arguments here are zero-dimensional, one-dimensional, two-dimensional, and so on. We will also summarize these in the shorthand interface:

  void remap (T # dst, T # src) ;

where the signature T # means any distributed array with elements of type T. (This form is not supported by the current HPJava compiler, though something like it will be introduced in the future. This more concise signature does not incorporate the constraint that dst and src have the same rank; that would have to be tested at run time.)

Adlib was developed as a C++ library to support HPF translation in the PCRC project [15] and earlier projects [11]. Initially HPJava used a JNI wrapper interface to the C++ kernel of the PCRC library. We are in the process of converting this to pure Java, extending it to support Java object types, and targeting Java communication platforms.

Besides remap and writeHalo, Adlib includes a family of related regular collective communication operations (shifts, skews, transposes and so on). It incorporates a set of collective gather and scatter operations for more irregular communications, and a set of reduction operations based on the corresponding Fortran 90 array intrinsics. Reduction operations take one or more distributed arrays as input. They combine the elements to produce one or more scalar values, or arrays of lower rank.
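To make the section-to-section use of remap concrete, the fragment below sketches the copy step of the interpolate operation described earlier. The method name copyCorrection and the particular bounds (which presume the usual coarse grid of roughly half the fine-grid size) are illustrative assumptions, not part of the Adlib API or of the PDE2 code.

  static void copyCorrection(int npc, int npf,
                             double [[-,-]] uc, double [[-,-]] tf) {

      // dst: every second interior point of the fine-grid work array tf;
      // src: the interior of the coarse-grid array uc.  The two sections
      // must have the same shape, but their distribution formats may be
      // completely unrelated.
      Adlib.remap(tf [[2 : npf - 2 : 2, 2 : npf - 2 : 2]],
                  uc [[1 : npc - 2, 1 : npc - 2]]) ;

      // refresh the ghost regions of tf before it is read with shifted
      // subscripts in the interpolation loops that follow
      Adlib.writeHalo(tf) ;
  }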
Bryan Carpenter 2003-01-23