next up previous contents
Next: Starting an HPJ program Up: Parallel arrays and data Previous: Parallel arrays and data   Contents

Example data parallel Java program

A simplified version of the code for the ``Life'' demo is given in figure 1.

Figure 1: Simplified code of the Life demo program.
\begin{figure}\footnotesize\begin{verbatim}public class Life extends Node impl...
...] = 1; break;
default: w[i] = 0; break;
}
}
}
}
}\end{verbatim}\end{figure}

The object p represents a 2 by 2 process grid. In this simplified example we assume that the program executes on exactly four processors. More generally the library provides a member function on Procs to determine whether the local process holds any member of the virtual process grid. The Procs constructor takes the current Node object as an argument, from which it obtains information on the available physical processes.

The objects x and y represent index ranges of size N (the global index is in the range 0, ..., N - 1) distributed over the first and second dimensions of the grid p. The default distribution format is blockwise. Cyclic distribution format can also be specified by using a range object of class CRange, which is derived from Range (the pilot implementation does not provide any more general distribution or alignment options).

The object r represents the shape and distribution of a two dimensional array. Note that this ``template'' can be shared by several actual arrays because it does not contain a data vector. The limited polymorphism of Java makes it awkward to create true container classes for primitive data types. The data vectors that hold the local segments of arrays are created separately using the inquiry function seg which returns the number of locally held elements. In the example the elements of the main data array are held in w. The extra arrays cn_, cp_, ..., cnn, ... will be used to hold arrays on neighbour sites.

The ``forall loop'' initializing w should be read as something like

  forall(i in range x, j in range y)
    w(i, j) = fun(i, j)
where fun is some function of the global indices defining the initial state of the life board. The members forall, test, next update the state of r and the range structures contained in it so that r.sub() returns the local subscript for the current iteration, and x.idx() and y.idx() return the global index values for the current iteration

The main loop uses cyclic shift operations to obtain neighbours, communicating data where necessary. The shift operation is a member of the Node class. Eventually it will be overloaded to accept data vectors of any primitive type--here the array elements are bytes.

Finally w is implemented in terms of its neighbours. This could have been done using a ``forall loop'', but since global index values are not needed here the loop has been optimised for a simple for loop over the local segment. This performance-critical inner loop is coded at least as efficiently as a typical sequential program.

Note some characteristic features of the data-parallel style of programming:


next up previous contents
Next: Starting an HPJ program Up: Parallel arrays and data Previous: Parallel arrays and data   Contents
Bryan Carpenter 2002-07-12