A simplified version of the code for the ``Life'' demo is given in figure 1.
The object p represents a 2 by 2 process grid. In
this simplified example we assume that the program executes
on exactly four processors. More generally the library provides
a member function on Procs to determine whether the local
process holds any member of the virtual process grid. The
Procs constructor takes the current Node object as
an argument, from which it obtains information on the available
physical processes.
The objects x and y represent index ranges of
size N (the global index is in the range 0, ..., N - 1)
distributed over the first and second dimensions of the grid p.
The default distribution format is blockwise. Cyclic distribution
format can also be specified by using a range object of class
CRange, which is derived from Range (the pilot
implementation does not provide any more general distribution or
alignment options).
The object r represents the shape and distribution of a two
dimensional array. Note that this ``template'' can be shared by
several actual arrays because it does not contain a data vector.
The limited polymorphism of Java makes it awkward to create
true container classes for primitive data types.
The data vectors that hold the local segments of arrays are created separately
using the inquiry function seg which returns the number of
locally held elements.
In the example the elements of the main data array are held in
w. The extra arrays cn_, cp_, ..., cnn, ...
will be used to hold arrays on neighbour sites.
The ``forall loop'' initializing w should be read as something like
forall(i in range x, j in range y)
w(i, j) = fun(i, j)
where fun is some function of the global indices defining the
initial state of the life board. The members forall,
test, next update the state of r and the range
structures contained in it so that r.sub() returns the local
subscript for the current iteration, and x.idx() and
y.idx() return the global index values for the current iteration
The main loop uses cyclic shift operations to obtain neighbours,
communicating data where necessary. The shift operation is a
member of the Node class. Eventually it will be overloaded to
accept data vectors of any primitive type--here the array elements are
bytes.
Finally w is implemented in terms of its neighbours. This could
have been done using a ``forall loop'', but since global index values
are not needed here the loop has been optimised for a simple for loop
over the local segment. This performance-critical inner loop is coded
at least as efficiently as a typical sequential program.
Note some characteristic features of the data-parallel style of programming: