A simplified version of the code for the ``Life'' demo is given in figure 1.
p represents a 2 by 2 process grid. In
this simplified example we assume that the program executes
on exactly four processors. More generally the library provides
a member function on
Procs to determine whether the local
process holds any member of the virtual process grid. The
Procs constructor takes the current
Node object as
an argument, from which it obtains information on the available
x and y represent index ranges of
extent N (the global index is in the range
0, ..., N - 1)
distributed over the first and second dimensions of the grid.
The default distribution format is blockwise. Cyclic distribution
format can also be specified by using a range object of class
CRange, which is derived from
Range (the pilot
implementation does not provide any more general distribution or
r represents the shape and distribution of a two
dimensional array. Note that this ``template'' can be shared by
several actual arrays because it does not contain a data vector.
The limited polymorphism of Java makes it awkward to create
true container classes for primitive data types.
The data vectors that hold the local segments of arrays are created separately
using the inquiry function
seg which returns the number of
locally held elements.
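To make the notion of a locally held segment concrete, the following sketch computes how many elements of a range of extent N one process would hold under blockwise and cyclic distribution. It is an illustration only: the class and the helpers blockCount and cyclicCount are hypothetical and not part of the library, and the block size ceil(N / P) is an assumed convention rather than something stated for the pilot implementation.

```java
// Illustrative sketch; none of these names belong to the library described here.
class DistributionSketch {

    // Elements of a range of extent n held by process `coord` (0 <= coord < procs)
    // under blockwise distribution, ASSUMING a block size of ceil(n / procs).
    public static int blockCount(int n, int procs, int coord) {
        int block = (n + procs - 1) / procs;   // ceiling division
        int start = coord * block;             // first global index held
        int end = Math.min(n, start + block);  // one past the last global index held
        return Math.max(0, end - start);
    }

    // Elements held under cyclic distribution, ASSUMING global index g
    // is placed on process g % procs.
    public static int cyclicCount(int n, int procs, int coord) {
        return (n - coord + procs - 1) / procs;
    }

    public static void main(String[] args) {
        int n = 10, procs = 4;
        for (int c = 0; c < procs; c++) {
            System.out.println("process " + c
                    + ": block=" + blockCount(n, procs, c)
                    + " cyclic=" + cyclicCount(n, procs, c));
        }
    }
}
```

With N = 10 over four processes these assumptions give block counts 3, 3, 3, 1 and cyclic counts 3, 3, 2, 2. Under the same assumptions, the size of a two-dimensional local segment would be the product of the counts for its two ranges.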
In the example the elements of the main data array are held in
w. The extra arrays
cn_, cp_, ..., cnn, ...
will be used to hold arrays on neighbour sites.
The ``forall loop'' initializing
w should be read as something like
forall(i in range x, j in range y) w(i, j) = fun(i, j)
where fun is some function of the global indices defining the initial state of the life board. The members
next update the state of
r and the range structures contained in it so that
r.sub() returns the local subscript for the current iteration, and
x.idx() and y.idx() return the global index values for the current iteration.
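The relationship between the global index values and the local subscript in such a loop can be sketched in plain Java. The sketch below does not use the library: it assumes blockwise distribution in both dimensions, a row-major local segment, byte elements and an arbitrary initial-state function fun, and the names ForallSketch and initLocal are hypothetical. It fills the segment that one process at grid coordinate (px, py) would hold.

```java
// Illustrative sketch; the library's own iteration interface is not reproduced here.
class ForallSketch {

    // Hypothetical initial state: depends only on the global indices (i, j).
    public static byte fun(int i, int j) {
        return (byte) ((i + j) % 2);
    }

    // Fill the local segment of an n x n array for the process at grid
    // coordinate (px, py) of a procs x procs grid. ASSUMES blockwise
    // distribution with block size ceil(n / procs) in both dimensions.
    public static byte[] initLocal(int n, int procs, int px, int py) {
        int block = (n + procs - 1) / procs;
        int x0 = px * block;                              // first global i held locally
        int y0 = py * block;                              // first global j held locally
        int nx = Math.max(0, Math.min(n, x0 + block) - x0);
        int ny = Math.max(0, Math.min(n, y0 + block) - y0);

        byte[] w = new byte[nx * ny];                     // local data vector
        int sub = 0;                                      // local subscript
        for (int li = 0; li < nx; li++) {
            for (int lj = 0; lj < ny; lj++) {
                int i = x0 + li;                          // global index, first dimension
                int j = y0 + lj;                          // global index, second dimension
                w[sub++] = fun(i, j);
            }
        }
        return w;
    }

    public static void main(String[] args) {
        byte[] w = initLocal(4, 2, 1, 0);
        System.out.println("local segment length: " + w.length);
    }
}
```

The point of the sketch is that only the global indices are passed to fun, while only the local subscript is used to address the data vector.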
The main loop uses cyclic
shift operations to obtain neighbours,
communicating data where necessary. The
shift operation is a
member of the
Node class. Eventually it will be overloaded to
accept data vectors of any primitive type--here the array elements are
w is implemented in terms of its neighbours. This could
have been done using a ``forall loop'', but since global index values
are not needed here the loop has been optimised to a simple for loop
over the local segment. This performance-critical inner loop is coded
at least as efficiently as a typical sequential program.
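The structure of this update, cyclic shifts followed by a flat loop over the data vector, can be illustrated with a single-process sketch. It is not the code of figure 1: the shift method here is a hypothetical sequential stand-in for the library's shift operation, the board is a whole n x n array held row-major in one byte vector, and the class name LifeShiftSketch is invented for the example.

```java
// Illustrative single-process sketch; no communication is performed.
class LifeShiftSketch {

    // Cyclic shift of an n x n board stored row-major:
    // result(i, j) = src((i + di) mod n, (j + dj) mod n).
    public static byte[] shift(byte[] src, int n, int di, int dj) {
        byte[] dst = new byte[n * n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                int si = Math.floorMod(i + di, n);
                int sj = Math.floorMod(j + dj, n);
                dst[i * n + j] = src[si * n + sj];
            }
        }
        return dst;
    }

    // One generation of Life on a torus.
    public static byte[] step(byte[] w, int n) {
        // Build the eight shifted copies, one per neighbour direction.
        byte[][] neighbours = new byte[8][];
        int k = 0;
        for (int di = -1; di <= 1; di++) {
            for (int dj = -1; dj <= 1; dj++) {
                if (di != 0 || dj != 0) {
                    neighbours[k++] = shift(w, n, di, dj);
                }
            }
        }
        // Flat loop over the data vector: no global index values are needed.
        byte[] next = new byte[w.length];
        for (int e = 0; e < w.length; e++) {
            int count = 0;
            for (byte[] a : neighbours) {
                count += a[e];
            }
            boolean alive = count == 3 || (count == 2 && w[e] == 1);
            next[e] = (byte) (alive ? 1 : 0);
        }
        return next;
    }

    public static void main(String[] args) {
        int n = 5;
        byte[] w = new byte[n * n];
        w[2 * n + 1] = 1; w[2 * n + 2] = 1; w[2 * n + 3] = 1;  // horizontal blinker
        byte[] next = step(w, n);
        for (int i = 0; i < n; i++) {
            StringBuilder row = new StringBuilder();
            for (int j = 0; j < n; j++) row.append(next[i * n + j] == 1 ? '#' : '.');
            System.out.println(row);
        }
    }
}
```

Once the shifted copies exist, every neighbour of element e sits at the same position e in another vector, which is why the inner loop needs no two-dimensional indexing.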
Note some characteristic features of the data-parallel style of programming: