A simplified version of the code for the ``Life'' demo is given in figure 1.

The object `p`

represents a 2 by 2 process grid. In
this simplified example we assume that the program executes
on exactly four processors. More generally the library provides
a member function on `Procs`

to determine whether the local
process holds any member of the virtual process grid. The
`Procs`

constructor takes the current `Node`

object as
an argument, from which it obtains information on the available
physical processes.

The objects `x`

and `y`

represent index ranges of
size `N`

(the global index is in the range `0, ..., N - 1`

)
distributed over the first and second dimensions of the grid `p`

.
The default distribution format is blockwise. Cyclic distribution
format can also be specified by using a range object of class
`CRange`

, which is derived from `Range`

(the pilot
implementation does not provide any more general distribution or
alignment options).

The object `r`

represents the shape and distribution of a two
dimensional array. Note that this ``template'' can be shared by
several actual arrays because it does not contain a data vector.
The limited polymorphism of Java makes it awkward to create
true container classes for primitive data types.
The data vectors that hold the local segments of arrays are created separately
using the inquiry function `seg`

which returns the number of
locally held elements.
In the example the elements of the main data array are held in
`w`

. The extra arrays `cn_, cp_, ..., cnn, ...`

will be used to hold arrays on neighbour sites.

The ``forall loop'' initializing `w`

should be read as something like

forall(i in range x, j in range y) w(i, j) = fun(i, j)where

`fun`

is some function of the global indices defining the
initial state of the life board. The members `forall`

,
`test`

, `next`

update the state of `r`

and the range
structures contained in it so that `r.sub()`

returns the local
subscript for the current iteration, and `x.idx()`

and
`y.idx()`

return the global index values for the current iteration
The main loop uses cyclic `shift`

operations to obtain neighbours,
communicating data where necessary. The `shift`

operation is a
member of the `Node`

class. Eventually it will be overloaded to
accept data vectors of any primitive type--here the array elements are
`byte`

s.

Finally `w`

is implemented in terms of its neighbours. This could
have been done using a ``forall loop'', but since global index values
are not needed here the loop has been optimised for a simple for loop
over the local segment. This performance-critical inner loop is coded
at least as efficiently as a typical sequential program.

Note some characteristic features of the data-parallel style of programming:

- The distribution format of the arrays can be changed just by altering a few parameters at the start of the program--the main program is insensitive to these details
- low level message-passing is abstracted into high-level collective operations on distributed array structures.