Next: Parallel Programming Up: Processes and Distributed Arrays Previous: Process Grids   Contents   Index

Distributed Arrays

Probably the most important feature HPJava adds to Java is the distributed array. A distributed array is a collective multiarray, shared by a number of processes. Like an ordinary multiarray, a distributed array has a rectangular, multidimensional index space, and stores a collection of elements of fixed type. Unlike an ordinary array, the index space and associated elements are scattered across the processes that share the array.

The distribution of an index space is parametrized by objects belonging to another class that (like Group) has special status in the HPJava language. This is the hpjava.lang.Range class. Instances of Range classes are called distributed ranges.

The global index values for a dimension of a distributed array always lie in the range $0, \ldots, N-1$, where $N$ is the extent of the dimension. This is the same as for a multiarray. The distributed range object for the dimension specifies the extent; it also specifies a process dimension over which the indexes are scattered, and it specifies the format in which the indexes are distributed over the process dimension.
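To make the mapping concrete, here is a minimal plain-Java sketch of how a block-format range might translate a global index into a coordinate in the process dimension plus a local offset. The class and method names are hypothetical and are not part of hpjava.lang; the block size ceil(N/P) is an assumption borrowed from the usual HPF-style block distribution, not something stated in this section.

```java
// Hypothetical model of a block-distributed range; NOT the hpjava.lang.Range API.
final class BlockMap {
    // Assumed block size: ceil(extent / procs).
    static int blockSize(int extent, int procs) {
        return (extent + procs - 1) / procs;
    }

    // Coordinate in the process dimension that holds global index g.
    static int owner(int extent, int procs, int g) {
        checkIndex(extent, g);
        return g / blockSize(extent, procs);
    }

    // Offset of global index g inside the block held by its owner.
    static int local(int extent, int procs, int g) {
        checkIndex(extent, g);
        return g % blockSize(extent, procs);
    }

    // Global indexes always lie in 0, ..., extent - 1.
    private static void checkIndex(int extent, int g) {
        if (g < 0 || g >= extent) {
            throw new IndexOutOfBoundsException("global index " + g);
        }
    }

    public static void main(String[] args) {
        int n = 8, p = 3;  // extent N and size of the process dimension (illustrative values)
        for (int g = 0; g < n; g++) {
            System.out.println("global " + g + " -> process " + owner(n, p, g)
                    + ", local " + local(n, p, g));
        }
    }
}
```

The three pieces of information carried by a distributed range appear here as the extent argument, the procs argument (standing in for the process dimension), and the arithmetic itself (standing in for the distribution format).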

Distributed arrays can be thought of as a generalization of multiarrays, and their type signatures look quite similar. The only difference is that a distributed array uses hyphens instead of asterisks in the dimension slots (we will see later that hyphens and asterisks can in fact be mixed in a distributed array type signature). Distributed array creation expressions look very similar to multiarray creation expressions, except that distributed ranges appear in place of the integer extent expressions.

In the following example we create a two-dimensional, N by N, array of floating point numbers, with elements distributed over the grid p.

  ...
  float [[-,-]] a = new float [[x, y]] ;
  ...

The decomposition of this array for the case N = 8 is illustrated in Figure 2.4. The choice of subclass BlockRange for the index ranges of the array means that the index space of each array dimension is broken down into consecutive blocks. Other possible distribution formats will be discussed later. The constructor for BlockRange takes two arguments: the extent of the range, and the process dimension over which the range is distributed.
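The block decomposition can also be modelled in plain Java. The sketch below lists which global index ranges each process would hold for N = 8. The 2 by 3 grid shape, the class and method names, and the ceil(N/P) block size are all assumptions made for illustration; none of them is taken from this section. In HPJava itself the two ranges would presumably be created with the two-argument constructor described above, along the lines of new BlockRange(N, p.dim(0)) and new BlockRange(N, p.dim(1)).

```java
// Hypothetical plain-Java model of a 2D block decomposition; NOT HPJava library code.
final class BlockDecomposition {
    // Assumed block size: ceil(extent / procs).
    static int blockSize(int extent, int procs) {
        return (extent + procs - 1) / procs;
    }

    // First global index held at coordinate c of the process dimension.
    static int lo(int extent, int procs, int c) {
        return Math.min(extent, c * blockSize(extent, procs));
    }

    // One past the last global index held at coordinate c (lo == hi means an empty block).
    static int hi(int extent, int procs, int c) {
        return Math.min(extent, (c + 1) * blockSize(extent, procs));
    }

    // One line per process of a rows-by-cols grid, giving its half-open index ranges.
    static String describe(int n, int rows, int cols) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                sb.append("process (").append(i).append(',').append(j).append("): ")
                  .append("rows [").append(lo(n, rows, i)).append(',').append(hi(n, rows, i)).append("), ")
                  .append("cols [").append(lo(n, cols, j)).append(',').append(hi(n, cols, j)).append(")\n");
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // N = 8 as in the text; the 2 x 3 grid shape is an assumption.
        System.out.print(describe(8, 2, 3));
    }
}
```

Each process holds a contiguous rectangle of the index space, which is what "consecutive blocks" means in each dimension. With these assumed values the last block in the second dimension is shorter than the others, because 8 is not a multiple of 3.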

Figure 2.4: A two-dimensional array distributed over p.

In this example the distributed array was created inside an on(p) construct. This ensured that the distribution group for the elements of a was the grid p (the active process group at that point). We will see later that it is possible to create distributed arrays outside such constructs, but our initial examples will use on constructs to set the distribution group implicitly.

In any case, the ranges of a distributed array must be distributed over different dimensions of the array's distribution group. If they are not, an hpjava.lang.DimensionNotInGroupException will be thrown. Not surprisingly, distributed array creation is a collective operation, and it can only appear inside HPspmd code.
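The rule on process dimensions can be illustrated with a small plain-Java model. This is only a sketch of the kind of check the rule implies, not the HPJava runtime: the class name, the representation of a process dimension as an integer id, and the use of IllegalArgumentException as a stand-in for hpjava.lang.DimensionNotInGroupException are all assumptions.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the creation-time check; NOT the HPJava runtime.
final class RangeCheck {
    // groupDims: ids of the process dimensions of the distribution group.
    // rangeDims: for each array dimension, the id of the process dimension its range uses.
    static void check(Set<Integer> groupDims, List<Integer> rangeDims) {
        Set<Integer> seen = new HashSet<>();
        for (int d : rangeDims) {
            if (!groupDims.contains(d)) {
                // Stand-in for hpjava.lang.DimensionNotInGroupException.
                throw new IllegalArgumentException(
                        "process dimension " + d + " is not in the distribution group");
            }
            if (!seen.add(d)) {
                // Two ranges of one array may not share a process dimension.
                throw new IllegalArgumentException(
                        "two ranges are distributed over process dimension " + d);
            }
        }
    }

    public static void main(String[] args) {
        Set<Integer> grid = Set.of(0, 1);       // a two-dimensional grid
        check(grid, List.of(0, 1));             // accepted: distinct dimensions of the grid
        try {
            check(grid, List.of(0, 0));         // rejected: both ranges use dimension 0
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```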

For illustration, the distributed arrays appearing in examples in this report are often quite small. We should point out that in practice it is unlikely to be worthwhile to distribute such small arrays. For data parallel programming to be effective, the distributed data structures should usually be large.

Bryan Carpenter 2003-04-15