HPJava--an HPspmd language
HPJava [15] is a particular implementation of the HPspmd idea.
It is a strict extension of its base language, Java, adding
some predefined classes and some extra syntax for dealing with
distributed arrays. HPJava is thus an environment for parallel programming,
especially suitable for data parallel scientific programming.
To some extent the choice of base language is accidental, and we could
have added equivalent extensions to another language, such as Fortran itself.
But Java does seem to be a better language in various respects, and it seems
likely that in the future more software will be available for modern
object-oriented languages like Java than for Fortran.
An HPJava program can freely invoke any existing Java
classes without restrictions because it incorporates all of Java as a
subset.
A concept of multidimensional distributed arrays--closely modeled on the
arrays of HPF--has been added to Java.
Regular sections of distributed arrays are fully supported.
Distributed arrays can have any rank greater than or equal to zero and
the elements of distributed arrays can be of any standard Java type,
including primitive types, Java class types and ordinary Java array types.
A standard Java class file is produced after translating and compiling
an HPJava program. This Java class file is executed by a distributed
collection of Java Virtual Machines.
All externally visible attributes of an HPJava class--e.g. existence of
distributed-array-valued fields or method arguments--can be automatically
reconstructed from Java signatures stored in the class file. This makes it
possible to build libraries operating on distributed arrays, while maintaining
the usual portability and compatibility features of Java. The libraries
themselves can be implemented in HPJava, or in standard Java, or through
Java Native Interface (JNI) wrappers to code implemented in other languages.
The HPJava language specification carefully documents the mapping between
distributed arrays and the standard-Java components they translate to.
Figure 3.1: A parallel matrix addition.

    Procs2 p = new Procs2(P, P) ;
    on...
    ... [i, j] = a [i, j] + b [i, j] ;
    }

Figure 3.1 is a simple HPJava program. It illustrates
creation of distributed arrays, and access to their elements.
An HPJava program is started concurrently in some set of processes that
are named through grid objects. The class Procs2 is a standard
library class, and represents a two-dimensional grid of processes.
During the creation of p, P by P processes are selected from the
active process group. The Procs2 class extends the special base
class Group, which represents a group of processes and has a privileged
status in the HPJava language. An object that inherits from this class
can be used in various special places. For example, it can be used to
parameterize an on construct. The on(p) construct is a new control
construct specifying that the enclosed actions are performed only by
processes in group p.
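The grid selection and the on(p) guard can be mimicked in plain Java. The sketch below is not the HPJava runtime: it assumes that the first P*P ranks of the active group are selected and that they are laid out in row-major order, and the names GridSketch, isMember, coords and on are hypothetical.

```java
// Plain-Java sketch of a process grid; not the HPJava runtime.
// Assumption: ranks 0 .. rows*cols-1 of the active group are selected
// and laid out in row-major order. All names are hypothetical.
final class GridSketch {
    final int rows, cols;

    GridSketch(int rows, int cols) {
        this.rows = rows;
        this.cols = cols;
    }

    // True if the process with this rank belongs to the grid.
    boolean isMember(int rank) {
        return rank >= 0 && rank < rows * cols;
    }

    // Grid coordinates {row, column} of a member rank.
    int[] coords(int rank) {
        if (!isMember(rank)) {
            throw new IllegalArgumentException("rank " + rank + " is outside the grid");
        }
        return new int[] { rank / cols, rank % cols };
    }

    // Analogue of on(p) { ... }: run the body only on member processes.
    void on(int myRank, Runnable body) {
        if (isMember(myRank)) {
            body.run();
        }
    }
}
```

If six processes are active and the grid is 2 by 2, only ranks 0 to 3 execute the body passed to on; the other two skip it.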
The distributed array is the most important feature HPJava adds to Java.
A distributed array is a collective array shared by a number of processes.
Like an ordinary array, a distributed array has some index space and stores
a collection of elements of fixed type.
The type signature of an r-dimensional distributed array involves
double brackets surrounding r comma-separated slots.
A hyphen in one of these slots indicates the dimension is distributed.
Asterisks are also allowed in these slots, specifying that
some dimensions of the array are not to be distributed, i.e. they are
"sequential" dimensions (if all dimensions have asterisks,
the array is actually an ordinary, non-distributed, Fortran-like,
multidimensional array--a valuable addition to Java in its own right,
as many people have noted [42,43]).
In HPJava the subscripts in distributed array element references must
normally be distributed indexes (the only exceptions to this rule are
subscripts in sequential dimensions, and subscripts in arrays with
ghost regions, discussed later). The indexes must be in the
distributed range associated with the array dimension. This strict
requirement ensures that referenced array elements are held by the
process that references them.
The variables a, b, and c are all distributed array variables.
The creation expressions on the right hand side of the initializers specify
that the arrays here all have ranges x and y--they are all M
by N arrays, block-distributed over p. We see that mapping of
distributed arrays in HPJava is described in terms of the two special
classes Group and Range.
Range is another special class with privileged status. It
represents an integer interval 0, ..., N - 1, distributed somehow over
a process dimension (a dimension or axis of a grid like p).
BlockRange is a particular subclass of Range. The
arguments in the constructor of BlockRange represent the total size
of the range and the target process dimension. Thus, x has M
elements distributed over the first dimension of p, and y has
N elements distributed over the second dimension of p.
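To make the block distribution concrete, the following plain-Java sketch computes which indices each process coordinate holds. It assumes HPF-style blocks of size ceil(N / P); the text above does not spell out the exact formula HPJava uses, and the class BlockRangeSketch and its methods are hypothetical names, not the HPJava API.

```java
// Sketch of a block distribution of N indices over a process dimension
// of extent P. Assumption: HPF-style blocks of size ceil(N / P).
// The class and method names are hypothetical, not the HPJava API.
final class BlockRangeSketch {
    final int size;      // total number of indices, N
    final int procs;     // extent of the target process dimension, P
    final int blockSize; // ceil(N / P)

    BlockRangeSketch(int size, int procs) {
        if (size < 1 || procs < 1) {
            throw new IllegalArgumentException("size and procs must be positive");
        }
        this.size = size;
        this.procs = procs;
        this.blockSize = (size + procs - 1) / procs;
    }

    // First global index held at process coordinate crd.
    int localLow(int crd) {
        return Math.min(size, crd * blockSize);
    }

    // Number of indices held at process coordinate crd.
    int localCount(int crd) {
        return Math.min(size, (crd + 1) * blockSize) - localLow(crd);
    }

    // Process coordinate that holds global index i.
    int owner(int i) {
        return i / blockSize;
    }
}
```

Under this assumption, 10 indices over 4 coordinates give local counts of 3, 3, 3 and 1, so the last coordinate holds a short block.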
Figure 3.2: The HPJava Range hierarchy.

HPJava defines a class hierarchy of
different kinds of range object (Figure 3.2).
Each subclass
represents a different kind of distribution format for an array dimension.
The simplest distribution format is collapsed (sequential) format in
which the whole of the array dimension is mapped to the local process. Other
distribution formats (motivated by High Performance Fortran) include
regular block decomposition, and simple cyclic decomposition.
In these cases the index range (thus array
dimension) is distributed over one of the dimensions of the process grid
defined by the group object. All ranges must be
distributed over different dimensions of this grid, and if a particular
dimension of the grid is targeted by none of the ranges, the array is said to
be replicated in that dimension.
Some of the range classes allow ghost extensions to support
stencil-based computations.
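The difference between the distribution formats comes down to which process coordinate holds a given index. The plain-Java sketch below illustrates that with a small class hierarchy; the class names are hypothetical and only loosely mirror the HPJava range classes, the block formula assumes HPF-style blocks of size ceil(size / procs), and ghost extensions are left out.

```java
// Sketch of a range hierarchy in plain Java; names are hypothetical and
// only loosely mirror the HPJava classes. Each subclass answers one
// question: which process coordinate holds global index i?
abstract class RangeSketch {
    final int size;  // number of indices in the range
    final int procs; // extent of the target process dimension

    RangeSketch(int size, int procs) {
        this.size = size;
        this.procs = procs;
    }

    abstract int owner(int i);
}

// Collapsed (sequential): not distributed, so every index is local.
// Modeled here with a process dimension of extent 1.
final class CollapsedSketch extends RangeSketch {
    CollapsedSketch(int size) { super(size, 1); }
    @Override int owner(int i) { return 0; }
}

// Regular block decomposition: contiguous blocks of ceil(size / procs).
final class BlockSketch extends RangeSketch {
    BlockSketch(int size, int procs) { super(size, procs); }
    @Override int owner(int i) { return i / ((size + procs - 1) / procs); }
}

// Simple cyclic decomposition: indices dealt out round-robin.
final class CyclicSketch extends RangeSketch {
    CyclicSketch(int size, int procs) { super(size, procs); }
    @Override int owner(int i) { return i % procs; }
}
```

For 8 indices over 4 coordinates, the block format assigns owners 0, 0, 1, 1, 2, 2, 3, 3, while the cyclic format assigns 0, 1, 2, 3, 0, 1, 2, 3.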
A second new control construct, overall, implements a distributed
parallel loop. It shares some characteristics of the forall
construct of HPF. The symbols i and j scoped by these
constructs are called distributed indexes. The indexes iterate
over all locations (selected here by the degenerate interval ":") of
ranges x and y.
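One way to picture nested overall constructs is as ordinary loops in which each process visits only the indices it holds. The plain-Java sketch below simulates this sequentially for a block-distributed matrix addition. It assumes HPF-style block sizes and uses one shared global array in place of per-process local segments; OverallSketch, localBounds and add are hypothetical names.

```java
// Sequential simulation of nested overall loops in a block-distributed
// matrix addition. Assumptions: HPF-style block sizes, and one shared
// global array standing in for the per-process local segments.
// All names are hypothetical.
final class OverallSketch {
    // Half-open interval [lo, hi) of global indices held at coordinate crd
    // when n indices are block-distributed over procs coordinates.
    static int[] localBounds(int n, int procs, int crd) {
        int block = (n + procs - 1) / procs;
        int lo = Math.min(n, crd * block);
        int hi = Math.min(n, lo + block);
        return new int[] { lo, hi };
    }

    // Adds a and b over a gridRows-by-gridCols process grid and returns
    // how many times each element of the result was written.
    static int[][] add(double[][] a, double[][] b, double[][] c,
                       int gridRows, int gridCols) {
        int m = a.length, n = a[0].length;
        int[][] writes = new int[m][n];
        for (int pr = 0; pr < gridRows; pr++) {         // each simulated process
            for (int pc = 0; pc < gridCols; pc++) {
                int[] x = localBounds(m, gridRows, pr);
                int[] y = localBounds(n, gridCols, pc);
                for (int i = x[0]; i < x[1]; i++) {     // local part of range x
                    for (int j = y[0]; j < y[1]; j++) { // local part of range y
                        c[i][j] = a[i][j] + b[i][j];
                        writes[i][j]++;
                    }
                }
            }
        }
        return writes;
    }
}
```

Every element of the result is written exactly once, by the one simulated process that holds it, which is the property the distributed indexes are meant to guarantee.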
HPJava also supports Fortran-like array sections. An array section
expression has a similar syntax to a distributed array element
reference, but uses double brackets. It yields a reference to a new array
containing a subset of the elements of the parent array. Those elements
can be accessed either through the parent array or through
the array section--HPJava sections behave something like array pointers
in Fortran, which can reference an arbitrary regular section of a target
array. As in Fortran, subscripts in section expressions can be index
triplets. HPJava also has built-in ideas of subranges and
restricted groups. These describe the range and distribution group of
sections, and can also be used in array constructors on the same footing
as the ranges and grids introduced earlier. They allow HPJava arrays
to reproduce any mapping allowed by the ALIGN directive of HPF.
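The aliasing behavior of sections can be illustrated in plain Java with a view object that shares storage with its parent. This is a one-dimensional, non-distributed sketch only; it assumes a triplet lo:hi:stride with an inclusive upper bound, as in Fortran, and SectionSketch is a hypothetical name rather than anything in HPJava.

```java
// Sketch of a one-dimensional regular section as a view onto a parent
// array. Assumption: a triplet lo:hi:stride with an inclusive upper
// bound, as in Fortran. The class name and methods are hypothetical.
final class SectionSketch {
    private final double[] parent;
    private final int lo, stride, length;

    SectionSketch(double[] parent, int lo, int hi, int stride) {
        if (stride < 1 || lo < 0 || hi >= parent.length) {
            throw new IllegalArgumentException("bad triplet");
        }
        this.parent = parent;
        this.lo = lo;
        this.stride = stride;
        this.length = hi < lo ? 0 : (hi - lo) / stride + 1;
    }

    int length() { return length; }

    // Element k of the section is element lo + k * stride of the parent.
    double get(int k) { return parent[index(k)]; }

    // Writes go through to the parent: no elements are copied.
    void set(int k, double value) { parent[index(k)] = value; }

    private int index(int k) {
        if (k < 0 || k >= length) {
            throw new IndexOutOfBoundsException("section index " + k);
        }
        return lo + k * stride;
    }
}
```

With a ten-element parent, the triplet 1:7:2 selects parent elements 1, 3, 5 and 7, and a write through the section is visible through the parent.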
The examples here have covered the basic syntax of HPJava. The language
itself is relatively simple. Complexities associated with varied or
irregular patterns of communication are supposed to be dealt with in
communication libraries like the ones discussed in the remainder of
this dissertation.
The examples given so far look very much like HPF data-parallel examples,
written in a different syntax. We will give one last example to emphasize the
point that the HPspmd model is not the same as the HPF model.
If we execute the following HPJava program
we could see output like:
There are 6 messages. Because the 6 processes are running concurrently in 6
JVMs, the order in which the messages appear is unpredictable. An HPJava
program is a MIMD program, and any appearance of collective behavior in
previous examples was the result of a particular programming style and a good
library of collective communication primitives. In general an HPJava program
can freely exploit the weakly coupled nature of the process cluster, often
allowing more efficient algorithms to be coded.
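The unpredictable ordering is easy to reproduce in plain Java with threads standing in for processes. In the sketch below the 2-by-3 grid shape, the message text and the name MimdSketch are all assumptions made for illustration; the only point is that six independent workers each emit one message and nothing fixes the order.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Six threads stand in for six processes on an assumed 2-by-3 grid.
// Each reports its own coordinates; nothing synchronizes the order.
// The grid shape, message text, and names are hypothetical.
final class MimdSketch {
    static List<String> run(int rows, int cols) {
        List<String> messages = new CopyOnWriteArrayList<>();
        Thread[] workers = new Thread[rows * cols];
        for (int rank = 0; rank < workers.length; rank++) {
            final int row = rank / cols, col = rank % cols;
            workers[rank] = new Thread(() ->
                messages.add("My coordinates are (" + row + ", " + col + ")"));
            workers[rank].start();
        }
        for (Thread t : workers) {
            try {
                t.join(); // wait for every worker before returning
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        for (String m : run(2, 3)) {
            System.out.println(m); // order may differ from run to run
        }
    }
}
```

Each run yields the same six messages, but their order depends on how the threads happen to be scheduled.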
Bryan Carpenter
2004-06-09