

HPJava--an HPspmd language

HPJava [15] is a particular implementation of the HPspmd idea. It is a strict extension of its base language, Java, adding some predefined classes and some extra syntax for dealing with distributed arrays. HPJava is thus an environment for parallel programming, especially suitable for data-parallel scientific programming. To some extent the choice of base language is accidental: equivalent extensions could have been added to another language, such as Fortran itself. But Java does seem to be a better language in various respects, and it seems likely that in the future more software will be available for modern object-oriented languages like Java than for Fortran.

Because HPJava incorporates all of Java as a subset, an HPJava program can invoke any existing Java class without restriction. A concept of multidimensional distributed arrays, closely modeled on the arrays of HPF, has been added to Java. Regular sections of distributed arrays are fully supported. Distributed arrays can have any rank greater than or equal to zero, and their elements can be of any standard Java type, including primitive types, Java class types and ordinary Java array types.

Translating and compiling an HPJava program produces a standard Java class file, which is executed by a distributed collection of Java Virtual Machines. All externally visible attributes of an HPJava class (e.g. the existence of distributed-array-valued fields or method arguments) can be automatically reconstructed from Java signatures stored in the class file. This makes it possible to build libraries operating on distributed arrays, while maintaining the usual portability and compatibility features of Java. The libraries themselves can be implemented in HPJava, or in standard Java, or through Java Native Interface (JNI) wrappers to code implemented in other languages.
The HPJava language specification carefully documents the mapping between distributed arrays and the standard-Java components they translate to.
Figure 3.1: A parallel matrix addition.
\begin{figure}
\small
\begin{verbatim}
Procs2 p = new Procs2(P, P) ;
on...
... [i, j] = a [i, j] + b [i, j] ;
}\end{verbatim}
\normalsize
\end{figure}
Figure 3.1 is a simple HPJava program. It illustrates the creation of distributed arrays and access to their elements. An HPJava program is started concurrently in some set of processes that are named through grid objects. The class Procs2 is a standard library class, and represents a two-dimensional grid of processes. During the creation of $p$, $P$ by $P$ processes are selected from the active process group. The Procs2 class extends the special base class Group, which represents a group of processes and has a privileged status in the HPJava language. An object that inherits from this class can be used in various special places. For example, it can be used to parameterize an on construct. The on(p) construct is a new control construct specifying that the enclosed actions are performed only by processes in group $p$.

The distributed array is the most important feature HPJava adds to Java. A distributed array is a collective array shared by a number of processes. Like an ordinary array, a distributed array has some index space and stores a collection of elements of fixed type. The type signature of an $r$-dimensional distributed array involves double brackets surrounding $r$ comma-separated slots. A hyphen in one of these slots indicates that the dimension is distributed. Asterisks are also allowed in these slots, specifying that some dimensions of the array are not to be distributed, i.e. they are ``sequential'' dimensions. (If all dimensions have asterisks, the array is actually an ordinary, non-distributed, Fortran-like, multidimensional array, a valuable addition to Java in its own right, as many people have noted [42,43].)

In HPJava the subscripts in distributed array element references must normally be distributed indexes (the only exceptions to this rule are subscripts in sequential dimensions, and subscripts in arrays with ghost regions, discussed later). The indexes must be in the distributed range associated with the array dimension. This strict requirement ensures that referenced array elements are held by the process that references them.

The variables $a$, $b$, and $c$ are all distributed array variables. The creation expressions on the right-hand side of the initializers specify that the arrays here all have ranges x and y: they are all M by N arrays, block-distributed over p. We see that the mapping of distributed arrays in HPJava is described in terms of the two special classes Group and Range. Range is another special class with privileged status. It represents an integer interval $0, \ldots, N-1$, distributed somehow over a process dimension (a dimension or axis of a grid like $p$). BlockRange is a particular subclass of Range. The arguments in the constructor of BlockRange represent the total size of the range and the target process dimension. Thus, $x$ has M elements distributed over the first dimension of $p$ and $y$ has N elements distributed over the second dimension of $p$.
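To make the idea of a block-distributed range concrete, the following plain-Java sketch computes which global indexes each process coordinate would hold. It assumes the HPF-style convention in which the block size is the ceiling of the range size divided by the number of processes; the class BlockRangeSketch and all of its method names are hypothetical and are not part of the HPJava library.

```java
// Plain-Java sketch of an HPF-style block distribution.
// Assumption: block size = ceiling(n / p). Not HPJava library code.
class BlockRangeSketch {

    // Number of consecutive indexes assigned to each process coordinate.
    public static int blockSize(int n, int p) {
        return (n + p - 1) / p;
    }

    // Process coordinate that holds global index i of a range of size n.
    public static int owner(int i, int n, int p) {
        return i / blockSize(n, p);
    }

    // First global index held by coordinate c (equals n if c holds nothing).
    public static int localLow(int c, int n, int p) {
        return Math.min(n, c * blockSize(n, p));
    }

    // One past the last global index held by coordinate c.
    public static int localHigh(int c, int n, int p) {
        return Math.min(n, (c + 1) * blockSize(n, p));
    }

    public static void main(String[] args) {
        int n = 10, p = 4;  // a range of 10 indexes over 4 process coordinates
        for (int c = 0; c < p; c++) {
            System.out.println("coordinate " + c + " holds ["
                    + localLow(c, n, p) + ", " + localHigh(c, n, p) + ")");
        }
    }
}
```

Under this assumption an element reference is legal on a process only when owner(i, n, p) equals that process's coordinate in the relevant grid dimension, which is the property the restriction to distributed indexes is meant to guarantee.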
Figure 3.2: The HPJava Range hierarchy
\begin{figure}
\centerline{\epsfig{figure=Figs/range.eps,width=3in,height=3in}}
\end{figure}
HPJava defines a class hierarchy of different kinds of range object (Figure 3.2). Each subclass represents a different kind of distribution format for an array dimension. The simplest distribution format is the collapsed (sequential) format, in which the whole of the array dimension is mapped to the local process. Other distribution formats (motivated by High Performance Fortran) include regular block decomposition and simple cyclic decomposition. In these cases the index range (and thus the array dimension) is distributed over one of the dimensions of the process grid defined by the group object. All ranges must be distributed over different dimensions of this grid, and if a particular dimension of the grid is targeted by none of the ranges, the array is said to be replicated in that dimension. Some of the range classes allow ghost extensions to support stencil-based computations.

A second new control construct, overall, implements a distributed parallel loop. It shares some characteristics of the forall construct of HPF. The symbols i and j scoped by these constructs are called distributed indexes. The indexes iterate over all locations (selected here by the degenerate interval ``:'') of ranges x and y.

HPJava also supports Fortran-like array sections. An array section expression has a similar syntax to a distributed array element reference, but uses double brackets. It yields a reference to a new array containing a subset of the elements of the parent array. Those elements can be accessed either through the parent array or through the array section. HPJava sections behave something like array pointers in Fortran, which can reference an arbitrary regular section of a target array. As in Fortran, subscripts in section expressions can be index triplets. HPJava also has built-in ideas of subranges and restricted groups.
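The pointer-like behaviour of array sections can be imitated in ordinary Java with a strided view onto a parent array. The sketch below is a one-dimensional, non-distributed analogue only; the class SectionSketch and its nested Section class are hypothetical names introduced for illustration, and the triplet is taken to have an inclusive upper bound, as in Fortran.

```java
// Hypothetical plain-Java analogue of a one-dimensional array section.
// Not HPJava code: it only illustrates that a section shares storage
// with its parent instead of copying elements.
class SectionSketch {

    static final class Section {
        private final double[] parent;
        private final int lo, stride, length;

        // Triplet lo:hi:stride, with hi inclusive as in Fortran.
        Section(double[] parent, int lo, int hi, int stride) {
            if (stride <= 0 || lo < 0 || hi >= parent.length) {
                throw new IllegalArgumentException("bad triplet");
            }
            this.parent = parent;
            this.lo = lo;
            this.stride = stride;
            this.length = hi < lo ? 0 : (hi - lo) / stride + 1;
        }

        int length() {
            return length;
        }

        // Read element k of the section, i.e. parent[lo + k * stride].
        double get(int k) {
            check(k);
            return parent[lo + k * stride];
        }

        // Write through the section into the parent array.
        void set(int k, double value) {
            check(k);
            parent[lo + k * stride] = value;
        }

        private void check(int k) {
            if (k < 0 || k >= length) {
                throw new IndexOutOfBoundsException("section index " + k);
            }
        }
    }

    public static void main(String[] args) {
        double[] a = new double[10];
        Section s = new Section(a, 1, 7, 2);  // selects a[1], a[3], a[5], a[7]
        s.set(2, 42.0);                       // writes a[5] through the section
        System.out.println(a[5]);
    }
}
```

A write through the section is visible through the parent and vice versa, which is the sense in which the same elements can be accessed through either reference.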
Subranges and restricted groups describe the range and distribution group of sections, and can also be used in array constructors on the same footing as the ranges and grids introduced earlier. They allow HPJava arrays to reproduce any mapping allowed by the ALIGN directive of HPF.

The examples here have covered the basic syntax of HPJava. The language itself is relatively simple. Complexities associated with varied or irregular patterns of communication are meant to be handled in communication libraries like the ones discussed in the remainder of this dissertation.

The examples given so far look very much like HPF data-parallel examples, written in a different syntax. We will give one last example to emphasize the point that the HPspmd model is not the same as the HPF model. If we execute the following HPJava program

\begin{verbatim}
Procs2 p = new Procs2...
...() +
    ", " + e.crd() + ")") ;
}
\end{verbatim}

we could see output like:

\begin{verbatim}
My coordinates are (0...
...ordinates are (1, 1)
My coordinates are (0, 1)
\end{verbatim}

There are 6 messages. Because the 6 processes are running concurrently in 6 JVMs, the order in which the messages appear is unpredictable. An HPJava program is a MIMD program, and any appearance of collective behavior in previous examples was the result of a particular programming style and a good library of collective communication primitives. In general an HPJava program can freely exploit the weakly coupled nature of the process cluster, often allowing more efficient algorithms to be coded.
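The unpredictable ordering can be reproduced in ordinary Java by standing in one thread for each process. This is only an analogy: real HPJava processes run in separate JVMs, and the 2 by 3 grid shape used in main, the class name GridHelloSketch and the method run are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Thread-based stand-in for a grid of independent processes.
// Each "process" reports its own grid coordinates; the arrival
// order of the messages depends on thread scheduling.
class GridHelloSketch {

    // Starts rows * cols threads and returns their messages in arrival order.
    public static List<String> run(int rows, int cols) {
        List<String> messages = Collections.synchronizedList(new ArrayList<>());
        List<Thread> threads = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                final int row = r, col = c;
                Thread t = new Thread(() ->
                        messages.add("My coordinates are (" + row + ", " + col + ")"));
                threads.add(t);
                t.start();
            }
        }
        // Wait for every "process" to finish before returning.
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        // Assumed grid shape: 2 by 3, giving six messages.
        for (String m : run(2, 3)) {
            System.out.println(m);
        }
    }
}
```

Every run produces the same six messages, but nothing in the program fixes their order; any ordering that a collective operation appears to impose has to come from explicit synchronization or communication.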
Bryan Carpenter 2004-06-09