
Processes and Distributed Arrays

In this chapter we start to discuss parallel programming in HPJava.

The HPJava parallel programming model is one of explicitly cooperating processes. It is an implementation of the Single Program, Multiple Data (SPMD) model. In this model a group of processors or processes execute the same program text, but the data structures--in our case the elements of distributed arrays--are divided across processes. Each process operates on its locally owned segment of an entire array. At some points in the computation processes usually need to access elements owned by their peers. At these points explicit communications are needed to enable this access.
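The idea of a locally owned segment can be sketched in ordinary Java. The following is a minimal illustration only, not HPJava syntax: the class and method names (`BlockDistributionSketch`, `blockLow`, `blockHigh`, `localSegment`, `localSum`) are hypothetical, and the partition rule shown is one simple way to cut an array into contiguous blocks, not necessarily the rule HPJava itself applies. The ranks are simulated one after another in a loop; in a real SPMD run each would be a separate process.

```java
// Plain Java sketch of an SPMD block distribution (not HPJava syntax).
final class BlockDistributionSketch {

    // First global index owned by `rank` when n elements are split into
    // contiguous blocks over p processes (an assumed partition rule).
    static int blockLow(int rank, int p, int n) {
        return (int) ((long) rank * n / p);
    }

    // One past the last global index owned by `rank`.
    static int blockHigh(int rank, int p, int n) {
        return blockLow(rank + 1, p, n);
    }

    // Each process allocates only its own segment of the conceptual array.
    // For illustration, the element at global index g holds the value g + 1.
    static int[] localSegment(int rank, int p, int n) {
        int lo = blockLow(rank, p, n);
        int[] seg = new int[blockHigh(rank, p, n) - lo];
        for (int i = 0; i < seg.length; i++) {
            seg[i] = lo + i + 1;
        }
        return seg;
    }

    // The same program text runs on every process, on local data only.
    static long localSum(int[] seg) {
        long sum = 0;
        for (int v : seg) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 10, p = 4;
        long total = 0;
        for (int rank = 0; rank < p; rank++) {   // ranks simulated sequentially
            long s = localSum(localSegment(rank, p, n));
            System.out.println("rank " + rank + " owns [" + blockLow(rank, p, n)
                    + ", " + blockHigh(rank, p, n) + "), local sum = " + s);
            total += s;   // stands in for an explicit communication (a reduction)
        }
        System.out.println("total = " + total);
    }
}
```

The final accumulation into `total` is the one place where data crosses process boundaries. In a genuine SPMD program that step would be an explicit communication call rather than a shared variable.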

This general scheme has been very successful in realistic applications. Probably most successful applications of parallel computing to large scientific and numerical problems are programmed in this style. So HPJava is attempting to add some support at the language level for the established practices of programmers. Perhaps more importantly, it also provides a framework for the development of libraries of subroutines operating on distributed data.

What HPJava is not is any kind of parallelizing compiler--the HPJava software certainly cannot take a sequential program and convert it automatically into a job that runs efficiently on a parallel computer. Before writing an HPJava program you must think about the kind of parallel algorithms you are going to employ, and about what kind of communications between processors these algorithms imply. HPJava is basically a notation for expressing and implementing the parallel algorithms you come up with after that.

HPJava was designed for writing programs for distributed memory parallel computers. In principle that could mean just about any collection of computers joined by a fast enough network connection. The ``distributed memory'' qualifier just means that the participating processors don't need to share a common main memory, which for a collection of networked computers is, generally speaking, the default situation. It is relatively straightforward, for example, to run an HPJava program on a group of Linux PCs connected by Ethernet. In practice such ad hoc clusters probably have a high communication latency, and getting good performance on these platforms may be hard. A dedicated cluster with a high performance interconnect, or a proprietary parallel computer like an IBM SP machine, may be a more promising platform for the average HPJava program.

With this kind of target in mind, the underlying programming model is one of communicating processes, where we use the term ``process'' in the usual sense of operating systems. A process is a self-contained context in which a program executes with its own thread (or threads) of control and its own protected memory and associated address space. Generally speaking this memory is inaccessible to other processes. In HPJava a ``process'' in this sense is always a Java Virtual Machine.

This was the original programming model we had in mind, and it remains the main rationale for the design of HPJava. Fairly late in the day--and perhaps a little reluctantly--we accepted that, at least for development purposes, it was very convenient to be able to run HPJava programs in a different mode. In this mode, the processes of an HPJava program are mapped to the Java threads of a single JVM. This allows you to debug and demonstrate your HPJava programs without facing the ordeal of installing MPI or running on a network. As a byproduct, it also means you can run HPJava programs on shared memory parallel computers. These kinds of machines are quite widely available today--sold, for example, as high-end UNIX servers. Because the Java threads of modern JVMs are usually executed in parallel on this kind of machine, it is possible to get quite reasonable parallel speedup running HPJava programs in the multithreaded mode.
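To make the thread mapping concrete, here is a small sketch in ordinary Java of how SPMD-style ``processes'' might be hosted on the threads of one JVM. It is an illustration under stated assumptions, not the HPJava runtime: the class `ThreadedSpmdSketch`, the method `ringShift`, and the use of one `BlockingQueue` inbox per rank are all hypothetical choices made for this example. Every thread runs the same body, owns one local value, sends it explicitly to its right-hand neighbour, and receives one value from its left-hand neighbour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Plain Java sketch: SPMD "processes" mapped onto threads of a single JVM.
final class ThreadedSpmdSketch {

    // Starts p threads running the same code. Returns, for each rank, the
    // value that rank received from its left neighbour in a ring.
    static int[] ringShift(int p) {
        // One inbox per rank; "sending" means putting into another rank's inbox.
        List<BlockingQueue<Integer>> inbox = new ArrayList<>();
        for (int r = 0; r < p; r++) {
            inbox.add(new ArrayBlockingQueue<>(1));
        }
        int[] received = new int[p];
        Thread[] workers = new Thread[p];

        for (int r = 0; r < p; r++) {
            final int rank = r;
            workers[r] = new Thread(() -> {
                try {
                    int local = 100 + rank;                   // locally owned datum
                    int right = (rank + 1) % p;
                    inbox.get(right).put(local);              // explicit send
                    received[rank] = inbox.get(rank).take();  // explicit receive
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[r].start();
        }

        // Wait for all ranks; join() also makes their writes visible here.
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return received;
    }

    public static void main(String[] args) {
        int[] got = ringShift(4);
        for (int rank = 0; rank < got.length; rank++) {
            System.out.println("rank " + rank + " received " + got[rank]);
        }
    }
}
```

Although the threads share one address space, the sketch deliberately routes all data exchange through the inboxes, so the same program structure would carry over to separate processes where shared variables are unavailable.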

So HPJava now has two execution models: the original multi-process model, and the newer multithreaded model. Throughout most of this report we will use the term ``process'' interchangeably to mean an actual process in the multi-process model, or a thread started when an HPJava program is run under the multithreaded model. Section 2.7 discusses the relationship between the two models in more detail, and explains how to run code under the multi-process model.

Bryan Carpenter 2003-04-15