It is generally accepted that data parallel programming has a vital role in high-performance scientific computing, and the basic implementation issues related to this paradigm are well understood. But the choice of high-level programming environment remains uncertain. Five years ago the High Performance Fortran Forum published the first standardized definition of a language for data parallel programming [19,29]. In the intervening period considerable progress has been made in HPF compiler technology, and the HPF language definition has been extended and revised in response to the demands of compiler-writers and end-users. Yet most programmers developing parallel applications--or environments for parallel application development--do not code in HPF. The slow uptake of HPF can be attributed in part to immaturity in the current generation of compilers, but there is also a suspicion that many programmers are simply more comfortable with the lower-level Single Program Multiple Data (SPMD) programming style, perhaps because the effect of executing an SPMD program is more controllable, and the process of tuning for efficiency is more intuitive. (Partly, no doubt, this reflects a status quo in which expert programmers build parallel programs and less experienced programmers merely use them.)
SPMD programming has been very successful. There are countless applications written in the most basic SPMD style, using direct message-passing through MPI or similar low-level packages. Many higher-level parallel programming environments and libraries assume the SPMD style as their basic model. Examples include ScaLAPACK, PETSc, DAGH, Kelp [30,18], the Global Array Toolkit and NWChem [3,27]. While there remains a prejudice that HPF is best suited to problems with very regular data structures and regular data access patterns, SPMD frameworks like DAGH and Kelp have been designed to deal directly with irregularly distributed data, and other libraries like CHAOS/PARTI [35,16] and Global Arrays support unstructured access to distributed arrays. These successes aside, the library-based SPMD approach to data-parallel programming certainly lacks the uniformity and elegance of HPF. All the environments referred to above have some notion of a distributed array, but they all describe those arrays differently. Compared with HPF, creating distributed arrays and accessing their local and remote elements is clumsy and error-prone. Because the arrays are managed entirely in libraries, the compiler offers little support and no safety net of compile-time checking.
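To make the "clumsy and error-prone" point concrete, the following is a minimal plain-Java sketch of the index bookkeeping that library-based SPMD programming forces onto the programmer for even the simplest 1-D block distribution. The class and method names are illustrative only; they belong to none of the libraries mentioned above.

```java
// Hypothetical helper illustrating the bookkeeping an SPMD programmer
// must manage by hand: translating a global array index into an
// (owner process, local offset) pair for a 1-D block distribution.
public class BlockMap {
    final int globalSize, numProcs, blockSize;

    BlockMap(int globalSize, int numProcs) {
        this.globalSize = globalSize;
        this.numProcs = numProcs;
        // ceiling division: each process owns at most blockSize elements
        this.blockSize = (globalSize + numProcs - 1) / numProcs;
    }

    // which process holds global element g
    int owner(int g)   { return g / blockSize; }

    // offset of global element g within its owner's local segment
    int toLocal(int g) { return g % blockSize; }

    // number of elements actually held by process p
    // (the last block may be shorter than blockSize)
    int localCount(int p) {
        int lo = p * blockSize;
        int hi = Math.min(lo + blockSize, globalSize);
        return Math.max(hi - lo, 0);
    }
}
```

Every distributed-array access in a hand-written SPMD code passes through arithmetic of this kind, with no compile-time check that the translation is applied correctly or consistently; this is precisely the bookkeeping that language-level distributed arrays would absorb.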
The work described here will investigate a class of programming languages that borrow certain ideas, various run-time technologies, and some compilation techniques from HPF, but relinquish some of its basic tenets, in particular: that the programmer should write in a language with (logically) a single global thread of control; that the compiler should determine automatically which processor executes each computation in a program; and that the compiler should automatically insert communications whenever a computation accesses array elements held outside the local processor.
If these foundational assumptions are removed from the HPF model, does anything useful remain? In fact, yes. What will be retained is an explicitly SPMD programming model complemented by syntax for representing distributed arrays, syntax for expressing that certain computations are localized to certain processors, and syntax for expressing concisely a distributed form of the parallel loop. The claim is that these are just the features needed to make calls to various data-parallel libraries, including application-oriented libraries and high-level libraries for communication, about as convenient as, say, making a call to an array transformational intrinsic function in Fortran 90. We hope to illustrate that, besides their advantages as a framework for library usage, the resulting programming languages can conveniently express various practical data-parallel algorithms. The resulting framework may also have better prospects for dealing effectively with irregular problems than is the case for HPF.
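The distributed parallel loop retained here can be sketched, in the absence of the proposed syntax, as the hand-written SPMD idiom it would replace: each process runs the same code but restricts itself to the slice of the global iteration space it owns. This is a plain-Java sketch under assumed conventions (the rank and block parameters would come from the communication library; all names are illustrative).

```java
// Plain-Java sketch of the SPMD "distributed parallel loop" pattern.
// Every process executes this same method, but iterates only over the
// block of the global index space [0, globalSize) that it owns.
public class DistLoop {
    static double localSum(double[] localBlock, int myRank,
                           int blockSize, int globalSize) {
        int lo = myRank * blockSize;                  // first global index owned
        int hi = Math.min(lo + blockSize, globalSize); // one past the last
        double sum = 0.0;
        for (int g = lo; g < hi; g++) {
            // global index g maps to local offset g - lo
            sum += localBlock[g - lo];
        }
        return sum; // a real program would now combine the partial sums
                    // across processes (e.g. with a reduction operation)
    }
}
```

Syntax for distributed arrays and distributed loops would let the compiler generate the bounds and index translations above, checked at compile time, rather than leaving them to the programmer.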
This proposal brings together several important research areas, including parallel compilers, data parallel SPMD libraries and object oriented programming models. We will research combinations of these ideas that achieve high performance with an approach that implies more work for the programmer than envisaged in systems such as HPF, but can more readily be implemented in a robust fashion across a range of languages. Specifically, we are combining our research on the use of Java and Web technologies with high performance SPMD libraries and some of the compiler techniques developed as part of HPF research. Java has many features that suggest it could be a very attractive language for scientific and engineering--or what we now term ``Grande''--applications. Clearly Java needs many improvements, both to the language and to the support environment, to achieve the required combination of high performance and expressivity. This cannot be guaranteed, but we have helped set in motion a community activity involving academia, government and industry (including IBM, Intel, Microsoft, Oracle, Sun and, perhaps most importantly, James Gosling from JavaSoft) designed to address both language changes and the establishment of standards for numerical libraries and distributed scientific objects. The Java environment is still malleable, and we are optimistic that this effort will be successful and that Java will emerge as a premier language for large scale computation. Our research will be aimed at multi-language programming paradigms, but our new implementations will focus on Java exploiting existing high performance C++ and Fortran libraries. Our collaborator Professor Xiaoming Li from Peking University will develop the Fortran and C++ aspects of this general high level SPMD environment. We can consider our work from either of two points of view: bringing the power of Java to a data parallel SPMD environment, or alternatively researching the expression of data parallelism within Java.
Note that we are adopting a more modest approach than a full-scale data parallel compiler like HPF; we believe this is appropriate for Java, where the situation is changing rapidly and one needs to remain flexible.
We should stress what we are not doing! Many of the discussions of Java at the recent ``Grande'' workshops [23,24,22] have focussed on its use in distributed object and mobile or Web-client-based computing. Our group is also looking into this approach for composing large scale distributed systems. In this proposal, however, we are addressing ``hard-core'' science and engineering computations, where data parallelism and the highest performance are viewed as critical.
The work described in this report will continue research conducted in the Parallel Compiler Runtime Consortium (PCRC) project. PCRC was a DARPA-supported project involving Rice, Maryland, Austin, Indiana, CSC, Rochester and Florida, with NPAC as prime contractor. Achievements included construction of an experimental HPF compilation system, delivery of the NPAC PCRC runtime kernel (Adlib), and early work on the design and implementation of HPJava.