The language extensions described earlier were devised partly to provide a convenient interface to a distributed-array library developed in the Parallel Compiler Runtime Consortium (PCRC) project.
Compared with HPF, translation of the HPspmd languages is relatively straightforward. The HPJava compiler, for example, is initially being implemented as a translator to ordinary Java, built on a compiler construction framework developed in the PCRC project. A distributed array of the extended language appears in the emitted code as a pair: an ordinary Java array of local elements and a Distributed Array Descriptor (DAD) object. In the initial implementation, details of the distribution format, including the non-trivial global-to-local translation of subscripts, are managed in the runtime library. Even with these overheads, acceptable performance is achievable, because in useful parallel algorithms most work on distributed arrays occurs inside overall constructs with large ranges. In normal usage, the formulae for address translation can be linearized inside these constructs, and the cost of runtime calls handling the non-trivial aspects of address translation (including array-bounds checking) can be amortized in the startup overheads of the loop. These compiler optimizations will be important in the base-level translator. If array accesses are genuinely irregular, the necessary subscripting usually cannot be expressed directly in our language: subscripts cannot be computed arbitrarily in parallel loops without violating the SPMD restriction that accesses be local. This is not necessarily a shortcoming: it forces explicit use of an appropriate library package for handling irregular accesses (such as CHAOS; see section 2.2).
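The following sketch illustrates the shape of the emitted code; the class and field names (`Descriptor`, `base`, `stride`, and so on) are our own illustrative stand-ins, not the actual PCRC runtime API. The point is that the global-to-local subscript translation is resolved once, before the loop, so the inner loop of an overall construct touches only local memory through a linear subscript.

```java
// Illustrative sketch of the translation scheme: a distributed array
// becomes a local element array plus a descriptor object.  All names
// here are hypothetical, not the real DAD interface.
public class DistArraySketch {
    // Stand-in for one dimension of a Distributed Array Descriptor:
    // the locally held block and the linearized addressing parameters.
    static class Descriptor {
        final int localLow;    // global index of first locally held element
        final int localCount;  // number of elements held by this process
        final int base;        // local address of first element
        final int stride;      // local address step per loop iteration
        Descriptor(int localLow, int localCount, int base, int stride) {
            this.localLow = localLow; this.localCount = localCount;
            this.base = base; this.stride = stride;
        }
    }

    // Body of an 'overall' construct after translation: address
    // translation and bounds checking are hoisted out of the loop
    // (amortized in its startup), leaving a simple linear traversal.
    static double sumLocal(double[] elements, Descriptor d) {
        double s = 0.0;
        int addr = d.base;  // one-time startup cost
        for (int i = 0; i < d.localCount; i++) {
            s += elements[addr];
            addr += d.stride;
        }
        return s;
    }

    public static void main(String[] args) {
        // Example: this process holds global indices 4..7 of a
        // block-distributed array, stored from local offset 0.
        double[] local = {4.0, 5.0, 6.0, 7.0};
        Descriptor d = new Descriptor(4, 4, 0, 1);
        System.out.println(sumLocal(local, d));  // prints 22.0
    }
}
```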
The basic HPJava translator will be available by the start date of the proposed work. In figure 1 we give benchmark results for HPJava examples manually converted to Java, following the translation scheme outlined above. The examples are essentially the ones described in section 3.1. The parallel programs are executed on four sparc-sun-solaris2.5.1 processors, using MPICH and the Java JIT compiler in JDK 1.2Beta2, with a JNI interface to Adlib for collective communications. In both cases the arrays are 1024 by 1024. For Jacobi iteration, the timing is for about 90 iterations. Timings are compared with sequential Java and C++ versions of the code (horizontal lines). Note that the poor scaling in the Cholesky case is attributable to the poor performance of MPICH on this platform, not to overheads of HPJava. Scaling should be much improved by using SunHPC MPI.
The single-processor HPJava performance is better than that of sequential Java because the pure Java version was coded in the natural way, using two-dimensional arrays, which are quite inefficient in Java; the HPJava translation scheme linearizes arrays. (We remark that in recent workshops James Gosling has stated that this is his preferred approach to adding generalized array-like structures to Java.) Although absolute performance is still somewhat lower than that of C++, Java performance has improved dramatically over the last year, and we expect further gains. Parity between Java and C or Fortran no longer seems an unrealistic expectation. In fact, even if the performance of Java does not rapidly approach that of C and Fortran, Java remains an excellent research platform for the general language model we espouse: it combines strong support for dynamic and object-oriented programming in a relatively simple language, for which preprocessors implementing extended versions of the language (``little languages'') are a straightforward proposition.
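The linearization just mentioned can be sketched as follows (a minimal illustration in plain Java, not the translator's actual output). A Java `double[][]` is an array of row objects, so each element access chases a pointer and performs two bounds checks; storing the elements in one flat array with row-major indexing avoids the indirection.

```java
// Minimal sketch of array linearization: a two-dimensional array is
// stored as one flat Java array with row-major indexing, avoiding the
// per-row object indirection of double[][].
public class Linearized2D {
    // Element (i, j) of a rows-by-cols array lives at i * cols + j.
    static double get(double[] flat, int cols, int i, int j) {
        return flat[i * cols + j];  // contiguous memory, one bounds check
    }

    public static void main(String[] args) {
        int rows = 2, cols = 3;
        double[] flat = new double[rows * cols];
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                flat[i * cols + j] = i * 10 + j;  // fill with 0,1,2,10,11,12
        System.out.println(get(flat, cols, 1, 2));  // prints 12.0
    }
}
```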