Benchmarking HPJava, Part II:
Performance on Parallel Machines
In this chapter we benchmark the HPJava programs introduced
in chapter 6 on parallel machines. In chapter 6 we saw that
HPJava produces quite efficient node code, and that our HPJOPT2
optimization scheme dramatically improves performance on a single
processor. Given the successful benchmarks on one processor, we
expect the HPJava system to achieve high performance on parallel
machines.
The main aim of this chapter is to observe the performance of HPJava
programs on parallel machines. A critical analysis of the behaviour
and performance of HPJava programs on these machines is not necessary
here. We are reasonably confident that good results for node code
will carry over to multi-processors because of the way that
translation and optimization work.
In addition to HPJava and sequential Java programs, we also benchmark
C and Fortran programs on each machine for comparison. For now the
C and Fortran programs are sequential, not parallel.
To assess performance on multi-processors, the benchmarks will be
performed on the following machines:
- Shared Memory Machine - Sun Solaris 9 with eight UltraSPARC
III Cu 900 MHz processors and 16 GB of main memory.
- Distributed Memory Machine - IBM SP3 with four Power3
375 MHz CPUs and 2 GB of memory on each node.
Table 7.1 lists the compilers and optimization options used on
each machine. This chapter is based on our recent publications
such as [31,30]. The benchmarks on the SP3 are taken from [32],
which experimented only with naively translated HPJava programs.
Table 7.1: Compilers and optimization options used on parallel machines.
| | Shared Memory | Distributed Memory |
|---|---|---|
| HPJava | Sun JDK 1.4.1 (JIT) with -server -O | IBM Developer kit 1.3.1 (JIT) with -O |
| Java | Sun JDK 1.4.1 (JIT) with -server -O | IBM Developer kit 1.3.1 (JIT) with -O |
| C | gcc -O5 | |
| Fortran | | F90 -O5 |
Bryan Carpenter
2004-06-09