

Benchmarks

Our benchmarks were run on an IBM SP3 with four 375 MHz Power3 CPUs and 2 GB of memory on each node. The machine runs AIX version 4.3, and the Java system is the IBM Developer Kit 1.3.1 (JIT). We use the shared "css0" adapter with User Space (US) communication mode for MPI, and the -O compiler option for Java. For comparison, we also ran sequential Java, Fortran, and HPF versions of the HPJava programs. The HPF versions were compiled with IBM XL HPF version 1.4, using the xlhpf95 compiler command with the -O3 and -qhot flags. The Fortran versions were compiled with XL Fortran for AIX using the -O5 flag.
Figure 18: Red-black relaxation for the two-dimensional Laplace equation with problem size $512^2$.
Figure 18 shows results for four versions (HPJava, sequential Java, HPF, and Fortran) of red-black relaxation for the two-dimensional Laplace equation on a 512 by 512 grid. In our runs HPJava outperforms sequential Java by up to a factor of 17. On 36 processors HPJava achieves about 78% of the performance of HPF. This is reasonable for an initial benchmark result without any serious optimization. We expect the performance of HPJava to improve once the optimization strategies described in a previous paper [10] are applied.

The scaling behavior of HPJava is slightly better than that of HPF. This probably mainly reflects the low performance of a single Java node compared to Fortran. We do not believe that the current HPJava communication library is faster than the HPF library, because our communication library is built on top of portability layers (mpjdev and MPI), while IBM HPF is likely to use a platform-specific communication library. Future versions of Adlib could clearly be optimized for the platform. We see similar behavior on the large three-dimensional diffusion equation benchmark (Figure 19). In general we expect three-dimensional problems to be more amenable to parallelism because of their larger problem size.
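For readers unfamiliar with the kernel measured in Figure 18, the following is a minimal sequential sketch of red-black relaxation for the two-dimensional Laplace equation, written in plain Java rather than HPJava. It is not the benchmark source. The class and method names, the grid size, the boundary values (one edge held at 1, the others at 0), and the sweep count are all assumptions chosen for illustration.

```java
// Sequential sketch of red-black relaxation for the 2D Laplace equation.
// Names, grid size, boundary values, and sweep count are illustrative
// assumptions; this is not the code used in the benchmarks.
class RedBlackLaplace {

    // Update every interior point whose (i + j) parity equals `parity`
    // with the average of its four nearest neighbours.
    static void sweep(double[][] u, int parity) {
        int n = u.length;
        for (int i = 1; i < n - 1; i++) {
            // First interior column j such that (i + j) % 2 == parity.
            int jStart = 1 + ((i + 1 + parity) % 2);
            for (int j = jStart; j < n - 1; j += 2) {
                u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                + u[i][j - 1] + u[i][j + 1]);
            }
        }
    }

    // Relax an n-by-n grid: row 0 is held at 1, the other edges at 0.
    static double[][] solve(int n, int iterations) {
        double[][] u = new double[n][n];
        for (int j = 0; j < n; j++) {
            u[0][j] = 1.0;
        }
        for (int it = 0; it < iterations; it++) {
            sweep(u, 0); // "red" points
            sweep(u, 1); // "black" points
        }
        return u;
    }

    // Largest deviation of an interior point from its neighbour average.
    static double maxResidual(double[][] u) {
        int n = u.length;
        double max = 0.0;
        for (int i = 1; i < n - 1; i++) {
            for (int j = 1; j < n - 1; j++) {
                double avg = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                   + u[i][j - 1] + u[i][j + 1]);
                max = Math.max(max, Math.abs(u[i][j] - avg));
            }
        }
        return max;
    }

    public static void main(String[] args) {
        double[][] u = solve(17, 2000);
        System.out.println("max residual = " + maxResidual(u));
        System.out.println("centre value = " + u[8][8]);
    }
}
```

Points of one colour depend only on points of the other colour, so each half-sweep can be updated independently across processors once neighbouring edge values have been exchanged. This is what makes the scheme a natural data-parallel benchmark.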
Figure 19: Three-dimensional diffusion equation with problem size $128^3$.
For a small problem size of the three-dimensional diffusion equation benchmark (Figure 20), sequential Fortran is about 4-5 times faster than Java. Benchmarking results from [4] do not show a gap of this size on other platforms, where a factor of 2 is common. Either the IBM Fortran compiler is very good or we are using an old Java compiler (JDK 1.3.1).
Figure 20: Three-dimensional diffusion equation with problem size $32^3$.
Finally, we consider benchmark results on our original problem, the multigrid solver, in Figure 21. For the complete multigrid algorithm we currently get slightly disappointing speedup and absolute performance. These results are new at the time of writing, and neither the HPJava translation scheme nor the Adlib implementation is yet optimized. We expect there is plenty of low-hanging fruit in terms of opportunities for improving both.
Figure 21: Multigrid solver with problem size $512^2$.
Speedup of HPJava is summarized in Table I. Problems of different sizes are measured on different numbers of processors. The reference value is the result of the single-processor HPJava version. As the table shows, we obtain a speedup of up to 26.77 on the Laplace equation using 36 processors with a problem size of $1024^2$. Many realistic applications with more computation per grid point (for example CFD) will be better suited to parallel implementation than the Laplace equation and the similar simple benchmarks described here. Many such algorithms will be equally amenable to implementation in HPJava; see for example the CFD demo at www.hpjava.org.

Table I: Speedup of HPJava benchmarks as compared with 1 processor HPJava.

Multigrid Solver
Processors        2      3      4      6      9
$512^2$        1.90   2.29   2.39   2.96   3.03

2D Laplace Equation
Processors        4      9     16     25     36
$256^2$        2.67   3.73   4.68   6.23   6.23
$512^2$        4.06   7.75   9.47  12.18  17.04
$1024^2$       3.67   9.95   15.1  21.75  26.77

3D Diffusion Equation
Processors        4      8     16     32
$32^3$         2.72   3.45   4.75   5.43
$64^3$         3.00   4.85   7.47   8.92
$128^3$        3.31   5.76   9.98  13.88
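
The speedups in Table I can also be read as parallel efficiency, using the standard definition of speedup divided by processor count. Efficiency is not reported in the text above. The short Java sketch below does this arithmetic for the $1024^2$ Laplace row; the class and method names are hypothetical, and the numbers are copied from the table.

```java
import java.util.Locale;

// Sketch: convert Table I speedups into parallel efficiency.
// Efficiency = speedup / processors is the standard definition; it is an
// added illustration, not a quantity reported in the benchmarks above.
class SpeedupEfficiency {

    static double efficiency(double speedup, int processors) {
        return speedup / processors;
    }

    public static void main(String[] args) {
        // 2D Laplace equation, problem size 1024^2 (values from Table I).
        int[] processors = {4, 9, 16, 25, 36};
        double[] speedup = {3.67, 9.95, 15.1, 21.75, 26.77};
        for (int k = 0; k < processors.length; k++) {
            System.out.printf(Locale.ROOT,
                "P=%2d  speedup=%6.2f  efficiency=%.2f%n",
                processors[k], speedup[k],
                efficiency(speedup[k], processors[k]));
        }
    }
}
```

For example, the speedup of 26.77 on 36 processors corresponds to an efficiency of about 0.74.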



Bryan Carpenter 2003-01-23