
Figure 6.7 shows results for four different versions (HPJava, sequential Java, HPF, and Fortran) of red-black relaxation of the two-dimensional Laplace equation with a problem size of 512 by 512. In our runs HPJava outperforms sequential Java by up to 17 times. On 36 processors HPJava achieves about 79% of the performance of HPF. This is a respectable result for an initial benchmark with no serious optimization; the performance of HPJava should improve further once the optimization strategies described in a previous paper [37] are applied. The scaling behavior of HPJava is slightly better than that of HPF, though this mainly reflects the low performance of a single Java node compared to Fortran. We do not believe that the current HPJava communication library is faster than the HPF library, because ours is built on top of the portability layers mpjdev and MPI, while IBM HPF is likely to use a platform-specific communication library. Future versions of Adlib could, however, be optimized for the platform.
Complete performance results of redblack relaxation of the Laplace equation are given in Table 6.1.
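The kernel being timed can be sketched in plain sequential Java. This is only an illustration of the red-black scheme, not the benchmark's actual code: the class name, array name, and iteration count are hypothetical, and the HPJava version distributes the array over the process grid.

```java
// Illustrative sequential red-black relaxation for the 2-D Laplace equation.
// Boundary values of u are fixed; interior points are repeatedly replaced by
// the average of their four neighbours, in two half-sweeps per iteration.
public class RedBlack {
    public static void relax(double[][] u, int iters) {
        int n = u.length;
        for (int it = 0; it < iters; it++) {
            // First update "red" points ((i + j) even), then "black" points
            // ((i + j) odd); each half-sweep uses the other colour's fresh values.
            for (int color = 0; color <= 1; color++) {
                for (int i = 1; i < n - 1; i++) {
                    for (int j = 1; j < n - 1; j++) {
                        if ((i + j) % 2 == color) {
                            u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                            + u[i][j - 1] + u[i][j + 1]);
                        }
                    }
                }
            }
        }
    }
}
```

The appeal of the red-black ordering for parallel implementation is that all points of one colour can be updated independently, so each half-sweep needs only one boundary exchange between neighbouring processes.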
We see similar behavior on the large problem size of the three-dimensional diffusion equation benchmark (Figure 6.8). In general we expect three-dimensional problems to be more amenable to parallelism because of their larger problem sizes.
On the small problem size of the three-dimensional diffusion equation benchmark (Figure 6.9), sequential Fortran is about 45 times faster than Java. Benchmarking results from [11] do not show this kind of gap on other platforms: a factor of 2 or less is common. Either the IBM Fortran compiler is very good, or we are using an old Java compiler (JDK 1.3.1).

Complete performance results for the three-dimensional diffusion equation are given in Table 6.2.
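For reference, one explicit time step of a three-dimensional diffusion kernel can be sketched as below. This is a generic FTCS-style stencil under assumed names (`Diffusion3D`, `step`, coefficient `c` standing for D·Δt/h²); the actual HPJava benchmark kernel is not reproduced here.

```java
// Illustrative explicit update for the 3-D diffusion equation
// u_t = D * (u_xx + u_yy + u_zz) on a uniform cubic grid.
// c is the dimensionless coefficient D * dt / h^2 (stable for c <= 1/6).
public class Diffusion3D {
    public static void step(double[][][] u, double[][][] un, double c) {
        int n = u.length;
        for (int i = 1; i < n - 1; i++)
            for (int j = 1; j < n - 1; j++)
                for (int k = 1; k < n - 1; k++)
                    // Seven-point stencil: six face neighbours minus 6x centre.
                    un[i][j][k] = u[i][j][k] + c * (
                          u[i - 1][j][k] + u[i + 1][j][k]
                        + u[i][j - 1][k] + u[i][j + 1][k]
                        + u[i][j][k - 1] + u[i][j][k + 1]
                        - 6.0 * u[i][j][k]);
    }
}
```

The seven-point stencil touches more neighbours and does more arithmetic per grid point than the 2-D Laplace kernel, which is one reason the 3-D problem amortizes communication costs better.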

Speedup of HPJava for the various applications is summarized in Table 6.4. Different problem sizes are measured on different numbers of processors. As the reference value we use the result of the single-processor HPJava version. As the table shows, we obtain up to 25.77 times speedup on the Laplace equation using 36 processors with a problem size of . Many realistic applications with more computation per grid point (for example the CFD application discussed in the next section) will be more suitable for parallel implementation than the Laplace equation and the simple benchmarks described in this section.
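Since the reference value is the single-processor HPJava run, the speedup figures follow the usual definition S(p) = T(1) / T(p). A trivial sketch, with hypothetical class and method names:

```java
// Speedup relative to the single-processor run: S(p) = T(1) / T(p).
// t1 is the single-processor time, tp the time on p processors.
public class Speedup {
    public static double speedup(double t1, double tp) {
        return t1 / tp;
    }
}
```

On this measure a value of 25.77 on 36 processors corresponds to a parallel efficiency of roughly 72%.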
