
Evaluation

Before attempting to benchmark the full multigrid application, we experiment with simple kernel applications such as the Laplace equation and the diffusion equation.
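
For reference, the following is a minimal sketch of what the sequential Java baseline for the red-black Laplace kernel might look like; the class name, boundary values, and iteration count are illustrative assumptions, not the actual benchmark source. Each interior update averages the four nearest neighbours, and the red and black half-sweeps together form one relaxation iteration.

public class RedBlackLaplace {

    // One half-sweep: update the interior points whose colour (i + j) % 2
    // equals 'parity' to the average of their four nearest neighbours.
    static void sweep(double[][] u, int parity) {
        int n = u.length;
        for (int i = 1; i < n - 1; i++)
            for (int j = 2 - (i + parity) % 2; j < n - 1; j += 2)
                u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                + u[i][j - 1] + u[i][j + 1]);
    }

    public static void main(String[] args) {
        int n = 512;                                 // grid size, as in Figure 6.7
        double[][] u = new double[n][n];
        for (int j = 0; j < n; j++) u[0][j] = 1.0;   // simple fixed boundary values (illustrative)
        for (int iter = 0; iter < 100; iter++) {
            sweep(u, 0);                             // red half-sweep
            sweep(u, 1);                             // black half-sweep
        }
        System.out.println("u[n/2][n/2] = " + u[n / 2][n / 2]);
    }
}

In the distributed HPJava version the array is distributed over the process grid, each processor runs loops like these over its local block, and the ghost regions along the block edges are updated by communication between half-sweeps; that communication is the source of the parallel overheads discussed below.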

Figure 6.7: Red-black relaxation of the two-dimensional Laplace equation with array size $ 512^2$. [Plot: Figs/pde512.eps]

Table 6.1: Red-black relaxation performance. All speeds in MFLOPS. The Fortran and Java rows are sequential, single-processor baselines.

$ 256^2$
  Processors      1        4        9       16       25       36
  HPF        263.33   358.42   420.23   430.11   441.89   410.93
  HPJava      69.12   184.33   258.06   322.58   430.12   430.10
  Fortran    224.40
  Java        73.59

$ 512^2$
  Processors      1        4        9       16       25       36
  HPF        190.32   622.99   954.49  1118.71  1253.49  1316.96
  HPJava      61.44   247.72   472.91   650.26   743.15  1040.40
  Fortran    217.66
  Java        59.98

$ 1024^2$
  Processors      1        4        9       16       25       36
  HPF        104.66   430.27  1558.93  2153.58  2901.34  3238.71
  HPJava      62.36   274.86   549.73   835.59  1228.81  1606.90
  Fortran    149.11
  Java        58.73


Figure 6.7 shows results for four different versions (HPJava, sequential Java, HPF, and Fortran) of red-black relaxation of the two-dimensional Laplace equation with array size $ 512^2$. In our runs HPJava out-performs sequential Java by a factor of up to 17. On 36 processors HPJava achieves about 79% of the performance of HPF. This is respectable for an initial benchmark result obtained without any serious optimization; the performance of HPJava should improve further once the optimization strategies described in a previous paper [37] are applied. The scaling behavior of HPJava is slightly better than that of HPF, though this mainly reflects the low performance of a single Java node compared with Fortran. We do not claim that the current HPJava communication library is faster than the HPF library: our library is built on top of the portability layers mpjdev and MPI, whereas IBM HPF most likely uses a platform-specific communication library. Future versions of Adlib could, however, be optimized for the platform.

Complete performance results of red-black relaxation of the Laplace equation are given in Table 6.1.

Figure 6.8: Three-dimensional diffusion equation with array size $ 128^3$. [Plot: Figs/diff128.eps]

We see similar behavior in the large three-dimensional diffusion equation benchmark (Figure 6.8). In general we expect three-dimensional problems to be more amenable to parallelism, because of their larger problem sizes.
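
The structure of this kernel is similar to the two-dimensional case. As a rough sketch (again an assumption about the form of the benchmark, not its source), one explicit time step of $ u_t = D \nabla^2 u$ on an $ n^3$ grid uses a seven-point stencil:

public class Diffusion3D {

    // One explicit time step of the 3D diffusion equation; c stands for
    // D * dt / h^2 and must stay below 1/6 for stability.
    static void step(double[][][] u, double[][][] uNew, double c) {
        int n = u.length;
        for (int i = 1; i < n - 1; i++)
            for (int j = 1; j < n - 1; j++)
                for (int k = 1; k < n - 1; k++)
                    uNew[i][j][k] = u[i][j][k]
                        + c * (u[i - 1][j][k] + u[i + 1][j][k]
                             + u[i][j - 1][k] + u[i][j + 1][k]
                             + u[i][j][k - 1] + u[i][j][k + 1]
                             - 6.0 * u[i][j][k]);
    }

    public static void main(String[] args) {
        int n = 128;                               // as in Figure 6.8
        double c = 0.1;                            // illustrative, within the stability limit
        double[][][] u = new double[n][n][n], v = new double[n][n][n];
        u[n / 2][n / 2][n / 2] = 1.0;              // point source as an initial condition
        for (int t = 0; t < 10; t++) {
            step(u, v, c);
            double[][][] tmp = u; u = v; v = tmp;  // swap buffers after each step
        }
        System.out.println("centre value = " + u[n / 2][n / 2][n / 2]);
    }
}

The two buffers are swapped after every step; in the distributed versions, each step similarly requires a ghost-region exchange on all six faces of a processor's local block.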

Figure 6.9: Three-dimensional diffusion equation with array size $ 32^3$. [Plot: Figs/diff32.eps]

For the small problem size of the three-dimensional diffusion equation benchmark (Figure 6.9), sequential Fortran is about 4-5 times faster than Java. Benchmarking results in [11] do not show gaps of this size on other platforms; a factor of 2 or less is common. Either the IBM Fortran compiler is very good, or the Java compiler we are using (JDK 1.3.1) is relatively old.


Table 6.2: Three-dimensional diffusion equation performance. All speeds in MFLOPS. The Fortran and Java rows are sequential, single-processor baselines.

$ 32^3$
  Processors      1        2        4        8       16       32
  HPF        299.26   306.56   315.18   357.35   470.02   540.00
  HPJava      63.95   101.25   173.57   220.91   303.75   347.14
  Fortran    113.00
  Java        66.50

$ 64^3$
  Processors      1        2        4        8       16       32
  HPF        274.12   333.53   502.92   531.32   685.07   854.56
  HPJava      77.60   129.21   233.15   376.31   579.72   691.92
  Fortran    113.00
  Java        66.50

$ 128^3$
  Processors      1        2        4        8       16       32
  HPF        152.55   185.15   313.16   692.01  1214.97  1670.38
  HPJava      83.15   149.53   275.28   478.81   829.65  1154.06
  Fortran    113.00
  Java        66.50


Complete performance results for the three-dimensional diffusion equation are given in Table 6.2.

Figure 6.10: Multigrid solver with array size $ 512^2$. [Plot: Figs/pde2-512.eps]


Table 6.3: Multigrid solver with array size $ 512^2$. All speeds in MFLOPS.

  Processors      1        2        3        4        6        9
  HPF        170.02   240.59   258.23   288.56   336.03   376.09
  HPJava      39.77    75.70    91.02    94.97   117.54   123.15


Finally, we consider benchmark results for our original problem, the multigrid solver, in Figure 6.10 and Table 6.3. For the complete multigrid algorithm, speedup is relatively modest. This seems to be due to the complex pattern of communication in this algorithm. Neither the HPJava translation scheme nor the Adlib implementation has yet been optimized, and we expect there is plenty of low-hanging fruit in terms of opportunities for improving them.
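
To illustrate where that communication arises, the following is a hedged structural sketch of one V-cycle written as plain sequential Java; the component choices (red-black Gauss-Seidel smoothing, full-weighting restriction, bilinear prolongation, zero Dirichlet boundaries) are plausible but assumed, not taken from the solver's source. The comments mark the points at which a distributed version needs halo exchanges or inter-grid data movement.

public class VCycleSketch {

    // One V-cycle for -laplacian(u) = f on an (n+1) x (n+1) grid, n a power of 2.
    static void vCycle(double[][] u, double[][] f, int sweeps) {
        int n = u.length - 1;
        smooth(u, f, sweeps);                      // pre-smoothing: halo exchange per sweep
        if (n <= 2) return;                        // coarsest level: smoothing only
        double[][] rc = restrict(residual(u, f));  // fine -> coarse transfer (communication)
        double[][] ec = new double[n / 2 + 1][n / 2 + 1];
        vCycle(ec, rc, sweeps);                    // recurse for the coarse-grid correction
        prolongAdd(ec, u);                         // coarse -> fine transfer (communication)
        smooth(u, f, sweeps);                      // post-smoothing: halo exchange per sweep
    }

    // Red-black Gauss-Seidel sweeps for the 5-point Laplacian with spacing h = 1/n.
    static void smooth(double[][] u, double[][] f, int sweeps) {
        int n = u.length - 1;
        double h2 = 1.0 / ((double) n * n);
        for (int s = 0; s < sweeps; s++)
            for (int parity = 0; parity < 2; parity++)
                for (int i = 1; i < n; i++)
                    for (int j = 2 - (i + parity) % 2; j < n; j += 2)
                        u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                        + u[i][j - 1] + u[i][j + 1] + h2 * f[i][j]);
    }

    // r = f - A u for the 5-point Laplacian.
    static double[][] residual(double[][] u, double[][] f) {
        int n = u.length - 1;
        double invH2 = (double) n * n;
        double[][] r = new double[n + 1][n + 1];
        for (int i = 1; i < n; i++)
            for (int j = 1; j < n; j++)
                r[i][j] = f[i][j] - invH2 * (4.0 * u[i][j] - u[i - 1][j] - u[i + 1][j]
                                                           - u[i][j - 1] - u[i][j + 1]);
        return r;
    }

    // Full-weighting restriction to the next coarser grid.
    static double[][] restrict(double[][] r) {
        int nc = (r.length - 1) / 2;
        double[][] rc = new double[nc + 1][nc + 1];
        for (int i = 1; i < nc; i++)
            for (int j = 1; j < nc; j++) {
                int I = 2 * i, J = 2 * j;
                rc[i][j] = (4.0 * r[I][J]
                          + 2.0 * (r[I - 1][J] + r[I + 1][J] + r[I][J - 1] + r[I][J + 1])
                          + r[I - 1][J - 1] + r[I - 1][J + 1]
                          + r[I + 1][J - 1] + r[I + 1][J + 1]) / 16.0;
            }
        return rc;
    }

    // Bilinear prolongation of the coarse-grid correction, added onto the fine grid.
    static void prolongAdd(double[][] ec, double[][] u) {
        int n = u.length - 1;
        for (int I = 1; I < n; I++)
            for (int J = 1; J < n; J++) {
                int i = I / 2, j = J / 2;
                double e;
                if (I % 2 == 0 && J % 2 == 0)      e = ec[i][j];
                else if (I % 2 == 0)               e = 0.5 * (ec[i][j] + ec[i][j + 1]);
                else if (J % 2 == 0)               e = 0.5 * (ec[i][j] + ec[i + 1][j]);
                else                               e = 0.25 * (ec[i][j] + ec[i][j + 1]
                                                             + ec[i + 1][j] + ec[i + 1][j + 1]);
                u[I][J] += e;
            }
    }

    public static void main(String[] args) {
        int n = 512;                               // as in Table 6.3
        double[][] u = new double[n + 1][n + 1];
        double[][] f = new double[n + 1][n + 1];
        for (int i = 1; i < n; i++)
            for (int j = 1; j < n; j++)
                f[i][j] = 1.0;                     // simple illustrative right-hand side
        for (int cycle = 0; cycle < 10; cycle++)
            vCycle(u, f, 2);
        System.out.println("u[n/2][n/2] = " + u[n / 2][n / 2]);
    }
}

Unlike the single-grid kernels, communication is needed at every level of the grid hierarchy, and the coarse grids carry relatively little computation per processor to amortize it; this presumably contributes to the modest speedups in Table 6.3.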

Speedup of HPJava for the various applications is summarized in Table 6.4. Different problem sizes are measured on different numbers of processors, and the single-processor HPJava result is used as the reference value in each case. As the table shows, we obtain a speedup of up to 25.77 on the Laplace equation using 36 processors with problem size $ 1024^2$. Many realistic applications with more computation per grid point (for example the CFD code discussed in the next section) will be better suited to parallel implementation than the Laplace equation and the other simple benchmarks described in this section.
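
Concretely, taking the MFLOPS figures from Table 6.1, the $ 1024^2$ Laplace problem runs at 62.36 MFLOPS on one processor and 1606.90 MFLOPS on 36 processors, giving the quoted speedup of $ 1606.90 / 62.36 \approx 25.77$.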


Table 6.4: Speedup of HPJava benchmarks relative to single-processor HPJava.

Multigrid Solver
  Processors        2      3      4      6      9
  $ 512^2$       1.90   2.29   2.39   2.96   3.03

2D Laplace Equation
  Processors        4      9     16     25     36
  $ 256^2$       2.67   3.73   4.67   6.22   6.22
  $ 512^2$       4.03   7.70  10.58  12.09  16.93
  $ 1024^2$      4.41   8.82  13.40  19.71  25.77

3D Diffusion Equation
  Processors        2      4      8     16     32
  $ 32^3$        1.58   2.72   3.45   4.75   5.43
  $ 64^3$        1.67   3.00   4.85   7.47   8.92
  $ 128^3$       1.80   3.31   5.76   9.98  13.88


