Next: Discussion Up: Benchmarking HPJava, Part II: Previous: 3-Dimensional Diffusion Equation   Contents


Q3 - Local Dependence Index

Figure 7.7: Q3 on shared memory machine
Figure 7.7 shows the performance of Q3 on the shared memory machine. As before, we first compare sequential Java with sequential C on this machine: Java reaches 55% of the C performance. Table 7.12 shows the speedup of the HPJava naive translation over the sequential Java and C programs, and the speedup of HPJOPT2 over sequential Java and over the naive translation.


Table 7.12: Speedup of the naive translation over sequential Java and C programs for Q3 on the shared memory machine.
Number of processors                  1      2      3      4      5      6      7      8
Naive translation over Java        1.47   3.85   7.08   9.50  11.53  13.71  16.18  18.02
Naive translation over C           0.84   2.20   4.04   5.42   6.58   7.83   9.23  10.29
HPJOPT2 over Java                  1.85   5.39  12.23  16.30  19.89  24.81  27.77  32.00
HPJOPT2 over naive translation     1.26   1.40   1.73   1.72   1.73   1.81   1.72   1.76
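
The rows of Table 7.12 are related by simple ratios, which the following minimal sketch recomputes. It assumes that each speedup is a ratio of execution times (reference time divided by HPJava time); the function name `ratio_rows` and the variable names are illustrative and do not come from the benchmark code. Under that assumption, dividing the "HPJOPT2 over Java" row by the "Naive translation over Java" row should reproduce the "HPJOPT2 over naive translation" row to within a few hundredths, and dividing the "over Java" row by the "over C" row gives the implied ratio of sequential Java time to sequential C time.

```python
# Values copied from Table 7.12, for 1 to 8 processors.
naive_over_java = [1.47, 3.85, 7.08, 9.50, 11.53, 13.71, 16.18, 18.02]
naive_over_c = [0.84, 2.20, 4.04, 5.42, 6.58, 7.83, 9.23, 10.29]
hpjopt2_over_java = [1.85, 5.39, 12.23, 16.30, 19.89, 24.81, 27.77, 32.00]
published_hpjopt2_over_naive = [1.26, 1.40, 1.73, 1.72, 1.73, 1.81, 1.72, 1.76]


def ratio_rows(numer, denom):
    """Element-wise ratio of two table rows, rounded to two decimals."""
    return [round(n / d, 2) for n, d in zip(numer, denom)]


# (T_java / T_hpjopt2) / (T_java / T_naive) = T_naive / T_hpjopt2
hpjopt2_over_naive = ratio_rows(hpjopt2_over_java, naive_over_java)

# (T_java / T_naive) / (T_c / T_naive) = T_java / T_c
java_time_over_c_time = ratio_rows(naive_over_java, naive_over_c)

if __name__ == "__main__":
    print("HPJOPT2 over naive (recomputed):", hpjopt2_over_naive)
    print("HPJOPT2 over naive (published): ", published_hpjopt2_over_naive)
    print("Sequential Java time / C time:  ", java_time_over_c_time)
```

The second ratio comes out close to 1.75 for every processor count, that is, sequential Java at roughly 57% of the C speed, which is in line with the 55% figure quoted above.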

With 8 processors, the naive translation achieves a speedup of up to 18.02 (1802%) over sequential Java and 10.29 (1029%) over sequential C. HPJOPT2 with 8 processors achieves a speedup of up to 32.00 (3200%) over sequential Java. The speedup of HPJOPT2 over the naive translation is up to 1.81 (181%), reached with 6 processors. Recall that Q3 performed poorly compared to the other applications on the Linux machine. As expected, with multiple processors the performance of Q3 is excellent even without any optimizations. This illustrates that the performance of HPJava can be outstanding for applications with large problem sizes. Table 7.13 shows the speedup of the naive translation and of HPJOPT2 for each number of processors over the performance with one processor.


Table 7.13: Speedup of the naive translation and HPJOPT2 for each number of processors over the performance with one processor for Q3 on the shared memory machine.
Number of processors        2      3      4      5      6      7      8
Naive translation        2.62   4.81   6.46   7.84   9.32  11.00  12.26
HPJOPT2                  2.91   6.61   8.80  10.75  13.40  15.01  17.29

The naive translation achieves a speedup of up to 12.26 (1226%) using 8 processors on the shared memory machine, and HPJOPT2 achieves up to 17.29 (1729%). Unlike traditional benchmark programs, Q3 gives a tremendous speedup with a moderate number (8) of processors.
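
To put the figures of Table 7.13 in perspective, the sketch below computes the parallel efficiency, defined here as the speedup over one processor divided by the number of processors. This metric is not reported in the text; it is added only as an illustration, and the names `efficiency`, `naive` and `hpjopt2` are hypothetical. An efficiency above 1 means the speedup exceeds the processor count (superlinear speedup).

```python
# Speedup over the one-processor run, copied from Table 7.13.
processors = [2, 3, 4, 5, 6, 7, 8]
naive = [2.62, 4.81, 6.46, 7.84, 9.32, 11.00, 12.26]
hpjopt2 = [2.91, 6.61, 8.80, 10.75, 13.40, 15.01, 17.29]


def efficiency(speedups, procs):
    """Parallel efficiency: speedup over one processor divided by processor count."""
    return [s / p for s, p in zip(speedups, procs)]


naive_eff = efficiency(naive, processors)
hpjopt2_eff = efficiency(hpjopt2, processors)

if __name__ == "__main__":
    for p, en, eh in zip(processors, naive_eff, hpjopt2_eff):
        print(f"{p} processors: naive {en:.2f}, HPJOPT2 {eh:.2f}")
```

With the values of Table 7.13, the efficiency is above 1 for every processor count from 2 to 8, for both the naive translation and HPJOPT2.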
Bryan Carpenter 2004-06-09