Next: Laplace Equation Using Red-Black Up: Benchmarking HPJava, Part II: Previous: Benchmarking HPJava, Part II:   Contents


Direct Matrix Multiplication

Figure 7.1: Performance for Direct Matrix Multiplication on SMP.
Figure 7.1 shows the performance of the direct matrix multiplication on the shared memory machine. In the figure, Java and C indicate the performance of the sequential Java and C programs, and Naive and PRE indicate the parallel performance of the naive translation and of PRE. Since we observed in section 6.2 that HPJOPT2 has no advantage over PRE for this algorithm, we benchmark only the naive translation, PRE, Java, and C programs in this section.

First, we compare the sequential Java performance with the C performance on the shared memory machine: Java reaches 86% of the C performance. With this Java performance, we can expect the performance of HPJava to be promising on the shared memory machine. Table 7.2 shows the speedup of the naive translation and of PRE over the sequential Java program. Moreover, it shows the speedup of PRE over the naive translation.


Table 7.2: Speedup of the naive translation over sequential Java and C programs for the direct matrix multiplication on SMP.
Number of Processors           1     2     3     4     5     6     7     8
Naive translation over Java    0.39  0.76  1.15  1.54  1.92  2.31  2.67  3.10
PRE over Java                  0.76  1.52  2.29  3.07  3.81  4.59  5.30  6.12
PRE over Naive translation     1.96  2.02  2.00  2.00  1.98  1.99  1.99  1.98



Table 7.3: Speedup of the naive translation and PRE for each number of processors over the performance with one processor for the direct matrix multiplication on SMP.
Number of Processors    2     3     4     5     6     7     8
Naive translation       1.96  2.96  3.96  4.96  5.96  6.88  8.00
PRE                     2.02  3.02  4.04  5.02  6.04  6.98  8.06
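
The relationship between the two tables can be checked with a little arithmetic. The following is a minimal sketch, not part of the benchmark suite: the class and method names are hypothetical, and the numbers are copied from Table 7.2. It divides each row of Table 7.2 by its one-processor entry to approximate the corresponding row of Table 7.3, derives the parallel efficiency (speedup divided by the number of processors), and finds the smallest number of processors at which a translation outperforms sequential Java. Because Table 7.2 is rounded to two decimal places, the recomputed ratios agree with Table 7.3 only approximately.

```java
// Illustrative sketch only: names are hypothetical, data copied from Table 7.2.
class SpeedupSketch {
    // Speedup over sequential Java for 1..8 processors (rows of Table 7.2).
    static final double[] NAIVE_OVER_JAVA = {0.39, 0.76, 1.15, 1.54, 1.92, 2.31, 2.67, 3.10};
    static final double[] PRE_OVER_JAVA   = {0.76, 1.52, 2.29, 3.07, 3.81, 4.59, 5.30, 6.12};

    // Speedup on p processors relative to the same translation on one processor.
    static double relativeSpeedup(double[] overJava, int p) {
        return overJava[p - 1] / overJava[0];
    }

    // Parallel efficiency: relative speedup divided by the processor count.
    static double efficiency(double[] overJava, int p) {
        return relativeSpeedup(overJava, p) / p;
    }

    // Smallest processor count at which the translation beats sequential Java,
    // or -1 if it never does within the measured range.
    static int breakEven(double[] overJava) {
        for (int p = 1; p <= overJava.length; p++) {
            if (overJava[p - 1] > 1.0) {
                return p;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        for (int p = 2; p <= 8; p++) {
            System.out.printf("p=%d  naive=%.2f  PRE=%.2f  PRE efficiency=%.2f%n",
                    p,
                    relativeSpeedup(NAIVE_OVER_JAVA, p),
                    relativeSpeedup(PRE_OVER_JAVA, p),
                    efficiency(PRE_OVER_JAVA, p));
        }
        System.out.println("naive break-even: " + breakEven(NAIVE_OVER_JAVA));
        System.out.println("PRE break-even:   " + breakEven(PRE_OVER_JAVA));
    }
}
```

Under these numbers the relative speedup stays close to the processor count, so the efficiency of both translations remains near 1 across the measured range.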

With 8 processors, the naive translation achieves a speedup of up to 310% (a factor of 3.10) over sequential Java, and PRE achieves up to 612%. The speedup of PRE over the naive translation is up to 202%. The performance of PRE overtakes that of sequential Java on 2 processors.

Table 7.3 shows the speedup of the naive translation and PRE for each number of processors, relative to the performance with one processor on the shared memory machine. Using 8 processors, the naive translation gets up to 800% speedup compared to its performance with a single processor, and PRE gets up to 806%.

The direct matrix multiplication does not involve any run-time communication. This is reasonable, since we focus on benchmarking the HPJava compiler in this dissertation, not the communication libraries. In the next section, we will benchmark examples with run-time communications.
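
To illustrate why a direct matrix multiplication can run without run-time communication, here is a minimal sketch in plain Java. It is not the HPJava benchmark program: it uses ordinary Java threads over shared arrays instead of HPJava distributed arrays, and the class name, method names, and row-block partitioning are assumptions made for illustration. Each worker computes a disjoint block of rows of the result and only reads the operands, so no data is exchanged between workers while the product is being computed.

```java
// Illustrative sketch only: plain Java threads, not HPJava distributed arrays.
// All names here are hypothetical.
class DirectMatMulSketch {
    // Compute rows [rowStart, rowEnd) of a = b * c with the direct triple loop.
    static void multiplyRows(double[][] a, double[][] b, double[][] c,
                             int rowStart, int rowEnd) {
        int inner = c.length;      // shared inner dimension
        int cols = c[0].length;    // number of columns of the result
        for (int i = rowStart; i < rowEnd; i++) {
            for (int j = 0; j < cols; j++) {
                double sum = 0.0;
                for (int k = 0; k < inner; k++) {
                    sum += b[i][k] * c[k][j];
                }
                a[i][j] = sum;
            }
        }
    }

    // Split the rows of the result into contiguous blocks, one per worker.
    // Workers write disjoint rows of a and only read b and c.
    static double[][] multiplyParallel(double[][] b, double[][] c, int workers) {
        int rows = b.length;
        double[][] a = new double[rows][c[0].length];
        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            final int start = (int) ((long) rows * w / workers);
            final int end = (int) ((long) rows * (w + 1) / workers);
            threads[w] = new Thread(() -> multiplyRows(a, b, c, start, end));
            threads[w].start();
        }
        for (Thread t : threads) {
            try {
                t.join();  // wait for every block to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while multiplying", e);
            }
        }
        return a;
    }
}
```

The only synchronization in this sketch is the final join, which is why the measured speedup of such a kernel mainly reflects the quality of the generated loop code and not the cost of a communication library.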
Bryan Carpenter 2004-06-09