Direct Matrix Multiplication
Figure 7.1: Performance for Direct Matrix Multiplication on SMP.
Figure 7.1 shows the performance of direct matrix multiplication on
the shared memory machine. In the figure, Java and C denote the
sequential Java and C programs, while Naive and PRE denote the parallel
performance of the naive translation and of PRE on the same machine.
Since we observed in section 6.2 that HPJOPT2 has no advantage over PRE
for this algorithm, we benchmark only the naive translation, PRE, Java,
and C programs in this section.
First, we compare the performance of Java with that of C on the shared
memory machine: sequential Java reaches 86% of the C performance. With
Java performing this well, we can expect the performance of HPJava to
be promising on the shared memory machine. Table 7.2 shows the speedup
of the naive translation and of PRE over the sequential Java program,
as well as the speedup of PRE over the naive translation.
Table 7.2: Speedup of the naive translation over sequential Java and C
programs for the direct matrix multiplication on SMP.

| Number of Processors | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Naive translation over Java | 0.39 | 0.76 | 1.15 | 1.54 | 1.92 | 2.31 | 2.67 | 3.10 |
| PRE over Java | 0.76 | 1.52 | 2.29 | 3.07 | 3.81 | 4.59 | 5.30 | 6.12 |
| PRE over Naive translation | 1.96 | 2.02 | 2.00 | 2.00 | 1.98 | 1.99 | 1.99 | 1.98 |

Table 7.3: Speedup of the naive translation and PRE for each number of
processors over the performance with one processor for the direct
matrix multiplication on SMP.

| Number of Processors | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|
| Naive translation | 1.96 | 2.96 | 3.96 | 4.96 | 5.96 | 6.88 | 8.00 |
| PRE | 2.02 | 3.02 | 4.04 | 5.02 | 6.04 | 6.98 | 8.06 |

With 8 processors, the speedup of the naive translation over
sequential Java reaches 310% (a factor of 3.10), and the speedup of PRE
over sequential Java reaches 612% (a factor of 6.12). The speedup of
PRE over the naive translation is up to 202%; that is, PRE is roughly
twice as fast as the naive translation at every processor count. The
performance of PRE overtakes that of sequential Java on 2 processors.
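The three rows of Table 7.2 are related: the "PRE over naive translation" row should be close to the ratio of the two "over Java" rows. The following is a minimal sketch of that consistency check, using only the values printed in the table. The function and variable names are illustrative, not part of the original benchmark code, and small differences are expected because the published figures are rounded to two decimals.

```python
# Values copied from Table 7.2 (speedup relative to sequential Java),
# indexed by processor count 1..8.
naive_over_java = [0.39, 0.76, 1.15, 1.54, 1.92, 2.31, 2.67, 3.10]
pre_over_java = [0.76, 1.52, 2.29, 3.07, 3.81, 4.59, 5.30, 6.12]
pre_over_naive_reported = [1.96, 2.02, 2.00, 2.00, 1.98, 1.99, 1.99, 1.98]


def ratio_row(numerators, denominators):
    """Divide two speedup rows element by element."""
    return [n / d for n, d in zip(numerators, denominators)]


# Derived "PRE over naive translation" row.
pre_over_naive = ratio_row(pre_over_java, naive_over_java)

# Smallest processor count at which PRE beats sequential Java (speedup > 1).
first_win = next(p for p, s in enumerate(pre_over_java, start=1) if s > 1.0)

for procs, (derived, reported) in enumerate(
        zip(pre_over_naive, pre_over_naive_reported), start=1):
    print(f"{procs} processors: derived {derived:.2f}, reported {reported:.2f}")
print(f"PRE overtakes sequential Java at {first_win} processors")
```

The derived ratios stay within a few hundredths of the reported row, which is what one would expect from rounding alone.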
Table 7.3 shows the speedup of the naive translation and PRE for each
number of processors over the performance with one processor on the
shared memory machine. Using 8 processors, the naive translation
achieves up to 800% speedup (a factor of 8.00) compared to its
performance with a single processor. PRE achieves up to 806% speedup
(a factor of 8.06).
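One way to read Table 7.3 is through parallel efficiency, the speedup divided by the number of processors. This quantity is not reported in the original text; the sketch below simply applies the standard definition to the table's values, and all names in it are illustrative.

```python
# Values copied from Table 7.3 (speedup relative to one processor).
speedup_naive = {2: 1.96, 3: 2.96, 4: 3.96, 5: 4.96, 6: 5.96, 7: 6.88, 8: 8.00}
speedup_pre = {2: 2.02, 3: 3.02, 4: 4.04, 5: 5.02, 6: 6.04, 7: 6.98, 8: 8.06}


def efficiency(speedups):
    """Parallel efficiency: speedup divided by processor count."""
    return {procs: s / procs for procs, s in speedups.items()}


eff_naive = efficiency(speedup_naive)
eff_pre = efficiency(speedup_pre)

for procs in sorted(speedup_naive):
    print(f"{procs} processors: naive {eff_naive[procs]:.3f}, "
          f"PRE {eff_pre[procs]:.3f}")
```

An efficiency close to 1 at every processor count corresponds to the near-linear scaling described above. Values marginally above 1 for PRE are within what rounding and measurement noise in the published figures could produce.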
Direct matrix multiplication involves no run-time communication.
This is acceptable here, since this dissertation focuses on
benchmarking the HPJava compiler rather than the communication
libraries. In the next section, we benchmark examples that do involve
run-time communication.
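To illustrate why a direct matrix multiplication can run without run-time communication, here is a minimal sequential sketch in Python. It rests on an assumption that this section does not confirm: the result matrix is block-distributed over a p by q process grid, and each process already holds the full rows of A and the full columns of B that it needs. The grid is simulated with ordinary loops, and the helper names (`block_bounds`, `local_matmul`, `distributed_matmul`) are hypothetical.

```python
def block_bounds(n, parts, rank):
    """Half-open index range owned by `rank` under a block distribution."""
    size = -(-n // parts)  # ceiling division
    lo = min(rank * size, n)
    return lo, min(lo + size, n)


def local_matmul(a_rows, b_cols):
    """Multiply locally held rows of A by locally held columns of B."""
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
            for row in a_rows]


def distributed_matmul(a, b, p, q):
    """Simulate a p x q process grid; each 'process' fills its own block of C."""
    n = len(a)
    b_cols = [list(col) for col in zip(*b)]  # columns of B
    c = [[0] * n for _ in range(n)]
    for pi in range(p):
        r0, r1 = block_bounds(n, p, pi)
        for pj in range(q):
            c0, c1 = block_bounds(n, q, pj)
            # Only local data is read here: rows r0..r1 of A, columns c0..c1 of B.
            block = local_matmul(a[r0:r1], b_cols[c0:c1])
            for i, row in enumerate(block):
                c[r0 + i][c0:c1] = row
    return c


def sequential_matmul(a, b):
    """Reference triple-loop product for comparison."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


n = 5
a = [[i * n + j for j in range(n)] for i in range(n)]
b = [[(i + 2 * j) % 7 for j in range(n)] for i in range(n)]
print(distributed_matmul(a, b, 2, 3) == sequential_matmul(a, b))
```

Because every block of the result depends only on data the owning process is assumed to hold, no block computation needs values from another process, and the work divides evenly across the grid.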
Bryan Carpenter
2004-06-09