First we present some results for the computational kernel of the
multigrid code, namely the unaccelerated red-black relaxation algorithm
of Figure 1.
Figure 6 gives our results
for this kernel on a 512 by 512 matrix. The results are encouraging.
The HPJava version scales well, and eventually comes quite
close to the HPF code (absolute megaflop rates are modest,
but this was observed for all our codes,
and seems to be a property of the hardware).
The flat lines at the bottom of the graph give the sequential Java and Fortran performance, for reference. We did not use any automatic parallelization features here.
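To make the kernel concrete, here is a minimal sequential Java sketch of red-black relaxation. It is not the HPJava code of Figure 1: it assumes the 2D Laplace equation with fixed boundary values and a five-point stencil, and the names `RedBlack`, `relax` and `iterate` are hypothetical. Points are coloured by the parity of i + j; each update reads only points of the opposite colour, so all updates within one half-sweep are independent, which is what makes the algorithm straightforward to distribute.

```java
/**
 * Hypothetical sequential sketch of red-black Gauss-Seidel relaxation for the
 * 2D Laplace equation on an n-by-n grid. Boundary values are held fixed.
 */
public class RedBlack {

    /** Update every interior point whose (i + j) parity equals `parity`. */
    static void relax(double[][] a, int parity) {
        int n = a.length;
        for (int i = 1; i < n - 1; i++) {
            // First interior column j in row i with (i + j) % 2 == parity.
            int jStart = ((i + 1) % 2 == parity) ? 1 : 2;
            for (int j = jStart; j < n - 1; j += 2) {
                // All four neighbours have the opposite colour, so the updates
                // in this half-sweep do not depend on each other.
                a[i][j] = 0.25 * (a[i - 1][j] + a[i + 1][j]
                                + a[i][j - 1] + a[i][j + 1]);
            }
        }
    }

    /** Perform `iterations` full iterations: a red then a black half-sweep each. */
    static void iterate(double[][] a, int iterations) {
        for (int it = 0; it < iterations; it++) {
            relax(a, 0); // "red" points: (i + j) even
            relax(a, 1); // "black" points: (i + j) odd
        }
    }

    public static void main(String[] args) {
        int n = 512;
        int iters = 100;
        double[][] a = new double[n][n];
        java.util.Arrays.fill(a[0], 1.0); // fixed value 1 on the top edge
        long t0 = System.nanoTime();
        iterate(a, iters);
        double secs = (System.nanoTime() - t0) * 1e-9;
        // Assumed flop count: 3 adds + 1 multiply per interior point update.
        double flops = 4.0 * (n - 2) * (n - 2) * iters;
        System.out.printf("%.1f Mflop/s%n", flops / secs / 1e6);
    }
}
```

In a distributed version each processor would update its own block of the grid and would need ghost (halo) values from neighbouring blocks before each half-sweep. The four-flops-per-update count in `main` is our assumption and not necessarily the one used for Figures 6 and 7.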
Corresponding results for the complete multigrid code are given in Figure 7. The results here are not as good as for simple red-black relaxation: both the speed of HPJava relative to HPF and the parallel speedup of HPF and HPJava are less satisfactory.
The poor performance of HPJava relative to Fortran
in this case can be attributed largely to the naive nature of the translation
scheme used by the current HPJava system. The overheads
are especially significant when there are many very tight overall
constructs (with short bodies). We saw several of these in Section
3. Experiments done elsewhere
[13] lead us to believe
these overheads can be reduced by straightforward optimization
strategies which, however, are not yet incorporated in our
source-to-source translator.
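The following is a purely hypothetical illustration in plain Java; it is not the HPJava translator's actual output, not its runtime API, and not necessarily the optimization studied in [13]. The `LocalBlock` class and its `address` method are invented for this sketch. It contrasts a loop that performs full address arithmetic on every element access, the kind of per-element cost a naive translation scheme can introduce, with a strength-reduced version in which row offsets are hoisted out of the inner loop. When the loop body is as short as a single multiply, such bookkeeping can be a large fraction of the work.

```java
/** Hypothetical sketch: naive vs. strength-reduced addressing of a local array block. */
public class TranslationSketch {

    /** Hypothetical local block of a distributed 2D array, stored in a flat buffer. */
    static final class LocalBlock {
        final int rows, cols, base, stride0, stride1;
        final double[] data;

        LocalBlock(int rows, int cols) {
            this.rows = rows;
            this.cols = cols;
            this.base = 0;
            this.stride0 = cols; // row-major layout
            this.stride1 = 1;
            this.data = new double[rows * cols];
        }

        /** Full address computation for element (i, j). */
        int address(int i, int j) {
            return base + i * stride0 + j * stride1;
        }
    }

    /** Naive form: complete address arithmetic on every access (same shapes assumed). */
    static void scaleNaive(LocalBlock dst, LocalBlock src, double c) {
        for (int i = 0; i < dst.rows; i++) {
            for (int j = 0; j < dst.cols; j++) {
                dst.data[dst.address(i, j)] = c * src.data[src.address(i, j)];
            }
        }
    }

    /** Strength-reduced form: row offsets hoisted, inner loop steps by stride1. */
    static void scaleReduced(LocalBlock dst, LocalBlock src, double c) {
        for (int i = 0; i < dst.rows; i++) {
            int d = dst.base + i * dst.stride0; // computed once per row
            int s = src.base + i * src.stride0;
            for (int j = 0; j < dst.cols; j++, d += dst.stride1, s += src.stride1) {
                dst.data[d] = c * src.data[s];
            }
        }
    }
}
```

Both methods compute the same result; whether a given JVM's JIT compiler already performs this transformation on the naive form varies, so doing it in the source-to-source translator is the more dependable option.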
The modest parallel speedup of both HPJava and HPF is due to communication overheads. The fact that HPJava and HPF have similar scaling behavior, while the absolute performance of HPJava is lower, suggests that the communication library of HPJava is slower than the communication system of the native SP3 HPF (otherwise the performance gap would close for larger numbers of processors, where communication accounts for a growing share of the run time). This is not too surprising, because Adlib is built on top of a portability layer called mpjdev, which is in turn layered on MPI. We assume the SP3 HPF is more carefully optimized for the hardware. Of course, the lower layers of Adlib could be ported to exploit low-level features of the hardware (we have already done some experiments in this direction, interfacing Java to LAPI [14]).
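The scaling argument can be made explicit with a deliberately simple cost model, which is our assumption rather than a model fitted to the measurements. Let W be the total work, \alpha_X the time per unit of work for system X (J for HPJava, F for HPF), P the number of processors, and C_X(P) the communication time:

```latex
T_X(P) = \frac{\alpha_X W}{P} + C_X(P), \qquad X \in \{J, F\}

\frac{T_J(P)}{T_F(P)}
  = \frac{\alpha_J W / P + C_J(P)}{\alpha_F W / P + C_F(P)}
  \;\longrightarrow\; 1
  \quad \text{if } C_J(P) = C_F(P) \gg \frac{\alpha_X W}{P}
```

Under this model, equally fast communication would make the ratio tend to 1 as P grows, even with \alpha_J > \alpha_F. A gap that persists at larger P is therefore consistent with C_J(P) > C_F(P), i.e. slower communication in the HPJava stack.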