We showed in a previous publication that HPJava's individual node performance is quite acceptable, and that Java itself can reach 70-75% of the performance of C and Fortran.
The ``direct'' matrix multiplication algorithm in Figure 1 is relatively straightforward and potentially efficient, because the operand arrays have carefully chosen replicated/collapsed distributions. Figure 6 shows the performance of the direct matrix multiplication programs in HPJava, Java, and C on the Linux machine, in Mflops, for matrix sizes 50 × 50, 80 × 80, 100 × 100, 128 × 128, and 150 × 150.
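For reference, the sequential core of the direct algorithm can be sketched in plain Java as below. This is a hypothetical illustration, not the HPJava code of Figure 1, which instead operates on distributed arrays with the replicated/collapsed distributions described above; the class and method names here are invented for the example.

```java
// Hypothetical plain-Java sketch of the "direct" (triple-loop)
// matrix multiplication; the HPJava version in Figure 1 uses
// distributed arrays with replicated/collapsed distributions instead.
public class DirectMatMul {
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length;        // rows of a
        int k = b.length;        // columns of a == rows of b
        int m = b[0].length;     // columns of b
        double[][] c = new double[n][m];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                double sum = 0.0;
                for (int p = 0; p < k; p++) {
                    // inner product of row i of a and column j of b
                    sum += a[i][p] * b[p][j];
                }
                c[i][j] = sum;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        double[][] c = multiply(a, b);
        System.out.println(c[0][0] + " " + c[0][1] + " "
                         + c[1][0] + " " + c[1][1]);
    }
}
```

Each element of the result is an independent inner product, which is why the algorithm parallelizes naturally when the result array is distributed and the operand arrays are replicated or collapsed along the appropriate dimensions.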
From Figure 6, we can see the significant benefit of applying PRE or HPJOPT2. The results were obtained with the IBM Developer Kit 1.3 (JIT) with the -O flag on a 1.5 GHz Pentium 4 machine running Red Hat Linux 7.2. We therefore expect the HPJava results to scale on suitable parallel platforms, so a modest penalty in node performance is considered acceptable.