In this section we evaluate the performance of some applications written in mpiJava. We have developed both sequential and parallel message-passing programs to compare the performance of mpiJava, using Monte Carlo simulation of the two-dimensional Ising model. The sequential programs were written in C, F77 and Java, and the message-passing programs were written in MPI-C, MPI-F77 and mpiJava. This Ising model test can also be used for testing the quality of parallel random number generators [#!KO!#].
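For reference, the model and update rule are the standard ones: each lattice site carries a spin s_i = +1 or -1 with nearest-neighbor coupling, and Metropolis accepts a proposed single-spin flip with the usual probability,

\[
E = -J \sum_{\langle i,j \rangle} s_i s_j , \qquad s_i = \pm 1 ,
\qquad P_{\rm acc} = \min\bigl( 1,\ e^{-\beta \, \Delta E} \bigr) ,
\]

where \Delta E is the energy change of the proposed flip and \beta the inverse temperature.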
Two different methods, the Metropolis algorithm and the Swendsen-Wang cluster algorithm, are used with a standard block domain decomposition. As in the Potts model, a red/black updating scheme is used in the parallel Metropolis Ising simulation [#!FoxBook88!#].
Metropolis is easy to parallelize, since it requires only local nearest-neighbor communication under a standard domain decomposition, and it is therefore also easy to load balance. Swendsen-Wang requires non-local communication, though it still has fairly good load balance. We would therefore expect Metropolis to give the best speedups, with Swendsen-Wang not quite as good. A sketch of the parallel Metropolis update appears below.
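The following is a minimal mpiJava sketch of one red/black Metropolis sweep under a one-dimensional strip decomposition (a block of rows per process, with ghost rows refreshed by Sendrecv before each half-sweep). It is illustrative only: the class name, lattice size, seeding and decomposition details are assumptions, not the actual benchmark code.

    import mpi.*;
    import java.util.Random;

    // Sketch of one process's share of a red/black Metropolis sweep.
    // Assumptions (not from the benchmark source): strip decomposition,
    // number of processes divides L, coupling J = 1.
    public class IsingStrip {
      public static void main(String[] args) throws MPIException {
        MPI.Init(args);
        int rank = MPI.COMM_WORLD.Rank();
        int size = MPI.COMM_WORLD.Size();
        final int L = 256;                   // global lattice size (assumed)
        int rows = L / size;                 // rows owned by this process
        double beta = 0.5 * Math.log(1.0 + Math.sqrt(2.0)); // critical coupling
        int up = (rank - 1 + size) % size, down = (rank + 1) % size;
        int[][] spin = new int[rows + 2][L]; // rows 0 and rows+1 are ghosts
        Random rng = new Random(1234 + rank);
        for (int i = 1; i <= rows; i++)
          for (int j = 0; j < L; j++)
            spin[i][j] = rng.nextBoolean() ? 1 : -1;

        for (int color = 0; color < 2; color++) {
          // Exchange ghost rows with the periodic neighbors.
          MPI.COMM_WORLD.Sendrecv(spin[1], 0, L, MPI.INT, up, 0,
                                  spin[rows + 1], 0, L, MPI.INT, down, 0);
          MPI.COMM_WORLD.Sendrecv(spin[rows], 0, L, MPI.INT, down, 1,
                                  spin[0], 0, L, MPI.INT, up, 1);
          // Update only the sites of the current color (checkerboard).
          for (int i = 1; i <= rows; i++) {
            int gi = rank * rows + i - 1;    // global row index
            for (int j = (gi + color) % 2; j < L; j += 2) {
              int nn = spin[i - 1][j] + spin[i + 1][j]
                     + spin[i][(j + 1) % L] + spin[i][(j - 1 + L) % L];
              double dE = 2.0 * spin[i][j] * nn;  // energy change of a flip
              if (dE <= 0.0 || rng.nextDouble() < Math.exp(-beta * dE))
                spin[i][j] = -spin[i][j];    // Metropolis accept
            }
          }
        }
        MPI.Finalize();
      }
    }

Because each half-sweep only touches sites of one color, all of whose neighbors are of the other color, every process can update its sites independently once the ghost rows are current; this is what makes the red/black scheme embarrassingly parallel between exchanges.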
The system environment was as follows:
For comparison, we completed experiments for three different versions of our programming environment: sequential codes, and parallel codes with MPICH and with Sun HPC MPI. For the sequential F77 and C codes we used the Sun WorkShop FORTRAN 77 5.0 and C 5.0 compilers, and Sun JDK 1.2.2 for the sequential Java code. For the parallel mpiJava codes we used MPICH 1.2.0 and Sun HPC MPI 2.0 on 1, 2, 4 and 8 nodes. MPICH 1.2.0 was compiled with the -comm=shared option to permit communication over both shared memory and TCP/IP, since each of our four Solaris machines has dual processors. Sun HPC MPI uses the Sun ATM network, which is much faster than the p4 device of MPICH. For better performance, all sequential and parallel Fortran, C and Java codes were compiled with the -O optimization option, and all sequential Java and mpiJava codes were executed with the JIT compiler enabled.
We performed Monte Carlo Ising simulations on simple 2-D lattices with linear sizes from L=32 to L=1024 and periodic boundary conditions. To measure the total execution time, we performed 20 iterations to thermalize the system and then at least 2000 Monte Carlo sweeps. The inverse temperature was taken to be the critical inverse temperature for the 2-D Ising model, \beta J = \ln(1+\sqrt{2})/2 \approx 0.4407. Timings were measured using MPI_Wtime for the parallel codes and the shell built-in time command for the sequential codes. We repeated the benchmarks several times, when there was little network activity and the machines were otherwise quiet.
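As a concrete illustration, the timing pattern for the parallel codes maps onto mpiJava as below; MPI.Wtime() is mpiJava's binding of MPI_Wtime, while nSweeps and doMetropolisSweep() are hypothetical placeholders for the benchmark's own loop and update routine.

    // Illustrative timing harness; the barriers keep the processes
    // synchronized so the measured interval covers the same work everywhere.
    MPI.COMM_WORLD.Barrier();
    double t0 = MPI.Wtime();
    for (int sweep = 0; sweep < nSweeps; sweep++)
        doMetropolisSweep();               // hypothetical per-sweep routine
    MPI.COMM_WORLD.Barrier();
    double elapsed = MPI.Wtime() - t0;
    if (MPI.COMM_WORLD.Rank() == 0)
        System.out.println("Elapsed: " + elapsed + " s");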
The Metropolis timing results for the sequential and parallel tests at different lattice sizes are shown in Figures 6.7 through 6.12, and the Swendsen-Wang results in Figures 6.13 through 6.18. The results demonstrate the following.
The sequential and parallel Java-to-F77 and Java-to-C run time ratios for Metropolis and Swendsen-Wang are shown in Tables 6.1 and 6.2, respectively. They show that the performance difference between mpiJava and MPI F77 or C decreases as the number of processors increases, because the main overhead is in communication rather than computation. In relative terms the mpiJava codes perform best on 8 nodes, where the execution times are only 8-14% longer than for MPICH F77 or C; absolute performance is poor in this case, however, because this is the regime where communication overhead dominates. For the largest lattice size (L=1024), the results demonstrate that the performance of mpiJava is within a factor of two to three of MPI F77 or C. Compared with MPI F77 or C, the performance of mpiJava is thus promising, but not yet acceptable. An interesting series of papers from IBM [#!Moreira98A!#,#!Moreira98B!#,#!Wu99!#] confirmed that the current generation of Java virtual machines has rather poor performance on Fortran-like, array-intensive computations, but went on to demonstrate how to apply aggressive optimizations in Java compilers to obtain performance competitive with Fortran. In a recent paper [#!Moreira99!#] they described a case study involving a data mining application that used the Java Array package supported by the Java Grande Numerics Working Group. Using the experimental IBM HPCJ Java compiler, they reported obtaining over 90% of the performance of Fortran. We therefore expect that in the future Java will become quite competitive with Fortran.