next up previous contents
Next: Benchmarking HPJava, Part II: Up: Benchmarking HPJava, Part I: Previous: Experimental Study - Q3   Contents


Discussion

In this chapter, we benchmarked the HPJava language with scientific and engineering applications on a single-processor Linux machine (Red Hat 7.3; Pentium IV 1.5 GHz CPU with 512 MB memory and 256 KB cache). We concentrated on a single Linux machine in order to verify that the HPJava system and its optimization strategies produce efficient node code. Without confidence in the efficiency of the node code, there is no prospect of high performance for the HPJava system on parallel machines. Through these benchmarks, we also studied the behaviour of the overall construct and of the subscript expression of a multiarray element access in HPJava programs, and experimented with the effect of the optimization strategies on the HPJava system.

Unlike direct matrix multiplication and the Q3 index, the overall constructs in the Laplace equation using red-black relaxation and in the 3D diffusion equation do not use the default index triplet (e.g. overall(x = i for :)). If the index triplet depends on variables, then one of the control variables, the one associated with localBlock(), is not loop invariant, so it cannot be hoisted outside the outermost overall construct. To eliminate this problem in a common case, we adopted a Loop Unrolling optimization in HPJOPT2.

Moreover, all dimensions of the multiarrays created in these PDE examples are distributed, whereas some of the dimensions in direct matrix multiplication and the Q3 index are sequential. The translation scheme for the subscript expression of a distributed dimension is more complicated than that for a sequential dimension.
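The hoisting problem described above for index triplets that depend on variables can be illustrated with a small plain-Java analogue. This is only a sketch under stated assumptions, not the code that the HPJava translator or HPJOPT2 actually emits: the class name, the localBlock helper (which here merely packs a lower bound, an element count, and a step), and the call counter are all hypothetical. In a red-black sweep the inner lower bound depends on the parity of the outer index, so the block descriptor has to be recomputed on every outer iteration. Unrolling the outer loop by two leaves two descriptors that no longer depend on the loop index, and both can be computed once before the loop.

```java
import java.util.Arrays;

public class RedBlackUnrollSketch {
    // Counts how often the (hypothetical) block descriptor is computed.
    static int blockCalls = 0;

    // Hypothetical stand-in for a per-loop block computation:
    // returns {first index, number of elements, step} for lo : hi : step.
    static int[] localBlock(int lo, int hi, int step) {
        blockCalls++;
        int count = (hi < lo) ? 0 : (hi - lo) / step + 1;
        return new int[] { lo, count, step };
    }

    // Relax one row over the columns described by blk.
    static void update(double[][] a, int i, int[] blk) {
        for (int n = 0, j = blk[0]; n < blk[1]; n++, j += blk[2]) {
            a[i][j] = 0.25 * (a[i - 1][j] + a[i + 1][j] + a[i][j - 1] + a[i][j + 1]);
        }
    }

    // Naive form: the inner triplet depends on i, so the descriptor
    // is not loop invariant and is recomputed for every row.
    static void sweepNaive(double[][] a, int parity) {
        int n = a.length; // square grid assumed
        for (int i = 1; i <= n - 2; i++) {
            int[] blk = localBlock(1 + (i + parity) % 2, n - 2, 2);
            update(a, i, blk);
        }
    }

    // Outer loop unrolled by two: odd and even rows each get a
    // descriptor that is invariant and hoisted out of the loop.
    static void sweepUnrolled(double[][] a, int parity) {
        int n = a.length; // square grid assumed
        int[] blkOdd = localBlock(1 + (1 + parity) % 2, n - 2, 2);
        int[] blkEven = localBlock(1 + (2 + parity) % 2, n - 2, 2);
        int i = 1;
        for (; i + 1 <= n - 2; i += 2) {
            update(a, i, blkOdd);      // i is odd here
            update(a, i + 1, blkEven); // i + 1 is even
        }
        if (i <= n - 2) {
            update(a, i, blkOdd);      // leftover odd row, if any
        }
    }

    // Builds an n x n test grid with distinct values.
    static double[][] makeGrid(int n) {
        double[][] a = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                a[i][j] = i * n + j;
            }
        }
        return a;
    }

    public static void main(String[] args) {
        double[][] g1 = makeGrid(8);
        double[][] g2 = makeGrid(8);
        blockCalls = 0;
        sweepNaive(g1, 0);
        int naiveCalls = blockCalls;
        blockCalls = 0;
        sweepUnrolled(g2, 0);
        int unrolledCalls = blockCalls;
        System.out.println("same result: " + Arrays.deepEquals(g1, g2));
        System.out.println("descriptor computations, naive: " + naiveCalls
                + ", unrolled: " + unrolledCalls);
    }
}
```

In this sketch the naive sweep computes one descriptor per interior row, while the unrolled sweep computes two in total regardless of grid size. The benefit in a real translator depends on how costly the hoisted computation is, which this toy helper does not model.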


Table 6.2: Speedup of each application over naive translation and sequential Java after applying HPJOPT2.
                                  Direct Matrix     Laplace equation
                                  Multiplication    red-black relaxation    3D Diffusion    Q3
HPJOPT2 over naive translation    150%              361%                    200%            115%
HPJOPT2 over sequential Java      122%              94%                     161%            138%

As Table 6.2 shows, the HPJOPT2 optimization increases performance most for scientific and engineering applications with large problem sizes and more distributed dimensions. This indicates that the HPJava system is able to produce efficient node code, and the potential performance of HPJava on multi-processors looks very promising.
Bryan Carpenter 2004-06-09