overall loops.
One main issue that optimization strategies must address is the complexity of the
terms in the subscript expressions used to address local
elements (see Figure 3).
Optimization strategies should remove the overheads of
the naive translation scheme (especially for the overall
construct) and speed up HPJava, i.e. produce a Java-based environment
competitive in performance with existing Fortran programming environments.
To eliminate complicated distributed index subscript expressions
in the inner loops, the final translator will certainly make
extensive use of
strength-reduction optimizations--introducing induction variables that
can be computed
efficiently by adding the induction increments at suitable points.
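As a rough illustration of strength reduction, the following sketch compares a loop that recomputes an affine subscript on every iteration with one that carries the offset in an induction variable. The class name, method names and the base/stride parameters are hypothetical; this is not the code the HPJava translator emits.

```java
// Illustrative sketch only: names and layout parameters are hypothetical,
// not the HPJava translator's actual output.
class StrengthReductionSketch {

    // Naive form: the full affine subscript is recomputed on every access.
    static double sumNaive(double[] data, int base, int stride, int count) {
        double sum = 0.0;
        for (int l = 0; l < count; l++) {
            sum += data[base + stride * l];   // one multiplication per access
        }
        return sum;
    }

    // Strength-reduced form: an induction variable holds the current offset
    // and is advanced by a fixed increment at the end of each iteration.
    static double sumReduced(double[] data, int base, int stride, int count) {
        double sum = 0.0;
        int offset = base;                    // induction variable
        for (int l = 0; l < count; l++) {
            sum += data[offset];
            offset += stride;                 // induction increment
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] data = new double[20];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        System.out.println(sumNaive(data, 2, 3, 6));
        System.out.println(sumReduced(data, 2, 3, 6));
    }
}
```

Both methods visit the same elements in the same order, so they return the same value; only the subscript arithmetic differs.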
Another simple but potentially profitable strategy is loop-unrolling.
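A minimal sketch of loop unrolling on a plain Java array follows. The unroll factor of 4 and all names are assumptions chosen for illustration; a real translator would have to choose the factor and handle distributed subscripts.

```java
// Illustrative sketch only: the unroll factor and names are hypothetical.
class LoopUnrollingSketch {

    // Original loop: one loop test and one increment per element.
    static double sumRolled(double[] a) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Unrolled by 4: one loop test and one increment per four elements,
    // followed by a clean-up loop for the remaining 0 to 3 elements.
    static double sumUnrolled(double[] a) {
        double sum = 0.0;
        int n = a.length;
        int limit = n - (n % 4);              // largest multiple of 4 <= n
        int i = 0;
        for (; i < limit; i += 4) {
            sum += a[i];
            sum += a[i + 1];
            sum += a[i + 2];
            sum += a[i + 3];
        }
        for (; i < n; i++) {                  // clean-up loop
            sum += a[i];
        }
        return sum;
    }
}
```

The additions are kept in the original order, so the unrolled version produces the same floating-point result as the rolled one.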
Furthermore, we are likely to need code movement optimizations to
reduce the need for method invocations on the special run-time support classes,
like Block and Group.
In the original overall translation scheme, we
use the localBlock() method to compute parameters of
the local loop. The translation is identical for every
distribution format--block-distribution, simple-cyclic distribution,
aligned subranges, and so on--supported by the language.
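The idea of one translation serving every distribution format can be sketched as follows. All types here (LocalBlockSketch, RangeSketch, BlockRange, CyclicRange) are hypothetical stand-ins written for this illustration, not the HPJava run-time classes, and the field names are assumptions. Only the method that computes the local-block parameters differs between formats; the loop that consumes them is the same.

```java
// Illustrative sketch only: these types are hypothetical stand-ins,
// not the HPJava run-time classes.
class LocalBlockDemo {

    // Parameters of the local loop on one process.
    static final class LocalBlockSketch {
        final int count;     // number of locally held elements
        final int glbBase;   // global index of the first local element
        final int glbStep;   // global index step between local elements

        LocalBlockSketch(int count, int glbBase, int glbStep) {
            this.count = count;
            this.glbBase = glbBase;
            this.glbStep = glbStep;
        }
    }

    // Each distribution format supplies its own parameter computation.
    // proc is assumed to lie in the range 0 .. procs - 1.
    interface RangeSketch {
        LocalBlockSketch localBlock(int proc);
    }

    // Block distribution: each process holds one contiguous block.
    static final class BlockRange implements RangeSketch {
        final int n, procs;
        BlockRange(int n, int procs) { this.n = n; this.procs = procs; }

        public LocalBlockSketch localBlock(int proc) {
            int blockSize = (n + procs - 1) / procs;   // ceiling of n / procs
            int lo = Math.min(n, proc * blockSize);
            int hi = Math.min(n, lo + blockSize);
            return new LocalBlockSketch(hi - lo, lo, 1);
        }
    }

    // Simple cyclic distribution: elements are dealt out round-robin.
    static final class CyclicRange implements RangeSketch {
        final int n, procs;
        CyclicRange(int n, int procs) { this.n = n; this.procs = procs; }

        public LocalBlockSketch localBlock(int proc) {
            int count = (n - proc + procs - 1) / procs;
            return new LocalBlockSketch(count, proc, procs);
        }
    }

    // The same loop for every format: one method call up front, then a
    // plain counted loop over the local elements.
    static int sumGlobalIndices(RangeSketch range, int proc) {
        LocalBlockSketch b = range.localBlock(proc);
        int sum = 0;
        for (int l = 0; l < b.count; l++) {
            sum += b.glbBase + b.glbStep * l;
        }
        return sum;
    }
}
```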
Of course there is an overhead related to abstracting this local-block
parameter computation into a method call, but at least the method call
is made at most once at the start of each loop. Still, we need to
minimize this overhead where possible, by code movement strategies like
partial redundancy elimination, which will need to be adapted for
our source-to-source translator. There is considerable scope for
common-subexpression elimination in typical overall loops.
Naive translation of inner loops tends to repeatedly declare variables
for holding the results of str() method calls, global bases and steps, and so on.
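The effect of moving such declarations out of an enclosing loop can be sketched as below. ArrayDesc is a hypothetical stand-in for a run-time array descriptor; its str() method and call counter exist only for this illustration and are not the HPJava API.

```java
// Illustrative sketch only: ArrayDesc is a hypothetical stand-in for a
// run-time array descriptor, not an HPJava class.
class HoistingSketch {

    static final class ArrayDesc {
        final double[] data;
        final int[] strides;
        int strCalls = 0;                     // counts str() invocations

        ArrayDesc(int rows, int cols) {
            data = new double[rows * cols];
            strides = new int[] {cols, 1};    // row-major layout
        }

        int str(int dim) {
            strCalls++;
            return strides[dim];
        }
    }

    // Naive form: the inner loop's set-up code sits in the outer loop body,
    // so the loop-invariant stride is re-declared on every outer iteration.
    static void fillNaive(ArrayDesc a, int rows, int cols) {
        int str0 = a.str(0);
        for (int i = 0; i < rows; i++) {
            int str1 = a.str(1);              // invariant, yet re-evaluated
            for (int j = 0; j < cols; j++) {
                a.data[str0 * i + str1 * j] = i + j;
            }
        }
    }

    // After code movement: every invariant is computed once, before the nest.
    static void fillHoisted(ArrayDesc a, int rows, int cols) {
        int str0 = a.str(0);
        int str1 = a.str(1);
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                a.data[str0 * i + str1 * j] = i + j;
            }
        }
    }
}
```

For a nest with R outer iterations, the naive form makes R + 1 calls to str() while the hoisted form makes 2; both write the same values.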