Next: Case Study: Laplace Equation Up: Applying PRE, SR, DCE, Previous: Applying PRE, SR, DCE,   Contents


Case Study: Direct Matrix Multiplication

We begin by applying PRE and SR to direct matrix multiplication.
Figure 5.5: The innermost for loop of naively translated direct matrix multiplication
double sum = 0 ;
                 
for (int k = 0 ; k < N ; k ++) {
  sum += a__$DS [a__$bas.base  + a__$0.stride  * __$t47 + 
                 a__$1.off_bas + a__$1.off_stp * k] * 
         b__$SD [b__$bas.base + b__$0.off_bas + b__$0.off_stp * k + 
                 b__$1.stride * __$t54] ;
}
Figure 5.5 shows the innermost for loop of the naively translated direct matrix multiplication program in HPJava, produced with the basic translation schemes of Section 4.5.
Figure 5.6: Inserting a landing pad into a loop
[Figure image: Figures/landingpad]
The loop contains complex subscript expressions for the multiarray element accesses. They involve final variables such as a__$bas.base, a__$0.stride, a__$1.off_bas, a__$1.off_stp, b__$bas.base, b__$0.off_bas, b__$0.off_stp, b__$1.stride, __$t47, and __$t54. Recall the main idea of PRE: eliminating redundant computations of expressions, where the expression need not occur on all control flow paths that lead to a given redundant computation. Because these variables are final (in general, they are constants), their values do not change inside the loop, so re-evaluating them and the subexpressions built from them on every iteration is redundant. These variables, and the loop-invariant subexpressions that use them, can therefore be replaced with temporary variables declared outside the loop. Before applying PRE, we need to introduce a landing pad [42]: to make PRE operative, we give each loop a landing pad representing entry to the loop from outside. Figure 5.6 shows how to insert a landing pad into a loop. All the temporary variables newly introduced by PRE can then be declared and initialized in the landing pad of the for loop.
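The landing pad idea described above can be illustrated on a much smaller loop. The following is a minimal sketch, not output of the HPJava translator: the class name, method names, and parameters (base, stride, row, step) are hypothetical and chosen only for illustration. It rewrites a for loop as a guarded do-while loop, so that the block between the guard and the loop body can hold a temporary that is computed once.

```java
// Sketch only: all names here are hypothetical, not HPJava translator output.
final class LandingPadSketch {

    // Original shape: the loop-invariant expression base + stride * row
    // is re-evaluated on every iteration.
    static long naive(int[] data, int base, int stride, int row, int step, int n) {
        long sum = 0;
        for (int k = 0; k < n; k++) {
            sum += data[base + stride * row + step * k];
        }
        return sum;
    }

    // Transformed shape: the for loop becomes a guarded do-while loop.
    // The block between the guard and the loop body is the landing pad.
    static long withLandingPad(int[] data, int base, int stride, int row, int step, int n) {
        long sum = 0;
        int k = 0;
        if (k < n) {
            // Landing pad: runs once, and only if the loop body runs at all.
            final int t = base + stride * row;
            do {
                sum += data[t + step * k];
                k++;
            } while (k < n);
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = new int[64];
        for (int i = 0; i < data.length; i++) {
            data[i] = i * i;
        }
        // Both shapes visit the same elements in the same order.
        System.out.println(naive(data, 2, 8, 3, 2, 5) == withLandingPad(data, 2, 8, 3, 2, 5));
    }
}
```

The guard matters because the landing pad is evaluated only when the original loop would have executed at least once, so hoisting a computation into it cannot introduce work (or an exception) on a path where the loop body never ran.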
Figure 5.7: After applying PRE to the direct matrix multiplication program
double sum = 0 ;
int k = 0 ;
if (k < N) {
    ///////////////////
    //  Landing Pad  //
    ///////////////////
    int a1 = a__$bas.base,  a2 = a__$0.stride, a3 = a__$1.off_bas, 
        a4 = a__$1.off_stp, a5 = b__$bas.base, a6 = b__$0.off_bas, 
        a7 = b__$0.off_stp, a8 = b__$1.stride ;

    int a9  = a1 + a2 * __$t47 + a3 ;
    int a10 = a5 + a6 ;
    int a11 = a8 * __$t54 ;
    ///////////////////
    do {
        sum += a__$DS [a9 + a4 * k] * b__$SD [a10 + a7 * k + a11] ;
        k ++ ;
    } while (k < N) ;
}
Figure 5.7 shows the code after applying PRE. Finally, we apply SR, which replaces the multiplications by the induction variable k in the subscripts with additions on running index variables. The resulting optimized HPJava program for direct matrix multiplication is shown in Figure 5.8.
Figure 5.8: HPJava program of direct matrix multiplication, optimized by PRE and SR
double sum = 0 ;
int k = 0 ;
if (k < N) {
    ///////////////////
    //  Landing Pad  //
    ///////////////////
    int a1 = a__$bas.base,  a2 = a__$0.stride, a3 = a__$1.off_bas, 
        a4 = a__$1.off_stp, a5 = b__$bas.base, a6 = b__$0.off_bas, 
        a7 = b__$0.off_stp, a8 = b__$1.stride ;

    int a9  = a1 + a2 * __$t47 + a3 ;
    int a10 = a5 + a6 ;
    int a11 = a8 * __$t54 ;
    ///////////////////
    int aa1 = a9, aa2 = a10 + a11 ;
    ///////////////////
    do {
        sum += a__$DS [aa1] * b__$SD [aa2] ;
        k ++ ;
        ///////////////////
        aa1 += a4 ; aa2 += a7 ;
        ///////////////////
    } while (k < N) ;
}
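To check that the transformed loop computes the same value as the naive one, the two shapes can be compared on plain Java arrays. This is only a sketch under stated assumptions: the constants below are hypothetical stand-ins for the translator-generated fields (for example A_BASE for a__$bas.base and B_OFF_STP0 for b__$0.off_stp), the parameters t47 and t54 stand in for __$t47 and __$t54, and the array contents are arbitrary test data.

```java
// Sketch only: constants and names are hypothetical stand-ins for the
// translator-generated fields; the values are arbitrary.
final class InnerLoopSketch {
    static final int A_BASE = 1, A_STRIDE0 = 6, A_OFF_BAS1 = 2, A_OFF_STP1 = 1;
    static final int B_BASE = 3, B_OFF_BAS0 = 1, B_OFF_STP0 = 5, B_STRIDE1 = 1;

    // Naive shape: full subscript arithmetic on every iteration.
    static double naive(double[] a, double[] b, int t47, int t54, int n) {
        double sum = 0;
        for (int k = 0; k < n; k++) {
            sum += a[A_BASE + A_STRIDE0 * t47 + A_OFF_BAS1 + A_OFF_STP1 * k]
                 * b[B_BASE + B_OFF_BAS0 + B_OFF_STP0 * k + B_STRIDE1 * t54];
        }
        return sum;
    }

    // Shape after PRE and SR: invariants hoisted into the landing pad,
    // multiplications by k replaced with running additions.
    static double afterPreAndSr(double[] a, double[] b, int t47, int t54, int n) {
        double sum = 0;
        int k = 0;
        if (k < n) {
            // Landing pad: loop-invariant parts of both subscripts.
            int aa1 = A_BASE + A_STRIDE0 * t47 + A_OFF_BAS1;
            int aa2 = B_BASE + B_OFF_BAS0 + B_STRIDE1 * t54;
            do {
                sum += a[aa1] * b[aa2];
                k++;
                // Strength reduction: step the subscripts by addition.
                aa1 += A_OFF_STP1;
                aa2 += B_OFF_STP0;
            } while (k < n);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] a = new double[32];
        double[] b = new double[32];
        for (int i = 0; i < 32; i++) {
            a[i] = i;
            b[i] = 2 * i + 1;
        }
        boolean same = true;
        for (int t47 = 0; t47 < 4; t47++) {
            for (int t54 = 0; t54 < 4; t54++) {
                same &= naive(a, b, t47, t54, 4) == afterPreAndSr(a, b, t47, t54, 4);
            }
        }
        System.out.println(same ? "results agree" : "results differ");
    }
}
```

Both methods read the same elements in the same order, so the floating-point sums are accumulated identically; the only difference is how the subscripts are produced.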
Applying PRE and SR removes the redundant subscript arithmetic from this simple HPJava program. We therefore expect the optimized code to run faster than the naively translated version. In the next section, we will investigate more complicated examples.
Bryan Carpenter 2004-06-09