

Data Alignment

The ALIGN directive aligns arrays to templates. Consider an example: the core code of an LU decomposition subroutine looks as follows:
01      REAL A (N, N)
02      INTEGER N, R, R1
03      REAL, DIMENSION (N) :: L_COL, U_ROW
04
05      DO R = 1, N - 1
06        R1 = R + 1
07        L_COL (R  : ) = A (R : , R)
08        A (R , R1 : ) = A (R, R1 : ) / L_COL (R)
09        U_ROW (R1 : ) = A (R, R1 : )
10        FORALL (I = R1 : N, J = R1 : N)
11     &    A (I, J) = A (I, J) - L_COL (I) * U_ROW (J)
12      ENDDO
Having looked through the above algorithm, we can choose a template:
!HPF$ TEMPLATE T (N, N)
The major data structure of the problem, the array A that holds the matrix, matches this template exactly. To align A to T we need an ALIGN directive such as:
!HPF$ ALIGN A(I, J) WITH T (I, J)
Here the integer subscripts of the ``alignee'' (the array that is to be aligned) are called alignment dummies. In this manner, every element of the alignee is mapped to some element of the template. Most of the work in each iteration of the DO loop in our example is done by the following statement, which is line 11 of the program:
A (I, J) = A (I, J) - L_COL (I) * U_ROW (J)
On careful inspection of this assignment statement, we see that communication can be avoided if copies of L_COL (I) and U_ROW (J) are allocated wherever A (I, J) is allocated. The following directives achieve this using replicated alignments to the template T:
!HPF$ ALIGN  L_COL (I) WITH T (I, *)
!HPF$ ALIGN  U_ROW (J) WITH T (*, J)
where an asterisk means that array elements are replicated along the corresponding template dimension, i.e. every processor that holds part of that dimension has its own copy of these elements. Figure 2.5 shows the alignment of the three arrays and the template. Consequently, no communication is needed for the assignment in the FORALL construct, since all operands of each elemental assignment are allocated on the same processor. Do the other statements require communication?

Figure 2.5: Alignment of the three arrays in the LU decomposition example
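The claim that the FORALL assignment needs no communication can be checked with a small model at the level of template cells. The sketch below is only an illustration under stated assumptions: the module and function names are hypothetical, N = 4 is an arbitrary size, and the code models where copies live rather than using any HPF runtime. It assumes A (I, J) lives at template cell T (I, J), L_COL (I) is replicated along the second template dimension, and U_ROW (J) is replicated along the first.

```fortran
! Sketch: which template cells hold a copy of each operand?
! All names here are hypothetical; N = 4 is an arbitrary illustrative size.
module align_model
  implicit none
  integer, parameter :: n = 4
contains
  ! L_COL(I) is replicated over the second template dimension:
  ! every cell (I, TJ) with 1 <= TJ <= N holds a copy.
  pure logical function holds_l_col(i, ti, tj)
    integer, intent(in) :: i, ti, tj
    holds_l_col = (ti == i .and. tj >= 1 .and. tj <= n)
  end function holds_l_col

  ! U_ROW(J) is replicated over the first template dimension:
  ! every cell (TI, J) with 1 <= TI <= N holds a copy.
  pure logical function holds_u_row(j, ti, tj)
    integer, intent(in) :: j, ti, tj
    holds_u_row = (tj == j .and. ti >= 1 .and. ti <= n)
  end function holds_u_row

  ! A(I, J) lives at cell (I, J). The FORALL over I, J = R1..N is local
  ! if that cell also holds L_COL(I) and U_ROW(J) for every I and J.
  pure logical function forall_is_local(r1)
    integer, intent(in) :: r1
    integer :: i, j
    forall_is_local = .true.
    do j = r1, n
      do i = r1, n
        if (.not. (holds_l_col(i, i, j) .and. holds_u_row(j, i, j))) then
          forall_is_local = .false.
        end if
      end do
    end do
  end function forall_is_local
end module align_model

program check_forall
  use align_model
  implicit none
  ! R1 = 2 corresponds to the first iteration of the DO loop (R = 1).
  print *, 'FORALL operands co-located: ', forall_is_local(2)
end program check_forall
```

Because the check is made on template cells, it holds for any distribution of T onto processors: whichever processor owns T (I, J) owns all three operands.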
Line 8 is equivalent to
FORALL (J = R1 : N)  A (R, J) = A (R, J) / L_COL (R)
Since a copy of L_COL (R) is available on every processor where A (R, J) is allocated, this statement requires no communication. The other two array assignment statements, however, do need communication. For instance, the assignment to L_COL on line 7 of the program is equivalent to
FORALL (I = R : N)  L_COL (I) = A (I, R)
Since L_COL (I) is replicated in the J direction, while A (I, R) is allocated only on the processor that holds the template element with J = R, updating the L_COL element requires broadcasting the A element to every processor that holds a copy. The compiler inserts these communications automatically.

The next step is to distribute the template to which the arrays have been aligned. A BLOCK distribution is not a good choice for this algorithm, because successive iterations work on a shrinking area of the template, so a block distribution leaves some processors idle in later iterations. A CYCLIC distribution achieves better load balancing.

The example above illustrated simple alignment (the ``identity mapping'' of array to template) and replicated alignment. What do more general alignments look like? As one example, we can transpose an array onto a template:
      DIMENSION B(N, N)
!HPF$ ALIGN B(I, J) WITH T(J, I)
maps B to T with its dimensions transposed (B (1, 2) is aligned to T (2, 1), and so on). More generally, a subscript of an align target (i.e. the template) can be a linear expression in one of the alignment dummies. For example,
      DIMENSION C(N / 2, N / 2)
!HPF$ ALIGN C(I, J) WITH T(N / 2 + I, 2 * J)
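An alignment is just a function from alignee subscripts to template subscripts, so the transposed and linear cases can be tabulated directly. The following sketch is an illustration only: the module and function names are hypothetical, and N = 8 is an assumed even size so that N / 2 is exact. It maps subscripts of B and C to template subscripts and checks that every element of C lands inside T.

```fortran
! Sketch: alignment functions as plain subscript maps.
! All names are hypothetical; N = 8 is an assumed (even) template extent.
module align_maps
  implicit none
  integer, parameter :: n = 8
contains
  ! Models ALIGN B(I, J) WITH T(J, I)
  pure function transpose_target(i, j) result(t)
    integer, intent(in) :: i, j
    integer :: t(2)
    t = (/ j, i /)
  end function transpose_target

  ! Models ALIGN C(I, J) WITH T(N / 2 + I, 2 * J)
  pure function linear_target(i, j) result(t)
    integer, intent(in) :: i, j
    integer :: t(2)
    t = (/ n / 2 + i, 2 * j /)
  end function linear_target

  ! True if both template subscripts lie within 1..N
  pure logical function in_template(t)
    integer, intent(in) :: t(2)
    in_template = all(t >= 1 .and. t <= n)
  end function in_template
end module align_maps

program show_alignment
  use align_maps
  implicit none
  integer :: i, j
  logical :: ok

  ! C has extents N/2 by N/2; check every element maps inside T.
  ok = .true.
  do j = 1, n / 2
    do i = 1, n / 2
      ok = ok .and. in_template(linear_target(i, j))
    end do
  end do

  print *, 'B(1, 2) -> T', transpose_target(1, 2)
  print *, 'C(1, 1) -> T', linear_target(1, 1)
  print *, 'all of C inside T: ', ok
end program show_alignment
```

With these assumed sizes, C occupies the lower half of the template rows (N / 2 + 1 to N) and every second column (2, 4, ..., N).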
The ranks of the alignee and the align-target need not be identical. An alignee can have a ``collapsed'' dimension, an align-target can have a ``constant'' subscript (e.g. a scalar might be aligned to the first element of a one-dimensional template), or an alignee can be ``replicated'' over some dimensions of the template:
      DIMENSION D(N, N, N)
!HPF$ ALIGN D(I, J, K) WITH T(I, J)
is an example of a collapsed dimension. The template element does not depend on K, so for fixed I and J every element D (I, J, K) is mapped to the same template element.

In this section, we have covered HPF's processor arrangements, distributed arrays, and data alignment, which we will largely adopt in the HPspmd programming model presented in chapter 4.
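The earlier load-balancing argument for the LU template, that a BLOCK distribution leaves processors idle as the active region shrinks while a CYCLIC distribution does not, can be made concrete by counting processors that still own part of the region I, J = R + 1 : N. The sketch below is an illustration under stated assumptions: the names are hypothetical, the template extent N = 8 and a 4 x 4 processor grid are arbitrary, both template dimensions are distributed the same way, and the block size is taken as the ceiling of N / P.

```fortran
! Sketch: count processors with work in the FORALL at iteration R.
! All names are hypothetical; N = 8 and P = 4 are assumed sizes.
module load_balance
  implicit none
  integer, parameter :: n = 8   ! template extent in each dimension
  integer, parameter :: p = 4   ! processors per grid dimension
contains
  ! Owner (0-based) of template index I under a BLOCK distribution
  pure integer function block_owner(i)
    integer, intent(in) :: i
    integer :: b
    b = (n + p - 1) / p          ! block size, ceiling of N / P
    block_owner = (i - 1) / b
  end function block_owner

  ! Owner (0-based) of template index I under a CYCLIC distribution
  pure integer function cyclic_owner(i)
    integer, intent(in) :: i
    cyclic_owner = mod(i - 1, p)
  end function cyclic_owner

  ! Processors (per dimension) owning at least one index in R+1..N
  pure integer function active_procs(r, cyclic)
    integer, intent(in) :: r
    logical, intent(in) :: cyclic
    logical :: busy(0:p-1)
    integer :: i
    busy = .false.
    do i = r + 1, n
      if (cyclic) then
        busy(cyclic_owner(i)) = .true.
      else
        busy(block_owner(i)) = .true.
      end if
    end do
    active_procs = count(busy)
  end function active_procs
end module load_balance

program compare_distributions
  use load_balance
  implicit none
  integer :: r
  print *, 'R, active processors under BLOCK, under CYCLIC, out of', p * p
  do r = 1, n - 1
    ! The active region is square, so the 2-D count is the square
    ! of the per-dimension count.
    print *, r, active_procs(r, .false.)**2, active_procs(r, .true.)**2
  end do
end program compare_distributions
```

With these assumed sizes, at R = 4 the BLOCK mapping leaves only 4 of the 16 processors with work in the FORALL, while the CYCLIC mapping keeps all 16 busy. The count only distinguishes busy from idle processors; it does not measure how evenly the remaining work is shared.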
Bryan Carpenter 2004-06-09