
## Data Alignment

The `ALIGN` directive aligns arrays to templates. Consider an example. The core code of an LU decomposition subroutine looks as follows:
```
01    REAL A (N, N)
02    INTEGER N, R, R1
03    REAL, DIMENSION (N) :: L_COL, U_ROW
04
05    DO R = 1, N - 1
06      R1 = R + 1
07      L_COL (R : ) = A (R : , R)
08      A (R , R1 : ) = A (R, R1 : ) / L_COL (R)
09      U_ROW (R1 : ) = A (R, R1 : )
10      FORALL (I = R1 : N, J = R1 : N)
11   &    A (I, J) = A (I, J) - L_COL (I) * U_ROW (J)
12    ENDDO
```
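
For readers who want to trace the loop step by step, the following is a minimal serial sketch of the same computation in Python. It is an illustration only, not part of the HPF program: the function name `lu_in_place` is hypothetical, indices are 0-based rather than 1-based, and the array-section assignments are written out as explicit loops. The comments point back to the numbered lines above.

```python
def lu_in_place(a):
    """Factor the square matrix `a` (a list of row lists) in place, without pivoting.

    On return, the lower triangle (diagonal included) holds L and the strict
    upper triangle holds U, whose unit diagonal is implied.
    """
    n = len(a)
    l_col = [0.0] * n
    u_row = [0.0] * n
    for r in range(n - 1):                  # line 05, shifted to 0-based indices
        r1 = r + 1                          # line 06
        for i in range(r, n):               # line 07: L_COL(R:) = A(R:, R)
            l_col[i] = a[i][r]
        for j in range(r1, n):              # line 08: divide row R by the pivot
            a[r][j] = a[r][j] / l_col[r]
        for j in range(r1, n):              # line 09: U_ROW(R1:) = A(R, R1:)
            u_row[j] = a[r][j]
        for i in range(r1, n):              # lines 10-11: update the trailing block
            for j in range(r1, n):
                a[i][j] = a[i][j] - l_col[i] * u_row[j]
    return a
```

For the matrix `[[4.0, 2.0], [6.0, 5.0]]` this leaves `[[4.0, 0.5], [6.0, 2.0]]`, that is, L = `[[4, 0], [6, 2]]` and U = `[[1, 0.5], [0, 1]]`, whose product is the original matrix.
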
After looking through the above algorithm, we can choose a template,
 ```!HPF$ TEMPLATE T (N, N) ```
The major data structure of the problem, the array `A` that holds the matrix, has exactly the same shape as this template. To align `A` to `T` we need an `ALIGN` directive such as:
 ```!HPF$ ALIGN A(I, J) WITH T (I, J) ```
Here, the integer subscripts of the "alignee" (the array that is to be aligned) are called alignment dummies. In this manner, every element of the alignee is mapped to some element of the template. Most of the work in each iteration of the `DO`-loop in our example is done by the following statement, which is line 11 of the program,
 ```A (I, J) = A (I, J) - L_COL (I) * U_ROW (J) ```
Inspecting this assignment statement, we see that communication can be avoided if copies of `L_COL (I)` and `U_ROW (J)` are allocated wherever `A (I, J)` is allocated. The following directives achieve this with a replicated alignment to the template `T`,
```
!HPF$ ALIGN L_COL (I) WITH T (I, *)
!HPF$ ALIGN U_ROW (J) WITH T (*, J)
```
where an asterisk means that array elements are replicated in the corresponding processor dimension, i.e. a copy of each element is held on every processor along that dimension. Figure 2.5 shows the alignment of the three arrays and the template. Thus, no communication is needed for the assignment in the `FORALL` construct, since all operands of each elemental assignment are allocated on the same processor. Do the other statements require communication?
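
The co-location argument for the `FORALL` can be checked with a small model. The sketch below is an illustration only: it represents each alignment as the set of template cells that hold a copy of an array element, and treats an operand as local when a copy sits on every cell that holds the assignment target. The extent `N = 6` and all function names are assumptions made for the example. Because the check is made at the level of template cells, the result holds for any distribution of `T` over processors.

```python
N = 6  # hypothetical template extent; T is N x N

def align_a(i, j):
    """ALIGN A(I, J) WITH T(I, J): exactly one template cell."""
    return {(i, j)}

def align_l_col(i):
    """ALIGN L_COL(I) WITH T(I, *): replicated along the second dimension."""
    return {(i, j) for j in range(1, N + 1)}

def align_u_row(j):
    """ALIGN U_ROW(J) WITH T(*, J): replicated along the first dimension."""
    return {(i, j) for i in range(1, N + 1)}

def is_local(target_cells, operand_cells):
    """True when a copy of the operand sits on every cell holding the target."""
    return target_cells <= operand_cells

def forall_needs_communication(r1):
    """Check A(I, J) = A(I, J) - L_COL(I) * U_ROW(J) for I, J = R1 : N."""
    for i in range(r1, N + 1):
        for j in range(r1, N + 1):
            home = align_a(i, j)
            if not (is_local(home, align_l_col(i))
                    and is_local(home, align_u_row(j))):
                return True
    return False
```

With these definitions, `forall_needs_communication(2)` evaluates to `False`. By contrast, `is_local(align_l_col(3), align_a(3, 1))` is `False`: an assignment whose target is the replicated `L_COL (3)` cannot be satisfied from the single cell that holds `A (3, 1)`.
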

Line 8 is equivalent to
 ```FORALL (J = R1 : N) A (R, J) = A (R, J) / L_COL (R) ```
Since a copy of `L_COL (R)` is available on every processor where `A (R, J)` is allocated, this statement requires no communication. The other two array assignment statements, however, do need communication. For instance, the assignment to `L_COL`, which is line 7 of the program, is equivalent to
 ```FORALL (I = R : N) L_COL (I) = A (I, R) ```
Since `L_COL (I)` is replicated in the `J` direction, while `A (I, R)` is allocated only on the processor that holds the template element where `J = R`, updating an `L_COL` element means broadcasting the corresponding `A` element to all concerned parties. These communications will be properly inserted by the compiler.

The next step is to distribute the template, now that the arrays are aligned to it. A `BLOCK` distribution is not a good choice for this algorithm, since successive iterations work on a shrinking area of the template. A block distribution therefore leaves some processors idle in later iterations. A `CYCLIC` distribution achieves better load balancing.

The example above illustrated simple alignment (the "identity mapping" of array to template) and replicated alignment. What would general alignments look like? One example is to transpose an array onto a template.
```
      DIMENSION B(N, N)
!HPF$ ALIGN B(I, J) WITH T(J, I)
```
maps `B` to `T` in transposed order (`B (1, 2)` is aligned to `T (2, 1)`, and so on). More generally, a subscript of an align target (i.e. the template) can be a linear expression in one of the alignment dummies. For example,
```
      DIMENSION C(N / 2, N / 2)
!HPF$ ALIGN C(I, J) WITH T(N / 2 + I, 2 * J)
```
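
One way to read these directives is as functions from alignee subscripts to template subscripts. The sketch below writes out the identity, transposed, and linear alignments in that form and checks that the linear one stays inside the template. It is illustrative only: the extent `N = 8` and the function names are assumptions, and subscripts are kept 1-based as in Fortran.

```python
N = 8  # hypothetical template extent; T is N x N

def align_identity(i, j):
    """ALIGN A(I, J) WITH T(I, J)"""
    return (i, j)

def align_transpose(i, j):
    """ALIGN B(I, J) WITH T(J, I)"""
    return (j, i)

def align_linear(i, j):
    """ALIGN C(I, J) WITH T(N / 2 + I, 2 * J)"""
    return (N // 2 + i, 2 * j)

def in_template(cell):
    """True when every subscript of the template cell lies in 1..N."""
    return all(1 <= index <= N for index in cell)

def image(align, extent):
    """Template cells reached by an extent x extent alignee."""
    return [align(i, j)
            for i in range(1, extent + 1)
            for j in range(1, extent + 1)]
```

Here `align_transpose(1, 2)` gives `(2, 1)`, and `all(in_template(c) for c in image(align_linear, N // 2))` is `True`: with `C` of shape `(N / 2, N / 2)`, the first subscript lands in the lower half of `T` and the second on the even columns.
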
The ranks of the alignee and the align-target need not be identical. An alignee can have a "collapsed" dimension, an align-target can have a "constant" subscript (e.g. a scalar might be aligned to the first element of a one-dimensional template), or an alignee can be "replicated" over some dimensions of the template:
```
      DIMENSION D(N, N, N)
!HPF$ ALIGN D(I, J, K) WITH T(I, J)
```
is an example of a collapsed dimension. The template element does not depend on `K`: for fixed `I` and `J`, every element `D (I, J, K)` is mapped to the same template element `T (I, J)`.

In this section, we have covered HPF's processor arrangement, distributed arrays, and data alignment, which we will largely adopt in the HPspmd programming model presented in chapter 4.
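
The earlier remark that a `BLOCK` distribution leaves processors idle while a `CYCLIC` one balances the load can be checked numerically. The sketch below counts, for a given iteration `R`, how many elemental `FORALL` updates fall on each processor. It rests on assumptions not stated in the text: the template is distributed `(BLOCK, BLOCK)` or `(CYCLIC, CYCLIC)` over a hypothetical `p` x `p` processor grid, the block size is `ceil(n / p)`, and all function names are invented for the example.

```python
from math import ceil

def owner_block(index, n, p):
    """Processor coordinate of a 1-based index under BLOCK over p processors."""
    return (index - 1) // ceil(n / p)

def owner_cyclic(index, n, p):
    """Processor coordinate of a 1-based index under CYCLIC over p processors."""
    return (index - 1) % p

def work_per_processor(owner, n, p, r):
    """Count the FORALL updates of iteration r on each processor of a p x p grid."""
    work = [[0] * p for _ in range(p)]
    for i in range(r + 1, n + 1):          # I = R1 : N
        for j in range(r + 1, n + 1):      # J = R1 : N
            work[owner(i, n, p)][owner(j, n, p)] += 1
    return work

def idle_processors(owner, n, p, r):
    """Number of processors with no update to perform in iteration r."""
    work = work_per_processor(owner, n, p, r)
    return sum(count == 0 for row in work for count in row)
```

For `n = 8`, `p = 2` and `r = 4`, the active region is rows and columns 5 to 8. `idle_processors(owner_block, 8, 2, 4)` returns `3`, since all 16 updates land on a single processor, whereas `work_per_processor(owner_cyclic, 8, 2, 4)` returns `[[4, 4], [4, 4]]`.
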

Bryan Carpenter 2004-06-09