Quite often a parallel program needs to execute some code like
FORALL (I = 1 : 50) RES (I) = A (IND (I))
where A is one distributed array, and IND is some
other distributed array. The special feature is that some subscript
expressions (the subscripts for A in this case) are not a simple
linear functions of the index variables.
This operation cannot be reduced to the kind of a simple array assignment
described in Section 1.1. Even in the most
favourable case, when the arrays RES and IND are identically
aligned, the data movement cannot be reduced to a single MPI collective
operation.
Assuming RES and IND are aligned, the owner of the element RES(I) also owns the subscript IND(I). So the processor that holds the target variable of the assignment can compute where the required element of A lives. This is generally on some other processor. Unfortunately that other processor--the one that owns the A element--does not have the information to tell the value is required by the first processor. It seems that there must be at least two phases of communication--one in which the target processors send out requests to the owners of the required data, and one in which the owners return the data. The situation is illustrated in Figure 4 for the case
!HPF$ PROCESSORS P(4)
REAL A(50)
!HPF$ DISTRIBUTE A(CYCLIC) ONTO P
REAL RES(50)
INTEGER IND(50)
!HPF$ DISTRIBUTE RES(BLOCK) ONTO P
!HPF$ DISTRIBUTE IND(BLOCK) ONTO P
IND = (/ 5, 41, 7, ... /)
The first processor can deal with the assignment of the values
A(5) and A(41) to the first two elements of RES locally.
But the element A(7) lives on the third processor, and initially
the third processor has no way to know that this element is needed by
the first. Some two-way communication is needed.
There are many generalizations of this problem. The non-linear subscript expressions may appear on the left-hand-side rather than the right-hand-side:
FORALL (I = 1 : 50) RES (IND (I)) = A (I)
(More generally, non-linear subscripts may appear on both sides.)
The non-linear subscript
may be a function or expression rather than an indirection array.
The irregularly subscripted arrays may be multidimensional:
FORALL (I = 1 : 50) RES (I) = A (IND1 (I), IND2(I))
As a special case it may be that the subscripts are actually be linear, but
because the subscripts in orthogonal dimensions are not independent,
the assignment can still not be reduced to a simple array assignment:
FORALL (I = 1 : 50) RES (I) = A (I, I)
The loop itself may be multidimensional:
FORALL (I = 1 : 50, J = 1 : 50) RES (I, J) = A (IND (I, J))
These examples are special cases of two (rather complicated) general families:
where the IND arrays are aligned with the RES array, andFORALL (= 1 :, ...,= 1 :)& RES () = SRC ((), ...,())
where the IND arrays are aligned with the SRC array. We will call these generalized gather and generalized scatter operations respectively3. A large class of FORALL statements can be reduced to a series of generalized gather and scatter operations, interspersed with some ordinary array assignments (although there is no guarantee that this is always the most efficient way to translate a FORALL statement).FORALL (= 1 :, ...,= 1 :)& RES ((), ...,()) = SRC ()