All the previous examples considered patterns of communication occurring in array parallel statements--array assignments or FORALL statements. These communication patterns are quite naturally treated as generalized collective operations. But there are situations in HPF--and in general SPMD programming--where this approach is not readily applicable.
One example is the INDEPENDENT DO loop of HPF, which takes the form:
!HPF$ INDEPENDENT
DO i = 1, 10
...
END DO
The INDEPENDENT directive asserts that there are no data dependences
between individual iterations of the following loop, and the iterations
may therefore be executed in parallel.
Unlike the FORALL statement, which explicitly limits the code executed in
parallel to simple assignments, the body of an INDEPENDENT DO can involve
any Fortran construct, including conditionals, loops and procedure calls.
So the patterns of access to remote data inside parallel ``iterations''
may vary in unpredictable ways from one iteration to the next.
It may become difficult to do any advance orchestration of data exchanges.
An HPF compiler is free to ignore the INDEPENDENT directive
if it decides the loop is too complex to parallelize. But this may deprive
the programmer of one of the few options in HPF for expressing the
task-farming style of parallelism.
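To make the point about unpredictable access patterns concrete, here is a minimal sketch of an INDEPENDENT DO loop whose remote accesses depend on run-time data. It is an illustration only: the arrays A, B and IND, the processor arrangement P(2) and the input values are all hypothetical. An ordinary Fortran compiler treats the !HPF$ lines as comments and runs the loop sequentially.

```fortran
program independent_sketch
  implicit none
  integer, parameter :: n = 10
  real    :: a(n), b(n)
  integer :: ind(n), i, j
!HPF$ PROCESSORS p(2)
!HPF$ DISTRIBUTE a(BLOCK) ONTO p
!HPF$ DISTRIBUTE b(BLOCK) ONTO p
!HPF$ DISTRIBUTE ind(BLOCK) ONTO p

  ! Hypothetical input: ind holds a permutation of 1..n fixed only at run time.
  do i = 1, n
     a(i)   = real(i)
     ind(i) = n + 1 - i
  end do

  ! NEW(j) declares j private to each iteration, so it carries no dependence.
!HPF$ INDEPENDENT, NEW(j)
  do i = 1, n
     j = ind(i)                ! which element of a is read is data dependent
     if (mod(i, 2) == 0) then
        b(i) = a(j)            ! even iterations read one, possibly remote, element
     else
        b(i) = a(j) + a(i)     ! odd iterations also read a local element
     end if
  end do

  print *, b
end program independent_sketch
```

Because both the index J and the branch taken vary from one iteration to the next, a compiler cannot in general schedule the required transfers of A before the loop starts.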
Actually there is at least one other way to express task parallelism in HPF. A user-defined procedure with the PURE attribute can be called from within a FORALL statement:
PURE REAL FUNCTION FOO(I)
INTEGER, INTENT(IN) :: I
...
END
...
FORALL (I = 1 : N) RES (I) = FOO(I)
There are quite strict restrictions on PURE procedures, but nothing to
prevent a procedure from reading elements of global distributed
data--a distributed array in a COMMON block, for example.
Unfortunately this makes it difficult or impossible for the compiler
to determine at the point of call of FOO exactly what
remote variables it will access. For example, the actual behaviour of the
program might be similar to the first example of Section
1.4:
PURE REAL FUNCTION FOO(I)
INTEGER, INTENT(IN) :: I
REAL RES(50)
INTEGER IND(50)
!HPF$ DISTRIBUTE RES(BLOCK) ONTO P
!HPF$ DISTRIBUTE IND(BLOCK) ONTO P
COMMON /GLOBALS/ RES, IND
! Indirect read: the owner of RES(IND(I)) is known only at run time
FOO = RES(IND(I))
END
But by the time RES(IND(I)) is accessed, instances of the function
FOO have already been dispatched to execute independently
across the available set of processors. In a real sense, once inside FOO,
processors are no longer sharing a single ``loosely synchronous'' thread
of control. It is difficult to see how the parallel invocations
of FOO can behave collectively. In particular, if the underlying
model is MPI point-to-point communication, every receive must be matched
by a send posted by the owner, and it is difficult
to see how the owner of a particular array element can always be ready
to send an element when its value is accessed by a peer processor.
INDEPENDENT DO loops have similar problems, compounded by the fact that their bodies are not subject to the restrictions that prevent PURE procedures from writing to global variables. If this sort of code is to be compiled to run in parallel, the most practical approach is probably to assume the availability of one-sided communication. The MPI-2 standard added this functionality to MPI, but it is still not widely implemented.
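As a rough illustration of how one-sided communication sidesteps the problem, the sketch below uses the MPI-2 Fortran bindings to fetch a single element of a block-distributed array without any matching call on the owner. It is a sketch under stated assumptions, not what an HPF compiler would necessarily generate: the program name, the local block size NLOC, the array RES_LOCAL and the choice of global index G are all hypothetical. Note also that some MPI implementations restrict lock-based access to window memory obtained from MPI_ALLOC_MEM.

```fortran
program one_sided_sketch
  use mpi
  implicit none
  integer, parameter :: nloc = 25       ! assumed block size held by each process
  real    :: res_local(nloc), val
  integer :: win, rank, nprocs, ierr, owner, g, real_size, i
  integer(kind=MPI_ADDRESS_KIND) :: win_size, disp

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

  ! Hypothetical data: global element g of the array holds the value real(g).
  do i = 1, nloc
     res_local(i) = real(rank*nloc + i)
  end do

  ! Expose the local block in a window; displacements are counted in REALs.
  call MPI_TYPE_SIZE(MPI_REAL, real_size, ierr)
  win_size = nloc * real_size
  call MPI_WIN_CREATE(res_local, win_size, real_size, MPI_INFO_NULL, &
                      MPI_COMM_WORLD, win, ierr)
  call MPI_BARRIER(MPI_COMM_WORLD, ierr) ! all blocks initialised and exposed

  ! A global index known only at run time (here: third element of the next block).
  g     = mod(rank + 1, nprocs) * nloc + 3
  owner = (g - 1) / nloc                 ! rank owning element g (BLOCK layout)
  disp  = mod(g - 1, nloc)               ! zero-based offset within that block

  ! Passive-target access: the owner makes no matching call.
  call MPI_WIN_LOCK(MPI_LOCK_SHARED, owner, 0, win, ierr)
  call MPI_GET(val, 1, MPI_REAL, owner, disp, 1, MPI_REAL, win, ierr)
  call MPI_WIN_UNLOCK(owner, win, ierr)  ! val is valid only after the unlock

  print *, 'rank', rank, 'read element', g, 'from rank', owner, ':', val

  call MPI_WIN_FREE(win, ierr)
  call MPI_FINALIZE(ierr)
end program one_sided_sketch
```

The owner's only involvement is the collective window creation at start-up. After that, any process can read any element at a time of its own choosing, which is the property the independently executing invocations of FOO require.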