It seems that, in the translation of procedure calls like the one discussed in the last section, some non-trivial data structure must be passed to the translated procedure to describe the layout of the actual argument. This data structure is called a Distributed Array Descriptor or DAD. We wish to understand the requirements on the DAD, and how best to organize it. In view of the times, that organization should probably be informed by object-oriented principles.
One of the most obvious structural features of an HPF array is that it is a multi-dimensional entity. Each dimension can be mapped essentially independently by many different strategies including:
In the translation of Figure 1 there were various formulae and code fragments that were special to the choice of simple block distribution format. These included the formula:
BLK_SIZE = (N + P - 1) / Pfor the size of locally allocated memory in terms of the array extent, , and the extent, , of the processor arrangement; the formula
BLK_START = R * BLK_SIZEfor the least global index associated with the local block in terms of the process coordinate ; the recipe
IF(N - BLK_START >= BLK_SIZE) THEN BLK_COUNT = BLK_SIZE ELSEIF(N - BLK_START > 0) THEN BLK_COUNT = N - BLK_START ELSE BLK_COUNT = 0 ENDIFfor the actual number of elements held locally4, and the formula
I = BLK_START + Lwhich computes the global index in terms of the local loop index .
All of these prescriptions can differ for other mapping strategies. For example, if the array had been distributed in CYCLIC fashion the formula for BLK_START would be simply
BLK_START = Rthe recipe for BLK_COUNT would be just
BLK_COUNT = (N - R + P - 1) / P(ie, ) and the global index would be given by
I = BLK_START + (P * L)These formulae may be checked against the second picture in Figure 2. If we are dealing with, say, block-cyclic distribution, or a dimension with strided alignment, the corresponding prescriptions get considerably more complicated.
The important thing is that we see evidence of a common set of operations that must be performed on different kinds of distributed array dimensions. The actual computations differ, depending on the distribution format and alignment of the individual dimensions. To the object-oriented programmer it looks as if a class hierarchy is emerging, with the common operations implemented as virtual functions. In the organization of the DAD described in this lecture this will be the Range hierarchy--a series of classes describing distributed index ranges. The details will be left to Section 4.
Apart from ranges, there is another important kind of subcomponent in our DAD. We can explain the need for this component by considering the last example in Figure 2, wich involved a rank-1 section of a rank-2 array.
In most respects this section should behave exactly like an ordinary rank-1 array--this is a requirement of the Fortran language, and a very convenient one. It means, for example, that sections can be freely passed to all the transformational intrinsic functions of Fortran. A reasonable expectation is that the DAD for the rank-1 section should contain just one Range object describing its single dimension. In the current example it would be most natural for that Range object to simply be a copy of the corresponding object in the parent DAD.
But the implementation of the section is not exactly like an ordinary one-dimensional array. It retains some vestige of the orthogonal dimension of its parent array. In particular, in the example, the section is only mapped to the top row of the processor arrangement Q. The translation of INIT must somehow take account of the possible existence of hidden ``higher dimensions'' of some parent array. If they exist, the translated code should only execute the assignments on processors in the appropriate row, or column--or more general slice--of the processor arrangement.
In fact a related problem was addressed in our first example, when we insisted on a test
IF (W_RANK < 4) THEN ... ENDIFto ensure that the local processor was a member of the processor arrangement over which elements of the assignment variable were distributed. The solution of the current problem is to generalize the idea of distributing an array over a processor arrangement, and explicitly allow that an array can be distributed over some restricted slice of a processor arrangement. A class will be introduced to represent this more general idea of a process group, and an object from this class will be added to the DAD. Then the translated code just needs to check that the local process is a member of the group, by a suitable method. As we will see in the next section, the required class has a compact and efficient implementation.