next up previous contents
Next: Groups Up: Issues in Translation of Previous: Introduction   Contents

Requirements for an array descriptor

It seems that, in the translation of procedure calls like the one discussed in the last section, some non-trivial data structure must be passed to the translated procedure to describe the layout of the actual argument. This data structure is called a Distributed Array Descriptor or DAD. We wish to understand the requirements on the DAD, and how best to organize it. In view of the times, that organization should probably be informed by object-oriented principles.

One of the most obvious structural features of an HPF array is that it is a multi-dimensional entity. Each dimension can be mapped essentially independently by many different strategies including:

In the translation of Figure 1 there were various formulae and code fragments that were special to the choice of simple block distribution format. These included the formula:

      BLK_SIZE = (N + P - 1) / P
for the size of locally allocated memory in terms of the array extent, $N$, and the extent, $P$, of the processor arrangement; the formula
      BLK_START = R * BLK_SIZE
for the least global index associated with the local block in terms of the process coordinate $R$; the recipe
      IF(N - BLK_START >= BLK_SIZE) THEN
          BLK_COUNT = BLK_SIZE
      ELSEIF(N - BLK_START > 0) THEN
          BLK_COUNT = N - BLK_START
      ELSE
          BLK_COUNT = 0
      ENDIF
for the actual number of elements held locally4, and the formula
      I = BLK_START + L
which computes the global index in terms of the local loop index $L$.

All of these prescriptions can differ for other mapping strategies. For example, if the array had been distributed in CYCLIC fashion the formula for BLK_START would be simply

      BLK_START = R
the recipe for BLK_COUNT would be just
      BLK_COUNT = (N - R + P - 1) / P
(ie, $\lceil (N - R) / P \rceil$) and the global index would be given by
      I = BLK_START + (P * L)
These formulae may be checked against the second picture in Figure 2. If we are dealing with, say, block-cyclic distribution, or a dimension with strided alignment, the corresponding prescriptions get considerably more complicated.

The important thing is that we see evidence of a common set of operations that must be performed on different kinds of distributed array dimensions. The actual computations differ, depending on the distribution format and alignment of the individual dimensions. To the object-oriented programmer it looks as if a class hierarchy is emerging, with the common operations implemented as virtual functions. In the organization of the DAD described in this lecture this will be the Range hierarchy--a series of classes describing distributed index ranges. The details will be left to Section 4.

Apart from ranges, there is another important kind of subcomponent in our DAD. We can explain the need for this component by considering the last example in Figure 2, wich involved a rank-1 section of a rank-2 array.

In most respects this section should behave exactly like an ordinary rank-1 array--this is a requirement of the Fortran language, and a very convenient one. It means, for example, that sections can be freely passed to all the transformational intrinsic functions of Fortran. A reasonable expectation is that the DAD for the rank-1 section should contain just one Range object describing its single dimension. In the current example it would be most natural for that Range object to simply be a copy of the corresponding object in the parent DAD.

But the implementation of the section is not exactly like an ordinary one-dimensional array. It retains some vestige of the orthogonal dimension of its parent array. In particular, in the example, the section is only mapped to the top row of the processor arrangement Q. The translation of INIT must somehow take account of the possible existence of hidden ``higher dimensions'' of some parent array. If they exist, the translated code should only execute the assignments on processors in the appropriate row, or column--or more general slice--of the processor arrangement.

In fact a related problem was addressed in our first example, when we insisted on a test

      IF (W_RANK < 4) THEN
          ...
      ENDIF
to ensure that the local processor was a member of the processor arrangement over which elements of the assignment variable were distributed. The solution of the current problem is to generalize the idea of distributing an array over a processor arrangement, and explicitly allow that an array can be distributed over some restricted slice of a processor arrangement. A class will be introduced to represent this more general idea of a process group, and an object from this class will be added to the DAD. Then the translated code just needs to check that the local process is a member of the group, by a suitable method. As we will see in the next section, the required class has a compact and efficient implementation.


next up previous contents
Next: Groups Up: Issues in Translation of Previous: Introduction   Contents
Bryan Carpenter 2002-07-12