Next: Data alignment Up: High Performance Fortran Previous: The Processor Arrangement   Contents

The Template and distribution

Having seen how to define one or more target processor arrangements, we need to introduce mechanisms for distributing data arrays over those arrangements. HPF allows arrays to be distributed over processors directly (see Section 4.5), but it is often more satisfactory to go through the intermediary of an explicit template. HPF templates are used in ways reminiscent of the implicit VP set of CM Fortran or the shape of C*. In the MIMD world anticipated by HPF, a template is distinct from a processor arrangement. The set of abstract processors in an HPF processor arrangement might not exactly match the set of physical processors, but there is a tacit assumption that abstract processors will be used at a similar level of granularity to the physical processors. Usually it would be inappropriate for the shape parameters of the abstract processor arrangement to correspond to those of the data arrays of the algorithm. Instead the fine-grained grid of the data arrays is captured in the template concept.

Figure 7 represents the HPF data mapping scheme: the rest of this section is concerned with the bottom half of the diagram. Mapping of arrays to templates will be discussed in the next section.

Figure 7: The stages of data mapping in HPF.

A template can be declared in much the same way as a processor arrangement.

!HPF$ TEMPLATE T(50, 50, 50)
declares a 50 by 50 by 50 three-dimensional template called T. Having declared it, we usually want to establish a relation between the template and some processor arrangement: we want to say, in more or less detail, how the elements of the template are distributed amongst the elements of the processor arrangement. This is done using the DISTRIBUTE directive.

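As a first example, suppose we have a one-dimensional processor arrangement P1 and a one-dimensional template T1, for instance:

!HPF$ PROCESSORS P1(4)
!HPF$ TEMPLATE T1(17)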

There are various ways in which T1 may be distributed over P1. The four basic distribution formats are illustrated in Figure 8.
Figure 8: The four basic formats of template distribution
In the figure, each template element inhabits a particular memory area. This should be taken to mean that any data element aligned with a particular template element will be stored in the associated memory area (the template element itself doesn't occupy any memory).

Simple block distribution is specified by

!HPF$ DISTRIBUTE T1(BLOCK) ONTO P1

In this case, each processor gets a contiguous block of template elements. If the number of processors P divides the number of template elements N, every processor gets a block of the same size, N/P. Otherwise the block size is rounded up to ceiling(N/P), so most processors get full blocks and the trailing processor(s) get fewer elements (or none).
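The arithmetic behind simple block distribution can be sketched in Python. The sketch is for illustration only: an HPF compiler does this bookkeeping itself, and the helper names are hypothetical. It assumes the HPF rule that BLOCK uses a block size of ceiling(N/P). Template elements and processors are both numbered from 1, as Fortran does by default.

```python
def block_size(n, p):
    """Block size used for BLOCK over p processors: ceiling(n / p)."""
    return (n + p - 1) // p


def block_owner(i, n, p):
    """Processor (1-based) owning template element i (1-based) under BLOCK."""
    return (i - 1) // block_size(n, p) + 1


def block_counts(n, p):
    """How many of the n template elements land on each of the p processors."""
    counts = [0] * p
    for i in range(1, n + 1):
        counts[block_owner(i, n, p) - 1] += 1
    return counts


if __name__ == "__main__":
    print(block_counts(17, 4))  # [5, 5, 5, 2]: trailing processor gets fewer
    print(block_counts(9, 4))   # [3, 3, 3, 0]: trailing processor gets none
```

Because the block size is rounded up, any shortfall falls on the trailing processors. For 17 elements over 4 processors the last processor receives 2 elements, and for 9 elements it receives none.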

Simple cyclic distribution is specified by

!HPF$ DISTRIBUTE T1(CYCLIC) ONTO P1

The first processor gets the first template element, the second gets the second, and so on. When the set of processors is exhausted, allocation goes back to the first processor and continues with the next template element from there.
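The round-robin rule reduces to a modulus. In the illustrative Python sketch below, the helper names are hypothetical and both indices are assumed to start at 1. The sketch lists which template elements each processor ends up holding.

```python
def cyclic_owner(i, p):
    """Processor (1-based) owning template element i (1-based) under CYCLIC."""
    return (i - 1) % p + 1


def cyclic_elements(n, p):
    """Map each processor to the template elements it holds, in allocation order."""
    held = {q: [] for q in range(1, p + 1)}
    for i in range(1, n + 1):
        held[cyclic_owner(i, p)].append(i)
    return held


if __name__ == "__main__":
    elems = cyclic_elements(17, 4)
    print(elems[1])  # [1, 5, 9, 13, 17]
    print(elems[4])  # [4, 8, 12, 16]
```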

In a variant of the block distribution, the number of template elements allocated to each processor can be specified explicitly, for example by

!HPF$ DISTRIBUTE T1(BLOCK(6)) ONTO P1

If this means that all template elements are allocated before the processors are exhausted, some processors are left empty. It is illegal to choose a block size here that causes template elements to be left over after every processor has had its block allocated. But in an analogous variant of the cyclic distribution ("block-cyclic distribution"), specified for example by

!HPF$ DISTRIBUTE T1(CYCLIC(3)) ONTO P1

the product of the number of processors and the block size can be smaller than the template size, and allocation wraps round after the first assignment of blocks to all processors.
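The two explicit-block-size variants differ only in whether the block index wraps around. The Python sketch below illustrates this with hypothetical helper names and 1-based indices. It uses a 17-element template over 4 processors with block sizes 6 and 3; these are illustrative values only. The legality check encodes the rule above: for BLOCK(k), k times the number of processors must be at least the template size.

```python
def block_k_owner(i, k, n, p):
    """Owner of element i under BLOCK(k): k elements per processor, no wrap-around."""
    if k * p < n:
        # Mirrors the rule in the text: template elements may not be left over.
        raise ValueError(f"BLOCK({k}) over {p} processors cannot hold {n} elements")
    return (i - 1) // k + 1


def cyclic_k_owner(i, k, p):
    """Owner of element i under CYCLIC(k): blocks of k are dealt out round-robin."""
    return ((i - 1) // k) % p + 1


if __name__ == "__main__":
    n, p = 17, 4
    print([block_k_owner(i, 6, n, p) for i in range(1, n + 1)])
    # processors 1, 2, 3 get 6, 6 and 5 elements; processor 4 is left empty
    print([cyclic_k_owner(i, 3, p) for i in range(1, n + 1)])
    # after 12 elements each processor holds one block of 3, so 13-15 wrap to processor 1
```

With k = 1, cyclic_k_owner reduces to the simple cyclic rule.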

That covers the case where both template and processor arrangement are one-dimensional. When the template and the processor arrangement both have the same higher dimension, each dimension can be distributed independently, mixing any of the four distribution formats. The correspondence between template and processor dimensions is the obvious one: the first template dimension is distributed over the first processor dimension, and so on. In

!HPF$ TEMPLATE T2(17, 20)
!HPF$ DISTRIBUTE T2(CYCLIC, BLOCK) ONTO P2
where P2 is a two-dimensional processor arrangement, the first dimension of T2 is distributed cyclically over the first dimension of P2, and the second dimension is distributed blockwise over the second dimension of P2.
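Because each dimension is mapped independently, the owning processor of a template element is just the pair of one-dimensional owners. The illustrative Python sketch below assumes a 4 by 3 arrangement for P2; the text does not fix P2's shape. Helper names are hypothetical, and indices start at 1.

```python
def block_owner(i, n, p):
    """Owner (1-based) of index i along a BLOCK-distributed dimension of extent n."""
    b = (n + p - 1) // p  # block size ceiling(n / p)
    return (i - 1) // b + 1


def cyclic_owner(i, p):
    """Owner (1-based) of index i along a CYCLIC-distributed dimension."""
    return (i - 1) % p + 1


def owner_cyclic_block(i, j, shape, procs):
    """Processor coordinates owning T2(i, j) under (CYCLIC, BLOCK).

    Each template dimension is mapped onto the matching processor
    dimension independently of the other.
    """
    n2 = shape[1]
    p1, p2 = procs
    return (cyclic_owner(i, p1), block_owner(j, n2, p2))


if __name__ == "__main__":
    # Hypothetical 4 x 3 processor grid for the 17 x 20 template T2.
    print(owner_cyclic_block(6, 15, (17, 20), (4, 3)))  # (2, 3)
```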

Finally, some dimensions of a template may have "collapsed distributions", allowing a template to be distributed onto a processor arrangement with fewer dimensions than the template. A collapsed dimension is marked with an asterisk, so

!HPF$ DISTRIBUTE T2(BLOCK, *) ONTO P1

means that the first dimension of T2 will be distributed over P1 as T1 was in the first example above. But for a fixed value of the first subscript of T2, all values of the second subscript are mapped to the same processor.
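With a collapsed dimension, the owner depends only on the distributed subscripts. The illustrative Python sketch below assumes a 17 by 20 T2 and 4 processors in P1; the helper names are hypothetical. It shows that an entire row of T2 lands on a single processor.

```python
def block_owner(i, n, p):
    """Owner (1-based) of index i along a BLOCK-distributed dimension of extent n."""
    b = (n + p - 1) // p  # block size ceiling(n / p)
    return (i - 1) // b + 1


def owner_block_collapsed(i, j, n1, p):
    """Processor owning T2(i, j) under (BLOCK, *).

    The second subscript j is deliberately ignored: that dimension is collapsed.
    """
    return block_owner(i, n1, p)


if __name__ == "__main__":
    # Hypothetical sizes: T2 is 17 x 20 and P1 has 4 processors.
    owners = {owner_block_collapsed(7, j, 17, 4) for j in range(1, 21)}
    print(owners)  # {2}: the whole row i = 7 lives on processor 2
```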

Bryan Carpenter 2002-07-12