Next: Data Alignment Up: High Performance Fortran Previous: High Performance Fortran   Contents


The Processor Arrangement and Templates

The programmer often wishes to distribute program variables explicitly over the memories of a set of processors, so it is desirable for a program to have some representation of that set of processors. This is provided by the PROCESSORS directive. The syntax of a PROCESSORS declaration resembles that of a Fortran array declaration:
 
!HPF$ PROCESSORS P (10)
This is a declaration of a set of 10 abstract processors, named P. Sets of abstract processors, or processor arrangements, can be multi-dimensional. For instance,
!HPF$ PROCESSORS Q (4, 4)
declares 16 abstract processors in a $ 4 \times 4$ array. The programmer may distribute data arrays directly over processors, but it is often more satisfactory to go through the intermediary of an explicit template. Templates are different from processor arrangements. The collection of abstract processors in an HPF processor arrangement may not correspond identically to the collection of physical processors, but we implicitly assume that abstract processors are used at a similar level of granularity to the physical processors. It would be unusual for the shapes of the abstract processor arrangements to correspond to those of the data arrays in the parallel algorithms. With the template concept, on the other hand, we can capture the fine-grained grid of the data array.
!HPF$ TEMPLATE T (50, 50, 50)
declares a $ 50 \times 50 \times 50$ three-dimensional template, called T. Next we need to specify how the elements of the template are distributed among the elements of the processor arrangement. This is the task of the DISTRIBUTE directive. Suppose that we have
!HPF$ PROCESSORS P1 (4)
!HPF$ TEMPLATE T (17)
There are various schemes by which T may be distributed over P1. A distribution directive specifies how template elements are mapped to processors. Block distributions are represented by
!HPF$ DISTRIBUTE T (BLOCK) ONTO P1
!HPF$ DISTRIBUTE T (BLOCK (6)) ONTO P1
In this situation, each processor takes a contiguous block of template elements. All processors take identically sized blocks if the number of processors evenly divides the number of template elements. Otherwise, the template elements are divided as evenly as possible over most of the processors, with the last processor(s) holding fewer. In a modified version of block distribution, we can explicitly specify the number of template elements allocated to each processor. Cyclic distributions are represented by
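The arithmetic behind the block rules can be sketched as follows (an illustrative model only, not HPF itself; the function name `block_owner` is ours):

```python
from math import ceil

def block_owner(i, n_elements, n_procs, block_size=None):
    """Processor owning template element i (0-based) under a BLOCK
    or BLOCK(block_size) distribution."""
    if block_size is None:
        # Plain BLOCK: contiguous blocks of ceil(n_elements / n_procs).
        block_size = ceil(n_elements / n_procs)
    return i // block_size

# T(17) distributed (BLOCK) onto P1(4): blocks of ceil(17/4) = 5,
# so the last processor holds only two elements.
owners = [block_owner(i, 17, 4) for i in range(17)]
print(owners)

# With BLOCK(6), processors 0 and 1 hold six elements each, processor 2
# holds the remaining five, and processor 3 holds none.
owners6 = [block_owner(i, 17, 4, block_size=6) for i in range(17)]
print(owners6)
```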
!HPF$ DISTRIBUTE T (CYCLIC) ONTO P1
!HPF$ DISTRIBUTE T (CYCLIC (6)) ONTO P1
In the basic situation, the first template element is allocated to the first processor, the second template element to the second processor, and so on. When the processors are used up, the next template element is allocated to the first processor again, in wrap-around fashion. In a modified version of cyclic distribution, called block-cyclic distribution, the index range is first divided evenly into contiguous blocks of a specified size, and these blocks are distributed cyclically. In the multidimensional case, each dimension of the template can be distributed independently, mixing any of the four distribution patterns above. In the example:
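The cyclic and block-cyclic rules reduce to simple modular arithmetic, which can be sketched like this (again an illustrative model, not HPF; the function name is ours):

```python
def cyclic_owner(i, n_procs, block_size=1):
    """Processor owning template element i (0-based) under a CYCLIC
    or CYCLIC(block_size) distribution."""
    # Group elements into blocks, then deal the blocks out round-robin.
    # Plain CYCLIC is the special case block_size = 1.
    return (i // block_size) % n_procs

# T(17) distributed (CYCLIC) onto P1(4): elements dealt out round-robin.
print([cyclic_owner(i, 4) for i in range(17)])

# CYCLIC(6): contiguous blocks of six elements dealt out cyclically;
# with only three blocks here, processor 3 receives nothing.
print([cyclic_owner(i, 4, block_size=6) for i in range(17)])
```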
!HPF$ PROCESSORS P2 (4, 3)
!HPF$ TEMPLATE T2 (17, 20)
!HPF$ DISTRIBUTE T2 (CYCLIC, BLOCK) ONTO P2
the first dimension of T2 is cyclically distributed over the first dimension of P2, and the second dimension of T2 is distributed blockwise over the second dimension of P2. Another important feature is that some dimensions of a template may have a collapsed mapping, allowing a template to be distributed onto a processor arrangement with fewer dimensions than the template:
!HPF$ DISTRIBUTE T2 (BLOCK, *) ONTO P1
specifies that the first dimension of T2 will be block-distributed over P1, but, for a fixed value of the first index of T2, all values of the second subscript are mapped to the same processor.
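Because each dimension is mapped independently, a multidimensional distribution is just the per-dimension rules applied coordinate-wise. A sketch for T2 (17, 20) distributed (CYCLIC, BLOCK) onto P2 (4, 3) (illustrative model only; the function name is ours):

```python
from math import ceil

def owner_2d(i, j, shape=(17, 20), procs=(4, 3)):
    """Processor coordinates owning element (i, j) of a 2-D template
    under a (CYCLIC, BLOCK) distribution, each dimension mapped
    independently."""
    p = i % procs[0]                    # CYCLIC in the first dimension
    q = j // ceil(shape[1] / procs[1])  # BLOCK in the second dimension
    return (p, q)

print(owner_2d(5, 0))   # row 5 -> processor row 5 mod 4 = 1
print(owner_2d(0, 13))  # column 13 -> block 13 // 7 = 1
```

A collapsed dimension, as in (BLOCK, *), simply drops the second coordinate: the owner depends on the first index alone.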


Bryan Carpenter 2004-06-09