Figure 2.1: Idealized picture of a distributed memory parallel computer
Figure 2.1 depicts the architecture of many successful parallel
computers available at present: a large number of autonomous
processors, each possessing its own local memory area. A processor can
access values in its own local memory very rapidly, but it must access
values in other processors' memory either through special machine
instructions or through message-passing software, and current
technology cannot make remote memory accesses nearly as fast as local
ones. Who, then, should bear the burden of minimizing the number of
accesses to non-local memory: the programmer or the compiler? This is
the communication problem.
A second problem is ensuring that each processor has a fair share of
the total workload. The advantage of parallelism is lost if one
processor ends up doing almost all the work by itself. This is the
load-balancing problem.
High Performance Fortran (HPF), for example, allows the programmer to
add directives to a program that explicitly specify how program data is
distributed among the memory areas associated with a set of (physical
or virtual) processors. The directives do not allow the programmer to
specify directly which processor will perform a particular computation;
the compiler must decide where computations are done. We will see in
Chapter 3 that HPJava takes a slightly different approach, but it still
requires programmers to distribute data.
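In HPF, a DISTRIBUTE directive with the BLOCK or CYCLIC format effectively fixes a mapping from each array index to the processor that owns that element. The Java sketch below models these two mappings for a one-dimensional array of n elements over p processors, using 0-based indices, and assumes BLOCK uses HPF's default block size of ceil(n/p). The class `Distributions` and its methods are hypothetical illustrations, not part of HPF or the HPJava API.

```java
import java.util.Arrays;

/**
 * Hypothetical illustration (not an HPF or HPJava API): ownership
 * functions for one-dimensional BLOCK and CYCLIC distributions,
 * using 0-based indices.
 */
public class Distributions {

    /** BLOCK: contiguous chunks of ceil(n / p) elements per processor. */
    public static int blockOwner(int i, int n, int p) {
        int blockSize = (n + p - 1) / p;   // ceiling of n / p
        return i / blockSize;
    }

    /** CYCLIC: elements dealt out round-robin, one at a time. */
    public static int cyclicOwner(int i, int p) {
        return i % p;
    }

    /** Number of elements each processor holds under BLOCK. */
    public static int[] blockCounts(int n, int p) {
        int[] counts = new int[p];
        for (int i = 0; i < n; i++) {
            counts[blockOwner(i, n, p)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int n = 10, p = 4;
        int[] block = new int[n];
        int[] cyclic = new int[n];
        for (int i = 0; i < n; i++) {
            block[i] = blockOwner(i, n, p);
            cyclic[i] = cyclicOwner(i, p);
        }
        System.out.println("BLOCK  owners: " + Arrays.toString(block));
        System.out.println("CYCLIC owners: " + Arrays.toString(cyclic));
        System.out.println("BLOCK  counts: " + Arrays.toString(blockCounts(n, p)));
    }
}
```

With n = 10 and p = 4 the block size is 3, so BLOCK gives owners [0, 0, 0, 1, 1, 1, 2, 2, 2, 3] and per-processor counts [3, 3, 3, 1], while CYCLIC gives [0, 1, 2, 3, 0, 1, 2, 3, 0, 1]. Neither mapping says where a computation runs; that decision is left to the compiler.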
Consider the particular situation in which all operands of a specific
sub-computation, such as an assignment, reside on the same processor.
The compiler can then allocate that part of the computation to the
processor holding the operands, and no remote memory access is needed.
The onus on parallel programmers is therefore to distribute data across
the processors as follows:
- to minimize remote memory accesses, as many sub-computations as
possible should have all of their operands allocated to the same
processor; and
- to maximize parallelism, each group of distinct sub-computations
that can execute in parallel at a given time should involve data on as
many different processors as possible.
Figure 2.2: A data distribution leading to excessive communication
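The two goals in the list above can be phrased as a small cost model. In the hypothetical Java sketch below, a sub-computation is described only by the processors that own its operands: `remoteAccesses` counts the operands that must be fetched if it runs on processor q, `minRemoteAccesses` takes the best choice of q, and `processorsInvolved` measures how widely a group of parallel sub-computations spreads its data. The class and method names are assumptions made for illustration, and the model is deliberately simplified rather than an actual compiler's cost function.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical cost model (not taken from HPF or HPJava): each
 * sub-computation is described by the processors owning its operands.
 */
public class PlacementCost {

    /** Operands that must be fetched remotely if the work runs on processor q. */
    public static int remoteAccesses(int[] operandOwners, int q) {
        int remote = 0;
        for (int owner : operandOwners) {
            if (owner != q) {
                remote++;
            }
        }
        return remote;
    }

    /** Fewest remote accesses over all candidate processors 0..p-1. */
    public static int minRemoteAccesses(int[] operandOwners, int p) {
        int best = Integer.MAX_VALUE;
        for (int q = 0; q < p; q++) {
            best = Math.min(best, remoteAccesses(operandOwners, q));
        }
        return best;
    }

    /** Distinct processors holding data for a group of parallel sub-computations. */
    public static int processorsInvolved(int[][] group) {
        Set<Integer> procs = new HashSet<>();
        for (int[] owners : group) {
            for (int owner : owners) {
                procs.add(owner);
            }
        }
        return procs.size();
    }

    public static void main(String[] args) {
        // a = b + c with all three operands on processor 2: no remote access.
        System.out.println(minRemoteAccesses(new int[] {2, 2, 2}, 4));
        // Operands spread over processors 0, 1 and 2: at least two remote accesses.
        System.out.println(minRemoteAccesses(new int[] {0, 1, 2}, 4));
    }
}
```

In these terms, the first goal is to keep `minRemoteAccesses` at zero for as many sub-computations as possible, and the second is to make `processorsInvolved` large for every group that executes in parallel.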
These goals are sometimes contradictory, and highly parallel programs
can be difficult to code and distribute efficiently. Equally often,
however, it is possible to meet both goals successfully.
Suppose, in an idealized situation, that a program contains a set of
sub-computations which can be executed in parallel. They might be the
basic assignments making up an array assignment or a FORALL construct.
Assume that each expression computed combines two operands, so that,
including the variable being assigned, a single sub-computation has
three operands. Moreover, assume that there are as many processors
available as there are sub-computations. In general the number of
processors and the number of sub-computations will differ, but this
simplifies the discussion.
Figure 2.3: A data distribution leading to poor load balancing
Figure 2.4: An ideal data distribution
In Figure 2.2, the operands of each sub-computation are all allocated
in different memory areas. Wherever the computation is executed, each
assignment needs at least two communications.
In Figure 2.3, all operands of all assignments reside on a single
processor, so no communication is necessary if the computation is
executed where the data resides. In that case, though, there is no
effective parallelism. Alternatively, the compiler might decide to
share the tasks out anyway, but then all operands of tasks assigned to
processors other than the first would have to be communicated.
In Figure 2.4, all operands of an individual assignment reside on the
same processor, and the sub-computations are distributed uniformly over
the processors. In this case we can rely on the compiler to allocate
each computation to the processor holding its operands, which requires
no communication and gives a perfectly balanced workload.
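The comparison of Figures 2.2 to 2.4 can be reproduced with a few lines of Java. The sketch below is a hypothetical model, not taken from the original figures: N = 4 assignments a[i] = b[i] + c[i] run on N = 4 processors, the owner layouts are invented to match the descriptions above, and each assignment is assumed to execute on the processor owning a[i] (the owner-computes rule commonly used by HPF compilers). It counts remote operand fetches and the number of assignments on the busiest processor.

```java
import java.util.function.IntBinaryOperator;

/**
 * Hypothetical model of Figures 2.2-2.4: N assignments a[i] = b[i] + c[i]
 * on N processors, each executed on the owner of a[i] (owner-computes).
 */
public class DistributionCompare {

    public static final int N = 4;  // assignments, and also processors

    /** owners[i][k] = processor holding operand k (0 = a[i], 1 = b[i], 2 = c[i]). */
    public static int[][] layout(IntBinaryOperator ownerOf) {
        int[][] owners = new int[N][3];
        for (int i = 0; i < N; i++) {
            for (int k = 0; k < 3; k++) {
                owners[i][k] = ownerOf.applyAsInt(i, k);
            }
        }
        return owners;
    }

    /** Operands fetched remotely when each assignment runs on the owner of a[i]. */
    public static int communications(int[][] owners) {
        int total = 0;
        for (int[] o : owners) {
            for (int k = 1; k < o.length; k++) {
                if (o[k] != o[0]) {
                    total++;
                }
            }
        }
        return total;
    }

    /** Number of assignments executed by the busiest processor. */
    public static int maxLoad(int[][] owners) {
        int[] load = new int[N];
        for (int[] o : owners) {
            load[o[0]]++;
        }
        int max = 0;
        for (int l : load) {
            max = Math.max(max, l);
        }
        return max;
    }

    static void report(String name, int[][] owners) {
        System.out.printf("%-9s communications=%d maxLoad=%d%n",
                name, communications(owners), maxLoad(owners));
    }

    public static void main(String[] args) {
        // Like Figure 2.2: the three operands of each assignment on three different processors.
        report("scattered", layout((i, k) -> (i + k) % N));
        // Like Figure 2.3: every operand on processor 0.
        report("lumped", layout((i, k) -> 0));
        // Like Figure 2.4: all operands of assignment i on processor i.
        report("ideal", layout((i, k) -> i));
    }
}
```

The scattered layout needs 8 communications (two remote operands per assignment) with a maximum load of 1; the lumped layout needs none but puts all 4 assignments on processor 0; the ideal layout needs none and has a maximum load of 1. Because the sketch always executes on the owner of a[i], sharing out the lumped work instead would, as noted above, require communicating its operands.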
Except in the simplest programs, it is impossible to choose a
distribution of program variables over processors like that of Figure
2.4 for the whole program. The most important task in distributed
memory programming is to distribute the data used by the most critical
regions of the program so that those regions resemble Figure 2.4 as
closely as possible.
Bryan Carpenter
2004-06-09