
Multi-processing and Distributed Data

Figure 2.1: Idealized picture of a distributed memory parallel computer
Figure 2.1 represents many successful parallel computers available at the present time: a large number of autonomous processors, each possessing its own local memory area. A processor can access values in its own local memory very rapidly, but must access values in other processors' memories through special machine instructions or message-passing software, and with current technology these remote accesses are much slower than local ones. Minimizing the number of accesses to non-local memory is therefore essential, and the question of who should bear this burden, the programmer or the compiler, is the communication problem. A second problem is ensuring that each processor receives a fair share of the total work load: the advantage of parallelism is lost entirely if one processor, by itself, performs almost all the work. This is the load-balancing problem.

High Performance Fortran (HPF), for example, allows the programmer to add various directives to a program in order to explicitly specify the distribution of program data among the memory areas associated with a set of (physical or virtual) processors. The directives do not allow the programmer to specify directly which processor will perform a specific computation; the compiler must decide where to do computations. We will see in chapter 3 that HPJava takes a slightly different approach, but still requires programmers to distribute data. Consider the situation where all operands of a particular sub-computation, such as an assignment, reside on the same processor. The compiler can then allocate that part of the computation to the processor holding the operands, and no remote memory access is needed. The onus on parallel programmers is therefore to distribute data across the processors in two ways: so that the operands of each individual sub-computation reside together on a single processor, minimizing communication, and so that the sub-computations themselves are spread evenly across the processors, balancing the load.
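As an illustrative sketch (the array name and processor arrangement here are assumptions, not examples from the text), standard HPF expresses such a data distribution with PROCESSORS and DISTRIBUTE directives:

```fortran
!HPF$ PROCESSORS P(4)            ! a one-dimensional arrangement of 4 processors
      REAL A(1000)
!HPF$ DISTRIBUTE A(BLOCK) ONTO P ! split A into 4 contiguous blocks of 250 elements
```

Because the directives are structured comments, a compiler without HPF support simply ignores them; the HPF compiler, not the programmer, then decides on which processor each computation over A is performed.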
Figure 2.2: A data distribution leading to excessive communication
Sometimes these are contradictory goals, and highly parallel programs can be difficult to code and distribute efficiently; but equally often it is possible to meet both goals successfully. Suppose we have an idealized situation in which a program contains $p$ sub-computations that can be executed in parallel. They might, for example, be the $p$ elemental assignments making up an array assignment or FORALL construct. Assume that each expression computed combines two operands, so that, including the variable being assigned, a single sub-computation involves three operands. Assume further that there are $p$ processors available. In general the number of processors and the number of sub-computations will differ, but this simplified situation is enough for illustration.
Figure 2.3: A data distribution leading to poor load balancing
Figure 2.4: An ideal data distribution
Figure 2.2 depicts a situation where the three operands of each sub-computation are allocated in different memory areas. Wherever a computation is executed, each assignment then needs at least two communications. Figure 2.3 depicts a situation where no communication is necessary, since all operands of all assignments reside on a single processor. Here the computation might be executed where the data resides, but then no effective parallelism occurs; alternatively, the compiler might decide to share the tasks out anyway, but then all operands of the tasks on processors other than the first would have to be communicated. Figure 2.4 depicts the ideal: all operands of an individual assignment occur on the same processor, and the sub-computations are uniformly distributed over the processors. In this case we can rely on the compiler to allocate each computation to the processor holding its operands, so no communication is required and the work load is perfectly distributed. Except in the simplest programs it is impossible to choose a distribution of program variables that makes every part of the program look like Figure 2.4. The essence of distributed-memory programming is to distribute data so that the most performance-critical regions of the program come as close to Figure 2.4 as possible.
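In HPF terms, the ideal of Figure 2.4 is approached by aligning the arrays that appear together in a sub-computation. The following sketch (with illustrative array names, not taken from the text) distributes three arrays identically, so that every elemental assignment of the array statement is purely local:

```fortran
!HPF$ PROCESSORS P(4)
      REAL A(1000), B(1000), C(1000)
!HPF$ DISTRIBUTE A(BLOCK) ONTO P
!HPF$ ALIGN B(I) WITH A(I)   ! B(i) is stored wherever A(i) is stored
!HPF$ ALIGN C(I) WITH A(I)   ! likewise C(i)
      A = B + C              ! all three operands of each elemental assignment
                             ! are collocated: no communication is needed, and
                             ! the work is spread evenly over the 4 processors
```

The ALIGN directives make the collocation of operands explicit, leaving the compiler free to assign each elemental assignment to the owning processor.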
Bryan Carpenter 2004-06-09