One disadvantage of the program in Figure 3.18 of the last section is that it allocates two very large temporary arrays, ta and tb. Because these are both replicated in one dimension, they can easily consume more memory than the original arguments. This problem can be solved by storing copies of elements only in restricted bands of the original matrices at any one time. Figure 4.5 gives a modified algorithm where the maximum band width is B.
For simplicity we assumed here that
B is a compile-time constant. Alternatively we can compute
this value dynamically.
The volume() method on Range is used internally by array
constructors to control allocation of memory for array elements. It defines
the largest block of locations of the current range held by any processor.
Hence an upper bound on the number of elements held by any processor
for ta and tb combined is