One disadvantage of the program in Figure 3.15
is that it allocates two very large temporary arrays, `ta` and `tb`. Because these are both replicated in one dimension, they can
easily consume more memory than the original data. This problem can be
solved by only storing copies of elements in restricted bands of the
original matrices at any one time. Figure 4.5
gives a modified algorithm where the maximum band width is `B`.

For simplicity we assumed here that
`B` is a compile-time constant. Alternatively we can compute
this value dynamically.
The `volume()` method on `Range` is used internally by array
constructors to control allocation of memory for array elements. It defines
the largest block of locations for the current range held by any processor.
Hence an upper bound on the number of elements held by any processor
for `ta` and `tb` combined is

B * x.volume() + B * y.volume()If

B = MAX_TEMPORARY_SIZE / (x.volume() + y.volume())With a few refinements like this, the algorithm of Figure 4.5 is a credible basis for a library matrix multiplication routine, applicable to generic distributed arrays.