Recall that if and are square matrix of order , then is also a square matrix of order , and is obtained by taking the dot product of the th row of and th column of . That is,

Fox's algorithm [3] for multiplication organize , and
into submatrix on a P by P process array. It take steps, in
each step, broadcast corresponding submatrix of on each row of the
processes, do local computation and then shift array for the next
step computation. Write this algorithm in HPJava, we still use
`Adlib.remap` to broadcast submatrix, `matmul` is a
subroutine for local matrix multiplication. `Adlib.shift` is
used to shift array , and `Adlib.copy` copies data back
after shift, it can also be implemented as nested `over` and
`for` loops.

Group p=new Procs2(P,P); Range x=p.dim(0); Range y=p.dim(1); on(p) { float [[#,#, , ]] a = new float [[x,y,B,B]]; float [[#,#, , ]] b = new float [[x,y,B,B]]; //input a, b here; float [[#,#, , ]] c = new float [[x,y,B,B]]; float [[#,#, , ]] temp = new float [[x,y,B,B]]; for (int k=0; k<p; k++) { over(Location i=x|:) { float [[,]] sub = new float [[B,B]]; //Broadcast submatrix in 'a' ... Adlib.remap(sub, a[[i, (i+k)%P, z, z]]); over(Location j=y|:) { //Local matrix multiplication ... matmul(c[[i, j, z, z]], sub, b[[i, j, z, z]]); } } //Cyclic shift 'b' in 'y' dimension ... Adlib.shift(tmp, b, 1, 0, 0); // dst, src, shift, dim, mode; Adlib.copy(b, tmp); } }

Bryan Carpenter 2002-07-12