Recall that if
and
are square matrix of order
, then
is also a square matrix of order
, and
is obtained
by taking the dot product of the
th row of
and
th column of
. That is,
Fox's algorithm [3] for multiplication organize
,
and
into submatrix on a P by P process array. It take
steps, in
each step, broadcast corresponding submatrix of
on each row of the
processes, do local computation and then shift array
for the next
step computation. Write this algorithm in HPJava, we still use
Adlib.remap to broadcast submatrix, matmul is a
subroutine for local matrix multiplication. Adlib.shift is
used to shift array
, and Adlib.copy copies data back
after shift, it can also be implemented as nested over and
for loops.
Group p=new Procs2(P,P);
Range x=p.dim(0);
Range y=p.dim(1);
on(p) {
float [[#,#, , ]] a = new float [[x,y,B,B]];
float [[#,#, , ]] b = new float [[x,y,B,B]];
//input a, b here;
float [[#,#, , ]] c = new float [[x,y,B,B]];
float [[#,#, , ]] temp = new float [[x,y,B,B]];
for (int k=0; k<p; k++) {
over(Location i=x|:) {
float [[,]] sub = new float [[B,B]];
//Broadcast submatrix in 'a' ...
Adlib.remap(sub, a[[i, (i+k)%P, z, z]]);
over(Location j=y|:) {
//Local matrix multiplication ...
matmul(c[[i, j, z, z]], sub, b[[i, j, z, z]]);
}
}
//Cyclic shift 'b' in 'y' dimension ...
Adlib.shift(tmp, b, 1, 0, 0); // dst, src, shift, dim, mode;
Adlib.copy(b, tmp);
}
}