Next: Summary Up: Programming examples in HPJava Previous: Choleski decomposition

## Fox's algorithm for matrix multiplication

Recall that if and are square matrix of order , then is also a square matrix of order , and is obtained by taking the dot product of the th row of and th column of . That is,

Fox's algorithm [3] for multiplication organize , and into submatrix on a P by P process array. It take steps, in each step, broadcast corresponding submatrix of on each row of the processes, do local computation and then shift array for the next step computation. Write this algorithm in HPJava, we still use Adlib.remap to broadcast submatrix, matmul is a subroutine for local matrix multiplication. Adlib.shift is used to shift array , and Adlib.copy copies data back after shift, it can also be implemented as nested over and for loops.

Group p=new Procs2(P,P);

Range x=p.dim(0);
Range y=p.dim(1);

on(p) {
float [[#,#, , ]] a = new float [[x,y,B,B]];
float [[#,#, , ]] b = new float [[x,y,B,B]];
//input a, b here;

float [[#,#, , ]] c = new float [[x,y,B,B]];
float [[#,#, , ]] temp = new float [[x,y,B,B]];

for (int k=0; k<p; k++) {
over(Location i=x|:) {
float [[,]] sub = new float [[B,B]];

over(Location j=y|:) {
//Local matrix multiplication ...
matmul(c[[i, j, z, z]], sub, b[[i, j, z, z]]);
}
}

//Cyclic shift 'b' in 'y' dimension ...
Adlib.shift(tmp, b, 1, 0, 0); // dst, src, shift, dim, mode;