Libraries are at the heart of the HPspmd model. From one point of view, the language extensions are simply a framework for invoking libraries that operate on distributed arrays. The base language model was originally motivated by work on HPF runtime libraries carried out in the Parallel Compiler Runtime Consortium (PCRC) project led by Syracuse (and earlier related work by one of us).
Hence an essential component of the proposed work is to define a series of bindings from our languages to established SPMD libraries and environments. Because our language model is explicitly SPMD, such bindings are a more straightforward proposition than in HPF, where one typically has to pass some extrinsic interface barrier before invoking SPMD-style functions.
Various issues arise in interfacing to multiple libraries. For example, low-level communication or scheduling mechanisms used by the different libraries may be incompatible. As a practical matter these incompatibilities must be resolved, but the main thrust of the proposed research is at the level of designing compatible interfaces, rather than solving interference problems in specific implementations.
We will group the existing SPMD libraries for data parallel programming into three classes, loosely based on the complexity of design issues involved in integrating them into our language framework.
In the first class we have libraries like ScaLAPACK and PETSc, where the primary focus is similar to that of conventional numerical libraries--providing implementations of standard matrix algorithms, say, but operating on elements in regularly distributed arrays. We believe that designing HPspmd interfaces to this kind of package will be relatively straightforward.
ScaLAPACK, for example, provides linear algebra routines for distributed-memory computers. These routines operate on distributed arrays--in particular, distributed matrices. The distribution formats supported are restricted to two-dimensional block-cyclic distribution for dense matrices and one-dimensional block distribution for narrow-band matrices. Since both of these distribution formats are supported by HPspmd (it supports all HPF-compatible distribution formats), using ScaLAPACK routines from the HPspmd framework should present no fundamental difficulties. Problems can only arise if the caller attempts to pass in a matrix with a distribution format unsupported by the ScaLAPACK routines. The interface code between HPspmd and ScaLAPACK (which converts between array descriptors) must either flag a runtime error in this case, or remap the argument array (using, for example, the remap primitive of Adlib).
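For illustration only, the layout check just described might be sketched in Java as follows. The names here (DistFormat, DenseMatrix, toScaLAPACKLayout) are our own inventions for this sketch and are not part of the ScaLAPACK or Adlib APIs; the remap step is simulated by constructing a fresh matrix object in place of an actual Adlib remap call.

```java
// Hypothetical sketch of the HPspmd/ScaLAPACK interface-code decision:
// accept a matrix whose layout ScaLAPACK supports, otherwise remap it.
enum DistFormat { BLOCK, CYCLIC, BLOCK_CYCLIC, GEN_BLOCK }

class DenseMatrix {
    final DistFormat rowDist, colDist;   // distribution format per dimension
    DenseMatrix(DistFormat rowDist, DistFormat colDist) {
        this.rowDist = rowDist;
        this.colDist = colDist;
    }
}

class ScaLAPACKBinding {
    /** Pass the matrix through unchanged if ScaLAPACK accepts its layout
     *  (2-D block-cyclic for dense matrices); otherwise return a copy
     *  remapped to block-cyclic, standing in for Adlib's remap primitive.
     *  (The other option discussed in the text is to raise a runtime error.) */
    static DenseMatrix toScaLAPACKLayout(DenseMatrix a) {
        if (a.rowDist == DistFormat.BLOCK_CYCLIC
                && a.colDist == DistFormat.BLOCK_CYCLIC)
            return a;
        return new DenseMatrix(DistFormat.BLOCK_CYCLIC, DistFormat.BLOCK_CYCLIC);
    }
}
```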
In the second class we place libraries conceived primarily as underlying support for general parallel programs with regular distributed arrays. They emphasize high-level communication primitives for particular styles of programming, rather than specific numerical algorithms. These libraries include runtime libraries for HPF-like languages, such as Adlib and Multiblock PARTI, and the Global Array toolkit.
Adlib is a runtime library that was initially designed to support HPF translation. It provides communication primitives similar to those of Multiblock PARTI, plus all the Fortran 90 transformational intrinsics for arithmetic on distributed arrays. It also provides some gather/scatter operations for irregular access.
The array descriptor of Adlib supports the full HPF 1.0 distributed array model--including all standard distribution formats, all alignment options including replicated alignment, and a facility to map an array to an arbitrary subgroup of the set of active processors. The runtime array descriptor of the HPspmd languages will be an enhanced version of the Adlib descriptor (with a few extra features, such as support for the GEN_BLOCK distribution format of HPF 2.0). The Adlib collective communication library will provide initial library support for regular applications in HPspmd.
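As a small illustration of what GEN_BLOCK support in a descriptor involves, the following Java fragment--our own sketch, not Adlib code--computes the owning process of a global index from an array of per-process block sizes, which is essentially the mapping question a GEN_BLOCK descriptor must answer:

```java
// Sketch of GEN_BLOCK (irregular block) index-to-process mapping.
// blockSizes[p] gives the number of consecutive global elements
// assigned to process p; blocks need not be equal, unlike BLOCK.
class GenBlock {
    final int[] lo;   // first global index owned by each process

    GenBlock(int[] blockSizes) {
        lo = new int[blockSizes.length];
        int start = 0;
        for (int p = 0; p < blockSizes.length; p++) {
            lo[p] = start;
            start += blockSizes[p];
        }
    }

    /** Return the process owning the given global index
        (linear scan; a real descriptor might use binary search). */
    int owner(int globalIndex) {
        int p = 0;
        while (p + 1 < lo.length && globalIndex >= lo[p + 1])
            p++;
        return p;
    }
}
```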
The Global Array (GA) toolkit, developed at Pacific Northwest National Lab, provides an efficient and portable ``shared-memory'' programming interface for distributed-memory computers. Each process in a MIMD parallel program can asynchronously access logical blocks of distributed arrays, without need for explicit cooperation by other processes (``one-sided communication''). This model has been popular and successful. GA is a foundation of the NWChem [3,27] computational chemistry package.
The existing interface to Global Arrays only supports two-dimensional arrays with general block distribution format. Distributed arrays are created by calls to Fortran functions which return integer handles to an array descriptor. The authors of the package are currently investigating generalization to support multi-dimensional arrays, with more general distribution formats. They have already expressed interest in making their library accessible through the kind of language extensions for distributed arrays described in this proposal.
Besides providing a much more tractable interface for creation of multidimensional distributed arrays, our syntax extensions will provide a more convenient interface to primitives such as ga_get, which copies a patch of a global array to a local array. Advantages over the existing API include the fact that the interface can be made uniform across all array ranks, and various sorts of checking can be subsumed by the general mechanisms for array section creation, leading to improved safety and compile-time analysis.
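The rank-uniformity claim can be illustrated with a toy sketch. The class below is a single-process stand-in of our own devising (there is no communication, and none of these names come from the GA toolkit); the point is that one get method, driven by lo/hi bound arrays, serves every rank, with all section bounds checks concentrated in one place--exactly the checking that array section creation in HPspmd would subsume.

```java
// Single-process stand-in for a multidimensional "global" array,
// illustrating a rank-independent, bounds-checked patch copy in the
// spirit of ga_get (which in the real GA API is 2-D and per-rank).
class GlobalArray {
    final double[] data;
    final int[] shape, stride;

    GlobalArray(int... shape) {
        this.shape = shape;
        stride = new int[shape.length];
        int s = 1;
        for (int d = shape.length - 1; d >= 0; d--) {
            stride[d] = s;
            s *= shape[d];
        }
        data = new double[s];
    }

    /** Copy the section lo[d]..hi[d] (inclusive) into a flat buffer.
        One uniform check covers every rank. */
    double[] get(int[] lo, int[] hi) {
        int n = 1;
        for (int d = 0; d < shape.length; d++) {
            if (lo[d] < 0 || hi[d] >= shape[d] || lo[d] > hi[d])
                throw new IndexOutOfBoundsException("bad section, dimension " + d);
            n *= hi[d] - lo[d] + 1;
        }
        double[] buf = new double[n];
        int[] idx = lo.clone();
        for (int k = 0; k < n; k++) {
            int off = 0;
            for (int d = 0; d < shape.length; d++)
                off += idx[d] * stride[d];
            buf[k] = data[off];
            for (int d = shape.length - 1; d >= 0; d--) {  // odometer step
                if (++idx[d] <= hi[d]) break;
                idx[d] = lo[d];
            }
        }
        return buf;
    }
}
```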
Regular problems (such as the linear algebra examples of section 3.1) are an important subset of parallel applications, but of course they are far from the whole story. Many important problems involve data structures too irregular to express purely through HPF-style distributed arrays.
Our third class of libraries therefore includes libraries designed to support irregular problems. These include CHAOS [35,16] and DAGH.
We anticipate that irregular problems will still benefit from regular data-parallel language extensions (because, at some level, they usually resort to representations involving regular arrays). But lower-level SPMD programming, facilitated by specialized class libraries, is likely to take a more dominant role when dealing with irregular problems.
The CHAOS/PARTI runtime support library provides primitives for efficiently handling irregular problems on distributed-memory computers. The complete library includes partitioners to choose optimized mappings of arrays to processors, functions to remap input arrays to meet the optimized partitioning, and functions that optimize interprocessor communication. After data is repartitioned (if necessary), CHAOS programs involve two characteristic phases. The inspector phase analyzes the data access patterns in the main loop and generates a schedule of optimized communication calls. The executor phase involves executing a loop essentially similar to the loop of the original sequential program.
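A toy single-process illustration of the inspector/executor pattern follows. This is our own sketch, not CHAOS code: the "schedule" is just the list of distinct off-process indices the loop will read, and the communication that would fill the gathered buffer is omitted. It shows the division of labor--analysis of the indirection array up front, then a loop that looks like the original sequential one.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;

// Toy inspector/executor in the CHAOS style, for a loop of the form
// y[j] = x[ind[j]], where x is distributed and ind is an indirection array.
class InspectorExecutor {
    /** Inspector: scan the indirection array and record the distinct
        indices outside the local range [lo, hi] -- the communication
        schedule (in real CHAOS, an optimized schedule object). */
    static int[] inspect(int[] ind, int lo, int hi) {
        LinkedHashSet<Integer> remote = new LinkedHashSet<>();
        for (int g : ind)
            if (g < lo || g > hi) remote.add(g);
        int[] sched = new int[remote.size()];
        int k = 0;
        for (int g : remote) sched[k++] = g;
        return sched;
    }

    /** Executor: run the original loop, reading owned elements from
        xLocal and off-process elements from the gathered buffer
        (gathered[k] holds the value of global index sched[k]). */
    static double[] execute(int[] ind, double[] xLocal, int lo,
                            int[] sched, double[] gathered) {
        Map<Integer, Double> remote = new HashMap<>();
        for (int k = 0; k < sched.length; k++)
            remote.put(sched[k], gathered[k]);
        double[] y = new double[ind.length];
        for (int j = 0; j < ind.length; j++) {
            int g = ind[j];
            y[j] = (g >= lo && g < lo + xLocal.length)
                    ? xLocal[g - lo]        // locally owned element
                    : remote.get(g);        // element fetched by the schedule
        }
        return y;
    }
}
```

Note that the schedule depends only on the indirection array, so it can be built once and reused across many executor sweeps--the source of CHAOS's efficiency on iterative codes.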
How best to capture this complexity in a convenient HPspmd interface will be a subject of research in the proposed work. A baseline approach (in HPJava, for example) is to handle the translation tables, schedules, etc., of CHAOS as ordinary Java objects, constructed and accessed through explicit library calls. Presumably the initial values for the data and indirection arrays will be provided as normal HPspmd distributed arrays. The simplest assumption is that the CHAOS preprocessing phases yield new arrays: the indirection arrays may well be left as HPspmd distributed arrays, but the data arrays may be reduced to ordinary Java arrays holding local elements (in low-level SPMD style). Then, with no extensions to the currently proposed HPJava language, the parallel loops of the executor phase can be expressed using overall constructs. More advanced schemes may incorporate irregular maps into a generalized array descriptor [20,17,13]. Extensions to the HPspmd language model may be indicated.
DAGH (Distributed Adaptive Grid Hierarchy) was developed at the University of Texas at Austin as a computational toolkit for several projects, including the Binary Black Hole NSF Grand Challenge Project. It provides a framework for solving systems of partial differential equations using adaptive mesh refinement methods. The computations can be executed sequentially or in parallel according to the specification of the user. In the parallel case DAGH takes over communication, updating ghost regions on the boundaries of component grids.
Conceivably the HPspmd distributed array descriptor could be generalized to directly represent a DAGH grid hierarchy. This is probably unrealistic. DAGH implements a non-trivial storage scheme for its grid hierarchy, based on space-filling curves. It seems unlikely that the details of such a structure can be sensibly handled by a compiler. A more straightforward possibility is to represent the individual grid functions (on the component regular meshes of the hierarchy) as essentially standard HPspmd distributed arrays. Since DAGH is supposed to maintain storage for these functions in Fortran-compatible fashion, it should be practical to create an HPspmd array descriptor for them. The hierarchy itself would be represented as a Java object from a library-defined class. This is a crude outline of a particular scenario. Devising practical and convenient HPspmd bindings for DAGH and similar application-oriented libraries is a research topic in the proposed work.