It often happens that some parts of a large parallel program cannot be written efficiently in the pure data parallel style, in which overall constructs process all elements of distributed arrays on essentially the same footing. Sometimes, for efficiency, a process has to be more ``introspective'': it has to run a procedure that combines its locally held array elements in a non-trivial way. The local results may then be combined with off-processor results in a separate step.
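
As a rough illustration of this two-step pattern, the following sketch simulates it in plain Python rather than with real distributed arrays. Everything in it is an assumption made for the example: the helper names `block_distribute`, `local_step` and `combine_step` are hypothetical, the "processes" are just sublists held in one program, and sorting stands in for whatever non-trivial local procedure an application might need.

```python
import heapq


def block_distribute(data, nprocs):
    """Split data into nprocs contiguous blocks (a simulated block distribution)."""
    base, extra = divmod(len(data), nprocs)
    blocks, start = [], 0
    for p in range(nprocs):
        # The first `extra` blocks receive one additional element.
        size = base + (1 if p < extra else 0)
        blocks.append(data[start:start + size])
        start += size
    return blocks


def local_step(block):
    """Local work only: a process sorts the elements it holds, with no communication."""
    return sorted(block)


def combine_step(local_results):
    """Separate combining step: merge the sorted runs produced by each process."""
    return list(heapq.merge(*local_results))


data = [7, 2, 9, 4, 1, 8, 3]
blocks = block_distribute(data, 3)           # [[7, 2, 9], [4, 1], [8, 3]]
local_results = [local_step(b) for b in blocks]
result = combine_step(local_results)
print(result)                                # [1, 2, 3, 4, 7, 8, 9]
```

In a real parallel program the local step would run concurrently on each process, and the combining step would involve communication between processes; here both are serialized only to keep the sketch self-contained.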