The second implementation uses only LAPI_Amsend for communication, sending the actual message data together with the message information. The advantage of this implementation is that it eliminates the extra communication needed to fetch the message, but it increases the number of message copies.
There are some common design issues in both implementations. Both implementations store and manage message buffers in C, as in the mpiJava-based implementation. Both use Java thread synchronization to implement waiting in MPI. Both also use two static objects, a ``send queue'' and a ``receive queue'', to hold early-arrived send and receive requests.
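The Java thread synchronization mentioned above might be sketched as follows. This is a minimal illustration, not the actual implementation; the class and method names are assumptions chosen to mirror the completion methods discussed later:

```java
// Hypothetical sketch of Java-side waiting on a request; names are
// illustrative only, not the actual classes of either implementation.
class Request {
    private boolean done = false;

    // Called by the user thread; blocks until the transfer completes.
    synchronized void iwait() throws InterruptedException {
        while (!done) {
            wait();  // release the monitor until complete() notifies us
        }
    }

    // Called from the completion path (conceptually via JNI) to wake waiters.
    synchronized void complete() {
        done = true;
        notifyAll();
    }
}
```

The loop around wait() guards against spurious wakeups; complete() would be invoked from the native completion path when the transfer is known to have finished.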
Figure 5.15 illustrates the LAPI implementation that combines LAPI_Amsend and LAPI_Get communication calls. When the source process receives a send request, it issues an active message to the target process. This active message carries the message information: the length, the source and destination ids, the tag, and the address of the actual message data. The target process uses this information to find a matching receive. The actual buffer data remain with the sender until the target process gets the message. This means the sender thread must block until the data transfer completes when a completion method (iwait() or iwaitany()) is called.
After the initial active message arrives at the target process, the completion handler is called. In this handler, the active message information is extracted and passed to the JVM by calling a Java static method through JNI. This static method searches the posted receive queue to see whether a receive matching the message description has already been posted. If a matching receive is found, the target issues a GET operation to fetch the actual message data from the source. To complete the transaction, the target must notify the source so that any waiting thread there can be woken; this is done with a second active message to the source. The target also wakes any user thread waiting on this receive by issuing a local notify signal. If no matching receive is found, the handler stores the message information in the send queue for later use.
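The matching step in this handler might be sketched as follows. This is a simplified illustration with hypothetical names; the real code would also issue the GET, send the confirming active message, and notify waiting threads, which are only indicated in comments here:

```java
import java.util.Iterator;
import java.util.LinkedList;

// Simplified sketch of the target-side matching logic; hypothetical names.
class MatchQueues {
    // Minimal message descriptor: who sent it and with which tag.
    static class Info {
        final int source, tag;
        Info(int source, int tag) { this.source = source; this.tag = tag; }
    }

    static final LinkedList<Info> recvQueue = new LinkedList<>(); // posted receives
    static final LinkedList<Info> sendQueue = new LinkedList<>(); // early-arrived sends

    // Called (conceptually from JNI) when a send's active message arrives.
    // Returns true if a posted receive matched; otherwise queues the send.
    static synchronized boolean onSendArrival(int source, int tag) {
        for (Iterator<Info> it = recvQueue.iterator(); it.hasNext(); ) {
            Info r = it.next();
            if (r.source == source && r.tag == tag) {
                it.remove();
                // ... real code: issue the GET for the payload, send the
                // confirming active message, and notify the waiting thread
                return true;
            }
        }
        sendQueue.add(new Info(source, tag));  // keep for a later receive
        return false;
    }
}
```

A receive request would run the mirror image of this logic, searching sendQueue and falling back to recvQueue.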
A receive request on the target process behaves like the target side of a send active message call, except that it searches the send queue instead of the receive queue, and it stores the request in the receive queue when no matching send is found.
Figure 5.16 illustrates the LAPI implementation that uses a single LAPI_Amsend communication call. The architecture of this implementation is simpler than the previous one. Because the message data are sent out when the active message call is made, the source process does not have to wait for the communication to complete. This choice eliminates callbacks to the source from the target, and the target no longer has to perform a GET operation. However, this implementation must perform extra message copies that do not exist in the previous implementation: as Figure 5.16 shows, each transaction copies the whole message into different storage.
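The extra copying can be illustrated with a hypothetical sketch: because the payload travels with the active message, a target with no posted receive must copy the data out of the transient handler buffer into its own storage, and copy it again when the receive is finally posted (names below are illustrative, not the actual implementation):

```java
// Hypothetical sketch of the extra copies in the single-LAPI_Amsend design.
class EarlyArrival {
    final byte[] buffered;

    // On arrival with no matching receive, the payload must be copied out of
    // the transient handler buffer into storage owned by the send queue.
    EarlyArrival(byte[] handlerBuffer, int length) {
        buffered = new byte[length];
        System.arraycopy(handlerBuffer, 0, buffered, 0, length);  // first copy
    }

    // When the matching receive is posted, copy again into the user's buffer.
    void deliver(byte[] userBuffer) {
        System.arraycopy(buffered, 0, userBuffer, 0, buffered.length);
    }
}
```

In the two-call design these copies are unnecessary, since the data stay in the sender's buffer until the target fetches them directly with a GET.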
Even though the simpler implementation performs extra message copies, it is faster for our problem sizes (up to a float array of size 1024 x 1024). We will see in Section 6.4 that our LAPI implementation was not faster than the SP MPI implementation. We believe this is due to its reliance on Java-side thread synchronization, which appears to be slow. This problem could probably be overcome by doing the thread synchronization on the C side using POSIX mechanisms, but we did not have time to test this.