Other failures: Jini leasing

The distributed event mechanism can rapidly clean up processes when some slaves disappear unexpectedly, but in general it cannot reclaim resources if the client process is killed during execution of an MPJ job, if the daemon process is killed while it has active slaves, or if a network failure occurs that does not directly affect the client. There is a danger that orphaned slave processes will be left running in the network.

The solution is to use the Jini leasing paradigm. The client leases the services of each daemon for some interval and keeps renewing its leases until all slaves terminate, at which point it cancels them. If the client process is killed (or its connection to the slave machine fails), its leases will expire. When a client's lease expires, the daemon applies the destroy method to the corresponding slave Process object.
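
As a rough illustration of the daemon side of this protocol, the following is a minimal Java sketch. It does not use the Jini API: the class SlaveLeaseTable, its methods and the string lease IDs are hypothetical, times are passed in explicitly as milliseconds so the logic is easy to test, and a slave is abstracted as anything with a destroy() method. A real daemon would hand out Jini net.jini.core.lease.Lease objects and call destroy() on the actual java.lang.Process.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/**
 * Hypothetical daemon-side lease table (a sketch, not the Jini API).
 * Each slave is tracked together with the expiry time of the client's lease;
 * a periodic sweep destroys slaves whose leases have lapsed.
 */
class SlaveLeaseTable {
    /** Stand-in for java.lang.Process, so the sketch runs without spawning processes. */
    interface Slave { void destroy(); }

    private static final class Entry {
        final Slave slave;
        long expiresAt;
        Entry(Slave slave, long expiresAt) { this.slave = slave; this.expiresAt = expiresAt; }
    }

    private final Map<String, Entry> leases = new HashMap<>();

    /** Grant a lease of the given duration when the slave is created. */
    synchronized void grant(String leaseId, Slave slave, long now, long durationMs) {
        leases.put(leaseId, new Entry(slave, now + durationMs));
    }

    /** Client renewal: extend the lease, or return false if it is unknown or has lapsed. */
    synchronized boolean renew(String leaseId, long now, long durationMs) {
        Entry e = leases.get(leaseId);
        if (e == null || e.expiresAt <= now) return false;
        e.expiresAt = now + durationMs;
        return true;
    }

    /** Client cancellation after its slave terminates normally: nothing is destroyed. */
    synchronized void cancel(String leaseId) {
        leases.remove(leaseId);
    }

    /** Periodic sweep: destroy every slave whose lease has expired; returns how many. */
    synchronized int expire(long now) {
        int destroyed = 0;
        Iterator<Entry> it = leases.values().iterator();
        while (it.hasNext()) {
            Entry e = it.next();
            if (e.expiresAt <= now) {
                e.slave.destroy();
                it.remove();
                destroyed++;
            }
        }
        return destroyed;
    }

    /** Number of leases still being tracked. */
    synchronized int active() { return leases.size(); }
}
```

In a daemon, grant might be called with process::destroy for a newly started Process, and expire run periodically on a timer; a client that simply stops calling renew has its slaves destroyed at the first sweep after its lease lapses.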

If a user program deadlocks, it is assumed that the user eventually notices and kills the client process. Soon afterwards the client's leases expire, and the orphaned slaves are destroyed. We anticipate that lease periods will be relatively short by Jini standards, perhaps on the order of 60 seconds.
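
To make "soon afterwards" more concrete, the delay between killing the client and destroying its slaves can be bounded under two assumptions that are not stated in the design above: every renewal grants a full lease period L, and the daemon checks for expired leases every S seconds (S is a hypothetical parameter, and the 5 s value below is only illustrative). Let t_k be the time the client is killed and t_r <= t_k the time of its last successful renewal.

```latex
\begin{aligned}
t_{\text{exp}} &= t_r + L \le t_k + L, \\
t_{\text{destroy}} &\le t_{\text{exp}} + S, \\
t_{\text{destroy}} - t_k &\le L + S = 60\,\text{s} + 5\,\text{s} = 65\,\text{s}.
\end{aligned}
```

The bound excludes the time the user needs to notice the deadlock. Since L dominates it, short lease periods are what keep orphaned slaves short-lived.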

This does not deal with the (presumably less common) case where a daemon is killed while it is servicing an MPJ job but its slave continues to run. To handle this case, a daemon may lease the services of its own slave processes immediately after creating them. Should the daemon die, its leases on its slaves expire, and the slaves self-destruct.
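
The slave side of this arrangement can be sketched in the same hypothetical style: a watchdog inside the slave JVM that the daemon must keep renewing (in a real system via a remote call, here just a method call). SelfDestructWatchdog, its methods and the choice of self-destruct action are assumptions for illustration, not part of Jini or MPJ; the clock is injected so the expiry logic can be checked without waiting.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/**
 * Hypothetical slave-side watchdog (a sketch, not the Jini API). The daemon holds
 * a lease on this slave and must keep renewing it; if the daemon dies, renewals
 * stop and the slave runs its self-destruct action exactly once.
 */
class SelfDestructWatchdog {
    private final LongSupplier clock;     // current time in milliseconds
    private final long durationMs;        // lease period granted to the daemon
    private final Runnable selfDestruct;  // e.g. () -> Runtime.getRuntime().halt(1)
    private long expiresAt;
    private boolean fired = false;

    SelfDestructWatchdog(LongSupplier clock, long durationMs, Runnable selfDestruct) {
        this.clock = clock;
        this.durationMs = durationMs;
        this.selfDestruct = selfDestruct;
        this.expiresAt = clock.getAsLong() + durationMs;
    }

    /** Called by the daemon to renew its lease; has no effect once the watchdog has fired. */
    synchronized void renew() {
        if (!fired) expiresAt = clock.getAsLong() + durationMs;
    }

    /** Periodic check: run the self-destruct action once the lease has lapsed. */
    synchronized boolean check() {
        if (!fired && clock.getAsLong() >= expiresAt) {
            fired = true;
            selfDestruct.run();
        }
        return fired;
    }

    /** Run check() on a background daemon thread; the caller owns the executor. */
    ScheduledExecutorService start(long periodMs) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "lease-watchdog");
            t.setDaemon(true);
            return t;
        });
        timer.scheduleAtFixedRate(this::check, periodMs, periodMs, TimeUnit.MILLISECONDS);
        return timer;
    }
}
```

A slave might construct it with System::currentTimeMillis as the clock and an action such as () -> Runtime.getRuntime().halt(1), then call start with a period well below the lease duration.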

Bryan Carpenter 2002-07-12