DEJAN S. MILOJICIC†, FRED DOUGLIS‡, YVES PAINDAVEINE††, RICHARD WHEELER‡‡ and SONGNIAN ZHOU*
† HP Labs, ‡ AT&T Labs–Research, †† TOG Research Institute, ‡‡ EMC, and * University of Toronto and Platform Computing
Process migration is the act of transferring a process between two machines. It enables dynamic load distribution, fault resilience, eased system administration, and data access locality. Despite these goals and ongoing research efforts, migration has not achieved widespread use. With the increasing deployment of distributed systems in general, and distributed operating systems in particular, process migration is again receiving more attention in both research and product development. As high-performance facilities shift from supercomputers to networks of workstations, and with the ever-increasing role of the World Wide Web, we expect migration to play a more important role and eventually to be widely adopted. This survey reviews the field of process migration by summarizing the key concepts and giving an overview of the most important implementations. Design and implementation issues of process migration are analyzed in general, and then revisited for each of the case studies described: MOSIX, Sprite, Mach, and Load Sharing Facility. The benefits and drawbacks of process migration depend on the details of implementation, and therefore this paper focuses on practical matters. This survey will help in understanding the potential of process migration and why it has not caught on.

Categories and Subject Descriptors: C.2.4 [Computer-Communication Networks]: Distributed Systems - network operating systems; D.4.7 [Operating Systems]: Organization and Design - distributed systems; D.4.8 [Operating Systems]: Performance - measurements; D.4.2 [Operating Systems]: Storage Management - distributed memories.

Additional Key Words and Phrases: process migration, distributed systems, distributed operating systems, load distribution.
1 INTRODUCTION

A process is an operating system abstraction representing an instance of a running computer program. Process migration is the act of transferring a process between two machines during its execution. Several implementations have been built for different operating systems, including MOSIX [Barak and Litman, 1985], V [Cheriton, 1988], Accent [Rashid and Robertson, 1981], Sprite [Ousterhout et al., 1988], Mach [Accetta et al., 1986], and OSF/1 AD TNC [Zajcew et al., 1993]. In addition, some systems provide mechanisms that checkpoint active processes and resume their execution in essentially the same state on another machine, including Condor [Litzkow et al., 1988] and Load Sharing Facility (LSF) [Zhou et al., 1994].

Process migration enables:

• dynamic load distribution, by migrating processes from overloaded nodes to less loaded ones,
• fault resilience, by migrating processes from nodes that may have experienced a partial failure,
• improved system administration, by migrating processes from the nodes that are about to be shut down or otherwise made unavailable, and
• data access locality, by migrating processes closer to the source of some data.
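
To make the first of these goals concrete, the following is a minimal sketch of a threshold-based load distribution policy. It is illustrative only and does not reproduce the policy of any system surveyed here: the names `Node`, `pick_migration`, and `migrate`, the integer load units, and the threshold value are all assumptions introduced for this example, and the actual transfer of process state is reduced to moving a dictionary entry.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node: maps process id -> load contribution (percent of one CPU)."""
    name: str
    processes: dict = field(default_factory=dict)

    @property
    def load(self):
        return sum(self.processes.values())


def pick_migration(nodes, threshold):
    """Return (pid, source_name, target_name), or None if no migration is warranted.

    Assumed policy: migrate only when the most loaded node exceeds `threshold`,
    and choose the process whose load is closest to half the load gap, so that
    the move narrows the imbalance between the two nodes as much as possible.
    """
    source = max(nodes, key=lambda n: n.load)
    target = min(nodes, key=lambda n: n.load)
    if source is target or source.load <= threshold:
        return None
    gap = source.load - target.load
    # Moving a process at least as large as the gap would not reduce the imbalance.
    candidates = [(pid, cost) for pid, cost in source.processes.items() if cost < gap]
    if not candidates:
        return None
    pid, _ = min(candidates, key=lambda pc: abs(pc[1] - gap / 2))
    return pid, source.name, target.name


def migrate(nodes, pid, source_name, target_name):
    """Stand-in for the real mechanism: move the process record between nodes."""
    by_name = {n.name: n for n in nodes}
    by_name[target_name].processes[pid] = by_name[source_name].processes.pop(pid)


nodes = [
    Node("a", {"p1": 60, "p2": 30, "p3": 20}),
    Node("b", {"q1": 10}),
    Node("c", {"r1": 50}),
]
decision = pick_migration(nodes, threshold=80)
if decision is not None:
    migrate(nodes, *decision)
```

In this example node `a` is the only node above the threshold, so one of its processes is moved to the least loaded node `b`. A real policy would also have to weigh the cost of the transfer itself against the expected benefit, which this sketch ignores.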
Despite these goals and ongoing research efforts, migration has not achieved widespread use. One reason for this is the complexity of adding transparent migration to systems originally designed to run stand-alone, since designing new systems with migration in mind from the beginning is no longer a realistic option. Another reason is that there has not been a compelling commercial argument for operating system vendors to support process migration. Checkpoint-restart approaches offer a compromise here, since they can run on more loosely coupled systems by restricting the types of processes that can migrate. In spite of these barriers, process migration continues to attract research. We believe that the main reason is the potential offered by mobility as well as the attraction to hard problems, so inherent to the...