A Brief Retrospective on the Sprite Network Operating System

Sprite was a research operating system developed by my research group at UC Berkeley between 1984 and 1994. Four graduate students and I started the project in the fall of 1984 because we felt that current operating systems were not paying enough attention to local-area networks. It seemed to us that networking support had been added in a quick and dirty fashion to systems that were designed to run stand-alone. As a result, networked workstations didn't work well together. At the time we started Sprite there were no good network file systems (even NFS didn't exist yet) and administering a network of workstations was a nightmare.

Major Accomplishments

Our goal for Sprite was to "do networking right" by building a new operating system kernel from scratch and designing the network support into the system from the start. We hoped to create a system where a collection of networked workstations would behave like a single system with both storage and processing power shared uniformly among the workstations. We hoped that users would be able to tap the power of the entire network while preserving the simple behavior and administrative ease of a traditional time-shared system. I think that we achieved these goals.

Four technical accomplishments stand out in my mind. First and foremost was Sprite's network file system, which demonstrated that network file systems can provide a convenient user model without sacrificing performance. Sprite's file system allowed file sharing while completely hiding the network. It provided the same behavior in a networked environment that users would see if they all ran on a traditional time-sharing system. Even I/O devices could be accessed uniformly across the network, and user processes could extend the file system by implementing its I/O and naming protocols using pseudo devices and pseudo file systems.
At the same time, Sprite used aggressive file caching to achieve high performance. Sprite's network file system was the fastest in the world until well into the 1990s.

Sprite's second key accomplishment was its process migration mechanism, which allowed processes to be moved transparently between workstations at any time. With process migration a single user could harness the power of many workstations simultaneously, achieving speedups of four or more on common system tasks such as recompilation. The migration mechanism kept track of idle machines, used them for migration, and evicted migrated processes when a workstation's owner returned, so that migration didn't impact the response time of active users. Evicted processes were remigrated to different idle machines or executed on their home machine. Sprite was one of only a few systems where process migration was used on a day-to-day basis by a large user community. (In fairness, it's worth mentioning that process migration was complicated to implement and difficult to keep running as the system evolved; it's not surprising that migration never found its way into mainstream operating systems. The best implementations of process migration are now found in virtual machine monitors: the API for a virtual machine is easier to encapsulate than that for a modern operating system. One of the implementers of the virtual machine migration mechanism at VMware was Mike Nelson, who was one of the original members of the Sprite project.)

Sprite's third key accomplishment was its single system image. The file system and process migration provided the most obvious evidence of the single system image, since they made storage and processing power sharable among workstations. But in many other ways Sprite looked and felt just like a single system. There was one root partition, one password file, one swap area (in the network file system), one login database, and so on.
The "finger" command reported all users on all workstations in the Sprite cluster, not just those on the workstation where the command was invoked. System administration was no harder with fifty machines in the network than it was with ten, and adding a new machine was no more difficult than adding a new user account.

Sprite's single system image also supported different machine architectures in the same cluster. We developed a framework for separating architecture-independent information from architecture-specific information. All the information for all architectures was visible at all times, which simplified cross-development, yet each machine used the appropriate architecture-dependent information when it was needed.

Sprite's fourth key accomplishment was its log-structured file system (LFS), which demonstrated a radical new approach to file system design. LFS treated the disk more like a tape, writing information sequentially in large runs that permit great efficiency. We developed a new garbage collection mechanism that continually opens up large extents of free space on the disk. The result was a system that wrote small files to disk an order of magnitude faster than any other existing file system. At the same time it handled other operations, such as reads and large writes, at least as well as other systems. Log-structured file systems have many other advantages as well, such as fast crash recovery, the ability to store information on disk in compressed form, and the ability to vary the block size from file to file. The techniques from LFS have been adopted in commercial file system products such as those from NetApp and are also being used for new devices such as flash memory.

Throughout the Sprite project we have tried to characterize the behavior of the system and to use this information to guide future developments. Some of our most important results were the measurements we made.
The founding of the project was based in part on file system measurements made on time-shared systems in 1984 and 1985. We made additional measurements of Sprite usage in 1991 to see how patterns had changed and to analyze the potential application of non-volatile memory in networked systems.

Perhaps the most significant accomplishment of all is that we were able to make the system work, not just for ourselves but for a community of users that numbered as high as 80 or more at the peak of the project. Many of these users depended on Sprite for all of their day-to-day computing needs, such as mail and printing. For a period of several years it was common to see 25-35 simultaneous logins, of which only a half-dozen were Sprite developers. I know of only one other university project that developed a new operating system kernel from scratch and used it to support a user community this large for this long; that project was Multics, which was carried out at MIT in the late 1960s. Furthermore, we built Sprite (more than 200,000 lines of new code in all) with a small team that averaged only about four graduate students and one or two staff or undergraduate assistants. We never got too large to have project meetings in my office, although there were times when we had to borrow two additional chairs to supplement the six already in my office.

Project History

The history of Sprite divides into roughly three phases: initial development, consolidation and LFS development, and new ventures and closeout.

The first phase of Sprite, initial development, lasted from the founding of the project in the fall of 1984 until about the end of 1987. We began coding on Sun-2 workstations in early 1985 and had a system that could execute shell commands by the spring of 1986. In the summer of 1986 we started developing the "real" Sprite file system (we'd used an older network file system called BNFS up until that point).
About that time we also started on process migration and porting the X window system. By the fall of 1987 all of these things were working, along with an internet protocol server. We had also ported Sprite to Sun-3s. At this point we copied the kernel sources over to Sprite and began doing all of our kernel development on Sprite itself.

The second phase in Sprite's history lasted from late 1987 to late 1990. This phase consisted mostly of consolidation. In early 1988 we made a major revision of the file system. Remote device support was improved, the pseudo device implementation was rewritten, and a simple recovery protocol was introduced so the system could recover gracefully from server crashes. Process migration underwent major improvements also, and by late 1988 it became stable enough for us to use it daily in system development. In 1988 we ported Sprite to the SPUR research multiprocessor (the SPUR project provided much of the early funding for Sprite), and in 1989 we ported it to DECstation-3100 and Sun-4 platforms. A port to the Sequent Symmetry machine was carried out at Sequent in late 1989 and early 1990.

In late 1988 we began to support users other than Sprite developers. The user community gradually grew in size, peaking at around 80 in 1990 and 1991. We also prepared a distribution tape so that we could make Sprite available to people outside Berkeley. The first tapes were sent out in late 1989; over the life of the project Sprite has run at about ten different sites. However, installing Sprite from the distribution was never very easy, and this limited usage of the system outside Berkeley.

The most significant new development during Sprite's second phase was the LFS implementation. We made preliminary designs and studies in 1988 but didn't solidify the prototype design until 1989 (as part of the newly started RAID project). Coding started in late 1989.
By the spring of 1990 LFS was showing signs of life, and it entered production use in the fall of 1990. By late 1991 virtually all of Sprite's dozen disks were using LFS.

The final phase of the project started in late 1990 and continued until around 1995. In this phase we initiated several new projects, most of which reflected the increasing focus of the project on issues related to storage management. In the winter of 1990 we began to analyze the behavior of recovery after file server crashes; this led to a series of experiments with better recovery techniques, such as server-driven recovery and the use of non-volatile storage. In 1991 we began a project to see if Sprite could be re-implemented as a user-level server process running under the Mach 3.0 kernel; this project completed in the summer of 1992 with substantial functionality but disappointing performance. In 1991 and 1992 we also developed the Jaquith tape library system, which made robotically-controlled tape systems available for both Sprite and other UNIX systems. During the same period we started projects to experiment with striping files across multiple file servers (Zebra) and to apply the LFS techniques to disk arrays (Sawmill).

Like most software, the Sprite kernel became harder and harder to maintain as it aged. Frequent revisions and changes in project personnel led to increases in system complexity, in spite of our best efforts to keep things clean and simple. In addition, we found it harder and harder to keep up with developments in commercial operating systems. By 1990 there were several commercial versions of UNIX with massive support teams, such as System V, Solaris, and OSF. These systems were adding features at a rapid pace and our users wanted access to these features under Sprite. We added new features such as shared libraries and binary compatibility with SunOS and Ultrix, but we found ourselves spending more and more time on tasks that were not research oriented.
At the end of 1991 we decided to bring the Sprite project to a gradual close. After that we did not start any major new developments and no new graduate students joined the project. We stopped encouraging new users to work on Sprite, so the user community slowly shrank back to just the Sprite team. Sprite had served us long and well as a research vehicle; it was time to move on to other things.

Disappointments

My biggest disappointment about the Sprite project is that we weren't able to transfer the Sprite technologies into mainstream usage. We made an open-source distribution of Sprite but it was difficult to install Sprite from the distribution. Furthermore, getting people to switch to a different operating system was hard: commercial Unix systems added features at a rate we couldn't match, and it was difficult to maintain application portability. There was no way to use some of the Sprite technologies (such as its file system) without adopting the entire system. As a result, other file systems such as NFS and AFS became widely used while Sprite's did not.

Some of the Sprite ideas are gradually finding their way into wider usage, such as process migration (popularized in the form of virtual machine migration) and LFS (used in commercial products such as NetApp and in control systems for flash memory). However, these ideas had to be reimplemented; it wasn't possible for other people to simply take the Sprite code and use it. Perhaps it would have been better if we had built Sprite as an extension to an existing operating system, rather than building a new operating system from scratch; this would have simplified technology transfer. However, it probably would have kept us from exploring the single-system-image aspects of the system, which would have been difficult to implement in an existing system.

Sprite Contributors

Many people have contributed to Sprite over the years. I can't possibly hope to list every significant contribution, but I'll try anyway.
The list below summarizes the work of the principal project members (my research students and the staff who reported directly to me). The people are listed in chronological order by the date when they started working on Sprite-related things, and the projects are listed with the most important ones (in my opinion) first.
In addition to the people listed above, there were many others who made significant contributions to Sprite even though they didn't report directly to me. Here are a few of the "outside helpers"; apologies to anyone that I've overlooked. Bob Beck (Sequent port), Ann Chervenak (device drivers), Doug Johnson (SPUR debugging), Ed Lee (RAID striping driver), Dean Long (kernel bug fixing, bootstrapping, SPARCstation port), Ethan Miller (RAID controller support), Srinivasan Seshan (Ultranet support), Thorsten Von Eicken (X11R4 port), Jay Vosburgh (Sequent port).

Note: this page was originally written in the early 1990s; I rediscovered it in February of 2011 and updated it to produce this page.