Current Projects

Note: I am no longer taking on new PhD students.

In my research projects we don't just build prototype systems and write papers; we try to build production-quality software systems that can be used by other people for real work. In most cases we release the software in open-source form. This approach is unusual in academia, but it allows us to do better research: designing for production use forces us to think about important issues that could be ignored otherwise, and measurements of usage allow us to evaluate our ideas more thoroughly. Furthermore, this approach allows students to develop a higher level of system-building maturity than would be possible building toy research prototypes.

Granular Computing (2016– )

The last decade has seen dramatic reductions in latency for several key technologies in computing. For example, modern datacenter networks now support round-trip times less than 5 μs (versus hundreds of microseconds a decade ago); flash storage devices offer access times of 10–50 μs (vs. 5–10 ms for disks); and new nonvolatile memories such as 3D XPoint could offer access times in the 1 μs range. However, today's software stacks have high overheads that make it impossible to reap the full benefits of these technologies. We are developing new software stack layers that can fully exploit new low-latency technologies. Our overall goal is to enable the efficient execution of very large numbers of very small tasks in a datacenter environment. We call this granular computing, and we envision tasks ranging in length from a few milliseconds down to a few microseconds. Here are a few of our current projects (as of late 2017):
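As a rough back-of-the-envelope illustration of why software overhead becomes the dominant cost at these task sizes, the short C program below computes the fraction of a task's time left for useful work under an assumed fixed per-task overhead of 10 μs; the overhead figure is purely illustrative, not a measurement from any of our systems.

    #include <stdio.h>

    int main(void) {
        /* Assumed fixed software overhead paid per task (dispatch, RPC,
         * scheduling, etc.).  10 us is an illustrative guess, not a
         * measured number. */
        const double overhead_us = 10.0;

        /* Task lengths from 10 ms down to 1 us. */
        const double task_us[] = { 10000.0, 1000.0, 100.0, 10.0, 1.0 };
        const int n = sizeof(task_us) / sizeof(task_us[0]);

        printf("%14s %14s %16s\n", "task length", "overhead", "useful fraction");
        for (int i = 0; i < n; i++) {
            double fraction = task_us[i] / (task_us[i] + overhead_us);
            printf("%11.0f us %11.0f us %15.1f%%\n",
                   task_us[i], overhead_us, 100.0 * fraction);
        }
        return 0;
    }

With that assumed overhead, a 10 ms task still spends 99.9% of its time on useful work, while a 1 μs task spends less than 10%; closing that gap is the point of rebuilding the software stack layers for granular computing.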
Previous Projects

The following sections describe some projects on which I have worked in the past. I am no longer working actively on these projects, though several of them are still active as open-source projects or commercial products. The years listed for each project represent the period when I was involved.

RAMCloud (2009–2017)

The goal of the RAMCloud project was to create a large-scale datacenter storage system with the lowest possible access latency. RAMCloud does this by keeping all data in DRAM at all times. The system combines three overall attributes: low latency, large scale, and durability.

In our 80-node development cluster with QDR InfiniBand, a client can read any 100-byte object in less than 5 μs, and durable writes take about 13.5 μs. In a large datacenter with 100,000 nodes, we expect small reads to complete in less than 10 μs, which is 50–1000x faster than existing storage systems.

RAMCloud's second attribute is large scale. In order to support future Web applications, we designed RAMCloud to allow clusters to grow to at least 10,000 servers; RAMCloud aggregates the main memories of all these servers into a single coherent key-value store. This allows storage capacities of 1 PB or more.

The third attribute of RAMCloud is durability. Although RAMCloud keeps all data in DRAM, it also maintains backup copies of data on secondary storage to ensure a high level of durability and availability. This frees application developers from the need to manage a separate durable storage system, or to maintain consistency between in-memory and durable storage.

For an overview of RAMCloud, see [TOCS paper]. Here are a few of the interesting projects we have undertaken as part of building RAMCloud:
In January 2014 we officially tagged RAMCloud version 1.0, which meant that the system had reached a point of maturity where it could support real applications. Newer features, such as transactions and secondary indexes, have been added since then. As of early 2017, we are undertaking fewer new research projects on RAMCloud; instead, we are using RAMCloud more as a driving application for projects in granular computing.

Raft (2012–2014)

As part of implementing RAMCloud, we needed a consensus algorithm in order to maintain replicated cluster configuration data. We initially considered using Paxos, but found it incredibly difficult to understand. Furthermore, the Paxos architecture requires complex changes to support practical systems. As a result, we decided to see if we could design a new consensus algorithm with better properties than Paxos. The most important goal was for the algorithm to be easy to understand and reason about; in addition, we wanted a formulation that was practical for real implementations. The result is Raft (a rough sketch of its vote-granting rule appears below, after the Fiz project). [USENIX ATC 2014 paper] [Diego Ongaro PhD thesis]

Fiz (2008–2011)

The Fiz project explored new frameworks for highly interactive Web applications. The goal of the project was to develop higher-level reusable components that simplify application development by hiding inside the components many of the complexities that bedevil Web developers (such as security issues or using Ajax to enhance interactivity).
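To give a flavor of Raft's structure, here is a rough C sketch of a server's persistent election state and the vote-granting rule described in the Raft paper. The type and function names are mine, and persistence of the state, the RPC plumbing, log replication, and error handling are all omitted; this is an illustration of the rule, not code from RAMCloud or any real Raft implementation.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    struct log_entry {
        uint64_t term;            /* term in which the entry was created (command omitted) */
    };

    struct raft_server {
        uint64_t current_term;    /* latest term this server has seen */
        int voted_for;            /* candidate id voted for in current_term, or -1 for none */
        struct log_entry *log;    /* log entries (1-indexed in the paper; 0-indexed here) */
        size_t log_len;
    };

    /* A candidate's log is "at least as up to date" as ours if its last
     * entry has a higher term, or the same term and an index at least as
     * large as our last index. */
    static bool log_up_to_date(const struct raft_server *s,
                               uint64_t cand_last_term, uint64_t cand_last_index)
    {
        uint64_t my_last_term = s->log_len ? s->log[s->log_len - 1].term : 0;
        uint64_t my_last_index = s->log_len;
        if (cand_last_term != my_last_term)
            return cand_last_term > my_last_term;
        return cand_last_index >= my_last_index;
    }

    /* RequestVote handler: grant the vote only if the candidate's term is
     * not stale, we have not already voted for someone else in this term,
     * and the candidate's log is at least as up to date as ours. */
    bool handle_request_vote(struct raft_server *s, uint64_t term, int candidate_id,
                             uint64_t last_log_index, uint64_t last_log_term)
    {
        if (term < s->current_term)
            return false;                    /* stale candidate; reject */
        if (term > s->current_term) {
            s->current_term = term;          /* newer term: adopt it and clear our vote */
            s->voted_for = -1;
        }
        if ((s->voted_for == -1 || s->voted_for == candidate_id) &&
            log_up_to_date(s, last_log_term, last_log_index)) {
            s->voted_for = candidate_id;     /* at most one vote per term */
            return true;
        }
        return false;
    }

The two checks capture the rules that matter for leader election: a server casts at most one vote per term, and it never votes for a candidate whose log is less complete than its own.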
ElectricCommander (2005–2007, Electric Cloud)

ElectricCommander is the second major product for Electric Cloud. It addresses the problem of managing software development processes such as nightly builds and automated test suites. Most organizations have home-grown software for these tasks, which is hard to manage and scales poorly as the organization grows. ElectricCommander provides a general-purpose Web-based platform for managing these processes. Developers use a Web interface to describe each job as a collection of shell commands that run serially or in parallel on one or more machines. The ElectricCommander server manages the execution of these commands according to schedules defined by the developers. It also provides a variety of reporting tools and manages a collection of server machines to allow concurrent execution of multiple jobs.

ElectricAccelerator (2002–2005, Electric Cloud)
ElectricAccelerator is Electric Cloud's first product; it accelerates software builds by running build steps in parallel across a cluster of machines.

Tcl/Tk (1988–2000, U.C. Berkeley, Sun, Scriptics)

Tcl is an embeddable scripting language: it is a simple interpreted language implemented as a library package that can be incorporated into a variety of applications. Furthermore, Tcl is extensible, so additional functions (such as those provided by the enclosing application) can easily be added to those already provided by the Tcl interpreter. Tk is a GUI toolkit that is implemented as a Tcl extension.

I initially built Tcl and Tk as hobby projects and didn't think that anyone besides me would care about them. However, they were widely adopted because they solved two problems. First, the combination of Tcl and Tk made it much easier to create graphical user interfaces than previous frameworks such as Motif (and Tcl and Tk were eventually ported from Unix to the Macintosh and Windows, making Tcl/Tk applications highly portable). Second, Tcl made it easy to include powerful command languages in a variety of applications ranging from design tools to embedded systems.
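To make "embeddable and extensible" concrete, here is a small example of the usage pattern Tcl supports: a host application written in C creates an interpreter, registers an application-specific command (the "add" command below is just an example of mine), and then evaluates a script that mixes that command with built-in Tcl. The sketch uses the modern Tcl 8.x C API rather than the original 1988 interfaces, and it assumes the Tcl headers and library are installed.

    /* Minimal embedding sketch: compile with something like cc host.c -ltcl8.6 */
    #include <stdio.h>
    #include <tcl.h>

    /* An application-supplied command, made available to scripts as "add". */
    static int AddCmd(ClientData cd, Tcl_Interp *interp,
                      int objc, Tcl_Obj *const objv[]) {
        int a, b;
        if (objc != 3) {
            Tcl_WrongNumArgs(interp, 1, objv, "a b");
            return TCL_ERROR;
        }
        if (Tcl_GetIntFromObj(interp, objv[1], &a) != TCL_OK ||
            Tcl_GetIntFromObj(interp, objv[2], &b) != TCL_OK) {
            return TCL_ERROR;
        }
        Tcl_SetObjResult(interp, Tcl_NewIntObj(a + b));
        return TCL_OK;
    }

    int main(void) {
        Tcl_Interp *interp = Tcl_CreateInterp();

        /* Extend the interpreter with the application's command ... */
        Tcl_CreateObjCommand(interp, "add", AddCmd, NULL, NULL);

        /* ... then let a script combine it with built-in Tcl commands. */
        const char *script = "set total 0\n"
                             "foreach x {1 2 3 4} { set total [add $total $x] }\n"
                             "set total";
        if (Tcl_Eval(interp, script) == TCL_OK) {
            printf("script result: %s\n", Tcl_GetStringResult(interp));
        } else {
            fprintf(stderr, "error: %s\n", Tcl_GetStringResult(interp));
        }

        Tcl_DeleteInterp(interp);
        return 0;
    }

Running the program prints "script result: 10". This pattern, in which an application exports its internal operations as new script commands, is how design tools and other applications embedded Tcl as their command language.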
Log-Structured File Systems (1988–1994, U.C. Berkeley)

A log-structured file system (LFS) is one in which all information is written to disk sequentially in a log-like structure, thereby speeding up both file writing and crash recovery. The log is the only structure on disk; it contains index information so that files can be read back from the log efficiently. LFS was initially motivated by the RAID project, because random writes are expensive in RAID disk arrays. However, the LFS approach has also found use in other settings, such as flash memory, where wear-leveling is an important problem. Mendel Rosenblum created the first LFS implementation as part of the Sprite project; Ken Shirriff added RAID support to LFS in the Sawmill project, and John Hartman extended the LFS ideas into the world of cluster file systems with Zebra.
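The following toy C sketch illustrates the core idea only: every update is appended sequentially to a single log file, and an in-memory index records the log offset of the most recent version of each item, so reads stay cheap. It is a loose analogy, not Sprite LFS itself, which also deals with segments, cleaning, checkpoints, and crash recovery.

    #include <stdio.h>
    #include <string.h>

    #define MAX_KEYS 16
    #define PAYLOAD  32

    /* In-memory index: maps each key to the log offset of the most recent
     * version of its data (a stand-in for LFS's index structures). */
    static long latest_offset[MAX_KEYS];

    /* Append one record (key + fixed-size payload) at the end of the log
     * and remember where it landed; every write is sequential. */
    static void log_write(FILE *log, int key, const char *data)
    {
        char payload[PAYLOAD] = {0};
        strncpy(payload, data, PAYLOAD - 1);
        fseek(log, 0, SEEK_END);
        latest_offset[key] = ftell(log);
        fwrite(&key, sizeof(key), 1, log);
        fwrite(payload, PAYLOAD, 1, log);
    }

    /* Read the latest version of a key by seeking directly to its offset. */
    static void log_read(FILE *log, int key, char payload[PAYLOAD])
    {
        fseek(log, latest_offset[key] + (long) sizeof(key), SEEK_SET);
        if (fread(payload, PAYLOAD, 1, log) != 1)
            memset(payload, 0, PAYLOAD);
    }

    int main(void)
    {
        FILE *log = fopen("lfs.log", "w+b");
        if (log == NULL)
            return 1;

        char buf[PAYLOAD];
        log_write(log, 3, "first version of key 3");
        log_write(log, 7, "only version of key 7");
        log_write(log, 3, "second version of key 3");  /* an overwrite is just another append */

        log_read(log, 3, buf);
        printf("key 3 -> %s\n", buf);                  /* prints the second version */

        fclose(log);
        return 0;
    }

The index is what keeps reads efficient despite the purely sequential write pattern; much of the real LFS work is about maintaining that index durably and reclaiming the space occupied by obsolete versions (segment cleaning).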
Sprite (1984–1994, U.C. Berkeley)
Sprite was a Unix-like network operating system built at U.C. Berkeley
and used for day-to-day computing by about 100 students and staff for
more than five years. The overall goal of the project was to create a
single system image, meaning that a collection of workstations
would appear to users the same as a single time-shared system, except
with much better performance. Among its more notable features were
support for diskless workstations, large client-level file caches
with guaranteed consistency, and a transparent process migration
mechanism. We built a parallel version of make (pmake) as part of the project.
VLSI Design Tools (1980–1986, U.C. Berkeley)

When I arrived at Berkeley in 1980 the Mead-Conway VLSI revolution was in full bloom, enabling university researchers to create large-scale integrated circuits such as the Berkeley RISC chips. However, there were few tools for chip designers to use. In this project my students and I created a series of design tools, starting with a simple layout editor called Caesar. We quickly replaced Caesar with a more powerful layout editor called Magic. Magic made two contributions. First, it implemented a number of novel interactive features for developers, such as incremental design-rule checking, which notified developers of design-rule violations immediately while they edited, rather than waiting for a slow offline batch run. Second, Magic incorporated algorithms for operating on hierarchical layout structures, which were much more efficient than previous approaches that "flattened" the layout into a non-hierarchical form. Magic took advantage of a new data structure called corner stitching (sketched at the end of this page), which permitted efficient implementation of a variety of geometric operations. We also created a switch-level timing analysis program called Crystal. We made free source-level releases of these tools and many others in what became known as the Berkeley VLSI Tools Distributions; these represented one of the earliest open-source software releases. Magic continued to evolve after the end of the Berkeley project; as of 2005 it was still widely used.

Cm* and Medusa (1975–1980, Carnegie Mellon University)

As a graduate student at Carnegie Mellon University I worked on the Cm* project, which created the first large-scale NUMA multiprocessor. Cm* consisted of 50 LSI-11 processors connected by a network of special-purpose processors called Kmaps that provided shared virtual memory. I worked initially on the design and implementation of the Kmap hardware, then shifted to the software side and led the Medusa project, which was one of two operating systems created for Cm*. The goal of the Medusa project was to understand how to structure an operating system to run on the NUMA hardware; among other things, Medusa was the first system to implement coscheduling (now called "gang scheduling").
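For the curious, here is a rough sketch in C of the tile structure at the heart of corner stitching, as used in Magic. It is simplified, and the field names are descriptive inventions of mine rather than Magic's actual declarations; for example, the sketch stores both corners of each tile explicitly, whereas the real representation is more economical.

    /* Simplified corner-stitched tile.  The layout plane is divided into
     * non-overlapping rectangular tiles; both material and empty space are
     * represented explicitly as tiles, and each tile is linked to its
     * neighbors by four "corner stitches". */
    struct tile {
        int x_bot, y_bot;        /* lower-left corner of the tile */
        int x_top, y_top;        /* upper-right corner (kept explicit in this sketch) */
        int layer;               /* material on this tile, or 0 for empty space */

        /* Two stitches at the lower-left corner ... */
        struct tile *left;       /* leftward: the bottommost neighbor along the left edge */
        struct tile *below;      /* downward: the leftmost neighbor along the bottom edge */

        /* ... and two at the upper-right corner. */
        struct tile *right;      /* rightward: the topmost neighbor along the right edge */
        struct tile *above;      /* upward: the rightmost neighbor along the top edge */
    };

Because empty space is tiled as well, walking these stitches supports point location, neighbor enumeration, and area searches without any separate search structure; that is what made operations such as incremental design-rule checking fast enough to run while the designer edits.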