Dawson Engler

Associate Professor
Computer Science and Electrical Engineering
Gates Building 3A-314
353 Serra Mall
Stanford University
Stanford, CA 94305-9030

engler WHERE stanford DOM edu

I am a joint EE/CS associate professor. Before that, I was an irresponsible graduate student in Frans Kaashoek's PDOS group at MIT's Lab for Computer Science, where I co-founded the exokernel operating system project, which formed the basis of my thesis work. I now I have my own students, who sometimes listen to me:

I am currently looking a few additional students. If you know how to hack, we should meet.

A few awards:

Two good recent papers:

A few billion lines of code later: using static analysis to find bugs in the real world

Klee: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs.

My research focuses on effective, automatic methods for finding lots of bugs in real code. We have used three main approaches: system-specific static analysis, implementation-level model checking, and most recently, using symbolic execution to automatically generate high-coverage tests. The following papers give a reasonable overview of the first two approaches (the Klee paper above describes that last):

eXplode: a Lightweight, General System for Finding Serious Storage System Errors, Junfeng Yang, Can Sar, and Dawson Engler, Proceedings of the 7th Symposium on Operating System Design and Implementation, 2006. (postscript, PDF)
Bugs as Deviant Behavior: A General Approach to Inferring Errors in Systems Code (postscript) (PDF). Slides: PS, PDF, PPT. Dawson Engler, David Yu Chen, Seth Hallem, Andy Chou, and Benjamin Chelf. Appeared in: Proceedings of the Eighteenth ACM Symposium on Operating Systems Principles, 2001.
This paper has some of my favorite ideas, though unfortunately I can't say the same for the writing. If a static checking person was going to read just one of our papers this should be it: I've used its ideas in every static checker I've built since writing it, typically more than once. I've never seen a checker that could not be improved by using its trick of belief analysis to infer what to check or what the state of a checked system is based on what programmers seem to believe.
Checking System Rules Using System-Specific, Programmer-Written Compiler Extensions (Best Paper) (postscript) (PDF). Slides: PS, PDF, PPT. Dawson Engler, Benjamin Chelf, Andy Chou, and Seth Hallem. Appeared in: Proceedings of the 4th Symposium on Operating System Design and Implementation.
This paper discusses a set of small extensions that found roughly 500 bugs in Linux, OpenBSD, and the Xok exokernel. The extensions were usually less than 100 lines. Our current static checking systems are much more powerful, but the basic approach is roughly the same. I co-founded a company, Coverity, several years ago with three of my students (Seth Hallem, Andy Chou and Ben Chelf) to commercialize this static checking work. Coverity has 500+ customers and 130+ employees. A free trial of the tool can be obtained here.

This paper describes the main thing I did before bug finding:

Exokernel: An Operating System Architecture for Application-Level Resource Management,
Dawson R. Engler, M. Frans Kaashoek and James W. O'Toole
This paper appeared in SOSP95 ( slides are available here). It is the most detailed description of what an exokernel is, but is rather dense. The SOSP paper from 1997 (below) has real application numbers and several years of experience, plus a cool trick for incrementally verifying untrusted deterministic code.

Some recent talks

EXEcution generated Executions: Automatically generating inputs of death. (postscript, PDF, PPT). A 30 minute talk on our latest project: automatically generating concrete inputs that blow up real code.
Weird things that surprise academics trying to commercialize a static checking tool. (postscript, PDF, PPT). This is part of an invited talk at SPIN05 and CONCUR05.
Using model checking and execution generated tests to find bugs in real code (postscript, PDF, PPT). Invited talk, Usenix Security 2005.
Static analysis versus model checking for bug finding (postscript, PDF, PPT). Keynote, SoftMC 2003.
Finding bugs with system-specific static analysis (postscript, PDF, PPT). Keynote, PASTE 2002.
How to find lots of bugs with system-specific static analysis. (postscript, PDF, PPT). Distinguished lecture, SUNY Stonybrook, 2001.

Below are the rest of our papers...

Automatic generation of high-coverage tests.

Under-constrained execution: making automatic code destruction easy and scalable, Dawson Engler and Daniel Dunbar, International Symposium on Software Testing and Analysis (ISSTA), 2007. (PDF)
EXE: Automatically Generating Inputs of Death, (postscript) (PDF) Cristian Cadar, Vijay Ganesh, Peter M. Pawlowski, David L. Dill, Dawson R. Engler, 13th ACM Conference on Computer and Communications Security, 2006.
Automatically generating malicious disks using symbolic execution, Junfeng Yang, Can Sar, Paul Twohey, Cristian Cadar, and Dawson Engler. (PDF) (postscript), IEEE Security and Privacy, 2006. This paper uses a system we built, EXE, to automatically generate disk images of death that, when mounted, will cause a file system to crash. Many systems now let untrusted users mount files as file systems, which opens them up to all sorts of exploits, given the somewhat erratic input checking done by current file system code.
Automatic test case generation by executing code on symbolic inputs. A technical report containing our SOSP05 submission gives preliminary results: Execution generated test cases: how to make systems code crash itself, CSTR-2005-04 (pdf). The trick is simple and cute. A shortened version appeared as an invited paper in SPIN05 (pdf). The two EXE papers given above are probably better to read.

Software model checking

Using Model Checking to Find Serious File System Errors (Best Paper) (postscript) (PDF),
Junfeng Yang, Paul Twohey, Dawson Engler, and Madanlal Musuvathi, Operating System Design and Implementation (OSDI) 2004.
Describes FiSC, a file system model checking tool. We applied it to three widely-used, heavily-tested file systems: ext3, JFS, and ReiserFS. We found serious bugs in all of them, 32 in total. Most have led to patches within a day of diagnosis. For each file system, FiSC found demonstrable events leading to the unrecoverable destruction of metadata and entire directories, including the file system root directory ``/''. The actual bugs we found are here.
Model-checking large network protocol implementations (postscript) (PDF), Dawson Engler and Madanlal Musuvathi. Proceedings of the First Conference on Network System Design and Implementation (NSDI), 2004. Describes the challenges in model checking 50K lines of TCP code. One of the surprising results was that it was easier to run the entire Linux kernel in Madan's CMC model checker than extract out TCP in a stand-alone version.
Static analysis versus software model checking for bug finding (postscript) (PDF). Extended version with better formatting: (postscript) (PDF),
Dawson Engler and Madanlal Musuvathi.
Invited paper for VMCAI04. It describes some of our experiences finding bugs with both model checking and static analysis, including some of the less pleasant aspects of both. Slides are available: postscript, PDF, PPT.

The paper is a reworked version of a invited paper for the SoftMC 2003 workshop: "Some lessons from using static analysis and software model checking for bug finding" (postscript) (PDF), Madanlal Musuvathi and Dawson R. Engler. The writing could be better, but some of the tables are reasonable.
Slides are available: postscript, PDF, PPT.
CMC: A pragmatic approach to model checking real code (postscript) (PDF) ,
Madanlal Musuvathi, David Y.W. Park, Andy Chou, Dawson R. Engler, David L. Dill.
Appeared in OSDI 2002. A model checker that can check C code directly, dispensing with the need to write a specification. We applied it to three different AODV routing protocol implementations, where it found roughly one bug per 300 lines of code.
A Simple Method for Extracting Models from Protocol Code (postscript) (PDF) ,
David Lie, Andy Chou, Dawson Engler, and David Dill
Appeared in ISCA 2001. It shows how to check deeper properties than possible with static analysis by using MC to automatically extract specifications (models) from actual C code and then running these models through a formal verifier.

Using Statistical Analysis for Bug Finding

From Uncertainty to Belief: Inferring the Specification Within, Ted Kremenek, Paul Twohey, Godmar Back, Andrew Ng, Dawson Engler, Proceedings of the 7th Symposium on Operating System Design and Implementation, 2006. (PDF)
Correlation Exploitation in Error Ranking (postscript) (PDF) ,
Ted Kremenek, Ken Ashcraft, Junfeng Yang and Dawson Engler. To appear: Foundations of Software Engineering (FSE) 2004. This paper demonstrates a error ranking scheme that uses machine learning to reduce the observed false positive rate from 2 to 8 times as compared to random ranking. It exploits the fact that both true error and false positives cluster. I.e., given a group of related error messages, if one is a true error it is likely that the others are bugs as well. Similarly, if one is a false positive, it is likely the others are false positives as well.
Z-Ranking: Using Statistical Analysis to Counter the Impact of Static Analysis Approximations (postscript) (PDF) ,
Ted Kremenek and Dawson Engler. Appeared in SAS 2003. A simple technique for detecting when static analysis makes a mistake. The basic intuition: the most trustworthy analysis decisions are those that lead to many successful checks and relatively few errors. Analysis mistakes tend to flag many errors and have few successful checks.

(Reasonably) generic static analysis for bug finding

MECA: an Extensible, Expressive System and Language for Statically Checking Security Properties (PDF)
Junfeng Yang, Ted Kremenek, Yichen Xie, and Dawson Engler. Proceedings of the 10th ACM conference on Computer and communication security (ACM CCS), 2003. Flexible annotations plus methods for propagating them in a way that does not require much manual labor nor kill you with false positives.
RacerX: Effective, Static Detection of Race Conditions and Deadlocks (postscript) (PDF)
Dawson Engler and Ken Ashcraft. Appeared in SOSP 2003. Has about 20 tricks or so (unlockset analysis is the coolest) to help find racers and deadlocks. Slides are available: PS, PDF, PPT.
ARCHER: Using Symbolic, Path-sensitive Analysis to Detect Memory Access Errors (postscript) (PDF)
Yichen Xie, Andy Chou, and Dawson Engler. To appear in FSE 2003.
Using Redundancies to Find Errors (Award Paper) (postscript) (PDF)
Yichen Xie and Dawson Engler
To appear in TSE. Finds funny bugs by looking for redundant operations (dead code, unused assignments, etc.). From empirical measurements, code with such redundant errors is 50-100% more likely to have hard errors. Also describes how to check for redundancies to find holes in specifications.

It is an expanded version of an identically titled award paper (postscript) (PDF) that appeared in Foundations of Software Engineering (FSE) 2002.
An Empirical Study of Operating Systems Errors (postscript) (PDF) ,
Andy Chou, Junfeng Yang, Benjamin Chelf, Seth Hallem, and Dawson Engler
Appeared in SOSP 01.

Metacompilation (MC)

A System and Language for Building System-Specific, Static Analyses (postscript) (PDF),
Seth Hallem, Benjamin Chelf, Yichen Xie, and Dawson Engler
Appeared in PLDI 2002. The best description of our MC system. Focuses on interprocedural analysis, ranking, and simple path sensitivity to suppress false paths.
Using Programmer-Written Compiler Extensions to Catch Security Holes (postscript) (PDF)
Ken Ashcraft and Dawson Engler
Appeared in IEEE Security and Privacy 2002. Uses metacompilation extensions to find over 100 security holes in Linux and BSD.
How to Write System-specific, Static Checkers in Metal (postscript) (PDF),
Benjamin Chelf, Seth Hallem, and Dawson Engler
Invited paper for PASTE'02. A simple tutorial on writing extensions. Does not introduce new ideas, but gives concrete examples that helps understand some of the previous papers.
Using Meta-level Compilation to Check FLASH Protocol Code (postscript) (PDF) ,
Andy Chou, Benjamin Chelf, Dawson Engler, and Mark Heinrich
This paper appeared in ASPLOS 2000. It is a case study of using MC to check the Stanford FLASH machine's embedded cache coherence protocol code.
Interface Compilation: Steps toward Compiling Program Interfaces as Languages,
by Dawson R. Engler
This paper appeared in Transactions on Software Engineering. It proposes and explores the metaphor of metacompilation, and provides the beginnings of a programming methodology for exploiting it. It is a more developed version of the paper below.
Incorporating application semantics and control into compilation,
by Dawson R. Engler
This paper appeared in the First Conference on Domain-specific Languages in 1997. It contains much of the ideology that drove the later MC work. For an abstract, click here.

Exokernel Papers

Exokernel: An Operating System Architecture for Application-Level Resource Management,
Dawson R. Engler, M. Frans Kaashoek and James W. O'Toole
This paper appeared in SOSP95 ( slides are available here). It is the most detailed description of what an exokernel is, but is rather dense. A more long-winded version of this paper is available in the form of my masters thesis (though the material is about 6 months behind that in the SOSP paper).
Fast and flexible application-level networking on exokernel systems,
by Gregory R. Ganger, Dawson R. Engler, M. Frans Kaashoek, Hector Brice~no, Russell Hunt, Thomas Pinckney. Appeared in TOCS 20(1): 49-83 (2002)
Application Performance and Flexibility on Exokernel Systems,
M. Frans Kaashoek, Dawson R. Engler, Gregory R. Ganger, H'ector M. Brice~no, Russell Hunt, David Mazi`eres, Thomas Pinckney, Robert Grimm, John Jannotti, and Kenneth Mackenzie, Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles (SOSP). October, 1997.

This paper validates the earlier SOSP paper, which was big on religion and microbenchmarks and a bit light on application numbers. It also has a cool use of deterministic Turing machines that lets an exokernel trust applications to track what they own without understanding how they do so. Slides are available here.
Exterminate All Operating System Abstractions,
Dawson R. Engler and M. Frans Kaashoek
This is a flamy paper describing the philosophical underpinnings of the exokernel approach. It appeared in HotOS-V

Dynamic code generation Papers

The following papers discuss several dynamic code generation systems.

Reverse-Engineering Instruction Encodings
by Wilson C. Hsieh, Dawson R. Engler, and Godmar Back. This paper appeared in the proceedings of the 2001 USENIX Annual Technical Conference. It uses a cool trick to automatically extract instruction encodings from a system assembler. It's genesis was the fact that writing a binary encoder for the x86 is an unredeemably disgusting experience.
vcode: a retargetable, extensible, very fast dynamic code generation system,
by Dawson R. Engler
This paper appeared in PLDI '96. Slides are available here .
vcode is a portable system to generate executable code at runtime. It generates code in approximately 10 instructions per generated instruction, and is easily extendible by clients. A tutorial describing it can be obtained by clicking here. A beta version of the system is also available. To get on the vcode mailing list, please email engler@lcs.mit.edu. Click here for a bit more information.
`C and tcc: A Language and Compiler for Dynamic Code Generation ,
by Massimiliano Poletto, Wilson C. Hsieh, Dawson Engler, and M. Frans Kaashoek.
This paper appeared in ACM Transactions on Programming Languages and Systems, 21(2), March 1999. Reprinted in Taha, W. and T. Sheard: 1997, `Multi-Stage Programming with Explicit Annotations'. in (ACM, 1997), pp. 203--217.
`C: A Language for High-Level, Efficient, and Machine-independent
Dynamic Code Generation,
by Dawson R. Engler, Wilson C. Hsieh and M. Frans Kaashoek.
This paper appeared in POPL 1996. Slides are available here.
tcc: A System for Fast, Flexible, and High-level Dynamic Code Generation
by Massimiliano Poletto, Dawson R. Engler and M. Frans Kaashoek.
This paper appeared in PLDI 1997. A preliminary release of the tcc compiler can be obtained from the tickc homepage.
tcc: A Template-Based Compiler for `C
by Massimiliano Poletto, Dawson R. Engler and M. Frans Kaashoek.
This paper appeared in WCSSS 1996. A preliminary release of the tcc compiler can be obtained from the tickc homepage.
DCG: An Efficient, Retargetable Dynamic Code Generator,
by Dawson R. Engler and Todd A. Proebsting.
This paper appeared in ASPLOS 94 ( click here for slides). The system it describes has been superseded by vcode.

Networking papers

The following three papers discuss a network system built on top of the exokernel.

ASHs: Application-specific handlers for high-performance messaging,
by Deborah A. Wallach, Dawson R. Engler, and M. Frans Kaashoek.
The paper appeared in ACM Communication Architectures, Protocols, and Applications (SIGCOMM '96). Portions of its implementation are more thoroughly explored in:
Design and Implementation of a Modular, Flexible, and Fast System for Dynamic Protocol Composition, by Dawson R. Engler, Deborah A. Wallach, and M. Frans Kaashoek.
DPF: Fast, Flexible Message Demultiplexing using Dynamic Code Generation,
by Dawson R. Engler and M. Frans Kaashoek.
The paper appeared in ACM Communication Architectures, Protocols, and Applications (SIGCOMM '96). Slides from the talk are available here. A beta release of the system is available here. DPF is described more fully here.

.
engler WHERE stanford DOM edu