Dawson Engler
Dawson Engler
Associate Professor
Computer Science and Electrical Engineering
Gates Building 3A-314
353 Serra Mall
Stanford University
Stanford, CA 94305-9030
engler WHERE stanford DOM edu
I am a joint EE/CS associate professor. Before that, I was an
irresponsible
graduate
student in
Frans Kaashoek's PDOS
group at MIT's Lab for Computer Science, where I co-founded
the exokernel operating system
project, which formed the basis of my thesis work.
I now I have my own students,
who
sometimes listen to me:
I am currently looking a few additional students. If you know
how to hack, we should meet.
A few awards:
Two good recent papers:
A few billion lines of code later: using static analysis to find bugs in the real world
.
Al Bessey, Ken Block, Ben Chelf, Andy Chou, Bryan Fulton, Seth Hallem, Charles Henri-Gros, Asya Kamsky, Scott McPeak, Dawson Engler. Communications of the ACM archive Volume 53 , Issue 2 (February 2010) Pages: 66-75.
According to ACM this is their most downloaded paper ever.
I wrote it, but everyone else did all (I
mean: ALL) of the technical work. It documents the oft-times bizarre
things that happened when we took our static bug finding research and
commercialized it. It's a light read, full of stuff you are glad happened
to other people. Someone other than me did a final edit pass, doing
odd things to some transitions and neutering some language, if you want
the sharper-edged draft, send
email. (I think the information content is about the same in both.) I have
30min, 60min, and 90min talks based on it that I'm always happy to give to avoid
real work.
My research focuses on effective, automatic methods for finding
lots of bugs in real code. We have used three main approaches:
system-specific static analysis, implementation-level model checking, and
most recently, using symbolic execution to automatically generate
high-coverage tests.
The following papers
give a reasonable overview of the first two approaches (the Klee paper
above describes that last):
-
eXplode: a Lightweight, General System for Finding Serious
Storage System Errors, Junfeng Yang, Can Sar, and Dawson Engler,
Proceedings of the 7th Symposium on Operating System Design and
Implementation, 2006. (postscript,
PDF)
-
Bugs as Deviant Behavior: A General Approach
to Inferring Errors in Systems Code
(postscript)
(PDF).
Slides:
PS,
PDF,
PPT.
Dawson Engler, David Yu Chen, Seth Hallem, Andy Chou, and Benjamin Chelf.
Appeared in: Proceedings of the Eighteenth ACM Symposium on Operating Systems Principles, 2001.
This paper has some of my favorite ideas, though unfortunately I can't
say the same for the writing. If a static checking person was going to
read just one of our papers this should be it: I've used its ideas in
every static checker I've built since writing it, typically more
than once. I've never seen a checker that could not be improved by
using its trick of belief analysis to infer what to check or what the
state of a checked system is based on
what programmers seem to believe.
-
Checking System Rules Using
System-Specific, Programmer-Written Compiler Extensions (Best Paper)
(postscript)
(PDF).
Slides:
PS,
PDF,
PPT.
Dawson Engler, Benjamin Chelf, Andy Chou, and Seth Hallem. Appeared
in: Proceedings of the 4th Symposium on Operating System Design and
Implementation.
This paper
discusses a set of small extensions that found
roughly 500 bugs in Linux, OpenBSD, and the Xok exokernel. The extensions
were usually less than 100 lines. Our current static checking systems
are much more powerful, but the basic approach is roughly the same.
I co-founded a company, Coverity,
several years ago with three of my students
(Seth Hallem, Andy Chou and Ben Chelf) to commercialize this static
checking work. Coverity has 500+ customers and 130+ employees.
A free trial of the tool can be obtained here.
This paper describes the main thing I did before bug finding:
Some recent talks
-
EXEcution generated Executions: Automatically generating inputs of death.
(postscript,
PDF,
PPT). A 30 minute talk on our latest project:
automatically generating concrete inputs that blow up real code.
-
Weird things that surprise academics trying to commercialize
a static checking tool.
(postscript,
PDF,
PPT). This is part of
an invited talk at SPIN05 and CONCUR05.
-
Using model checking and execution generated tests to find bugs in
real code
(postscript,
PDF,
PPT).
Invited talk, Usenix Security 2005.
-
Static analysis versus model checking for bug finding
(postscript,
PDF,
PPT).
Keynote, SoftMC 2003.
-
Finding bugs with system-specific static analysis
(postscript,
PDF,
PPT). Keynote, PASTE 2002.
-
How to find lots of bugs with system-specific static analysis.
(postscript,
PDF,
PPT). Distinguished lecture, SUNY Stonybrook, 2001.
Below are the rest of our papers...
Automatic generation of high-coverage tests.
-
Under-constrained execution: making automatic code destruction easy and
scalable,
Dawson Engler and Daniel Dunbar,
International Symposium on Software Testing and Analysis (ISSTA),
2007.
(PDF)
-
EXE: Automatically Generating Inputs of Death,
(postscript)
(PDF)
Cristian Cadar,
Vijay Ganesh,
Peter M. Pawlowski,
David L. Dill,
Dawson R. Engler, 13th ACM Conference on Computer and Communications Security, 2006.
-
Automatically generating malicious disks using symbolic execution,
Junfeng Yang, Can Sar, Paul Twohey, Cristian Cadar, and Dawson Engler.
(PDF) (postscript),
IEEE Security and Privacy, 2006.
This paper uses a system we built, EXE, to automatically generate disk
images of death that, when mounted, will cause a file system to crash.
Many systems now let untrusted users mount files as file systems, which
opens them up to all sorts of exploits, given the somewhat erratic input
checking done by current file system code.
-
Automatic test case generation by executing code on symbolic inputs.
A technical report containing our SOSP05 submission gives preliminary
results: Execution generated test cases: how to make systems code crash
itself, CSTR-2005-04 (pdf). The trick
is simple and cute. A shortened version appeared as an invited paper
in SPIN05 (pdf). The two EXE papers given above
are probably better to read.
Software model checking
-
Using Model Checking to Find Serious File System Errors (Best Paper)
(postscript)
(PDF),
Junfeng Yang, Paul Twohey, Dawson Engler, and
Madanlal Musuvathi, Operating System Design and Implementation (OSDI) 2004.
Describes FiSC, a file system model checking tool. We applied it to
three widely-used, heavily-tested file systems: ext3, JFS, and ReiserFS.
We found serious bugs in all of them, 32 in total. Most have led to
patches within a day of diagnosis. For each file system, FiSC found
demonstrable events leading to the unrecoverable destruction of metadata
and entire directories, including the file system root directory ``/''.
The actual bugs we found are
here.
-
Model-checking large network protocol implementations
(postscript)
(PDF),
Dawson Engler and
Madanlal Musuvathi. Proceedings of the First Conference
on Network System Design and Implementation (NSDI), 2004.
Describes the challenges
in model checking
50K lines of TCP code. One of the surprising
results was that it was easier to run the entire Linux kernel in Madan's
CMC model checker than extract out TCP in a stand-alone version.
-
Static analysis versus software model checking for bug finding
(postscript)
(PDF). Extended version with better
formatting:
(postscript)
(PDF),
Dawson Engler and
Madanlal Musuvathi.
Invited paper for VMCAI04. It describes some of our experiences finding
bugs with both model checking and static analysis, including some of
the less pleasant aspects of both. Slides are available:
postscript,
PDF,
PPT.
The paper is a reworked version of a invited paper for the SoftMC
2003 workshop:
"Some lessons from using static analysis and software model checking for bug finding"
(postscript)
(PDF), Madanlal Musuvathi and Dawson R. Engler.
The writing could be better, but some of the tables are reasonable.
Slides are available:
postscript,
PDF,
PPT.
-
CMC: A pragmatic approach to model checking real code
(postscript)
(PDF)
,
Madanlal Musuvathi, David Y.W. Park, Andy Chou, Dawson R. Engler,
David L. Dill.
Appeared in OSDI 2002. A model checker that can
check C code directly,
dispensing with the need to write a specification. We applied it
to three
different AODV routing protocol implementations, where it found roughly
one bug per 300 lines of code.
-
A Simple Method for Extracting Models from Protocol Code
(postscript)
(PDF)
,
David Lie, Andy Chou, Dawson Engler, and David Dill
Appeared
in ISCA 2001. It
shows how to check deeper properties than possible with static analysis
by
using
MC to automatically extract specifications (models) from actual C code and then running these models through a formal verifier.
Using Statistical Analysis for Bug Finding
-
From Uncertainty to Belief: Inferring the
Specification Within,
Ted Kremenek, Paul Twohey, Godmar Back, Andrew Ng, Dawson Engler,
Proceedings of the 7th Symposium on Operating System Design and
Implementation, 2006.
(PDF)
-
Correlation Exploitation in Error Ranking
(postscript)
(PDF)
,
Ted Kremenek, Ken Ashcraft, Junfeng Yang and Dawson Engler. To appear:
Foundations of Software Engineering (FSE) 2004. This paper demonstrates a
error ranking scheme that uses machine learning to
reduce the observed
false positive rate from 2 to 8 times as compared to random ranking.
It exploits the fact that both true error and false positives cluster.
I.e., given a group of related error messages, if one is a true error
it is likely that the others are bugs as well. Similarly, if one is a
false positive, it is likely the others are false positives as well.
-
Z-Ranking: Using Statistical Analysis to Counter the Impact of
Static Analysis Approximations
(postscript)
(PDF)
,
Ted Kremenek and Dawson Engler. Appeared in SAS 2003. A simple
technique for detecting when static analysis makes a mistake.
The basic
intuition: the most trustworthy analysis decisions are those
that lead to many successful checks and relatively few errors. Analysis
mistakes tend to flag many errors and have few successful checks.
(Reasonably) generic static analysis for bug finding
-
MECA: an Extensible, Expressive System and Language for Statically Checking Security Properties (PDF)
Junfeng Yang, Ted Kremenek, Yichen Xie, and Dawson Engler. Proceedings of
the 10th ACM conference on Computer and communication security (ACM CCS),
2003. Flexible annotations plus methods for propagating them in a way
that does not require much manual labor nor kill you with false positives.
-
RacerX: Effective, Static Detection of Race Conditions and Deadlocks
(postscript)
(PDF)
Dawson Engler and Ken Ashcraft. Appeared in SOSP 2003. Has about 20 tricks
or so (unlockset analysis is the coolest) to help find racers and deadlocks.
Slides are available:
PS,
PDF,
PPT.
-
ARCHER: Using Symbolic, Path-sensitive Analysis to Detect Memory Access Errors
(postscript)
(PDF)
Yichen Xie, Andy Chou, and Dawson Engler.
To appear in FSE 2003.
-
Using Redundancies to Find Errors (Award Paper)
(postscript)
(PDF)
Yichen Xie and Dawson Engler
To appear in TSE. Finds funny bugs by
looking for redundant operations (dead code, unused assignments, etc.).
From empirical measurements, code with such redundant errors is 50-100%
more likely to have hard errors. Also describes how to check for
redundancies to find holes in specifications.
It is an expanded version of an identically titled award
paper
(postscript)
(PDF) that appeared in Foundations of
Software Engineering (FSE) 2002.
-
An Empirical Study of Operating Systems Errors
(postscript)
(PDF)
,
Andy Chou, Junfeng Yang, Benjamin Chelf, Seth Hallem, and Dawson Engler
Appeared in SOSP 01.
Metacompilation (MC)
-
A System and Language for Building System-Specific, Static Analyses
(postscript)
(PDF),
Seth Hallem, Benjamin Chelf, Yichen Xie, and Dawson Engler
Appeared in PLDI 2002.
The best description of our MC system. Focuses on
interprocedural analysis, ranking, and simple path sensitivity
to suppress false paths.
-
Using Programmer-Written Compiler Extensions to Catch Security Holes
(postscript)
(PDF)
Ken Ashcraft and Dawson Engler
Appeared in IEEE Security and Privacy 2002. Uses metacompilation
extensions to find over 100 security holes in Linux and BSD.
-
How to Write System-specific, Static Checkers in Metal
(postscript)
(PDF),
Benjamin Chelf, Seth Hallem, and Dawson Engler
Invited paper for PASTE'02.
A simple tutorial on writing extensions. Does not introduce new
ideas, but gives concrete examples that helps understand some of
the previous papers.
-
Using Meta-level Compilation to Check FLASH Protocol Code
(postscript)
(PDF)
,
Andy Chou, Benjamin Chelf, Dawson Engler, and Mark Heinrich
This paper appeared in ASPLOS 2000.
It is a case study of using MC to check the Stanford FLASH machine's
embedded cache coherence protocol code.
-
Interface Compilation: Steps toward Compiling Program Interfaces as
Languages,
by Dawson R. Engler
This paper appeared in Transactions on Software Engineering.
It proposes and explores the metaphor of metacompilation, and
provides the beginnings of a programming methodology for exploiting it.
It is a more developed version of the paper below.
-
Incorporating application semantics and control into compilation,
by Dawson R. Engler
This paper appeared in the First Conference on Domain-specific Languages
in 1997.
It contains much of the ideology that drove the later
MC work.
For an abstract, click here.
Exokernel Papers
-
Exokernel: An Operating System Architecture for Application-Level Resource Management,
Dawson R. Engler, M. Frans Kaashoek and James W. O'Toole
This paper appeared in SOSP95 ( slides are
available here). It is the most detailed description of
what an exokernel is, but is rather dense.
A more long-winded version of this paper is available in the form of
my masters thesis (though the material is
about 6 months behind that in the SOSP paper).
-
Fast and flexible application-level networking on exokernel systems,
by Gregory R. Ganger, Dawson R. Engler, M. Frans
Kaashoek, Hector Brice~no, Russell Hunt, Thomas Pinckney. Appeared
in TOCS 20(1): 49-83 (2002)
-
Application Performance and Flexibility on Exokernel Systems,
M. Frans Kaashoek, Dawson R. Engler, Gregory R. Ganger,
H'ector M. Brice~no, Russell Hunt, David Mazi`eres, Thomas Pinckney,
Robert Grimm, John Jannotti, and Kenneth Mackenzie,
Proceedings of the Fifteenth ACM Symposium on
Operating Systems Principles (SOSP). October, 1997.
This paper validates the earlier SOSP paper, which was big on religion
and microbenchmarks and a bit light on application numbers.
It also has a cool use of deterministic Turing machines that lets an
exokernel trust applications to track what they own without understanding
how they do so. Slides are available
here.
-
Exterminate All Operating System Abstractions,
Dawson R. Engler and M. Frans Kaashoek
This is a flamy paper describing the philosophical underpinnings of the
exokernel approach. It appeared in HotOS-V
Dynamic code generation Papers
The following papers discuss several dynamic code generation systems.
-
Reverse-Engineering Instruction Encodings
by Wilson C. Hsieh, Dawson R. Engler, and Godmar Back. This paper
appeared in the proceedings of the 2001 USENIX Annual Technical
Conference. It uses a cool trick to automatically extract
instruction encodings from a system assembler. It's genesis was
the fact that writing a binary encoder for the x86 is an
unredeemably disgusting experience.
-
vcode: a retargetable, extensible, very fast dynamic code generation system,
by Dawson R. Engler
This paper
appeared in PLDI '96. Slides are available
here .
vcode is a portable system to generate executable code at runtime. It
generates code in approximately 10 instructions per generated instruction,
and is easily extendible by clients. A tutorial describing it can be
obtained by clicking here.
A beta version of the system is also
available.
To get on the vcode mailing list, please email engler@lcs.mit.edu.
Click
here for a bit more information.
-
`C and tcc: A Language and Compiler for Dynamic Code Generation
,
by Massimiliano Poletto, Wilson C. Hsieh, Dawson Engler, and M. Frans Kaashoek.
This paper appeared in ACM Transactions on Programming Languages and Systems, 21(2), March 1999. Reprinted in Taha, W. and T. Sheard: 1997, `Multi-Stage Programming with Explicit Annotations'. in (ACM, 1997), pp. 203--217.
-
`C: A Language for High-Level, Efficient, and Machine-independent
Dynamic Code Generation,
by Dawson R. Engler, Wilson C. Hsieh and M. Frans Kaashoek.
This paper appeared in POPL 1996. Slides are available
here.
-
tcc: A System for Fast, Flexible, and High-level Dynamic Code Generation
by Massimiliano Poletto, Dawson R. Engler and M. Frans Kaashoek.
This paper appeared in PLDI 1997. A preliminary release of the tcc
compiler can be obtained from
the tickc homepage.
-
tcc: A Template-Based Compiler for `C
by Massimiliano Poletto, Dawson R. Engler and M. Frans Kaashoek.
This paper appeared in WCSSS 1996. A preliminary release of the tcc
compiler can be obtained from
the tickc homepage.
-
DCG: An Efficient, Retargetable Dynamic Code Generator,
by Dawson R. Engler and Todd A. Proebsting.
This paper appeared in ASPLOS 94 (
click here for slides). The system it describes
has been superseded by vcode.
Networking papers
The following three papers discuss a network
system built on top of the exokernel.
-
ASHs: Application-specific handlers for high-performance messaging,
by
Deborah A. Wallach,
Dawson R. Engler,
and
M. Frans Kaashoek.
The paper appeared in
ACM Communication Architectures, Protocols, and Applications (SIGCOMM '96).
Portions of its implementation are more thoroughly explored in:
-
Design and Implementation of a Modular, Flexible, and Fast System for
Dynamic Protocol Composition,
by
Dawson R. Engler,
Deborah A. Wallach,
and
M. Frans Kaashoek.
-
DPF: Fast, Flexible Message Demultiplexing using Dynamic Code Generation,
by
Dawson R. Engler
and
M. Frans Kaashoek.
The paper appeared in
ACM Communication Architectures, Protocols, and Applications (SIGCOMM '96).
Slides from the talk are available here.
A beta release of the system is available here.
DPF is described more fully
here.
.
engler WHERE stanford DOM edu