Raft Project 1 Review/Discussion (Winter 2019)

Lecture Notes for CS 190
Winter 2019
John Ousterhout

Deep Classes

Small project, so not many opportunities for deep classes.
Overall goal: make as much of the code as possible general-purpose
Communication subsystem
- Most common problem: not encapsulating as much functionality as possible, resulting in shallower classes and information leakage.
- Asynchronous I/O vs. multiple threads?
- UDP vs. TCP? Points in favor of UDP:
  - Message-based, not streams; streams are awkward, messages make async I/O easier
  - No connections to worry about opening/closing
  - TCP's reliability is neither necessary nor sufficient (Raft retries RPCs anyway, and TCP cannot mask a crashed server)
  - Only problem with UDP: limited message length
- Messages: one struct for all, or separate structs for each type?
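To make the "one struct for all" option concrete, here is a minimal sketch of a single tagged message encoded into a fixed-size datagram. The names (MsgType, Message, encode, decode) and the field set are assumptions for illustration, not taken from any project; log entries are left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical message kinds; log entries and other fields are omitted.
enum class MsgType : uint8_t {
    RequestVote = 1, VoteReply = 2, AppendEntries = 3, AppendReply = 4
};

// One struct shared by every message type. Fields that a given type does
// not use are simply left at zero.
struct Message {
    MsgType type;
    uint32_t sender;   // id of the sending server
    uint64_t term;     // sender's current term
    uint8_t granted;   // meaningful only in replies
};

constexpr std::size_t kEncodedSize = 1 + 4 + 8 + 1;

// Encode in a fixed big-endian layout so the bytes do not depend on the
// host's struct padding or byte order.
std::vector<uint8_t> encode(const Message& m) {
    std::vector<uint8_t> out;
    out.reserve(kEncodedSize);
    out.push_back(static_cast<uint8_t>(m.type));
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<uint8_t>(m.sender >> shift));
    for (int shift = 56; shift >= 0; shift -= 8)
        out.push_back(static_cast<uint8_t>(m.term >> shift));
    out.push_back(m.granted);
    return out;
}

// Returns false (leaving *out untouched) if the datagram is malformed.
bool decode(const uint8_t* data, std::size_t len, Message* out) {
    if (len != kEncodedSize) return false;
    if (data[0] < 1 || data[0] > 4) return false;
    Message m{};
    m.type = static_cast<MsgType>(data[0]);
    for (int i = 0; i < 4; i++) m.sender = (m.sender << 8) | data[1 + i];
    for (int i = 0; i < 8; i++) m.term = (m.term << 8) | data[5 + i];
    m.granted = data[13];
    *out = m;
    return true;
}
```

The tradeoff: one struct keeps encode/decode in a single place, at the cost of fields that are meaningless for some message types.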
Persistent state (small class)
- Most projects tied to Raft state
- Exceptions: PersistentInt (but problematic for other reasons)
- I used PersistentString in my implementation
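A rough sketch of what a general-purpose PersistentString might look like. This is not the implementation referred to above; the interface (get/set), the ".tmp" suffix, and the write-then-rename approach are assumptions. It relies on POSIX rename() replacing the target atomically, and it does not fsync, so it is not fully crash-safe as written.

```cpp
#include <cstdio>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical general-purpose class: stores one string durably in a file.
// It knows nothing about Raft; callers decide what the string means.
class PersistentString {
public:
    // Loads the existing value, or starts empty if the file is missing.
    explicit PersistentString(const std::string& path) : path_(path) {
        std::ifstream in(path_, std::ios::binary);
        if (in) {
            value_.assign(std::istreambuf_iterator<char>(in),
                          std::istreambuf_iterator<char>());
        }
    }

    const std::string& get() const { return value_; }

    // Write to a temporary file, then rename it over the old one, so a
    // reader never sees a half-written value. (No fsync in this sketch.)
    void set(const std::string& value) {
        const std::string tmp = path_ + ".tmp";
        {
            std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
            out.write(value.data(), static_cast<std::streamsize>(value.size()));
            out.flush();
            if (!out) throw std::runtime_error("cannot write " + tmp);
        }
        if (std::rename(tmp.c_str(), path_.c_str()) != 0) {
            throw std::runtime_error("cannot rename " + tmp + " to " + path_);
        }
        value_ = value;
    }

private:
    std::string path_;
    std::string value_;
};
```

Because the class stores an opaque string, the Raft code decides how to pack term and vote into it; the storage class stays reusable.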
State machine
- Separate code for each state
  - Results in a lot of code duplication
  - Awkward: state changes in the middle of processing a message.
- Separate code for each message type
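A sketch of the "separate code for each message type" organization, under assumed names (Server, Role, AppendEntries, AppendReply). Log consistency checks are omitted. The point is that a role change in the middle of handling a message is just an assignment inside one function, rather than a jump between per-state code paths.

```cpp
#include <cstdint>

enum class Role { Follower, Candidate, Leader };

// Hypothetical, stripped-down message types (no log entries).
struct AppendEntries { uint64_t term; uint32_t leaderId; };
struct AppendReply   { uint64_t term; bool success; };

class Server {
public:
    // One handler per message type; differences between roles are
    // handled inside the handler.
    AppendReply handleAppendEntries(const AppendEntries& msg) {
        if (msg.term < currentTerm_) {
            return {currentTerm_, false};   // stale leader: reject
        }
        // The sender is a legitimate leader for msg.term. Step down if
        // needed and keep going in the same function.
        if (msg.term > currentTerm_ || role_ != Role::Follower) {
            currentTerm_ = msg.term;
            role_ = Role::Follower;
        }
        leaderId_ = msg.leaderId;
        return {currentTerm_, true};
    }

    // Start an election: new term, candidate role.
    void becomeCandidate() {
        ++currentTerm_;
        role_ = Role::Candidate;
    }

    Role role() const { return role_; }
    uint64_t currentTerm() const { return currentTerm_; }
    uint32_t leaderId() const { return leaderId_; }

private:
    Role role_ = Role::Follower;
    uint64_t currentTerm_ = 0;
    uint32_t leaderId_ = 0;
};
```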

Synchronization

Several projects had a scary number of mutexes
First, ask: what concurrency is needed? What are we trying to accomplish?
Will the choice of synchronization mechanism have a big impact on performance?
What is the simplest synchronization mechanism that will meet needs?
In general, use the coarsest-grain locking that will meet your performance and other needs.
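As an illustration of coarse-grain locking, the sketch below protects all of a server's election state with a single mutex that every public method acquires. The class and method names (RaftState, startElection, recordVote) are hypothetical. For a small Raft server this is probably enough; finer-grain locks would only be justified by measured contention.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical container for election state. One mutex covers everything,
// so there is no lock ordering to get wrong.
class RaftState {
public:
    // Begin an election: bump the term and vote for ourselves.
    uint64_t startElection(uint32_t self) {
        std::lock_guard<std::mutex> guard(mutex_);
        ++currentTerm_;
        votedFor_ = self;
        votes_ = 1;
        return currentTerm_;
    }

    // Count a granted vote. Returns true once a majority of clusterSize
    // servers (including ourselves) has voted for us in the current term.
    bool recordVote(uint64_t term, uint32_t clusterSize) {
        std::lock_guard<std::mutex> guard(mutex_);
        if (term != currentTerm_) return false;   // vote from an old election
        ++votes_;
        return votes_ > clusterSize / 2;
    }

    uint64_t currentTerm() {
        std::lock_guard<std::mutex> guard(mutex_);
        return currentTerm_;
    }

private:
    std::mutex mutex_;          // protects every field below
    uint64_t currentTerm_ = 0;
    uint32_t votedFor_ = 0;
    uint32_t votes_ = 0;
};
```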

Timers

Two ways to implement timers
- Separate timer mechanism in its own thread(s)
- Timeouts on I/O operations
Starting and stopping timers is awkward
- Can use condition variables with timeouts
Separate timers for elections and heartbeats can also be awkward
- Must keep track of which is running
Awkward interfaces: relative vs. absolute time
My opinion: cleanest to separate the timers from the messaging interface.
Can't use second-granularity timers: election and heartbeat timeouts are fractions of a second
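One possible shape for timer bookkeeping that is kept separate from the messaging code. The sketch stores absolute deadlines internally and hands out a relative timeout, which is what poll() or a condition-variable wait typically wants. The class name Timers and its methods are assumptions; the current time is passed in so the logic can be tested without sleeping.

```cpp
#include <algorithm>
#include <chrono>

using Clock = std::chrono::steady_clock;
using Millis = std::chrono::milliseconds;

// Hypothetical timer bookkeeping for one server. A deadline equal to
// time_point::max() means "not running".
class Timers {
public:
    void resetElection(Clock::time_point now, Millis timeout) {
        electionDeadline_ = now + timeout;
    }
    void resetHeartbeat(Clock::time_point now, Millis interval) {
        heartbeatDeadline_ = now + interval;
    }
    void stopElection()  { electionDeadline_ = Clock::time_point::max(); }
    void stopHeartbeat() { heartbeatDeadline_ = Clock::time_point::max(); }

    bool electionExpired(Clock::time_point now) const {
        return now >= electionDeadline_;
    }
    bool heartbeatDue(Clock::time_point now) const {
        return now >= heartbeatDeadline_;
    }

    // Relative time until the earliest running deadline: Millis::max() if
    // nothing is running, zero if a deadline has already passed. The
    // result is truncated to whole milliseconds.
    Millis untilNext(Clock::time_point now) const {
        Clock::time_point next = std::min(electionDeadline_, heartbeatDeadline_);
        if (next == Clock::time_point::max()) return Millis::max();
        if (next <= now) return Millis(0);
        return std::chrono::duration_cast<Millis>(next - now);
    }

private:
    Clock::time_point electionDeadline_ = Clock::time_point::max();
    Clock::time_point heartbeatDeadline_ = Clock::time_point::max();
};
```

A main loop could pass untilNext(Clock::now()) as its I/O timeout, then check electionExpired and heartbeatDue after each wakeup.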

Error Handling

Common problems:
- Not enough error checks
- Errors not handled in the best way
- Not enough info in log messages
In general, unsafe to assume anything about information coming from outside the process
- Contents of files holding persistent data (e.g. std::stoi throws if the text is not a valid number).
- Message formats
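For the std::stoi case, a minimal sketch of defensive parsing. std::stoi throws std::invalid_argument or std::out_of_range on bad input, and it silently accepts trailing garbage ("12abc" parses as 12) unless the position argument is checked. The function name parsePersistedInt and the exception type CorruptDataError are hypothetical; the rule that the value must be non-negative is an assumption about what is being stored (e.g. a term).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical exception type for unusable persistent data.
class CorruptDataError : public std::runtime_error {
public:
    explicit CorruptDataError(const std::string& what)
        : std::runtime_error(what) {}
};

// Parse a non-negative integer that was read from a persistent file.
// The whole string must be consumed; the caller strips any newline first.
int parsePersistedInt(const std::string& text) {
    std::size_t used = 0;
    int value = 0;
    try {
        value = std::stoi(text, &used);
    } catch (const std::invalid_argument&) {
        throw CorruptDataError("persistent value is not a number: '" + text + "'");
    } catch (const std::out_of_range&) {
        throw CorruptDataError("persistent value out of range: '" + text + "'");
    }
    if (used != text.size()) {
        throw CorruptDataError("trailing garbage in persistent value: '" + text + "'");
    }
    if (value < 0) {
        throw CorruptDataError("persistent value is negative: '" + text + "'");
    }
    return value;
}
```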
Must check results of every kernel call
What to do when an error occurs? First, think about how it is likely to be handled. See Example E1.
Don't exit in low-level methods
- Limits generality
- Bad for unit testing
- Instead, throw exception
Define specific exception types: don't just use std::exception (consider likely usage)
All threads should have top-level exception handlers: catch, log, exit
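A sketch that puts these points together: a low-level method checks its kernel calls and throws a specific exception type instead of exiting, and a top-level handler catches and logs. StorageError, readFile, and runThread are hypothetical names. To keep the sketch testable, the handler returns a failure status where a real thread body would probably call exit.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <exception>
#include <stdexcept>
#include <string>

// Specific exception type: carries the errno so callers can react to it.
class StorageError : public std::runtime_error {
public:
    StorageError(const std::string& what, int err)
        : std::runtime_error(what + ": " + std::strerror(err)), err_(err) {}
    int error() const { return err_; }
private:
    int err_;
};

// Low-level method: every call that can fail is checked, and failure is
// reported by throwing rather than by exiting the process.
std::string readFile(const std::string& path) {
    std::FILE* f = std::fopen(path.c_str(), "rb");
    if (f == nullptr) {
        int err = errno;   // save before anything else can change it
        throw StorageError("cannot open " + path, err);
    }
    std::string data;
    char buf[4096];
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof(buf), f)) > 0) data.append(buf, n);
    bool failed = std::ferror(f) != 0;
    int err = errno;
    std::fclose(f);
    if (failed) throw StorageError("cannot read " + path, err);
    return data;
}

// Top-level handler for a thread: catch, log, report failure.
int runThread(const std::string& path) {
    try {
        readFile(path);
        return 0;
    } catch (const StorageError& e) {
        std::fprintf(stderr, "fatal storage error: %s\n", e.what());
    } catch (const std::exception& e) {
        std::fprintf(stderr, "fatal error: %s\n", e.what());
    }
    return 1;
}
```

A unit test can now provoke the error and assert on it, which would be impossible if readFile called exit.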
Logging is essential:
- Log as often as you can possibly afford
- Include as much information in the log message as possible (Example E2)
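As a rough illustration of an informative log message (this is not Example E2), the sketch below reports who failed, in which term, doing what, to whom, and why. The function name and the exact wording are assumptions.

```cpp
#include <cstring>
#include <sstream>
#include <string>

// Hypothetical helper: build a log line for a failed send. Compare with
// an uninformative message such as "send failed".
std::string describeSendFailure(int serverId, unsigned long long term,
                                const std::string& peer, int err) {
    std::ostringstream out;
    out << "server " << serverId << " (term " << term << "): sendto "
        << peer << " failed: " << std::strerror(err)
        << " (errno " << err << ")";
    return out.str();
}
```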

Performance

It's easy to end up with code that's 5x slower than it needs to be
Suggestion:
- Learn what things are fast and slow.
- Look for code that's simple but uses approaches that are inherently fast
Examples of things that are slow:
- Object copying (especially messages, which can be large)
- Rewriting persistent storage on every message
- Opening a new socket for each message
- Storage allocation (e.g. for buffer copies)
- Unordered map where a vector would do (hashing costs more than indexing)
- String parsing and formatting
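A small sketch of avoiding the first and fourth items (object copying and storage allocation) for large messages: read through a const reference, and move entries into the log instead of copying them. The types and function names here are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct LogEntry {
    unsigned long long term;
    std::string command;
};

// Hypothetical message type; entries can be large.
struct AppendEntriesMsg {
    unsigned long long term;
    std::vector<LogEntry> entries;
};

// Read-only access: a const reference means nothing is copied.
std::size_t payloadBytes(const AppendEntriesMsg& msg) {
    std::size_t total = 0;
    for (const LogEntry& e : msg.entries) total += e.command.size();
    return total;
}

// Take ownership of the message's entries by moving them into the log.
// Each command string's buffer is handed over instead of being copied,
// and reserve() makes room for all the new entries up front.
void appendToLog(std::vector<LogEntry>& log, AppendEntriesMsg&& msg) {
    log.reserve(log.size() + msg.entries.size());
    for (LogEntry& e : msg.entries) log.push_back(std::move(e));
    msg.entries.clear();
}
```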

Comments and Naming

Interface comments:
- In header file vs. code file
- Private vs. public methods
- Multiple headers (interface vs. implementation?)
Documentation: see examples D*
Bad variable names: see examples V*