Data Management
Lecture Notes for CS349W
Fall Quarter 2008
John Ousterhout
- Traditional GUI applications:
- Application-specific file format on disk
- Often binary (not a good idea)
- Recently, movement towards XML
- Application reads file into memory during startup
- Data kept entirely in memory while application runs
- Application-specific data structures provide fast access
- Data in memory is volatile: can be lost after a crash
- Making data durable:
- Explicit user "save" operation: regenerate entire disk file
- Auto-save at regular intervals
- Undo-redo log:
- Flush to disk after every operation
- Delete log during normal application exit
- Check for log during startup: presence of log means
application didn't exit normally
- Replay log to recover work lost during crash
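- A minimal sketch of this undo/redo-log scheme (Python; the log file name,
  record format, and apply_op hook are illustrative assumptions, not from
  the lecture):

      import json, os

      LOG_PATH = "app.oplog"            # hypothetical log file name

      def log_operation(op):
          """Append one operation to the log and flush it to disk."""
          with open(LOG_PATH, "a") as f:
              f.write(json.dumps(op) + "\n")
              f.flush()
              os.fsync(f.fileno())      # force the record onto disk

      def recover(apply_op):
          """On startup: a leftover log means the last run crashed; replay it."""
          if not os.path.exists(LOG_PATH):
              return                    # clean exit last time, nothing to do
          with open(LOG_PATH) as f:
              for line in f:
                  apply_op(json.loads(line))   # redo each logged operation

      def clean_exit():
          """On normal exit: regenerate the disk file, then delete the log."""
          # ... write the application's normal save file here ...
          if os.path.exists(LOG_PATH):
              os.remove(LOG_PATH)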
- Problems with this approach for Web applications:
- Designed for single-user applications
- Stateful (vs. stateless trend for Web applications)
- How does this scale over multiple servers?
- How does it handle concurrency from multiple users?
- Web applications today use relational databases for back-end storage:
- Automatically get ACID properties (atomicity, consistency, isolation,
  durability): data survives crashes, concurrent access by multiple
  users is handled safely, etc.
- Relational model is general enough to support virtually anything.
- Web servers remain stateless:
- Each request executes database queries to retrieve the data it needs.
- If a request makes updates, they are committed to the database
  by the end of the request.
- Easily scales to multiple servers, each communicating with the
database via a network.
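- A rough sketch of one such stateless request handler (Python, with the
  standard sqlite3 module standing in for the RDBMS; the table, columns,
  and file name are made up):

      import sqlite3

      def handle_request(user_id, new_email=None):
          """One Web request: open a connection, query, maybe update, commit.
          No state survives between requests; everything lives in the database."""
          conn = sqlite3.connect("app.db")        # hypothetical database
          try:
              row = conn.execute(
                  "SELECT name, email FROM users WHERE id = ?", (user_id,)
              ).fetchone()                        # fetch the data this request needs
              if new_email is not None:
                  conn.execute(
                      "UPDATE users SET email = ? WHERE id = ?",
                      (new_email, user_id))
              conn.commit()                       # updates are durable before we respond
              return row
          finally:
              conn.close()                        # server keeps no per-user state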
- Problems with RDBMSs: performance and scalability
- Performance:
- RDBMS may have to fetch data from disk for every request.
- Possible solution: caching in main memory
- By database itself
- By special caching service such as memcached
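- A sketch of the usual cache-aside pattern with memcached (assuming the
  python-memcached client; the key scheme and 30-second expiry are
  arbitrary choices):

      import memcache

      mc = memcache.Client(["127.0.0.1:11211"])   # memcached server address

      def get_user(user_id, db_lookup):
          """Check the cache first; fall back to the database on a miss."""
          key = "user:%d" % user_id               # hypothetical key scheme
          user = mc.get(key)
          if user is None:                        # cache miss
              user = db_lookup(user_id)           # expensive RDBMS query
              mc.set(key, user, time=30)          # cache the result for 30 seconds
          return user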
- Database storage structures and queries are very flexible and
general, but not optimized for any particular application or
algorithm:
- Pointer-based structures (lists, trees, etc.) seem
particularly difficult to implement efficiently
- RDBMSs grew up in a world of forms, where a response time of
  0.5-5 seconds was fine and complex operations could take much
  longer
- Highly interactive applications need response times in the 10's of
  milliseconds; is this beyond the capability of RDBMSs?
- ElectricCommander experience:
- Data retrieval time for a single request is almost always >= 250 ms.
- Complex requests take 1-30 seconds.
- Significant effort spent tuning queries whose response times were
  >> 10 seconds.
- Example: scrolling the jobs page.
- Scalability: as the number of front-end servers scales, the
throughput of the RDBMS must scale to match
- If the RDBMS runs on a single machine, it will eventually
become the bottleneck
- RDBMSs don't appear to scale across networked machines very
well: perhaps 5-10 machines, not 100-1000?
- The relational model tends to fragment data across multiple
rows and tables, making it hard to achieve locality
- Every server involved in every request?
- In practice, every large-scale Web application has scalability
problems with its database backend (eBay, Facebook, Amazon, etc.)
- Partition database manually into multiple instances (a sketch of this
  "sharding" follows the Facebook example below)
- Hard to manage
- Example: in early 2008 Facebook had 100,000 logical databases on
1800 MySQL instances to support 35000 page views/sec (doubling
every few months)
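- A bare-bones sketch of such manual partitioning ("sharding") in Python;
  the instance list and hash-on-user-id scheme are illustrative, with
  sqlite3 files standing in for MySQL servers:

      import sqlite3

      # One connection target per database instance.
      SHARDS = ["shard0.db", "shard1.db", "shard2.db", "shard3.db"]

      def shard_for(user_id):
          """Map an integer user id to one instance; all of that user's rows
          live there (real systems use a stable hash or a directory service)."""
          return SHARDS[user_id % len(SHARDS)]

      def load_profile(user_id):
          conn = sqlite3.connect(shard_for(user_id))
          try:
              return conn.execute(
                  "SELECT * FROM profiles WHERE user_id = ?", (user_id,)
              ).fetchone()
          finally:
              conn.close()

  The hard parts are what the sketch omits: queries that span shards,
  rebalancing when one shard fills up, and operating 1800 instances.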
- There seems to be general agreement that RDBMSs aren't working well
for large-scale Web applications, as well as talk that RDBMSs are
running out of steam overall.
- A technology that has passed its peak?
- If so, what will replace it?
- Alternatives beginning to appear:
- Amazon Dynamo: key-value store
- Google BigTable: distributed storage for structured data
- Some thoughts on the ideal storage system:
- Scalability:
- Based on commodity parts (e.g., 1RU rackmounts)
- Scale to 1000's of networked servers
- Should be main-memory based:
- Cost today: $100/GByte marginal cost (cheaper than disk 10 years ago)
- 1000 servers @ 32 GBytes each = 32 TBytes total:
- Example estimate: Amazon
- 400M orders/year? ($4B revenue per quarter / $40 average order size)
- 1000 bytes/order? (wild guess)
- 400 GBytes of order data per year
- Example estimate: United Airlines
- 200M passengers/year? (3,500 flights/day?, 150 passengers/flight?)
- 1000 bytes/passenger/flight? (wild guess)
- 200 GB passenger data/year
- Counter-example: Facebook photos
- Photo uploads/day: 24M
- Size of photo: 100K-1Mbyte?
- Total photo bytes/year: 1000-10000 Terabytes
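- The back-of-envelope arithmetic behind these estimates, made explicit
  (Python; all inputs are just the guesses above):

      # Amazon: order records are tiny.
      print(400e6 * 1000 / 1e9)            # 400M orders * 1 KB -> ~400 GB/year

      # United Airlines: passenger records are tiny.
      passengers = 3500 * 150 * 365        # flights/day * seats * days -> ~190M
      print(passengers * 1000 / 1e9)       # ~190 GB/year

      # Facebook photos: the counter-example.
      photos = 24e6 * 365                  # ~8.8B photos/year
      print(photos * 100e3 / 1e12)         # at 100 KB each -> ~900 TB/year
      print(photos * 1e6 / 1e12)           # at 1 MB each   -> ~9000 TB/year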
- Change in philosophy:
- Data "lives" in main memory
- Disk is for backup only, not for normal accesses.
Use undo/redo log for durability?
- Build in replication across servers (a sketch combining these ideas
  appears below, just before "Latency")
- Design system for delivering data at memory speeds
- Performance will be limited by server's basic network packet
processing costs
- Need some way to induce locality:
- Can't afford for every Web page to reference all 1000 data servers
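- A toy sketch combining the ideas above: data served from memory, every
  write appended to a local log for durability, and copied to backup
  servers (Python; replication is simulated with in-process objects
  instead of a real network, and all names are made up):

      import json, os

      class MemoryStore:
          """Serves reads from RAM; writes go to a log and to backups."""
          def __init__(self, log_path, backups=()):
              self.data = {}                   # the primary copy lives in memory
              self.log_path = log_path
              self.backups = backups           # other MemoryStore instances
              if os.path.exists(log_path):     # recover after a crash
                  with open(log_path) as f:
                      for line in f:
                          key, value = json.loads(line)
                          self.data[key] = value

          def get(self, key):
              return self.data.get(key)        # memory speed: no disk on reads

          def put(self, key, value):
              self.data[key] = value
              with open(self.log_path, "a") as f:   # disk is for durability only
                  f.write(json.dumps([key, value]) + "\n")
                  f.flush()
                  os.fsync(f.fileno())
              for b in self.backups:           # replicate so one server's loss is safe
                  b.put(key, value)

  In a real system the backups would live on other machines and a write
  would not complete until the replicas acknowledged it.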
- Latency:
- For widely available Web applications, latency is an issue regardless
  of back-end storage:
- Still have to communicate between browser and Web server.
- Cross-country round trip time is 60-70ms best case (speed of light).
- Can reduce latency with distributed data centers
- Techniques for hiding latency:
- Move functionality to browser
- Only works if all relevant data is on the browser
- Best for micro-interactions such as dragging and dropping
(lots of interaction, small amount of data)
- Prefetch data (e.g., Google Maps)
- Increases server load to reduce latency
- Works best if space of likely next data accesses is small
- In general, across all of Computer Systems, latency is a bigger problem
than bandwidth.