Demand Paging
Lecture Notes for CS 140
Winter 2012
John Ousterhout
- Readings for this topic from Operating System Concepts:
Sections 9.1-9.2, Sections 9.4-9.5.
- Demand paging: not all of a process's virtual
address space needs to be loaded in main memory at any
given time. Each page can be either:
- In memory (physical page frame)
- On disk (backing store)
Page Faults
- What happens when a process references a page that is
in the backing store?
- For pages in the backing store, clear the exists
bit in the page table entries.
- If exists is not set, then a reference to the page
causes a trap to the operating system.
- These traps are called page faults.
- To handle a page fault, the operating system:
- Finds a free page frame in memory
- Reads the page in from backing store to the page frame
- Updates the page table entry, setting exists
- Resumes execution of the thread
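- The four steps above can be sketched as a toy simulation in Python
(all class and method names here are illustrative, not part of any real
kernel; a real handler runs in the kernel and manipulates hardware
page tables):

```python
# Toy demand-paging simulation: page-table entries with an "exists" bit,
# a backing store, and a fault handler that loads pages on first touch.
# All names are illustrative; a real handler manipulates hardware PTEs.

class PageTableEntry:
    def __init__(self):
        self.exists = False   # is the page present in a physical frame?
        self.frame = None     # physical frame number, if present

class ToyVM:
    def __init__(self, num_frames, backing_store):
        self.free_frames = list(range(num_frames))
        self.memory = {}                    # frame number -> page contents
        self.backing_store = backing_store  # page number -> contents "on disk"
        self.page_table = {}                # page number -> PageTableEntry

    def access(self, page):
        pte = self.page_table.setdefault(page, PageTableEntry())
        if not pte.exists:                  # hardware would trap here
            self.handle_fault(page, pte)    # the "page fault"
        return self.memory[pte.frame]

    def handle_fault(self, page, pte):
        frame = self.free_frames.pop()                 # 1. find a free frame
        self.memory[frame] = self.backing_store[page]  # 2. read from backing store
        pte.frame = frame                              # 3. update the page table
        pte.exists = True                              #    entry, setting exists
        # 4. execution resumes (access() retries the lookup)
```

For example, `ToyVM(2, {0: "code", 1: "data"}).access(0)` faults once,
loads page 0 into a free frame, and returns its contents; a second
access to page 0 hits in memory without faulting.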
- How does the OS figure out which page generated the fault?
- x86: hardware saves the virtual address that caused the fault
(CR2 register)
- On some platforms the OS gets only the address of the faulting
instruction; it must simulate the instruction and try every
address it references to find the one that generated the fault
- Restarting process execution after a page fault is tricky, since
the fault may have occurred in the middle of an instruction.
- If instructions are idempotent, just restart the
faulting instruction (hardware saves instruction address
during page fault).
- Non-idempotent instructions are more difficult to restart:
- Without hardware support it may be impossible to resume
a process safely after a page fault. Hardware must keep
track of side effects:
- Undo all side effects during a page fault?
- Save info about side effects, use it to restart
instruction "in the middle"
- Instruction set design for restartability:
- Simple instructions are easier to restart.
- Delay side effects until after all memory references.
- Difficult to add restartability after the fact.
Page Fetching
- Once the basic page fault mechanism is working, the OS has
two scheduling decisions to make:
- Page fetching: when to bring pages into memory.
- Page replacement: which pages to throw out of memory.
- Overall goal: make physical memory look larger than it is.
- Locality: most programs spend most of their time
using a small fraction of their code and data.
- Keep in memory the information that is being used.
- Keep unused information on disk.
- Ideally: paging produces a memory system with the performance
of main memory and the cost/capacity of disk!
- Most modern OSes use demand fetching:
- Start process with no pages loaded, don't load a page into
memory until it is referenced.
- The pages for a process divide into three groups:
- Read-only code pages: read from the executable file
when needed.
- Initialized data pages: on first access, read from
executable file. Once loaded, save to a separate
paging file since contents may have changed.
- Uninitialized data pages: on first access, just clear
memory to all zeros. When paging out, save to the
paging file.
- Prefetching: try to predict when pages will be needed
and load them ahead of time to avoid page faults.
- Requires predicting the future, so hard to do.
- One approach: when taking a page fault, read many
pages instead of just one (wins if program accesses
memory sequentially).
Page Replacement
- Once all of memory is in use, the OS will need to throw out one
page each time there is a page fault.
- Random: pick any page at random (works surprisingly well!)
- FIFO: throw out the page that has been in memory longest.
- MIN: the optimal algorithm: throw out the page that will not
be referenced for the longest time in the future. Requires us
to predict the future, so it cannot be implemented exactly, but
it serves as a yardstick for comparing other algorithms.
- Least Recently Used (LRU): use the past to predict the
future.
- Strange but true: for some replacement policies, such
as FIFO, adding more memory can sometimes cause paging
performance to get worse.
This is called "Belady's Anomaly" after Les Belady,
who was the first person to notice it.
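- Belady's Anomaly is easy to reproduce. A minimal FIFO simulator in
Python (illustrative code, not from the notes), run on the classic
reference string 1,2,3,4,1,2,5,1,2,3,4,5:

```python
# FIFO page replacement: count the page faults for a given reference
# string and frame count. On this classic reference string, FIFO
# faults MORE with 4 frames than with 3 -- Belady's Anomaly.
from collections import deque

def fifo_faults(refs, num_frames):
    frames = deque()          # oldest resident page at the left
    faults = 0
    for page in refs:
        if page not in frames:
            faults += 1
            if len(frames) == num_frames:
                frames.popleft()      # evict the longest-resident page
            frames.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))   # 9 faults
print(fifo_faults(refs, 4))   # 10 faults: more memory, worse performance!
```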
- Implementing LRU: need hardware support to keep track of
which pages have been used recently.
- Perfect LRU?
- Keep a register for each page, store system clock
into that register on each memory reference.
- To choose a page for replacement, scan through all pages
to find the one with the oldest clock.
- Hardware costs would have been prohibitive in the
early days of paging; also, expensive to scan all
pages during replacement.
- In practice nobody implements perfect LRU. Instead, we
settle for an approximation that is efficient. Just find
an old page, not necessarily the oldest.
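- To make the cost concrete, here is what perfect LRU looks like in a
Python simulation (illustrative only; a software counter stands in
for the hardware clock register):

```python
# Perfect LRU: record a "clock" value on every memory reference; to
# replace, scan all resident pages for the oldest timestamp (the
# expensive part). Illustrative sketch, not a real implementation.
def lru_faults(refs, num_frames):
    last_used = {}            # resident page -> time of last reference
    faults = 0
    for time, page in enumerate(refs):
        if page not in last_used:
            faults += 1
            if len(last_used) == num_frames:
                # scan for the least recently used page
                victim = min(last_used, key=last_used.get)
                del last_used[victim]
        last_used[page] = time    # store the "clock" on each reference
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(lru_faults(refs, 3))   # 10 faults
print(lru_faults(refs, 4))   # 8 faults: unlike FIFO, more memory never hurts LRU
```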
- Clock algorithm (called the second-chance algorithm in
Silberschatz et al.): keep a reference bit for each page frame;
hardware sets the reference bit whenever the page is read or written.
To choose a page for replacement:
- Start with FIFO approach: cycle through pages in order
circularly.
- If the next page has been referenced, then don't replace it;
just clear the reference bit and continue to the next page.
- If the page has not been referenced since the last time we
checked it, then replace that page.
- Dirty bit: one bit for each page frame, set by
hardware whenever the page is modified. If a dirty page
is replaced, it must be written to disk before its
page frame is reused.
- The clock algorithm typically gives additional preference
to dirty pages. For example, if the reference bit for a
page is clear, but the dirty bit is set, don't replace this
page now, but clear the dirty bit and start writing the
page to disk.
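- A sketch of the basic clock algorithm in Python (illustrative code:
reference bits are set here in software where real hardware would set
them, and the dirty-bit refinement above is omitted for brevity):

```python
# Clock (second-chance) replacement: cycle through frames circularly;
# a set reference bit earns the page a second chance (clear the bit and
# move on); a clear bit makes the page the victim. Real implementations
# also consult a dirty bit before choosing a victim.
def clock_faults(refs, num_frames):
    frames = [None] * num_frames     # page resident in each frame slot
    ref_bit = [False] * num_frames
    hand = 0                         # the clock hand
    faults = 0
    for page in refs:
        if page in frames:
            ref_bit[frames.index(page)] = True   # hardware would set this
            continue
        faults += 1
        while ref_bit[hand]:         # referenced recently: second chance
            ref_bit[hand] = False
            hand = (hand + 1) % num_frames
        frames[hand] = page          # replace the un-referenced page
        ref_bit[hand] = True
        hand = (hand + 1) % num_frames
    return faults

print(clock_faults([1, 2, 3, 1, 2, 3], 3))   # 3: only the cold misses fault
```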
- Free page pool: some systems keep a small list of clean pages
that are available immediately for replacement.
- During replacement, take the page that has been in the
free pool the longest, then run the replacement algorithm
to add a new page to the free pool.
- Pages in the free pool have their exists bit off,
so any references to those pages cause a page fault.
- If a page fault occurs for a page in the free pool,
remove it from the free pool and put it back in service;
much faster than reading from disk.
- Provides an extra opportunity for recovery if we
make a poor page replacement decision.
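- The free-pool bookkeeping can be sketched in Python (illustrative
names; a real system tracks page frames and their metadata, not page
contents):

```python
# Free page pool sketch: evicted pages enter a FIFO pool with their
# exists bit cleared but contents intact. Replacement takes the oldest
# pooled page; a fault on a still-pooled page "reclaims" it without
# touching disk, giving a cheap recovery from a bad eviction choice.
from collections import deque

class FreePool:
    def __init__(self):
        self.pool = deque()          # (page, contents), oldest first

    def add(self, page, contents):
        self.pool.append((page, contents))

    def take_oldest(self):
        return self.pool.popleft()   # frame reused for a newly faulted page

    def reclaim(self, page):
        # Page fault on a pooled page: put it back in service, no disk read.
        for i, (p, contents) in enumerate(self.pool):
            if p == page:
                del self.pool[i]
                return contents
        return None                  # not pooled; must read from backing store
```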
- How to implement page replacement when there are multiple
processes running in the system?
- Global replacement: all pages from all processes
are lumped into a single replacement pool. Each process
competes with all the other processes for page frames.
- Per-process replacement: each process has a
separate pool of pages. A page fault in one process can
only replace one of that process's frames. This eliminates
interference from other processes.
- Unfortunately, per-process replacement creates a new
scheduling dilemma: how many page frames to allocate to each
process? If this decision is made incorrectly, it can result
in inefficient memory usage.
- Most systems use global replacement.