Virtual Memory

Lecture Notes for CS 140
Winter 2012
John Ousterhout

  • Readings for this topic from Operating System Concepts: Sections 8.1-8.7.
  • How can one memory be shared among several concurrent processes?
  • Simple uniprogramming (no sharing):
    • Highest memory holds OS.
    • Process is allocated memory starting at 0, up to the OS area.
    • When loading a process, just bring it in at 0.
    • Examples: early batch monitors where only one job ran at a time. It could corrupt the OS, which would be rebooted by an operator. Some early personal computers were similar.
  • Goals for sharing memory:
    • Multiprogramming: allow multiple processes to be memory-resident at once.
    • Transparency: no process should need to be aware of the fact that memory is shared. Each must run regardless of the number and/or locations of other processes in memory.
    • Isolation: processes mustn't be able to corrupt each other.
    • Efficiency (both of CPU and memory) shouldn't be degraded badly by sharing.
  • Load-time relocation:
    • Highest memory holds OS.
    • First process loaded at 0; others fill empty spaces.
    • When a process is loaded, relocate it so that it can run in its allocated memory area, similar to linking:
      • Linker outputs relocation records in executable files
      • OS modifies addresses when it loads process.
    • What are the problems with this approach?
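The relocation step above can be sketched in a few lines. This is a simplified, hypothetical executable format (not any real linker's): the image is a list of word-sized values, and each relocation record is just an offset into the image where an absolute address is stored.

```python
# Load-time relocation sketch (hypothetical executable format).
# The linker emits the image assuming it will be loaded at address 0,
# plus a list of offsets where absolute addresses appear in the image.

def load_process(image, reloc_records, load_base):
    """Return a copy of the image patched to run at load_base."""
    patched = list(image)
    for offset in reloc_records:
        patched[offset] += load_base  # adjust each absolute address
    return patched

# Word 2 holds an absolute address (it points to word 5 of the image).
image = [10, 20, 5, 30, 40, 99]
patched = load_process(image, reloc_records=[2], load_base=0x4000)
assert patched[2] == 0x4000 + 5
```

Note that once the image is patched, the addresses are burned in: the process can never be moved again without redoing the relocation, which is one of the problems the question above is after.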

Dynamic Memory Relocation

  • Instead of relocating a program statically when it is loaded, add hardware (memory management unit) that changes addresses dynamically during every memory reference.
  • Each address generated by a process (called a virtual address) is translated in hardware to a physical address. This happens during every memory reference.
  • Results in two views of memory, called address spaces:
    • Virtual address space is what the program sees
    • Physical address space is the actual allocation of memory

Base and Bounds Relocation

  • Two hardware registers:
    • Base: physical address corresponding to virtual address 0.
    • Bounds: virtual addresses >= this are invalid.
  • On each memory reference, virtual address is compared to the bounds register, then added to the base register to produce a physical address. A bounds violation results in a trap to the operating system.
  • Each process appears to have a completely private memory of size equal to the bounds register.
  • Processes are isolated from each other and OS.
  • No address relocation is necessary when a process is loaded.
  • Each process has its own base and bounds values, which are saved in the process control block.
  • OS runs with relocation turned off, so it can access all of memory (a bit in the processor status word controls relocation).
    • Must prevent users from turning off relocation or modifying the base and bound registers (another bit in PSW for user/kernel mode).
  • Problem: how does OS regain control once it has given it up?
  • Base & bounds is cheap (only 2 hardware registers) and fast: the add and compare can be done in parallel.
  • Problem with base and bounds relocation: only one contiguous region per program.
    • How to manage separate stack?
    • How to share memory between processes?
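The compare-and-add done by the hardware on each reference can be sketched as follows (a simulation, not real MMU code; register values are passed in as parameters):

```python
# Base & bounds translation sketch: compare, then add; a bounds
# violation traps to the operating system (modeled here as an exception).

class BoundsViolation(Exception):
    pass

def translate(vaddr, base, bounds):
    """Translate a virtual address using base/bounds registers."""
    if vaddr >= bounds:          # compare against the bounds register
        raise BoundsViolation(hex(vaddr))
    return base + vaddr          # add the base register

assert translate(0x10, base=0x8000, bounds=0x1000) == 0x8010
try:
    translate(0x1000, base=0x8000, bounds=0x1000)  # out of bounds: trap
    assert False
except BoundsViolation:
    pass
```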

Multiple segments

  • Each process is split among several variable-size areas of memory, called segments.
    • E.g. one segment for code, one segment for heap, one segment for stack.
  • Separate base and bound registers for each segment.
  • Also, add protection bit for each segment: read-write versus read-only.
  • Each memory reference must indicate a segment number and offset:
    • Top bits of address select segment, low bits give the offset within it. This is the most common approach, and the best.
    • Example: PDP-10 with high and low segments selected by high-order address bit.
    • Or, segment can be selected implicitly by the instruction (e.g. code vs. data, stack vs. data, or 8086 prefixes).
  • Segment table holds the bases and bounds for all the segments of a process.
  • Memory mapping procedure consists of table lookup + add + compare.
  • Advantages of segmentation:
    • More flexibility in managing virtual address spaces (e.g. code and stack)
    • Can swap segments to disk individually.
    • Can share segments between processes (e.g., shared code).
    • Can move segments to compact memory and eliminate fragmentation.
  • Problems with segmentation:
    • Each segment must be allocated contiguously, which results in memory fragmentation.
    • Fixed number of segments still creates limits.
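The lookup + add + compare sequence can be sketched as follows. The sizes here are assumptions for illustration (16-bit addresses, 2 segment-select bits); each segment-table entry holds a base, a bounds, and a read-only protection bit:

```python
# Segmentation translation sketch: top bits of the address select the
# segment, low bits are the offset; each segment has its own base,
# bounds, and protection bit.

SEG_BITS, OFFSET_BITS = 2, 14        # assumed: 16-bit addresses, 4 segments

def translate(vaddr, seg_table, write=False):
    seg = vaddr >> OFFSET_BITS                # segment number from top bits
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    base, bounds, read_only = seg_table[seg]  # table lookup
    if offset >= bounds:                      # compare
        raise Exception("segment bounds violation")
    if write and read_only:
        raise Exception("protection violation")
    return base + offset                      # add

seg_table = {0: (0x10000, 0x4000, True),   # code: read-only
             1: (0x20000, 0x2000, False)}  # heap: read-write
assert translate(0x0010, seg_table) == 0x10010
assert translate((1 << OFFSET_BITS) | 0x20, seg_table, write=True) == 0x20020
```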

Paging

  • Divide virtual and physical memory into fixed-sized chunks called pages. The most common size today is 4 Kbytes or 8 Kbytes.
  • For each process, a page table defines the base address of each of that process' pages along with read-only and existence bits.
  • Translation process: page number always comes directly from the address. Since page size is a power of two, no comparison or addition is necessary. Just do table lookup and bit substitution.
  • Easy to allocate: keep a free list of available pages and grab the first one. Easy to swap, since all pages are the same size, which is usually also the size of a disk block.
  • Problem: for modern machines, page tables can be very large:
    • Consider the x86-64 addressing architecture: with 64-bit addresses and 4096-byte pages, a single flat page table would need 2^52 entries.
    • Most processes are small, so most page table entries are unused.
    • Even large processes use their address space sparsely (e.g., code at the bottom, stack at the top)
  • Solution: multi-level page tables. Intel x86-64 addressing architecture:
    • 64-bit virtual addresses, but only the lower 48 bits are actually used.
    • 4 Kbyte pages: low-order 12 bits of virtual address hold offset within page.
    • 4 levels of page table, each indexed with 9 bits of virtual address.
    • Each page table fits in one page.
    • Can omit empty page tables.
  • Next problem: page tables are too large to load into fast memory in the relocation unit.
    • Page tables kept in main memory
    • Relocation unit holds base address for top-level page table
    • With x86-64 architecture, must make 4 memory references to translate a virtual address!

Translation Lookaside Buffers (TLBs)

  • Solution to page translation overhead: create a small hardware cache of recent translations.
    • Each cache entry stores the page number portion of a virtual address (36 bits for x86-64) and the corresponding physical page number (40 bits for x86-64).
    • Typical TLB sizes: 64-2048 entries.
    • On each memory reference, compare the page number from the virtual address with the virtual page numbers in every TLB entry (in parallel).
    • If there is a match, use the corresponding physical page number.
    • If no match, perform the full address translation and save the information in the TLB (replace one of the existing entries).
    • TLB "hit rates" typically 95% or more.
  • TLB complications:
    • When context switching, must invalidate all of the entries in the TLB (the mappings will be different for the next process). The hardware does this automatically when the page table base register is changed.
    • If virtual memory mappings change for the current process (e.g. page moved), must invalidate some TLB entries. Special hardware instruction for this.
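The TLB behavior above can be sketched as a small cache with a FIFO replacement policy (an assumption for simplicity; real TLBs compare all entries in parallel and use various replacement policies):

```python
# TLB sketch: a small cache of recent page-number translations.
# FIFO replacement is an assumption; the parallel compare of real
# hardware is modeled here by a dictionary lookup.

from collections import OrderedDict

class TLB:
    def __init__(self, size=64):
        self.size = size
        self.entries = OrderedDict()          # virtual page -> physical page

    def lookup(self, vpn):
        return self.entries.get(vpn)          # None means a TLB miss

    def insert(self, vpn, ppn):
        if len(self.entries) >= self.size:
            self.entries.popitem(last=False)  # replace the oldest entry
        self.entries[vpn] = ppn

    def flush(self):
        self.entries.clear()                  # e.g. on context switch

tlb = TLB(size=2)
tlb.insert(0x10, 0x99)
assert tlb.lookup(0x10) == 0x99   # hit: use the physical page number
assert tlb.lookup(0x11) is None   # miss: do the full walk, then insert
tlb.flush()
assert tlb.lookup(0x10) is None   # all entries gone after a flush
```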

Miscellaneous Topics

  • How does the operating system get information from user memory? E.g. I/O buffers, parameter blocks. Note that the user passes the OS a virtual address.
    • In some systems the OS just runs unmapped.
      • In this case it reads the page tables and translates user addresses in software.
      • Addresses that are contiguous in the virtual address space may not be contiguous physically. Thus I/O operations may have to be split up into multiple blocks.
    • Most newer systems include kernel and user memory in same virtual address space (but kernel memory not accessible in user mode). This makes life easier for the kernel, although it doesn't solve the I/O problem.
  • Another issue with paging: internal fragmentation.
    • Can't allocate partial pages, so for small chunks of information only part of the page will be used
    • Result: wasted space at the ends of some pages
    • Not much of a problem in today's systems:
      • The objects (such as code or stack) tend to be much larger than a page.
      • Percentage wasted space from fragmentation is small.
    • What happens if page sizes grow?
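The arithmetic behind these claims is simple: an object wastes only the unused tail of its last page, which averages about half a page per object. A quick sketch with 4 Kbyte pages:

```python
# Internal fragmentation sketch: the waste is the unused tail of the
# object's last page, so large objects waste a tiny percentage.

PAGE = 4096

def wasted(object_size, page=PAGE):
    """Bytes wasted in the last page allocated to an object."""
    return -object_size % page    # 0 if the object ends on a page boundary

assert wasted(4096) == 0                      # exact fit: no waste
assert wasted(5000) == 2 * 4096 - 5000        # 3192 bytes wasted
# A 1 MB object wastes at most one page minus a byte: well under 1%.
assert wasted(1_000_000) / 1_000_000 < 0.01
```

The same arithmetic shows why growing page sizes revives the problem: the absolute waste per object scales with the page size while object sizes stay put.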