Virtual Machines
Optional readings for this topic from Operating Systems: Principles and Practice: Section 10.2.
What is the abstraction provided by an OS to a process?
- (Virtual) memory
- A subset of the instruction set of the underlying machine
- Most (but not all) of the hardware registers
- A set of kernel calls with particular arguments for file I/O, etc.
- Overall: a subset of the facilities of the underlying machine, augmented with extra mechanisms implemented by the operating system.
What if we implemented a different abstraction for a process, which looks exactly like the underlying hardware:
- The complete instruction set of the underlying machine
- Physical memory
- Memory management unit (page maps, etc.)
- I/O devices
- Traps and interrupts
- No predefined system calls
This abstraction is called a virtual machine:
- To a "process", it appears that it has its own private machine.
- Multiple "processes" can share a single machine, each thinking it's running on its own private machine.
- The operating system for this is called a hypervisor.
- Can run a complete operating system inside a virtual machine: called a guest operating system.
- Each virtual machine can run a different guest operating system.
Implementing hypervisors
One approach: simulation
- Write a program that simulates instruction execution.
- Also simulate memory and I/O devices.
- Examples:
- Use one large file to hold contents of a "disk"
- Simulate kernel/user bit, interrupt vectors, etc.
- Problem: too slow
- 100x slowdown for CPU/memory
- 2x slowdown for I/O
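To make the simulation approach concrete, here is a minimal sketch of a software interpreter for a made-up machine. The opcodes (LOAD, ADD, STORE, HALT) and the SimulatedMachine class are hypothetical, invented for this example. The sketch simulates the CPU, memory, and the kernel/user bit. Every guest instruction costs many host operations (fetching a tuple, dispatching on the opcode, updating Python lists), which is why pure simulation is so slow for CPU- and memory-bound work.

```python
from dataclasses import dataclass, field

# Hypothetical toy instruction set (invented for this sketch):
#   ("LOAD", r, addr)   regs[r] <- memory[addr]
#   ("ADD", r, a, b)    regs[r] <- regs[a] + regs[b]
#   ("STORE", r, addr)  memory[addr] <- regs[r]
#   ("HALT",)           privileged: legal only in simulated kernel mode

@dataclass
class SimulatedMachine:
    memory: list[int]                       # simulated physical memory
    regs: list[int] = field(default_factory=lambda: [0] * 4)
    pc: int = 0
    kernel_mode: bool = True                # simulated kernel/user bit
    halted: bool = False

    def step(self, program):
        """Fetch, decode, and execute one instruction entirely in software."""
        op, *args = program[self.pc]
        self.pc += 1
        if op == "LOAD":
            r, addr = args
            self.regs[r] = self.memory[addr]
        elif op == "ADD":
            r, a, b = args
            self.regs[r] = self.regs[a] + self.regs[b]
        elif op == "STORE":
            r, addr = args
            self.memory[addr] = self.regs[r]
        elif op == "HALT":
            if not self.kernel_mode:
                # A fuller simulator would typically deliver a simulated trap here.
                raise RuntimeError("illegal instruction in simulated user mode")
            self.halted = True
        else:
            raise ValueError(f"unknown opcode {op!r}")

    def run(self, program):
        while not self.halted:
            self.step(program)

# Usage: compute memory[2] = memory[0] + memory[1].
m = SimulatedMachine(memory=[2, 3, 0])
m.run([("LOAD", 0, 0), ("LOAD", 1, 1), ("ADD", 2, 0, 1), ("STORE", 2, 2), ("HALT",)])
print(m.memory)  # [2, 3, 5]
```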
Better approach: use CPU to simulate itself.
- Run guest OS in user mode.
- Most instructions execute at the full speed of the CPU.
- Anything "unusual" causes a trap into the hypervisor, which simulates the appropriate behavior.
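A minimal sketch of the hypervisor's trap handler, assuming a hypothetical VirtualCPU record per VM, made-up trap kinds ("syscall", "illegal_instruction"), and made-up instruction names (HALT, RETURN_FROM_TRAP). Because the guest always runs in real user mode, the hypervisor tracks the mode the guest believes it is in, and either emulates a privileged instruction or reflects the trap into the guest OS. The Python list guest_stack is a simplification standing in for the guest OS's stack in guest memory.

```python
from dataclasses import dataclass, field

@dataclass
class VirtualCPU:
    """Hypothetical per-VM state kept by the hypervisor."""
    pc: int = 0                        # address of the instruction that trapped
    virtual_kernel_mode: bool = False  # mode the guest believes it is in
    guest_trap_vector: int = 0x100     # guest OS's trap handler (assumed fixed here)
    guest_stack: list = field(default_factory=list)  # stands in for the guest OS stack
    halted: bool = False

def reflect_to_guest(vcpu, return_pc):
    """Simulate what real hardware does on a trap, but directed into the guest OS."""
    vcpu.guest_stack.append((return_pc, vcpu.virtual_kernel_mode))  # save trap info
    vcpu.virtual_kernel_mode = True                                 # simulated mode: kernel
    vcpu.pc = vcpu.guest_trap_vector                                # resume at guest handler

def handle_trap(vcpu, kind, instr=None):
    """Entry point for every trap taken while the guest runs in real user mode."""
    if kind == "syscall":
        reflect_to_guest(vcpu, vcpu.pc + 1)      # guest returns to the next instruction
    elif kind == "illegal_instruction":
        if not vcpu.virtual_kernel_mode:
            reflect_to_guest(vcpu, vcpu.pc)      # guest user code misbehaved: guest OS decides
        elif instr == "HALT":
            vcpu.halted = True                   # stop only this VM, not the real machine
            vcpu.pc += 1
        elif instr == "RETURN_FROM_TRAP":
            # Guest OS returning from a trap: restore guest user level.
            vcpu.pc, vcpu.virtual_kernel_mode = vcpu.guest_stack.pop()
        else:
            raise NotImplementedError(f"no emulation for {instr!r}")
    else:
        raise ValueError(f"unknown trap kind {kind!r}")

# Usage: a guest user program at address 40 makes a system call, then the guest OS returns.
vcpu = VirtualCPU(pc=40)
handle_trap(vcpu, "syscall")
print(hex(vcpu.pc), vcpu.virtual_kernel_mode)   # 0x100 True
handle_trap(vcpu, "illegal_instruction", "RETURN_FROM_TRAP")
print(vcpu.pc, vcpu.virtual_kernel_mode)        # 41 False
```

Note that the same "illegal instruction" trap means different things depending on the simulated mode: from guest kernel code it is emulated, from guest user code it is passed on to the guest OS.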
Special cases:
- Privileged instructions (e.g. HALT):
- Since the virtual machine runs in user mode, these cause "illegal instruction" traps into the hypervisor.
- Hypervisor catches these traps, simulates appropriate behavior.
- Kernel calls in guest OS (both guest user and guest OS run in user mode):
- User program running under guest OS issues kernel call instruction.
- Traps always go to hypervisor (not guest OS).
- Hypervisor analyzes the trapping instruction and simulates a system call into the guest OS:
- Move trap info from hypervisor stack to stack of guest OS
- Find interrupt vector in memory of guest OS
- Switch simulated mode to "kernel"
- Return out of hypervisor to interrupt handler in guest OS.
- When guest OS returns from system call, this traps to hypervisor also (illegal instruction in user mode); hypervisor simulates return to guest user level.
- I/O devices:
- Guest OS reads/writes virtual I/O device register
- Hypervisor has arranged for the containing page to fault
- Hypervisor takes page fault, recognizes address as I/O device register
- Hypervisor simulates instruction and its impact on the simulated I/O device
- When actual I/O operation completes, hypervisor simulates interrupt into the guest OS
- For better performance, write new device drivers that call directly into the hypervisor (using system calls): paravirtualization.
- Virtual memory: hypervisor uses page maps to simulate virtual memory mapping in the guest OS.
- Three levels of memory:
- Guest virtual address space
- Guest physical address space
- Machine physical memory: hypervisor must have total control over this
- Today's solution: extended page maps:
- Another layer of address translation.
- Translates from guest physical addresses (guest-specific) to machine addresses (real memory)
- Hypervisor controls all of the extended page maps, while guest OS controls normal page maps.
- Much simpler and more efficient than shadow page maps.
- Original solution: shadow page maps
- Guest OS creates page maps, but these aren't used by actual hardware.
- Hypervisor manages the real page maps; these are called shadow page maps.
- Hypervisor traps instruction to set the page map base, records info about the guest OS page maps.
- On page faults, hypervisor updates shadow page maps using info from guest OS page maps and its knowledge of physical memory.
- When guest OS modifies its page maps, the hypervisor must trap the updates and reflect the changes in the shadow page maps.
- Two kinds of page faults:
- Page not in guest physical memory: hypervisor must pass through to guest OS
- Page in guest physical memory, but not in machine physical memory: hypervisor just updates shadow page map (fault invisible to guest OS)
- Quite tricky, and potentially slow.
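The three levels of memory can be sketched with two Python dictionaries standing in for page maps (a simplification: real page maps are multi-level tables stored in memory, and the 4096-byte page size is an assumption). The guest OS controls guest_map (guest virtual page to guest physical page); the hypervisor controls extended_map (guest physical page to machine page). The two hypothetical exception classes correspond to the two kinds of page faults above, and build_shadow_map shows that a shadow page map amounts to the composition of the two maps, which is why the hypervisor must update it whenever either map changes.

```python
PAGE_SIZE = 4096  # assumed page size

class GuestPageFault(Exception):
    """Page not in guest physical memory: the hypervisor passes this to the guest OS."""

class HypervisorPageFault(Exception):
    """Page in guest physical memory but not in machine memory: invisible to the guest OS."""

def translate(vaddr, guest_map, extended_map):
    """Guest virtual address -> machine address, via both levels of page maps."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in guest_map:               # level 1: controlled by the guest OS
        raise GuestPageFault(vpn)
    gppn = guest_map[vpn]                  # guest physical page number
    if gppn not in extended_map:           # level 2: controlled by the hypervisor
        raise HypervisorPageFault(gppn)
    return extended_map[gppn] * PAGE_SIZE + offset

def build_shadow_map(guest_map, extended_map):
    """What a shadow page map holds: both maps composed (virtual page -> machine page)."""
    return {vpn: extended_map[gppn]
            for vpn, gppn in guest_map.items() if gppn in extended_map}

# Usage
guest_map = {0: 7, 1: 3}     # set by the guest OS
extended_map = {7: 42}       # set by the hypervisor
print(translate(0x123, guest_map, extended_map))  # 172323 (= 42 * 4096 + 0x123)
print(build_shadow_map(guest_map, extended_map))  # {0: 42}
try:
    translate(PAGE_SIZE + 5, guest_map, extended_map)
except HypervisorPageFault:
    extended_map[3] = 99     # hypervisor allocates a machine page; guest never notices
print(translate(PAGE_SIZE + 5, guest_map, extended_map))  # 405509
```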
Potential problem:
- Hypervisor must trap any behavior that requires simulation.
- Special memory locations (e.g. page maps)? Use page faults.
- Special instructions? Must trap
- Pathological case:
- Instruction that is valid in both user mode and kernel mode
- But, behaves differently in user mode
- Example: "read processor status" (where kernel/user mode bit is in the status word)
- Virtualizable: a machine with no such special cases
- Until recently, very few machines were completely virtualizable (e.g., x86 was not virtualizable until hardware virtualization extensions were added)
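The virtualizability condition can be phrased as a check over an instruction set. In this sketch the Instruction record and its two flags are hypothetical: needs_simulation marks instructions whose behavior the hypervisor must simulate, and a machine is virtualizable only if every such instruction traps in user mode. observed_mode shows why the pathological case breaks trap-and-emulate: a guest kernel running in real user mode reads back "user" unless the instruction traps.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instruction:
    """Hypothetical description of one instruction of an instruction set."""
    name: str
    needs_simulation: bool    # reads or changes state the hypervisor must simulate
    traps_in_user_mode: bool  # hardware traps when this executes in user mode

def problem_instructions(isa):
    """Instructions that need simulation but execute silently in user mode."""
    return [i.name for i in isa if i.needs_simulation and not i.traps_in_user_mode]

def is_virtualizable(isa):
    return not problem_instructions(isa)

def observed_mode(instr, virtual_kernel_mode):
    """Mode a guest sees from a status-reading instruction; the real CPU is in user mode."""
    if instr.traps_in_user_mode:
        # Trap: the hypervisor reports the mode it is simulating for the guest.
        return "kernel" if virtual_kernel_mode else "user"
    return "user"  # no trap: hardware reports the real (user) mode bit

isa = [
    Instruction("ADD", needs_simulation=False, traps_in_user_mode=False),
    Instruction("HALT", needs_simulation=True, traps_in_user_mode=True),
    Instruction("READ_STATUS", needs_simulation=True, traps_in_user_mode=False),
]
print(problem_instructions(isa))                         # ['READ_STATUS']
print(is_virtualizable(isa))                             # False
print(observed_mode(isa[2], virtual_kernel_mode=True))   # user (guest kernel is misled)
```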
Dynamic binary translation: solution for older machines that are not virtualizable:
- Hypervisor analyzes all code executed in virtual machine
- Replaces non-virtualizable instructions with traps
- Very tricky: how to find all code?
- Can use this to run hypervisor as a user-level program
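A minimal sketch of the core step of dynamic binary translation, using a hypothetical instruction set represented as strings. One common way to cope with "how to find all code?" is to translate lazily: translate one basic block at a time, starting at an address that execution has actually reached, and stop at the next control transfer, since its target may only be known at run time. The sets NON_VIRTUALIZABLE and BLOCK_ENDERS are assumptions for illustration; a real translator operates on machine code and also handles self-modifying code.

```python
# Assumptions for this sketch: which instructions are non-virtualizable,
# and which instructions end a basic block.
NON_VIRTUALIZABLE = {"READ_STATUS"}
BLOCK_ENDERS = {"JUMP", "BRANCH", "RETURN"}

def translate_block(code, start, cache):
    """Translate guest code from `start` through the next control transfer.

    Non-virtualizable instructions become explicit traps into the hypervisor;
    everything else is marked to run natively at full speed.
    """
    if start in cache:                  # each block is translated only once
        return cache[start]
    translated = []
    pc = start
    while pc < len(code):
        instr = code[pc]
        if instr in NON_VIRTUALIZABLE:
            translated.append(("TRAP", instr))
        else:
            translated.append(("NATIVE", instr))
        pc += 1
        if instr in BLOCK_ENDERS:       # the next block is found when execution gets there
            break
    cache[start] = translated
    return translated

# Usage
code = ["ADD", "READ_STATUS", "BRANCH", "ADD", "RETURN"]
cache = {}
print(translate_block(code, 0, cache))
# [('NATIVE', 'ADD'), ('TRAP', 'READ_STATUS'), ('NATIVE', 'BRANCH')]
```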
In practice, how much overhead do hypervisors add?
- CPU-bound applications: < 5%
- I/O-bound applications: ~30%
History/usage of virtual machines
Invented by IBM in the late 1960s
Original usage:
- One VM per user
- Each user ran a different single-user guest OS
- Single shared hardware platform
Interest waned in the 1980s and 1990s:
- Each user had a private machine
Reinvented and made practical by Mendel Rosenblum and his graduate students at Stanford, who then formed VMware.
Software development:
- Need to test software on different OS versions:
- Keep one VM for each OS version.
- Use a single machine to test all versions.
Datacenters:
- Problem: many machines, each running a single application
- Need separate machines for isolation: an application crash could bring down the entire machine
- Most applications only need a fraction of the machine's resources.
- Solution: datacenter consolidation
- One VM per application
- Run several VMs on a single machine
- Reduces the number of machines
Encapsulation, restart:
- Hypervisor can encapsulate entire state of a VM in a file.
- Can save the state, continue running later, or restore an old state.
- Datacenter example:
- Can migrate VMs between machines to balance load
- Software development:
- Tests may corrupt the state of the machine
- Solution:
- Run tests in a VM
- Always start tests from a saved VM configuration
- Discard VM state after tests
- Results: reproducible tests
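The reproducible-test workflow can be sketched as follows, assuming a greatly simplified, hypothetical VMState (registers, memory, and a small disk) serialized to a JSON file; a real hypervisor's snapshot format also records device state and is far larger. Each test starts from a fresh copy of the saved configuration, so whatever the test corrupts is simply discarded.

```python
import json
import os
import tempfile
from dataclasses import dataclass, field

@dataclass
class VMState:
    """Hypothetical, greatly simplified VM state (real snapshots also include device state)."""
    registers: dict = field(default_factory=dict)
    memory: bytearray = field(default_factory=lambda: bytearray(16))
    disk: bytearray = field(default_factory=lambda: bytearray(16))

def save_snapshot(state, path):
    """Encapsulate the entire (simplified) VM state in one file."""
    with open(path, "w") as f:
        json.dump({"registers": state.registers,
                   "memory": state.memory.hex(),
                   "disk": state.disk.hex()}, f)

def load_snapshot(path):
    with open(path) as f:
        data = json.load(f)
    return VMState(registers=data["registers"],
                   memory=bytearray.fromhex(data["memory"]),
                   disk=bytearray.fromhex(data["disk"]))

def destructive_test(vm):
    """Hypothetical test that corrupts the machine's disk."""
    vm.disk[0] = 0xFF
    return vm.disk[0] == 0xFF

with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "base-config.json")
    save_snapshot(VMState(registers={"pc": 0}), base)   # saved VM configuration
    for _ in range(2):
        vm = load_snapshot(base)       # always start from the saved configuration
        assert vm.disk[0] == 0         # earlier test runs left no trace
        destructive_test(vm)           # run the test; vm is then discarded
```

In this simplified model, copying the same file to another machine and loading it there is the essence of migrating a VM.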
Heavily used in cloud computing (e.g. Amazon Web Services, Google Cloud).