Assignment 5: Memory-Mapped Encrypted Files
In this assignment you will implement memory-mapped access to data that is stored in encrypted files. Here's how this will work:
- A region of the virtual address space of a process will be allocated, large enough to hold the contents of a file.
- The file won't initially be read into memory, and there won't even be physical pages assigned to this region.
- If the process attempts to read or write from the file's region in virtual memory, page faults will occur.
- Your code will catch the page faults, allocate physical pages, and read in the file. Only the pages of the file that are actually accessed will be read into memory. Once a page has been loaded into memory, the process will be able to access data in that page without additional faults.
- The process can also modify the file's bytes in memory. When this happens, your code will remember which pages are dirty and write those pages back to the file when the memory-mapped file is closed (or "flushed"). Only pages that were actually modified will be written.
- As an additional twist, the data in the file is stored in encrypted form to thwart eavesdroppers. Information is decrypted when read into memory and re-encrypted when flushed back to disk.
Normally, code like this would be written
inside the operating system kernel. For this assignment, we have used the
Linux mmap
facility to emulate features that are typically available
in the kernel (such as page faults), so you can write your code at user level.
The goal of this assignment is to teach you
about the following concepts:
- How demand paging works
- How page protection can be used to emulate hardware features such as dirty bits
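To make the emulation idea concrete, here is a minimal sketch of how page faults can be caught at user level on Linux with mmap, mprotect, and a SIGSEGV handler. It is not the course's VMRegion implementation (which this handout does not show). The names g_region, on_segv, and count_faults are hypothetical. Calling mprotect from a signal handler works on Linux in practice, but it is not on POSIX's list of async-signal-safe functions, so treat this as an illustration only.

```cpp
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical globals used only by this sketch.
static char *g_region = nullptr;
static std::size_t g_region_size = 0;
static std::size_t g_page_size = 0;
static volatile sig_atomic_t g_fault_count = 0;

// SIGSEGV handler: make the faulting page accessible, then return so that
// the faulting instruction is retried.
static void on_segv(int, siginfo_t *info, void *) {
    char *addr = static_cast<char *>(info->si_addr);
    if (addr < g_region || addr >= g_region + g_region_size)
        _exit(1);  // fault outside our region: a genuine crash
    std::uintptr_t mask = ~(static_cast<std::uintptr_t>(g_page_size) - 1);
    std::uintptr_t base = reinterpret_cast<std::uintptr_t>(addr) & mask;
    g_fault_count = g_fault_count + 1;
    if (mprotect(reinterpret_cast<void *>(base), g_page_size,
                 PROT_READ | PROT_WRITE) != 0)
        _exit(2);
}

// Reserves npages of inaccessible virtual memory, stores twice into each of
// the first `touched` pages, and returns how many faults were taken
// (or -1 if the region could not be reserved).
int count_faults(std::size_t npages, std::size_t touched) {
    assert(touched <= npages);
    g_page_size = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    g_region_size = npages * g_page_size;
    void *p = mmap(nullptr, g_region_size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    g_region = static_cast<char *>(p);
    g_fault_count = 0;

    struct sigaction sa {}, old {};
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    for (std::size_t i = 0; i < touched; ++i) {
        volatile char *q = g_region + i * g_page_size;
        *q = 'x';  // first store faults; the handler opens the page
        *q = 'y';  // second store proceeds without a fault
    }

    int faults = g_fault_count;
    sigaction(SIGSEGV, &old, nullptr);  // restore the previous handler
    munmap(g_region, g_region_size);
    g_region = nullptr;
    return faults;
}
```

Calling count_faults(4, 3) reserves four pages but only takes faults for the three that are actually touched, which is the essence of demand paging.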
Getting Started
Log in to the myth cluster and clone the starter repo with this command:
git clone /afs/ir/class/archive/cs/cs111/cs111.1246/repos/assign5/$USER assign5
This will create a new directory assign5
in your current
directory and clone a Git starter repository into that
directory. Do your work for the assignment
in this directory.
The files mcryptfile.hh
and mcryptfile.cc
contain a skeleton for all
of the code you need to write. You will add to the declarations in
mcryptfile.hh
and fill in the bodies of the methods in
mcryptfile.cc
to implement
the facilities described below. You can also create additional methods or
classes as needed to implement your solution.
The directory also contains a Makefile; if you type make, it
will compile your code for both problems together with
a test program test.cc, producing an executable file test.
You can invoke ./test with an argument giving the
name of a test to run (invoke it with no arguments to see a list
of available tests). The same test program is used for both this
assignment and Assignment 6; this assignment will use only the
tests up through update_three_files.
You can also invoke the command tools/sanitycheck
to run a series of basic
tests on your solution.
Try this now: the starter code should compile but almost all the tests will
fail.
As usual, we do not guarantee that the tests we have provided are exhaustive, so passing all of the tests is not necessarily sufficient to ensure a perfect score (CAs may discover other problems in reading through your code).
Assignment Overview
You will implement a class MCryptFile
that provides the following
methods:
- MCryptFile(Key key, std::string path)
  Constructs an MCryptFile object that can be used to access data in the file given by path. The file is encrypted. When data is read from the file into memory, it will be decrypted using key; when data is written from memory back to the file, it will be encrypted using key. Key objects can be constructed from strings. If the file doesn't currently exist, a new file will be created. Throws a std::system_error exception if the file cannot be opened or created. Most of the functionality of this constructor (such as throwing exceptions) is actually implemented in the CryptFile superclass constructor described below.
- char *map_file(size_t min_size = page_size)
  Maps the associated file into a contiguous region of virtual memory and returns the virtual address of the first byte of that region. Byte i of the decrypted file contents will henceforth be directly accessible at offset i into the region. The min_size argument can be used to create a new file or grow an existing file. The actual size of the virtual region will be the larger of min_size and the original file length; if bytes are written in the region beyond the original file length, the file will be extended when flush_file is called. If you want to grow a file beyond the current size of the region, invoke unmap_file to unmap the file, then invoke map_file to map it again with a larger size. Note: file sizes are always rounded up to page boundaries. This method does not allocate physical memory for the region or read information in from the file; that should happen later, on demand, as pages in the region are accessed.
- void flush_file()
  Encrypts all pages that have been modified since the last call to flush_file and writes them back to the associated file. All pages currently in memory should remain in memory. This operation may grow the size of the encrypted file in the file system.
- void unmap_file()
  Flushes any dirty pages and removes the mapping created by map_file. After this method returns, the caller must no longer use any references into the previously mapped region.
- char *map_base()
  Returns the address of the first byte of the memory mapped file, or nullptr if the associated file is not currently mapped.
- size_t map_size()
  Returns the current size of the mapped region, or 0 if the associated file is not currently mapped.
- static void set_memory_size(size_t npages)
  Invoked to specify how many pages should be in the pool of physical memory that is shared by all MCryptFile objects. This method should only be invoked before the first MCryptFile is created; it will have no effect after that. The test infrastructure will invoke this method as appropriate for the tests being run.
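The size rule for map_file (the larger of min_size and the file length, rounded up to a page boundary) can be sketched as plain arithmetic. The helper names round_up_to_page and mapped_region_size are hypothetical, and the sketch assumes the page size is a power of two. Note that this helper leaves 0 as 0, whereas VMRegion is documented to treat a zero-byte request as one page.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Rounds n up to the next multiple of page_size (assumed to be a power of
// two). Leaves 0 unchanged.
inline std::size_t round_up_to_page(std::size_t n, std::size_t page_size) {
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (n + page_size - 1) & ~(page_size - 1);
}

// Size of the mapped region: the larger of the requested minimum and the
// current file length, rounded up to a whole number of pages.
inline std::size_t mapped_region_size(std::size_t min_size,
                                      std::size_t file_len,
                                      std::size_t page_size) {
    return round_up_to_page(std::max(min_size, file_len), page_size);
}
```

For example, with 4096-byte pages, a 10000-byte file mapped with the default min_size would occupy 12288 bytes (three pages) under this rule.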
Supporting Code
We have written several classes for you to use in implementing MCryptFile:
CryptFile
The CryptFile
class provides basic mechanisms for reading and writing
encrypted files (but not for memory-mapping them). Your MCryptFile
class
will be a subclass of CryptFile
.
The CryptFile
class has the following methods:
- CryptFile(Key key, std::string path)
  The API for this method is identical to that for the MCryptFile constructor (see above).
- size_t file_size()
  Returns the number of bytes in the file (which is the same as the number of bytes required to hold the decrypted file in memory).
- int aligned_pread(void *dst, size_t len, size_t offset)
  Reads information from the file into memory. More precisely, reads len bytes of data at position offset in the file, decrypts it, and stores the unencrypted information at dst. Both len and offset must be multiples of the AES encryption algorithm's block size (16), which is accessible via CryptFile::blocksize. The block size will not be an issue for this assignment, because you will only be reading and writing full pages aligned on page-size boundaries. Returns the number of bytes read or -1 on error.
- int aligned_pwrite(const void *src, size_t len, size_t offset)
  Writes information from memory to the associated file. Encrypts len bytes starting at src and writes them to position offset in the associated file. Returns len on success and -1 on error. Both len and offset must be multiples of CryptFile::blocksize.
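The alignment requirement can be checked with simple modular arithmetic. The sketch below uses a hypothetical constant kBlockSize as a stand-in for CryptFile::blocksize (16, per the description above) and shows why whole-page transfers at page-aligned offsets satisfy the rule whenever the page size is a multiple of the block size.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for CryptFile::blocksize in this sketch.
constexpr std::size_t kBlockSize = 16;

// True if both the length and the file offset are multiples of the block size.
constexpr bool is_block_aligned(std::size_t len, std::size_t offset) {
    return len % kBlockSize == 0 && offset % kBlockSize == 0;
}

// A transfer of exactly one page, starting at the page_index-th page of the
// file, is block-aligned whenever page_size is a multiple of kBlockSize.
constexpr bool page_transfer_is_aligned(std::size_t page_index,
                                        std::size_t page_size) {
    return is_block_aligned(page_size, page_index * page_size);
}

static_assert(is_block_aligned(4096, 8192), "page-sized, page-aligned");
static_assert(!is_block_aligned(100, 0), "100 is not a multiple of 16");
```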
VMRegion
The VMRegion
class provides basic mechanisms for mapping pages into
a region of virtual memory, taking page faults, and managing permissions.
A VMRegion
corresponds roughly to a contiguous range of page map
entries for one process in an operating system.
You will create one VMRegion
object for each mapped MCryptFile.
The VMRegion
class is defined in the header file vm.hh
and
has the following methods:
- VMRegion(size_t nbytes, std::function<void(char *fault_address)> handler)
  Constructs a VMRegion object and allocates a region of virtual memory in the process that is currently unused. The size of the region will be nbytes; if nbytes isn't a multiple of the page size, the VMRegion will behave as if nbytes were rounded up to the next higher multiple of the page size. If nbytes is zero, it will be rounded up to one full page. The handler parameter specifies a function that will be invoked whenever a page fault occurs in the region. Page faults occur whenever an unmapped page is referenced or an attempt is made to write a page that is currently read-only. handler is invoked with a single argument, fault_address, giving the virtual address that triggered the page fault.
- VPage get_base()
  Returns the address of the first page in the virtual region. VPage is a type that refers to the first byte of a virtual page in a VMRegion. It is equivalent to char *, so you can add an offset to it to get the address of a value in the middle of a page. You can also add multiples of the page size to the value returned by get_base to produce VPages for other pages in the virtual region.
- size_t get_size()
  Returns the total number of bytes in the virtual region.
- void map_page(VPage va, PPage pa, Prot prot)
  Sets the mapping for a particular VPage inside a VMRegion, so that accesses to that page will be directed to pa (PPages are obtained using the PhysMem class discussed below). If a different page was previously mapped at va, the old mapping is removed. The prot argument specifies what sort of accesses are allowed; it should be either PROT_NONE to prohibit both loads and stores, PROT_READ to allow loads but not stores, or PROT_READ|PROT_WRITE (bitwise OR of two values) to allow both loads and stores. This function's behavior is equivalent to setting the contents of a page map entry.
- void unmap_page(VPage va)
  Removes the mapping for va, if there is one; future references to the page will cause page faults. This function's behavior is roughly equivalent to clearing the present bit in a page map entry.
In addition to these methods, VMRegion also exports a variable
page_size, which contains the number of bytes in each page on
the current machine. Page sizes are 4096 bytes on the myth cluster
and on many laptops, but you should use the page_size variable
to ensure portability; you can assume page_size will always be
a power of two.
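Because the page size is a power of two, page arithmetic reduces to bit masking. The sketch below is illustrative only: page_start and page_offset_in_region are hypothetical helpers, and the second one assumes that base is itself page-aligned (as a region base would be).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Rounds addr down to the first byte of the page that contains it.
// Requires page_size to be a power of two.
inline char *page_start(char *addr, std::size_t page_size) {
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    auto a = reinterpret_cast<std::uintptr_t>(addr);
    auto mask = ~(static_cast<std::uintptr_t>(page_size) - 1);
    return reinterpret_cast<char *>(a & mask);
}

// Byte offset, from a page-aligned region base, of the page containing addr.
// Assumes base <= addr and that base is page-aligned.
inline std::size_t page_offset_in_region(char *base, char *addr,
                                         std::size_t page_size) {
    return static_cast<std::size_t>(page_start(addr, page_size) - base);
}
```

With 4096-byte pages, an address 5000 bytes into a region lies in the page that starts at offset 4096.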
PhysMem and PPages
The PhysMem class provides a mechanism for allocating and freeing
pages of physical memory. Each physical page is identified with a
PPage object, which you can pass to methods such as VMRegion::map_page
and PhysMem::page_free.
In addition, a PPage is a valid virtual address (it is a char *
pointer), which you can use to access the bytes of the page.
Unlike VPages, a PPage is always accessible and writable; references
to it will never generate page faults.
This is useful because it allows your MCryptFile implementation
to access physical pages that are not currently mapped as VPages,
such as when transferring page contents to or from encrypted files.
PPages are mapped into virtual memory by the PhysMem class,
at virtual addresses different from those in VMRegions.
This is an example of aliasing, where a single physical page
appears at multiple virtual addresses. The first alias for each
physical page is its PPage; the second alias is the corresponding
VPage (if the page has been mapped). In principle you could map a
single PPage as multiple different VPages, but we won't do that
for this assignment: there will be only one VPage per PPage.
The PhysMem
class is defined in the header file vm.hh
and
has the following methods:
- PhysMem(size_t npages)
  Allocates npages physical memory pages, each of which may be mapped into any VMRegion.
- PPage page_alloc()
  Allocates a page and returns its PPage, or nullptr if there are no free pages.
- void page_free(PPage p)
  Returns p to the free page pool for this PhysMem. The caller must ensure that this page is not mapped in any VMRegion.
- size_t npages()
  Returns the total number of pages in this object (free or allocated).
- size_t nfree()
  Returns the number of pages that are not currently allocated.
- PPage pool_base()
  Returns the address of the first page in the pool (the pages in the pool occupy a range of contiguous PPage addresses).
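To illustrate the allocation semantics described above (a fixed pool, nullptr on exhaustion, pages returned to a free list), here is a toy model. ToyPool is a hypothetical class written for this sketch; it is not the PhysMem class from vm.hh, and it uses ordinary heap memory rather than real page mappings.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy fixed-size page pool that mimics the documented PhysMem interface.
class ToyPool {
public:
    explicit ToyPool(std::size_t npages, std::size_t page_size = 4096)
        : storage_(npages * page_size), npages_(npages) {
        // Push pages in reverse so the lowest address is handed out first.
        for (std::size_t i = npages; i > 0; --i)
            free_.push_back(storage_.data() + (i - 1) * page_size);
    }

    // Returns a free page, or nullptr if the pool is exhausted.
    char *page_alloc() {
        if (free_.empty()) return nullptr;
        char *p = free_.back();
        free_.pop_back();
        return p;
    }

    // Returns a previously allocated page to the free list.
    void page_free(char *p) {
        assert(p != nullptr);
        free_.push_back(p);
    }

    std::size_t npages() const { return npages_; }      // free or allocated
    std::size_t nfree() const { return free_.size(); }  // currently free

private:
    std::vector<char> storage_;   // contiguous backing store for all pages
    std::vector<char *> free_;    // stack of free page addresses
    std::size_t npages_;
};
```

A caller that receives nullptr from page_alloc has run out of pages; in this assignment that case is handled by printing a message and exiting.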
Exercise 1: page_fault and map_page
(You will go through most of this exercise in section with your CA)
The file page_fault.cc
contains a simple program that illustrates
how to create a VMRegion
and then take a page fault in the region,
but it doesn't actually allocate physical memory or set up a
virtual-to-physical mapping.
The file map_page.cc
adds functionality to allocate a physical
page when the page fault occurs and map it into the VMRegion
, so
that memory accesses to the VMRegion
can complete.
Read through the code of both files to familiarize yourself with them, then run the programs and observe their output:
make
./page_fault
./map_page
Once you have run the programs, answer the questions in questions.txt.
The fault_handler
function in map_page
is currently a bit hacky,
in that it uses region.get_base()
to determine the VPage
at which
to map the PPage
. This only works because the VMRegion
in this example
contains only a single page. A better approach is to compute the VPage
from the faulting address: this is just the first byte of the page
containing fault_addr
(in general, fault_addr
could point anywhere
in a page, but the VPage
must refer to the first byte of the page).
This change would allow the fault handler to work with
regions containing multiple pages. Modify map_page.cc
so that fault_handler
computes a VPage
rather than calling region.get_base()
and
make sure that the program still runs.
C++ Proficiency: Deleting While Iterating
At some point in this assignment you will need to scan a C++ container and
delete some of the entries in it. This is tricky in C++ because object
deletion is not generally safe while iterating. For example, suppose
you try to iterate over an std::unordered_map
and delete some of its
entries using code like this:
std::unordered_map<int, Foo*> foo_map;
for (auto it = foo_map.begin(); it != foo_map.end(); ++it) {
    if (...) {
        // UNSAFE: erase invalidates it, and ++it then uses it anyway.
        foo_map.erase(it);
    }
}
This code is unsafe, because deleting the element leaves the iterator
it
in an undefined state; bad things will happen if you keep
using that iterator.
However, the following code is safe:
std::unordered_map<int, Foo*> foo_map;
for (auto it = foo_map.begin(); it != foo_map.end(); ) {
    if (...) {
        // erase returns an iterator to the element after the erased one.
        it = foo_map.erase(it);
    } else {
        ++it;
    }
}
The erase
method returns a new iterator that refers to the next element
of the map after the deleted one, so it is safe to continue iterating.
Notice in this case that the for
statement no longer increments
it
: that happens only if the element isn't deleted.
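Here is a complete, compilable instance of the safe pattern. The function erase_negative and its predicate are made up for illustration; substitute whatever condition your own container needs.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Removes every entry whose value is negative and returns how many entries
// were removed. Uses the iterator returned by erase to keep iterating safely.
inline std::size_t erase_negative(std::unordered_map<int, int> &m) {
    std::size_t removed = 0;
    for (auto it = m.begin(); it != m.end(); ) {
        if (it->second < 0) {
            it = m.erase(it);  // it now refers to the next element
            ++removed;
        } else {
            ++it;              // advance only when nothing was erased
        }
    }
    return removed;
}
```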
Implementation Milestones
Milestone 1: Faults Forever
Implement the MCryptFile
constructor and map_file
method. You will
need to create a VMRegion
object in the map_file
method to manage the
virtual addresses for this mapped file. We have defined a skeleton page
fault handler function fault_handler
in the MCryptFile
class,
which you should pass to the VMRegion
constructor as the handler
argument.
In the starter repo, fault_handler
just prints the virtual address that
caused the page fault (you will replace this body in a later milestone).
In addition, you should implement the map_base and map_size methods.
Be sure to change the return value of the map_file method (it should not
return nullptr). The map_size test should now pass.
Now run the read test (./test read). You should see that the file is
successfully mapped, but the test will hang because fault_handler
doesn't actually make the page accessible; thus page faults will
happen repeatedly (as soon as fault_handler returns, the
application retries the faulting instruction, which causes another
page fault).
Milestone 2: Map Pages
Create enough new functionality to load pages into memory on demand.
First, add code to allocate a PhysMem object during the first
call to map_file (don't allocate the PhysMem until the first call
to map_file). A single PhysMem will be shared across all MCryptFiles and
used to allocate physical memory pages, just as an operating system
uses a single physical memory to allocate pages for all processes.
MCryptFile::set_memory_size
may have been invoked to specify
how large the pool of physical memory should be. If
MCryptFile::set_memory_size
has not been invoked by the time
the PhysMem
is created, use 1000 pages for the PhysMem
object.
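One way to get "create on first use, with a default size" behavior is lazy initialization of a shared object. The sketch below uses hypothetical stand-ins (FakePhysMem and PoolOwner) rather than the real PhysMem and MCryptFile classes, and it assumes that a size request arriving after the pool exists should simply be ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for PhysMem: only remembers its size.
struct FakePhysMem {
    explicit FakePhysMem(std::size_t n) : npages(n) {}
    std::size_t npages;
};

// Hypothetical owner of the single shared pool.
class PoolOwner {
public:
    // Records the desired pool size; has no effect once the pool exists.
    static void set_memory_size(std::size_t npages) {
        if (!slot()) requested_pages() = npages;
    }

    // Returns the shared pool, creating it on first use.
    static FakePhysMem &pool() {
        if (!slot()) slot() = std::make_unique<FakePhysMem>(requested_pages());
        return *slot();
    }

private:
    // Function-local statics avoid static initialization order problems.
    static std::size_t &requested_pages() {
        static std::size_t n = 1000;  // default when no size was requested
        return n;
    }
    static std::unique_ptr<FakePhysMem> &slot() {
        static std::unique_ptr<FakePhysMem> p;
        return p;
    }
};
```

In this sketch, every caller of PoolOwner::pool() sees the same object, and the size is fixed at the moment of the first call.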
Then replace the code in fault_handler with code to allocate a
physical page, fill it with the appropriate information from the
file, and make it accessible at the correct virtual address.
If physical memory runs out, PhysMem::page_alloc will return
nullptr; if this happens you can print an error message and
exit the application (once you implement page replacement in Assignment
6 this error should never occur).
For now, set the permissions
on each page to be PROT_READ|PROT_WRITE. If you run the read test
again, you should see that 3 pages are successfully read, but the test
will generate an error because pages are not being unmapped.
Milestone 3: Supplemental Page Map, Destructor, and Unmap
Implement the destructor and the unmap_file
method. In order to do this, you
will need to unmap all of the VPages
that have been mapped for
that file and return their PPages
to the PhysMem
. This will
require you to define an additional data
structure called a supplemental page map. The supplemental page
map will provide information for each VPage
, such as whether
it is mapped and, if so, the associated PPage
. As you work through the
assignment you'll discover other information that needs to be
stored in the supplemental page map. It's up to you to determine
the structure of the supplemental page map; it should be implemented
so that you can easily look up the information for a page given
its VPage (such as when a page fault occurs for that page).
Once you have an initial implementation of the supplemental page map,
you should be able to implement the destructor and the unmap_file
method. At this point, the read
test will pass except for a mismatch
in protections (this will be fixed in Milestone 5).
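As one possible shape for such a structure (the design is yours to choose), a hash map keyed by virtual page address gives constant-time lookup from a faulting page to its record. The names PageInfo and PageTable are hypothetical, the fields shown are only examples, and plain char * is used in place of VPage and PPage so that the sketch is self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical per-page record; add fields as later milestones require.
struct PageInfo {
    char *ppage = nullptr;  // physical page backing this virtual page
    bool dirty = false;     // example of extra state you might track
};

// Hypothetical supplemental page map keyed by virtual page address.
class PageTable {
public:
    // Records that vpage is now backed by ppage.
    void insert(char *vpage, char *ppage) {
        PageInfo info;
        info.ppage = ppage;
        map_[vpage] = info;
    }

    // Returns the record for vpage, or nullptr if vpage is not mapped.
    PageInfo *lookup(char *vpage) {
        auto it = map_.find(vpage);
        return it == map_.end() ? nullptr : &it->second;
    }

    void remove(char *vpage) { map_.erase(vpage); }
    std::size_t size() const { return map_.size(); }

private:
    std::unordered_map<char *, PageInfo> map_;
};
```

An array indexed by page number within the region would work as well; the hash map is used here only because it needs no knowledge of the region's size.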
Milestone 4: Flush
Implement the flush_file
method. For now, flush all pages in physical memory that
belong to the MCryptFile
without considering whether they are dirty.
Remember to flush in the unmap_file
method.
All the tests should now complete, but you will get errors because
pages that aren't dirty are getting written back to the file.
Milestone 5: Tracking Dirty Pages
Make flush_file
more efficient by keeping track of the dirty pages
and only writing dirty pages back to the file (clean pages should
not be written back).
Paging hardware usually provides a "dirty" bit in page map entries,
which is set by the hardware when a page is written.
Unfortunately, this information is not passed through by the mmap
mechanism we are using for this assignment, so you will have to
use clever software to emulate a dirty bit for each page.
You can use page protections for this:
if a page's protection is set to PROT_READ
, then a page fault will
occur the first time the page is written (and a page fault will not
occur unless the page is written). Given this, you should be able
to emulate dirty bits for each page. Use the emulated dirty bits to avoid
writing back clean pages during flushes.
Page faults may now happen multiple times on the same virtual page, but you should only read each page from the file once (unless the file is unmapped and re-mapped).
Your dirty bit should behave like a dirty bit in a page map, i.e.
it should be reset when pages are flushed (and therefore no longer
dirty). Since you are notified about writes by page faults, this
implies something about the permissions after flush_file
is called.
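The protection trick can be explored with a pure simulation that involves no real memory mapping. The sketch below models one possible policy under stated assumptions: a fault on an absent page loads it read-only, a fault on a page that is already present must have been caused by a store, and flushing makes dirty pages read-only again. DirtySim, SimPage, and Access are hypothetical names; this is not the required implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Access { None, ReadOnly, ReadWrite };

struct SimPage {
    Access access = Access::None;
    bool dirty = false;
    int loads = 0;       // times the page was "read from the file"
    int writebacks = 0;  // times the page was "written to the file"
};

class DirtySim {
public:
    explicit DirtySim(std::size_t npages) : pages_(npages) {}

    // A load retries until the page is at least readable.
    void read(std::size_t i) {
        while (pages_[i].access == Access::None) fault(i);
    }

    // A store retries until the page is writable.
    void write(std::size_t i) {
        while (pages_[i].access != Access::ReadWrite) fault(i);
    }

    // Writes back only dirty pages and re-arms write detection for them.
    void flush() {
        for (SimPage &p : pages_) {
            if (!p.dirty) continue;
            ++p.writebacks;
            p.dirty = false;
            p.access = Access::ReadOnly;
        }
    }

    const SimPage &page(std::size_t i) const { return pages_[i]; }
    int faults() const { return faults_; }

private:
    // The handler sees only which page faulted, not the access type.
    void fault(std::size_t i) {
        SimPage &p = pages_[i];
        ++faults_;
        if (p.access == Access::None) {
            ++p.loads;                     // first touch: load, map read-only
            p.access = Access::ReadOnly;
        } else {
            p.dirty = true;                // present page faulted: a store
            p.access = Access::ReadWrite;
        }
    }

    std::vector<SimPage> pages_;
    int faults_ = 0;
};
```

In this model a store to an untouched page takes two faults (one to load, one to mark dirty), while the page is still read from the file only once.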
At this point all of the tests should pass. Congratulations!
Milestone 6: Odds and Ends
If you haven't already done so, implement set_memory_size. Also,
go over the Miscellaneous Notes below and implement anything else
that is needed.
Miscellaneous Notes
- For this assignment you may assume that there are enough physical pages to accommodate all of the pages in all of the open MCryptFiles: you need not worry about page replacement. You can exit the program (with an informative message, of course) if PhysMem::page_alloc returns nullptr.
- Load pages into memory on demand, so that no memory is wasted on pages that are never accessed. You should only read pages from disk when responding to page faults.
- You may assume that your code is used only in single-threaded environments; you do not need to worry about synchronization for this assignment.
- Your solution must support multiple MCryptFile objects mapped at the same time, with one VMRegion per MCryptFile. All of the MCryptFiles must share the same PhysMem.
- If you use gdb to debug your assignment, you will notice that it catches the SIGSEGV signals used to signal page faults and stops the application before it can handle those page faults. If you type continue then the signal will be transmitted to your application so the page fault will be handled. If you get tired of typing continue you can change gdb's behavior with the following command:
  handle SIGSEGV noprint nostop pass
  The arguments indicate that, when SIGSEGV signals occur, they should be passed to the application; gdb will not stop the application or print any indication that the signal occurred. You may find other combinations of argument values convenient as well.
Submitting Your Work
Once you are finished working and have saved all your changes, submit by
running tools/submit
. Make sure that you have answered the
questions in questions.txt
before submitting.
We recommend you do a trial submission in advance of the deadline to allow time to work through any snags. You may submit as many times as you like; we will grade the latest submission. Submitting a stable but unpolished/unfinished version is like an insurance policy. If the unexpected happens and you miss the deadline to submit your final version, the earlier submit will earn points. Without a submission, we cannot grade your work. You can confirm the timestamp of your latest submission in your course gradebook.
Grading
Here is a recap of the work that will be graded on this assignment:
- questions.txt: answer all of the questions.
- map_page.cc: modify fault_handler as described in Exercise 1.
- mcryptfile.hh and mcryptfile.cc: flesh out the MCryptFile class.
We will grade your code using the provided sanity check tests and possible additional autograder tests. We will also review your code for additional errors as well as style and complexity. Check out our course style guide for tips and guidelines for writing code with good style!