Additional Information about the Journaling File System
This page contains more details about how the journaling file system for Assignment 8 is implemented, including the overall disk layout, the structure of the log, and how the file system ensures the integrity of log entries. This material is not strictly required to complete Assignment 8, but you may find it useful and/or interesting.
We also recommend that you read some or all of the code that implements the V6 file system; it's a great example of a professional-quality C++ project and it uses many advanced C++ features.
Journaling V6 File System Layout
The on-disk layout of the journaling V6 file system consists of the following sections:
1. The boot block (1 sector)
2. The superblock (1 sector)
3. Inodes (variable-length)
4. Data blocks (variable-length)
5. The log header (1 sector)
6. The free-block bitmap or freemap (variable-length)
7. The log (variable-length)
Sections 1-4 are identical or nearly identical to the original V6 file system layout, which you used in Project 7. Their layout is defined by data structures and constants in the header file layout.hh.
Sections 5-7 were added for this assignment. Their layout is defined by the logentry.hh header file. Since nobody runs the V6 file system on raw hard disks any more (V6 file system images exist only as files on other file systems), we simply extend the file containing a V6 file system image to accommodate the new areas required by journaling.
The boot block
This block is essentially unused, but the first two bytes must be 4 and 7, or mountv6 will not recognize the disk image.
The superblock
The superblock is essentially the same as in 1975, except that we've co-opted two of the unused bytes to add two more fields:
struct filsys {
    ...
    uint8_t s_uselog;   // Use log
    uint8_t s_dirty;    // File system was not cleanly shut down
    ...
};
The s_uselog byte, when non-zero, indicates that our new 21st-century logging functionality has been enabled for this particular file system volume.
The s_dirty field is set to 1 whenever the file system is opened, and set to 0 when it is closed cleanly. If your file system crashes, s_dirty will be 1, and mountv6 will refuse to mount it again until you have repaired the damage. (If you want to see how bad a corrupt file system can be in action, you can forcibly mount a dirty volume with the --force flag, as in ./mountv6 --force v6.img mnt.)
There's one more change to the superblock. Originally, V6 stored free blocks in a linked list 100-wide. The superblock stored s_nfree free blocks (0 to 100) in an array s_free, with s_free[0] (if non-zero) pointing to a disk block containing 100 pointers to other free blocks and so on. This format is not convenient for journaling, so we set s_nfree to zero and instead dedicate a portion of the disk to a bitmap, containing one bit per data block, where 1 indicates the block is free and 0 indicates it is allocated. For brevity, we refer to the "free-block bitmap" as the freemap and store it in a structure field called freemap_.
V6 also stores a vector of up to 100 free inodes in the superblock, called s_inode. Since the superblock is kept in memory during normal operation, this speeds up inode allocation (no need to search the inode array on disk to find a free one, as long as there are free inodes cached in the superblock). The number of valid entries is s_ninode, and when it reaches zero the V6 code simply re-scans the inode area to find 100 unallocated inodes. When repairing an unclean volume, we can avoid inconsistencies in inode allocation by just setting s_ninode to 0. As long as we have properly repaired the inodes and the IALLOC flags are all correct in the i_mode fields, the file system will just re-scan the inode area the next time it needs to allocate an inode.
Inodes
There is no change to the inode section. It's a giant array of inode structures, where entry 0 of block INODE_START_SECTOR (2) of the file system corresponds to inode number 1.
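The mapping from an i-number to its place in the inode array can be sketched as follows. This is a minimal illustration, not the assignment's code: the helper name locate_inode and the InodeLocation type are hypothetical, and the sketch assumes the classic 32-byte V6 on-disk inode (16 inodes per 512-byte sector). The authoritative constants are in layout.hh.

```cpp
#include <cassert>
#include <cstdint>

// Constants assumed for this sketch; the real values live in layout.hh.
constexpr uint32_t SECTOR_SIZE = 512;
constexpr uint32_t INODE_START_SECTOR = 2;
constexpr uint32_t INODE_SIZE = 32;  // classic V6 inode size (assumption)
constexpr uint32_t INODES_PER_SECTOR = SECTOR_SIZE / INODE_SIZE;  // 16

struct InodeLocation {   // hypothetical helper type
    uint32_t sector;     // sector that holds the inode
    uint32_t index;      // slot of the inode within that sector
};

// Map a 1-based i-number to its position in the inode array.
inline InodeLocation locate_inode(uint32_t inum) {
    assert(inum >= 1);   // i-numbers start at 1, not 0
    uint32_t slot = inum - 1;
    return {INODE_START_SECTOR + slot / INODES_PER_SECTOR,
            slot % INODES_PER_SECTOR};
}
```

Under these assumptions, inode 1 (the root directory) is entry 0 of sector 2, and inode 17 is entry 0 of sector 3.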
Data blocks
There is no change to this section of the file system.
Log header
The log header is the first block of the log area of disk. Its contents are defined by the loghdr structure in logentry.hh:
struct loghdr {
    uint32_t l_magic;       // LOG_MAGIC_NUM
    uint32_t l_hdrblock;    // Block number containing loghdr structure
    uint16_t l_logsize;     // Total size (in blocks) of loghdr, freemap, and log area
    uint16_t l_mapsize;     // Number of blocks in freemap
    uint32_t l_checkpoint;  // Disk offset of log checkpoint
    lsn_t l_sequence;       // Expected LSN at checkpoint

    uint32_t mapstart() const { return l_hdrblock + 1; }
    uint32_t logstart() const { return mapstart() + l_mapsize; }
    // l_logsize already counts the header and freemap, so measure from l_hdrblock
    uint32_t logend() const { return l_hdrblock + l_logsize; }
    uint32_t logbytes() const {
        return SECTOR_SIZE * (l_logsize - l_mapsize - 1);
    }
};
The first two fields are sanity checks to ensure you really have a loghdr. The l_magic field contains the constant LOG_MAGIC_NUM (0x474c0636, a randomly generated value), while l_hdrblock is the actual block number storing the loghdr structure. The magic number is unlikely to appear in random files, since it was randomly generated. The l_hdrblock field ensures that if you copy a valid loghdr into a file, the copy still won't look like a valid loghdr, since it will be at the wrong disk location.
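The two sanity checks can be sketched as a single predicate. This is only an illustration: loghdr_fields is a field-only copy of the structure above, and plausible_loghdr is a hypothetical helper name, not a function from the starter code. The lsn_t alias assumes the 32-bit LSN described on this page.

```cpp
#include <cstdint>

using lsn_t = uint32_t;  // 32-bit LSN (assumption based on the text)
constexpr uint32_t LOG_MAGIC_NUM = 0x474c0636;

// Field-only copy of loghdr for illustration; the real one is in logentry.hh.
struct loghdr_fields {
    uint32_t l_magic;
    uint32_t l_hdrblock;
    uint16_t l_logsize;
    uint16_t l_mapsize;
    uint32_t l_checkpoint;
    lsn_t l_sequence;
};

// Hypothetical helper: does the block read from `blockno` look like a
// genuine log header?
inline bool plausible_loghdr(const loghdr_fields &h, uint32_t blockno) {
    return h.l_magic == LOG_MAGIC_NUM  // right magic number
        && h.l_hdrblock == blockno;    // stored where it claims to be
}
```

A byte-for-byte copy of a valid header that ends up in some other block fails the second comparison, which is exactly the case the l_hdrblock field is meant to catch.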
l_logsize specifies how many blocks have been added to the V6 file system for the new feature set. This size includes sections 5-7 of the file system (the log header, freemap, and journal). With the new feature set enabled, the total number of 512-byte blocks of the disk image should now be s_fsize (from the superblock) plus l_logsize.
l_mapsize specifies how many blocks are used for the freemap. This must be large enough to contain at least 1 bit for each block number from the start of data blocks (at INODE_START_SECTOR+s_isize) to the end of the data blocks (at s_fsize).
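As a worked example of the size arithmetic, the sketch below computes the smallest l_mapsize that satisfies the one-bit-per-data-block requirement and the expected image size once the log area is appended. The function names min_mapsize and image_blocks are hypothetical, and the constants are restated here only to keep the example self-contained.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t SECTOR_SIZE = 512;
constexpr uint32_t INODE_START_SECTOR = 2;
constexpr uint32_t BITS_PER_SECTOR = SECTOR_SIZE * 8;  // 4096 bits per block

// Smallest l_mapsize that provides one bit per data block.
inline uint32_t min_mapsize(uint32_t s_isize, uint32_t s_fsize) {
    uint32_t datastart = INODE_START_SECTOR + s_isize;
    assert(s_fsize >= datastart);
    uint32_t ndata = s_fsize - datastart;
    return (ndata + BITS_PER_SECTOR - 1) / BITS_PER_SECTOR;  // round up
}

// Expected image size, in 512-byte blocks, with the log area appended.
inline uint32_t image_blocks(uint32_t s_fsize, uint32_t l_logsize) {
    return s_fsize + l_logsize;
}
```

For instance, with s_isize = 100 and s_fsize = 65535 there are 65433 data blocks, which need at least 16 freemap blocks (16 * 4096 = 65536 bits).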
Finally, l_checkpoint and l_sequence tell you where to start reading the log and what log sequence number to expect.
l_checkpoint is the byte offset of the first log record you should read. (Because it's a byte offset, it should reside somewhere in the half-open interval [logstart()*SECTOR_SIZE, logend()*SECTOR_SIZE).)
Each log entry has a consecutively-assigned 32-bit log sequence number or LSN of type lsn_t. l_sequence tells you the LSN of the first log record you should expect to read at offset l_checkpoint. If you see the wrong LSN, even if it appears to be a valid log record, you must stop processing the log as you may be looking at an old log entry.
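The checkpoint range check and the "stop at the first unexpected LSN" rule can be sketched as below. Both helpers are hypothetical names for illustration. The block numbers passed to checkpoint_in_range are meant to come from logstart() and logend(), and count_current_entries works on a plain list of LSNs rather than on real log records.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using lsn_t = uint32_t;  // 32-bit LSN (assumption based on the text)
constexpr uint32_t SECTOR_SIZE = 512;

// True if a byte offset lies in [logstart*SECTOR_SIZE, logend*SECTOR_SIZE).
// logstart and logend are block numbers.
inline bool checkpoint_in_range(uint32_t checkpoint, uint32_t logstart,
                                uint32_t logend) {
    uint64_t lo = static_cast<uint64_t>(logstart) * SECTOR_SIZE;
    uint64_t hi = static_cast<uint64_t>(logend) * SECTOR_SIZE;
    return checkpoint >= lo && checkpoint < hi;
}

// Given the LSNs found while reading forward from the checkpoint, count
// how many records are current. The first LSN that is not the expected
// one ends the log, even if the record otherwise looks valid.
inline size_t count_current_entries(const std::vector<lsn_t> &seen,
                                    lsn_t first_expected) {
    lsn_t expect = first_expected;
    size_t n = 0;
    for (lsn_t s : seen) {
        if (s != expect)
            break;       // stale entry left over from an earlier pass
        ++n;
        ++expect;        // LSNs are assigned consecutively
    }
    return n;
}
```

With l_sequence = 7 and records carrying LSNs 7, 8, 9, 3, 4, only the first three are current; the entries with LSNs 3 and 4 are leftovers from an earlier trip around the log area.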
Freemap (free-block bitmap)
The free-block bitmap has one bit for every block in section 4 of the file system (data blocks). If the bit is 1, the block is free. If the bit is 0, the block is in use.
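A minimal sketch of reading and updating such a bitmap follows. The bit numbering is an assumption made for illustration only: bit 0 stands for the first data block, and bits are numbered from the least significant bit of each byte. The real layout is whatever the starter code's freemap_ uses, and the helper names here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed numbering: bit i describes block (datastart + i), LSB first.
inline uint32_t bit_index(const std::vector<uint8_t> &map,
                          uint32_t datastart, uint32_t blockno) {
    assert(blockno >= datastart);
    uint32_t i = blockno - datastart;
    assert(i / 8 < map.size());
    return i;
}

// 1 means free, 0 means allocated.
inline bool is_free(const std::vector<uint8_t> &map, uint32_t datastart,
                    uint32_t blockno) {
    uint32_t i = bit_index(map, datastart, blockno);
    return (map[i / 8] >> (i % 8)) & 1u;
}

inline void mark(std::vector<uint8_t> &map, uint32_t datastart,
                 uint32_t blockno, bool free) {
    uint32_t i = bit_index(map, datastart, blockno);
    uint8_t mask = static_cast<uint8_t>(1u << (i % 8));
    if (free)
        map[i / 8] |= mask;                         // set bit: block is free
    else
        map[i / 8] &= static_cast<uint8_t>(~mask);  // clear bit: in use
}
```

Allocating a block is then mark(map, datastart, blockno, false), and freeing it is the same call with true.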
The log
The log consists of a series of log entries, each of which begins with a header, followed by a payload, and then a footer. Conceptually, the three fields look like this:
struct Header {
    lsn_t sequence;     // First copy of the LSN
    uint8_t type;       // What type this is (entry_type index)
} header;
entry_type payload;
struct Footer {
    uint32_t checksum;  // CRC-32 of header and payload
    lsn_t sequence;     // Another copy of the LSN
} footer;
It's important to note that, while the beginning of the log will generally be valid, there isn't a clean end. The log just turns to garbage after the last successful write; the information after the last good record might not even be recognizable as a log record.
To help identify the end of the log, the header contains a log sequence number, or LSN. This is a consecutively assigned counter that is unique for each log entry. LSNs are very important because the log area is repeatedly recycled. Hence, whatever bytes are left over from the last iteration may be old log entries or may look like valid log entries. These old entries must not be applied to the file system; doing so could damage the file system. Even a valid old entry might overwrite file data in a current file data block that was previously a directory or indirect block.
The header also contains a type integer, which specifies the type of log entry, and hence how to parse the payload. The payload section depends on type, and is one of the structures defined later on in this writeup. The payload immediately follows the type.
Finally, the log entry ends with a footer. The footer contains another copy of the sequence number. The second copy helps detect situations in which a log entry spans a sector boundary and the part of the log entry in the first sector got written while the part in the second contains information from an old log record. There's still the possibility that the "right" log sequence number for the footer could appear in arbitrary payload data from an older log record. Hence, as an additional check, the footer also includes a checksum over the header and payload. The checksum reduces the probability of interpreting garbage as a log record by an additional factor of 2^-32.
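The three integrity checks (header LSN, footer LSN, checksum) can be sketched as follows. This is an illustration under stated assumptions: the crc32 function below is the common reflected CRC-32 with polynomial 0xEDB88320, which may differ from the parameters the assignment's code uses, and ParsedEntry and entry_is_valid are hypothetical names for an already-parsed record.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using lsn_t = uint32_t;  // 32-bit LSN (assumption based on the text)

// Common reflected CRC-32 (polynomial 0xEDB88320), computed bit by bit.
// Assumption: the assignment's CRC-32 may use different parameters.
inline uint32_t crc32(const uint8_t *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

struct ParsedEntry {               // hypothetical in-memory form of an entry
    lsn_t header_lsn;              // LSN from the header
    std::vector<uint8_t> covered;  // raw header + payload bytes from disk
    uint32_t footer_checksum;      // checksum stored in the footer
    lsn_t footer_lsn;              // LSN from the footer
};

// An entry is accepted only if both LSN copies match the expected value
// and the stored checksum matches the bytes actually read.
inline bool entry_is_valid(const ParsedEntry &e, lsn_t expected) {
    return e.header_lsn == expected
        && e.footer_lsn == expected
        && crc32(e.covered.data(), e.covered.size()) == e.footer_checksum;
}
```

A torn write that leaves an old footer behind fails the footer LSN comparison, and stale bytes that happen to contain the right LSN are still very likely to fail the checksum comparison.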
Log Entries
There are 6 different types of log entry that you must process. Three of them, LogPatch, LogBlockAlloc, and LogBlockFree, are discussed in the main assignment writeup. The others are described below.
Transaction begin
struct LogBegin {
};
Ensuring file system consistency often requires writing to multiple disk locations atomically. For example, creating a file requires writing a directory entry and initializing the inode. Appending a block to a file requires writing a block pointer in the inode or indirect block and marking the block as no longer free in the freemap. Doing only one of these writes would leave the file system in an inconsistent state. For example, if an inode points to a block that is still marked free, the block may get cross-allocated to another inode. If a directory entry is written but the inode is not initialized, the file system will assume that garbage data in the inode is meaningful, which is likely to cause misbehavior (e.g. the inode could appear to contain random disk blocks).
To avoid such problems, log entries that modify file system state are grouped together in atomic transactions for which either all the log entries are applied or none are applied. Transactions are bracketed by a LogBegin entry at the beginning and a LogCommit entry at the end.
The LogBegin payload itself contains no data. However, you must not apply any subsequent records unless there is also a matching LogCommit entry. If the LogCommit entry is missing, it means the system crashed in the middle of writing to the log, and hence applying subsequent records will likely leave the file system in an inconsistent state.
Note that even though the LogBegin payload contains zero bytes, the entry still has a header and footer. The header contains an LSN and a type byte signifying that this is a LogBegin, while the footer ensures integrity of the log entry as usual.
In the assignment, we do the check for a LogCommit for you and don't even call your apply methods unless the log entries are part of a complete transaction.
Transaction commit
struct LogCommit {
    uint32_t sequence;  // LSN of LogBegin
};
The LogCommit entry indicates the end of an atomic transaction, consisting of all log entries since the previous LogBegin. Note that the sequence field must contain the LSN of the corresponding LogBegin log entry. A mismatch indicates the log has been corrupted, and you must stop processing it.
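The all-or-nothing rule can be sketched as a pass over already-validated entries that holds back each transaction's entries until its LogCommit is seen. This is not the starter code's replay loop (which performs this check for you). Kind, Entry, and committed_entries are hypothetical names, and two behaviors are assumptions of the sketch: a LogBegin inside an open transaction is treated as corruption, and entries outside any transaction are ignored.

```cpp
#include <cstdint>
#include <vector>

using lsn_t = uint32_t;  // 32-bit LSN (assumption based on the text)

enum class Kind { Begin, Commit, Other };  // hypothetical entry tags

struct Entry {           // hypothetical, already-validated log entry
    lsn_t lsn;
    Kind kind;
    lsn_t commit_of;     // for Commit: the LSN of the LogBegin it names
};

// Return the LSNs of the entries that belong to complete transactions,
// in log order. Entries of an unfinished transaction are dropped.
inline std::vector<lsn_t> committed_entries(const std::vector<Entry> &log) {
    std::vector<lsn_t> out, pending;
    bool open = false;
    lsn_t begin_lsn = 0;
    for (const Entry &e : log) {
        switch (e.kind) {
        case Kind::Begin:
            if (open)
                return out;          // assumption: nested begin is corruption
            open = true;
            begin_lsn = e.lsn;
            pending.clear();
            break;
        case Kind::Commit:
            if (!open || e.commit_of != begin_lsn)
                return out;          // mismatched commit: stop processing
            out.insert(out.end(), pending.begin(), pending.end());
            open = false;
            break;
        case Kind::Other:
            if (open)
                pending.push_back(e.lsn);  // hold until the commit arrives
            break;                   // assumption: ignored outside a transaction
        }
    }
    return out;                      // entries still pending are discarded
}
```

If the log ends after a LogBegin and a few entries but before the LogCommit, those entries never leave the pending list, so nothing from the interrupted transaction is applied.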
Log rewind
struct LogRewind {
};
This indicates that the log has wrapped around and the next entry is back at the beginning of the log area. (An alternative is always to write to the very last byte of the log before wrapping around, but this would result in log entries that are not contiguous on disk. Contiguous entries facilitate debugging: you can always start examining the log from the beginning of the log area and see some valid operations that happened, whether before or after the latest checkpoint.)
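From the reader's side, handling a rewind amounts to choosing where the next entry starts. The sketch below is a minimal illustration with a hypothetical helper name. It assumes the caller already knows the current entry's total length (header, payload, and footer) and the byte offset of the start of the log area, that is, logstart() * SECTOR_SIZE.

```cpp
#include <cstdint>

// Hypothetical helper: byte offset at which the next log entry begins.
//   offset         - byte offset of the entry just read
//   entry_len      - total length of that entry in bytes
//   is_rewind      - true if the entry just read was a LogRewind
//   logstart_bytes - byte offset of the start of the log area
inline uint32_t next_entry_offset(uint32_t offset, uint32_t entry_len,
                                  bool is_rewind, uint32_t logstart_bytes) {
    // After a LogRewind the next entry is at the start of the log area;
    // otherwise entries are laid out back to back.
    return is_rewind ? logstart_bytes : offset + entry_len;
}
```

The LSN keeps counting across the wrap, so the entry found at the start of the log area must still carry the next consecutive LSN to be accepted.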
Tools Reference
Here is a full list of the tools included in this assignment's starter code. Not all are required to be used for this project, but they offer cool ways to explore logging file systems!
- mkfsv6 image-file [num-blocks [num-inodes [num-journalblocks]]] -- This tool creates a new file system. It uses the maximum size of 65,535 blocks (~32 MiB) and one inode per 2 KiB by default, but you can specify different parameters if you prefer. By default it does not create a log area, expecting you to do this separately with mountv6 -j. However, you can specify a number of journal blocks, and it will create the journal area. If you specify 0, it will use a default size for the journal.
- dumplog image-file [offset | c] -- Pretty-prints the log from the beginning, or from a specified numeric file offset. If you use the letter c instead of a numeric offset, prints from the checkpoint.
- v6 -- A tool with various subcommands for examining the file system state. It expects your disk image to be called v6.img. You can change this with the V6IMG environment variable (e.g., V6IMG=test.img ./v6 ls / for a single invocation or export V6IMG=test.img for all future invocations). Some useful subcommands:
  - v6 dump -- look at the superblock and log header.
  - v6 stat {path | #i-number}... -- shows the contents of particular inodes. You can specify inodes as pathnames or as # prefixed to the i-number. Be aware that # is a comment character for most shells, so you may need to quote it. (Example: ./v6 stat / or ./v6 stat '#1' to see the root directory inode.)
  - v6 ls [-a] path... -- lists a file or directory in a format similar to ls -alni. With the -a flag, shows time of last access instead of modification (similar to ls -alniu).
  - v6 cat path... -- look at the contents of a file.
  - v6 iblock block-number... -- dumps the contents of one or more indirect blocks, showing the non-zero block pointers.
  - v6 block block-number... -- dumps the raw contents of one or more file system blocks in a side-by-side hex and ASCII format with 16 bytes per line. This is convenient for directories because the direntv6 structure is exactly 16 bytes. (If you find this format generally useful, you can achieve something similar with the unix shell command od --endian=big -t x4z -N 512 -j block-numberb, where the trailing b makes od count the offset in 512-byte blocks. However, v6 block replaces unprintable characters with space instead of dot, making it easier to identify the . and .. entries in directories.)
  - v6 usedblocks -- lists all allocated block numbers. If you are leaking blocks, this can help you identify which blocks you are failing to mark free.
  - v6 usedinodes -- like usedblocks, but lists all allocated inodes. Useful if you are leaking inodes.
- fsckv6 [-y] image-file -- Reports what's wrong with the file system. If you use the -y flag, it will bring the file system back to a consistent state. However, it doesn't have any support for logging and will disable the log. (You can always recreate the log by supplying the -j flag to mountv6.)