Computer Science 161 (2014)

Final Exam

24 hours between May 6, 2014 and May 10, 2014 PM

240 minutes/240 points

Instructions

The points allocated to each problem correspond to how much time we think the problem should take. Although you have 24 hours in which to take the exam, we do not recommend that you pull an all-nighter and spend 24 hours completing the exam.

All questions should be directed to me (Margo) via email. I will check mail periodically, but do not expect an instant response. I strongly encourage you to read through the exam and fire off any questions that immediately come to mind. I will check mail regularly, but will not be watching constantly. When in doubt, document assumptions.

Here is my advice on how to complete this exam:

Read through the entire exam. Send me any questions.
Go to lunch or dinner or take a walk and think about the exam.
Spend a couple of hours on the first day/evening writing up answers the questions that you know. Sketch our your thoughts about some of the longer design problems.
Get a good night's sleep.
Spend another couple of hours on day 2 working on the remaining problems and fleshing out the details of your designs.
Read over your answers.
Email them to me at margo@eecs.harvard.edu.
On Monday, May 12, show up to the 2nd floor Lobby of Maxwell Dworkin to consume waffles and ice cream, pick up your T-shirt, and optionally iron on a team patch (if you have fancier animal artwork than my scanned images, please send them to me before the 12th and I'll have them available for you to iron on).

Please send your answers to me in a PLAIN TEXT file as an attachment (in a perfect world, your text file would not have lines longer than 80 characters, but I'll reformat it if it does). Label the answers clearly so it is obvious what answers correspond to what questions. Leave space between each question. Please place your answers in order in the file. Send your completed exam to margo@eecs.harvard.edu. I will return the graded exam to you; I tend to engage in a dialog as much as possible, so there is some educational value in reading more than just the final number. This exam is open text book, open source code, and open notes. You may not use computers (cell phones, PDAs, or any other device that can talk to a network) during the exam (other than to record your answers, to view OS/161 or System/161 code, or to examine your notes or the course web site). Please do not access the Piazza group during the exam. You must complete the exam on your own.

Exam Policy Statement

Please do not forget to do this!

as stated above, please write the following statement and then place your name after it, asserting that you have followed the rules of the exam. "I have neither given nor received help on this exam and have followed the rules as stated." If this statement is absent, you will hear from me.

1. Retake Concept Inventory (5 points)

This is a free 5 points -- all you need to is complete the concept inventory that you took at the beginning of the year to the best of your ability. You will be graded for completion, not based upon the correctness of your answers, but if you do worse than you did in February, we'll be sad.

2. OS Abstractions (45 points)

The main learning object of this coures is for you to be able to explain how an operating system provides the abstractions with which programmers are familiar. To that end, for each of the three items listed below, explain to a smart student who has completed CS50, is reasonably comfortable programming to the POSIX API, but has not taken CS61 (nor CS161) and doesn't know how the operating system provides each abstraction. (More concretely, your audience knows how to program, has heard of VM but has no idea what it means, knows that computers consist of a processor, memory, and a disk, and knows how to write programs in C that use POSIX system calls like fork, exec, open, close, read, write.)

Please limit your answer to roughly 100-200 words (I won't count, but I want more than 3 sentences, and less than a page).

Processes
Virtual Memory
Files

3. Pot Pourri (35 points)

(15 points) We know that operating systems enforce security on files by means of access control lists and capabilities, but operating systems must be security conscious about other things as well. Think about the three A's, and list 3 things an operating system does to provide security. For each item, if possible, describe which of the A's it addresses.
(20 points) Discuss how virtualization and time-sharing are similar/different. Again, one paragraph for each of the similar and different should be sufficient.

4. File System Design: Inode location management (15 points)

LFS keeps track of the locations of inodes by storing a large array in the ifile. The array uses the inode number as an index into this array containing the disk address for the corresponding inode.

In contrast, ZFS stores all dnodes directly in an object set (objset). Each dnode lives in a precise location in the objset (e.g., dnode N lives at offset sizeof(dnode) * N). So updating a dnode is just like updating any other part of an object or objset.

Both systems use a no-overwrite storage mechanism, so inodes and dnodes move around the disk. Explain in one to two paragraphs how these two approaches to tracking the locations of file metadata are similar and dissimilar. Think about how many I/Os are required to update files in each system.

5. File System Design: Journaling user data

This is one of those long design questions. Read it through first, but then move on to some of the short answer questions, before diving into this one.

Today's modern file systems, like ZFS, support many features, such as:

Deduplication: finding blocks containing identical data, storing only one copy of such blocks, and then letting multiple files reference them.
Snapshots: read-only versions of a file system as of a point in time.
Clones: These begin as a snapshot, but allow subsequent updates.

These features result in data blocks being referenced from multiple places (e.g., two different files; two different snapshots).

Your job is to design a scheme for efficiently journaling user data in such a system. You should take advantage of the fact that blocks can be referenced from multiple locations. You should assume the following about your system (many of these ideas are taken from ZFS):

Meta-data updates are journaled. That means that you have a journaling and recovery mechanism, so you only need talk about adding new records and possibly changing how recovery might work.
Your system takes periodic checkpoints (on the order of every 5-10 seconds, and you are guaranteed that the disk is 100% consistent after every checkpoint.
It is OK to delay all file system operations, except those that explicitly require synchronous writes (e.g., sync, fsync). However, it is not OK to lose all updates between checkpoints, so your log does need to be written more frequently than every checkpoint; it just need not be flushed on every transaction.

Part I (50 points)

In this part, you will explain your mechanism for journaling user data. The key design constraint that you must satisfy is that file data may only be written to persistent storage once. You should include a description of any log record (s) you wish to use. You should explain when these log records are created and when they must be written to persistent storage. You should also explain if the addition of these log records introduces any dependencies in your buffer management. Finally, discuss the advantages and disadvantages of your approach.

We suggest that you begin approaching this problem by figuring out everything that you need to do to handle a write system call. Then consider how you will handle the following two scenarios (where BLOCKSIZE is the size of a block in the file system):

Scenario 1
	char buf[BLOCKSIZE]

	write(fd, buf, BLOCKSIZE);
Scenario 2
	char buf[BLOCKSIZE]

	write(fd, buf, BLOCKSIZE/2);
	write(fd, buf+BLOCKSIZE/2, BLOCKSIZE/2);

Next start figuring out log records you will need and any dependencies in your buffer cache. Write that all up and then step back and assess it critically to discuss the advantages and disadvantages.

Part II (10 points)

Now consider any modifications that you might need to make to the system's free space management.

6. Designing for Byte-Addressable Persistent Memory (80 points)

For this question, you will need to read this paper from the beginning up until section 4.1 (You will want to read about PCM, but you can stop after that.) The paper presents a way to build a file system on byte-addressable persistent memory. (You are, of course, free to read the entire paper, but I wanted to keep the portion necessary for the exam within reason.) You will undoubtedly find it useful to read the questions before you start reading.

In your own words, describe short-circuit shadow paging.
Could you use short-circuit shadow paging in a more conventional block-based file system? If so, explain how you would use it. If not, explain why it's not possible.
The paper introduces the problem caused by the fact that processors are allowed to reorder when data gets flushed to main memory. They introduce epoch barriers to address this. Disks also reorder requests. Explain how the situation between disks and memory are the same or different; if they are the same, explain what mechanism we use with disk drives to ensure proper ordering.
Explain how BPFS is well-suited to PCM as opposed to other, more traditional block-based systems that we've studied.
It is likely that some form of byte-addressable persistent memory will be available on systems within the next few years. Propose a different way you might use this technology (i.e., something other than a file system).