Computer Science 161 (2014)
Final Exam
24 hours between May 6, 2014 and May 10, 2014 PM
240 minutes/240 points
Instructions
The points allocated to each problem correspond to how much time
we think the problem should take. Although you have 24 hours in
which to take the exam, we do not recommend that you pull an
all-nighter and spend 24 hours completing the exam.
All questions should be directed to me (Margo) via email. I will
check mail periodically, but do not expect an instant response.
I strongly encourage you to read through the exam and fire off
any questions that immediately come to mind. I will check
mail regularly, but will not be watching constantly. When in
doubt, document assumptions.
Here is my advice on how to complete this exam:
- Read through the entire exam. Send me any questions.
- Go to lunch or dinner or take a walk and think about the exam.
- Spend a couple of hours on the first day/evening writing up answers the
questions that you know. Sketch our your thoughts about some of the longer
design problems.
- Get a good night's sleep.
- Spend another couple of hours on day 2 working on the remaining
problems and fleshing out the details of your designs.
- Read over your answers.
- Email them to me at margo@eecs.harvard.edu.
- On Monday, May 12, show up to the 2nd floor Lobby of Maxwell Dworkin
to consume waffles and ice cream, pick up your T-shirt,
and optionally iron on a team patch (if you have fancier animal artwork
than my scanned images, please send them to me before the 12th and I'll
have them available for you to iron on).
Please send your answers to me in a PLAIN TEXT file as an attachment
(in a perfect world, your text file would not have lines longer than
80 characters, but I'll reformat it if it does).
Label the answers clearly so it is
obvious what answers correspond to what questions. Leave space
between each question. Please place your answers in order in the
file. Send your completed exam to margo@eecs.harvard.edu.
I will return the graded exam to you; I tend to engage in a dialog as
much as possible, so there is some educational value in reading more
than just the final number.
This exam is open text book, open source code, and open notes. You
may not use computers (cell phones, PDAs, or any other device that can
talk to a network)
during the exam (other than to record your answers, to view OS/161
or System/161 code, or to examine your notes or the course web site).
Please do not access the Piazza group during the exam.
You must complete the exam on your own.
Exam Policy Statement
Please do not forget to do this!
as stated above, please write the following statement and
then place your name after it, asserting that you have followed the
rules of the exam. "I have neither given nor received help on this exam
and have followed the rules as stated."
If this statement is absent, you will hear from me.
1. Retake Concept Inventory (5 points)
This is a free 5 points -- all you need to is complete the
concept inventory that you took at the beginning of the year to the best
of your ability.
You will be graded for completion, not based upon the correctness of your answers,
but if you do worse than you did in February, we'll be sad.
2. OS Abstractions (45 points)
The main learning object of this coures is for you to be able to explain how
an operating system provides the abstractions with which programmers are
familiar.
To that end, for each of the three items listed below, explain to a smart
student who has completed CS50, is reasonably comfortable programming to the
POSIX API, but has not taken CS61 (nor CS161) and doesn't know how the operating
system provides each abstraction.
(More concretely, your audience knows how to program, has heard of VM but has
no idea what it means, knows that computers consist of a processor, memory,
and a disk, and knows how to write programs in C that use POSIX system calls
like fork, exec, open, close, read, write.)
Please limit your answer to roughly 100-200 words (I won't count, but I want more
than 3 sentences, and less than a page).
- Processes
- Virtual Memory
- Files
3. Pot Pourri (35 points)
- (15 points) We know that operating systems enforce security on files by means of
access control lists and capabilities, but operating systems must be
security conscious about other things as well.
Think about the three A's, and list 3 things an operating system does to
provide security. For each item, if possible, describe which of the A's it
addresses.
- (20 points) Discuss how virtualization and time-sharing are similar/different.
Again, one paragraph for each of the similar and different should be sufficient.
4. File System Design: Inode location management (15 points)
LFS keeps track of the locations of inodes by storing a large array
in the ifile. The array uses the inode number as an index into
this array containing the disk address for the corresponding inode.
In contrast, ZFS stores all dnodes directly in an object set (objset). Each
dnode lives in a precise location in the objset (e.g., dnode N lives
at offset sizeof(dnode) * N). So updating a dnode is just like updating
any other part of an object or objset.
Both systems use a no-overwrite storage mechanism, so inodes and
dnodes move around the disk. Explain in one to two paragraphs
how these two approaches
to tracking the locations of file metadata are similar and dissimilar.
Think about how many I/Os are required to update files in each system.
5. File System Design: Journaling user data
This is one of those long design questions. Read it through first,
but then move on to some of the short answer questions, before
diving into this one.
Today's modern file systems, like ZFS, support many features, such as:
- Deduplication: finding blocks containing identical data,
storing only one copy of such blocks, and then letting multiple
files reference them.
- Snapshots: read-only versions of a file system as of a point
in time.
- Clones: These begin as a snapshot, but allow subsequent updates.
These features result in data blocks being referenced from multiple
places (e.g., two different files; two different snapshots).
Your job is to design a scheme for efficiently journaling user data in such a
system. You should take advantage of the fact that blocks can be referenced from
multiple locations.
You should assume the following about your system (many of these ideas are taken
from ZFS):
- Meta-data updates are journaled. That means that you have a journaling and
recovery mechanism, so you only need talk about adding new records and possibly
changing how recovery might work.
- Your system takes periodic checkpoints (on the order of every 5-10 seconds,
and you are guaranteed that the disk is 100% consistent after every checkpoint.
- It is OK to delay all file system operations, except those that explicitly
require synchronous writes (e.g., sync, fsync). However, it is not OK to lose
all updates between checkpoints, so your log does need to be written more
frequently than every checkpoint; it just need not be flushed on every transaction.
Part I (50 points)
In this part, you will explain your mechanism for journaling user data.
The key design constraint that you must satisfy is that file data may
only be written to persistent storage once.
You should include a description of any log record (s) you wish to use.
You should explain when these log records are created and when they must
be written to persistent storage.
You should also explain if the addition of these log records introduces
any dependencies in your buffer management.
Finally, discuss the advantages and disadvantages of your approach.
We suggest that you begin approaching this problem by figuring out everything
that you need to do to handle a write system call.
Then consider how you will handle the following two scenarios (where BLOCKSIZE is
the size of a block in the file system):
Scenario 1
char buf[BLOCKSIZE]
write(fd, buf, BLOCKSIZE);
Scenario 2
char buf[BLOCKSIZE]
write(fd, buf, BLOCKSIZE/2);
write(fd, buf+BLOCKSIZE/2, BLOCKSIZE/2);
Next start figuring out log records you will need and any
dependencies in your buffer cache.
Write that all up and then step back and assess it critically
to discuss the advantages and disadvantages.
Part II (10 points)
Now consider any modifications that you might need to make to the
system's free space management.
6. Designing for Byte-Addressable Persistent Memory (80 points)
For this question, you will need to read
this paper
from the beginning up until section 4.1 (You will want to read about PCM, but you can
stop after that.)
The paper presents a way to build a file system on
byte-addressable persistent memory.
(You are, of course, free to read the entire
paper, but I wanted to keep the portion necessary for the exam
within reason.)
You will undoubtedly find it useful to read the questions
before you start reading.
- In your own words, describe short-circuit shadow paging.
- Could you use short-circuit shadow paging in a more
conventional block-based file system? If so, explain how
you would use it. If not, explain why it's not possible.
- The paper introduces the problem caused by the fact that
processors are allowed to reorder when data gets flushed to main
memory. They introduce epoch barriers to address this.
Disks also reorder requests. Explain how the situation between
disks and memory are the same or different; if they are the same,
explain what mechanism we use with disk drives to ensure proper
ordering.
- Explain how BPFS is well-suited to PCM as opposed to other, more
traditional block-based systems that we've studied.
- It is likely that some form of byte-addressable persistent memory
will be available on systems within the next few years.
Propose a different way you might use this technology (i.e., something
other than a file system).