Michael Mitzenmacher: Recent Projects
I'm an applied algorithms person. I look for problems in the
intersection of algorithms, networks, and information theory, but in
practice I work on problems in all three areas. The Internet is a
great source of problems; you need algorithms for effectively
transmitting information over a complex network.
Most of my recent work in this area focuses on applications of
low-density parity-check codes (see more on codes below) and Bloom
filters.
A Bloom filter is a data structure designed to succinctly represent a
set of objects so that you can answer queries of the form "Is this an element
of the set?" very quickly. The price you pay for the gain in speed
and space is that sometimes the data structure will tell you an
element is in the set when it is not. That is, there are false
positives. Whenever you want to use or send a set, but the space
requirements are too high, you should think about using a Bloom filter
instead. This data structure has recently been suggested for several
network applications, as Andrei Broder and I detail in our survey of
Bloom filters in network
applications.
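For concreteness, here is a minimal Bloom filter sketch in Python; the
sizes and the double-hashing construction are just one simple
instantiation, not a recommendation.

```python
import hashlib

class BloomFilter:
    """A minimal Bloom filter: m bits, k hash functions, false positives possible."""
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for simplicity

    def _positions(self, item):
        # Derive k indices from two base hashes (the double-hashing trick).
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means definitely absent; True means present or a false positive.
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter(m=1024, k=4)
bf.add("alice")
print(bf.might_contain("alice"))  # True
print(bf.might_contain("bob"))    # almost certainly False
```

With n elements stored in m bits using k hash functions, the false
positive probability is roughly (1 - e^(-kn/m))^k, which quantifies the
space/accuracy trade-off described above.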
I had the opportunity to combine codes and Bloom filters while working
with John Byers and the people on the Shapeshifter project
at Boston University.
Our SIGCOMM 2002 paper
discusses how to use codes and Bloom filters to achieve
informed content delivery across adaptive overlay networks. A
technical report gives
details about a new data structure we developed that uses a Bloom filter
for fast approximate reconciliation, a useful primitive in overlay networks.
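The technical report has the actual construction; purely to give the
flavor of approximate reconciliation, here is the simplest version of
the idea, reusing the BloomFilter sketch above: one peer sends a Bloom
filter of its set, and the other flags the elements the filter rejects.

```python
# Peer B summarizes its set with a (small) Bloom filter and sends it to
# peer A; A then flags its own elements that the filter rejects as an
# approximation to the set difference A - B. False positives in the
# filter cause some differences to be missed (the "approximate" part).
set_a = {f"block{i}" for i in range(100)}
set_b = {f"block{i}" for i in range(90)}   # B is missing block90..block99

bf_b = BloomFilter(m=2048, k=4)
for item in set_b:
    bf_b.add(item)

approx_diff = {x for x in set_a if not bf_b.might_contain(x)}
print(sorted(approx_diff))  # most (likely all) of block90..block99
```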
My most recent endeavor in this area has been to consider how to
combine rankings from independent sources into a global ranking, which
has applications to search and meta-search. I was introduced to this
area by my esteemed collaborator Cynthia Dwork (Microsoft Research,
San Jose). We should have a draft up shortly of our latest results.
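Since our draft isn't up yet, purely as an illustration of the rank
aggregation problem (and not necessarily our method), here is the
classical Borda count, which scores items by their positions in each
input ranking:

```python
from collections import defaultdict

def borda(rankings):
    """Combine rankings via Borda count: an item earns (n - position)
    points from each ranking of n items; higher total score wins."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for pos, item in enumerate(ranking):
            scores[item] += n - pos
    return sorted(scores, key=scores.get, reverse=True)

# Three search engines rank four pages differently; Borda aggregates.
print(borda([["a", "b", "c", "d"],
             ["b", "a", "d", "c"],
             ["a", "c", "b", "d"]]))  # ['a', 'b', 'c', 'd']
```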
With Michael Luby, Amin Shokrollahi, and Dan Spielman, I co-developed
Tornado codes, a particularly efficient type of low-density parity-check code
for erasures. These ideas have been further developed and
commercialized by the company Digital Fountain. I worked
with Digital Fountain colleagues Michael Luby and John Byers on
network applications for these codes, which led to some related work
on congestion-control protocols for reliable multicast.
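Tornado codes get their efficiency from carefully chosen sparse
irregular graphs, but the basic decoding step is simple: a parity
symbol that is the XOR of a set of data symbols recovers an erased
member of that set once the others are known. A hand-built toy example
(not a real code):

```python
# Toy erasure recovery: each parity symbol is the XOR of listed data
# symbols. If exactly one symbol in a parity's subset is erased, XORing
# the parity with the survivors recovers it; Tornado codes chain this
# step through a sparse random graph.
data = {0: 0b1010, 1: 0b0111, 2: 0b1100}       # data symbols
parity = {"p0": (0, 1), "p1": (1, 2)}          # parity = XOR of listed symbols
encoded = {name: data[a] ^ data[b] for name, (a, b) in parity.items()}

received = {1: 0b0111, 2: 0b1100}              # symbol 0 was erased in transit
recovered = encoded["p0"] ^ received[1]        # d0 = p0 ^ d1
assert recovered == data[0]
print(bin(recovered))                          # 0b1010
```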
Our work also led to analyses for low-density parity-check codes over
erasure channels. Richardson and Urbanke extended our analysis,
leading to low-density parity-check codes that provably come very,
very close to the Shannon capacity for basic channels. For this work,
I am told the six of us are sharing the IEEE Information Theory
Society Paper Award, but I haven't received a written confirmation
yet.
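For the erasure channel the analysis reduces to a one-dimensional
recursion, now commonly called density evolution: for a (dv, dc)-regular
low-density parity-check code with erasure probability eps, the fraction
of erased messages evolves as x <- eps * (1 - (1 - x)^(dc-1))^(dv-1),
and decoding succeeds exactly when this iteration drives x to zero. A
quick sketch (the code parameters below are just examples):

```python
def decodes(eps, dv, dc, iters=1000, tol=1e-12):
    """Density evolution for a (dv, dc)-regular LDPC code over the
    binary erasure channel with erasure probability eps. Iterative
    decoding succeeds (asymptotically) iff the recursion drives the
    erased-message fraction x to zero."""
    x = eps
    for _ in range(iters):
        x_next = eps * (1 - (1 - x) ** (dc - 1)) ** (dv - 1)
        if abs(x_next - x) < tol:
            break
        x = x_next
    return x < 1e-6

# The (3,6)-regular threshold is about 0.4294 (the capacity limit is 0.5).
print(decodes(0.42, 3, 6))  # True: below threshold, decoding succeeds
print(decodes(0.44, 3, 6))  # False: above threshold, decoding stalls
```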
I currently work on coding problems with Alek Kavcic, of the Harvard
EE department, extending these ideas to more complex channels. I am
also working on a new low-density parity-check code variation, called
Verification Codes, that combines the benefits of the error and erasure
codes developed previously.
I serve as an algorithmic consultant for the Human-Guided Search
project at Mitsubishi Electric Research Labs. The project is led by
the truly outstanding Neal
Lesh, and more information can be found on the Human-Guided Search project
page.
Interactive, or human-in-the-loop, optimization systems leverage
people's abilities in areas in which they outperform computers, such
as visual and strategic thinking. Users can steer interactive
optimization systems towards solutions which satisfy real-world
constraints. Furthermore, people can better understand, justify, and
modify solutions if they have participated in their construction.
The Human-Guided Search (HuGS) framework and Java toolkit allow rapid
development of interactive optimization systems. The framework and code
include visual metaphors for focusing and constraining optimization
algorithms. The user can select from different algorithms, including
a human-guidable version of a powerful heuristic, called tabu search.
We have developed a wide variety of applications with the HuGS
toolkit, including interactive systems to solve scheduling, vehicle
routing, layout, and protein-folding problems.
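HuGS itself is a Java toolkit; the sketch below is my own rough Python
rendering of plain (unguided) tabu search on a toy bit-vector problem,
just to show the heuristic the guidable version builds on. The problem
and parameters are invented for illustration.

```python
import random

def tabu_search(cost, n_bits, iters=200, tabu_len=5, seed=0):
    """Plain tabu search over bit vectors with single-bit-flip moves.
    A move is tabu if that bit was flipped recently; taking the best
    non-tabu move, even when it is worse, lets the search climb out of
    local minima instead of cycling back into them."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n_bits)]
    best, best_cost = x[:], cost(x)
    tabu = []  # recently flipped bit positions
    for _ in range(iters):
        moves = []
        for i in range(n_bits):
            if i in tabu:
                continue
            y = x[:]
            y[i] ^= 1
            moves.append((cost(y), i, y))
        c, i, x = min(moves)   # best non-tabu neighbor, worse or not
        tabu.append(i)
        if len(tabu) > tabu_len:
            tabu.pop(0)
        if c < best_cost:
            best, best_cost = x[:], c
    return best, best_cost

# Toy objective: Hamming distance to a hidden target pattern.
target = [1, 0, 1, 1, 0, 0, 1, 0]
print(tabu_search(lambda x: sum(a != b for a, b in zip(x, target)), 8))
```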
My Ph.D. thesis was on balls and bins problems, and every time I tell
myself I've written my last paper on this theme, I find I'm mistaken.
One model I've been exploring has just two bins; the probability that
the next ball lands in a given bin is proportional to some function of
the number of balls already in that bin. Such models are used to
describe positive feedback, which some economists claim really happens
in the real world. Our results show that monopoly happens, quickly!
The first work in this area was done with my graduate student Eleni
Drinea and Alan Frieze.
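A small simulation makes the monopoly effect concrete. Here the next
ball joins a bin with probability proportional to count^a; the exponent
values and step count are just examples.

```python
import random

def feedback_urn(a, steps=10_000, seed=1):
    """Two bins, one ball each to start; each new ball joins bin i with
    probability proportional to counts[i]**a. Exponents a > 1 model
    positive feedback."""
    rng = random.Random(seed)
    counts = [1, 1]
    for _ in range(steps):
        w0, w1 = counts[0] ** a, counts[1] ** a
        if rng.random() < w0 / (w0 + w1):
            counts[0] += 1
        else:
            counts[1] += 1
    return counts

for a in (0.5, 1.0, 2.0):
    print(a, feedback_urn(a))
# a < 1 stays balanced; a = 1 is the classical Polya urn; for a > 1
# one bin quickly takes (almost) everything: monopoly.
```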
Another model I've recently worked on with Balaji Prabhakar and
Devavrat Shah from Stanford is balls and bins games with memory.
Suppose you throw n balls into n bins, with each ball (after the
first) getting to choose the less loaded of two bins as follows: for each
ball, one bin is chosen independently and uniformly at random; the
other choice is the best of the two bins left over after the previous
ball. This yields a "better" maximum load than choosing two bins
independently and uniformly at random for each ball. (Well, sort of...
check out the paper.)
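Here is a small simulation comparing the two schemes; the memory
variant below is my reading of the model sketched above (the paper has
the precise definition and the analysis, and the sense in which
"better" needs qualifying).

```python
import random

def max_load(n, place, seed=0):
    """Throw n balls into n bins; `place` puts one ball and returns the
    bin to remember for the next ball."""
    rng = random.Random(seed)
    load = [0] * n
    memory = rng.randrange(n)  # arbitrary remembered bin for the first ball
    for _ in range(n):
        memory = place(rng, load, memory)
    return max(load)

def two_choice(rng, load, _memory):
    # Classical two-choice: two independent uniform bins, take the emptier.
    i, j = rng.randrange(len(load)), rng.randrange(len(load))
    best = i if load[i] <= load[j] else j
    load[best] += 1
    return best

def with_memory(rng, load, memory):
    # One fresh uniform bin plus the bin remembered from the previous
    # ball; place in the emptier, then remember the emptier of the pair.
    i = rng.randrange(len(load))
    best = i if load[i] <= load[memory] else memory
    load[best] += 1
    return i if load[i] <= load[memory] else memory

n = 100_000
print("two-choice max load: ", max_load(n, two_choice))
print("with-memory max load:", max_load(n, with_memory))
```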