CS 281r

Planning: without and with Uncertainty

Harvard University

Fall 2006


Teaching Staff

Professor: Avi Pfeffer
Office: Maxwell Dworkin 251
Email: avi@eecs.harvard.edu
Office hours: Friday 1-2:30
TF: Chih-han Yu
Office: Maxwell Dworkin 238
Email: chihanyu@gmail.com
Office hours: TBA

The class web page is at http://www.eecs.harvard.edu/~avi/CS281r/F06

You may email the teaching staff at cs281r@eecs.harvard.edu


Course Description

Planning is one of the central challenges of artificial intelligence. The task is to get from an initial condition to a goal condition via a sequence of actions. In this course, we will explore a variety of approaches to planning. We will begin with traditional planning approaches that do not consider uncertainty about an agent's knowledge of the world or the effects of its actions. Topics in this section will include planning as theorem proving, non-linear planning, planning as graph search, planning as satisfiability, hierarchical planning, and planning with time and resource constraints. The second section of the course will cover planning under uncertainty. Topics here include Markov decision processes (MDPs), factored MDPs, first-order MDPs, partially observable MDPs, and reinforcement learning. The course will conclude with a brief look at robot motion planning.

Prerequisite

CS 181 or CS 182 or consent of instructor.

Requirements and Grading

This class will be driven by research papers and will be largely discussion-based. There are three requirements.

Course Plan and Readings (subject to change)

Mon Sep 18: Introduction and historical beginnings: [Newell and Simon 1963]
Wed Sep 20: Planning as theorem proving [McCarthy and Hayes 69]
Mon Sep 25: Solving the frame problem and goal regression [Reiter 91]
Wed Sep 27: Nonlinear planning [McAllester and Rosenblitt 91, Peot and Smith 92]
Wed Oct 4: Graphplan [Blum and Furst 97]
Satplan [Kautz and Selman 92, Ernst et al. 97]
Hierarchical planning [Erol et al. 96, Yang 90]
Planning with time and resource constraints [Wilkins 1990, Haslum and Geffner 01]
Simple temporal problems [Dechter et al. 91]
Markov decision processes (MDPs) and structure (3 lectures) [Boutilier et al. 99]
Factored MDPs (2 lectures) [Guestrin et al. 03]
First-order MDPs [Yoon et al. 02, Sanner and Boutilier 06]
Partially observable MDPs [Kaelbling et al. 98a]
Reinforcement learning (RL) [Kaelbling et al. 98b]
RL with function approximation [Boyan and Moore 95, Gordon 95]
Policy search and hierarchical RL [Ng and Jordan 00, Andre and Russell 02]
Applications of RL [Tesauro 95, Ng et al. 04]
Robot motion planning [Latombe 91, chapter 1]
Probabilistic roadmaps [Kavraki et al. 96]
Rapidly-exploring random trees [LaValle and Kuffner 01, Yu et al. 04]

Please note that there will be no class on Oct 2, due to Yom Kippur, and on Oct 9, due to Columbus Day. The last few classes will be devoted to project presentations. Papers not available on the web can be picked up from MaryFran Mitrano, Maxwell Dworkin 247.

References

AAAI = National Conference on Artificial Intelligence.
IJCAI = International Joint Conference on Artificial Intelligence.
UAI = Conference on Uncertainty in Artificial Intelligence.
AIJ = Artificial Intelligence (journal).
JAIR = Journal of Artificial Intelligence Research.
A. Newell and H. A. Simon (1963). "GPS: A Program that Simulates Human Thought", in E. A. Feigenbaum & J. Feldman (eds.), Computers and Thought, pp. 279-293, R. Oldenbourg KG.
J. McCarthy and P. J. Hayes (1969). "Some Philosophical Problems from the Standpoint of Artificial Intelligence", Machine Intelligence 4, pp. 463-502.
R. Reiter (1991). "The Frame Problem in the Situation Calculus: A Simple Solution (Sometimes) and a Completeness Result for Goal Regression". In V. Lifschitz (ed.), Artificial Intelligence and Mathematical Theory of Computation: Papers in Honor of John McCarthy, pp. 359-370, Academic Press, New York.
D. A. McAllester and D. Rosenblitt (1991). "Systematic nonlinear planning". AAAI.
M. A. Peot and D. E. Smith (1992). "Conditional Nonlinear Planning". International Conference on Artificial Intelligence Planning Systems.
A. L. Blum and M. Furst (1997). "Fast Planning Through Planning Graph Analysis". AIJ, 90(1-2), pp. 281-300.
H. Kautz and B. Selman (1992). "Planning as satisfiability". European Conference on Artificial Intelligence.
M. D. Ernst, T. D. Millstein and D. S. Weld (1997). "Automatic SAT Compilation of Planning Problems". IJCAI.
K. Erol, J. Hendler and D. S. Nau (1996). "Complexity Results for HTN Planning". Annals of Mathematics and Artificial Intelligence, 18(1) pp. 69-93.
Q. Yang (1990). "Formalizing Planning Knowledge for Hierarchical Planning". Computational Intelligence 6, pp. 12-24.
D. E. Wilkins (1990). "Can AI Planners Solve Practical Problems?". Computational Intelligence 6(4), pp. 232-246.
P. Haslum and H. Geffner (2001). "Heuristic Planning with Time and Resources". IJCAI-01 Workshop on Planning with Resources.
R. Dechter, I. Meiri and J. Pearl (1991). "Temporal Constraint Networks". AIJ 49, pp. 61-95.
C. Boutilier, T. Dean and S. Hanks (1999). "Decision-Theoretic Planning: Structural Assumptions and Computational Leverage". JAIR 11, pp. 1-94.
C. Guestrin, D. Koller, R. Parr and S. Venkataraman (2003). "Efficient Solution Algorithms for Factored MDPs". JAIR 19, pp. 399-468.
S. W. Yoon, A. Fern and R. Givan (2002). "Inductive policy selection for first-order MDPs". UAI.
S. Sanner and C. Boutilier (2006). "Practical linear value-approximation techniques for first-order MDPs". UAI.
L. P. Kaelbling, M. L. Littman and A. R. Cassandra (1998a). "Planning and Acting in Partially Observable Stochastic Domains". AIJ 101, pp. 99-134.
L. P. Kaelbling, M. L. Littman and A. W. Moore (1998b). "Reinforcement Learning: A Survey". JAIR 4, pp. 237-285.
J. A. Boyan and A. W. Moore (1995). "Generalization in reinforcement learning: safely approximating the value function". Advances in Neural Information Processing Systems (NIPS).
G. J. Gordon (1995). "Stable Function Approximation in Dynamic Programming". Technical report CMU-CS-95-103, Carnegie Mellon University, Computer Science Department.
A. Y. Ng and M. Jordan (2000). "PEGASUS: A policy search method for large MDPs and POMDPs". UAI.
D. Andre and S. J. Russell (2002). "State Abstraction for Programmable Reinforcement Learning Agents". AAAI.
G. Tesauro (1995). "Temporal Difference Learning and TD-Gammon". Communications of the ACM, 38(3).
A. Y. Ng, A. Coates, M. Diel, V. Ganapathi, J. Schulte, B. Tse, E. Berger and E. Liang (2004). "Autonomous inverted helicopter flight via reinforcement learning". International Symposium on Experimental Robotics.
J.-C. Latombe (1991). Robot Motion Planning. Kluwer.
L. E. Kavraki, P. Svestka, J.-C. Latombe and M. H. Overmars (1996). "Probabilistic Roadmaps for Path Planning in High-Dimensional Configuration Spaces". IEEE Trans. on Robotics and Automation, 12(4), pp. 566-580.
S. M. LaValle and J. J. Kuffner, Jr. (2001). "Randomized Kinodynamic Planning". International Journal of Robotics Research, 20(5), pp. 378-400.
C.-H. Yu, J. Chuang, B. Gerkey, G. J. Gordon and A. Ng (2004). "Open Loop Plans in Multi-Robot POMDPs". Technical report, Stanford University, Department of Computer Science.