Online Planning for Large Markov Decision Processes with Hierarchical Decomposition

Aijun Bai; Feng Wu; Xiaoping Chen

doi:10.1145/2717316

Markov decision processes (MDPs) provide a rich framework for planning under uncertainty. However, solving a large MDP exactly is usually intractable due to the "curse of dimensionality": the state space grows exponentially with the number of state variables. Online algorithms tackle this problem by avoiding computing a policy for the entire state space. On the other hand, since an online algorithm has to find a near-optimal action online in almost real time, the computation time is often very limited. In the context of reinforcement learning, MAXQ is a value function decomposition method that exploits the underlying structure of the original MDP and decomposes it into a combination of smaller subproblems arranged over a task hierarchy. In this article, we present MAXQ-OP, a novel online planning algorithm for large MDPs that utilizes MAXQ hierarchical decomposition in online settings. Compared to traditional online planning algorithms, MAXQ-OP is able to reach much deeper states in the search tree with relatively less computation time by exploiting MAXQ hierarchical decomposition online. We empirically evaluate our algorithm in the standard Taxi domain, a common benchmark for MDPs, to show the effectiveness of our approach. We have also conducted a long-term case study in a highly complex simulated soccer domain and developed a team named WrightEagle that has won five world championships and finished runner-up five times in the past 10 years of RoboCup Soccer Simulation 2D annual competitions. The results in the RoboCup domain confirm the scalability of MAXQ-OP to very large domains.

Citation

Aijun Bai, Feng Wu, Xiaoping Chen. Online Planning for Large Markov Decision Processes with Hierarchical Decomposition. ACM Transactions on Intelligent Systems and Technology (ACM TIST), 6(4):45, August 2015.

BibTeX

@article{BWCtist15,
 author = {Aijun Bai and Feng Wu and Xiaoping Chen},
 doi = {10.1145/2717316},
 journal = {ACM Transactions on Intelligent Systems and Technology (ACM TIST)},
 month = {August},
 number = {4},
 articleno = {45},
 title = {Online Planning for Large Markov Decision Processes with Hierarchical Decomposition},
 volume = {6},
 year = {2015}
}

External Links

Google Scholar — Cited by 31
Crossref — DOI: 10.1145/2717316
Engineering Village — Accession Number: 20153101079597
Web of Science — Accession Number: WOS:000360007500003
