Sample-Based Policy Iteration for Constrained DEC-POMDPs

Feng Wu, Nicholas R. Jennings, Xiaoping Chen

We introduce constrained DEC-POMDPs, an extension of standard DEC-POMDPs that adds constraints to the optimization of the long-term reward. Constrained DEC-POMDPs provide a natural framework for modeling cooperative multi-agent problems with limited resources or multiple objectives. To solve constrained DEC-POMDPs, we propose a novel sample-based policy iteration algorithm. The algorithm builds on multi-agent dynamic programming and benefits from several advantages of recently developed DEC-POMDP algorithms. It improves the joint policy by solving a series of standard nonlinear programs (NLPs) and thereby leverages the power of existing NLP solvers. The experimental results confirm that the algorithm can efficiently solve constrained DEC-POMDPs where general DEC-POMDP algorithms fail. It outperforms the leading DEC-POMDP method, achieving higher value and a lower chance of constraint violation.
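
As a rough illustration of what "constraints on the long-term reward" could look like, the optimization problem may be sketched as below. This is an assumed, generic form and not a formula taken from the abstract: the horizon $T$, the joint policy $\vec{q}$, the initial belief $b_0$, the reward function $R$, the cost functions $C_k$, and the bounds $c_k$ are all notation introduced here for illustration only.

```latex
% Assumed generic form of a constrained DEC-POMDP objective (illustrative notation).
% \vec{q}: joint policy, b_0: initial belief, R: team reward,
% C_k: k-th cost function, c_k: upper bound on the k-th expected cumulative cost.
\[
\begin{aligned}
\max_{\vec{q}} \quad
  & \mathbb{E}\Big[\sum_{t=0}^{T-1} R(s_t, \vec{a}_t) \,\Big|\, \vec{q}, b_0\Big] \\
\text{s.t.} \quad
  & \mathbb{E}\Big[\sum_{t=0}^{T-1} C_k(s_t, \vec{a}_t) \,\Big|\, \vec{q}, b_0\Big] \le c_k,
  \qquad k = 1, \dots, K
\end{aligned}
\]
```

Under this reading, a limited resource (for example, a shared energy budget) would appear as one cost function $C_k$ with its bound $c_k$, and a standard DEC-POMDP is recovered when $K = 0$.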

 address = {Montpellier, France},
 author = {Feng Wu and Nicholas R. Jennings and Xiaoping Chen},
 booktitle = {Proceedings of the 20th European Conference on Artificial Intelligence (ECAI)},
 doi = {10.3233/978-1-61499-098-7-858},
 pages = {858-863},
 title = {Sample-Based Policy Iteration for Constrained {DEC-POMDPs}},
 year = {2012}