Offline Meta-Reinforcement Learning with Contrastive Prediction

Xu Han; Feng Wu

doi:10.3778/j.issn.1673-9418.2203074

Traditional reinforcement learning algorithms require extensive online interaction with the environment during training and cannot adapt effectively to changes in the task environment, which makes them difficult to apply to real-world problems. Offline meta-reinforcement learning offers an effective way to adapt quickly to a new task by learning policies offline from replay datasets of multiple tasks. Applying offline meta-reinforcement learning to complex tasks faces two challenges. First, reinforcement learning algorithms overestimate the value of state-action pairs not contained in the dataset and therefore select non-optimal actions, resulting in poor performance. Second, meta-reinforcement learning algorithms must not only learn a policy but also perform robust and efficient task inference. To address these problems, this paper proposes an offline meta-reinforcement learning algorithm based on contrastive prediction. To counter value-function overestimation, the proposed algorithm uses behavior cloning to encourage the policy to prefer actions included in the dataset. To improve task inference in meta-learning, the proposed algorithm uses recurrent neural networks to infer the task from the agent's context trajectories and uses contrastive learning and prediction networks to analyze and distinguish the latent structure of different task trajectories. Experimental results show that agents trained by the proposed algorithm score more than 25 percentage points when faced with unseen tasks, and that the algorithm achieves higher meta-training efficiency and better generalization performance than existing methods.
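The two ingredients named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual loss functions, so everything below is an assumption made for illustration: the behavior-cloning term is written as a squared-distance penalty added to a negated Q-value (in the style of TD3+BC), the contrastive term is a standard InfoNCE loss over task embeddings, and the names `bc_regularized_actor_loss`, `info_nce_loss`, `bc_weight`, and `temperature` are hypothetical.

```python
import math


def bc_regularized_actor_loss(q_values, policy_actions, dataset_actions, bc_weight=1.0):
    """Actor loss with a behavior-cloning penalty (assumed TD3+BC-style form).

    For each sample the loss is -Q(s, pi(s)) + bc_weight * ||pi(s) - a_data||^2,
    so the policy is rewarded for high value but pulled toward dataset actions.
    """
    total = 0.0
    for q, a_pi, a_data in zip(q_values, policy_actions, dataset_actions):
        bc_penalty = sum((p - d) ** 2 for p, d in zip(a_pi, a_data))
        total += -q + bc_weight * bc_penalty
    return total / len(q_values)


def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE loss over task embeddings (assumed contrastive objective).

    anchors[i] and positives[i] are embeddings of trajectories from the same
    task; positives[j] with j != i serve as negatives from other tasks.
    Embeddings are assumed to be non-zero vectors.
    """
    def cosine(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        norm_u = math.sqrt(sum(x * x for x in u))
        norm_v = math.sqrt(sum(y * y for y in v))
        return dot / (norm_u * norm_v)

    loss = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        # Cross-entropy with the matching task as the correct class.
        loss += log_sum_exp - logits[i]
    return loss / len(anchors)


if __name__ == "__main__":
    # Policy actions that match the dataset incur no cloning penalty.
    print(bc_regularized_actor_loss([1.0, 3.0], [[0.5], [0.2]], [[0.5], [0.2]]))
    # Well-separated task embeddings give a loss close to zero.
    print(info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

In this sketch a larger `bc_weight` keeps the policy closer to actions seen in the dataset, which is one common way to limit the value overestimation described above, while the contrastive loss is low only when trajectories from the same task are embedded closer together than trajectories from different tasks.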

Citation

Xu Han, Feng Wu. Offline Meta-Reinforcement Learning with Contrastive Prediction. Journal of Frontiers of Computer Science and Technology (FCST), 17(8):1917-1927, 2023.

BibTeX

@article{HWjfcst23,
 author = {Xu Han and Feng Wu},
 doi = {10.3778/j.issn.1673-9418.2203074},
 journal = {Journal of Frontiers of Computer Science and Technology (FCST)},
 number = {8},
 pages = {1917-1927},
 title = {Offline Meta-Reinforcement Learning with Contrastive Prediction},
 volume = {17},
 year = {2023}
}
