Reinforcement learning (RL) has emerged as a promising approach for optimizing traffic signal control (TSC) to ensure the efficient operation of transportation networks. However, the trial-and-error training typical of RL is usually impractical in real-world deployments. Offline RL, which trains models on pre-collected datasets, offers a more practical alternative, but it introduces challenges such as suboptimal datasets and limited generalization of pre-trained models. To address this, we propose an offline-to-online RL framework for TSC that pre-trains a generalized model and quickly adapts to new traffic scenarios through online refinement. In the offline stage, we augment the pre-collected datasets to cover a diverse set of possible scenarios and use an offline RL method to pre-train a control model. To ensure generalization, we use a FRAP-like network as our base model, which is designed to learn the basic logic of signal control. In the online stage, we introduce a discrepancy measure to tackle inconsistencies between the offline pre-trained model and the online scenario, and prioritize samples based on it. In the experiments, the proposed approach achieves competitive performance and reduces the training time needed to learn in new scenarios, compared to several baselines.
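The online-stage idea of prioritizing samples by a discrepancy measure can be illustrated with a minimal sketch. The sketch below assumes the discrepancy is the gap between the offline pre-trained model's value prediction and the target observed online (a TD-error-like quantity), and that replay batches are drawn proportionally to it; the class and parameter names are illustrative, not the paper's exact formulation.

```python
# Minimal sketch (illustrative, not the paper's exact method): keep online
# transitions together with a discrepancy score between the offline model's
# prediction and the online target, and sample replay batches in proportion
# to that score so mismatched experiences are revisited more often.
import numpy as np


class DiscrepancyPrioritizedBuffer:
    def __init__(self, capacity=10000, epsilon=1e-3):
        self.capacity = capacity
        self.epsilon = epsilon      # keeps every transition selectable
        self.transitions = []       # (state, action, reward, next_state)
        self.scores = []            # discrepancy score per transition

    def add(self, transition, predicted_value, observed_target):
        # Discrepancy: |online target - offline model's value estimate|.
        score = abs(observed_target - predicted_value) + self.epsilon
        if len(self.transitions) >= self.capacity:
            self.transitions.pop(0)
            self.scores.pop(0)
        self.transitions.append(transition)
        self.scores.append(score)

    def sample(self, batch_size, rng=np.random):
        # Sampling probability proportional to the discrepancy score.
        probs = np.asarray(self.scores, dtype=float)
        probs /= probs.sum()
        idx = rng.choice(len(self.transitions), size=batch_size, p=probs)
        return [self.transitions[i] for i in idx]
```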
@inproceedings{MWiros23,
address = {Detroit, USA},
author = {Jinming Ma and Feng Wu},
booktitle = {2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
month = {October},
pages = {5567--5573},
title = {Effective Traffic Signal Control with Offline-to-Online Reinforcement Learning},
year = {2023}
}