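Title: Queueing Network Controls via Deep Reinforcement Learning
Speaker: Professor Jiangang Dai (戴建岗), Cornell University and The Chinese University of Hong Kong, Shenzhen
Time: Tuesday, June 27, 2023, 3:00-4:00 PM
Venue: Tencent Meeting, ID: 354-247-607, password: 41819
On-campus contact: Cao Chunling (曹春玲), caocl@jlu.edu.cn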
Abstract: Conservative update methods such as trust region policy optimization (TRPO) and proximal policy optimization (PPO) have become the dominant reinforcement learning algorithms because of their ease of implementation and good practical performance. A conventional setup for notoriously difficult queueing network control problems is a Markov decision problem (MDP) that has three features: infinite state space, unbounded costs, and a long-run average cost objective. We extend the theoretical framework of these conservative update methods to such MDP problems. The resulting PPO algorithm is tested on a parallel-server system and large-size multiclass queueing networks. The algorithm consistently generates control policies that outperform state-of-the-art heuristics in the literature in a variety of load conditions, from light to heavy traffic. These policies are demonstrated to be near-optimal when the optimal policy can be computed. A key to the success of our PPO algorithm is the use of three variance reduction techniques in estimating the relative value function via sampling. First, we use a discounted relative value function as an approximation of the relative value function. Second, we propose regenerative simulation to estimate the discounted relative value function. Finally, we incorporate the approximating martingale-process method into the regenerative estimator. This is joint work with Mark Gluzman at Meta.
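For readers unfamiliar with the second technique, the following is a minimal illustrative sketch, not the speaker's implementation. It assumes a single-server queue (a uniformized M/M/1 chain) with no control decisions, a cost equal to the queue length, and the empty state as the regeneration state. The function names `step`, `average_cost`, and `discounted_relative_value` are hypothetical, and the approximating martingale-process correction mentioned in the abstract is not included.

```python
import random

def step(x, lam, mu, rng):
    """One transition of the uniformized M/M/1 chain (x = queue length)."""
    if rng.random() < lam / (lam + mu):
        return x + 1            # arrival
    return max(x - 1, 0)        # service completion (no effect when empty)

def average_cost(lam, mu, n_cycles, rng):
    """Regenerative ratio estimate of the long-run average queue length.

    Each cycle starts in the empty state and ends on the next return to it.
    """
    total_cost, total_len = 0.0, 0
    for _ in range(n_cycles):
        x = 0
        while True:
            total_cost += x
            total_len += 1
            x = step(x, lam, mu, rng)
            if x == 0:
                break
    return total_cost / total_len

def discounted_relative_value(x0, lam, mu, gamma, avg, n_paths, rng):
    """Monte Carlo estimate of E[sum_{k < sigma} gamma^k (x_k - avg)],
    where sigma is the first time the chain is in the empty state.

    Truncating at sigma (assumed regeneration state: empty queue) keeps
    every sample path finite, unlike an infinite-horizon discounted sum.
    """
    acc = 0.0
    for _ in range(n_paths):
        x, disc, total = x0, 1.0, 0.0
        while x != 0:
            total += disc * (x - avg)   # centered, discounted one-step cost
            disc *= gamma
            x = step(x, lam, mu, rng)
        acc += total
    return acc / n_paths

if __name__ == "__main__":
    rng = random.Random(0)
    avg = average_cost(1.0, 2.0, 20000, rng)
    print("estimated average cost:", avg)
    print("estimated value at x=5:",
          discounted_relative_value(5, 1.0, 2.0, 0.99, avg, 2000, rng))
```

With arrival rate 1.0 and service rate 2.0 (load 0.5), the stationary mean queue length is 1, so the first estimate should be close to that value. By construction, the estimate at the regeneration state itself is zero.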
Speaker bio: Jiangang Dai is the Leon C. Welch Professor of Engineering in the School of Operations Research and Information Engineering at Cornell University. He also serves as the Dean of the School of Data Science at The Chinese University of Hong Kong, Shenzhen. He received his BS and MS in Mathematics from Nanjing University and his PhD in Mathematics from Stanford University. Prior to joining Cornell, he held the Chandler Family Chair Professorship in the School of Industrial and Systems Engineering at Georgia Institute of Technology, where he was a faculty member from 1990 to 2012. He is an elected fellow of the Institute of Mathematical Statistics and an elected fellow of the Institute for Operations Research and the Management Sciences (INFORMS). He has received a number of awards for research contributions, including the Best Publication Award twice, in 1997 and 2017, and the Erlang Prize in 1998, all from the Applied Probability Society of INFORMS, as well as the ACM SIGMETRICS Achievement Award in 2018. He served as the Editor-in-Chief of Mathematics of Operations Research (MOR) from 2012 to 2019.
Jiangang Dai's research interests include stochastic processing networks, fluid and diffusion models of queueing networks, reflecting Brownian motions, Stein's method, reinforcement learning, data center modeling, hospital inpatient flow management, and airline revenue management.