
School and Institute of Mathematics 2020 Academic Lecture Series (No. 272): Prof. Xingdong Feng, Shanghai University of Finance and Economics

Posted: 2020-11-16

Title: Deep Reinforcement Learning via Noncrossing Quantile Regression

Speaker: Prof. Xingdong Feng (冯兴东), Shanghai University of Finance and Economics

Time: November 19, 2020, 10:10-11:10

Venue: Tencent Meeting, ID: 241 771 974, Password: 123456

Campus contact: Fukang Zhu (朱复康), fzhu@jlu.edu.cn


Abstract:

Distributional reinforcement learning (DRL) estimates the distribution over future returns, rather than only the mean, to more efficiently capture the intrinsic uncertainty of MDPs. However, batch-based DRL algorithms cannot guarantee the non-decreasing property of the learned quantile curves, especially at the early training stage, leading to abnormal distribution estimates and reduced model interpretability. To address these issues, we introduce a general DRL framework that uses non-crossing quantile regression to enforce the monotonicity constraint within each sampled batch, and that can be incorporated into well-known DRL algorithms. We demonstrate the validity of our method from both theoretical and implementation perspectives. Experiments on Atari 2600 games show that state-of-the-art DRL algorithms with the non-crossing modification can significantly outperform their baselines, achieving faster convergence and better test performance. In particular, our method can effectively recover the distribution information and thus dramatically increase exploration efficiency when the reward space is extremely sparse.
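The key constraint in the abstract, that the learned quantile estimates must be non-decreasing in the quantile level within each sampled batch, can be enforced structurally in the network head. The following is a minimal PyTorch sketch of one generic way to do this: predict a base quantile plus non-negative increments, so the output quantile curve cannot cross by construction. The module name MonotoneQuantileHead and this particular parameterization are illustrative assumptions, not necessarily the construction used in the talk's paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotoneQuantileHead(nn.Module):
    """Predicts N quantile values per action that are non-decreasing in the
    quantile level by construction: a base value plus the cumulative sum of
    non-negative (softplus) increments. One generic non-crossing scheme;
    the paper's exact parameterization may differ."""

    def __init__(self, feature_dim: int, num_actions: int, num_quantiles: int):
        super().__init__()
        self.num_actions = num_actions
        self.num_quantiles = num_quantiles
        self.base = nn.Linear(feature_dim, num_actions)                   # lowest quantile
        self.deltas = nn.Linear(feature_dim, num_actions * (num_quantiles - 1))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        batch = features.shape[0]
        base = self.base(features).unsqueeze(-1)                          # (B, A, 1)
        inc = F.softplus(self.deltas(features)).view(batch, self.num_actions, -1)
        # cumulative non-negative increments => quantiles sorted along the last dim
        quantiles = torch.cat([base, base + torch.cumsum(inc, dim=-1)], dim=-1)
        return quantiles                                                  # (B, A, N)

# Quick check: the non-crossing property holds for every batch element.
head = MonotoneQuantileHead(feature_dim=64, num_actions=4, num_quantiles=8)
q = head(torch.randn(2, 64))
assert torch.all(q[..., 1:] >= q[..., :-1])

Because monotonicity holds by construction, a standard quantile (pinball) loss can be applied unchanged, and the non-crossing property is guaranteed even at the earliest training stages, which is exactly the failure mode the abstract highlights for batch-based DRL.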


About the speaker:

Xingdong Feng is Dean of the School of Statistics and Management at Shanghai University of Finance and Economics, Professor of Statistics, and a doctoral supervisor. His research interests include dimension reduction, robust methods, quantile regression and its applications to economic problems, and statistical computing for big data; he has published multiple papers in the top statistics journals JASA, AoS, JRSSB, and Biometrika. He was named an Elected Member of the International Statistical Institute in 2018; in 2019 he became Vice President of the National Association of Young Statisticians and a member of the Seventh National Statistics Textbook Compilation and Review Committee (Data Science and Big Data Technology Applications group); and in 2020 he joined the Discipline Evaluation Group (Statistics) of the Academic Degrees Committee of the State Council.