# awesome-offline-rl
This is a collection of research and review papers for offline reinforcement learning (offline RL). Feel free to star and fork.
Maintainers:
- Haruka Kiyohara (Cornell University)
- Yuta Saito (Hanjuku-kaso Co., Ltd. / Cornell University)
We are looking for more contributors and maintainers! Please feel free to open a pull request.
Format:

```
- [title](paper link) [links]
- author1, author2, and author3. arXiv/conferences/journals, year.
```
For any questions, feel free to contact: hk844@cornell.edu
## Table of Contents

- Papers
- Open Source Software/Implementations
- Blog/Podcast
- Related Workshops
- Tutorials/Talks/Lectures
## Papers
### Review/Survey/Position Papers
#### Offline RL
- Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
- Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, Lauro Langosco, Peter Hase, Erdem Bıyık, Anca Dragan, David Krueger, Dorsa Sadigh, and Dylan Hadfield-Menell. arXiv, 2023.
- A Survey on Offline Model-Based Reinforcement Learning
- Haoyang He. arXiv, 2023.
- Foundation Models for Decision Making: Problems, Methods, and Opportunities
- Sherry Yang, Ofir Nachum, Yilun Du, Jason Wei, Pieter Abbeel, and Dale Schuurmans. arXiv, 2023.
- A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems
- Rafael Figueiredo Prudencio, Marcos R. O. A. Maximo, and Esther Luna Colombini. arXiv, 2022.
- Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
- Sergey Levine, Aviral Kumar, George Tucker, and Justin Fu. arXiv, 2020.
#### Off-Policy Evaluation and Learning
- A Review of Off-Policy Evaluation in Reinforcement Learning
- Masatoshi Uehara, Chengchun Shi, and Nathan Kallus. arXiv, 2022.
#### Related Reviews
- On the Opportunities and Challenges of Offline Reinforcement Learning for Recommender Systems
- Xiaocong Chen, Siyu Wang, Julian McAuley, Dietmar Jannach, and Lina Yao. arXiv, 2023.
- Understanding Reinforcement Learning Algorithms: The Progress from Basic Q-learning to Proximal Policy Optimization
- Mohamed-Amine Chadi and Hajar Mousannif. arXiv, 2023.
- Offline Evaluation for Reinforcement Learning-based Recommendation: A Critical Issue and Some Alternatives
- Romain Deffayet, Thibaut Thonet, Jean-Michel Renders, and Maarten de Rijke. arXiv, 2023.
- A Survey on Transformers in Reinforcement Learning
- Wenzhe Li, Hao Luo, Zichuan Lin, Chongjie Zhang, Zongqing Lu, and Deheng Ye. arXiv, 2023.
- Deep Reinforcement Learning: Opportunities and Challenges
- Yuxi Li. arXiv, 2022.
- A Survey on Model-based Reinforcement Learning
- Fan-Ming Luo, Tian Xu, Hang Lai, Xiong-Hui Chen, Weinan Zhang, and Yang Yu. arXiv, 2022.
- Survey on Fair Reinforcement Learning: Theory and Practice
- Pratik Gajane, Akrati Saxena, Maryam Tavakol, George Fletcher, and Mykola Pechenizkiy. arXiv, 2022.
- Accelerating Offline Reinforcement Learning Application in Real-Time Bidding and Recommendation: Potential Use of Simulation
- Haruka Kiyohara, Kosuke Kawakami, and Yuta Saito. arXiv, 2021.
- A Survey of Generalisation in Deep Reinforcement Learning
- Robert Kirk, Amy Zhang, Edward Grefenstette, and Tim Rocktäschel. arXiv, 2021.
### Offline RL: Theory/Methods
- Value-Aided Conditional Supervised Learning for Offline RL
- Jeonghye Kim, Suyoung Lee, Woojun Kim, and Youngchul Sung. arXiv, 2024.
- Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning
- Lanqing Li, Hai Zhang, Xinyu Zhang, Shatong Zhu, Junqiao Zhao, and Pheng-Ann Heng. arXiv, 2024.
- DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching
- Guanghe Li, Yixiang Shan, Zhengbang Zhu, Ting Long, and Weinan Zhang. arXiv, 2024.
- Deep autoregressive density nets vs neural ensembles for model-based offline reinforcement learning
- Abdelhakim Benechehab, Albert Thomas, and Balázs Kégl. arXiv, 2024.
- Context-Former: Stitching via Latent Conditioned Sequence Modeling
- Ziqi Zhang, Jingzehua Xu, Zifeng Zhuang, Jinxin Liu, and Donglin Wang. arXiv, 2024.
- Adversarially Trained Actor Critic for offline CMDPs
- Honghao Wei, Xiyue Peng, Xin Liu, and Arnob Ghosh. arXiv, 2024.
- Optimistic Model Rollouts for Pessimistic Offline Policy Optimization
- Yuanzhao Zhai, Yiying Li, Zijian Gao, Xudong Gong, Kele Xu, Dawei Feng, Ding Bo, and Huaimin Wang. arXiv, 2024.
- Solving Continual Offline Reinforcement Learning with Decision Transformer
- Kaixin Huang, Li Shen, Chen Zhao, Chun Yuan, and Dacheng Tao. arXiv, 2024.
- MoMA: Model-based Mirror Ascent for Offline Reinforcement Learning
- Mao Hong, Zhiyue Zhang, Yue Wu, and Yanxun Xu. arXiv, 2024.
- Reframing Offline Reinforcement Learning as a Regression Problem
- Prajwal Koirala and Cody Fleming. arXiv, 2024.
- Efficient Two-Phase Offline Deep Reinforcement Learning from Preference Feedback
- Yinglun Xu and Gagandeep Singh. arXiv, 2024.
- Policy-regularized Offline Multi-objective Reinforcement Learning
- Qian Lin, Chao Yu, Zongkai Liu, and Zifan Wu. arXiv, 2024.
- Differentiable Tree Search in Latent State Space
- Dixant Mittal and Wee Sun Lee. arXiv, 2024.
- Learning from Sparse Offline Datasets via Conservative Density Estimation
- Zhepeng Cen, Zuxin Liu, Zitong Wang, Yihang Yao, Henry Lam, and Ding Zhao. ICLR, 2024.
- Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model
- Yinan Zheng, Jianxiong Li, Dongjie Yu, Yujie Yang, Shengbo Eben Li, Xianyuan Zhan, and Jingjing Liu. ICLR, 2024.
- PDiT: Interleaving Perception and Decision-making Transformers for Deep Reinforcement Learning
- Hangyu Mao, Rui Zhao, Ziyue Li, Zhiwei Xu, Hao Chen, Yiqun Chen, Bin Zhang, Zhen Xiao, Junge Zhang, and Jiangjin Yin. AAMAS, 2024.
- Critic-Guided Decision Transformer for Offline Reinforcement Learning
- Yuanfu Wang, Chao Yang, Ying Wen, Yu Liu, and Yu Qiao. AAAI, 2024.
- CUDC: A Curiosity-Driven Unsupervised Data Collection Method with Adaptive Temporal Distances for Offline Reinforcement Learning
- Chenyu Sun, Hangwei Qian, and Chunyan Miao. AAAI, 2024.
- Neural Network Approximation for Pessimistic Offline Reinforcement Learning
- Di Wu, Yuling Jiao, Li Shen, Haizhao Yang, and Xiliang Lu. AAAI, 2024.
- A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
- Yinmin Zhang, Jie Liu, Chuming Li, Yazhe Niu, Yaodong Yang, Yu Liu, and Wanli Ouyang. AAAI, 2024.
- The Generalization Gap in Offline Reinforcement Learning
- Ishita Mediratta, Qingfei You, Minqi Jiang, and Roberta Raileanu. arXiv, 2023.
- Decoupling Meta-Reinforcement Learning with Gaussian Task Contexts and Skills
- Hongcai He, Anjie Zhu, Shuang Liang, Feiyu Chen, and Jie Shao. arXiv, 2023.
- MICRO: Model-Based Offline Reinforcement Learning with a Conservative Bellman Operator
- Xiao-Yin Liu, Xiao-Hu Zhou, Guo-Tao Li, Hao Li, Mei-Jiang Gui, Tian-Yu Xiang, De-Xing Huang, and Zeng-Guang Hou. arXiv, 2023.
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization
- Carlos E. Luis, Alessandro G. Bottero, Julia Vinogradska, Felix Berkenkamp, and Jan Peters. arXiv, 2023.
- Using Curiosity for an Even Representation of Tasks in Continual Offline Reinforcement Learning
- Pankayaraj Pathmanathan, Natalia Díaz-Rodríguez, and Javier Del Ser. arXiv, 2023.
- Projected Off-Policy Q-Learning (POP-QL) for Stabilizing Offline Reinforcement Learning
- Melrose Roderick, Gaurav Manek, Felix Berkenkamp, and J. Zico Kolter. arXiv, 2023.
- Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees
- Yifei Zhou, Ayush Sekhari, Yuda Song, and Wen Sun. arXiv, 2023.
- Switch Trajectory Transformer with Distributional Value Approximation for Multi-Task Reinforcement Learning
- Qinjie Lin, Han Liu, and Biswa Sengupta. arXiv, 2023.
- Hierarchical Decision Transformer
- André Correia and Luís A. Alexandre. arXiv, 2023.
- Prompt-Tuning Decision Transformer with Preference Ranking
- Shengchao Hu, Li Shen, Ya Zhang, and Dacheng Tao. arXiv, 2023.
- Context Shift Reduction for Offline Meta-Reinforcement Learning
- Yunkai Gao, Rui Zhang, Jiaming Guo, Fan Wu, Qi Yi, Shaohui Peng, Siming Lan, Ruizhi Chen, Zidong Du, Xing Hu, Qi Guo, Ling Li, and Yunji Chen. arXiv, 2023.
- Uni-O4: Unifying Online and Offline Deep Reinforcement Learning with Multi-Step On-Policy Optimization
- Kun Lei, Zhengmao He, Chenhao Lu, Kaizhe Hu, Yang Gao, and Huazhe Xu. arXiv,