# Awesome LLM Reasoning

Curated collection of papers and resources on how to unlock the reasoning ability of LLMs and MLLMs.
## 🗂️ Table of Contents

- [Survey](#survey)
- [Analysis](#analysis)
- [Technique](#technique)

Also check out [Awesome-Controllable-Generation](https://github.com/atfortes/Awesome-Controllable-Generation).
## Survey
- **Reasoning with Language Model Prompting: A Survey.** *ACL 2023*

  Shuofei Qiao, Yixin Ou, Ningyu Zhang, Xiang Chen, Yunzhi Yao, Shumin Deng, Chuanqi Tan, Fei Huang, Huajun Chen. [Paper] [Code], 2022.12
- **Towards Reasoning in Large Language Models: A Survey.** *ACL 2023 Findings*

  Jie Huang, Kevin Chen-Chuan Chang. [Paper], 2022.12
- **Large Language Models for Mathematical Reasoning: Progresses and Challenges.** *ACL 2024*

  Janice Ahn, Rishu Verma, Renze Lou, Di Liu, Rui Zhang, Wenpeng Yin. [Paper], 2024.2

- **Puzzle Solving using Reasoning of Large Language Models: A Survey.** *Preprint*

  Panagiotis Giadikiaroglou, Maria Lymperaiou, Giorgos Filandrianos, Giorgos Stamou. [Paper] [Code], 2024.2

- **Internal Consistency and Self-Feedback in Large Language Models: A Survey.** *Preprint*

  Xun Liang, Shichao Song, Zifan Zheng, Hanyu Wang, Qingchen Yu, Xunkai Li, Rong-Hua Li, Feiyu Xiong, Zhiyu Li. [Paper] [Code], 2024.7
## Analysis
- **Can language models learn from explanations in context?** *EMNLP 2022*

  Andrew K. Lampinen, Ishita Dasgupta, Stephanie C. Y. Chan, Kory Matthewson, Michael Henry Tessler, Antonia Creswell, James L. McClelland, Jane X. Wang, Felix Hill. [Paper], 2022.4

- **Emergent Abilities of Large Language Models.** *TMLR 2022*

  Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, William Fedus. [Paper] [Blog], 2022.6

- **Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them.** *ACL 2023 Findings*

  Mirac Suzgun, Nathan Scales, Nathanael Schärli, Sebastian Gehrmann, Yi Tay, Hyung Won Chung, Aakanksha Chowdhery, Quoc V. Le, Ed H. Chi, Denny Zhou, Jason Wei. [Paper] [Code], 2022.10

- **Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters.** *ACL 2023*

  Boshi Wang, Sewon Min, Xiang Deng, Jiaming Shen, You Wu, Luke Zettlemoyer, Huan Sun. [Paper] [Code], 2022.12

- **On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning.** *ACL 2023*

  Omar Shaikh, Hongxin Zhang, William Held, Michael Bernstein, Diyi Yang. [Paper], 2022.12

- **Dissociating language and thought in large language models: a cognitive perspective.** *ICBINB Workshop, NeurIPS 2023*

  Kyle Mahowald, Anna A. Ivanova, Idan A. Blank, Nancy Kanwisher, Joshua B. Tenenbaum, Evelina Fedorenko. [Paper], 2023.1

- **Large Language Models Can Be Easily Distracted by Irrelevant Context.** *ICML 2023*

  Freda Shi, Xinyun Chen, Kanishka Misra, Nathan Scales, David Dohan, Ed Chi, Nathanael Schärli, Denny Zhou. [Paper], 2023.1

- **A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity.** *AACL 2023*

  Yejin Bang, Samuel Cahyawijaya, Nayeon Lee, Wenliang Dai, Dan Su, Bryan Wilie, Holy Lovenia, Ziwei Ji, Tiezheng Yu, Willy Chung, Quyet V. Do, Yan Xu, Pascale Fung. [Paper], 2023.2

- **Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting.** *NeurIPS 2023*

  Miles Turpin, Julian Michael, Ethan Perez, Samuel R. Bowman. [Paper] [Code], 2023.5

- **Faith and Fate: Limits of Transformers on Compositionality.** *NeurIPS 2023*

  Nouha Dziri, Ximing Lu, Melanie Sclar, Xiang Lorraine Li, Liwei Jiang, Bill Yuchen Lin, Peter West, Chandra Bhagavatula, Ronan Le Bras, Jena D. Hwang, Soumya Sanyal, Sean Welleck, Xiang Ren, Allyson Ettinger, Zaid Harchaoui, Yejin Choi. [Paper], 2023.5

- **Measuring Faithfulness in Chain-of-Thought Reasoning.** *Preprint*

  Tamera Lanham, Anna Chen, Ansh Radhakrishnan, Benoit Steiner, Carson Denison, Danny Hernandez, Dustin Li, Esin Durmus, Evan Hubinger, Jackson Kernion, Kamilė Lukošiūtė, Karina Nguyen, Newton Cheng, Nicholas Joseph, Nicholas Schiefer, Oliver Rausch, Robin Larson, Sam McCandlish, Sandipan Kundu, Saurav Kadavath, Shannon Yang, Thomas Henighan, Timothy Maxwell, Timothy Telleen-Lawton, Tristan Hume, Zac Hatfield-Dodds, Jared Kaplan, Jan Brauner, Samuel R. Bowman, Ethan Perez. [Paper], 2023.7

- **Large Language Models Cannot Self-Correct Reasoning Yet.** *Preprint*

  Jie Huang, Xinyun Chen, Swaroop Mishra, Huaixiu Steven Zheng, Adams Wei Yu, Xinying Song, Denny Zhou. [Paper], 2023.10
- **The Impact of Reasoning Step Length on Large Language Models.** *Preprint*

  Mingyu Jin, Qinkai Yu, Dong Shu, Haiyan Zhao, Wenyue Hua, Yanda Meng, Yongfeng Zhang, Mengnan Du. [Paper], 2024.1
- **Premise Order Matters in Reasoning with Large Language Models.** *ICML 2024*

  Xinyun Chen, Ryan A. Chi, Xuezhi Wang, Denny Zhou. [Paper], 2024.2

- **Do Large Language Models Latently Perform Multi-Hop Reasoning?** *Preprint*

  Sohee Yang, Elena Gribovskaya, Nora Kassner, Mor Geva, Sebastian Riedel. [Paper], 2024.2

- **How Far Are We from Intelligent Visual Deductive Reasoning?** *AGI Workshop, ICLR 2024*

  Yizhe Zhang, He Bai, Ruixiang Zhang, Jiatao Gu, Shuangfei Zhai, Josh Susskind, Navdeep Jaitly. [Paper], 2024.3

- **A Peek Into Token Bias: Large Language Models Are Not Yet Genuine Reasoners.** *ICML Workshop 2024*

  Bowen Jiang, Yangxinyu Xie, Zhuoqun Hao, Xiaomeng Wang, Tanwi Mallick, Weijie J. Su, Camillo J. Taylor, Dan Roth. [Paper] [Code], 2024.6
## Technique
### 🔤 Reasoning in Large Language Models - An Emergent Ability
- **Chain of Thought Prompting Elicits Reasoning in Large Language Models.** *NeurIPS 2022*

  Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, Denny Zhou. [Paper] [Blog], 2022.1

- **Self-consistency improves chain of thought reasoning in language models.** *ICLR 2023*

  Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou. [Paper], 2022.3
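The self-consistency decoding strategy above can be sketched in a few lines: sample several chain-of-thought completions at nonzero temperature and majority-vote over their final answers. This is a minimal illustration, not the paper's implementation; `sample_fn` is a hypothetical stand-in for a stochastic LLM call.

```python
from collections import Counter

def self_consistency(prompt, sample_fn, k=5):
    """Majority-vote over k sampled chain-of-thought completions.

    `sample_fn(prompt)` is a hypothetical stochastic LLM call that returns a
    (reasoning_chain, final_answer) pair. Only the final answers are voted
    on, so distinct reasoning paths that reach the same answer reinforce it.
    """
    answers = [sample_fn(prompt)[1] for _ in range(k)]
    best_answer, _votes = Counter(answers).most_common(1)[0]
    return best_answer
```

For free-form answers, a normalization step (e.g. stripping whitespace and units) is usually applied before voting so that equivalent answers are counted together.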
- **Iteratively Prompt Pre-trained Language Models for Chain of Thought.** *EMNLP 2022*

  Boshi Wang, Xiang Deng, Huan Sun. [Paper], 2022.3
- **Least-to-most prompting enables complex reasoning in large language models.** *ICLR 2023*

  Denny Zhou, Nathanael Schärli, Le Hou, Jason Wei, Nathan Scales, Xuezhi Wang, Dale Schuurmans, Claire Cui, Olivier Bousquet, Quoc Le, Ed Chi. [Paper], 2022.5

- **Large Language Models are Zero-Shot Reasoners.** *NeurIPS 2022*

  Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa. [Paper], 2022.5
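The zero-shot reasoning recipe from Kojima et al. is a two-stage prompt: first elicit a reasoning chain with the trigger phrase "Let's think step by step.", then feed that chain back to extract the final answer. A minimal sketch, assuming a hypothetical greedy-decoding completion function `complete_fn`:

```python
def zero_shot_cot(question, complete_fn):
    """Two-stage zero-shot chain-of-thought prompting.

    `complete_fn(prompt)` is a hypothetical LLM completion call that
    returns a string. No few-shot exemplars are needed.
    """
    # Stage 1: the trigger phrase elicits a step-by-step reasoning chain.
    reasoning = complete_fn(f"Q: {question}\nA: Let's think step by step.")
    # Stage 2: append the chain and an answer-extraction prompt.
    answer = complete_fn(
        f"Q: {question}\nA: Let's think step by step. {reasoning}\n"
        "Therefore, the answer is"
    )
    return answer.strip()
```

The exact extraction prompt varies by task in the paper (e.g. "Therefore, the answer (arabic numerals) is" for arithmetic); the phrase above is one illustrative choice.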
- **Making Large Language Models Better Reasoners with Step-Aware Verifier.** *ACL 2023*

  Yifei Li, Zeqi Lin, Shizhuo Zhang, Qiang Fu, Bei Chen, Jian-Guang Lou, Weizhu Chen. [Paper], 2022.6

- **Large Language Models Still Can't Plan.** *NeurIPS 2022*

  Karthik Valmeekam, Alberto Olmo, Sarath Sreedharan, Subbarao Kambhampati. [Paper] [Code], 2022.6

- **Solving Quantitative Reasoning Problems with Language Models.** *NeurIPS 2022*

  Aitor Lewkowycz, Anders Andreassen, David Dohan, Ethan Dyer, Henryk Michalewski, Vinay Ramasesh, Ambrose Slone, Cem Anil, Imanol Schlag, Theo Gutman-Solo, Yuhuai Wu, Behnam Neyshabur, Guy Gur-Ari, Vedant Misra. [Paper] [Blog], 2022.6

- **Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning.** *ICLR 2023*

  Pan Lu, Liang Qiu, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, Tanmay Rajpurohit, Peter Clark, Ashwin Kalyan. [Project] [Paper] [Code], 2022.9

- **Ask Me Anything: A simple strategy for prompting language models.** *ICLR 2023*

  Simran Arora, Avanika Narayan, Mayee F. Chen, Laurel Orr, Neel Guha, Kush Bhatia, Ines Chami, Frederic Sala, Christopher Ré.