wjn1996/Awesome-LLM-Reasoning-Openai-o1-Survey

Awesome LLM Reasoning Openai-o1 Survey

Related work and background techniques for OpenAI o1, including LLM reasoning, self-play reinforcement learning, complex logical reasoning, and scaling laws.

Introduction

Survey Papers

  • A Survey on Self-play Methods in Reinforcement Learning [Paper] (2024)
    • Ruize Zhang, Zelai Xu, Chengdong Ma, Chao Yu, Wei-Wei Tu, Shiyu Huang, Deheng Ye, Wenbo Ding, Yaodong Yang, Yu Wang
    • Tencent, Tsinghua

Related Papers

Complex Logical Reasoning

  • Generative Language Modeling for Automated Theorem Proving [Paper] (2020)
    • Stanislas Polu, Ilya Sutskever
    • OpenAI
  • Hypothesis Search: Inductive Reasoning with Language Models [Paper] (ICLR 2024)
    • Ruocheng Wang, Eric Zelikman, Gabriel Poesia, Yewen Pu, Nick Haber, Noah D. Goodman
    • Stanford, Autodesk Research
  • Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement [Paper] (ICLR 2024)
    • Linlu Qiu, Liwei Jiang, Ximing Lu, Melanie Sclar, Valentina Pyatkin, Chandra Bhagavatula, Bailin Wang, Yoon Kim, Yejin Choi, Nouha Dziri, Xiang Ren
    • MIT, Allen AI, UW, USC
  • Training Verifiers to Solve Math Word Problems [Paper] (2021)
    • Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman
    • OpenAI
  • To CoT or not to CoT? Chain-of-thought Helps Mainly on Math and Symbolic Reasoning [Paper] (2024.9)
    • Zayne Sprague, Fangcong Yin, Juan Diego Rodriguez, Dongwei Jiang, Manya Wadhwa, Prasann Singhal, Xinyu Zhao, Xi Ye, Kyle Mahowald, Greg Durrett
    • The University of Texas at Austin, Johns Hopkins University, Princeton University

Reasoning Bootstrapping

  • STaR: Self-Taught Reasoner Bootstrapping Reasoning With Reasoning [Paper] [Github] (NeurIPS 2022)
    • Eric Zelikman, Yuhuai Wu, Jesse Mu, Noah D. Goodman
    • Stanford, Google
  • Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking [Paper] [Github] (2024)
    • Eric Zelikman, Georges Harik, Yijia Shao, Varuna Jayasiri, Nick Haber, Noah D. Goodman
    • Stanford, Notbad AI
  • Training Chain-of-thought via Latent-variable Inference [Paper] (NeurIPS 2023)
    • Du Phan, Matthew D. Hoffman, David Dohan, Sholto Douglas, Tuan Anh Le, Aaron Parisi, Pavel Sountsov, Charles Sutton, Sharad Vikram, Rif A. Saurous
    • Google
  • Chain-of-thought Reasoning without Prompting [Paper] (2024)
    • Xuezhi Wang, Denny Zhou
    • Google DeepMind
  • Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers [Paper] [Github] (2024)
    • Zhenting Qi, Mingyuan Ma, Jiahang Xu, Li Lyna Zhang, Fan Yang, Mao Yang
    • MSRA, Harvard University

Reasoning Scaling Law

  • Large Language Monkeys: Scaling Inference Compute with Repeated Sampling [Paper] (2024)
    • Bradley Brown, Jordan Juravsky, Ryan Ehrlich, Ronald Clark, Quoc V. Le, Christopher Ré, Azalia Mirhoseini
    • Stanford, Oxford, Google DeepMind
  • Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters [Paper] (2024)
    • Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar
    • UC Berkeley, Google DeepMind
  • An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models [Paper] (2024)
    • Yangzhen Wu, Zhiqing Sun, Shanda Li, Sean Welleck, Yiming Yang
    • Tsinghua, CMU
  • Training Language Models to Self-Correct via Reinforcement Learning [Paper] (2024)
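    • Aviral Kumar, Vincent Zhuang, Rishabh Agarwal, Yi Su, John D Co-Reyes, Avi Singh, Kate Baumli, Shariq Iqbal, Colton Bishop, Rebecca Roelofs, Lei M Zhang, Kay McKinney, Disha Shrivastava, Cosmin Paduraru, George Tucker, Doina Precup, Feryal Behbahani, Aleksandra Faust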
    • Google DeepMind
  • From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond [Paper: https://arxiv.org/abs/2411.03590] (2024)
    • Harsha Nori, Naoto Usuyama, Nicholas King, Scott Mayer McKinney, Xavier Fernandes, Sheng Zhang, Eric Horvitz
    • Microsoft, OpenAI

Self-play Learning

  • Mastering Chess and Shogi by Self-play with a General Reinforcement Learning Algorithm [Paper] (2017)
    • David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis
    • Google DeepMind
  • Language Models Can Teach Themselves to Program Better [Paper] [Github] (ICLR 2023)
    • Patrick Haluptzok, Matthew Bowers, Adam Tauman Kalai
    • Microsoft Research, MIT
  • Large Language Models Can Self-Improve [Paper]
    • Jiaxin Huang, Shixiang Shane Gu, Le Hou, Yuexin Wu, Xuezhi Wang, Hongkun Yu, Jiawei Han
    • University of Illinois at Urbana-Champaign, Google
  • Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models [Paper] [Github] (ICML 2024)
    • Zixiang Chen, Yihe Deng, Huizhuo Yuan, Kaixuan Ji, Quanquan Gu
    • UCLA
  • Self-Play Preference Optimization for Language Model Alignment [Paper] [Github] (2024)
    • Yue Wu, Zhiqing Sun, Huizhuo Yuan, Kaixuan Ji, Yiming Yang, Quanquan Gu
    • UCLA
  • Scalable Online Planning via Reinforcement Learning Fine-Tuning [Paper] (NeurIPS 2021)
    • Arnaud Fickinger, Hengyuan Hu, Brandon Amos, Stuart Russell, Noam Brown
  • Generative Verifiers: Reward Modeling as Next-Token Prediction [Paper] (2024)
    • Lunjun Zhang, Arian Hosseini, Hritik Bansal, Mehran Kazemi, Aviral Kumar, Rishabh Agarwal
    • Google DeepMind
  • Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B [Paper] (2024)
    • Di Zhang, Xiaoshui Huang, Dongzhan Zhou, Yuqiang Li, Wanli Ouyang
    • Fudan University, Shanghai AI Lab
  • Interpretable Contrastive Monte Carlo Tree Search Reasoning [Paper] (2024)
    • Zitian Gao, Boye Niu, Xuzheng He, Haotian Xu, Hongzhang Liu, Aiwei Liu, Xuming Hu, Lijie Wen
    • The University of Sydney, Peking University, Xiaohongshu, Shanghai AI Lab, Tsinghua, HKUST

Step-wise and Process-based Optimization

  • Solving Math Word Problems with Process- and Outcome-based Feedback [Paper] (2022)
    • Jonathan Uesato, Nate Kushman, Ramana Kumar, Francis Song, Noah Siegel, Lisa Wang, Antonia Creswell, Geoffrey Irving, Irina Higgins
    • Google DeepMind
  • Thinking Fast and Slow With Deep Learning and Tree Search [Paper] (NeurIPS 2017)
    • Thomas Anthony, Zheng Tian, David Barber
    • University College London, Alan Turing Institute
  • Let’s Verify Step by Step [Paper] (2023)
    • Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe
    • OpenAI
  • LLM Critics Help Catch LLM Bugs [Paper] (2024)
    • Nat McAleese, Rai Michael Pokorny, Juan Felipe Ceron Uribe, Evgenia Nitishinskaya, Maja Trebacz, Jan Leike
    • OpenAI
  • Self-critiquing Models for Assisting Human Evaluators [Paper] (2022)
    • William Saunders, Catherine Yeh, Jeff Wu, Steven Bills, Long Ouyang, Jonathan Ward, Jan Leike
    • OpenAI
  • Improve Mathematical Reasoning in Language Models by Automated Process Supervision [Paper] (2024)
    • Liangchen Luo, Yinxiao Liu, Rosanne Liu, Samrat Phatale, Harsh Lara, Yunxuan Li, Lei Shu, Yun Zhu, Lei Meng, Jiao Sun, Abhinav Rastogi
    • Google DeepMind
  • Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [Paper] (2024)
    • Chaojie Wang, Yanchen Deng, Zhiyi Lyu, Liang Zeng, Jujie He, Shuicheng Yan, Bo An
    • Skywork AI, NTU
  • Math-shepherd: Verify and Reinforce LLMs step-by-step without Human Annotations [Paper] (ACL 2024)
    • Peiyi Wang, Lei Li, Zhihong Shao, Runxin Xu, Damai Dai, Yifei Li, Deli Chen, Yu Wu, Zhifang Sui
    • Peking University, DeepSeek AI, HKU, Tsinghua University, The Ohio State University

Social News

Open-source Projects

Communication Groups

Contributions

Contributions to this repository from all researchers are welcome.
