Reposted: My Favorite LLM x AI Papers of 2025

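2025 is almost over.
This year, I kept up reading and sharing papers almost every day, and I learned a great deal from it.
Below is my personal collection of favorite papers of the year: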
1. Test-Time Scaling
- What exactly is “thinking”?
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
2. Efficient Reasoning
- “What if thinking is a budget?”
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
Scalable Chain of Thoughts via Elastic Reasoning
3. Reasoning Analysis
- How should we better understand “reasoning”?
(How) Do reasoning models reason?
DeepSeek-R1 Thoughtology: Let’s <think> about LLM Reasoning
Rethinking Reflection in Pre-Training
Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction
4. CLI Agent
- A 2024 paper, but one that deeply shaped 2025.
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering
5. LLM X RL Agentic
- For these papers, I coined a term: “protocol tokens”.
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
ReCall: Learning to Reason with Tool Call for LLMs via Reinforcement Learning
6. Parallel Reasoning
- Parallel reasoning: thinking beyond a single thread!
Learning Adaptive Parallel Reasoning with Language Models
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation
7. RL X Reasoning
- These four RLVR papers both rapidly heated up and cooled down RL X Reasoning!
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Learning to Reason without External Rewards
Spurious Rewards: Rethinking Training Signals in RLVR
8. Agent, Interaction, Deep Research Agent
- Revolving around experience, memory, interaction, and workflow.
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving
Alita-G: Self-Evolving Generative Agent for Agent Generation
Sleep-time Compute: Beyond Inference Scaling at Test-time
WebThinker: Empowering Large Reasoning Models with Deep Research Capability
WebDancer: Towards Autonomous Information Seeking Agency
9. Risk modeling X LLM
- “Make Bayes great again!”
Model Predictive Task Sampling for Efficient and Robust Adaptation
Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?
10. Multi-Agentic
- How should multi-agent setups actually be used?
Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning
CodeContests+: High-Quality Test Case Generation for Competitive Programming
11. Sentient Agent
- AI + the humanities, full of heart!
RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models
12. LLM security and alignment
- It's really not easy to understand you, LLM!
Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models
Natural Emergent Misalignment from Reward Hacking in Production RL
Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs
13. Model Steering
- I need to control myself, LLM!
AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders
Persona Vectors: Monitoring and Controlling Character Traits in Language Models
How to Train Your Advisor: Steering Black-Box LLMs with Advisor Models
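Finally, this list is my personal selection. With limited time and energy, there are surely many excellent works I never had the chance to read. Thanks to all the paper authors for showing me such a landscape of ideas!
2025 is almost over, and I will miss it!
In 2026, I will keep introducing the best work I come across every day. Many in-depth analyses will be reserved for subscriber content, so you are welcome to subscribe and grow together with me!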