Repost: My Favorite LLM x AI Papers of 2025

SincereCSL, filed under LLM Study, AI Study

2025-12-25 10:00:00

My Favorite LLM x AI Papers of 2025

2025 is almost over.

Over the past year I read and shared papers almost every day, and I learned a great deal from it.

Below is my personal list of favorite papers of the year:

1. Test Time Scaling

What exactly is “thinking”?

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

s1: Simple test-time scaling

2. Efficient Reasoning

“What if thinking is a budget?”

L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning

Scalable Chain of Thoughts via Elastic Reasoning

3. Reasoning Analysis

How can we better understand “reasoning”?

(How) Do reasoning models reason?

DeepSeek-R1 Thoughtology: Let’s think about LLM reasoning

Rethinking Reflection in Pre-Training

Thinking vs. Doing: Agents that Reason by Scaling Test-Time Interaction

4. CLI Agent

A 2024 paper, but one that deeply influenced 2025.

SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering

5. LLM X RL Agentic

For this line of work I coined a term: “protocol token”.

ReTool: Reinforcement Learning for Strategic Tool Use in LLMs

ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning

ReCall: Learning to Reason with Tool Call for LLMs via Reinforcement Learning

6. Parallel Reasoning

Parallel reasoning: beyond what a single thread can imagine!

Learning Adaptive Parallel Reasoning with Language Models

Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation

Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding

7. RL X Reasoning

These four RLVR papers rapidly heated up RL X Reasoning, and then cooled it down!

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Learning to Reason without External Rewards

Spurious Rewards: Rethinking Training Signals in RLVR

8. Agent, Interaction, Deep Research Agent

Centered on experience, memory, interaction, and workflow.

Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving

Alita-G: Self-Evolving Generative Agent for Agent Generation

Sleep-time Compute: Beyond Inference Scaling at Test-time

WebThinker: Empowering Large Reasoning Models with Deep Research Capability

WebDancer: Towards Autonomous Information Seeking Agency

9. Risk modeling X LLM

“Make Bayes great again!”

Model Predictive Task Sampling for Efficient and Robust Adaptation

Can Prompt Difficulty be Online Predicted for Accelerating RL Finetuning of Reasoning Models?

10. Multi-Agentic

How should multi-agent systems be used?

Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning

CodeContests+: High-Quality Test Case Generation for Competitive Programming

11. Sentient Agent

AI + the humanities, full of heart!

RLVER: Reinforcement Learning with Verifiable Emotion Rewards for Empathetic Agents

Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models

12. LLM security and alignment

Understanding you really isn't easy, LLM!

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models

Natural Emergent Misalignment from Reward Hacking in Production RL

Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs

13. Model Steering

I want to control myself, LLM!

AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders

Persona Vectors: Monitoring and Controlling Character Traits in Language Models

How to Train Your Advisor: Steering Black-Box LLMs with Advisor Models

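Finally, this list is my personal selection. With limited time and energy, there are surely many excellent works I never got the chance to read. Thank you to the authors of these papers for showing me the landscape of ideas!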

2025 is almost over, and I will miss it!

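In 2026 I will keep introducing the best work I come across every day. Many of the in-depth analyses will go into the subscriber content. You are welcome to subscribe, and we can grow together!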