Scaling Long-Horizon LLM Agent
via Context-Folding

Weiwei Sun1,2 Miao Lu1,3 Zhan Ling1 Kang Liu1 Xuesong Yao1 Yiming Yang2 Jiecao Chen1,†
1ByteDance Seed 2Carnegie Mellon University 3Stanford University

Abstract

Large language model (LLM) agents have shown remarkable capabilities in tackling complex, long-horizon problems. However, scaling agents to even longer horizons is fundamentally constrained by the design of agentic frameworks that linearly accumulate the entire interaction history into a single, ever-expanding context.

We propose Context-Folding, an agentic mechanism that allows the model to actively manage its working context. The agent can create a temporary sub-trajectory for a localized subtask (branch), then summarize it and rejoin the main thread (return), with the intermediate steps "folded" away. We also introduce FoldGRPO, a novel RL framework with dense, token-level process rewards that trains agents to use this capability effectively.
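The branch/return loop can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class and method names (ContextFolder, branch, fold_return) are invented for this sketch, and the summarizer is a stub standing in for the LLM that writes the return summary.

```python
# Sketch of context folding: intermediate steps of a subtask never enter
# the main context; only a summary survives the return.

class ContextFolder:
    """Maintains a main context; sub-trajectories are folded into summaries."""

    def __init__(self, summarize):
        self.main = []            # visible working context (list of steps)
        self.branch_steps = None  # steps of the active sub-trajectory, if any
        self.summarize = summarize

    def append(self, step):
        """Record a step in the active branch if one is open, else in main."""
        if self.branch_steps is not None:
            self.branch_steps.append(step)
        else:
            self.main.append(step)

    def branch(self, subtask):
        """Open a temporary sub-trajectory for a localized subtask."""
        assert self.branch_steps is None, "no nested branches in this sketch"
        self.branch_steps = [f"branch({subtask})"]

    def fold_return(self):
        """Summarize the sub-trajectory and rejoin the main thread."""
        summary = self.summarize(self.branch_steps)
        self.branch_steps = None  # fold: drop the intermediate steps
        self.main.append(f"return: {summary}")


# Usage: five tool calls happen inside the branch, but the main context
# only ever grows by the folded one-line summary.
folder = ContextFolder(summarize=lambda steps: f"{len(steps)} steps folded")
folder.append("plan: locate the failing test")
folder.branch("grep the repo for the stack trace")
for i in range(5):
    folder.append(f"tool call {i}")
folder.fold_return()
print(folder.main)
# ['plan: locate the failing test', 'return: 6 steps folded']
```

The key property this models is that the peak context length is bounded by the main thread plus one active branch, rather than by the full interaction history.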

Our Folding Agent achieves 62.0% on BrowseComp-Plus and 58.0% on SWE-Bench Verified with a peak context of only 32K tokens, surpassing same-backbone baselines that require 327K-token contexts and significantly outperforming summarization-based methods.

Key Idea

Context Folding Example

Example of an agent folding context in long-context tasks.

Method

Method Overview

Overview of our Context-Folding framework and FoldGRPO training.
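FoldGRPO combines GRPO's group-relative outcome advantage with dense, token-level process rewards. The sketch below shows only that combination in the abstract; the specific process-reward values are placeholders, not the paper's actual reward design, and the function name is invented for illustration.

```python
# Illustrative sketch: group-relative outcome advantage (GRPO-style)
# plus a dense token-level process reward, assumed additive here.
import numpy as np

def fold_grpo_advantages(outcome_rewards, process_rewards):
    """outcome_rewards: one scalar per rollout in a group of G rollouts.
    process_rewards: list of per-token reward vectors, one per rollout."""
    r = np.asarray(outcome_rewards, dtype=float)
    # GRPO: normalize the outcome reward within the rollout group.
    adv = (r - r.mean()) / (r.std() + 1e-8)
    # Broadcast each rollout's outcome advantage to all of its tokens,
    # then add the dense token-level process term.
    return [adv[i] + np.asarray(p, dtype=float)
            for i, p in enumerate(process_rewards)]

# Group of 4 rollouts: two solved the task, two did not. Rollout 1 also
# receives a small per-token penalty (e.g. for steps it failed to fold).
group_outcomes = [1.0, 0.0, 0.0, 1.0]
proc = [np.zeros(3), -0.1 * np.ones(2), np.zeros(4), np.zeros(3)]
token_adv = fold_grpo_advantages(group_outcomes, proc)
```

Each rollout thus gets a per-token advantage vector rather than a single trajectory-level scalar, which is what makes the process supervision dense.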

Results

Performance on BrowseComp-Plus (N=150) and SWE-Bench Verified (N=500)

Model                           Peak Length   Max Length   BrowseComp-Plus   SWE-Bench
                                                           (Pass@1)          (Pass@1)

ReAct Agent with 100B+ LLM
  GPT-5                         327K          327K         79.3%             71.8%
  GPT-4.1                       327K          327K         64.0%             48.6%
  DeepSeek-V3.1                 327K          327K         61.3%             61.0%
  GLM-4.5-Air                   327K          327K         56.6%             57.6%
  Qwen3-235B-A22B               327K          327K         56.0%             34.4%

ReAct Agent
  Seed-OSS-36B (32K)            32K           32K          28.6%             43.6%
    + RL (GRPO)                 32K           32K          44.6%             48.0%
  Seed-OSS-36B (327K)           327K          327K         47.8%             55.2%
    + RL (GRPO)                 327K          327K         54.0%             57.4%

Summary Agent
  Seed-OSS-36B                  32K           32K × 10     38.6%             48.8%
    + RL (GRPO)                 32K           32K × 10     52.7%             55.0%

Folding Agent (Ours)
  Seed-OSS-36B                  32K           32K × 10     42.0%             49.2%
    + RL (GRPO)                 32K           32K × 10     56.7%             56.4%
    + RL (FoldGRPO)             32K           32K × 10     62.0%             58.0%

Case Study

Citation

@article{sun2025scaling,
  title   = {Scaling Long-Horizon LLM Agent via Context-Folding},
  author  = {Sun, Weiwei and Lu, Miao and Ling, Zhan and Liu, Kang and Yao, Xuesong and Yang, Yiming and Chen, Jiecao},
  journal = {arXiv preprint},
  year    = {2025}
}