💤 Quietscore 71.6 · May 15, 2026 · 2605.16233 · cs.AI, cs.CL, cs.LG, cs.MA, eess.SY

FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast

Igor Bogdanov, Chung-Horng Lung, Thomas Kunz, Jie Gao, Adrian Taylor, Marzia Zaman

Narrative

FORGE is a training-free memory evolution protocol for LLM agents. It runs a population of instances in parallel, has a reflection agent convert failed trajectories into reusable text artifacts (rules, few-shot examples, or both), then broadcasts the best-performing instance's memory to all others between stages. It is tested on CybORG CAGE-2, a stochastic, partially observable network defense benchmark where all four tested LLM families start with deeply negative zero-shot rewards. There, FORGE improves average return by 1.7–7.7× over zero-shot and by 29–72% over single-stream Reflexion across all 12 model-representation combinations, with major-failure rates dropping to as low as ~1%. The key mechanism is the population broadcast itself; graduation mainly saves compute.
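The staged loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the names `Instance`, `forge_sketch`, `toy_episode`, and `toy_reflect` are hypothetical, and three details are assumptions that the summary does not specify: a trajectory counts as failed when its return is below `fail_below`, the best instance is the one with the highest mean stage return, and an instance graduates once its mean stage return reaches `graduate_at`.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, List, Tuple

Memory = List[str]                              # prompt-injected text artifacts
Episode = Callable[[Memory], Tuple[float, str]]  # memory -> (return, trajectory)
Reflect = Callable[[str, Memory], str]           # failed trajectory -> new artifact


@dataclass
class Instance:
    memory: Memory = field(default_factory=list)
    stage_return: float = float("-inf")
    graduated: bool = False


def forge_sketch(population: List[Instance], run_episode: Episode,
                 reflect: Reflect, stages: int = 3, episodes: int = 4,
                 fail_below: float = 0.0, graduate_at: float = -10.0) -> Instance:
    for _ in range(stages):
        # Graduated instances are frozen: they no longer run or learn.
        active = [inst for inst in population if not inst.graduated]
        if not active:
            break
        # Inner loop (Reflexion-style): reflect on each failed trajectory.
        for inst in active:
            returns = []
            for _ in range(episodes):
                ret, trajectory = run_episode(inst.memory)
                returns.append(ret)
                if ret < fail_below:  # assumed failure criterion
                    inst.memory.append(reflect(trajectory, inst.memory))
            inst.stage_return = mean(returns)
        # Outer loop: broadcast the best instance's memory to the others.
        best = max(active, key=lambda i: i.stage_return)
        for inst in active:
            if inst is not best:
                inst.memory = list(best.memory)
            # Assumed graduation criterion: stage return reached the threshold.
            inst.graduated = inst.stage_return >= graduate_at
    return max(population, key=lambda i: i.stage_return)


# Toy stand-ins for the environment and the reflection agent (both made up):
# each stored artifact raises the return by 25, capped at 0.
def toy_episode(memory: Memory) -> Tuple[float, str]:
    ret = min(0.0, -100.0 + 25.0 * len(memory))
    return ret, f"trajectory with {len(memory)} artifacts"


def toy_reflect(trajectory: str, memory: Memory) -> str:
    return f"rule learned from: {trajectory}"


population = [Instance(memory=["seed rule"]), Instance(), Instance()]
best = forge_sketch(population, toy_episode, toy_reflect)
```

In this toy run the instance that starts with one extra artifact scores best in the first stage, so its memory replaces the memory of the other two before the second stage. In a real setup `run_episode` would wrap an LLM agent acting in the environment and `reflect` would call the reflection agent.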

No production traction yet. The GitHub repos referencing it are all arXiv aggregators and RSS scrapers, none implementing or extending the method. Zero citations on Semantic Scholar. The evaluation is also single-environment (CAGE-2 B-line only), so generalization claims are explicitly directional — builders should treat this as a promising pattern for adversarial, long-horizon agent tasks rather than a validated framework ready to drop into production.

Abstract

Can LLM agents improve decision-making through self-generated memory without gradient updates? We propose FORGE (Failure-Optimized Reflective Graduation and Evolution), a staged, population-based protocol that evolves prompt-injected natural-language memory for hierarchical ReAct agents. FORGE wraps a Reflexion-style inner loop, where a dedicated reflection agent (using the same underlying LLM, no distillation from a stronger model) converts failed trajectories into reusable knowledge artifacts: textual heuristics (Rules), few-shot demonstrations (Examples), or both (Mixed), with an outer loop that propagates the best-performing instance's memory to the population between stages and freezes converged instances via a graduation criterion. We evaluate on CybORG CAGE-2, a stochastic network-defense POMDP at a 30-step horizon against the B-line attacker, where all four tested LLM families (Gemini-2.5-Flash-Lite, Grok-4-Fast, Llama-4-Maverick, Qwen3-235B) exhibit strongly negative, heavy-tailed zero-shot rewards. Compared against both a zero-shot baseline and a Reflexion baseline (isolated single-stream learning), FORGE improves average evaluation return by 1.7–7.7× over zero-shot and by 29–72% over Reflexion in all 12 model-representation conditions, reducing major-failure rates (returns below −100) to as low as ~1%. We find that (1) population broadcast is the critical mechanism, with a no-graduation ablation confirming that broadcast carries the performance gains while graduation primarily saves compute; (2) Examples achieves the strongest returns for three of four models, while Rules offers the best cost-reliability profile with ~40% fewer tokens; and (3) weaker baseline models benefit disproportionately, suggesting FORGE may mitigate capability gaps rather than amplify strong models. All evidence is confined to CAGE-2 B-line; cross-family findings are directional evidence.
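The abstract reports two summary statistics: average evaluation return and the major-failure rate, defined as the share of episodes with return below −100. With heavy-tailed returns the mean alone can hide rare catastrophic episodes, which is why the two are reported together. The sketch below shows one way to compute both; the function name `summarize_returns` and the example returns are made up for illustration and are not taken from the paper.

```python
from statistics import mean
from typing import Dict, Sequence

# From the abstract: episodes with return below -100 count as major failures.
MAJOR_FAILURE_THRESHOLD = -100.0


def summarize_returns(returns: Sequence[float],
                      threshold: float = MAJOR_FAILURE_THRESHOLD) -> Dict[str, float]:
    """Return the mean episode return and the fraction of major failures."""
    if not returns:
        raise ValueError("need at least one episode return")
    failures = sum(1 for r in returns if r < threshold)
    return {
        "mean_return": mean(returns),
        "major_failure_rate": failures / len(returns),
    }


# Made-up episode returns, for illustration only.
example_returns = [-12.0, -30.5, -150.0, -8.0]
summary = summarize_returns(example_returns)
```

Here one of the four hypothetical episodes falls below the threshold, so the major-failure rate is 0.25 while the mean return is -50.125.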

Signal

Stars: 75
Repos: 10
Citations: 0
Velocity: 0.00/d

GitHub repos (10)

CSQianDong/Awesome-arXiv-Daily-Reporter⭐ 47
“{'arxiv_id': 'arXiv:2605.16238', 'title': 'Prospective multi-pathogen disease forecasting using autonomous LLM-guided tree search', 'authors': 'Sarah Martinson, Michael P. Brenner, Martyna Plomecka, Brian P. Williams, Nicholas G. Reich, Zahra Shamsi', 'link': 'https://arxiv.org/a”
ehijano/rss_fetch⭐ 11
“ </item> <item> <title>FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast</title> <link>https://arxiv.org/abs/2605.16233</link> <description>arXiv:2605.16233v1 Announce Type: cross Abstract: Can LLM agents improve decision-”
lonePatient/lonePatient.github.io⭐ 9
“{% hideToggle Click to view abstract %} {% note blue no-icon %} ID-11-FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast {% endnote %} **Link**: https://arxiv.org/abs/2605.16233 **Authors**: Igor Bogdanov,Chung-Horng Lung,Thomas Kunz,Jie Gao,Adrian Taylor,Marzia Zaman **”
2shin0/arxiv-ai-mailing⭐ 6
“ ## 2. FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast - **Authors**: Igor Bogdanov , Chung-Horng Lung , Thomas Kunz , Jie Gao , Adrian Taylor , Marzia Zaman - **URL**: [https://arxiv.org/abs/2605.16233](https://arxiv.org/abs/2605.16233) - **Abst”
noCharger/ai-dashboard⭐ 1
“ "Chung-Horng Lung", "Thomas Kunz" ], "url": "https://arxiv.org/abs/2605.16233", "categories": [ "cs.AI", "cs.CL",”
ValoraY/arXiv-daily⭐ 1
“<hr /> <h4 id="abstract_37">📄 Abstract</h4> <p>Deploying compound LLM agents in adversarial, partially observable sequential environments requires navigating several design dimensions: (1) what the agent sees, (2) how it reasons, and (3) how tasks are decomposed across component”
dexhunter/yanhua.ai⭐ 0
“ <div class="paper"> <h3>FORGE: Self-Evolving Agent Memory With No Weight Updates (2605.16233)</h3> <p><strong>Focus:</strong> Population-based memory broadcast for zero-shot evolution.</p> <p><a href="https://arxiv.org/abs/2605.16233">Read Paper</a></p”
daoyuly/new-blog⭐ 0
“ - **arXiv ID**: [2605.16233](https://arxiv.org/abs/2605.16233) - **Research area**: memory”
brianbaldock/aigregator⭐ 0
“<h2 id="research">🔬 Research <a class="permalink" href="#research" title="Permalink">#</a></h2> <p><em>5 stories · cred-weighted 🟡 +0.0</em></p> <ul> <li>4 🔬 🟡 ▤×1 🏷️ agents, memory <strong>FORGE: self-evolving agent memory via population broadcast, no weight updates.</stron”
mickdur/tech-watch⭐ 0
“ "https://arxiv.org/abs/2605.16217": "2026-05-18T07:51:44.206446+00:00", "https://arxiv.org/abs/2605.16223": "2026-05-18T07:51:44.206446+00:00", "https://arxiv.org/abs/2605.16232": "2026-05-18T07:51:44.206446+00:00", "https://arxiv.org/abs/2605.16233": "2026-05-18T07:51:44”