What do these badges mean?
- 🚀 Shipping: Code exists. Multiple GitHub repos already reference this paper, so people are building on it.
- 📈 Climbing: Citation velocity is rising. Researchers are starting to pick it up.
- 💤 Quiet: Published but no notable signal yet. Most papers live here and could become anything later.
- 🎭 Hype: Heavy social buzz but no shipping signal. This is the counter-signal; it is deferred until Twitter/X data is wired up.
- 🚀 Shipping · 2605.18747 · May 18, 2026 · ~13 min · cs.CL, cs.AI
Code as Agent Harness
Xuying Ning, Katherine Tieu, Dongqi Fu, Tianxin Wei, +38
⭐ 1.3k stars / 9 repos · 📚 0 cites
ELI5: Instead of treating code as just the output LLMs produce, this survey shows how code can be the central operating system for AI agents: the glue that lets them think, act, remember, and verify their work in a way humans can actually understand and check.
Problem solved: Current AI agents are hard to make reliable, debuggable, and controllable. Using code as the core infrastructure lets you write agent logic you can read, test, and fix, which addresses the black-box nature of pure neural approaches and makes agents deployable in real systems.
- 🚀 Shipping · 2605.16238 · May 15, 2026 · ~9 min · cs.AI
Prospective multi-pathogen disease forecasting using autonomous LLM-guided tree search
Sarah Martinson, Michael P. Brenner, Martyna Plomecka, Brian P. Williams, +2
⭐ 171 stars / 10 repos · 📚 0 cites
ELI5: An AI system uses a language model to automatically design and test disease forecast models by searching through combinations of mathematical approaches, then picks the best ones to predict flu, COVID, and RSV, matching expert predictions without needing humans to build the models.
Problem solved: Disease forecasting currently requires expert teams to manually build and tune models for each pathogen and location, which is slow and doesn't scale. This system automates that work so forecasts can be deployed quickly for new diseases or regions without waiting for scarce modeling expertise.
- 🚀 Shipping · 2605.16217 · May 15, 2026 · ~13 min · cs.CL, cs.AI, cs.IR
Argus: Evidence Assembly for Scalable Deep Research Agents
Zhen Zhang, Liangcai Su, Zhuo Chen, Xiang Lin, +6
⭐ 123 stars / 23 repos · 📚 0 cites
ELI5: A research AI system where one agent searches for evidence pieces while another agent tracks what's been found, spots what's missing, and assembles everything into a final answer, like coordinating a team to complete a jigsaw puzzle instead of having everyone solve it separately.
Problem solved: Current AI research agents waste compute by running parallel searches that duplicate effort instead of finding new information, and they struggle to fit all the results into context windows. This system makes parallel searching efficient by tracking what's been gathered and targeting searches at gaps.
- 🚀 Shipping · 2605.16207 · May 15, 2026 · ~8 min · cs.AI, cs.CL
Confirming Correct, Missing the Rest: LLM Tutoring Agents Struggle Where Feedback Matters Most
Tahreem Yasir, Wenbo Li, Sam Gilson, Sutapa Dey Tithi, +2
⭐ 439 stars / 22 repos · 📚 0 cites
ELI5: Researchers tested whether AI tutors can actually tell the difference between correct answers, partially correct answers, and wrong answers, and found they're surprisingly bad at catching subtle mistakes that real tutors should catch.
Problem solved: Schools and education platforms are replacing human tutors with AI, but we didn't know if these AI tutors could diagnose student mistakes well enough to give useful feedback. This matters because bad diagnosis leads to bad teaching.
- 🚀 Shipping · 2605.16205 · May 15, 2026 · ~14 min · cs.AI, cs.CL, cs.LG
Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP
Igor Bogdanov, Chung-Horng Lung, Thomas Kunz, Jie Gao, +2
⭐ 348 stars / 28 repos · 📚 0 cites
ELI5: Researchers tested different ways to build AI agents that play a cyber defense game where they can't see the full situation. They compared three design choices: what information to show the agent, how much the agent should think things through, and whether to use one big agent or split it into smaller specialist agents. They found that clean data representation and task splitting work best, but adding too much internal reasoning actually makes things worse.
Problem solved: Teams building AI agents for complex, partial-information tasks don't know which design patterns actually improve performance versus just burning compute. This study quantifies the cost-benefit tradeoffs of context, reasoning depth, and hierarchical decomposition so builders can stop guessing and start optimizing.
- 🚀 Shipping · 2605.16142 · May 15, 2026 · ~15 min · cs.AI, cs.LG
Property-Guided LLM Program Synthesis for Planning
Augusto B. Corrêa, André G. Pereira, Jendrik Seipp
⭐ 156 stars / 10 repos · 📚 0 cites
ELI5: Instead of telling an AI program-writer 'your code got 3 out of 10 tests right, try again,' this method checks if the code breaks a specific rule and shows exactly where it fails. The AI learns faster because it gets concrete feedback on what's wrong, not just a score.
Problem solved: LLMs waste compute by generating and testing many program candidates blindly. This approach provides early stopping and targeted feedback: when a program violates a formal property, evaluation halts immediately and the LLM sees a concrete counterexample, cutting candidate generation 7x and evaluation cost by orders of magnitude.
- 🚀 Shipping · 2605.16117 · May 15, 2026 · ~9 min · cs.CL
SGR: A Stepwise Reasoning Framework for LLMs with External Subgraph Generation
Xin Zhang, Yang Cao, Baoxing Wu, Kai Song, +1
⭐ 199 stars / 10 repos · 📚 0 cites
ELI5: A system that helps AI language models answer tricky questions by first building a small, focused map of relevant facts from a knowledge base, then walking through that map step-by-step to reach a reliable answer.
Problem solved: Language models often hallucinate or give inconsistent answers on complex reasoning tasks because they're working from just their training data. This grounds them in real, structured facts and makes their reasoning process traceable and verifiable.