What do these badges mean?
- 🚀 Shipping: Code exists. Multiple GitHub repos already reference this paper; people are building on it.
- 📈 Climbing: Citation velocity is rising. Researchers are starting to pick it up.
- 💤 Quiet: Published but no notable signal yet. Most papers live here and could become anything later.
- 🎭 Hype: Heavy social buzz but no shipping signal. This is the counter-signal; it is deferred until Twitter/X data is wired up.
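
The badge rules above can be read as a small decision procedure. The following is only a sketch under stated assumptions: the field names, the thresholds (`min_repos`, `buzz_threshold`), and the order in which the rules are checked are all hypothetical and are not taken from the site.

```python
from dataclasses import dataclass


@dataclass
class PaperSignals:
    """Hypothetical per-paper signals; field names are illustrative only."""
    repo_count: int            # GitHub repos referencing the paper
    citation_velocity: float   # change in citation rate (assumed measure)
    social_buzz: float = 0.0   # assumed score; the legend says this data is not wired up yet


def assign_badge(s: PaperSignals, min_repos: int = 2, buzz_threshold: float = 0.5) -> str:
    """Return one badge name. Thresholds and rule precedence are assumptions."""
    if s.repo_count >= min_repos:        # "Multiple GitHub repos already reference this paper"
        return "Shipping"
    if s.citation_velocity > 0:          # "Citation velocity is rising"
        return "Climbing"
    if s.social_buzz >= buzz_threshold:  # "Heavy social buzz but no shipping signal"
        return "Hype"
    return "Quiet"                       # "Published but no notable signal yet"


if __name__ == "__main__":
    print(assign_badge(PaperSignals(repo_count=9, citation_velocity=0.0)))
    print(assign_badge(PaperSignals(repo_count=0, citation_velocity=0.0)))
```

Checking the shipping rule first reflects the legend's wording that Hype applies only when there is no shipping signal; whether Climbing should outrank Hype is a guess.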
- 13 min read · 🚀 Shipping · 2605.18747 · May 18, 2026 · cs.CL · cs.AI
Code as Agent Harness
Xuying Ning, Katherine Tieu, Dongqi Fu, Tianxin Wei, +38
⭐ 1.3k stars / 9 repos · 📚 0 cites
ELI5: Instead of treating code as just the output LLMs produce, this survey shows how code can be the central operating system for AI agents: the glue that lets them think, act, remember, and verify their work in a way humans can actually understand and check.
Problem solved: Current AI agents are hard to make reliable, debuggable, and controllable. Using code as the core infrastructure lets you write agent logic you can read, test, and fix, solving the black-box nature of pure neural approaches and making agents deployable in real systems.
- 13 min read · 🚀 Shipping · 2605.16215 · May 15, 2026 · cs.AI · cs.CL
Fully Open Meditron: An Auditable Pipeline for Clinical LLMs
Xavier Theimer-Lienhard, Mushtaha El-Amin, Fay Elhassan, Sahaj Vaidya, +4
⭐ 434 stars / 25 repos · 📚 0 cites
ELI5: Researchers built the first completely transparent medical AI model where you can see everything: what data it learned from, how it was cleaned, how it was trained, and how it works. They combined medical question datasets, added clinician-verified practice guidelines, and had doctors validate every step.
Problem solved: Medical AI systems need to be trustworthy and auditable for doctors to use them, but most 'open' models hide their training data and methods. This makes it impossible to validate that they're safe or to understand why they give certain answers, a critical problem in healthcare.
- 14 min read · 🚀 Shipping · 2605.16205 · May 15, 2026 · cs.AI · cs.CL · cs.LG
Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP
Igor Bogdanov, Chung-Horng Lung, Thomas Kunz, Jie Gao, +2
⭐ 348 stars / 28 repos · 📚 0 cites
ELI5: Researchers tested different ways to build AI agents that play a cyber defense game where they can't see the full situation. They compared three design choices: what information to show the agent, how much the agent should think things through, and whether to use one big agent or split it into smaller specialist agents. They found that clean data representation and task splitting work best, but adding too much internal reasoning actually makes things worse.
Problem solved: Teams building AI agents for complex, partial-information tasks don't know which design patterns actually improve performance versus just burning compute. This study quantifies the cost-benefit tradeoffs of context, reasoning depth, and hierarchical decomposition so builders can stop guessing and start optimizing.
- 🚀 Shipping · 2605.27366 · May 26, 2026 · ~8 min · cs.AI · cs.CL · cs.LG
MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation
Huawei Lin, Peng Li, Jie Song, Fuxin Jiang, +1
⭐ 1.4k stars / 53 repos · 📚 0 cites
Large language model (LLM) agents rely on reusable skills to solve complex tasks. However, existing skill creation approaches treat skills as isolated and static artifacts, limiting their reusability, reliability, and long-term improvement. We propose MUSE-Autoskill Agent (Memory-Utilizing Skill Evolution), a skill-cen…
- 🚀 Shipping · 2605.27365 · May 26, 2026 · ~10 min · cs.CV · cs.AI · cs.LG
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding
Shihao Wang, Shilong Liu, Yuanguo Kuang, Xinyu Wei, +9
⭐ 239 stars / 51 repos · 📚 0 cites
Vision-language models (VLMs) commonly formulate visual grounding and detection as a coordinate-token generation problem, serializing each 2D box into multiple 1D tokens that are learned and decoded largely independently. This token-by-token decoding mismatches the coupled structure of box geometry and creates a practi…
- 🚀 Shipping · 2605.27360 · May 26, 2026 · ~12 min · cs.NI · cs.AI
GENESIS: Harnessing AI Agents for Autonomous 6G RAN Synthesis, Research, and Testing
Tamerlan Aghayev, Maxime Elkael, Michele Polese, Minh Dat Nguyen, +10
⭐ 1.3k stars / 33 repos · 📚 0 cites
Cellular research and development (R&D) is throttled by six structural processes that each consume months of manual engineering work per iteration: (i) synthesizing new features from standards or research papers into production code; (ii) conformance and interoperability testing; (iii) hardening against field anomalies…
- 🚀 Shipping · 2605.27358 · May 26, 2026 · ~10 min · cs.LG · cs.AI · cs.CL
MobileMoE: Scaling On-Device Mixture of Experts
Yanbei Chen, Hanxian Huang, Ernie Chang, Jacob Szwejbka, +4
⭐ 121 stars / 40 repos · 📚 0 cites
Mixture-of-Experts (MoE) has become the de facto architecture for hundred-billion-parameter language models, yet its advantages at sub-billion scales for on-device deployment remain largely unexplored. To close this gap, we present MobileMoE, a family of on-device MoE language models with sub-billion active parameters…
- 🚀 Shipping · 2605.27355 · May 26, 2026 · ~10 min · cs.AI · cs.CL · cs.LG
Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases
Dongyoon Hahm, Dylan Hadfield-Menell, Kimin Lee
⭐ 126 stars / 37 repos · 📚 0 cites
Reinforcement Learning from Human Feedback (RLHF) is the standard method to align Large Language Models (LLMs) with human preferences. In this work, we introduce alignment tampering, a potential vulnerability where the LLM undergoing alignment influences the preference dataset, causing RLHF to amplify undesired behavio…
- 🚀 Shipping · 2605.27354 · May 26, 2026 · ~9 min · cs.LG · cs.AI · cs.CL
Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders
Yi Jing, Zao Dai, Jinwu Hu, Zijun Yao, +3
⭐ 500 stars / 38 repos · 📚 0 cites
Model internals encode rich information about how a large language model (LLM) processes its training data; however, post-training data engineering largely relies on external signals and ignores rich intrinsic signals lying in model internals. We propose SAERL, a data engineering framework for LLM reinforcement learnin…