
All 50 · 🚀 Shipping 4 · 📈 Climbing 0 · 💤 Quiet 46 · Unscored 0

What do these badges mean?

🚀 Shipping: Code exists. Multiple GitHub repos already reference this paper, so people are building on it.
📈 Climbing: Citation velocity is rising. Researchers are starting to pick it up.
💤 Quiet: Published but no notable signal yet. Most papers live here and could become anything later.
🎭 Hype: Heavy social buzz but no shipping signal. This is the counter-signal badge; it is deferred until Twitter/X data is wired up.

🚀 Shipping · 2606.17053 · Jun 15, 2026 · ~11 min · cs.CL, cs.CV
Context-Aware RL for Agentic and Multimodal LLMs
Peiyang Xu, Bangzheng Li, Sijia Liu, Karthik R. Narasimhan, +3
⭐ 373 stars / 10 repos · 📚 0 cites
ELI5: This method trains AI models to better spot the exact pieces of evidence they need in long documents or images by making them practice picking the right supporting context from two similar options, like learning to find the one detail that actually matters.
Problem solved: LLMs struggle to locate key evidence buried in long tool traces or subtle image details, causing reasoning failures. This trains models to ground their answers in specific, relevant context rather than guessing.
🚀 Shipping · 2606.17043 · Jun 15, 2026 · ~12 min · cs.RO, cs.LG
Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes
Tongyan Fang, Siyuan Huang, Naiyu Fang, Ganlong Zhao, +5
⭐ 551 stars / 15 repos · 📚 0 cites
ELI5: When a robot learns to do tasks through trial and error, each attempt only tells you whether it succeeded or failed. This paper teaches the robot to separate two learning goals (first get good at completing the task, then get fast at completing it) and smartly switches between them as it improves.
Problem solved: Robot fine-tuning from sparse outcomes conflates success with efficiency, wasting learning signal once basic success is achieved. Mixing autonomous and intervention segments also causes incorrect credit assignment. HABC separates viability and efficiency learning, doubling success rates on real contact-heavy manipulation tasks.
🚀 Shipping · 2606.17029 · Jun 15, 2026 · ~12 min · cs.CL
DEEPRUBRIC: Evidence-Tree Rubric Supervision for Efficient Reinforcement Learning of Deep Research Agents
Minghang Zhu, Chuyang Wei, Junhao Xu, Yilin Cheng, +2
⭐ 502 stars / 14 repos · 📚 0 cites
ELI5: Instead of asking an AI to guess what criteria should evaluate a research report, this method builds a tree of questions and evidence first, then creates matching evaluation rubrics, like writing the answer key before the test. It trains research agents 13x faster this way.
Problem solved: Training research agents with reinforcement learning is slow and unreliable when rubrics don't match what the query actually needs. This method ensures rubrics align perfectly with the information requested, cutting training time dramatically while keeping quality high.
🚀 Shipping · 2606.17024 · Jun 15, 2026 · ~12 min · cs.LG
ExpRL: Exploratory RL for LLM Mid-Training
Violet Xiang, Amrith Setlur, Chase Blagden, Nick Haber, +1
⭐ 458 stars / 13 repos · 📚 0 cites
ELI5: Instead of manually teaching language models intermediate reasoning skills before doing reinforcement learning, this paper uses reference answers as a grading rubric to reward partial progress and good reasoning steps, letting the model learn useful strategies automatically from question-answer pairs.
Problem solved: Current RL for LLMs requires expensive manual curation of reasoning traces to teach primitive skills, and it's unclear if these skills are enough for hard problems. This automates that prep stage by extracting signal from existing Q&A data to better prime models before sparse-reward RL.
