All: 50 · 🚀 Shipping: 3 · 📈 Climbing: 0 · 💤 Quiet: 47 · Unscored: 0

What do these badges mean?

🚀 Shipping: Code exists. Multiple GitHub repos already reference this paper; people are building on it.
📈 Climbing: Citation velocity is rising. Researchers are starting to pick it up.
💤 Quiet: Published but no notable signal yet. Most papers live here; they could become anything later.
🎭 Hype: Heavy social buzz but no shipping signal (the counter-signal). Deferred until Twitter/X data is wired up.

💤 Quiet · 2607.09546 · Jul 10, 2026 · ~5 min · cs.LG, math.NA, math.OC
Graph-Regularized Low-Rank Matrix Completion by Variable Projection
Benoît Loucheur, P. -A. Absil, Michel Journée
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: When you have a matrix with missing values, this method fills them in by assuming the data is low-rank (simple) and by using the graph structure of how rows and columns relate to each other, much as knowing which items are similar helps you guess missing ratings.
Problem solved: Matrix completion (filling in missing data) often ignores relationships between rows and columns. By incorporating graph structure, this approach recovers missing values more accurately when data has natural groupings or correlations, which is useful for recommender systems and sensor networks.
💤 Quiet · 2607.08581 · Jul 9, 2026 · ~7 min · cs.LG, math.SP
Spectral Stability of Pseudoinverse-Based Extreme Learning Machine
Bich Van Nguyen, Ngoc Anh Khong
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: Extreme Learning Machines (ELMs) train fast because they keep a randomly initialized hidden layer fixed and compute only the output weights, in one step, with a matrix pseudoinverse. That shortcut breaks down when the data gets messy. This paper pins down which mathematical (spectral) properties cause the breakdowns and tests ways to fix them.
Problem solved: ELM training is quick but fragile. When the hidden-layer matrix is badly conditioned, tiny errors blow up into wildly wrong predictions. This paper shows which spectral properties cause failure and which computational methods stay reliable.
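The pseudoinverse step behind ELMs fits in a few lines of NumPy. This is a minimal illustration that assumes a tanh hidden layer with random Gaussian weights. `train_elm` and `predict` are hypothetical helper names, not the paper's code. The sketch also reports the condition number of the hidden-layer matrix H, which governs how strongly the pseudoinverse amplifies small errors.

```python
import numpy as np


def train_elm(X, T, n_hidden, rng):
    """Fit a single-hidden-layer ELM: random hidden weights, output weights by pseudoinverse."""
    n_features = X.shape[1]
    W = rng.standard_normal((n_features, n_hidden))  # random input weights, never trained
    b = rng.standard_normal(n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                           # hidden activations, shape (n_samples, n_hidden)
    beta = np.linalg.pinv(H) @ T                     # least-squares output weights: beta = H^+ T
    return W, b, beta, np.linalg.cond(H)             # cond(H) measures error amplification


def predict(X, W, b, beta):
    """Apply the fixed random hidden layer, then the learned linear readout."""
    return np.tanh(X @ W + b) @ beta


rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
T = np.sin(X).sum(axis=1, keepdims=True)
W, b, beta, cond_H = train_elm(X, T, n_hidden=50, rng=rng)
print(f"cond(H) = {cond_H:.2e}")
print("max train error:", np.abs(predict(X, W, b, beta) - T).max())
```

There are fewer samples (20) than hidden units (50), so H generically has full row rank. The minimum-norm pseudoinverse solution therefore fits the training targets to within round-off. The size of that round-off scales with cond(H).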
💤 Quiet · 2607.07702 · Jul 8, 2026 · ~11 min · cs.CL
From Noisy Traces to Root Causes: Structural Trajectory Analysis and Causal Extraction for Agent Optimization
Ying Chang, Jiahang Xu, Xuan Feng, Chenyuan Yang, +2
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: When AI agents fail at tasks, you get messy logs full of irrelevant steps. This method automatically finds the actual root causes by filtering out noise and tracing what actually caused each failure, so the agent can learn from the real problem instead of random junk.
Problem solved: LLM-based agents get stuck on tasks, but their failure logs are huge, redundant, and full of irrelevant details, making it hard to figure out what actually went wrong and fix it. Naive log cleanup loses important clues. This makes learning from failures slow and unreliable.
💤 Quiet · 2607.07682 · Jul 8, 2026 · ~10 min · cs.LG
Neural Operator-enabled Topology-informed Evolutionary Strategy for PDE-Constrained Optimization
Xiangming Huang, Guannan Zhang, Lu Lu, Raphaël Pestourie
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: A method that combines neural networks trained on physics equations with evolutionary algorithms to quickly design better physical objects (like nanophotonic devices or mechanical structures) by searching through fewer design possibilities.
Problem solved: Inverse design of physics-governed systems is slow and brittle. Generative models fail on new conditions, while traditional optimization struggles with huge design spaces. This approach makes it both faster and more reliable by using learned physics shortcuts.
💤 Quiet · 2607.07637 · Jul 8, 2026 · ~11 min · cs.LG, math.NA, math.OC
An optimal control approach for neural network architecture adaptation with a posteriori error estimation
C G Krishnanunni, Thomas Scott, Tan Bui-Thanh
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: Instead of guessing where to add layers in a neural network, this method estimates which parts of the network contribute most to the error and adds layers there. It treats network training as an optimal control problem, which tells you where your network is struggling.
Problem solved: Building neural networks requires tedious trial and error to decide depth and layer placement. This approach removes the guesswork by mathematically pinpointing where a network needs more capacity, saving time and improving accuracy on complex tasks like fluid dynamics simulation.
💤 Quiet · 2607.06532 · Jul 7, 2026 · ~11 min · cs.LG, math.OC
GraphBU: MILP Instance Generation with Graph-Native Block Units
Xiaolei Guo, Chenyu Zhou, Jianghao Lin, Dongdong Ge
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: A tool that generates realistic mathematical optimization problems (MILPs) by breaking down existing problems into reusable building blocks and reassembling them while preserving how those blocks connect to each other.
Problem solved: Getting realistic MILP instances for training solvers is hard when real problems are proprietary. This generator creates new problems that maintain the structural patterns solvers actually learn from, fixing the issue where existing generators lose important coupling information between problem pieces.
💤 Quiet · 2607.06489 · Jul 7, 2026 · ~8 min · cs.AI
Multi-Agent Deep Reinforcement Learning for Multi Objective Battery Management in Dairy Farms
Marcos Eduardo Cruz Victorio, Karl Mason
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: A system that uses AI agents to manage batteries on dairy farms. The agents decide when to store renewable energy and when to use it based on electricity prices, acting like a smart coordinator that maximizes profits from buying and selling power.
Problem solved: Dairy farms struggle to integrate solar and wind energy profitably while staying within grid rules. This system automatically decides when to charge or discharge batteries to capture price differences and use more clean energy without destabilizing the local grid.
💤 Quiet · 2607.05346 · Jul 6, 2026 · ~7 min · cs.AI, cs.MA
OptiAgent: End-to-End Optimization Modeling via Multi-Agent Iterative Refinement
Adriana Laurindo Monteiro, Nayse Fagundes, Gabriel Mattos Langeloh, Gustavo de Oliveira Kanno, +3
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: A system that reads English descriptions of optimization problems (like supply chain or scheduling puzzles) and automatically generates both the mathematical equations and working code to solve them, using multiple AI agents that check each other's work.
Problem solved: Translating real-world optimization problems into solver-ready mathematical models is slow, error-prone, and requires expertise. This automates the entire pipeline from natural language to executable code, reducing expert time and catching mistakes through built-in validation.
💤 Quiet · 2607.02499 · Jul 2, 2026 · ~7 min · cs.LG, cs.AI, physics.chem-ph
Beyond Adam: SOAP and Muon for Faster, Label-Efficient Training of Machine Learning Interatomic Potentials
Gil Harari, Yoel Zimmermann, Ola Tangen Kulseng, Laura Zichi, +3
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: Training machine learning models that predict how atoms interact is slow. Researchers tested newer optimizer algorithms (SOAP, Muon) as alternatives to Adam and found they train these models faster and more accurately, especially when you have limited labeled data.
Problem solved: Training AI models for molecular simulation takes a long time and wastes compute. By switching optimizers, you can train better models more quickly with less labeled atomic data, a practical speedup for scientific discovery and materials research.
💤 Quiet · 2607.02484 · Jul 2, 2026 · ~9 min · cs.CV, cs.AI
Combating Textual Noise and Redundancy: Entropy-Aware Dense Visual Token Pruning
Xuehui Wang, Xuankun Yang, Wei Shen
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: When AI models look at images with text instructions, they waste computation on irrelevant image patches. This method uses statistical noise filtering and smart selection to keep only the important visual pieces while discarding redundant ones, making the model faster without losing accuracy on detailed questions.
Problem solved: Vision-language models are slow because they process every image patch, even redundant ones. Existing pruning methods fail on detailed queries that need fine-grained visual information. This solution preserves critical details while cutting computation, enabling faster inference without accuracy loss.
💤 Quiet · 2606.30634 · Jun 29, 2026 · ~10 min · cs.LG
One-Step Gradient Delay is Not a Barrier for Large-Scale Asynchronous Pipeline Parallel LLM Pretraining
Philip Zmushko, Egor Petrov, Nursultan Abdullaev, Mikhail Khrushchev, +1
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: When training giant AI models on many GPUs, you can pipeline the work by splitting the model into stages on different devices. Synchronous pipelines, however, leave GPUs idle while stages wait for each other. Asynchronous pipelines remove that idling, but their gradients are one step out of date. This paper shows that such a pipeline need not slow training if you pick the right optimizer: Muon works much better than AdamW here, and adding error correction helps even more.
Problem solved: Pipeline parallelism wastes GPU compute during idle periods (pipeline bubbles), while asynchronous versions can hurt training quality because of stale gradients. Teams avoid async pipelines on the assumption that staleness breaks optimization. This work shows that the real bottleneck is the choice of optimizer rather than the staleness itself, unlocking faster large-scale training.
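A toy sketch makes "one-step gradient delay" concrete. In it, each update uses the gradient computed at the previous step's weights, as a stage in an asynchronous pipeline would. It assumes plain gradient descent on the 1-D quadratic f(w) = w²/2, not the paper's setup, and `delayed_gd` is a hypothetical helper.

```python
def delayed_gd(grad, w0, lr, steps, delay=1):
    """Gradient descent where the update at step t uses grad(w_{t - delay}).

    delay=0 recovers ordinary gradient descent.
    """
    history = [w0]  # history[t] holds w_t, so stale gradients can be recomputed
    w = w0
    for t in range(steps):
        stale_w = history[max(0, t - delay)]  # weights the gradient was computed at
        w = w - lr * grad(stale_w)
        history.append(w)
    return w


def grad(w):
    """Gradient of f(w) = w**2 / 2, which is minimised at w = 0."""
    return w


for lr in (0.1, 1.5):
    fresh = delayed_gd(grad, 1.0, lr, steps=200, delay=0)
    stale = delayed_gd(grad, 1.0, lr, steps=200, delay=1)
    print(f"lr={lr}: no delay -> {fresh:.3g}, one-step delay -> {stale:.3g}")
```

Under these assumptions, the one-step delay is harmless at lr=0.1. At lr=1.5 it turns a step size that converges without delay into one that diverges. This toy shows why staleness is often assumed to hurt. The paper's claim is that at scale the optimizer choice, not the staleness itself, decides whether it does.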
💤 Quiet · 2606.30625 · Jun 29, 2026 · ~6 min · stat.ML, cs.AI, cs.LG
Optimization Dynamics Imprint Semantic Specificity in Contrastive Embedding Norms
Ziwei Su, Junyu Ren, Victor Veitch
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: When training AI models that learn to compare similar items, the lengths of the learned vectors accidentally encode useful information about how specific or common concepts are, even though the model is designed to ignore those lengths. This paper explains mathematically why this happens.
Problem solved: Researchers noticed that embedding norms correlate with semantic properties but didn't understand why. This work explains the phenomenon rigorously and shows how to use these "free" signals for better calibration without extra training.
💤 Quiet · 2606.30559 · Jun 29, 2026 · ~5 min · cs.LG, math.NA, math.OC
Convergence of Continual Learning in Homogeneous Deep Networks
Matan Schliserman, Gon Buzaglo, Itay Evron, Daniel Soudry
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: When AI models learn tasks sequentially (like learning to recognize cats, then dogs, then birds), this paper works out mathematically when and how well they improve without forgetting previous tasks. It also shows that for certain network types (homogeneous networks) they converge reliably to good solutions.
Problem solved: Continual learning in deep networks is unstable and poorly understood theoretically. Practitioners don't know when sequential task training will actually converge or prevent catastrophic forgetting, making it hard to deploy continual learning systems reliably.
💤 Quiet · 2606.30509 · Jun 29, 2026 · ~12 min · cs.LG
Muon learns balanced solutions in matrix factorization without slow saddle-to-saddle dynamics
Mark Rhee, Jamie Simon, Dhruva Karkada
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: Muon is an optimizer that learns matrix factorization problems faster than gradient descent by avoiding getting stuck in slow intermediate plateaus and allowing much larger learning rates without becoming unstable.
Problem solved: Standard optimizers like gradient descent get trapped at saddle points during matrix factorization, slowing convergence and requiring careful learning rate tuning. Muon sidesteps this to converge significantly faster with less hyperparameter fiddling.
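Muon's core step replaces the raw (momentum) gradient matrix with an approximately orthogonal one computed by a Newton-Schulz iteration. This pushes all singular values toward 1, which is one way to see why it behaves differently from plain gradient descent. The NumPy sketch below uses the classic cubic iteration X <- 1.5X - 0.5XXᵀX. It is a simplification: widely used Muon implementations typically run a few steps of a tuned quintic polynomial instead. The `muon_step` helper and its `lr`/`beta` defaults are hypothetical.

```python
import numpy as np


def newton_schulz_orthogonalize(G, steps=50):
    """Approximate the polar factor U @ Vt of G = U S Vt with the cubic Newton-Schulz iteration."""
    X = G / np.linalg.norm(G)  # Frobenius normalisation puts every singular value in (0, 1]
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X  # maps each singular value s to 1.5s - 0.5s^3, driving it to 1
    return X


def muon_step(W, grad, momentum, lr=0.02, beta=0.95):
    """One simplified, hypothetical Muon-style update: accumulate momentum, step along its orthogonalised form."""
    momentum = beta * momentum + grad
    W = W - lr * newton_schulz_orthogonalize(momentum)
    return W, momentum


rng = np.random.default_rng(0)
G = rng.standard_normal((5, 3))
O = newton_schulz_orthogonalize(G)
U, _, Vt = np.linalg.svd(G, full_matrices=False)
print(np.allclose(O, U @ Vt, atol=1e-6))  # True: the iteration recovers the polar factor
```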
💤 Quiet · 2606.30460 · Jun 29, 2026 · ~11 min · cs.LG, cs.DC
HSAP: A Hierachical Sequence-aware Parallelism for Hybrid-Context Generative Models
Songxin Zhang, Zejian Xie, Zhuoyang Song, Cong lin, +3
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: When training giant language models across multiple GPUs, you can split the text sequence into chunks processed in parallel, but this breaks when you pack multiple short sequences together (a common efficiency trick). This paper fixes that problem by smartly routing partial computations through the right GPU groups.
Problem solved: Training large language models efficiently requires both packing multiple sequences together (to avoid wasted computation) and splitting sequences across GPUs (to fit larger models). Existing methods force you to choose one or the other; this work makes both work together without corrupting attention calculations.
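The attention side of the packing problem shows up in the mask. Once several short sequences share one row, each token may attend only to earlier tokens of its own sequence. A small NumPy sketch of that block-diagonal causal mask follows. `packed_causal_mask` is an illustrative helper, and the sketch does not model HSAP's cross-GPU routing.

```python
import numpy as np


def packed_causal_mask(lengths):
    """Boolean attention mask for sequences packed back to back into one row.

    mask[q, k] is True when query token q may attend to key token k:
    both tokens belong to the same packed sequence and k does not come after q.
    """
    seq_id = np.repeat(np.arange(len(lengths)), lengths)  # which sequence each token came from
    n = seq_id.size
    same_sequence = seq_id[:, None] == seq_id[None, :]
    causal = np.tril(np.ones((n, n), dtype=bool))
    return same_sequence & causal


mask = packed_causal_mask([2, 3])
print(mask.astype(int))
```

Suppose the packed row is then split into contiguous chunks across GPUs for sequence parallelism. A chunk boundary can fall inside one of these diagonal blocks, so computing the masked attention correctly needs coordination across GPUs.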
💤 Quiet · 2606.30455 · Jun 29, 2026 · ~12 min · cs.LG, math.OC, stat.ML
Curvature-Weighted Gradient Diversity: A Noise Measure for Geometry-Adaptive SGD Schedules
Muhammad Hamza, Ayush Goel
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: Instead of treating all directions equally when adjusting learning rates during training, this method looks at how curved the loss landscape is in each direction and adapts the learning rate schedule accordingly, like steering harder on straight roads and more gently on winding ones.
Problem solved: Standard learning rate schedules ignore that noise in sharply curved directions matters less (since learning rates are already small there). As a result, training settles on a worse solution than it could reach. This method accounts for that geometry and improves final model quality by about 20% at no extra cost.
💤 Quiet · 2606.30445 · Jun 29, 2026 · ~8 min · cs.LG
When Does Online Imitation Learning Help in LLM Post-Training? The Role of (Non-)Realizability Beyond Horizon
Huaqing Zhang, Jingchu Gai, Juno Kim, Bingbin Liu, +1
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: When training language models by having them learn from expert examples, this paper figures out when it's actually better to learn "online" (getting feedback as you go) versus "offline" (learning from a fixed batch). The key insight: online learning only helps when the student model can't perfectly copy the expert; if it can, offline learning works just as well.
Problem solved: Teams building LLMs don't have clear guidance on whether to invest in expensive online feedback loops (like RLHF) or stick with simpler offline fine-tuning. This paper explains the theoretical conditions that determine which approach will actually be better, saving engineering effort.
💤 Quiet · 2606.30442 · Jun 29, 2026 · ~11 min · cs.AI
The FIL Hypothesis: Inductive Biases Help with Kernel Engineering
Nikolai Rozanov, Subhabrata Dutta, Preslav Nakov, Iryna Gurevych
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: When feedback takes hours or weeks instead of seconds (like in science experiments or physical systems), AI models can't learn from pure trial and error like they do in games. This paper shows that adding human expertise and constraints into the model design works better than just throwing more data at the problem.
Problem solved: Modern AI assumes instant feedback (game wins/losses, classification correct/wrong), but real-world problems like drug discovery or robotics have delays of hours to weeks. This makes data-hungry approaches impractical, because you can't collect enough training examples fast enough. Human expertise becomes necessary again.
💤 Quiet · 2606.30384 · Jun 29, 2026 · ~13 min · cs.LG, cond-mat.dis-nn, nlin.CD
Scalar Representations of Neural Network Training Dynamics
Pedro Jiménez-González, Miguel C. Soriano, Lucas Lacasa
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: Instead of trying to understand neural network training by looking at all its millions of parameters at once, researchers compress the training trajectory into a single number that still captures the important dynamics, such as how sensitive the network is to tiny changes in starting conditions.
Problem solved: Neural network training is hard to visualize or analyze directly because it happens in millions of dimensions. This work creates a low-dimensional summary that preserves the actual dynamics, making it possible to study and compare training runs without the computational nightmare.
💤 Quiet · 2606.30335 · Jun 29, 2026 · ~8 min · cs.AI
BayesEvolve: Explicit Belief States for Autonomous Scientific Discovery
Xuening Wu, Shan Yu, Qianya Xu, Shenqin Yin
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: A system that helps AI discover new scientific ideas by maintaining a statistical model of what it has learned so far, rather than just remembering past experiments. It works like a scientist who keeps updating their intuition about what works instead of keeping only a notebook of things already tried.
Problem solved: LLM-based discovery systems waste experiments by relying on simple memory or archives instead of reasoning about uncertainty. BayesEvolve uses probabilistic beliefs to make smarter bets on which hypotheses to test next, saving evaluations in expensive experimental domains.
💤 Quiet · 2606.30333 · Jun 29, 2026 · ~6 min · math.OC, cs.LG, physics.comp-ph
Local-Minima-Preserving Continuous Relaxation of Ising Problems
Debraj Banerjee, Santanu Mahapatra, Kunal N. Chaudhury
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: Instead of solving hard combinatorial puzzles by checking all possible solutions, this method converts them into smooth math problems where you can slide downhill with regular gradient descent. The valleys you end up in correspond directly to good solutions of the original puzzle.
Problem solved: Hard combinatorial problems like MAX-CUT and graph partitioning are computationally expensive to solve. This work lets you use fast gradient-based optimizers (like Adam) on a relaxed version of the problem that preserves its local minima, avoiding the need for expensive discrete search or special hardware.
💤 Quiet · 2606.30328 · Jun 29, 2026 · ~9 min · stat.ML, cs.LG, math.NA
Extrapolating from Regularised Solutions for Solving Ill-Conditioned Linear Systems in Machine Learning
Disha Hegde, Jon Cockayne, Chris. J. Oates
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: A tool that solves tricky (ill-conditioned) linear systems by combining several regularized approximate solutions, rather than hand-picking a single regularization parameter. It works with automatic differentiation, so you can use it in end-to-end ML training.
Problem solved: ML training often requires solving ill-conditioned linear systems, but choosing the regularization parameter by hand is slow and breaks gradient flow. This automates that choice and reuses computation from the selection process that would otherwise be wasted.
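The general idea of combining several regularized solutions can be illustrated with classical polynomial (Richardson-style) extrapolation in the regularization parameter. You solve (A + λI)x = b for a few values of λ, then evaluate the interpolating polynomial at λ = 0. This NumPy sketch is a generic illustration under that assumption, not the paper's estimator. It also leaves out the automatic-differentiation aspect, and `extrapolate_to_zero` is a hypothetical name.

```python
import numpy as np


def extrapolate_to_zero(A, b, lams):
    """Combine Tikhonov-regularised solutions x(lam) = (A + lam*I)^{-1} b into an estimate of x(0).

    Uses Lagrange interpolation in lam, evaluated at lam = 0.
    """
    n = A.shape[0]
    sols = [np.linalg.solve(A + lam * np.eye(n), b) for lam in lams]
    estimate = np.zeros(n)
    for j, (lam_j, x_j) in enumerate(zip(lams, sols)):
        w = 1.0
        for m, lam_m in enumerate(lams):
            if m != j:
                w *= lam_m / (lam_m - lam_j)  # Lagrange basis polynomial l_j evaluated at 0
        estimate += w * x_j
    return estimate, sols


A = np.diag([1.0, 2.0, 4.0])
b = np.ones(3)
x_true = np.linalg.solve(A, b)
est, sols = extrapolate_to_zero(A, b, [0.01, 0.02, 0.04])
print("error of smallest-lambda solution:", np.abs(sols[0] - x_true).max())
print("error of extrapolated estimate:   ", np.abs(est - x_true).max())
```

Each regularized solution carries an O(λ) bias. Interpolating through three values of λ cancels the leading bias terms, so the combined estimate lands much closer to the unregularized solution than any single regularized solve does.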
💤 Quiet · 2606.30316 · Jun 29, 2026 · ~10 min · cs.LG
Toward an Energy-Optimized Operation of Data Centers Located in Wind Farms Using Reinforcement Learning
Jan Stenner, Alexander Kilian, Sebastian Peitz, Hermann de Meer
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: A system learns to automatically shift computing work in data centers based on wind availability and electricity prices, using AI agents that practice making decisions repeatedly to maximize use of free wind energy and minimize costs.
Problem solved: Data centers near wind farms waste renewable energy or pay high prices when they don't time their workloads smartly. This automates that scheduling to cut energy costs and emissions without needing perfect advance weather forecasts.
💤 Quiet · 2606.30230 · Jun 29, 2026 · ~13 min · math.OC, cs.LG
A Distributionally Robust Framework for Learned Reconstructions in Inverse Problems
Floor van Maarschalkerwaart, Subhadip Mukherjee, Christoph Brune, Marcello Carioni
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: When AI models learn to reconstruct images from blurry or incomplete measurements, they break if the noise during testing differs from training. This work makes those models robust by training them against realistic worst-case noise scenarios tied to how the measurement process actually works, rather than assuming all noise is equally likely.
Problem solved: Learned image reconstruction (deblurring, medical imaging, etc.) fails in real deployments when noise characteristics shift. Standard robustness methods are overly cautious and ignore physics. This framework trains models that stay reliable across different noise conditions while remaining practical and interpretable.
💤 Quiet · 2606.30226 · Jun 29, 2026 · ~9 min · cs.LG
Characterizing Optimizer-Dependent Training Dynamics Through Hessian Eigenvector Displacement and Localization
Marcelina Marjankowska, Valerio Modugno, Paolo Barucca
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: This paper tracks how the directions of sharpest curvature in a neural network's loss landscape (the top Hessian eigenvectors) change during training. It shows that different optimizers reorganize these directions very differently: SGD keeps them stable, while Adam scrambles them around.
Problem solved: Understanding why different optimizers produce different training dynamics is hard. This work provides a concrete way to measure and compare optimizer behavior through Hessian eigenvector movement, making optimizer differences interpretable and measurable.
💤 Quiet · 2606.28308 · Jun 26, 2026 · ~15 min · cs.GT, cs.AI, cs.LG
Which Nash Equilibrium? Solver-Dependent Selection on Zero-Sum Nash Polytopes
Luis Leal
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: When a game has multiple different ways to play that are all equally good, different algorithms pick different ones. This paper shows that some algorithms reliably choose the "most random" way to play, while others drift toward less random strategies, and that choice can matter when playing against imperfect opponents.
Problem solved: Game-solving algorithms are usually treated as interchangeable, but teams deploying them in competition or negotiation don't know which equilibrium they'll get. This reveals systematic algorithmic bias in equilibrium selection, which could shift outcomes against real (suboptimal) opponents.
💤 Quiet · 2606.28307 · Jun 26, 2026 · ~10 min · math.OC, cs.LG
Second-Order KKT Guarantees for Bregman ADMM in Nonconvex and Non-Lipschitz Optimization
Shuang Li, Zhihui Zhu, Qiuwei Li
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: A new way to solve optimization problems where the standard smoothness assumptions don't apply (for example, polynomial objectives whose gradients are not globally Lipschitz). The method guarantees that the algorithm avoids getting stuck at bad saddle points and instead converges to second-order stationary points, which are genuinely good candidate solutions.
Problem solved: Many real optimization problems, especially matrix and tensor factorization, don't satisfy standard smoothness conditions, so existing algorithm guarantees fail. This work enables provably correct optimization in those harder cases.
💤 Quiet · 2606.27315 · Jun 25, 2026 · ~10 min · cs.LG
Blackwell Approachability and Gradient Equilibrium are Equivalent
Brian W. Lee, Nika Haghtalab, Michael I. Jordan, Ryan J. Tibshirani
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: This paper shows that gradient equilibrium (a way to solve online optimization problems) is mathematically equivalent to Blackwell approachability (a classic game theory concept). Think of it like discovering that two different recipes produce the same dish: you can use tools designed for one framework to solve problems in the other.
Problem solved: Researchers didn't understand where gradient equilibrium fit in the online learning toolkit or how it related to other established frameworks. This equivalence lets practitioners reuse algorithms and guarantees from well-studied areas like regret minimization without building new methods from scratch.
💤 Quiet · 2606.27216 · Jun 25, 2026 · ~10 min · math.NA, cs.LG
Hierarchical Muon: Tiled Newton-Schulz Updates for Efficient Muon Optimization
Ziyuan Tang, Tianshi Xu, Yousef Saad, Yuanzhe Xi
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: This paper speeds up Muon, an optimizer that updates neural network weights using matrix math. Instead of doing expensive calculations on the entire weight matrix at once, it splits the matrix into smaller tiles, processes each tile independently, and reassembles them, parallelizing the work while keeping most of the benefits.
Problem solved: Muon optimization is powerful but slow for large neural networks because it requires expensive full-matrix operations that couple all rows and columns together. This makes it impractical for training large transformers, which is why faster alternatives are normally used instead.
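A rough NumPy sketch of the tiling idea follows. Each tile is orthogonalized on its own, so tiles can be processed in parallel, but the rows of the full matrix are no longer orthogonalized jointly. The sketch assumes square tiles that evenly divide the matrix. It uses the classic cubic Newton-Schulz iteration as a stand-in for whatever polynomial and tiling scheme the paper actually uses, and `tiled_orthogonalize` is a hypothetical name.

```python
import numpy as np


def newton_schulz(G, steps=60):
    """Cubic Newton-Schulz iteration converging to the orthogonal polar factor of G."""
    X = G / np.linalg.norm(G)  # Frobenius normalisation keeps singular values in (0, 1]
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X


def tiled_orthogonalize(G, tile):
    """Orthogonalise each tile x tile block of G independently (hypothetical tiling scheme)."""
    rows, cols = G.shape
    assert rows % tile == 0 and cols % tile == 0, "tiles must evenly divide G"
    out = np.empty_like(G)
    for i in range(0, rows, tile):
        for j in range(0, cols, tile):
            # Each block only touches its own entries, so blocks could run in parallel.
            out[i:i + tile, j:j + tile] = newton_schulz(G[i:i + tile, j:j + tile])
    return out


rng = np.random.default_rng(1)
G = rng.standard_normal((8, 8))
O = tiled_orthogonalize(G, tile=4)
```

Each 4x4 block of `O` is orthogonal on its own, while `O` as a whole generally is not. That gap is the accuracy the tiling trades for cheaper, more parallel work.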
💤 Quiet · 2606.27171 · Jun 25, 2026 · ~11 min · cs.LG, stat.ME
Stochastic Gradient Optimization with Model-Assisted Sampling
Jonne Pohjankukka, Jukka Heikkonen
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: Instead of randomly picking which data points to learn from, this method uses a helper model to predict which ones would be most useful, like a smart teacher choosing the best examples to show a student. It reduces noise in gradient estimates and works as a plug-in for any optimizer.
Problem solved: Training deep networks with mini-batches introduces noise that slows convergence and hurts generalization. Existing fixes add computational overhead. This method reduces gradient variance efficiently by intelligently selecting which samples to use, requiring minimal changes to existing training code.
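One standard way to let a helper model steer sampling without biasing the gradient is importance sampling. You draw example i with probability p_i proportional to a score, then weight its gradient by 1/(N p_i). The pure-Python sketch below assumes the helper model's output is just a list of positive scores, and the helper function names are hypothetical. The paper's actual sampling and weighting scheme may differ.

```python
import random
from typing import List, Sequence, Tuple


def importance_probs(scores: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Turn positive helper-model scores into sampling probabilities and unbiasing weights."""
    total = sum(scores)
    n = len(scores)
    probs = [s / total for s in scores]
    weights = [1.0 / (n * p) for p in probs]  # p_i * w_i = 1/n keeps the estimate unbiased
    return probs, weights


def sample_batch(scores: Sequence[float], batch_size: int, rng: random.Random):
    """Sample example indices (with replacement) by score; return them with their weights."""
    probs, weights = importance_probs(scores)
    idx = rng.choices(range(len(scores)), weights=probs, k=batch_size)
    return idx, [weights[i] for i in idx]


# Per-example gradients of a 1-D toy problem, and made-up helper-model scores.
grads = [0.5, -2.0, 3.0, 1.0]
scores = [abs(g) + 0.1 for g in grads]

probs, weights = importance_probs(scores)
expected = sum(p * w * g for p, w, g in zip(probs, weights, grads))
print(expected, sum(grads) / len(grads))  # both equal the plain mean gradient

idx, w = sample_batch(scores, batch_size=2, rng=random.Random(0))
batch_grad = sum(wi * grads[i] for i, wi in zip(idx, w)) / len(idx)
```

The reweighting is what makes this a plug-in. The optimizer still receives an unbiased gradient estimate, but examples the helper considers informative are drawn more often, which can reduce variance when the scores track gradient magnitude.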
💤 Quiet · 2606.27153 · Jun 25, 2026 · ~8 min · cs.DC, cs.LG
DMuon: Efficient Distributed Muon Training with Near-Adam Overhead
Vincent Chen, Starrick Liu, Regis Cheng, Dance Yang, +7
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: Muon is an optimizer that treats weight updates as whole matrices rather than as independent numbers, but it is slow on distributed systems. This paper makes Muon run 1.5–3x faster by redesigning how it communicates and computes across multiple GPUs, bringing it close to the speed of standard optimizers.
Problem solved: Muon optimizers can train more efficiently in principle but are expensive to run on multiple GPUs, costing 2x more than simpler optimizers. Teams training large models can't afford the slowdown, so they stick with simpler, faster methods. This work removes that tradeoff.
💤 Quiet · 2606.27112 · Jun 25, 2026 · ~6 min · cs.LG, cs.AI
Heavy-Ball Q-Learning with Residual Weighting Correction
Donghwan Lee
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: This paper speeds up Q-learning (a common reinforcement learning algorithm) by adding momentum, like a heavy ball rolling downhill that keeps its speed even on flat ground. The researchers prove when this actually works and extend it to settings where the agent uses function approximation.
Problem solved: Standard Q-learning converges slowly, which wastes computation in real RL applications. This work provides a theoretically grounded way to accelerate convergence and tells you exactly when the speedup kicks in.
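A toy sketch of heavy-ball momentum added to Q-learning follows. It assumes a tiny deterministic MDP and a synchronous expected update, in which every state-action pair is updated each iteration, instead of sampled transitions. It does not include the paper's residual weighting correction, and the function name is hypothetical.

```python
def heavy_ball_q_iteration(rewards, next_state, gamma, alpha, beta, iters):
    """Synchronous Q-learning with a heavy-ball momentum term.

    rewards[s][a]    : reward for taking action a in state s
    next_state[s][a] : deterministic successor state
    Update: Q_new = Q + alpha * (TD target - Q) + beta * (Q - Q_prev)
    """
    n_states, n_actions = len(rewards), len(rewards[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    q_prev = [row[:] for row in q]
    for _ in range(iters):
        q_new = [[0.0] * n_actions for _ in range(n_states)]
        for s in range(n_states):
            for a in range(n_actions):
                target = rewards[s][a] + gamma * max(q[next_state[s][a]])
                q_new[s][a] = (q[s][a]
                               + alpha * (target - q[s][a])      # usual Q-learning step
                               + beta * (q[s][a] - q_prev[s][a]))  # heavy-ball momentum
        q_prev, q = q, q_new
    return q


# One state, two actions that both loop back to it: action 0 pays 1, action 1 pays 0.
rewards = [[1.0, 0.0]]
next_state = [[0, 0]]
q = heavy_ball_q_iteration(rewards, next_state, gamma=0.9, alpha=0.5, beta=0.3, iters=500)
# Optimal values: Q*(0) = 1 / (1 - 0.9) = 10 and Q*(1) = 0.9 * 10 = 9.
print(q)
```

In this toy, the error in Q(0) follows a linear recursion. With beta=0.3 its dominant root is about 0.93, versus 0.95 with beta=0, so momentum speeds convergence here. With other step sizes, too much momentum can instead cause oscillation, which is why conditions on when the speedup holds matter.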
💤 Quiet · 2606.27082 · Jun 25, 2026 · ~6 min · cs.LG, cs.DS, math.OC
Finding Stationary Points by Comparisons
Helin Wang, Chenyi Zhang, Xiwen Tao, Yexin Zhang, +1
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: This paper shows how to find points where a function stops improving (stationary points) when you can only compare two function values at a time, without seeing actual numbers. It's like finding a hilltop by only asking "is this spot higher or lower than that one?"
Problem solved: Many real-world scenarios restrict access to objective values (e.g., expensive simulations, proprietary black boxes). This work enables optimization with only comparison queries, which are sometimes cheaper or more privacy-preserving than revealing actual function values.
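The pure-Python sketch below optimizes using only a comparison oracle, through a simple coordinate pattern search. This is an assumed stand-in, not the paper's algorithm. The optimizer probes ±h along each coordinate and keeps a probe only if the oracle says it is better. When no probe helps, it halves h. `compare` and `comparison_descent` are hypothetical names.

```python
from typing import Callable, List, Sequence


def compare(f: Callable[[Sequence[float]], float], a: Sequence[float], b: Sequence[float]) -> bool:
    """Comparison oracle: reports only whether f(a) < f(b), never the values themselves."""
    return f(a) < f(b)


def comparison_descent(f, x0: Sequence[float], step: float = 1.0,
                       tol: float = 1e-8, max_iter: int = 10_000) -> List[float]:
    """Coordinate pattern search driven purely by comparison queries."""
    x = list(x0)
    h = step
    for _ in range(max_iter):
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                y = x.copy()
                y[i] += sign * h
                if compare(f, y, x):  # keep the probe only if the oracle prefers it
                    x = y
                    improved = True
                    break
        if not improved:
            h *= 0.5  # no coordinate move helps at this scale: refine the step
            if h < tol:
                break
    return x


def f(v):
    """Toy objective with its minimum at (1, -3); the optimizer never sees its values."""
    return (v[0] - 1.0) ** 2 + 2.0 * (v[1] + 3.0) ** 2


print(comparison_descent(f, [0.0, 0.0]))
```

The optimizer only ever learns which of two points is better. This is the information model the entry describes, though the paper's method and guarantees for general nonconvex functions are more involved.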
💤 Quiet · 2606.24879 · Jun 23, 2026 · ~6 min · math.OC, cs.LG
New Bounds for the Last Iterate of the Stochastic subGradient Method
Guglielmo Beretta, Tommaso Cesari, Roberto Colomboni, Andrea Paudice
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: A new analysis of stochastic subgradient descent shows that when the noise is truly random and independent, the last iterate converges faster than previously proven, removing an extra logarithmic factor. The last iterate is the final point the method returns, rather than an average of iterates. If the noise isn't perfectly independent, that factor creeps back in.
Problem solved: Practitioners using stochastic subgradient methods weren't sure how good their final solution would be. Tighter bounds predict convergence speed more accurately and reveal when the algorithm's assumptions break down.
💤 Quiet · 2606.24851 · Jun 23, 2026 · ~14 min · cs.LG
Real vs. Complex Spectral Bases for Neural Operators: The Role of Green's Function Alignment
Jason Sulskis, Sathya Ravi
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: Neural networks that solve physics equations using Fourier transforms can be made more efficient by switching to a real-valued alternative (the Hartley transform) when the underlying physics is symmetric, as with gravity or bending. For direction-dependent phenomena like waves and fluid flow, stick with the complex Fourier basis. Pick the math that matches your physics.
Problem solved: Neural operator models train slowly and use memory inefficiently because they don't adapt to the type of physics they're solving. This work shows which mathematical foundation (real vs. complex) works best for different equations, letting you choose the right tool upfront and train faster.
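The discrete Hartley transform can be computed from an FFT as H = Re(F) - Im(F), that is, H_k = Σ_n x_n [cos(2πkn/N) + sin(2πkn/N)]. A short NumPy sketch shows that it maps real inputs to real outputs and is its own inverse up to a factor of N. The helper name `dht` is illustrative, and the sketch does not show how the paper builds Hartley-based neural operator layers.

```python
import numpy as np


def dht(x):
    """Discrete Hartley transform of a real 1-D signal via the FFT: H = Re(F) - Im(F)."""
    F = np.fft.fft(x)
    return F.real - F.imag


x = np.array([1.0, 2.0, 0.5, -1.0, 3.0])
H = dht(x)
print(H.dtype)                          # float64: no complex numbers to store
print(np.allclose(dht(H) / len(x), x))  # True: the DHT is its own inverse up to 1/N
```

Because the output stays real, a layer built on this transform can store real spectral weights. Whether that basis suits a given PDE is the alignment question the entry describes.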
💤 Quiet · 2606.23676 · Jun 22, 2026 · ~7 min · cs.LG, cs.AI, math.OC
Open Problem: Is AdamW Effective Under Heavy-Tailed Noise?
Dingzhi Yu, Hongyi Tao, Yuanyu Wan, Luo Luo, +1
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: AdamW is the standard optimizer for training large language models, but nobody has proven it actually works when gradient noise has extreme outliers, something common in real LLM training. This paper asks whether AdamW can handle that mathematically, or whether its design actually breaks under heavy-tailed noise.
Problem solved: LLM teams use AdamW without theoretical guarantees that it works with the noisy, extreme-valued gradients they actually encounter. Understanding whether AdamW is theoretically sound under realistic conditions matters for trusting and improving training stability.
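For reference, here is a minimal NumPy sketch of one AdamW step with decoupled weight decay, plus a toy run on heavy-tailed gradient noise drawn from a Student-t distribution. The quadratic objective, noise model, and hyperparameters are illustrative assumptions, and `adamw_step` is a hypothetical helper. Running it does not settle the paper's open question.

```python
import numpy as np


def adamw_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, weight_decay=1e-2):
    """One AdamW update (t starts at 1). Weight decay acts on theta directly, not through the gradient."""
    m = beta1 * m + (1 - beta1) * grad       # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad**2    # second-moment estimate
    m_hat = m / (1 - beta1**t)               # bias corrections for zero initialisation
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * theta)
    return theta, m, v


# Toy run: minimise ||theta||^2 / 2 with Student-t noise (1.5 degrees of freedom,
# so the noise has a finite mean but infinite variance).
rng = np.random.default_rng(0)
theta, m, v = np.ones(3), np.zeros(3), np.zeros(3)
for t in range(1, 1001):
    noisy_grad = theta + rng.standard_t(df=1.5, size=3)
    theta, m, v = adamw_step(theta, noisy_grad, m, v, t, lr=1e-2)
print(theta)
```

The normalization by sqrt(v_hat) bounds each coordinate's step, but a single huge outlier also inflates v for many later steps. Interactions like this make the heavy-tailed analysis nontrivial.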
💤 Quiet · 2606.23637 · Jun 22, 2026 · ~9 min · cs.LG, math.OC
Muown Implicitly Performs Angular Step-size Decay
Florian Hübler, Kai Lion, Antonio Orvieto, Niao He
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: This paper reveals that a popular transformer optimizer (Muown) is secretly doing something mathematically cleaner than it appears: adjusting step sizes based on geometry rather than raw numbers. The authors extract this insight to create a better version that explicitly controls this geometric adjustment.
Problem solved: Modern optimizers for training large models are often fragile or need careful tuning. Understanding why Muown works well reveals a hidden stability mechanism, letting researchers build faster and more reliable optimizers without mysterious hyperparameter choices.
💤 Quiet · 2606.23631 · Jun 22, 2026 · ~13 min · cs.AI
AI-driven Optimisation of Quality of Recovery (QoR) in Remote Patient Monitoring
Yansong Liu, Li-Hsi, Lin, Pramit Khetrapal, +3
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: Researchers used AI to pick the 5 questions from a 15-question hospital recovery survey that best predict patient outcomes. The shorter version works almost as well as the full survey, which matters because patients actually fill it out more often when it's quick.
Problem solved: Remote monitoring of surgical patients relies on daily surveys, but the standard 15-question recovery assessment has poor completion rates. Doctors need a shorter version that patients will actually complete daily while still accurately predicting who might have complications.
💤 Quiet · 2606.20485 · Jun 18, 2026 · ~10 min · q-fin.RM, cs.AI, nlin.AO
Optimal Order of Multi-Agent and General Many-Body Systems
Jake J. Xia
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: This paper figures out the sweet spot for how synchronized a group of independent agents should be. Too much coordination makes them productive but fragile; too little keeps them flexible but chaotic. It's like finding the right balance between a military unit's discipline and a jazz band's improvisation.
Problem solved: Teams and systems often struggle between being highly organized (efficient but brittle) and loosely coordinated (flexible but messy). This framework lets you measure and predict the optimal level of order for any multi-agent system based on what you actually care about: growth, stability, or adaptability.
💤 Quiet · 2606.20469 · Jun 18, 2026 · ~14 min · cs.LG, cs.CG
Fisher-Geometric Sharpness and the Implicit Bias of SGD toward Flat Minima
Md Sakir Ahmed, Kumaresh Sarmah, Hemen Dutta
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: This paper tackles a long-standing puzzle in deep learning: why does SGD find solutions that generalize well? Its answer is that flatness matters, but only when it is measured with the Fisher information geometry of the network rather than by just looking at the Hessian matrix.
Problem solved: It is widely believed that SGD finds flat minima that generalize better, but critics showed this breaks down when you reparametrize the network. This paper provides a mathematically rigorous, reparametrization-invariant definition of flatness that explains why SGD generalizes.
💤 Quiet · 2606.19199 · Jun 17, 2026 · ~12 min · cs.LG, cs.AI
Forecasting what Matters: Decision-Focused RL for Controlled EV Charging with Unknown Departure Times
Giuseppe Gabriele, Fabio Pavirani, Seyed Soroush Karimi Madahi, Chris Develder
⭐ 0 stars / 0 repos📚 0 cites
ELI5Instead of predicting when an EV will leave and then using that prediction separately, this method trains the prediction and charging control together so the prediction focuses only on what matters for making good charging decisions.
Problem solvedEV charging systems need to know when cars will leave, but that info isn't always available. Standard forecasting optimizes for prediction accuracy, but errors still break the charging control—this method cuts those errors' impact by 55% by training forecaster and controller as one unit.
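The toy below shows why a forecast trained for accuracy can be the wrong input to a controller. Everything in it is a hypothetical assumption:

- a fixed energy need;
- a cost linear in charging rate;
- a per-kWh penalty for energy still missing when the car leaves;
- a grid search standing in for the paper's jointly trained RL forecaster and controller.

```python
# Hypothetical toy: numbers and cost model are illustrative, not the paper's setup.

def charging_cost(t_pred, t_actual, energy=30.0, price=1.0, penalty=5.0):
    """Cost of charging `energy` kWh spread evenly over a predicted stay of t_pred hours."""
    rate = energy / t_pred                        # kW; a longer predicted stay means a gentler rate
    unmet = max(0.0, energy - rate * t_actual)    # kWh still missing if the car leaves early
    return price * rate + penalty * unmet         # peak-power charge plus unmet-energy penalty

def expected_cost(t_pred, history):
    """Average decision cost of a forecast over past departure times."""
    return sum(charging_cost(t_pred, t) for t in history) / len(history)

history = [2.0, 4.0, 6.0, 8.0, 10.0]              # observed departure times (hours after arrival)
candidates = [0.5 * k for k in range(2, 25)]      # candidate forecasts from 1.0 to 12.0 hours

mse_forecast = sum(history) / len(history)        # the mean minimizes squared prediction error
dfl_forecast = min(candidates, key=lambda t: expected_cost(t, history))  # minimizes decision cost

print(f"accuracy-optimal forecast {mse_forecast} h -> cost {expected_cost(mse_forecast, history):.1f}")
print(f"decision-focused forecast {dfl_forecast} h -> cost {expected_cost(dfl_forecast, history):.1f}")
```

With these numbers the accuracy-optimal forecast is the 6 h mean, with an expected cost of 35. The decision-focused choice is 2 h, with an expected cost of 15. The lopsided penalty for early departures determines which forecast is useful, not the prediction error.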
💤 Quiet · 2606.19179 · Jun 17, 2026 · ~12 min · cs.LG, cs.AI, math.OC
Compute Efficiency and Serial Runtime Tradeoffs for Stochastic Momentum Methods
Depen Morwani, Alexandru Meterez, Pranav Nair, Sham Kakade
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: When training AI models with momentum methods, bigger batches can make training faster but sometimes waste compute. This paper works out when bigger batches help and when they hurt, showing that momentum methods behave very differently depending on the shape of your data.
Problem solved: Engineers tuning model training need to know whether bigger batches will speed things up or just waste GPU/TPU resources. This paper gives concrete guidance on batch-size tradeoffs for the momentum optimizers used everywhere in practice.
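A noisy quadratic model is one way to see the batch-size tradeoff. It is an assumption made here for illustration, not the paper's analysis. Heavy-ball momentum runs on a quadratic whose curvatures (the hypothetical `spectrum`) stand in for "the shape of the data", and minibatch gradient noise has variance proportional to 1/B. Because the dynamics are linear, the expected loss can be tracked exactly through the second moments of (w_t, w_{t-1}), so no random sampling is needed.

```python
# Toy noisy-quadratic model of heavy-ball SGD (an illustrative assumption, not the paper's setup).

def steps_to_target(spectrum, batch_size, lr=0.1, beta=0.9, grad_noise=0.01,
                    w0=1.0, target=0.01, max_steps=10_000):
    """Steps until the expected loss 0.5 * sum(lam * E[w^2]) falls below target."""
    q = lr * lr * grad_noise / batch_size  # variance injected per step; shrinks as 1/B
    # per curvature direction: [E w_t^2, E w_t w_{t-1}, E w_{t-1}^2], starting at rest at w0
    moments = [[w0 * w0, w0 * w0, w0 * w0] for _ in spectrum]
    for step in range(1, max_steps + 1):
        loss = 0.0
        for i, lam in enumerate(spectrum):
            # heavy ball: w_{t+1} = (1 + beta - lr*lam) w_t - beta w_{t-1} - lr * noise
            a = 1.0 + beta - lr * lam
            p, c, r = moments[i]
            p_next = a * a * p - 2.0 * a * beta * c + beta * beta * r + q
            c_next = a * p - beta * c
            moments[i] = [p_next, c_next, p]
            loss += 0.5 * lam * p_next
        if loss < target:
            return step
    return max_steps  # target not reached within the budget

spectrum = [1.0, 0.1]  # hypothetical curvatures standing in for "the shape of the data"
steps = {b: steps_to_target(spectrum, b) for b in (1, 8, 64)}
for b, s in steps.items():
    print(f"batch {b:>2}: {s:>5} steps, {b * s:>6} samples")
```

In this toy a larger batch never needs more steps, because it only shrinks the injected noise. Once noise stops being the bottleneck, the step count levels off while the sample count B × steps keeps growing. That is the compute-versus-serial-runtime tension the summary describes.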
💤 Quiet · 2606.18175 · Jun 16, 2026 · ~14 min · math.NA, cs.LG, physics.comp-ph
A Convex Quasilinearization Method for Solving Nonlinear PDEs with Physics-Informed Neural Networks
Gbenga T. Awojinrin, Abdul-Akeem Olawoyin, Rami M. Younis
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: Instead of training neural networks with gradient descent to solve complex equations, this method turns the hard nonlinear problem into a sequence of easy linear problems that standard linear algebra solves directly. It's like switching from guess-and-check to plain algebra.
Problem solved: Physics-informed neural networks (PINNs) can be slow and unreliable on nonlinear problems because gradient-descent training gets stuck and needs careful tuning. This method replaces that training with guaranteed convex solves that finish in seconds, which is especially useful for fluid dynamics and other coupled systems.
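Below is a minimal sketch of quasilinearization on a 1-D model problem. It assumes a finite-difference grid in place of the paper's neural-network representation. The nonlinear term u³ is linearized around the current iterate, so each pass solves a linear tridiagonal system directly (Thomas algorithm) instead of running gradient descent. The equation -u'' + u³ = f and its manufactured solution sin(πx) are illustrative choices, not taken from the paper.

```python
import math

# Finite-difference grid stands in for the paper's PINN; the model problem is illustrative.

def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def quasilinearization(n=99, iterations=8):
    """Solve -u'' + u^3 = f on (0, 1) with u(0) = u(1) = 0 by repeated linearization."""
    h = 1.0 / (n + 1)
    xs = [(i + 1) * h for i in range(n)]
    # manufactured right-hand side so that u(x) = sin(pi x) is the exact solution
    f = [math.pi**2 * math.sin(math.pi * x) + math.sin(math.pi * x) ** 3 for x in xs]
    u = [0.0] * n
    for _ in range(iterations):
        # u^3 ~ u_k^3 + 3 u_k^2 (u - u_k)  =>  -u'' + 3 u_k^2 u = f + 2 u_k^3, linear in u
        diag = [2.0 / h**2 + 3.0 * uk**2 for uk in u]
        off = [-1.0 / h**2] * n
        rhs = [fi + 2.0 * uk**3 for fi, uk in zip(f, u)]
        u = solve_tridiagonal(off, diag, off, rhs)
    return xs, u

xs, u = quasilinearization()
max_error = max(abs(ui - math.sin(math.pi * x)) for x, ui in zip(xs, u))
print(f"max error vs exact solution: {max_error:.2e}")
```

From a zero initial guess the iterates converge in a handful of passes. The remaining error is the O(h²) discretization error of the grid, not an artifact of the linearization.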
💤 Quiet · 2606.17013 · Jun 15, 2026 · ~3 min · math.OC, cs.LG
Exploding and vanishing gradients in deep neural networks: the effect of residual connections
Vivek S Borkar
⭐ 16 stars / 5 repos · 📚 0 cites
ELI5: Deep neural networks struggle to learn because gradients shrink or explode as they flow backward through many layers. This paper uses math from chaos theory to explain why residual (skip) connections fix this by changing how gradients multiply together.
Problem solved: Training very deep networks fails when gradients vanish or explode during backpropagation, which makes it hard or impossible to learn good weights. Residual connections help, but a rigorous mathematical explanation of why has been missing; this paper provides one.
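The "gradients multiply together" point can be checked numerically in a linear toy model. This is an assumption made for illustration; the paper's chaos-theory analysis is not reproduced. Backpropagating through a plain linear layer multiplies the gradient by Wᵀ, while a residual layer y = x + Wx multiplies it by (I + W)ᵀ. The weight scales below (0.5 for the plain stack, 1/√depth for the residual branch) are arbitrary illustrative choices.

```python
import math
import random

# Linear toy model for illustration; weight scales are arbitrary, not from the paper.

def backprop_gradient_norm(depth, width, scale, residual, seed=0):
    """Norm of a unit gradient after backpropagating through `depth` random linear layers."""
    rng = random.Random(seed)
    g = [1.0 / math.sqrt(width)] * width  # unit-norm gradient arriving from the loss
    for _ in range(depth):
        # hypothetical layer weights with entries ~ N(0, scale^2 / width)
        W = [[rng.gauss(0.0, scale / math.sqrt(width)) for _ in range(width)]
             for _ in range(width)]
        # plain layer y = W x passes back W^T g; residual layer y = x + W x passes back g + W^T g
        back = [sum(W[i][j] * g[i] for i in range(width)) for j in range(width)]
        g = [b + gj for b, gj in zip(back, g)] if residual else back
    return math.sqrt(sum(v * v for v in g))

depth, width = 50, 8
plain = backprop_gradient_norm(depth, width, scale=0.5, residual=False)
resnet = backprop_gradient_norm(depth, width, scale=1.0 / math.sqrt(depth), residual=True)
print(f"plain stack:    {plain:.3e}")
print(f"residual stack: {resnet:.3e}")
```

With 50 layers the plain stack shrinks the gradient by many orders of magnitude. The residual stack keeps it within roughly an order of magnitude of its starting size, because each factor I + W stays close to the identity.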
💤 Quiet · 2606.17000 · Jun 15, 2026 · ~3 min · cs.CC, cs.GT, cs.LG
The Complexity of Min-Max Optimization for Quadratic Polynomials
Martino Bernasconi, Matteo Castiglioni, Andrea Celli, Alexandros Hollender
⭐ 17 stars / 8 repos · 📚 0 cites
ELI5: Finding a balanced solution in a competitive game, where one player minimizes an objective and the other maximizes the same objective, is computationally hard. Even for simple quadratic polynomials, there's no known fast way to solve it.
Problem solved: Game theory and adversarial ML need to find stable equilibrium points. Proving that these problems are fundamentally hard helps explain why algorithms struggle and marks the limits of what is computationally tractable.
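Even before hardness enters the picture, naive first-order methods can misbehave on quadratic min-max problems. This sketch runs simultaneous gradient descent-ascent on f(x, y) = x·y, a convex-concave quadratic chosen for illustration. It is not the nonconvex-nonconcave setting whose hardness the paper studies.

```python
import math

# Illustrative convex-concave toy, not the hard instances studied in the paper.

def simultaneous_gda(x, y, lr=0.1, steps=100):
    """Gradient descent on x and ascent on y, updated simultaneously, for f(x, y) = x * y."""
    for _ in range(steps):
        grad_x, grad_y = y, x                    # df/dx = y, df/dy = x
        x, y = x - lr * grad_x, y + lr * grad_y  # x minimizes, y maximizes
    return x, y

x, y = simultaneous_gda(1.0, 0.0)
distance = math.hypot(x, y)  # distance from the unique equilibrium (0, 0)
print(f"distance after 100 steps: {distance:.4f}")
```

Each step multiplies the distance to (0, 0) by exactly sqrt(1 + lr²). After 100 steps the iterate has therefore spiraled out to about 1.64 instead of converging.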
🚀 Shipping · 2606.13657 · Jun 11, 2026 · ~9 min · cs.LG
Dense Supervision, Sparse Updates: On the Sparsity and Geometry of On-Policy Distillation
Guo Yu, Wenlin Liu, Yulan Hu, Hao-Xuan Ma, +2
⭐ 545 stars / 38 repos · 📚 0 cites
ELI5: When you train a language model by having it generate its own examples and learn from a teacher model's feedback, the weight updates turn out to be small and scattered, concentrated in certain parts of the network, rather than dense rewrites. You can even skip 90% of updates and keep performance.
Problem solved: Post-training recipes blend on-policy learning with teacher guidance, but it was unclear what actually changes in the model. This work reveals the sparse structure of those updates. It helps practitioners see which parameters to train and why the choice of dense optimizer matters more than sparsity tricks here.
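A simple way to check the "small and scattered" claim on your own checkpoints is to count, per layer, how many weights moved by more than a tolerance. The helper below is hypothetical, not the paper's analysis code. Checkpoints are assumed to be dicts of flat weight lists, and the layer names in the example are made up. The tolerance matters in practice, since low-precision weights can register tiny spurious changes.

```python
# Hypothetical helper for measuring update sparsity; not the paper's analysis code.

def update_sparsity(before, after, tol=1e-6):
    """Fraction of weights that moved by at most `tol` (1.0 means the layer did not change)."""
    unchanged = sum(1 for b, a in zip(before, after) if abs(a - b) <= tol)
    return unchanged / len(before)

def sparsity_report(ckpt_before, ckpt_after, tol=1e-6):
    """Per-layer update sparsity for checkpoints stored as {layer_name: flat list of weights}."""
    return {name: update_sparsity(ckpt_before[name], ckpt_after[name], tol)
            for name in ckpt_before}

# made-up layer names and values, for illustration only
before = {"attn.q_proj": [0.1, 0.2, 0.3, 0.4], "mlp.up_proj": [1.0, -1.0, 0.5, 0.0]}
after = {"attn.q_proj": [0.1, 0.2, 0.3, 0.4], "mlp.up_proj": [1.0, -0.9, 0.5, 1e-9]}
report = sparsity_report(before, after)
print(report)
```

Here the attention layer is untouched and three of four MLP weights count as unchanged, since the 1e-9 shift falls under the tolerance.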
🚀 Shipping · 2606.13633 · Jun 11, 2026 · ~9 min · eess.SY, cs.LG
Aerial Wildfire Suppression Planning with a Hybrid CNN-Cellular Automata Fire Model
Ion Matei, Maksym Zhenirovskyy, Takuya Kurihana, Rohit Vupala, +1
⭐ 166 stars / 26 repos · 📚 0 cites
ELI5: A system that predicts how wildfires will spread across terrain, then automatically plans where and when aerial water or retardant drops should happen to minimize damage. It combines a neural network with a fire simulation to test strategies against different weather and uncertainty scenarios.
Problem solved: Wildfire suppression crews must decide in real time where to drop water or retardant while fire behavior is uncertain. This tool automates strategy design by modeling fire spread and optimizing drop locations, helping operators make faster, data-driven decisions instead of relying on intuition.
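Below is a toy cellular-automaton version of the spread-then-plan loop. It assumes deterministic 4-neighbour spread with no wind, terrain, or CNN component. A brute-force comparison of a few hand-written drop plans stands in for the paper's optimizer. Cells marked `#` hold retardant and cannot burn.

```python
# Toy cellular automaton: no wind, terrain, or CNN surrogate; plans are hand-written examples.
FUEL, RETARDANT = ".", "#"
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def simulate_fire(grid, ignition, steps):
    """Deterministic 4-neighbour spread; returns the set of cells that ever burned."""
    rows, cols = len(grid), len(grid[0])
    burning, burned = {ignition}, set()
    for _ in range(steps):
        burned |= burning
        spread = set()
        for r, c in burning:
            for dr, dc in NEIGHBORS:
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and grid[nr][nc] == FUEL and (nr, nc) not in burned):
                    spread.add((nr, nc))
        burning = spread
    return burned | burning

def apply_drop(grid, cells):
    """Return a copy of the grid with retardant dropped on the given cells."""
    rows = [list(row) for row in grid]
    for r, c in cells:
        rows[r][c] = RETARDANT
    return ["".join(row) for row in rows]

def best_plan(grid, ignition, plans, steps=20):
    """Brute-force stand-in for optimisation: pick the plan with the smallest burned area."""
    return min(plans, key=lambda name: len(
        simulate_fire(apply_drop(grid, plans[name]), ignition, steps)))

terrain = ["....."] * 5
ignition = (2, 0)
plans = {
    "line at column 1": [(r, 1) for r in range(5)],
    "line at column 3": [(r, 3) for r in range(5)],
}
for name, cells in plans.items():
    print(name, len(simulate_fire(apply_drop(terrain, cells), ignition, steps=20)))
print("chosen:", best_plan(terrain, ignition, plans))
```

Here the line at column 1 confines the fire to 5 cells versus 15 for the line at column 3, so it is chosen. The real system evaluates plans against weather and fire-behavior uncertainty rather than one fixed scenario.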
💤 Quiet · 2606.11171 · Jun 9, 2026 · ~8 min · cs.LG, cond-mat.stat-mech, cs.IT
Algorithmic and Minimax Complexities in Kernel Bandits
Yunbei Xu
⭐ 19 stars / 11 repos · 📚 0 cites
ELI5: This paper shows that two popular strategies for deciding under uncertainty in bandit problems (GP-UCB and DEC methods) are solving related but different optimization puzzles. One focuses on what works for a specific problem instance, the other on worst-case guarantees across all possible problems.
Problem solved: When designing algorithms for exploration-exploitation tradeoffs, it's unclear whether to optimize for the actual problem you face or hedge against worst-case scenarios. This work clarifies that the two approaches measure different things and can lead to very different performance guarantees.
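For reference, here is a bare-bones GP-UCB loop, the first of the two families named above; the DEC side is not shown. The following are illustrative assumptions:

- the RBF kernel and its lengthscale;
- β = 2;
- the noise level;
- the objective and the grid.

Standard GP-UCB theory lets β grow over rounds, whereas this sketch keeps it fixed for simplicity.

```python
import math

# Illustrative GP-UCB sketch; kernel, beta, noise, objective, and grid are assumptions.

def rbf(a, b, lengthscale=0.1):
    """Squared-exponential kernel on scalars (prior variance 1)."""
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def cholesky(K):
    """Lower-triangular L with L L^T = K, for a positive-definite K."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i, row in enumerate(L):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x

def gp_ucb(f, grid, rounds=25, beta=2.0, noise=1e-4):
    """Each round, query the argmax of posterior mean + beta * posterior std; return best query."""
    xs, ys = [], []
    for _ in range(rounds):
        if not xs:
            x_next = grid[0]  # flat prior: every point has the same score
        else:
            K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
                 for i, a in enumerate(xs)]
            L = cholesky(K)
            w = forward_solve(L, ys)  # L^{-1} y

            def ucb(x):
                v = forward_solve(L, [rbf(a, x) for a in xs])   # L^{-1} k(x)
                mean = sum(vi * wi for vi, wi in zip(v, w))     # k^T K^{-1} y
                var = max(1.0 - sum(vi * vi for vi in v), 0.0)  # k(x,x) - k^T K^{-1} k
                return mean + beta * math.sqrt(var)

            x_next = max(grid, key=ucb)
        xs.append(x_next)
        ys.append(f(x_next))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]

def objective(x):
    return -(x - 0.3) ** 2  # hypothetical reward, maximized at x = 0.3

grid = [i / 50 for i in range(51)]
best_x, best_y = gp_ucb(objective, grid)
print(f"best query x = {best_x:.2f}, reward = {best_y:.4f}")
```

Unexplored regions keep a posterior standard deviation near 1, so the β·σ bonus first spreads queries across the grid. Once every point has a nearby observation, the mean term takes over and queries concentrate near the maximizer at 0.3.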
🚀 Shipping · 2606.11123 · Jun 9, 2026 · ~12 min · cs.LG
Overcoming Rank Collapse in Feedback Alignment
Gauthier Boeshertz, Razvan Pascanu, Claudia Clopath
⭐ 122 stars / 8 repos · 📚 0 cites
ELI5: Brain-inspired learning with random feedback weights doesn't work well in deep networks because error signals get squashed into low-dimensional spaces. Adding techniques that spread updates across more dimensions fixes this and makes the learning work much better.
Problem solved: Feedback alignment is a biologically plausible alternative to backpropagation, but it fails in deep networks. This work identifies the cause (rank collapse) and shows how to fix it, making brain-inspired learning practical for real architectures.
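The basic feedback alignment rule fits in a few lines: the backward pass sends the output error through a fixed random vector B instead of the transposed forward weights. This sketch assumes a two-layer linear network with one output, trained full-batch on a random linear teacher, purely for illustration. It shows plain feedback alignment, not the paper's fix for rank collapse. Zero initialization keeps the run deterministic.

```python
import math
import random

# Plain feedback alignment on a toy linear problem; not the paper's rank-collapse fix.

def train_feedback_alignment(in_dim=4, hidden=8, steps=500, lr=0.1, seed=0):
    """Train y = W2 @ W1 @ x with feedback alignment; returns the loss at every step."""
    rng = random.Random(seed)

    def unit_vector(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    teacher = unit_vector(in_dim)  # hypothetical linear target: y = teacher . x
    B = unit_vector(hidden)        # fixed random feedback weights, used instead of W2^T
    W1 = [[0.0] * in_dim for _ in range(hidden)]
    W2 = [0.0] * hidden
    losses = []
    for _ in range(steps):
        # full batch of basis inputs e_1..e_n, so the end-to-end map is simply W2 @ W1
        net = [sum(W2[h] * W1[h][i] for h in range(hidden)) for i in range(in_dim)]
        err = [net[i] - teacher[i] for i in range(in_dim)]
        losses.append(0.5 * sum(e * e for e in err))
        # output layer: ordinary gradient dL/dW2 = err @ W1^T (computed before W1 changes)
        grad_W2 = [sum(err[i] * W1[h][i] for i in range(in_dim)) for h in range(hidden)]
        # hidden layer: backprop would use W2[h] * err[i]; feedback alignment uses B[h] * err[i]
        for h in range(hidden):
            for i in range(in_dim):
                W1[h][i] -= lr * B[h] * err[i]
            W2[h] -= lr * grad_W2[h]
    return losses

losses = train_feedback_alignment()
print(f"loss: start {losses[0]:.3f}, end {losses[-1]:.2e}")
```

In this small case the output weights come to align with B and the loss falls from 0.5 to essentially zero. The difficulties the paper addresses appear in deeper networks, which this toy does not model.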
💤 Quiet · 2606.09762 · Jun 8, 2026 · ~9 min · cs.LG, cs.AI
Preserving Plasticity in Continual Learning via Dynamical Isometry
Andries Rosseau, Robert Müller, Ann Nowé
⭐ 9 stars / 9 repos · 📚 0 cites
ELI5: Neural networks gradually lose the ability to learn new things when trained on changing data. This paper shows that keeping the network's internal transformations at a consistent scale (dynamical isometry) preserves that ability, and it introduces an optimizer variant, AdamO, that enforces this automatically.
Problem solved: In continual learning, networks progressively lose plasticity, their ability to adapt to new tasks, because their internal layers become over-constrained. Performance then plateaus even when new data is available. AdamO addresses this by maintaining healthy learning dynamics throughout training.
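Dynamical isometry means the network's input-output Jacobian keeps all its singular values near 1. Signals in every direction then pass through without shrinking or blowing up. In a deep linear network the Jacobian is just the product of the layer matrices, which makes the idea easy to check: with orthogonal layers it holds exactly. The depth, width, and Gaussian comparison in this sketch are illustrative assumptions. It does not implement AdamO, whose update rule is not described here.

```python
import math
import random

# Illustration of dynamical isometry in a deep linear net; does not implement AdamO.

def random_orthogonal(d, rng):
    """Rows form an orthonormal basis (Gram-Schmidt, two passes for numerical safety)."""
    basis = []
    while len(basis) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(2):
            for q in basis:
                dot = sum(a * b for a, b in zip(v, q))
                v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:
            basis.append([a / norm for a in v])
    return basis

def random_gaussian(d, rng):
    """Square matrix with entries ~ N(0, 1/d), the usual variance-preserving scale."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)] for _ in range(d)]

def jacobian_vector_norm(layers, v):
    """||J v|| for a deep linear network, where J is the product of the layer matrices."""
    for W in layers:
        v = [sum(w * x for w, x in zip(row, v)) for row in W]
    return math.sqrt(sum(x * x for x in v))

rng = random.Random(0)
d, depth = 8, 30
orthogonal_net = [random_orthogonal(d, rng) for _ in range(depth)]
gaussian_net = [random_gaussian(d, rng) for _ in range(depth)]

probes = []  # random unit-norm input directions
for _ in range(5):
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    n = math.sqrt(sum(x * x for x in v))
    probes.append([x / n for x in v])

for name, net in [("orthogonal", orthogonal_net), ("gaussian", gaussian_net)]:
    print(name, [round(jacobian_vector_norm(net, p), 4) for p in probes])
```

The orthogonal stack returns unit-norm probes with norm 1 to machine precision. The Gaussian stack, with the usual 1/d per-entry variance, typically shrinks or stretches them noticeably.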
