What do these badges mean?

🚀 Shipping: Code exists. Multiple GitHub repos already reference this paper, so people are building on it.
📈 Climbing: Citation velocity is rising. Researchers are starting to pick it up.
💤 Quiet: Published but no notable signal yet. Most papers live here and could become anything later.
🎭 Hype: Heavy social buzz but no shipping signal. This is the counter-signal; it is deferred until Twitter/X data is wired up.

💤 Quiet · 2607.09649 · Jul 10, 2026 · ~9 min · cs.AI
ConceptSMILE: Auditing the Trustworthiness of Concept-Based Explainable AI
Mohadeseh Mollapour, Koorosh Aslansefat, Zeinab Dehghani, Bhupesh Kumar Mishra, +2
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: A tool that checks whether concept-based AI explanations (like 'this image shows vessel damage') are actually reliable, by testing how the model responds when you slightly change parts of the image and seeing if the explanation holds up.
Problem solved: Concept-based explanations seem intuitive to doctors and users, but there's no standard way to verify they're actually faithful to what the model is doing, so you could get misleading explanations that sound trustworthy. This framework audits whether those concepts are real.
💤 Quiet · 2607.09576 · Jul 10, 2026 · ~10 min · cs.CL · cs.AI · cs.ET
Conceptual Networks for Cross-Linguistic Idiomatic Expressions: A Feature-Based Graph Approach
Kiran Pala, Punam Silu, Lixun Yu
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: Instead of treating idioms as black-box text, this work maps them as a network of shared conceptual patterns. For example, 'spill the beans' and 'let the cat out of the bag' cluster together because they both involve revealing secrets. It works across 8 languages and outperforms standard embedding models.
Problem solved: Idioms are hard for AI to understand because their meaning doesn't come from word definitions, so models trained on raw text statistics miss the conceptual patterns humans use to grasp and translate them. This gives AI a structured, interpretable way to handle idioms across languages.
💤 Quiet · 2607.09544 · Jul 10, 2026 · ~10 min · cs.CV · cs.LG
The Count Is There, but Misaligned: Understanding and Correcting Counting Failures in VLMs
Ahmed Oumar El-Shangiti, Abzal Nurgazy, Hilal AlQuabeh, Nikolai Rozanov, +1
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: Vision-language models know how to count but give wrong answers anyway. Researchers found they can detect when the model will mess up by watching its internal brain activity, then ask it to try again, boosting accuracy by 15% without retraining.
Problem solved: VLMs fail at basic counting tasks despite having the ability internally. This breaks real applications like inventory management and visual inspection. Now you can catch these failures at inference time and fix them automatically.
💤 Quiet · 2607.08731 · Jul 9, 2026 · ~15 min · cs.CL · cs.AI · cs.CY
Validity of LLMs as data annotators: AMALIA on authority
Manuel Pita
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: Researchers test whether an AI model can reliably identify moral concepts in text the same way humans do, not just by matching answers. They found a Portuguese AI agrees with humans on the surface, but actually uses shortcuts rather than understanding the concept, like flagging angry language near authority figures instead of truly measuring moral reasoning.
Problem solved: When using LLMs to annotate data for training or research, agreement scores hide whether the model actually understands the concept or just exploits surface patterns. This matters because a model that fakes understanding won't generalize to new data, wasting resources and producing invalid datasets.
💤 Quiet · 2607.08641 · Jul 9, 2026 · ~10 min · cs.LG
Steering Neural Network Training through Interpretable Constraints Based on Partial Dependence
Yann Claes, Pierre Geurts, Vân Anh Huynh-Thu
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: A method that nudges neural networks during training to make their predictions match known rules about how inputs should affect outputs, making them both more accurate and easier to understand.
Problem solved: Neural networks often learn patterns that don't match domain expertise, and their explanations can be misleading. This lets you inject prior knowledge during training so models behave according to what you know should be true.
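To make the idea of a partial-dependence constraint concrete, here is a minimal sketch using the standard definition of partial dependence: force one feature to a value, average the model's predictions over the data, and repeat across a grid. The monotonicity penalty, the toy model, and the data are hypothetical illustrations and are not taken from the paper, whose actual constraint formulation is not described in this summary.

```python
def partial_dependence(model, rows, feature, grid):
    """Average prediction when `feature` is forced to each value in `grid`."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            probe = list(row)          # copy so the dataset is not modified
            probe[feature] = value     # overwrite only the feature of interest
            total += model(probe)
        curve.append(total / len(rows))
    return curve


def monotonicity_penalty(curve):
    """Sum of all decreases along the curve; zero if it never goes down."""
    return sum(max(0.0, a - b) for a, b in zip(curve, curve[1:]))


# Hypothetical toy model and data, for illustration only.
def toy_model(x):
    return 2.0 * x[0] - 0.5 * x[1]


rows = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
grid = [0.0, 1.0, 2.0]

pd_increasing = partial_dependence(toy_model, rows, 0, grid)
pd_decreasing = partial_dependence(toy_model, rows, 1, grid)

print(pd_increasing, monotonicity_penalty(pd_increasing))
print(pd_decreasing, monotonicity_penalty(pd_decreasing))
```

A penalty of this kind could in principle be added to a training loss when domain knowledge says a feature should only push predictions upward; that usage is an assumption here, not a description of the authors' method.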
💤 Quiet · 2607.08605 · Jul 9, 2026 · ~12 min · cs.CV · cs.AI · cs.LG
When Structured Sparse Autoencoders Learn Consistent Concepts Across Modalities
Weiduo Liao, Yunqiao Yang, Ying Wei
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: This paper makes sparse autoencoders (a tool for understanding what neurons in AI models learn) work better across images and text by grouping similar image patches together and training the autoencoder to respect those groups. The result: each learned neuron now represents one clear, consistent concept across both vision and language.
Problem solved: Sparse autoencoders can show what features large models learn, but when applied to vision-language models, they break concepts into scattered, disconnected pieces in images. This makes it hard to trust or interpret what the model actually understands about visual concepts.
💤 Quiet · 2607.08573 · Jul 9, 2026 · ~13 min · cs.AI
SHAP-Weighted Cross-Modal Expert Fusion for Emotion and Sentiment Recognition: Evidence and Limits
Adis Alihodzic, Selma Skopljakovic Hubljar
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: When combining audio, video, and text to understand emotions, this paper shows that using SHAP explanations to decide which expert to trust works best when you preserve the total importance score across high and low-dimensional modalities.
Problem solved: Multimodal emotion recognition struggles to balance modularity with cross-modal interaction: early fusion is accurate but rigid, while late fusion is flexible but loses connections between modalities. This work shows how to weight different fusion strategies transparently.
💤 Quiet · 2607.08561 · Jul 9, 2026 · ~7 min · cs.LG · q-bio.NC
Contravariance Theory: Strong Alignment for Minimal Solutions to Hard Tasks
Dan Yamins, Aran Nayebi
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: When two AI networks solve the same hard problem in minimal ways, their internal structures align automatically, much as different evolutionary paths lead to similar body designs. This alignment happens across layers and explains why different neural networks often develop the same kinds of solutions.
Problem solved: Neuroscientists and AI researchers couldn't reliably compare brain structures to neural networks because there was no principled reason to expect them to align. This shows that hard tasks force networks into similar solutions, making cross-species and cross-architecture comparisons meaningful and predictable.
💤 Quiet · 2607.08545 · Jul 9, 2026 · ~13 min · cs.SD · cs.LG
Structural Bottlenecks on Frequency Representation in End-to-End Audio Models
Nicole Cosme-Clifford
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: Audio AI models can produce good-sounding outputs without actually understanding the building blocks of sound (like pitch and timbre). We found that common audio encoders lose access to these blocks through two structural problems, and we created a lightweight fix that restores that access without retraining.
Problem solved: Audio models work well but are black boxes: you can't steer them or understand what they're really learning. When you try to control specific audio features (pitch, timbre), the model can't because its internal representation is jumbled. This fix makes models more interpretable and controllable without rebuilding them.
💤 Quiet · 2607.08499 · Jul 9, 2026 · ~7 min · cs.CL
Cross-seed explainability using Procrustes-conditioned Joint End-to-end Top-K Sparse Autoencoders
Bendegúz Váradi, Zoltán Kmetty
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: When you train the same AI model twice with different random starting points, it learns the same concepts but in scrambled internal representations. This paper aligns those scrambled spaces first, then extracts universal building-block features that work across both versions.
Problem solved: Understanding what features neural networks learn is hard because the same model trained twice ends up with completely different internal 'codebooks.' This prevents researchers from building reliable interpretability tools that work across models.
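The alignment step mentioned above can be illustrated with a small Procrustes sketch. This is a deliberately simplified, rotation-only 2D case with a closed-form solution; the general orthogonal Procrustes problem in higher dimensions is usually solved with a singular value decomposition. The point sets and function names are hypothetical, and nothing here reproduces the paper's actual pipeline.

```python
import math


def procrustes_rotation_2d(source, target):
    """Angle of the rotation that best maps source onto target (least squares)."""
    # Maximising sum(target . R(theta) source) gives theta = atan2(cross, dot).
    cross = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in zip(source, target))
    dot = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in zip(source, target))
    return math.atan2(cross, dot)


def rotate(points, angle):
    """Rotate each 2D point counter-clockwise by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


# Hypothetical feature directions from seed A, and the same directions
# as they might appear in seed B after an unknown rotation.
seed_a = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
seed_b = rotate(seed_a, 0.7)

angle = procrustes_rotation_2d(seed_a, seed_b)
aligned = rotate(seed_a, angle)

print(round(angle, 6))
```

Once the two spaces are aligned, features from both runs live in a shared coordinate system and can be compared directly, which is the property the summary attributes to the alignment step.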
💤 Quiet · 2607.07708 · Jul 8, 2026 · ~12 min · cs.CL · cs.AI · cs.CE
Accurate, Interdisciplinary and Transparent Structure-property Understanding with Deep Native Structural Reasoning
Chen Tang, Yizhou Wang, Jianyu Wu, Lintao Wang, +25
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: An AI model that reasons about proteins, molecules, and crystals by treating their 3D structures as discrete, inspectable building blocks, like assembling a transparent blueprint that shows *why* a prediction is correct, not just what the answer is.
Problem solved: Scientists struggle to trust AI predictions in biology and chemistry because models operate as black boxes. This system makes structure-based reasoning transparent and explainable, so researchers can verify logic against known scientific principles instead of blindly trusting a number.
💤 Quiet · 2607.07683 · Jul 8, 2026 · ~14 min · cs.LG
ECGLight: Compute-Light Framework For Paper ECG Digitization and Myocardial Infarction Screening
Shreyasvi Natraj, Cyrus Achtari, Felice Gragnano, Andrea Milzi, +2
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: A smartphone app that photographs paper ECGs, converts them into digital signals, and instantly detects heart attacks, all running offline on basic phones without needing cloud servers or fast internet.
Problem solved: Remote clinics with paper ECGs can't access AI diagnosis tools due to poor connectivity or weak devices, causing dangerous delays in detecting acute heart attacks. This system brings clinical-grade diagnosis capability to places where it was impossible before.
💤 Quiet · 2607.07670 · Jul 8, 2026 · ~14 min · cs.CL · cs.LG
Does Bielik Know What It Doesn't Know? Activation Dispersion Separates Entity Familiarity from Factual Reliability Across Model Scale
Grzegorz Brzezinka
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: Models can detect when they've never heard of something (a person, place, etc.) just by looking at their internal activity patterns, even before generating an answer, but this doesn't mean they'll actually give correct facts about things they do know. It's like knowing you're unfamiliar with a topic vs. actually knowing the facts about it.
Problem solved: LLMs hallucinate confidently about unfamiliar entities and rarely admit uncertainty. This work shows you can detect when a model hasn't encountered an entity by examining its internal state, potentially enabling early warning signals for unreliable outputs, though it reveals models still won't refuse to answer even when they recognize their own knowledge gaps.
💤 Quiet · 2607.07626 · Jul 8, 2026 · ~11 min · cs.CL · cs.AI
Future Confidence Distillation in Large Language Models
Sahil Kale
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: Train LLMs to predict how confident they should be in their answers *before* they finish writing, by learning from confidence signals that only appear *after* they're done. Like teaching someone to feel sure about an answer partway through explaining it, based on patterns from when they finished.
Problem solved: LLMs need to know when they're likely wrong so systems can route to humans or retrieve better info, but current confidence estimates are unreliable and only available after the model commits to an answer, wasting computation and time.
💤 Quiet · 2607.05355 · Jul 6, 2026 · ~10 min · cs.CL · cs.ET · cs.LG
Faithfulness to Refusal: A Causal Audit of Neuron Selectors
Ananth Eswar, Pratinav Seth, Utsav Avaiya, Vinay Kumar Sankarapu
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: Researchers tested whether attribution methods (techniques that identify important neurons in language models) actually point to neurons that matter, by directly zeroing them out and measuring the impact. They found that some popular attribution methods work better than others at finding truly important neurons, and that refusal behavior (rejecting harmful requests) can be installed through many different sets of neurons.
Problem solved: Teams use attribution methods to identify which neurons to prune, edit, or study for safety, but nobody was checking if these methods actually point to causally important neurons. This audit reveals that some widely-used selectors fail in ways that simpler rankings don't catch, helping practitioners choose the right tool for neuron identification.
💤 Quiet · 2607.05316 · Jul 6, 2026 · ~11 min · cs.CL · cs.LG
How Much is Left? LLMs Linearly Encode Their Remaining Output Length
Mohamed Amine Merzouk, Dmitri Carpov, Mirko Bronzi, Damiano Fornasiere, +1
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: LLMs seem to have an internal sense of how long their answer will be, which you can read out from their hidden states even before they start writing. Think of it like a writer mentally knowing their essay will be 5 pages before putting pen to paper.
Problem solved: Understanding what's happening inside LLMs is hard. This reveals that models track output length internally, which could help debug why they ramble, stop too early, or behave inconsistently, and suggests they're doing some form of planning.
💤 Quiet · 2607.02494 · Jul 2, 2026 · ~12 min · cs.CV · cs.CL
Towards Robustness against Typographic Attack with Training-free Concept Localization
Bohan Liu, Wenqian Ye, Guangzhi Xiong, Zhenghao He, +2
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: Text written on images tricks AI vision models into reading the words instead of understanding what they're actually looking at. This work finds which parts of the model cause this problem and fixes them by tweaking those specific components, with no retraining needed.
Problem solved: Vision-language models like CLIP fail when images contain text overlays (like a stop sign with graffiti), because they get distracted by the words and misidentify objects. This is dangerous for autonomous vehicles and other safety-critical systems that need to understand images correctly.
💤 Quiet · 2607.02459 · Jul 2, 2026 · ~9 min · cs.CL
Language Models as Measurement Apparatus for Culture
Kent K. Chang
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: Language models don't just measure culture neutrally: the choices you make when building them (what data, how you label it, what you measure) actually shape what 'culture' means in your analysis. It's like the thermometer isn't just reading temperature; it's partly deciding what temperature is.
Problem solved: ML researchers often treat cultural measurement as objective and technical, ignoring how their design choices actively construct the cultural reality they claim to measure. This paper pushes back, showing that every decision (model, data, labels) is a hidden ethical and methodological commitment that needs explicit attention.
💤 Quiet · 2607.02423 · Jul 2, 2026 · ~10 min · cs.LG · cs.AI
Neuron-Aware Active Few-Shot Learning for LLMs
Zhuowei Chen, Liwei Chen, Christian Schunn, Raquel Coelho, +1
⭐ 0 stars / 0 repos · 📚 1 cites
ELI5: Instead of picking which examples to label based on what the model outputs, this method looks inside the model's neurons to see which unlabeled samples the model is most confused about, then labels those to teach it faster.
Problem solved: Active learning for LLMs wastes human labeling effort by selecting examples based on surface-level signals. This approach finds the samples that expose real knowledge gaps in the model itself, reducing annotation costs while improving performance.
💤 Quiet · 2607.02396 · Jul 2, 2026 · ~8 min · cs.AI · cs.LG
Fast Multi-dimensional Refusal Subspaces via RFM-AGOP
Thomas Winninger
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: A new technique quickly finds the multi-dimensional 'refusal space' inside LLMs (the mental patterns that make them refuse harmful requests) by adapting an efficient algorithm and using a smart initialization trick. It works in seconds instead of hours, even on models with long reasoning traces.
Problem solved: Finding where safety behaviors live in LLMs is slow and expensive on large models. This makes it hard to study, steer, or audit refusal behavior at scale. The new method is 100x+ faster, making safety analysis practical on reasoning models.
💤 Quiet · 2607.02386 · Jul 2, 2026 · ~12 min · cs.CV · cs.LG
Transformer Geometry Observatory TGO-II: Representational Similarity Observatory
Kaustubh Kapil, Kishor P. Upla
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: Researchers built tools to watch how the internal representations inside Vision Transformers change and specialize as they learn, discovering that layers become increasingly different from each other while tokens stay tightly coordinated rather than becoming independent.
Problem solved: We didn't understand how transformer representations actually evolve during training: whether layers specialize, how complex the internal geometry becomes, or how tokens interact. This framework lets you measure those changes, revealing the actual learning dynamics hidden inside these models.
💤 Quiet · 2607.02369 · Jul 2, 2026 · ~5 min · cs.CL · cs.AI
World Wide Models: Literary Tools for Cultural AI
Nina Begus
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: Literary scholars have techniques for understanding how texts carry cultural meaning across different languages and traditions. This paper argues those same techniques should guide how we build AI systems that work across cultures, not just optimize for one language.
Problem solved: Most large language models are trained primarily on English text and reflect Western perspectives, causing them to misunderstand or misrepresent other cultures. Literary analysis methods offer proven ways to spot these blind spots and build more culturally aware AI.
💤 Quiet · 2606.30625 · Jun 29, 2026 · ~6 min · stat.ML · cs.AI · cs.LG
Optimization Dynamics Imprint Semantic Specificity in Contrastive Embedding Norms
Ziwei Su, Junyu Ren, Victor Veitch
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: When training AI models that learn to compare similar items, the lengths of learned vectors accidentally encode useful information about how specific or common concepts are, even though the model is designed to ignore those lengths. This paper explains why this happens mathematically.
Problem solved: Researchers noticed embedding norms correlated with semantic properties but didn't understand why, treating it as mysterious. This work explains the phenomenon rigorously and shows how to use these 'free' signals for better calibration without extra training.
💤 Quiet · 2606.30609 · Jun 29, 2026 · ~8 min · cs.LG · cs.AI
C$^{2}$R: Cross-sample Consistency Regularization Mitigates Feature Splitting and Absorption in Sparse Autoencoders
Haoran Jin, Xiting Wang, Shijie Ren, Hong Xie, +1
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: When AI researchers try to understand how language models work by breaking down their internals into interpretable pieces, those pieces often get fragmented or messy. This paper adds a constraint that makes sure the same concept is always represented the same way across multiple examples, cleaning up the interpretation without breaking the model.
Problem solved: Sparse autoencoders currently split single concepts into multiple redundant pieces or create random exceptions in learned features, making it hard to reliably understand what a language model is actually doing. This work fixes that by enforcing consistency across examples, improving interpretability without sacrificing accuracy.
💤 Quiet · 2606.30498 · Jun 29, 2026 · ~10 min · cs.CV · cs.AI
On the Faithfulness of Post-Hoc Concept Bottleneck Models
Laines Schmalwasser, Jan Blunk, Niklas Penzel, Julia Niebling, +1
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: When AI models try to explain their decisions using human-readable concepts (like 'red belly' for identifying birds), they often cheat, picking patterns that predict well but don't actually mean what they claim. This paper catches those cheaters by measuring whether concepts are truly meaningful, not just accurate.
Problem solved: Current AI interpretability methods claim to use human concepts but actually learn meaningless shortcuts that happen to work. You can't tell if your 'explainable' model is truly understandable or just getting lucky, which defeats the purpose of interpretability for high-stakes decisions.
💤 Quiet · 2606.30449 · Jun 29, 2026 · ~13 min · cs.LG
Internal-State Probes Read the Situation, Not the Action: Three Negative Results for Pre-Action Misalignment Monitoring
Max Fomin, Elad David, Amit LeVi
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: Researchers tested whether you can peek inside an AI model's internal states to catch it planning harmful actions before it generates them. They found that the signals they measured were mostly just reflecting the prompt or situation, not actually predicting what unsafe action the model would take next.
Problem solved: AI safety teams want early-warning systems that detect when a model is about to do something harmful. This paper shows that internal-state monitoring techniques, which seemed promising, don't actually work as pre-action detectors; they fail when tested rigorously across different scenarios or unrelated concepts.
💤 Quiet · 2606.30444 · Jun 29, 2026 · ~13 min · stat.ML · cs.LG
SGD Provably Prioritizes a Shortcut Spurious Feature in the XOR Model
Tyler LaBonte, Vidya Muthukumar
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: This paper proves mathematically why neural networks get tricked by easy-but-wrong patterns (like a shortcut) instead of learning the real pattern, using a simple XOR problem with a fake correlation. SGD learns the fake pattern first and way faster, and the math shows exactly how and why.
Problem solved: Neural networks often rely on spurious correlations in data instead of learning the true underlying pattern. This work provides the first rigorous theoretical proof of *why* and *how* this happens during training, which is essential for designing better methods to prevent it.
💤 Quiet · 2606.30384 · Jun 29, 2026 · ~13 min · cs.LG · cond-mat.dis-nn · nlin.CD
Scalar Representations of Neural Network Training Dynamics
Pedro Jiménez-González, Miguel C. Soriano, Lucas Lacasa
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: Instead of trying to understand neural network training by looking at all millions of parameters at once, researchers compress the training trajectory into a single number that still captures the important dynamics, such as how sensitive the network is to tiny changes in starting conditions.
Problem solved: Neural network training is impossible to visualize or analyze directly because it happens in millions of dimensions. This creates a low-dimensional summary that preserves the actual dynamics, making it possible to study and compare training runs without the computational nightmare.
💤 Quiet · 2606.30313 · Jun 29, 2026 · ~11 min · cs.CV · cs.LG
TRACE: A Concept Bottleneck Model for Longitudinal 3D Glioblastoma Response Assessment
Alia Tarek, Hamsa Saberr, Hamza Elghonemy, Youssef Afify, +4
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: A system that analyzes brain tumor MRI scans over time by first measuring specific tumor features (like size and necrosis), then applying clinical rules to decide if the tumor is responding to treatment, making its reasoning transparent and fixable by doctors.
Problem solved: Doctors need to assess whether glioblastoma tumors are responding to treatment by comparing MRI scans against strict clinical criteria (RANO), but current AI models hide their reasoning and don't align with how clinicians actually think, making it hard to trust or correct them.
💤 Quiet · 2606.30226 · Jun 29, 2026 · ~9 min · cs.LG
Characterizing Optimizer-Dependent Training Dynamics Through Hessian Eigenvector Displacement and Localization
Marcelina Marjankowska, Valerio Modugno, Paolo Barucca
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: This paper tracks how the directions of steepest curves in a neural network's loss landscape change during training, revealing that different optimizers (like SGD vs Adam) reorganize these directions very differently: SGD keeps them stable while Adam scrambles them around.
Problem solved: Understanding why different optimizers produce different training dynamics is hard; this work provides a concrete way to measure and compare optimizer behavior through Hessian eigenvector movement, making optimizer differences interpretable and measurable.
💤 Quiet · 2606.28294 · Jun 26, 2026 · ~9 min · cs.LG · cs.MA
Democratic ICAI: Debating Our Way to Steering Principles from Preferences
Kevin Kingslin, Anish Natekar, Ashutosh Ranjan, Vivek Srivastava, +2
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: Instead of just asking an AI why it prefers one answer over another, have multiple AI personas debate the decision from different angles. This captures the hidden reasons behind preferences way better than a single explanation, letting you steer AI behavior based on richer, more balanced principles.
Problem solved: Current alignment methods ask AI to explain preferences in one pass, missing the real trade-offs and nuance in complex decisions. This leaves you with shallow steering principles that don't actually capture what matters, making it hard to reliably guide AI behavior on subjective tasks.
💤 Quiet · 2606.28287 · Jun 26, 2026 · ~13 min · nucl-th · cs.LG
Bridging Ab Initio Symmetries and Global Nuclear Masses with Interpretable Neural Networks
Phong Dang, Evander Espinoza, Xiaoliang Wan, Michela Negro, +5
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: A team used physics symmetries (mathematical patterns describing how nuclear forces work) as inputs to simple neural networks to predict how tightly bound atomic nuclei are. This lets them see *why* predictions work, not just make accurate guesses.
Problem solved: Nuclear mass tables are crucial for astrophysics and engineering but require expensive experiments or complex simulations. This approach combines known physics symmetries with interpretable ML to get competitive accuracy while revealing which symmetries matter most, letting physicists understand the underlying rules rather than treating models as black boxes.
💤 Quiet · 2606.28273 · Jun 26, 2026 · ~10 min · cs.CL
Vision-Default, Prior-Override: Causal Mechanisms of Perception-Knowledge Conflict in Vision-Language Models
Niclas Lietzow, Danielle Bitterman, Carsten Eickhoff, William Rudman, +1
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: Researchers found the specific switches inside vision-language models that decide whether to trust what they see in an image or what they've memorized about the world. They discovered just a handful of attention heads (2-5%) act as gatekeepers controlling this choice.
Problem solved: Vision-language models sometimes disagree with visual reality when their training data contradicts what's in the image. Understanding where this conflict happens and how to control it helps make multimodal AI more reliable and trustworthy for real applications.
💤 Quiet · 2606.27321 · Jun 25, 2026 · ~13 min · cs.LG · cs.AI
Beyond the Hard Budget: Sparsity Regularizers for More Interpretable Top-k Sparse Autoencoders
Nathanaël Jacquier, Maria Vakalopoulou, Mahdi S. Hosseini
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: A tool that breaks down what vision AI models learn shows that adding soft penalties alongside hard sparsity rules makes the learned features cleaner and more interpretable, without hurting how well the model reconstructs images.
Problem solved: Sparse autoencoders help explain AI vision models, but their fixed-budget designs don't adapt to input complexity and overfit to training settings. This work makes them more flexible and robust by combining two complementary constraint types.
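As a rough illustration of the two constraint types mentioned above, the sketch below pairs a hard top-k budget (keep only the k largest latent activations) with a soft penalty on what survives. The L1 penalty is an assumption chosen for simplicity; the summary does not say which regularizers the paper actually uses, and the input values are made up.

```python
def top_k_activation(pre_activations, k):
    """Hard budget: keep the k largest values and zero out the rest."""
    ranked = sorted(
        range(len(pre_activations)),
        key=lambda i: pre_activations[i],
        reverse=True,
    )
    keep = set(ranked[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(pre_activations)]


def l1_penalty(activations):
    """Soft penalty (assumed here to be L1): total magnitude of active latents."""
    return sum(abs(v) for v in activations)


# Hypothetical encoder outputs for a single input.
latents = top_k_activation([0.2, 1.5, -0.3, 0.9, 0.1], k=2)
penalty = l1_penalty(latents)

print(latents)
```

The hard budget fixes how many latents may fire, while a soft term like this one can still discourage large activations within that budget, which is one way the two constraints could complement each other.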
💤 Quiet · 2606.27314 · Jun 25, 2026 · ~9 min · cs.CL
Beyond Surface Forms: A Comprehensive, Mechanism-Oriented Taxonomy of Indirect Linguistic Encoding for LLM-Based Coded Language Detection
Hamid Reza Firoozfar, Mohammadsadegh Abolhasani, Reza Mousavi, Paul Jen-Hwa Hu
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: When people want to hide posts from social media filters, they use coded language like slang or euphemisms. This paper creates a system that categorizes how that hidden meaning actually works (not by what people are trying to hide, but by the tricks they use to hide it) so AI can spot coded language better.
Problem solved: Content moderators struggle to catch camouflaged harmful content on social platforms because coded language keeps evolving. A better way to understand *how* meaning gets hidden (the mechanisms) rather than *what's* hidden helps AI systems stay ahead of new obfuscation tactics.
💤 Quiet · 2606.27237 · Jun 25, 2026 · ~9 min · cs.CL
LMs as Task-Specific Knowledge Bases: An Interpretability Analysis
Amit Elhelo, Amir Globerson, Mor Geva
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: Language models don't store facts like a traditional database with one 'true' answer. Instead, they encode the same fact differently depending on the task, like having separate filing cabinets for the same information, which means they can give inconsistent answers depending on context.
Problem solved: We don't really understand how language models store and retrieve facts, making it risky to rely on them as knowledge sources. This work shows why the same fact can produce different outputs in different tasks, helping explain reliability issues and why LMs can seem to 'know' something in one context but fail in another.
💤 Quiet · 2606.27226 · Jun 25, 2026 · ~12 min · cs.AI · cs.CL
Ask, Don't Judge: Binary Questions for Interpretable LLM Evaluation and Self-Improvement
Sangwoo Cho, Kushal Chawla, Pengshan Cai, Zefang Liu, +3
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: Instead of asking an LLM for a single opaque score, break evaluation into simple yes/no questions (like 'Is the summary factually accurate?' and 'Does it cover main points?'), then combine the answers into transparent, multi-part scores you can actually understand and debug.
Problem solved: LLM evaluation is slow (humans needed), unreliable (word-matching metrics fail on creative tasks), and opaque (judge models spit out mysterious scores you can't debug or learn from). This makes it hard to improve prompts or trust results.
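The aggregation step described above can be sketched in a few lines. The first two questions come from the summary; the other two, the hard-coded answers, and the unweighted pass rate are hypothetical choices for illustration. How the paper actually phrases, weights, or combines its questions is not stated here.

```python
def score_from_answers(answers):
    """Turn yes/no answers into an overall pass rate plus a debuggable breakdown."""
    passed = [question for question, ok in answers.items() if ok]
    failed = [question for question, ok in answers.items() if not ok]
    return {
        "score": len(passed) / len(answers),  # unweighted pass rate (an assumption)
        "passed": passed,
        "failed": failed,
    }


# Hypothetical judge answers for one candidate summary.
answers = {
    "Is the summary factually accurate?": True,
    "Does it cover main points?": True,
    "Is it free of unsupported claims?": False,   # illustrative question
    "Is it under the length limit?": True,        # illustrative question
}

report = score_from_answers(answers)
print(report["score"], report["failed"])
```

Unlike a single opaque number, the `failed` list points at exactly which criterion to fix, which is the debuggability benefit the summary highlights.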
💤 Quiet · 2606.27201 · Jun 25, 2026 · ~12 min · cs.LG
Explaining Temporal Graph Neural Networks via Feature-induced Information Flow
Ping Xiong, Thomas Schnake, Klaus-Robert Müller, Shinichi Nakajima
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: This paper explains how temporal graph neural networks make decisions by tracking how information flows through all the variables in the model, not just the obvious paths. Think of it like understanding a rumor spread by following not just who told whom, but also the hidden reasons why they chose to share it.
Problem solved: Temporal graph models (used for things like predicting disease spread or recommender systems) were black boxes: you couldn't tell why they made their predictions. Previous explanation methods only traced partial information paths, missing crucial intermediate factors that actually drive the model's reasoning.
💤 Quiet · 2606.27199 · Jun 25, 2026 · ~7 min · cs.CL · cs.LG
Forecasting With LLMs: Improved Generalization Through Feature Steering
Humzah Merchant, Bradford Levy
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: Researchers used a tool to peek inside LLMs' brains and found specific switches that control whether the model reasons about time realistically or accidentally cheats by looking at future information. By flipping these switches, they made LLMs better at forecasting tasks across different domains.
Problem solved: LLMs tend to 'look ahead' when forecasting, using future information they shouldn't have access to. This makes them appear better at prediction than they actually are. By identifying and controlling the internal features causing this, forecasts become genuinely more reliable.
💤 Quiet · 2606.27069 · Jun 25, 2026 · ~12 min · cs.CL
Towards Explainable Adjudicative Variance: Quantifying Judicial Discretion via Gated Multi-Task Learning
Stanisław Sójka, Felix Steffek, Matthias Grabmair
⭐ 0 stars / 0 repos · 📚 0 cites
ELI5: A system that predicts court rulings by separating what the law says (facts) from what individual judges tend to do (discretion). It uses a special neural network that learns when to trust case facts versus judge identity, making predictions more accurate with fewer parameters than giant language models.
Problem solved: Legal prediction systems struggled to explain why different judges rule differently on similar cases. Courts need to understand whether outcomes reflect law or judge bias, and current approaches either ignore judge identity or can't tell them apart from the actual legal merits.
💤Quiet2606.26094·Jun 24, 2026·~11 mincs.LG
RevengeBench: Reverse Engineering Code-Space Policies from Behavioral Experiments
Babak Rahmani, Sebastian Dziadzio, Joschka Strüber, Sergio Hernández-Gutiérrez, +1
⭐ 0 stars / 0 repos📚 0 cites
ELI5Researchers built a benchmark where AI models try to reverse-engineer hidden game-playing strategies by watching them play and designing custom opponents to probe their behavior, then reconstructing the actual code behind them.
Problem solvedUnderstanding what an AI opponent is actually doing requires either accessing its internals or running expensive interpretability techniques. This lets you figure out strategies from pure observation—useful for competitive gaming, security analysis, and understanding black-box agents.
💤Quiet2606.26071·Jun 24, 2026·~15 mincs.LGcs.AI
Model Forensics: Investigating Whether Concerning Behavior Reflects Misalignment
Aditya Singh, Gerson Kroiz, Senthooran Rajamanoharan, Neel Nanda
⭐ 0 stars / 0 repos📚 0 cites
ELI5When an AI does something bad, is it because it's truly misaligned, or just confused? This paper develops a detective protocol: read the AI's reasoning, form hypotheses about why it misbehaved, then run tests (like changing prompts) to figure out the real cause.
Problem solvedSafety researchers struggle to distinguish genuine misalignment from benign failures like confusion or bugs. Without knowing the root cause, it's hard to fix the problem or assess real risk. This work provides a systematic way to investigate the actual drivers behind concerning AI behavior.
💤Quiet2606.26050·Jun 24, 2026·~13 mincs.LGcond-mat.dis-nncs.AI
Natural Ungrokking: Asymmetric Control of Which Rules Survive Pretraining
Juliana Li, Diya Sreedhar
⭐ 0 stars / 0 repos📚 0 cites
ELI5Language models learn rules during training, then mysteriously forget them even though evidence for those rules is still in the data. Each rule's fate is decided by how often that rule appears in the training corpus: a simple frequency count predicts what survives and what gets erased.
Problem solvedUnderstanding which learned patterns persist in language models and which disappear is crucial for reliable AI systems. This reveals that models can silently unlearn useful rules through training, and we can't easily restore them once lost—a hidden fragility in pretraining.
💤Quiet2606.24832·Jun 23, 2026·~5 mincs.AI
Difference-Making without Making a Difference
Sander Beckers
⭐ 0 stars / 0 repos📚 0 cites
ELI5A philosopher shows that seven different definitions of 'what causes what' that were supposed to be fundamentally different actually all work the same way underneath, and all of them fail on basic examples.
Problem solvedAI systems and philosophers need precise definitions of causation to explain decisions and assign responsibility, but existing formal frameworks contradict each other and handle common scenarios inconsistently.
💤Quiet2606.24790·Jun 23, 2026·~9 mincs.LGcs.AI
Grad Detect: Gradient-Based Hallucination Detection in LLMs
Anand Kamat, Daniel Blake, Brent M. Werness
⭐ 0 stars / 0 repos📚 0 cites
ELI5A new technique that looks at how a language model's internal math changes during a backward pass to spot when the model is making stuff up, working better than just checking how confident the model sounds.
Problem solvedLLMs confidently generate false information (hallucinations), and there is no reliable way to catch them at inference time. This makes LLMs risky for real applications where wrong answers are costly. Grad Detect catches hallucinations by analyzing internal signals that confidence scores miss.
💤Quiet2606.23673·Jun 22, 2026·~11 mincs.AIcs.LG
PsyBridge: A Hybrid Intelligent Framework for Multi-Dimensional Mental Health Assessment and Decision Support
Sunil Wanjari, Manish Thakre, Aayushi Asole, Sharwari Raut, +3
⭐ 0 stars / 0 repos📚 0 cites
ELI5A system that combines multiple mental health screening tools (like depression and anxiety questionnaires) with cognitive and personality assessments to give doctors a more complete, explainable picture of a patient's mental health risk—like a dashboard that pulls together different signals instead of looking at them separately.
Problem solvedMental health providers currently use fragmented screening tools that miss the bigger picture and can't explain their conclusions. PsyBridge unifies multiple assessment dimensions with clear reasoning, making it easier for doctors in digital and telehealth settings to make confident, defensible decisions.
💤Quiet2606.20560·Jun 18, 2026·~14 mincs.LGcs.AI
How Transparent is DiffusionGemma?
Joshua Engels, Callum McDougall, Bilal Chughtai, Janos Kramar, +10
⭐ 0 stars / 0 repos📚 0 cites
ELI5DiffusionGemma generates text by gradually refining a fuzzy draft instead of building it word-by-word like normal models. This paper asks: is it harder to understand what the model is thinking during this process? They show you can peek at what's happening between refinement steps and actually track the model's reasoning just as well as with traditional models.
Problem solvedDiffusion-based language models are faster and more flexible than traditional ones, but it's unclear if we can still inspect and understand their decision-making—critical for safety and debugging. This work shows their reasoning can be made nearly as transparent as standard models, removing a barrier to adopting them in practice.
💤Quiet2606.20532·Jun 18, 2026·~9 mincs.AI
How Do Instructions Shape Speech? Cross-Attention Attribution for Style-Captioned Text-to-Speech
Nityanand Mathur, Hamees Sayed, Wasim Madha, Apoorv Singh, +3
⭐ 0 stars / 0 repos📚 0 cites
ELI5Researchers created a tool that shows which words in a style description (like 'happy' or 'whispered') influence which parts of the generated speech sound. They tracked attention patterns through a speech-generating AI to understand where style instructions take effect.
Problem solvedWhen AI generates expressive speech from descriptions, engineers can't see why it works or fails. This tool reveals which words control which acoustic features (pitch, loudness), making it easier to debug bad outputs and improve controllability.
💤Quiet2606.20502·Jun 18, 2026·~13 mincs.CRcs.AIcs.SE
Calibration Without Comprehension: Diagnosing the Limits of Fine-Tuning LLMs for Vulnerability Detection in Systems Software
Arastoo Zibaeirad, Marco Vieira
⭐ 0 stars / 0 repos📚 0 cites
ELI5Researchers tested whether AI models trained to spot security bugs in code actually understand security or just memorize patterns. They found models just adjust their confidence levels without learning real reasoning—like a student who learns to guess better without understanding the material.
Problem solvedCompanies want to use LLMs to find vulnerabilities in critical code like Linux kernels, but we don't know if models are actually reasoning about security or just pattern-matching on training data. This work shows fine-tuning doesn't create real understanding, exposing a dangerous gap between benchmark scores and actual capability.
💤Quiet2606.20467·Jun 18, 2026·~11 mincs.LGmath.NAphysics.comp-ph
Agentic Symbolic Search: Characterizing PDEs Beyond Hand-crafted Expressions, Meshes, and Neural Networks
Zongmin Yu, Liu Yang
⭐ 0 stars / 0 repos📚 0 cites
ELI5A system that automatically discovers the mathematical equations hidden inside complex physical simulations, rather than just computing numbers. It's like reverse-engineering the rules of a system by searching through candidate formulas and fitting them to data.
Problem solvedScientists and engineers struggle to understand PDE solutions as interpretable mathematical structures: simulations give numbers, neural networks give black boxes, and analytical solutions require years of hand work per problem. ASYS (Agentic Symbolic Search) automatically uncovers the underlying equations and relationships.
