What do these badges mean?
- 🚀 Shipping: Code exists. Multiple GitHub repos already reference this paper; people are building on it.
- 📈 Climbing: Citation velocity is rising. Researchers are starting to pick it up.
- 💤 Quiet: Published but no notable signal yet. Most papers live here; any of them could become anything later.
- 🎭 Hype: Heavy social buzz but no shipping signal. This is the counter-signal; it is deferred until Twitter/X data is wired up.
- 13 min read · 🚀 Shipping · 2605.18747 · May 18, 2026 · cs.CL, cs.AI
Code as Agent Harness
Xuying Ning, Katherine Tieu, Dongqi Fu, Tianxin Wei, +38
⭐ 1.3k stars / 9 repos · 📚 0 cites
ELI5: Instead of treating code as just the output LLMs produce, this survey shows how code can be the central operating system for AI agents: the glue that lets them think, act, remember, and verify their work in a way humans can actually understand and check.
Problem solved: Current AI agents are hard to make reliable, debuggable, and controllable. Using code as the core infrastructure lets you write agent logic you can read, test, and fix, solving the black-box nature of pure neural approaches and making agents deployable in real systems.
- 13 min read · 🚀 Shipping · 2605.16215 · May 15, 2026 · cs.AI, cs.CL
Fully Open Meditron: An Auditable Pipeline for Clinical LLMs
Xavier Theimer-Lienhard, Mushtaha El-Amin, Fay Elhassan, Sahaj Vaidya, +4
⭐ 434 stars / 25 repos · 📚 0 cites
ELI5: Researchers built the first completely transparent medical AI model where you can see everything: what data it learned from, how it was cleaned, how it was trained, and how it works. They combined medical question datasets, added clinician-verified practice guidelines, and had doctors validate every step.
Problem solved: Medical AI systems need to be trustworthy and auditable for doctors to use them, but most 'open' models hide their training data and methods. This makes it impossible to validate they're safe or understand why they give certain answers, a critical problem in healthcare.
- 14 min read · 🚀 Shipping · 2605.16205 · May 15, 2026 · cs.AI, cs.CL, cs.LG
Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP
Igor Bogdanov, Chung-Horng Lung, Thomas Kunz, Jie Gao, +2
⭐ 348 stars / 28 repos · 📚 0 cites
ELI5: Researchers tested different ways to build AI agents that play a cyber defense game where they can't see the full situation. They compared three design choices: what information to show the agent, how much the agent should think things through, and whether to use one big agent or split it into smaller specialist agents. They found that clean data representation and task splitting work best, but adding too much internal reasoning actually makes things worse.
Problem solved: Teams building AI agents for complex, partial-information tasks don't know which design patterns actually improve performance versus just burning compute. This study quantifies the cost-benefit tradeoffs of context, reasoning depth, and hierarchical decomposition so builders can stop guessing and start optimizing.
- 💤 Quiet · 2605.27371 · May 26, 2026 · ~9 min · cs.CY, cs.AI
Algorithmic Monocultures in Hiring
Rishi Bommasani, Sarah H. Bana, Kathleen A. Creel, Dan Jurafsky, +1
⭐ 58 stars / 21 repos · 📚 0 cites
Many employers screen job applicants with algorithms built by the same few algorithm vendors. We hypothesize that algorithmic monoculture leads to the same individuals and members of the same racial groups facing rejection. We acquire and analyze a novel dataset of 3 million applicants submitting 4 million applications…
- 🚀 Shipping · 2605.27366 · May 26, 2026 · ~8 min · cs.AI, cs.CL, cs.LG
MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation
Huawei Lin, Peng Li, Jie Song, Fuxin Jiang, +1
⭐ 1.4k stars / 53 repos · 📚 0 cites
Large language model (LLM) agents rely on reusable skills to solve complex tasks. However, existing skill creation approaches treat skills as isolated and static artifacts, limiting their reusability, reliability, and long-term improvement. We propose MUSE-Autoskill Agent (Memory-Utilizing Skill Evolution), a skill-cen…
- 🚀 Shipping · 2605.27365 · May 26, 2026 · ~10 min · cs.CV, cs.AI, cs.LG
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding
Shihao Wang, Shilong Liu, Yuanguo Kuang, Xinyu Wei, +9
⭐ 239 stars / 51 repos · 📚 0 cites
Vision-language models (VLMs) commonly formulate visual grounding and detection as a coordinate-token generation problem, serializing each 2D box into multiple 1D tokens that are learned and decoded largely independently. This token-by-token decoding mismatches the coupled structure of box geometry and creates a practi…
- 💤 Quiet · 2605.27361 · May 26, 2026 · ~10 min · cs.AI, eess.SY
Natural Language Query to Configuration for Retrieval Agents
Melissa Z. Pan, Negar Arabzadeh, Mathew Jacob, Fiodar Kazhamiaka, +2
⭐ 93 stars / 30 repos · 📚 0 cites
Modern retrieval agents expose many configuration choices -- LLM, retriever, number of documents, number of hops, and synthesis strategy -- each shaping both answer quality and serving cost. Today, these pipelines are typically hand-tuned once per workload, leaving substantial per-query optimization untapped. We formul…
- 🚀 Shipping · 2605.27360 · May 26, 2026 · ~12 min · cs.NI, cs.AI
GENESIS: Harnessing AI Agents for Autonomous 6G RAN Synthesis, Research, and Testing
Tamerlan Aghayev, Maxime Elkael, Michele Polese, Minh Dat Nguyen, +10
⭐ 1.3k stars / 33 repos · 📚 0 cites
Cellular research and development (R&D) is throttled by six structural processes that each consume months of manual engineering work per iteration: (i) synthesizing new features from standards or research papers into production code; (ii) conformance and interoperability testing; (iii) hardening against field anomalies…
- 🚀 Shipping · 2605.27358 · May 26, 2026 · ~10 min · cs.LG, cs.AI, cs.CL
MobileMoE: Scaling On-Device Mixture of Experts
Yanbei Chen, Hanxian Huang, Ernie Chang, Jacob Szwejbka, +4
⭐ 121 stars / 40 repos · 📚 0 cites
Mixture-of-Experts (MoE) has become the de facto architecture for hundred-billion-parameter language models, yet its advantages at sub-billion scales for on-device deployment remain largely unexplored. To close this gap, we present MobileMoE, a family of on-device MoE language models with sub-billion active parameters…
- 🚀 Shipping · 2605.27355 · May 26, 2026 · ~10 min · cs.AI, cs.CL, cs.LG
Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases
Dongyoon Hahm, Dylan Hadfield-Menell, Kimin Lee
⭐ 126 stars / 37 repos · 📚 0 cites
Reinforcement Learning from Human Feedback (RLHF) is the standard method to align Large Language Models (LLMs) with human preferences. In this work, we introduce alignment tampering, a potential vulnerability where the LLM undergoing alignment influences the preference dataset, causing RLHF to amplify undesired behavio…
- 🚀 Shipping · 2605.27354 · May 26, 2026 · ~9 min · cs.LG, cs.AI, cs.CL
Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders
Yi Jing, Zao Dai, Jinwu Hu, Zijun Yao, +3
⭐ 500 stars / 38 repos · 📚 0 cites
Model internals encode rich information about how a large language model (LLM) processes its training data; however, post-training data engineering largely relies on external signals and ignores rich intrinsic signals lying in model internals. We propose SAERL, a data engineering framework for LLM reinforcement learnin…
- 2605.27352 · May 26, 2026 · ~12 min · cs.LG, stat.ML
From Scores to Gibbs Correctors: Accelerating Uniform-Rate Discrete Diffusion Models
Yuchen Liang, Ness Shroff, Yingbin Liang
Discrete diffusion models have achieved strong empirical performance in text and other symbolic domains, but, especially for uniform-rate models, they often require many steps to generate a single sample. Existing acceleration methods either rely on training additional quantities or suffer from slow mixing. In this wor…
- 2605.27348 · May 26, 2026 · ~13 min · cs.CV, cs.AI
When Eyes Betray AI: Social Gaze Consistency as a Semantic Cue for AI-Generated Image Detection
Kim Jihyeon, Sohee Kim, Soosan Lee, Souhwan Jung, +2
Recent generative models have largely closed the gap on low-level artifacts - pixel fingerprints, frequency anomalies, upsampling traces - particularly in person-centric and partial-edit settings where the manipulated region is small and surrounded by photometrically authentic content. We introduce Social Gaze Consiste…
- 2605.27345 · May 26, 2026 · ~12 min · cs.CL
MATCHA: Matching Text via Contrastive Semantic Alignment
Siran Li, Ece Sena Etoglu, Carsten Eickhoff, Seyed Ali Bahrainian
Reliable evaluation is essential for understanding large language model (LLM) performance, yet today's go-to metrics, namely token-overlap scores (e.g., ROUGE) and embedding-based measures (e.g., BERTScore), often misjudge semantic similarity of documents. Our study shows that both token-overlap metrics and embedding-b…
- 2605.27343 · May 26, 2026 · ~6 min · cs.CV, cs.LG
Towards Controllable Image Generation through Representation-Conditioned Diffusion Models
Nithesh Chandher Karthikeyan, Jonas Unger, Gabriel Eilertsen
Diffusion models have emerged as powerful tools for high-quality image generation and editing, but guiding these models to produce specific outputs remains a challenge. Conventional approaches rely on conditioning mechanisms, such as text prompts or semantic maps, which require extensively annotated datasets. In this p…
- 2605.27338 · May 26, 2026 · ~7 min · cs.AI, cs.CC, cs.CL
2-ASP(Q) programs with weak constraints: Complexity and efficient implementation
Andrea Cuteri, Giuseppe Mazzotta, Francesco Ricca
ASP(Q) extends Answer Set Programming (ASP) with Quantifiers over answer sets. In this paper we focus on the class of ASP(Q) programs with two quantifiers and weak constraints, denoted as 2-ASP(Q)^w. 2-ASP(Q)^w is a practically relevant fragment of ASP(Q) that is expressive enough to capture optimization problems up to…
- 2605.27333 · May 26, 2026 · ~8 min · cs.CL
FinHarness: An Inline Lifecycle Safety Harness for Finance LLM Agents
Haoxuan Jia, Yang Liu, Bin Chong, Yingguang Yang, +9
Finance LLM agents must simultaneously block prompt-induced unauthorized actions and approve legitimate multi-step business workflows. However, boundary filters often miss irreversible mid-trajectory tool calls, while post-hoc LLM judges perform auditing only after termination -- too late for intervention and at a comp…
- 2605.27332 · May 26, 2026 · ~10 min · cs.SE, cs.AI, cs.CV
EdgeFlow: Edge-Map Augmented VLM-Based Flowchart Processing for Industrial Requirements Engineering
Zhifei Dou, Shabnam Hassani, Ou Wei
Flowcharts are widely used in industrial requirements, but usually remain embedded as static images. Vision Language Models (VLMs) show promise in the conversion of these flowcharts into machine-readable models for RE activities, yet, when directly applied to flowchart conversion, they often fail on topology-critical v…
- 2605.27331 · May 26, 2026 · ~9 min · cs.AI
Maat: The Agentic Legal Research Assistant for Competition Protection
Basant Mounir, Farida Madkour, Amira Abdelaziz, Asmaa Sami
Competition law experts conducting legal research must review extensive volumes of cases, decisions, and judicial reports to identify precedents and assess key elements in competition and merger cases. Although general research assistants such as Claude and ChatGPT and legal assistants such as SaulLM-7B and LegalGPT ar…
- 2605.27328 · May 26, 2026 · ~10 min · cs.SE, cs.AI, cs.MA
Governed Evolution of Agent Runtimes through Executable Operational Cognition
Mariano Garralda-Barrio
Recent advances in agentic systems increasingly treat code as an executable operational substrate rather than as a disposable output artifact. Prior work such as "Code as Agent Harness" frames validated agent-generated artifacts as runtime entities that can be created, executed, revised, persisted, and reused with…
- 2605.27322 · May 26, 2026 · ~6 min · cs.CL
Semantic Gradients Interactions in SSD: A Case Study in Racial Identity and Hate Speech
Felix Ostrowicki, Hubert Plisiecki
We introduce interaction SSD, an extension of Supervised Semantic Differential that models how semantic meaning varies across moderators such as groups, traits, or conditions making this variation testable and interpretable. The method estimates a main semantic gradient, an interaction gradient, and conditional gradien…
- 2605.27320 · May 26, 2026 · ~7 min · cs.AI, cs.CY, econ.GN
Modeling Agentic Technical Debt and Stochastic Tax: A Standalone Framework for Measurement, Simulation, and Dashboarding
Muhammad Zia Hydari, Raja Iqbal, Narayan Ramasubbu
Agentic AI systems combine probabilistic reasoning with delegated action through tools, context, memory, orchestration, and external workflow integration. This note develops a formal and managerially usable model that distinguishes Agentic Technical Debt from Stochastic Tax. Agentic Technical Debt is a stock of accumul…
- 2605.27316 · May 26, 2026 · ~6 min · cs.LG, math.OC
Probabilistic Smoothing with Ratio-Monotone Transforms for Global Optimization
Kukyoung Jang, Taehyun Cho, Junrui Zhang, Ping Xu, +1
Probabilistic smoothing is a standard tool for global optimization, but existing methods rely on Gaussian kernels and specific transforms, often resulting in strong hyperparameter sensitivity and limited robustness. We propose a general smoothing framework that combines flexible symmetric unimodal kernels with monotoni…
- 2605.27315 · May 26, 2026 · ~9 min · cs.CL
Real Images, Worse Judgments: Evaluating Vision-Language Models on Concreteness and Imagery
Yifan Jiang, Ruoxi Ning, Sheng Yao, Freda Shi
Visual inputs are often assumed to improve language understanding in multimodal models. We examine this assumption by asking whether vision-language models (VLMs) can distinguish useful visual evidence from incidental image context in lexical judgments. We use human concreteness and imagery ratings because they span wo…
- 2605.27313 · May 26, 2026 · ~9 min · cs.CL
When Does Demographic Information Help? Data and Modeling Regimes for Perspective-Aware Hate Speech Detection
Weibin Cai, Reza Zafarani
Demographic information is often used to model annotator perspectives in subjective tasks such as hate speech detection, but its benefit is inconsistent: it improves performance in some settings and behaves as noise in others. This paper asks when demographic features help. We analyze demographic gain as a function of…
- 2605.27311 · May 26, 2026 · ~8 min · cs.CL, cs.CV
Chartographer: Counterfactual Chart Generation for Evaluating Vision-Language Models
Yifan Jiang, Dae Yon Hwang, Jesse C. Cresswell, Freda Shi
Chart question-answering (QA) benchmarks aim to pose questions that require visual reasoning to correctly answer, but models can often reach solutions through shortcuts or prior familiarity with a chart based on their own background knowledge. To strictly evaluate visual reasoning, we propose counterfactual charts wher…
- 2605.27309 · May 26, 2026 · ~8 min · cs.LG, cs.OH
Greening AI Inference with Accuracy and Latency-aware User Incentives
Vasilios A. Siris, Adamantia Stamou, George D. Stamoulis, Konstantinos Varsos, +1
The widespread use of AI services has raised concerns for its environmental sustainability, towards which recent studies have identified carbon emissions of AI inference as the major contributor. This paper introduces a framework for designing AI inference incentives based on the users' valuation for inference quality…
- 2605.27306 · May 26, 2026 · ~8 min · cs.LG
Normal Guidance is what Attention Needs
Ethan Harvey, Dennis Johan Loevlie, Michael C. Hughes
We consider training classifiers for 3D medical images using only one binary label for the entire volume rather than a label for each 2D slice. In such weakly supervised settings, can we learn accurate classifiers for slice-level predictions? Attention-based multiple instance learning (MIL) can produce an attention sco…
- 2605.27299 · May 26, 2026 · ~8 min · cs.CR, cs.AI, cs.HC
Risk Averse Alert Prioritization for IDS Using Subnormal Gaussian Fuzzy Models
Murat Moran
Modern intrusion detection systems generate thousands of alerts daily, but alert fatigue severely limits security operations effectiveness due to too many false positives or low-impact events. We address this by proposing a principled framework for alert prioritization based on subnormal Gaussian fuzzy numbers, explici…
- 2605.27298 · May 26, 2026 · ~12 min · cs.CL
Self-Ensembling Vision-Language Models for Chart Data Extraction
Thomas Berkane, Qianyi Wang, Maimuna S. Majumder
Charts effectively convey quantitative information, but the underlying data are often locked in image form, hindering reuse and analysis. Manually digitizing charts is time-consuming and error-prone, motivating automatic chart-to-table extraction. Recent approaches use specialized vision-language models (VLMs), yet per…
- 2605.27296 · May 26, 2026 · ~8 min · cs.CL
Probing Cultural Awareness in LLMs: A Case Study of Cross-Culture Aesthetic Stylistics
Jiashuo Wang, Fenggang Yu, Jian Wang, Chak Tou Leong, +5
Large Language Models (LLMs) are increasingly deployed in diverse cultural contexts, yet their ability to master aesthetic stylistics, i.e., the strategic use of language to evoke cultural resonance, remains underexplored. We curate C4STYLI, a benchmark of highly stylized translated movie titles and advertising slogans…
- 2605.27294 · May 26, 2026 · ~11 min · cs.CL, cs.IR
Separating Semantic Competition from Context Length in RAG Reading
Vyzantinos Repantis, Ameya Gawde, Harshvardhan Singh, Rohit Alekar, +3
Retrieval-augmented generation (RAG) systems can respond incorrectly even when the correct passage was retrieved. The model must still read the retrieved passages and identify which one contains the answer among others that look relevant. This passage-reading model is called the reader. Does it fail simply because the…
- 2605.27293 · May 26, 2026 · ~8 min · cs.LG, stat.ML
BASIS: Batchwise Advantage Estimation from Single-Rollout Information Sharing for LLM Reasoning
Shijin Gong, Erhan Xu, Kai Ye, Francesco Quinzan, +2
Reinforcement learning with verifiable rewards has become a standard recipe for improving the reasoning abilities of large language models. Existing algorithms face a tradeoff between computational efficiency and sample efficiency in value estimation and policy learning. We introduce BASIS, a critic-free post-training…
- 2605.27292 · May 26, 2026 · ~9 min · cs.LG, stat.ML
Detectability in Diversity: Improved Canary Crafting for Privacy Auditing in One Run
Mathieu Dagréou, Aurélien Bellet
Privacy auditing aims to empirically assess privacy leakage in machine learning models using membership inference attacks (MIAs), and to derive lower bounds on differential privacy (DP) parameters. Recent one-run auditing methods address the high cost of standard approaches by relying on a single training run with mult…
- 2605.27288 · May 26, 2026 · ~10 min · cs.CL, cs.AI, cs.LG
It's Not Always Sycophancy: Measuring LLM Conformity as a Function of Epistemic Uncertainty
Kevin H. Guo, Chao Yan, Avinash Baidya, Katherine Brown, +4
Large language models (LLMs) are known to abandon their initial stance to conform to user pushback. While prior research largely attributes this behavior to sycophancy learned during reinforcement learning from human feedback, we hypothesize that conformity is also driven by a model's epistemic uncertainty at inference…
- 2605.27286 · May 26, 2026 · ~10 min · cs.LG, cs.AI
Falcon-X: A Time Series Foundation Model for Heterogeneous Multivariate Modeling
Yiding Liu, Yifan Hu, Hongjie Xia, Peiyuan Liu, +4
Time series foundation models (TSFMs) are transforming the forecasting paradigm through large-scale cross-domain pretraining. However, most existing TSFMs remain univariate, and recent efforts to enable cross-variate modeling still operate directly within the raw variate space. This design introduces fundamental limita…
- 2605.27284 · May 26, 2026 · ~13 min · cs.RO, cs.AI
FineVLA: Fine-Grained Instruction Alignment for Steerable Vision-Language-Action Policies
Xintong Hu, Xuhong Huang, Jinyu Zhang, Yutong Yao, +10
Vision-Language-Action (VLA) models are increasingly expected to not only complete robot tasks, but also follow human instructions about how those tasks should be executed. However, existing robot datasets usually pair trajectories with coarse goal-level language, leaving execution-critical details such as active arm,…
- 2605.27281 · May 26, 2026 · ~10 min · cs.LG, stat.ML
Causal Risk Minimization for High-Dimensional Treatments
Nikita Dhawan, Arnav Paruthi, Andrew Kim, Lovedeep Gondara, +2
Predicting the effect of interventions with many possible variations, e.g., therapeutic content that affects mental health outcomes or an earnings call transcript that drives movement in share price, is useful across several domains. However, classical causal estimators tend to assume that all possible interventions ar…
- 2605.27276 · May 26, 2026 · ~11 min · cs.AI, cs.CL
SIA: Self Improving AI with Harness & Weight Updates
Prannay Hebbar, Yogendra Manawat, Samuel Verboomen, Alesia Ivanova, +3
Humans are the bottleneck in building and improving AI. Both the models and the agents that wrap them are written, tuned, and corrected by people. The long-horizon goal of an AI that can figure out how to improve itself remains open. Two largely disjoint research lines attack this bottleneck. The harness-update school…
- 2605.27269 · May 26, 2026 · ~9 min · cs.LG, stat.AP
Transfer Learning using 66 Diseases for Disease Forecasting Applications
Lauren J Beesley, Alexander C Murph, Dave Osthus, Lauren A Castro
Disease forecasting models typically rely on a single data stream, making models brittle when histories are short or noisy. Recent top-performing models have shown that synthesizing multiple reporting systems for the same disease improves performance. Other recent work takes this idea a step further, using transfer lea…
- 2605.27268 · May 26, 2026 · ~10 min · cs.CL, cs.AI
Lost in Sampling: Assessing Lexical Reachability in LLMs via the Word Coverage Score (WCS)
Samer Awad, Javier Conde, Carlos Arriaga, Tairan Fu, +2
Modern Large Language Models (LLMs) are often criticized for producing repetitive and homogeneous text, despite possessing vast latent vocabularies. While previous research has focused on model knowledge and training data, we investigate the role of decoding mechanics in suppressing linguistic diversity. We introduce t…
- 2605.27259 · May 26, 2026 · ~9 min · cs.LG
Kan Extension Transformers: A Categorical Unification of Attention, Diffusion, and Predict-Detach Self-Conditioning
Sridhar Mahadevan
We propose Kan Extension Transformers (KETs) as a unifying categorical framework for a diverse group of Transformer implementations. The core claim is that a Transformer layer can be viewed as a weighted structured extension operator: standard attention is the singleton-neighborhood case, Geometric Transformer style in…
- 2605.27258 · May 26, 2026 · ~9 min · cs.SD, cs.AI
PilotTTS: A Disciplined Modular Recipe for Competitive Speech Synthesis
Bowen Li, Shaotong Guo, Zhen Wang, Yang Xiang, +10
Building state-of-the-art text-to-speech (TTS) systems typically demands millions of hours of proprietary data and complex multi-stage architectures, creating substantial barriers for resource-constrained research teams. In this report, we present PilotTTS, a lightweight autoregressive TTS system that achieves competit…
- 2605.27255 · May 26, 2026 · ~11 min · cs.CL, cs.AI
Pair-In, Pair-Out: Latent Multi-Token Prediction for Efficient LLMs
Wenhui Tan, Minghao Li, Xiaoqian Ma, Siqi Fan, +4
Long chain-of-thought reasoning has made autoregressive decoding the dominant inference cost of modern large language models. Existing methods target either the input side (latent compression) or the output side (speculative decoding and multi-token prediction, MTP), but the two lines of work have been pursued independ…
- 2605.27254 · May 26, 2026 · ~14 min · cs.LG, cs.AI
LUCoS: Latent Unsupervised Context Selection for Tabular Foundation Models
Oroel Ipas, Guillermo Gomez-Trenado, Rocío Romero-Zaliz, Isaac Triguero
Selecting which instances to label is a key challenge in low-label tabular learning. For recent Tabular Foundation Models such as TabPFN, context selection directly determines predictive performance. Supervised oracle experiments show that carefully chosen labeled context sets can strongly outperform random selection u…
- 2605.27249 · May 26, 2026 · ~8 min · cs.AI, cs.CL
Gumbel Machine: Counterfactual Student Writing Generation via Gumbel Noise Steering
Hunter McNichols, Alexander Scarlatos, Mihai Dascalu, Danielle McNamara, +1
An effective method of teaching across disciplines is to provide examples of high-quality work. However, an example may be significantly different from a student's current work, making it challenging for them to emulate. An ideal learning demonstration is a counterfactual version of the student work, an improved versio…
- 2605.27246 · May 26, 2026 · ~6 min · cs.LO, cs.AI, math.LO
Many Logics, One Methodology: A Plea for Logical Pluralism in Formalised Reasoning (preprint)
Christoph Benzmüller, Daniel Kirchner, Luca Pasetto
This position statement looks back on two decades of work on shallow embeddings of non-classical logics in classical higher-order logic (HOL), a line of research that expanded into a range of logic embeddings in HOL and inspired the LogiKEy logic-pluralistic knowledge representation and reasoning methodology. This pape…
- 2605.27245 · May 26, 2026 · ~13 min · cs.LG
Symbolic Regression via Latent Iterative Refinement
Xieting Chu, Sriram Vishwanath, Vijay Ganesh
Symbolic regression (SR) seeks closed-form mathematical expressions that fit observed data. Neural SR methods amortize the search by training an encoder to map observations directly to expressions in a single pass, but this amortized inference leaves a residual amortization gap between its one-shot prediction and the t…
- 2605.27240 · May 26, 2026 · ~9 min · cs.CL
ENPMR-Bench: Benchmarking Proactive Memory Retrieval for Emotional Support Agents
Xing Fu, Yulin Hu, Mengtong Ji, Haozhen Li, +4
Memory-augmented language agents are increasingly deployed in affective applications such as emotional support, where understanding and responding to users' latent emotional needs is critical. However, existing research often treats memory as a tool for factual retrieval, overlooking its role in shaping users' emotiona…
- 2605.27239 · May 26, 2026 · ~10 min · cs.CL
Temporal Simultaneity Predicts Annotation Quality in Sentiment Corpora
Idris Abdulmumin, Mokgadi Penelope Matloga, Tadesse Destaw Belay, Botshelo Kondowe, +4
Annotation quality is difficult to sustain when campaigns span weeks or months with small annotator pools. We present a Setswana sentiment dataset of 3,565 tweets annotated by three native-speaker annotators across eight batches and examine why inter-annotator agreement (IAA) declines over time. Despite an aggregate Ra…
- 2605.27236 · May 26, 2026 · ~12 min · cs.LG, physics.ao-ph
Explainable Comparison of Feature-Based and Deep Learning Models for TROPOMI Methane Plume Screening
Solomiia Kurchaba, Joannes D. Maasakkers, Berend J. Schuit, Ilse Aben
Continuous and global detection of large methane emissions is a crucial step for global warming mitigation. Satellite observations, such as from S5P/TROPOMI, combined with plume detection algorithms, can play a key role in this effort. However, not all TROPOMI plume detections that look like methane emission plumes are…
- 2605.27220 · May 26, 2026 · ~12 min · cs.CL, cs.IR
The Coverage Illusion: From Pre-retrieval Routing Failure to Post-retrieval Cascades in a Production RAG System
Zafar Hussain, Kristoffer Nielbo
In modern RAG pipelines, query augmentation methods such as HyDE and query expansion are applied to every query, resulting in substantial LLM inference costs and increased end-to-end latency. The empirical justification for this overhead in real production traffic remains largely unexplored. We present a case study of…
- 2605.27219 · May 26, 2026 · ~12 min · cs.LG, stat.ML
Nonlinear Data Integration via Kernel Methods for Data Collaboration Analysis
Yamato Suetake, Yuta Kawakami, Shunnosuke Ikeda, Yuichi Takano
Collaborative analysis of decentralized confidential datasets is important, but direct sharing of original datasets is often restricted by privacy and institutional constraints. Data collaboration (DC) analysis transforms each dataset into privacy-preserving intermediate representations via party-specific obfuscation f…