🚀 Shipping · score 74.4 · May 15, 2026 · 2605.16103 · cs.AI

Sign-Separated Finite-Time Error Analysis of Q-Learning

Donghwan Lee

Narrative

Constant step-size Q-learning is known to overestimate action values because of the max in the Bellman operator, but the asymmetry between how positive and negative errors propagate has lacked a clean theoretical treatment. This work decomposes the Q-learning error componentwise into negative and positive parts. The negative part is bounded by a stable LTI comparison system tied to a fixed optimal policy, and this bound decays at least as fast as the positive-part bound, which is governed by a linear switching system. The key finding is that positive errors can be selected and propagated by the Bellman maximum while negative errors are not, which gives a formal finite-time explanation for Q-learning's overestimation bias in both deterministic and stochastic settings.
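The max-induced asymmetry is already visible in a single state with no dynamics. The sketch below is an illustration, not code from the paper: it assumes a list of true action values `q_star` and additive errors `err`, and the function names `max_error`, `sign_bounds` and `mean_max_bias` are hypothetical. The error of the greedy value is bounded below by the error at an optimal action of `q_star` (so a negative error only enters through that one entry) and above by the largest positive error over all actions (so a positive error on any action can be selected). With tied true values and zero-mean noise, the max therefore has a positive average error.

```python
import random

def max_error(q_star, err):
    """Error in the greedy value: max_a (Q*(a) + e(a)) - max_a Q*(a)."""
    q_hat = [q + e for q, e in zip(q_star, err)]
    return max(q_hat) - max(q_star)

def sign_bounds(q_star, err):
    """Illustrative sign-separated bounds on max_error for one state.

    Lower: e(a*) at an optimal action a* of Q*, so a negative error only
    enters through the optimal action's own entry.
    Upper: max_a e^+(a), so a positive error on ANY action can be selected.
    """
    a_star = max(range(len(q_star)), key=lambda a: q_star[a])
    return err[a_star], max(max(e, 0.0) for e in err)

def mean_max_bias(n_actions=5, noise=1.0, trials=20000, seed=0):
    """Average max_error when all true values tie and errors are zero-mean Gaussian."""
    rng = random.Random(seed)
    q_star = [0.0] * n_actions
    total = 0.0
    for _ in range(trials):
        err = [rng.gauss(0.0, noise) for _ in range(n_actions)]
        total += max_error(q_star, err)
    return total / trials

# Usage: the greedy-value error sits between the two sign-separated bounds,
# and the average error under zero-mean noise is positive (overestimation).
lo, hi = sign_bounds([1.0, 0.0, 0.5], [-0.3, 0.8, -0.1])
print(lo, max_error([1.0, 0.0, 0.5], [-0.3, 0.8, -0.1]), hi)
print(mean_max_bias())
```

With five actions and unit noise the average should land near the expected maximum of five standard normals (about 1.16), even though every individual error has mean zero.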

No production traction yet. The GitHub repos referencing it are generic arXiv aggregators, not implementations or downstream research tools, and it has zero citations on Semantic Scholar. This is pure theoretical RL analysis: useful as a reference for researchers working on Q-learning convergence guarantees, but not something builders are picking up.

Abstract

This paper develops a sign-separated finite-time error analysis for constant step-size Q-learning. Starting from the switching-system representation, the error is decomposed into its componentwise negative and positive parts. The negative part is dominated by a lower comparison linear time-invariant (LTI) system associated with a fixed optimal policy, whereas the positive part is controlled by a linear switching system. The resulting bounds show that the negative-side LTI certificate is no slower than the positive-side switching certificate and may produce a faster exponential envelope. The analysis identifies a max-induced asymmetry in Q-learning error dynamics. This asymmetry is connected to overestimation: positive action-wise errors can be selected and propagated by the Bellman maximum, whereas negative errors admit an optimal-policy lower comparison. Finite-time bounds are provided for both deterministic and stochastic constant-step-size recursions.
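The sign-separated comparison can be checked numerically on a small tabular MDP. This is a minimal sketch, not the paper's code: it assumes the expected (deterministic) constant step-size update Q_{k+1} = Q_k + αD(R + γPΠ_{Q_k}Q_k - Q_k) from the earlier switching-system view of Q-learning, where D is the diagonal of a state-action sampling distribution d. The random MDP generator, the flattened index s * n_a + a and every function name are hypothetical. Writing A_π = I + α(γDPΠ_π - D), which is entrywise nonnegative when αd(s,a) ≤ 1, the negative part e_k^- is tracked by the fixed LTI system x_{k+1} = A_{π*}x_k and the positive part e_k^+ by the switching system y_{k+1} = A_{σ_k}y_k, with σ_k the greedy policy of Q_k. The code checks e_k^- ≤ x_k and e_k^+ ≤ y_k componentwise at every step, and that both comparison states stay inside the generic envelope ρ^k with ρ = 1 - α·min d·(1 - γ); the paper's own envelopes may be sharper.

```python
import random

def make_mdp(n_s, n_a, seed):
    """Random tabular MDP: transition rows P, rewards R, sampling weights d."""
    rng = random.Random(seed)
    n = n_s * n_a
    P = []
    for _ in range(n):
        w = [rng.random() + 1e-3 for _ in range(n_s)]
        z = sum(w)
        P.append([v / z for v in w])
    R = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    w = [rng.random() + 0.1 for _ in range(n)]
    z = sum(w)
    d = [v / z for v in w]  # diagonal of D, sums to 1
    return P, R, d

def greedy(Q, n_s, n_a):
    """Greedy policy: one argmax action per state."""
    return [max(range(n_a), key=lambda a: Q[s * n_a + a]) for s in range(n_s)]

def state_max(Q, n_s, n_a):
    """max_a Q(s, a) for every state s."""
    return [max(Q[s * n_a:(s + 1) * n_a]) for s in range(n_s)]

def solve_q_star(P, R, gamma, n_s, n_a, iters=2000):
    """Q* by repeated Bellman optimality backups (value iteration on Q)."""
    Q = [0.0] * (n_s * n_a)
    for _ in range(iters):
        v = state_max(Q, n_s, n_a)
        Q = [R[i] + gamma * sum(p * vj for p, vj in zip(P[i], v)) for i in range(len(Q))]
    return Q

def q_step(Q, P, R, d, alpha, gamma, n_s, n_a):
    """Expected constant step-size update: Q + alpha D (R + gamma P max Q - Q)."""
    v = state_max(Q, n_s, n_a)
    return [Q[i] + alpha * d[i] * (R[i] + gamma * sum(p * vj for p, vj in zip(P[i], v)) - Q[i])
            for i in range(len(Q))]

def apply_A(x, pi, P, d, alpha, gamma, n_s, n_a):
    """A_pi x = (I - alpha D) x + alpha gamma D P Pi_pi x."""
    v = [x[s * n_a + pi[s]] for s in range(n_s)]
    return [(1.0 - alpha * d[i]) * x[i]
            + alpha * gamma * d[i] * sum(p * vj for p, vj in zip(P[i], v))
            for i in range(len(x))]

def sign_separated_check(n_s=3, n_a=2, alpha=0.5, gamma=0.9, steps=300, seed=1, tol=1e-9):
    """Return (all bounds held, final max of x, final max of y)."""
    P, R, d = make_mdp(n_s, n_a, seed)
    assert alpha * max(d) <= 1.0  # keeps every A_pi entrywise nonnegative
    Qs = solve_q_star(P, R, gamma, n_s, n_a)
    pi_star = greedy(Qs, n_s, n_a)
    rng = random.Random(seed + 1000)
    Q = [rng.uniform(-5.0, 5.0) for _ in Qs]      # arbitrary initial estimate Q_0
    e = [q - qs for q, qs in zip(Q, Qs)]
    x = [max(-v, 0.0) for v in e]                  # lower LTI comparison, x_0 = e_0^-
    y = [max(v, 0.0) for v in e]                   # upper switching comparison, y_0 = e_0^+
    rho = 1.0 - alpha * min(d) * (1.0 - gamma)     # generic contraction factor of every A_pi
    x0, y0 = max(x), max(y)
    ok = True
    for k in range(1, steps + 1):
        sigma = greedy(Q, n_s, n_a)                # switching signal: greedy policy of Q_k
        Q = q_step(Q, P, R, d, alpha, gamma, n_s, n_a)
        x = apply_A(x, pi_star, P, d, alpha, gamma, n_s, n_a)
        y = apply_A(y, sigma, P, d, alpha, gamma, n_s, n_a)
        e = [q - qs for q, qs in zip(Q, Qs)]
        ok &= all(max(-v, 0.0) <= xi + tol for v, xi in zip(e, x))  # e_k^- <= x_k
        ok &= all(max(v, 0.0) <= yi + tol for v, yi in zip(e, y))   # e_k^+ <= y_k
        ok &= max(x) <= rho ** k * x0 + tol and max(y) <= rho ** k * y0 + tol
    return ok, max(x), max(y)

print(sign_separated_check())
```

Since A_{π*} is itself one member of the family {A_π}, any exponential envelope that holds for every switching sequence also holds for the constant sequence π*, π*, ...; this is why the negative-side certificate can be no slower, while the single fixed matrix may admit a faster one.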


Signal

Stars: 208 (total across the referencing repos)
Repos: 6
Citations: 0
Velocity: 0.00/day

GitHub repos (6)

Tavish9/awesome-daily-AI-arxiv⭐ 92
“ Probabilistic forecasting of infectious diseases is crucial for public health but relies on labor-intensive manual model curation by expert modeling teams. This bespoke development bottlenecks scalability to granular geographic resolutions or emerging pathogens. Here, we presen”
tangwen-qian/DailyArXiv⭐ 54
“| **[How Far Back in Time a Digital Twin Reflects the State of the Physical Object: Age of Staleness](https://arxiv.org/abs/2605.16176v1)** | 2026-05-15 | | | **[SwAIther-Precip: Lead-Time-Aware Bias Correction Enables Kilometer-Scale Downscaling of Global AI Precipitation Forec”
CSQianDong/Awesome-arXiv-Daily-Reporter⭐ 47
“{'arxiv_id': 'arXiv:2605.16143', 'title': 'Look Before You Leap: Autonomous Exploration for LLM Agents', 'authors': 'Ziang Ye, Wentao Shi, Yuxin Liu, Yu Wang, Zhengzhou Cai, Yaorui Shi, Qi Gu, Xunliang Cai, Fuli Feng', 'link': 'https://arxiv.org/abs/2605.16143', 'abstract': 'Larg”
lonePatient/lonePatient.github.io⭐ 9
“{% hideToggle Click to view abstract %} {% note blue no-icon %} ID-74-Sign-Separated Finite-Time Error Analysis of Q-Learning {% endnote %} **Link**: https://arxiv.org/abs/2605.16103 **Authors**: Donghwan Lee **Category**: Artificial Intelligence (cs.AI) ***Comments**: Donghwan Lee”
2shin0/arxiv-ai-mailing⭐ 6
“ ## 11. Sign-Separated Finite-Time Error Analysis of Q-Learning - **Authors**: Donghwan Lee - **URL**: [https://arxiv.org/abs/2605.16103](https://arxiv.org/abs/2605.16103) - **Abstract**: > This paper develops a sign-separated finite-time error analysis for constant step-size Q-l”
mickdur/tech-watch⭐ 0
“ "https://arxiv.org/abs/2605.16089": "2026-05-18T07:51:44.206446+00:00", "https://arxiv.org/abs/2605.16094": "2026-05-18T07:51:44.206446+00:00", "https://arxiv.org/abs/2605.16099": "2026-05-18T07:51:44.206446+00:00", "https://arxiv.org/abs/2605.16103": "2026-05-18T07:51:44”