🚀 Shipping score 76.6 · May 15, 2026 · 2605.16107 · cs.CL

Multi-Level Contextual Token Relation Modeling for Machine-Generated Text Detection

Chenwang Wu, Yiuming Cheung, Bo Han, Shuhai Zhang, Defu Lian

Narrative

Token-level detection scores from machine-generated text (MGT) detectors are noisy because LLM outputs are inherently stochastic: the same prompt can produce text with wildly varying per-token log-probabilities. This paper tackles that by modeling how token-level scores relate to one another across a sequence. A Markov-informed calibration module smooths local transitions, while a rule-support reasoning module applies logical rules derived from global score statistics. The combined framework sits on top of existing metric-based detectors (such as DetectGPT and Fast-DetectGPT) rather than replacing them, and the authors report broad gains on cross-LLM and cross-domain benchmarks with little added compute.
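The paper's exact calibration module is not given here, so the following is only a minimal sketch of the "stack a local calibration layer on a base detector" idea. It assumes per-token scores are already available from some base detector (for example, token log-probabilities) and uses a hypothetical neighbour-mixing transition matrix as a stand-in for the Markov-informed calibration. `neighbour_transition_matrix`, `markov_smooth`, `detect`, `stay`, and `hops` are illustrative names, not the authors' API.

```python
import numpy as np

def neighbour_transition_matrix(n: int, stay: float = 0.6) -> np.ndarray:
    """Hypothetical row-stochastic matrix: each token keeps `stay` of its own
    weight and splits the remainder evenly across its immediate neighbours."""
    P = np.zeros((n, n))
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        if not nbrs:  # a single-token text has nothing to mix with
            P[i, i] = 1.0
            continue
        P[i, i] = stay
        for j in nbrs:
            P[i, j] = (1.0 - stay) / len(nbrs)
    return P

def markov_smooth(scores, stay: float = 0.6, hops: int = 2) -> np.ndarray:
    """Apply the transition matrix `hops` times, so each calibrated score mixes
    evidence from tokens up to `hops` positions away (multi-hop transitions)."""
    s = np.asarray(scores, dtype=float)
    P = neighbour_transition_matrix(len(s), stay)
    return np.linalg.matrix_power(P, hops) @ s

def detect(token_scores, **kw) -> float:
    """Aggregate calibrated token scores into one text-level score (mean)."""
    return float(markov_smooth(token_scores, **kw).mean())

# Usage: made-up per-token log-probabilities with one outlier token.
scores = [-1.0, -1.0, -8.0, -1.0, -1.0]
print(markov_smooth(scores))
print(detect(scores))
```

Because every row of the matrix sums to 1, a constant score sequence passes through unchanged, while an isolated spike is pulled toward its neighbours before aggregation. `stay` and `hops` trade off how much local context each token absorbs.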

No production traction yet. The linked GitHub repositories are all arXiv feed aggregators, not implementations, and the paper had zero citations at the time of writing. The work is recent, and the underlying idea (stacking a lightweight inference layer on existing zero-shot detectors) is practical enough to ship, but the authors have not deployed or open-sourced anything as of now.

Abstract

Machine-generated texts (MGTs) pose risks such as disinformation and phishing, underscoring the need for reliable detection. Metric-based methods, which extract statistically distinguishable features of MGTs, are often more practical than complex model-based methods that are prone to overfitting. Given their diverse designs, we first place representative metric-based methods within a unified framework, enabling a clear assessment of their advantages and limitations. Our analysis identifies a core challenge across these methods: the token-level detection score is easily biased by the inherent randomness of the MGT generation process. Then, we theoretically derive the multi-hop transitions of the token-level detection score and explore their local and global relations. Based on these findings, we propose a multi-level contextual token relation modeling framework for MGT detection. Specifically, for local relations, we model them through a lightweight Markov-informed calibration module that refines token-level evidence before aggregation. For global relations, we introduce a rule-support reasoning module that uses explicit logical rules derived from contextual score statistics. Finally, we combine the local calibrated score and the global rule-support reasoning signal in a joint multi-level inference framework. Extensive experiments show broad and substantial improvements across various real-world scenarios, including cross-LLM and cross-domain settings, with low computational overhead.
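The abstract does not spell out the logical rules or how the local and global signals are combined, so this sketch is hypothetical throughout. It computes a few global statistics over a text's (already calibrated) token scores, evaluates weighted predicates over them as stand-ins for the paper's rules, and linearly blends the mean local score with the fraction of rule weight that fires. `Rule`, `context_stats`, `rule_support`, `joint_score`, the thresholds, and `alpha` are illustrative assumptions, not the authors' definitions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

import numpy as np

Stats = Dict[str, float]

@dataclass
class Rule:
    """A hypothetical logical rule: a weighted predicate over global statistics."""
    name: str
    weight: float
    predicate: Callable[[Stats], bool]

def context_stats(scores: Sequence[float], high: float) -> Stats:
    """Global statistics of a text's token scores; `high` marks a 'high-likelihood' token."""
    s = np.asarray(scores, dtype=float)
    return {
        "mean": float(s.mean()),
        "std": float(s.std()),
        "frac_high": float((s > high).mean()),
    }

def rule_support(stats: Stats, rules: List[Rule]) -> float:
    """Weighted fraction of rules that fire, in [0, 1] (0 if there are no rules)."""
    total = sum(r.weight for r in rules)
    fired = sum(r.weight for r in rules if r.predicate(stats))
    return fired / total if total > 0 else 0.0

def joint_score(calibrated: Sequence[float], rules: List[Rule], high: float,
                alpha: float = 0.5) -> float:
    """Linear blend of the local signal (mean calibrated score) and the global
    rule-support signal; higher means 'more machine-like' under these assumptions."""
    stats = context_stats(calibrated, high)
    return alpha * stats["mean"] + (1.0 - alpha) * rule_support(stats, rules)

# Illustrative rules (assumed heuristics, not taken from the paper).
RULES = [
    Rule("low score spread", 1.0, lambda st: st["std"] < 1.0),
    Rule("mostly high-likelihood tokens", 1.0, lambda st: st["frac_high"] >= 0.6),
]

# Usage with made-up calibrated log-probability scores.
machine_like = [-0.5, -0.6, -0.4, -0.7, -0.5]
human_like = [-0.5, -4.0, -1.5, -6.0, -0.3]
print(joint_score(machine_like, RULES, high=-1.0))
print(joint_score(human_like, RULES, high=-1.0))
```

The local mean and the rule signal live on different scales, so in practice `alpha`, the rule thresholds, and any final decision threshold would need to be fit on labelled held-out data rather than set by hand as here.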


Signal

Stars: 180
Repos: 8
Citations: 0
Velocity: 0.00/d

GitHub repos (8)

Tavish9/awesome-daily-AI-arxiv⭐ 92
CSQianDong/Awesome-arXiv-Daily-Reporter⭐ 47
wwd29/arxiv-daily⭐ 21
ehijano/rss_fetch⭐ 11
lonePatient/lonePatient.github.io⭐ 9
pchaganti/pchaganti.github.io⭐ 0
sirichen2/sirichen2.github.io⭐ 0
Jinsu-L/DailyIR⭐ 0