💤 Quietscore 71.4 · May 15, 2026 · 2605.16234 · cs.LG, cs.AI, cs.CL

Layer Equivalence Is Not a Property of Layers Alone: How You Test Redundancy Changes What You Find

Gabriel Garcia

Narrative

Two swap-KL probes for measuring layer redundancy in transformers can disagree substantially about which layers are safe to prune: replacement (can layer A substitute for layer B in place?) and interchange (do two layers approximately commute when their positions are swapped?). Across Pythia 410M and 1.4B training checkpoints, the gap between the two protocols grows from initialization to convergence. At 8B scale, Qwen3-8B shows interchange-guided removal to be several-fold safer than replacement-guided removal at the same pruning budgets, while Llama-3.1-8B ties the two protocols on pruning cost despite lower interchange KL, so a metric gap does not reliably predict a safety gap. The practical upshot: running both probes requires only unlabeled forward passes and can materially change which layers a compression pipeline flags as removable.
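
One way to write the two probes down, as a reading of the summary rather than the paper's own notation: let p_θ(· | x) be the model's next-token distribution on unlabeled input x, let θ[i ← j] be the same model with layer i's map replaced by layer j's, and let θ[i ↔ j] be the model with layers i and j exchanged. The KL direction (base model first) and the plain average over inputs are assumptions made here for illustration.

```latex
% Replacement: layer j's map substituted in place at position i
D_{\mathrm{rep}}(i \leftarrow j)
  = \mathbb{E}_{x}\!\left[
      \mathrm{KL}\!\left( p_{\theta}(\cdot \mid x) \,\middle\|\, p_{\theta[i \leftarrow j]}(\cdot \mid x) \right)
    \right]

% Interchange: layers i and j swap positions
D_{\mathrm{int}}(i, j)
  = \mathbb{E}_{x}\!\left[
      \mathrm{KL}\!\left( p_{\theta}(\cdot \mid x) \,\middle\|\, p_{\theta[i \leftrightarrow j]}(\cdot \mid x) \right)
    \right]
```

Under this reading D_int is symmetric in (i, j) while D_rep is not, which is one reason the two scores need not rank layer pairs the same way.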

No production traction yet: the paper has zero citations, and the GitHub repos that mention it are all automated arXiv feed scrapers with no implementation work. The core diagnostic is straightforward to implement on top of any existing layer-pruning codebase, but no one has built that out publicly as of this writing.

Abstract

When researchers ask whether two transformer layers are "equivalent" for compression, they often conflate distinct tests. Replacement asks whether one layer's map can substitute for another's in place; interchange asks whether two layers approximately commute when their positions are swapped. Both are output-grounded swap-KL probes, but they need not agree: on pretrained transformers the protocol gap can change which layers look safe to prune by several-fold under the same evaluator, especially when replacement distances are high. We measure both protocols across checkpoints and architectures. On a Pythia training trajectory (410M and 1.4B), the replacement-interchange gap grows from initialization to convergence. Under one matched WikiText-2 contract at 8B scale, Qwen3-8B enters a divergent regime: interchange-guided removal is several-fold safer than replacement-guided at the same layer budgets, while Llama-3.1-8B ties the two protocols for pruning cost even though interchange KL is lower, showing metric gaps need not map one-to-one to removal. Before layer removal or merging, score both swap-KLs on the target checkpoint; the diagnostic requires only unlabeled forward passes.
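
The recommendation to score both swap-KLs can be sketched on a toy model. This is a minimal illustration, not the author's implementation (the paper's code is not public): the residual stack `h <- h + tanh(W h)`, the KL direction (base model first), and all function names (`forward`, `replacement_kl`, `interchange_kl`, `random_matrix`) are assumptions introduced here.

```python
import math
import random


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def forward(layers, unembed, h):
    """Toy residual stack (an assumption): h <- h + tanh(W h), then softmax."""
    for W in layers:
        h = [x + math.tanh(u) for x, u in zip(h, matvec(W, h))]
    logits = matvec(unembed, h)
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mean_kl(layers, edited, unembed, inputs):
    """Average KL(base || edited) over unlabeled inputs."""
    total = sum(
        kl(forward(layers, unembed, x), forward(edited, unembed, x)) for x in inputs
    )
    return total / len(inputs)


def replacement_kl(layers, unembed, inputs, i, j):
    """Replacement probe: put layer j's map in position i, leave the rest alone."""
    edited = list(layers)
    edited[i] = layers[j]
    return mean_kl(layers, edited, unembed, inputs)


def interchange_kl(layers, unembed, inputs, i, j):
    """Interchange probe: swap the positions of layers i and j."""
    edited = list(layers)
    edited[i], edited[j] = layers[j], layers[i]
    return mean_kl(layers, edited, unembed, inputs)


def random_matrix(rng, rows, cols, scale):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d, vocab, depth = 8, 16, 6
layers = [random_matrix(rng, d, d, 0.3) for _ in range(depth)]
unembed = random_matrix(rng, vocab, d, 0.5)
inputs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(32)]

# Score adjacent pairs under both protocols; no labels are needed.
for i in range(depth - 1):
    rep = replacement_kl(layers, unembed, inputs, i, i + 1)
    inter = interchange_kl(layers, unembed, inputs, i, i + 1)
    print(f"layers {i},{i + 1}: replacement={rep:.4f} interchange={inter:.4f}")
```

On a real checkpoint, `forward` would be the model's own forward pass over unlabeled text and the edits would be applied to its list of transformer blocks; how the resulting scores are turned into a removal order is a separate design choice that this sketch does not cover.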

Signal

Stars: 73
Repos: 10
Citations: 0
Velocity: 0.00/d

GitHub repos (11)

CSQianDong/Awesome-arXiv-Daily-Reporter⭐ 47
“{'arxiv_id': 'arXiv:2605.16250', 'title': 'A Generative AI Framework for Intelligent Utility Billing CO 2 Analytics and Sustainable Resource Optimisation', 'authors': 'Pavan Manjunath, Thomas Pruefer', 'link': 'https://arxiv.org/abs/2605.16250', 'abstract': 'Distribution utilitie”
ehijano/rss_fetch⭐ 11
“ </item> <item> <title>Layer Equivalence Is Not a Property of Layers Alone: How You Test Redundancy Changes What You Find</title> <link>https://arxiv.org/abs/2605.16234</link> <description>arXiv:2605.16234v1 Announce Type: cross Abstract: When researcher”
lonePatient/lonePatient.github.io⭐ 9
“{% hideToggle Click to view abstract %} {% note blue no-icon %} ID-10-Layer Equivalence Is Not a Property of Layers Alone: How You Test Redundancy Changes What You Find {% endnote %} **Link**: https://arxiv.org/abs/2605.16234 **Authors**: Gabriel Garcia **Category**: Machine Learning (cs.LG); Artificial Int”
2shin0/arxiv-ai-mailing⭐ 6
“ ## 57. Layer Equivalence Is Not a Property of Layers Alone: How You Test Redundancy Changes What You Find - **Authors**: Gabriel Garcia - **URL**: [https://arxiv.org/abs/2605.16234](https://arxiv.org/abs/2605.16234) - **Abstract**: > When researchers ask whether two transformer ”
ttmens/ai-radar-wiki⭐ 5
“ "pillar": "capabilities", "type": "papers", "source_type": "papers", "url": "http://arxiv.org/abs/2605.16234v1", "date": "2026-05-18" }, {”
brianbaldock/aigregator⭐ 0
“<li>4 🔬 🟡 ▤×1 🏷️ agents, memory <strong>FORGE: self-evolving agent memory via population broadcast, no weight updates.</strong> Staged population-based protocol evolves prompt-injected natural-language memory; agents improve decisions without gradient steps. Sources: <a href="”
mghnasiri/PORID⭐ 0
“ { "title": "Layer Equivalence Is Not a Property of Layers Alone: How You Test Redundancy Changes What You Find", "authors": "Gabriel Garcia", "url": "http://arxiv.org/abs/2605.16234v1", "date": "2026-05-15" }, {”
Jinsu-L/DailyIR⭐ 0
“- **LLM Score**: 2 - **Keyword Score**: 1 - **Authors**: Gabriel Garcia - **URL**: <http://arxiv.org/abs/2605.16234v1> - **Submitted**: 2026-05-15 17:43:16 - **Comment**: 40 pages, 8 figures, 24 tables. Code and frozen JSON logs are not public during write-up; the authors plan to”
mirae0708/steven⭐ 0
“ > **Source:** [arXiv](http://arxiv.org/abs/2605.16234v1) > **Category:** Artificial_Intelligence/LLM”
sirichen2/sirichen2.github.io⭐ 0
“ "authors": [ "Gabriel Garcia" ], "abs_url": "https://arxiv.org/abs/2605.16234v1", "pdf_url": "https://arxiv.org/pdf/2605.16234v1", "published": "2026-05-15T17:43:16+00:00", "updated": "2026-05-15T17:43:16+00:00",”
mickdur/tech-watch⭐ 0
“ "https://arxiv.org/abs/2605.16223": "2026-05-18T07:51:44.206446+00:00", "https://arxiv.org/abs/2605.16232": "2026-05-18T07:51:44.206446+00:00", "https://arxiv.org/abs/2605.16233": "2026-05-18T07:51:44.206446+00:00", "https://arxiv.org/abs/2605.16234": "2026-05-18T07:51:44”