🚀Shippingscore 99.0May 15, 2026·2605.16215cs.AIcs.CL

Fully Open Meditron: An Auditable Pipeline for Clinical LLMs

Xavier Theimer-Lienhard, Mushtaha El-Amin, Fay Elhassan, Sahaj Vaidya, Victor Cartier-Negadi, David Sasu, Lars Klein, Mary-Anne Hartley

PDF ↗arXiv ↗

Narrative

Fully Open Meditron is a complete, auditable training pipeline for clinical LLMs — releasing not just model weights but the full data provenance, curation code, and evaluation protocol. The training corpus unifies eight medical QA datasets plus three synthetic extensions (46,469 clinical guidelines, exam-style QA, clinical vignettes), with decontamination and validation by a four-physician panel. Applied to five base models, the best variant (Apertus-70B-MeditronFO) improves +6.6 points on aggregate medical benchmarks over its base, and Gemma-3-27B-MeditronFO beats MedGemma on HealthBench (58% vs 55.9%) and in 58.6% of head-to-head judge comparisons.

No production traction yet — zero citations and the GitHub references are all automated arxiv digest trackers, not downstream builders. The Meditron brand has prior academic recognition from EPFL's earlier work, so this will likely attract attention from health system AI teams navigating regulatory scrutiny over opaque training pipelines, but nothing is shipping against it yet.

Abstract

Clinical decision support systems (CDSS) require scrutable, auditable pipelines that enable rigorous, reproducible validation. Yet current LLM-based CDSS remain largely opaque. Most "open" models are open-weight only, releasing parameters while withholding the data provenance, curation procedures, and generation pipelines that determine model behavior. Fully Open (FO) models, which expose the complete training stack end-to-end, do not currently exist in medicine. We introduce Fully Open Meditron, the first fully open pipeline for building LLM-CDSS, comprising a clinician-audited training corpus, a reproducible data construction and training framework, and a use-aligned evaluation protocol. The corpus unifies eight public medical QA datasets into a normalized conversational format and expands coverage with three clinician-vetted synthetic extensions: exam-style QA, guideline-grounded QA derived from 46,469 clinical practice guidelines, and clinical vignettes. The pipeline enforces system-wide decontamination, gold-label resampling of teacher generations, and end-to-end validation by a four-physician panel. We evaluate using an LLM-as-a-judge protocol over expert-written clinical vignettes, calibrated against 204 human raters. We apply the recipe to five FO base models (Apertus-70B/8B-Instruct, OLMo-2-32B-SFT, EuroLLM-22B/9B-Instruct). All MeditronFO variants are preferred over their bases. Apertus-70B-MeditronFO improves +6.6 points over its base (47.2% to 53.8%) on aggregate medical benchmarks, establishing a new FO SoTA. Gemma-3-27B-MeditronFO is preferred over MedGemma in 58.6% of LLM-as-a-judge comparisons and outperforms it on HealthBench (58% vs 55.9%). These results show that fully open pipelines can achieve state-of-the-art domain-specific performance without sacrificing auditability or reproducibility.

Citation timeline

Not enough citation snapshots yet to plot a timeline. Come back after a few cron runs.

Signal

Stars: 434
Repos: 25
Citations: 0
Velocity: 0.00/d

GitHub repos (20)

luohongk/Embodied-AI-Daily⭐ 245
“| **Title** | **Date** | **Comment** | | --- | --- | --- | | **[Prospective multi-pathogen disease forecasting using autonomous LLM-guided tree search](https://arxiv.org/abs/2605.16238v1)** | 2026-05-15 | | | **[Fully Open Meditron: An Auditable Pipeline for Clinical LLMs](https”
Tavish9/awesome-daily-AI-arxiv⭐ 92
“ Large language models can generate executable code for educational animations, but the resulting renders often exhibit visual defects, including element overlap, misalignment, and broken animation continuity. These defects cannot be reliably detected from the code alone and bec”
CSQianDong/Awesome-arXiv-Daily-Reporter⭐ 47
“{'arxiv_id': 'arXiv:2605.16238', 'title': 'Prospective multi-pathogen disease forecasting using autonomous LLM-guided tree search', 'authors': 'Sarah Martinson, Michael P. Brenner, Martyna Plomecka, Brian P. Williams, Nicholas G. Reich, Zahra Shamsi', 'link': 'https://arxiv.org/a”
ZenAlexa/agi-brief-history⭐ 11
“- **Summary**: Deep research agents have achieved remarkable progress on complex information seeking tasks. Even long ReAct style rollouts explore only a single trajectory, while recent state of the art systems scale inference time compute via parallel search and aggregation. Yet”
ehijano/rss_fetch⭐ 11
“ </item> <item> <title>Fully Open Meditron: An Auditable Pipeline for Clinical LLMs</title> <link>https://arxiv.org/abs/2605.16215</link> <description>arXiv:2605.16215v1 Announce Type: cross Abstract: Clinical decision support systems (CDSS) require scru”
lonePatient/lonePatient.github.io⭐ 9
“{% hideToggle 点击查看摘要 %} {% note blue no-icon %} ID-20-Fully Open Meditron: An Auditable Pipeline for Clinical LLMs {% endnote %} **链接**: https://arxiv.org/abs/2605.16215 **作者**: Xavier Theimer-Lienhard,Mushtaha El-Amin,Fay Elhassan,Sahaj Vaidya,Victor Cartier-Negadi,David Sasu,L”
jyyang621/DailyArXiv⭐ 8
“| **Title** | **Date** | **Comment** | | --- | --- | --- | | **[Prospective multi-pathogen disease forecasting using autonomous LLM-guided tree search](https://arxiv.org/abs/2605.16238v1)** | 2026-05-15 | | | **[Fully Open Meditron: An Auditable Pipeline for Clinical LLMs](https”
2shin0/arxiv-ai-mailing⭐ 6
“ ## 3. Fully Open Meditron: An Auditable Pipeline for Clinical LLMs - **Authors**: Xavier Theimer-Lienhard , Mushtaha El-Amin , Fay Elhassan , Sahaj Vaidya , Victor Cartier-Negadi , David Sasu , Lars Klein , Mary-Anne Hartley - **URL**: [https://arxiv.org/abs/2605.16215](https://”
MayDomine/arxiv_rss_bot⭐ 3
“ --- ### 30. [Fully Open Meditron: An Auditable Pipeline for Clinical LLMs](https://arxiv.org/abs/2605.16215) **Authors**: Xavier Theimer-Lienhard, Mushtaha El-Amin, Fay Elhassan, Sahaj Vaidya, Victor Cartier-Negadi, David Sasu, Lars Klein, Mary-Anne Hartley **Category**: cs.”
NeoCodeSmith/NeoSignal⭐ 1
“ { "id": "d65ba83733bd", "title": "Fully Open Meditron: An Auditable Pipeline for Clinical LLMs", "url": "https://arxiv.org/abs/2605.16215", "summary": "arXiv:2605.16215v1 Announce Type: new Abstract: Clinical decision support systems (CDSS) require scr”
noCharger/ai-dashboard⭐ 1
“ "Mushtaha El-Amin", "Fay Elhassan" ], "url": "https://arxiv.org/abs/2605.16215", "categories": [ "cs.AI", "cs.CL"”
mghnasiri/PORID⭐ 0
“ </item> <item> <title>Fully Open Meditron: An Auditable Pipeline for Clinical LLMs</title> <link>http://arxiv.org/abs/2605.16215v1</link> <guid isPermaLink="true">http://arxiv.org/abs/2605.16215v1</guid> <description>Xavier Theimer-Lienhard, Mushta”
mickdur/tech-watch⭐ 0
“ "https://arxiv.org/abs/2605.16198": "2026-05-18T07:51:44.206446+00:00", "https://arxiv.org/abs/2605.16205": "2026-05-18T07:51:44.206446+00:00", "https://arxiv.org/abs/2605.16207": "2026-05-18T07:51:44.206446+00:00", "https://arxiv.org/abs/2605.16215": "2026-05-18T07:51:44”
pchaganti/pchaganti.github.io⭐ 0
“ { "title": "Fully Open Meditron: An Auditable Pipeline for Clinical LLMs", "summary": "Clinical decision support systems (CDSS) require scrutable, auditable pipelines that enable rigorous, reproducible validation. Yet current LLM-based CDSS remain largely opaque. ”
shaijing/arxiv-paper⭐ 0
“| **Title** | **Date** | **Abstract** | **Comment** | | --- | --- | --- | --- | | **[Prospective multi-pathogen disease forecasting using autonomous LLM-guided tree search](https://arxiv.org/abs/2605.16238v1)** | 2026-05-15 | <details><summary>Show</summary>Probabilistic forec”
sirichen2/sirichen2.github.io⭐ 0
“ "Lars Klein", "Mary-Anne Hartley" ], "abs_url": "https://arxiv.org/abs/2605.16215v1", "pdf_url": "https://arxiv.org/pdf/2605.16215v1", "published": "2026-05-15T17:29:08+00:00", "updated": "2026-05-15T17:29:08+00:00",”
stephendongg/ai-tns-digest⭐ 0
“- [Formal Methods Meet LLMs: Auditing, Monitoring, and Intervention for Compliance of Advanced AI Systems](https://arxiv.org/abs/2605.16198v1) - It proposes concrete methods for lifecycle auditing and compliance monitoring of advanced AI systems. Source: arXiv. Published: 2026-05”
brianbaldock/aigregator⭐ 0
“<ul> <li>4 🔬 🟡 ▤×1 🏷️ agents, memory FORGE: self-evolving agent memory via population broadcast, no weight updates. Staged population-based protocol evolves prompt-injected natural-language memory; agents improve decisions without gradient steps. Sources: <a h”
Varelser/varelser.github.io⭐ 0
“ <article class="digest-article" data-month="2026-05" data-genre="LLM / NLP" data-field="cs.AI" data-source="arXiv cs.AI" data-area="AI" data-categories="cs.AI, cs.CL" data-title="Fully Open Meditron: 監査可能な臨床LLMパイプライン" data-timestamp="1778866148000" data-url="https://arxiv.org/ab”
Xkrilandar/xavier-theimer-lienhard.github.io⭐ 0
“ <ul style="margin: 1rem 0; color: #6b7280;"> <li>MeditronFO (<a href="https://arxiv.org/abs/2605.16215">preprint</a>, <a href="https://huggingface.co/collections/EPFLiGHT/meditronfo">🤗models</a>), the first fully open medical LLMs.</li>”