🚀 Shipping · score 79.0 · May 15, 2026 · 2605.16238 · cs.AI

Prospective multi-pathogen disease forecasting using autonomous LLM-guided tree search

Sarah Martinson, Michael P. Brenner, Martyna Plomecka, Brian P. Williams, Nicholas G. Reich, Zahra Shamsi

Narrative

An LLM-guided tree search system autonomously writes, runs, and refines epidemiological forecasting code, iterating on model candidates without human intervention. In a fully prospective, real-time evaluation during the 2025–2026 US respiratory season, an ensemble of the machine-generated models matched or outperformed the CDC's human-curated hub ensembles out-of-sample for influenza, COVID-19, and RSV. Two engineering details stand out from the retrospective ablations: optimizing log-scale distance metrics prevents reward hacking, and an automated judge in the loop keeps the generated code structurally faithful to epidemiological theory rather than merely fitting patterns in the data.
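The generate, run, score, refine loop described above can be sketched as a best-first tree search. This is a toy under stated assumptions, not the authors' system: each candidate is a single exponential-smoothing parameter rather than generated code, `propose` is a hypothetical stand-in for the LLM proposal step, `SERIES` is synthetic data made up for illustration, and the automated judge is omitted.

```python
import heapq
import itertools
import math
from dataclasses import dataclass
from typing import Callable, List, Optional

# Synthetic weekly counts, invented for this illustration only.
SERIES = [12, 15, 22, 31, 48, 70, 95, 120, 110, 90, 64, 40]


@dataclass
class Node:
    candidate: float                 # stand-in for a generated forecasting program
    score: float                     # lower is better
    parent: Optional["Node"] = None
    depth: int = 0


def evaluate(alpha: float) -> float:
    """Mean absolute one-step-ahead error of exponential smoothing on the log1p scale."""
    level = float(SERIES[0])
    errors = []
    for y in SERIES[1:]:
        errors.append(abs(math.log1p(y) - math.log1p(level)))
        level = alpha * y + (1 - alpha) * level
    return sum(errors) / len(errors)


def propose(alpha: float, k: int) -> List[float]:
    """Hypothetical proposal step: small perturbations of the parent candidate."""
    steps = [-0.1, 0.1, 0.05][:k]
    return [min(1.0, max(0.0, round(alpha + s, 2))) for s in steps]


def tree_search(
    root_candidate: float,
    propose_fn: Callable[[float, int], List[float]],
    evaluate_fn: Callable[[float], float],
    budget: int = 30,
    branching: int = 3,
) -> Node:
    """Expand the best-scoring frontier node until the expansion budget is spent."""
    tie = itertools.count()  # tie-breaker so the heap never compares Node objects
    root = Node(root_candidate, evaluate_fn(root_candidate))
    frontier = [(root.score, next(tie), root)]
    best = root
    expansions = 0
    while frontier and expansions < budget:
        _, _, node = heapq.heappop(frontier)
        expansions += 1
        for cand in propose_fn(node.candidate, branching):
            child = Node(cand, evaluate_fn(cand), parent=node, depth=node.depth + 1)
            if child.score < best.score:
                best = child
            heapq.heappush(frontier, (child.score, next(tie), child))
    return best


best = tree_search(0.2, propose, evaluate)
print(f"best alpha={best.candidate}, score={best.score:.3f}, depth={best.depth}")
```

In this toy, each node keeps a pointer to its parent, so the refinement path behind the best candidate can be reconstructed by following `parent` links back to the root.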

No production traction yet: the paper has zero citations, and the GitHub references are all AI news aggregators rather than implementations or forks. The work comes out of a team with ties to Harvard (Brenner) and UMass (Reich, who runs FluSight), which gives it credibility in the CDC forecasting ecosystem, and the prospective real-world evaluation is stronger evidence than most ML-for-epidemiology papers offer. Worth watching for whether it integrates into CDC FluSight infrastructure or spawns an open toolkit, but nothing is shipping today.

Abstract

Probabilistic forecasting of infectious diseases is crucial for public health but relies on labor-intensive manual model curation by expert modeling teams. This bespoke development bottlenecks scalability to granular geographic resolutions or emerging pathogens. Here, we present an autonomous system using Large Language Model (LLM)-guided tree search to iteratively generate, evaluate, and optimize executable forecasting software. In a fully prospective, real-time evaluation during the 2025-2026 US respiratory season, the system autonomously discovered methodologically diverse models for influenza, COVID-19, and respiratory syncytial virus (RSV). Aggregating these machine-generated models yielded an ensemble that consistently matched or outperformed the gold-standard, human-curated Centers for Disease Control and Prevention (CDC) hub ensembles out-of-sample. The system successfully navigated data-scarce "cold start" scenarios for RSV. Moreover, controlled retrospective ablations revealed that optimizing log-scale distance metrics prevents reward hacking, while an automated judge-in-the-loop ensures structural fidelity to complex scientific theories. By autonomously translating epidemiological theory into accurate, transparent code, this framework overcomes the modeling labor bottleneck, enabling rapid deployment of expert-level disease forecasting at unprecedented scales.
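The abstract does not define its log-scale distance metric, so the following is only a sketch, assuming absolute error on log1p-transformed counts. It uses two synthetic locations and two hypothetical models to show how the choice of scale can change which candidate a search would prefer. Whether this is the mechanism behind the reported reward-hacking result is not stated in the abstract.

```python
import math
from typing import Sequence


def mae(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Mean absolute error on the raw count scale."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def log_mae(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Mean absolute error after a log1p transform (assumed metric, not the paper's)."""
    return sum(abs(math.log1p(p) - math.log1p(o)) for p, o in zip(pred, obs)) / len(obs)


# Synthetic observations: one large and one small jurisdiction.
observed = [10_000, 10]

# Hypothetical candidates.
model_a = [10_100, 100]  # about 1% off on the large location, 10x off on the small one
model_b = [11_000, 12]   # 10% off on the large location, close on the small one

for name, pred in [("model_a", model_a), ("model_b", model_b)]:
    print(name, "raw MAE:", mae(pred, observed), "log MAE:", round(log_mae(pred, observed), 3))
```

On the raw scale, the large location dominates the average, so model_a scores better despite missing the small location by a factor of ten. On the log scale, errors are closer to relative errors, so model_b scores better.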

Signal

Stars: 171
Repos: 10
Citations: 0
Velocity: 0.00/d

GitHub repos (12)

Tavish9/awesome-daily-AI-arxiv⭐ 92
“ Rigorous evaluation of domain-specific language models requires benchmarks that are comprehensive, contamination-resistant, and maintainable. Static, manually curated datasets do not satisfy these properties. We present a graph-based evaluation harness that transforms structure”
CSQianDong/Awesome-arXiv-Daily-Reporter⭐ 47
“{'arxiv_id': 'arXiv:2605.16238', 'title': 'Prospective multi-pathogen disease forecasting using autonomous LLM-guided tree search', 'authors': 'Sarah Martinson, Michael P. Brenner, Martyna Plomecka, Brian P. Williams, Nicholas G. Reich, Zahra Shamsi', 'link': 'https://arxiv.org/a”
flyryan/ai-news-aggregator⭐ 15
“ "id": "3c31122c210d", "title": "Prospective multi-pathogen disease forecasting using autonomous LLM-guided tree search", "content": "Probabilistic forecasting of infectious diseases is crucial for public health but relies on labor-intensive manual mo”
lonePatient/lonePatient.github.io⭐ 9
“{% hideToggle Click to view abstract %} {% note blue no-icon %} ID-8-Prospective multi-pathogen disease forecasting using autonomous LLM-guided tree search {% endnote %} **Link**: https://arxiv.org/abs/2605.16238 **Authors**: Sarah Martinson,Michael P. Brenner,Martyna Plomecka,Brian P. Williams,Nichol”
2shin0/arxiv-ai-mailing⭐ 6
“ ## 1. Prospective multi-pathogen disease forecasting using autonomous LLM-guided tree search - **Authors**: Sarah Martinson , Michael P. Brenner , Martyna Plomecka , Brian P. Williams , Nicholas G. Reich , Zahra Shamsi - **URL**: [https://arxiv.org/abs/2605.16238](https://arxiv.”
ttmens/ai-radar-wiki⭐ 5
“No Chinese abstract available yet ## Link - 📄 arXiv: http://arxiv.org/abs/2605.16238v1 ## PM perspective > To be added after Stage 2 LLM analysis”
NeoCodeSmith/NeoSignal⭐ 1
“ { "id": "63f43c841615", "title": "Prospective multi-pathogen disease forecasting using autonomous LLM-guided tree search", "url": "https://arxiv.org/abs/2605.16238", "summary": "arXiv:2605.16238v1 Announce Type: new Abstract: Probabilistic forecasting ”
noCharger/ai-dashboard⭐ 1
“ "Michael P. Brenner", "Martyna Plomecka" ], "url": "https://arxiv.org/abs/2605.16238", "categories": [ "cs.AI" ],”
RCZhao/ArxivDaily⭐ 0
“ <div class="arxiv-abstract-text">Probabilistic forecasting of infectious diseases is crucial for public health but relies on labor-intensive manual model curation by expert modeling teams. This bespoke development bottlenecks scalability to granular geographic res”
sirichen2/sirichen2.github.io⭐ 0
“ "Nicholas G. Reich", "Zahra Shamsi" ], "abs_url": "https://arxiv.org/abs/2605.16238v1", "pdf_url": "https://arxiv.org/pdf/2605.16238v1", "published": "2026-05-15T17:45:17+00:00", "updated": "2026-05-15T17:45:17+00:00",”
mickdur/tech-watch⭐ 0
“ "https://arxiv.org/abs/2605.16232": "2026-05-18T07:51:44.206446+00:00", "https://arxiv.org/abs/2605.16233": "2026-05-18T07:51:44.206446+00:00", "https://arxiv.org/abs/2605.16234": "2026-05-18T07:51:44.206446+00:00", "https://arxiv.org/abs/2605.16238": "2026-05-18T07:51:44”
alanhou/blog⭐ 0
“:::en **Paper**: [2605.16238](https://arxiv.org/abs/2605.16238) **Authors**: Sarah Martinson, Michael P. Brenner, Martyna Plomecka, Brian P. Williams, Nicholas G. Reich, Zahra Shamsi”