🚀 Shipping · score 78.0 · May 15, 2026 · 2605.16142 · cs.AI · cs.LG

Property-Guided LLM Program Synthesis for Planning

Augusto B. Corrêa, André G. Pereira, Jendrik Seipp

Narrative

This work guides LLM-based program synthesis with formal property verification instead of numeric scoring. It shows that counterexample-driven feedback (stopping evaluation early and telling the model exactly how a candidate failed) yields seven times fewer generated programs per domain on average than the best prior generation method. The domain is AI planning: an LLM synthesizes heuristic functions for PDDL domains, and the target property is that every state reachable by strictly improving transitions has a strictly improving successor, so that hill-climbing leads directly to a goal state. Tested on ten planning domains with an out-of-distribution test set, the synthesized heuristics are effectively "direct" on virtually all test tasks, and evaluating candidates requires several orders of magnitude less computation than the best prior generation method.
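To make the target property concrete, here is a minimal sketch of a checker for it on a toy state space. It is not the authors' implementation: the function `find_counterexample`, the line-world domain, and both example heuristics are hypothetical, and the sketch assumes goal states are exempt from the requirement.

```python
from collections import deque
from typing import Callable, Hashable, Iterable, Optional

State = Hashable


def find_counterexample(
    initial: State,
    successors: Callable[[State], Iterable[State]],
    is_goal: Callable[[State], bool],
    h: Callable[[State], float],
) -> Optional[State]:
    """Explore only strictly improving transitions from `initial`.

    Returns the first non-goal state found that has no successor with a
    strictly smaller heuristic value, or None if no such state is reachable.
    """
    seen = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if is_goal(state):
            continue  # assumption: goal states need no improving successor
        improving = [s for s in successors(state) if h(s) < h(state)]
        if not improving:
            return state  # stop early: this state is the counterexample
        for s in improving:
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return None


# Toy domain (hypothetical): positions 0..10 on a line, goal at position 5.
GOAL = 5


def line_successors(x: int) -> list:
    return [y for y in (x - 1, x + 1) if 0 <= y <= 10]


def is_goal(x: int) -> bool:
    return x == GOAL


def h_distance(x: int) -> int:
    return abs(x - GOAL)


def h_plateau(x: int) -> int:
    # Flawed on purpose: positions 2 and 3 share the value 3.
    return 3 if x in (2, 3) else abs(x - GOAL)


print(find_counterexample(0, line_successors, is_goal, h_distance))  # None
print(find_counterexample(0, line_successors, is_goal, h_plateau))   # 2
```

With `h_plateau`, hill-climbing from position 0 gets stuck at position 2, because neither neighbor has a strictly smaller value. That stuck state is the kind of concrete counterexample the feedback loop would hand back to the model.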

No production traction yet. The GitHub references are all arXiv-tracking aggregators, not implementations or downstream users, and Semantic Scholar lists zero citations. The ideas are directly relevant to anyone building LLM-based code synthesis pipelines where formal specs exist, since property checking with counterexample feedback is a practical alternative to fuzzing or test-suite scoring, but nothing is shipping from this work yet.

Abstract

LLMs have shown impressive success in program synthesis, discovering programs that surpass prior solutions. However, these approaches rely on simple numeric scores to signal program quality, such as the value of the solution or the number of passed tests. Because a score offers no guidance on why a program failed, the system must generate and evaluate many candidates hoping some succeed, increasing LLM inference and evaluation costs. We study a different approach: property-guided LLM program synthesis. Instead of scoring programs after evaluation, we check whether a candidate satisfies a formally defined property. When the property is violated, we stop the evaluation early and provide the LLM with a concrete counterexample showing exactly how the program failed. This feedback drastically reduces both the number of program generations and the evaluation cost, and can guide the LLM to generate stronger programs. We evaluate this approach on PDDL planning domains, asking the LLM to synthesize direct heuristic functions: every state reachable by strictly improving transitions has a strictly improving successor. A heuristic with this property leads a hill-climbing algorithm directly to a goal state. A counterexample-guided repair loop generates one candidate program, checks the property over a training set, and returns the first case that violates the property. We evaluate our approach on ten planning domains with an out-of-distribution test set. The synthesized heuristics are effectively direct on virtually all test tasks, and compared to the best prior generation method our approach generates seven times fewer programs per domain on average, solves more tasks without using search, and requires several orders of magnitude less computation to evaluate candidates. Whenever a problem admits a verifiable property, property-guided LLM synthesis can reduce cost and improve program quality.
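The repair loop described in the abstract (generate one candidate, check the property over a training set, return the first violating case) could be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's code: the LLM call is replaced by a scripted stand-in (`make_scripted_proposer`), the tasks are (start, goal) pairs on a bounded integer line, and all names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Task = Tuple[int, int]                   # (start, goal) on the line 0..10
Candidate = Callable[[int, int], float]  # h(state, goal)
Violation = Tuple[Task, int]             # (task, stuck non-goal state)


def successors(state: int) -> List[int]:
    return [s for s in (state - 1, state + 1) if 0 <= s <= 10]


def check_task(h: Candidate, task: Task) -> Optional[int]:
    """Follow strictly improving transitions; return a stuck state or None."""
    start, goal = task
    frontier, seen = [start], {start}
    while frontier:
        state = frontier.pop()
        if state == goal:
            continue
        better = [s for s in successors(state) if h(s, goal) < h(state, goal)]
        if not better:
            return state
        for s in better:
            if s not in seen:
                seen.add(s)
                frontier.append(s)
    return None


def first_violation(h: Candidate, tasks: List[Task]) -> Optional[Violation]:
    """Check tasks in order and stop at the first one that breaks the property."""
    for task in tasks:
        stuck = check_task(h, task)
        if stuck is not None:
            return task, stuck
    return None


def repair_loop(propose, tasks: List[Task], max_rounds: int = 5):
    """Ask for one candidate per round, feeding back the last counterexample."""
    feedback = None
    for round_number in range(1, max_rounds + 1):
        candidate = propose(feedback)  # a real system would prompt an LLM here
        feedback = first_violation(candidate, tasks)
        if feedback is None:
            return candidate, round_number
    return None, max_rounds


def make_scripted_proposer():
    # Stand-in for the LLM: a flawed candidate first, then a repaired one.
    scripted = iter([lambda s, g: g - s, lambda s, g: abs(s - g)])
    return lambda feedback: next(scripted)


training_tasks = [(0, 5), (7, 3)]
heuristic, rounds = repair_loop(make_scripted_proposer(), training_tasks)
print(rounds)  # 2
```

The first scripted candidate assumes the goal always lies to the right, so it passes task (0, 5) but fails on (7, 3). The loop returns that single counterexample without scoring the candidate on the remaining tasks, which is where the savings in evaluation cost would come from.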

Signal

Stars: 156
Repos: 10
Citations: 0
Velocity: 0.00/d

GitHub repos (10)

Tavish9/awesome-daily-AI-arxiv ⭐ 92
“ LLM-driven program evolution has emerged as a powerful tool for automated scientific discovery, yet existing frameworks offer no principled guide for designing their individual components and provide no guarantee that the search converges. We introduce SMCEvolve, which recasts ”
CSQianDong/Awesome-arXiv-Daily-Reporter ⭐ 47
“{'arxiv_id': 'arXiv:2605.16198', 'title': 'Formal Methods Meet LLMs: Auditing, Monitoring, and Intervention for Compliance of Advanced AI Systems', 'authors': 'Parand A. Alamdari, Toryn Q. Klassen, Sheila A. McIlraith', 'link': 'https://arxiv.org/abs/2605.16198', 'abstract': "We ”
lonePatient/lonePatient.github.io ⭐ 9
“{% hideToggle Click to view abstract %} {% note blue no-icon %} ID-54-Property-Guided LLM Program Synthesis for Planning {% endnote %} **Link**: https://arxiv.org/abs/2605.16142 **Authors**: Augusto B. Corrêa, André G. Pereira, Jendrik Seipp **Categories**: Artificial Intelligence (cs.AI); Machine Learning (cs.”
2shin0/arxiv-ai-mailing ⭐ 6
“ ## 8. Property-Guided LLM Program Synthesis for Planning - **Authors**: Augusto B. Corrêa , André G. Pereira , Jendrik Seipp - **URL**: [https://arxiv.org/abs/2605.16142](https://arxiv.org/abs/2605.16142) - **Abstract**: > LLMs have shown impressive success in program synthesis,”
ValoraY/arXiv-daily ⭐ 1
“<hr /> <h4 id="abstract_46">📄 Abstract</h4> <p>Developing and evaluating e-commerce web agents requires environments that preserve meaningful task structure while enabling controllable, reproducible, and scalable scientific comparison. Existing methodologies force a tradeoff: li”
NeoCodeSmith/NeoSignal ⭐ 1
“ { "id": "783dfdb89151", "title": "Property-Guided LLM Program Synthesis for Planning", "url": "https://arxiv.org/abs/2605.16142", "summary": "arXiv:2605.16142v1 Announce Type: new Abstract: LLMs have shown impressive success in program synthesis, disco”
daoyuly/new-blog ⭐ 0
“ - **arXiv ID**: [2605.16142](https://arxiv.org/abs/2605.16142) - **Research area**: planning, evaluation”
shaijing/arxiv-paper ⭐ 0
“| **[IndicSafe: A Benchmark for Evaluating Multilingual LLM Safety in South Asia](https://arxiv.org/abs/2603.17915v2)** | 2026-05-15 | <details><summary>Show</summary><p>As large language models (LLMs) are deployed in multilingual settings, their safety behavior in culturally div”
lxl-sword/arxiv_paper_LLM_list ⭐ 0
“<div class='col-md-6 mb-4'> <div class='card'> <div class='card-body'> <h4 class='card-title'><a href='https://arxiv.org/abs/2605.16142v1' target='_blank'>Property-Guided LLM Program Synthesis for Planning</a></h5> <h5 class='card-subtitle mb-2 text-muted'>Authors:Augusto B. Corr”
mickdur/tech-watch ⭐ 0
“ "https://arxiv.org/abs/2605.16126": "2026-05-18T07:51:44.206446+00:00", "https://arxiv.org/abs/2605.16134": "2026-05-18T07:51:44.206446+00:00", "https://arxiv.org/abs/2605.16138": "2026-05-18T07:51:44.206446+00:00", "https://arxiv.org/abs/2605.16142": "2026-05-18T07:51:44”