test(contract): spec-refiner prompt must inject row's file_scope and budget_cycles #10
What
One new source-grep contract test (test_refine_spec_prompt_includes_row_constraints) in tests/contract/test_contracts_match_source.py that asserts src/damascus/phases.py::refine_spec's prompt construction references the row's declared file_scope and budget_cycles.
Why
The spec-refiner contract (wiki/concepts/spec-refiner-contract.md §1, "Prompt assembly order" step 2) requires the prompt to include the row's declared file_scope and budget_cycles. The current prompt at src/damascus/phases.py:37-46 omits both, as observed at 2026-06-23 03:36 on row lists-1, where the LLM produced a 12-file spec for a row that declared a 2-file scope. The E2E test test_spec_refiner_03_honors_declared_file_scope codifies the behavioral end of the contract (the spec the LLM produces honors the row's declared scope); this new source-grep test codifies the structural end (the prompt actually contains the row's constraints).
The two tests are complementary, not redundant: the E2E catches a prompt that mentions file_scope but assembles it wrong, or a prompt that mentions it but the LLM ignores it; the source-grep catches the prompt that doesn't mention it at all. Both fail today; both pass once Option A from the gap note lands.
Test
pytest tests/contract/test_contracts_match_source.py::test_refine_spec_prompt_includes_row_constraints → FAILS with:
    AssertionError: spec-refiner prompt does not reference the row's declared file_scope (wiki/concepts/spec-refiner-contract.md §1, 'Prompt assembly order' step 2). See wiki/queries/damascus-orchestrator/spec-refiner-gap-2026-06-23.md for the gap note and Option A (30 min) fix.
Source-grep form (per the skill's contract-test pattern)
- Scoped to refine_spec (via a def refine_spec → next top-level def split), so it doesn't false-fail for unrelated reasons.
- Reads phases.py.
- Accepts item["file_scope"], item['file_scope'], file_scope=item, and file_scope = item, so a future refactor that destructures item or renames the variable doesn't false-fail.
Why this is a separate PR (not part of Option A)
The skill's "revived-dead test" pattern says: do not make a previously-dead test green inside a mechanical PR. The spec-refiner E2E (test_spec_refiner_03) is RED; the new contract test is also RED. Both stay RED until a follow-up PR picks up Option A from the gap note and fixes the prompt. This PR just makes the structural end of the contract visible in CI so the next PR-author (Kay or a future heartbeat) sees it as a failing test alongside the existing E2E red.
Risk
Source-grep, narrowly scoped, no code-under-test changes. Cannot regress runtime behavior. The only "risk" is that a future refactor moves the prompt template to a different file — the test would then need to be updated to read from the new file. This is acceptable; the skill says "If a contract changes, these tests will fail. Update them deliberately."
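For readers unfamiliar with the source-grep form, the idea can be sketched roughly as below. This is a minimal illustration, not the test added in this PR: SAMPLE_SOURCE is a made-up stand-in for phases.py, and the helper names slice_function and references_attr are hypothetical. It only looks for plain top-level def lines, which is a simplification.

```python
import re

# Hypothetical stand-in for the text of src/damascus/phases.py.
# Note that file_scope and budget_cycles appear only OUTSIDE refine_spec.
SAMPLE_SOURCE = '''
def plan(item):
    return item["file_scope"]

def refine_spec(item):
    prompt = f"Refine the spec for {item['title']}"
    return prompt

def build(item):
    return item["budget_cycles"]
'''


def slice_function(source: str, name: str) -> str:
    """Return the text from `def name` up to the next top-level def (or EOF)."""
    start = re.search(rf"^def {re.escape(name)}\b", source, flags=re.MULTILINE)
    if start is None:
        raise LookupError(f"{name} not found in source")
    # Search for the next top-level def after the matched signature.
    nxt = re.search(r"^def \w+", source[start.end():], flags=re.MULTILINE)
    end = start.end() + nxt.start() if nxt else len(source)
    return source[start.start():end]


def references_attr(body: str, attr: str) -> bool:
    """True if the body contains any of the four accepted spellings."""
    forms = (
        f'item["{attr}"]',
        f"item['{attr}']",
        f"{attr}=item",
        f"{attr} = item",
    )
    return any(form in body for form in forms)


body = slice_function(SAMPLE_SOURCE, "refine_spec")
# Mentions in plan() and build() are ignored because the slice excludes them.
print(references_attr(body, "file_scope"), references_attr(body, "budget_cycles"))
```

If the prompt template moves to another file, only the place where the source text is read would need to change in a check of this shape; the slicing and matching stay the same.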
Refs
- wiki/concepts/spec-refiner-contract.md §1: the contract
- wiki/queries/damascus-orchestrator/spec-refiner-gap-2026-06-23.md: the gap note with Option A
- tests/e2e/test_spec_refiner.py:68 (test_spec_refiner_03_honors_declared_file_scope): the behavioral counterpart
- self-hosted-state-machine-orchestrator → "Contract tests as source-grep" pattern
Self-review
Cannot self-approve per the skill (heartbeat agents can't tea pulls approve their own PRs). Will post a tea comment with the diff review immediately after opening.
Self-review (heartbeat agent, 2026-06-24 05:55 UTC)
Mechanically clean. Source-grep test, narrowly scoped, 48 lines added, one file changed (tests/contract/test_contracts_match_source.py). No code-under-test changes, no schema changes, no infra changes.
Diff review
The new test:
- Reads phases.py and slices from def refine_spec to the next top-level def, the same pattern as the existing source-grep tests (test_loop_breaker_routes_to_blocked, test_ensure_worktree_uses_no_dash_b_for_existing_branch).
- Asserts the slice references file_scope in one of four forms (item["file_scope"], item['file_scope'], file_scope=item, file_scope = item).
- Asserts the same for budget_cycles.
Multi-pattern tolerance. Accepting four forms per attribute means a future refactor that destructures item (e.g. file_scope=item) or renames it doesn't false-fail. This is the source-grep pattern's robustness check from the skill.
Negative-checked. I ran the test against the current phases.py and confirmed it fails with the expected assertion message: the gap is real and the test catches it. No "the test passes on broken code" footgun.
What this PR does NOT do
- It does not fix the prompt. The E2E test (test_spec_refiner_03_honors_declared_file_scope) is already RED; this PR adds a parallel structural test, not a fix.
Merge order
PR #7 is still open and awaiting Kay's review-merge (the heartbeat can't self-approve). PR #10 is a small, scoped test addition. These are independent — PR #10 can be merged in any order relative to PR #7.
PR #10 does NOT make main's test suite go from green to red on its own — the new test is added as a known-RED marker. Main will go 28 pass / 1 fail (the new RED) until Option A from the gap note lands, at which point the test goes GREEN and the E2E test should also go GREEN.
If Kay wants main to stay green during this PR's lifetime, the cleanest path is to mark the new test as xfail with a reason pointing at the gap note. I deliberately did NOT do that: the test's value is that it's RED and visible in CI. Marking it xfail hides the gap. If Kay prefers the xfail approach, I'll add the marker in a follow-up.
Recommendation
Mergeable as-is. Mechanical, scoped, no code-under-test change, contract-codifying, CI-friendly. The only judgment call is "RED in main vs xfail-marked RED" — recommend the former so the next PR-author (whoever picks up Option A) sees the failing test in CI as the path to green.
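For comparison, the xfail alternative mentioned above might look roughly like the sketch below. It is illustrative only and not part of this PR: the test body is a hypothetical stand-in for the real source-grep check, and strict=True is an assumption added here (it makes pytest report an unexpected pass as a failure, so the marker gets removed once Option A lands).

```python
import pytest

# Path taken from the Refs section of this PR.
GAP_NOTE = "wiki/queries/damascus-orchestrator/spec-refiner-gap-2026-06-23.md"


@pytest.mark.xfail(
    reason=f"spec-refiner prompt omits row constraints; see {GAP_NOTE}",
    strict=True,  # assumption: fail the suite if the test starts passing unnoticed
)
def test_refine_spec_prompt_includes_row_constraints():
    # Hypothetical stand-in for the refine_spec slice read from phases.py.
    body = 'prompt = f"Refine the spec for {item}"'
    assert 'item["file_scope"]' in body
```

With this marker the suite reports the test as xfailed rather than failed, which keeps main green but moves the gap out of the failure list.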
— heartbeat agent, 2026-06-24 05:55 UTC