test(contract): reviewer validate layer must not pass-through on missing artifacts #14
Reference in New Issue
Block a user
Delete Branch "test/reviewer-validate-no-pass-through"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Source-grep contract test that codifies the reviewer-validate Loop-C bug surfaced by
tests/e2e/test_reviewer.py::test_reviewer_03_validate_layer_runs_test_cmdon 2026-06-24.The bug:
phases.py:313and:317return_verdict("pass", {"note": "...passing through"})whentest_cmdis missing or the worktree is missing. The contract atwiki/concepts/reviewer-contract.md(validate layer hard gate, step 3) requires## Test Commandto actually exit 0 in the worktree before the reviewer returnspass. The early-return bypasses the test, routes the row tomergedwithout running it, defeating the validate layer's defense-in-depth purpose.Why this PR exists
The reviewer-validate bug is a real Loop-C gap. The previous tick (2026-06-24 09:05) filed:
wiki/queries/damascus-orchestrator/reviewer-validates-failing-test-cmd-still-merges-2026-06-24.mdtests/e2e/test_reviewer.py::test_reviewer_03_validate_layer_runs_test_cmd(currently RED)Per the operator skill's "describe, don't fix" discipline + the "no in-flight work" recipe, the next concrete unit of work is a contract test that codifies the gap structurally. The actual fix (one of three options in the gap note) needs an LLM session, not a heartbeat tick.
What this PR adds
One new source-grep test function in
tests/contract/test_contracts_match_source.py:46 lines added, 1 file changed.
Test status
1 failed, 28 passed(the 1 fail is this new test; the other 28 contract+unit pass)._verdict("pass"with_verdict("tests_failed"). Verified by patchingphases.pyand rerunning.branchfield, then validate). Verified the new logic contains no"passing through"literal.validate_skippedtyped verdict). Same.Multi-pattern tolerance: only the exact
"passing through"phrase is forbidden. The fix author can phrase the replacement note however they want — no false-positive risk on stylistic choices.Pattern consistency with PR #10 and PR #12
Same source-grep contract-test idiom:
test(contract): spec-refiner prompt must inject row's file_scope and budget_cycles) — codifies the spec-refiner prompt gap.fix(spec): route any non-empty Ambiguities section to awaiting_human) — the mechanical fix for the gap PR #11 codified.All four are the same recipe: source-grep test that REDs on broken code, GREENs on the fix, no runtime changes, no architectural choices, fits in a heartbeat tick.
What this PR is NOT
test_reviewer_03_validate_layer_runs_test_cmdis the behavioral test; this PR is the structural test. Both go green on the same fix.1 failed, 28 passedis the test doing its job — surfacing a real gap.Recommendation
Merge this PR as a RED-in-main contract test. Then schedule the fix (Option A, 5-line patch, recommended for fastest unblock) as a follow-up PR.
The fix author should:
main(this PR's test will RED on brokenphases.py, GREEN on the fix).pytest tests/contract/ tests/unit/ tests/e2e/test_reviewer.py -q— all 30 tests should pass.Cross-references
wiki/queries/damascus-orchestrator/reviewer-validates-failing-test-cmd-still-merges-2026-06-24.mdwiki/concepts/reviewer-contract.mdtests/e2e/test_reviewer.py::test_reviewer_03_validate_layer_runs_test_cmdSelf-review
Diff walk: 1 file, +46 lines, no source change in
src/. One new test function (test_reviewer_validate_does_not_pass_through_on_missing_artifacts) appended totests/contract/test_contracts_match_source.py. Test source-grepsphases.pyfor the literal"passing through"substring and asserts it is absent.Contract walk: the test codifies the validate-layer hard-gate contract from
wiki/concepts/reviewer-contract.md(step 3: "## Test Commandexits 0 in the worktree" is a hard block, not advisory). The contract's hard-gate wording is unambiguous; the code violates it by routingpass(and thusmerged) when the gate cannot actually run.Test walk:
1 failed, 28 passed— the 1 fail is this new test, exactly as expected.phases.py:313and:317to_verdict("tests_failed", ...)and rerunning."passing through"literal.validate_skippedtyped verdict). Same.Multi-pattern tolerance: only the exact phrase
"passing through"is forbidden. The fix author can phrase the replacement note however they want — no false-positive risk on stylistic choices. The contract test won't break on a refactor that rewords the failure note.Pitfalls-avoided:
re.escape()regex-meta-char gotcha: not applicable here. The contract is a literal substring grep, not a regex pattern. If the bug pattern had been a regex like\?\s*$, I'd needre.escape(). (Verified the inverse in PR #12.)phases.py, not the test file itself, and the test docstring does not include the literal"passing through". (Verified: when I wrote a draft that quoted the literal in the docstring, the test would have falsely failed on the fix; rewrote in prose.)SRC = ORCH_ROOT / "src" / "damascus", which honorsDAMASCUS_ROOT. Default is/root/damascus-orchestrator(the canonical location). When CI setsDAMASCUS_ROOTto the checked-out workspace, the test reads the right file.Why source-grep and not behavioral: per the operator skill, contract tests codify invariants structurally without runtime cost. The E2E counterpart (
tests/e2e/test_reviewer.py::test_reviewer_03_validate_layer_runs_test_cmd) is the behavioral test; both should go green on the same fix.Companion to PR #10 and PR #12: same source-grep pattern, different contracts. PR #10 codifies the spec-refiner prompt gap. PR #12 fixes the spec-refiner ambiguity routing gap (PR #11 codified it; PR #12 subsumed it). This PR codifies the reviewer-validate gap; the fix is a follow-up.
Recommendation: merge. This is a strict-improvement PR — it adds a RED-in-main contract test that codifies a known gap, no source change, no architectural choice. Kay can merge at any time without coordination. The fix (Option A or B) is a follow-up PR that will go green on the same commit.
Heartbeat cannot self-approve (Gitea rejects
tea pulls approvefrom the same author). Kay's call.Self-review: fix commit
ec27096on PR #14This PR was originally test-only (commit
795115b). This follow-up commit (ec27096) ships the source-grep fix that turns PR #14 from "red-in-main" to "green". Per the tick-after-contract-test recipe in the operator skill: the gap was mechanical (≤50 lines, ≤2 files, no architectural choice), the previous tick shipped the contract test as the structural counterpart, so the fix was the natural next unit of work.Why this commit ships to PR #14's branch (not a new PR)
Per the cross-tick PR test+fix subsumption pitfall: branching a fix off
mainwould duplicate the test file byte-for-byte (the fix author would copy the test to understand the contract), creating two PRs that can't both merge cleanly. Pushing to PR #14's existing branch means both the test and the fix land in a single PR, no duplication, no merge conflict.The fix (Option A from the gap note)
Two early-return
passbranches insrc/damascus/phases.py::review()were bypassing the validate layer's hard gate:_verdict("pass", {"note": "no test_cmd recorded; PR exists, passing through"})_verdict("tests_failed", {"review_test": False, "reason": "no test_cmd recorded; cannot validate", "branch": ..., "story_id": ...})_verdict("pass", {"note": "worktree gone; PR exists, passing through"})_verdict("tests_failed", {"review_test": False, "reason": "worktree missing; cannot validate", "branch": ..., "story_id": ...})The cycle's verdict switch (
_next_phase_on_verdictincycle.py) already routestests_failedfromreview→build, so a row whose validate layer could not actually validate goes back to retry the build (or, on budget exhaustion,blocked). No cycle changes needed.Why Option A over B/C
validate_skippedverdict): ~20 lines (new verdict, cycle switch update, typed_verdicts list update). The cleanest semantics, but adds a new verdict and is structurally the biggest change.All three remove the literal
"passing through"text that PR #14's contract test forbids, so all three would land the contract test green. A picks the one with the least new surface area.Verification
RED-on-broken / GREEN-on-fixed simulation (per the contract test verification recipe):
Pitfalls avoided
re.escape()on the contract pattern: PR #14's test already uses literal-string assertion (assert "passing through" not in phases_py), not a regex. No escape issues.phases.pythat would re-trigger the contract test.git push --forceblocked: no force-push needed; the new commit is a clean fast-forward from the existing branch tip.Out of scope (filed separately)
tests/e2e/test_reviewer.py::test_reviewer_03_validate_layer_runs_test_cmdis still RED. Diagnosed but not fixed: the test setup manuallyUPDATE phase='review'after a spec cycle but does not clearclaimed_at, so the secondrun_cycle_in_container()cannot re-claim the row (stale-claim filter blocks it; claimed_at is < 30 min old). This is a test setup bug, not a source bug — the fix in this PR is correct. A separate PR can either (a) addclaimed_at=NULLto the manual UPDATE in the test, or (b) update the test to useSTALE_CLAIM_MINUTES=0via env. Logged in the existing gap notewiki/queries/damascus-orchestrator/reviewer-validates-failing-test-cmd-still-merges-2026-06-24.mdas a separate item; not blocking this PR.Heartbeat cannot self-approve
Per the operator skill: heartbeat agents cannot
tea pulls approveon their own PRs (Gitea rejects withError: approve your own pull is not allowed). This comment is the self-review; Kay's call to merge.Refs: PR #14, issue #13, gap note
wiki/queries/damascus-orchestrator/reviewer-validates-failing-test-cmd-still-merges-2026-06-24.md, skill entry "tick-after-contract-test exception."ec27096cb5to32f5fe212e