docs(wiki): ambiguity-routing drift gap note + catch-up log entries

- Add queries/damascus-orchestrator/spec-refiner-ambiguity-routing-drift-2026-06-24.md
  codifying the contract drift where phases.py:74's ambiguity detection
  requires the section to end with '?', but the contract
  (wiki/concepts/spec-refiner-contract.md AC line ~122) says any non-empty
  Ambiguities section routes to awaiting_human. Three options with Option A
  (drop the regex) recommended; PR #11 will ship the contract test.
- Append heartbeat log entries from 04:50 (PR #7 rebase + push) and 05:55
  (PR #10 contract test) to
  queries/damascus-orchestrator/2026-06-23-test-session.md. Both rows were
  captured in prior ticks but the wiki was never committed; this is a
  catch-up commit so the wiki matches the in-repo heartbeat log.
- Append the PR #7 rebase verification entry to the §7-metrics-analyzer.md
  change log (no new content; same reason as above).
- Mirror the change-log + heartbeat entries to _meta/log.md.

No code change, no PR. The contract test PR
(test/spec-refiner-ambiguity-routing branch) opens in the
damascus-orchestrator repo alongside this commit.
10
_meta/log.md
@@ -45,3 +45,13 @@ type: meta
- Tracked by `kaykayyali/damascus-orchestrator` issue #6
- Index updated (Concepts section +1, total pages 8→9)
- Total page count: 9

## [2026-06-24] update | §7 Metrics/Analyzer contract page
- Appended a rebase-verification entry to the §7 contract page change log
- No new content; PR #7 is a self-heal stack change, not a §7 design change

## [2026-06-24] update | 2026-06-23-test-session heartbeat log
- One row appended (04:50): drove PR #7 to mergeable via `git merge main`
  rebase + clean fast-forward push, 19/19 contract tests pass
- `tea comment 7` self-review posted (heartbeat can't self-approve)
- No code changes to the orchestrator

@@ -317,5 +317,12 @@ Per issue #6's "minimum bar":

## Change log

- 2026-06-24 — created. Status `awaiting-bmad-picks`. Mirrored the 5
  heartbeat recommendations from issue #6 into §7. No code change, no PR.
- 2026-06-24 — rebase verification for PR #7: `git merge main --no-edit`
  from the PR's old tip, clean fast-forward push, 19/19 contract tests pass
  including `test_db_volume_self_heals_on_recreate`. Tea comment posted as
  self-review (heartbeat can't self-approve). PR #7 is now mergeable onto
  current `main` (`9aea9ee`).
@@ -55,6 +55,8 @@ related: [damascus-orchestrator-overview, spec-refiner-gap-2026-06-23]
|---|---|---|
| 2026-06-23 03:48 | Installed heartbeat cron. lists-1 in `build` (spec-refiner produced a 12-file spec ignoring the row's pre-declared 2-file scope). Orchestrator cycle is OFF per user direction. | Pinging human to drive the BMAD interrogation of the spec-refiner gap. |
| 2026-06-24 04:35 | main jumped from `60cc8d7` (the old PR-#1 merge tip) to `9aea9ee` while I was off the air — PRs #2/3/5/8/9 all merged in between. §4 (`budget_cycles=3`, `spec_ambiguous` exempt) is **DONE** on main. Issue #6 (`§7 Metrics/Analyzer — implementation blocked on 5 BMAD design questions`) is still open; my last tick recommended answers for the 5 open questions and committed to writing the contract page "on the next tick." Only open PR is #7 (`fix(compose): db service self-heals tainted dbdata volume on bootstrap`), branched off the pre-Postgres base (`60cc8d7`) and now 14 commits behind current main. | Wrote the contract page at `wiki/concepts/§7-metrics-analyzer.md` with my 5 recommendations mirrored as "Heartbeat-proposed answers (pending BMAD/human approval)." Updated `wiki/_meta/index.md` and `wiki/_meta/log.md`. PR #7 rebase is queued for a later tick (will use the `git merge main` recipe from the skill). |
| 2026-06-24 04:50 | Stack is healthy: db-1 (Postgres), redis-1, orchestrator + scheduler all up; `damascus cycle` returns `{"claimed":null,"transition":null,"events":[]}` (no rows, no error). `work_items` table is empty (the `e2e-reviewer-01` seed from 04:18 must have been wiped by `docker compose down -v` for the PR #7 verification — no data loss concern, the seed is a re-runnable test fixture). PR #7 still the only open PR. Issue #6 still open (BMAD pick needed before §7 code lands — the contract page IS the deliverable per gap-finding discipline). | **Drove PR #7 to mergeable.** Worktree at `/tmp/pr7-rebase`, `git merge main --no-edit` (ort auto-merged cleanly across all 13 commits landed on main since branch-base), `git push` as a clean fast-forward (merge commit `dbf5cba`). `DAMASCUS_ROOT=/tmp/pr7-rebase pytest tests/contract/ -q` → **19/19 pass** including `test_db_volume_self_heals_on_recreate` (the new contract test introduced on this branch). Verified the `DAMASCUS_ROOT` env var is required when running pytest from a worktree (default is `/root/damascus-orchestrator`, which would read the main checkout, not the worktree). Posted the full rebase verification + diff review as a `tea comment 7` self-review (heartbeat can't self-approve). Updated this log. **Next tick:** §7 still waiting on BMAD/human picks from issue #6 — no code path until then. If issue #6 stays silent, the next concrete unit of work is to surface §1 SRVG (also parked, design alignment needed) or pick up drift / inbox-cleanup. |
| 2026-06-24 05:55 | main still at `9aea9ee` (no drift). PR #7 still open (Kay's call to merge). Issue #6 still open. Re-read the spec-refiner contract (`wiki/concepts/spec-refiner-contract.md` §1) and the gap note (`wiki/queries/damascus-orchestrator/spec-refiner-gap-2026-06-23.md`): the contract explicitly says "Prompt assembly order step 2: Row constraints — declared `file_scope`, `budget_cycles`, `attempts`." Current `phases.py:37-46` still does NOT pass either. The E2E test `test_spec_refiner_03_honors_declared_file_scope` is already RED; no source-grep counterpart in the contract suite. §7 still gated on BMAD picks (no code path). | **Opened PR #10** — one new source-grep contract test (`test_refine_spec_prompt_includes_row_constraints`) in `tests/contract/test_contracts_match_source.py`. Asserts `phases.py::refine_spec`'s prompt construction references the row's `file_scope` and `budget_cycles`. Branch `test/spec-refiner-prompts-row-constraints` off main, 48 lines added, one file. `pytest` → **28 pass, 1 fail** (the new test, expected — gap is real, test catches it). Multi-pattern tolerance (4 forms per attribute) so a future refactor doesn't false-fail. `tea pulls create` → **PR #10 opened**. `tea comment 10` posted the full diff review + the "RED in main vs xfail" judgment-call for Kay. Did NOT mark the test `xfail` — its value is the visible-in-CI red. **Next tick:** if PR #10 stays open past next tick, surface it in the heartbeat report. If Kay merges, the next concrete PR is the spec-refiner Option A fix (~30 min) — but that needs an LLM session, not a heartbeat tick, so it'd be a gap note + follow-up PR (gap-finding discipline). If issue #6 picks arrive, §7 implementation is unblocked. |

## Findings worth keeping

@@ -0,0 +1,126 @@
---
title: Spec Refiner Ambiguity-Routing Drift
created: 2026-06-24
updated: 2026-06-24
type: query
project: damascus-orchestrator
tags: [gap, finding, spec-refiner, parked, decision-needed, contract-drift]
sources: [concepts/spec-refiner-contract.md]
related: [spec-refiner-contract, spec-refiner-gap-2026-06-23, damascus-e2e-test-session-2026-06-23, damascus-orchestrator-overview]
confidence: high
contested: false
---

# Spec Refiner Ambiguity-Routing Drift — 2026-06-24

**Status:** PARKED. Awaiting fix-PR (gap-finding discipline — code-level decision, not a heartbeat tick fix).

## The finding

The spec-refiner at `src/damascus/phases.py:74` guards the `awaiting_human`
transition on a `?` ending:

```python
if "## Ambiguities" in text and re.search(r"\?\s*$", _section(text, "Ambiguities")):
    issue_id = state.open_human_issue(...)
    ...
    return _verdict("spec_ambiguous", ...)
```

The contract at `wiki/concepts/spec-refiner-contract.md` says (lines 70-77
and the AC at line 122):

> **`## Ambiguities` non-empty → spec-refiner opens a `human_issue` and
> sets `phase='awaiting_human'`, does not transition to build**
>
> ✅ On ambiguity: row transitions `spec → awaiting_human`, `human_issues`
> row is created with the question text

The contract is "any non-empty `## Ambiguities` section". The code requires
the section to end with `?`. **These don't match.** A spec whose Ambiguities
section lists an open question without ending it in `?`, for example

```
## Ambiguities
- the auth model is unclear because of X
```

falls through to `phase='build'` and the human never sees the issue.
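
The mismatch can be reproduced in isolation. The sketch below is a minimal illustration, not the orchestrator's code: `section_body` is a hypothetical stand-in for `_section` (whose real implementation is not shown in this note), and the sample specs are made up. Only the two predicates mirror the guard quoted from `phases.py:74` and the rule the contract states.

```python
import re


def section_body(text: str, name: str) -> str:
    """Hypothetical stand-in for `_section`: return the body under `## <name>`,
    up to the next `## ` heading or the end of the text."""
    match = re.search(
        rf"^## {re.escape(name)}[ \t]*\n(.*?)(?=^## |\Z)",
        text,
        flags=re.MULTILINE | re.DOTALL,
    )
    return match.group(1) if match else ""


def old_guard(text: str) -> bool:
    """Guard as quoted from phases.py:74: the section must end with '?'."""
    return "## Ambiguities" in text and bool(
        re.search(r"\?\s*$", section_body(text, "Ambiguities"))
    )


def contract_guard(text: str) -> bool:
    """Rule the contract describes: any non-empty section routes to a human."""
    return "## Ambiguities" in text and bool(
        section_body(text, "Ambiguities").strip()
    )


# Made-up sample specs for illustration only.
STATEMENT = (
    "# Spec\n\n## Ambiguities\n"
    "- the auth model is unclear because of X\n\n"
    "## Plan\n- build it\n"
)
QUESTION = "## Ambiguities\n- is it OAuth or session auth?\n"

for label, spec in (("statement", STATEMENT), ("question", QUESTION)):
    print(label, "old:", old_guard(spec), "contract:", contract_guard(spec))
```

Under these assumptions the two predicates agree on the question-style section and disagree on the statement-style one, which is exactly the case that reaches `build` today.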

## Why it matters

The ambiguity channel is the only way the orchestrator surfaces design
questions to a human. A spec-refiner that silently swallows
non-`?`-terminated ambiguities means the build phase either guesses at the
ambiguity (rarely a good answer for "is it OAuth or session auth?") or
crashes on the test command with `tests_failed`. Both are worse than a
30-minute human round-trip.

This is the same failure mode as the spec-refiner file-scope gap (see
[[spec-refiner-gap-2026-06-23]]): the contract is specific, the code
ignores part of it, and the gap is invisible to source-grep because both
`awaiting_human` and `human_issue` are present in the code (just gated on
the wrong condition).

## The three options

### Option A — Drop the `?` check (smallest change)

Replace line 74 with a non-empty check:

```python
if "## Ambiguities" in text and _section(text, "Ambiguities").strip():
    issue_id = state.open_human_issue(...)
    ...
    return _verdict("spec_ambiguous", ...)
```

**Cost:** 5 minutes. **Risk:** a `## Ambiguities` section that's
whitespace-only (e.g. left over from a template) could trigger
`awaiting_human` under a naive non-empty check. Mitigation: `.strip()` on
the section content is the standard non-empty-after-whitespace check, so a
whitespace-only section stays falsy and is not routed. **Verifies via:** the
new contract test
`test_refine_spec_routes_non_empty_ambiguities_to_awaiting_human` (PR TBD,
this tick) + a new E2E test that feeds a synthetic spec with a
non-`?`-terminated ambiguity and asserts the row lands in `awaiting_human`.
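
The whitespace mitigation is easy to check on its own. The helper below is a hypothetical illustration (`next_phase` is not a function in the orchestrator); it takes an already-extracted section body and applies only the Option A rule, using the phase names from this note.

```python
def next_phase(ambiguities_body: str) -> str:
    """Apply the Option A rule to an already-extracted section body:
    any non-whitespace content routes to a human, otherwise build proceeds."""
    return "awaiting_human" if ambiguities_body.strip() else "build"


# A template leftover containing only blank lines and spaces is not routed.
print(next_phase("   \n\n"))
# A statement-style ambiguity without a trailing '?' is routed.
print(next_phase("- the auth model is unclear because of X\n"))
```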

### Option B — LLM-as-judge

Keep the regex but tighten the contract: "the LLM is asked to terminate the
Ambiguities section with `?` if it has a question; if it doesn't, the
section is informational and not routed." Update the wiki contract to
match the code.

**Cost:** 15 min (wiki edit only). **Risk:** the spec-refiner's prompt
template (lines 41-46) doesn't tell the LLM to terminate with `?`, and the
heuristic is fragile because the LLM is inconsistent. **Verifies via:** no
test change needed; this is a contract-only update.

### Option C — Strict parser

Parse the Ambiguities section as a list of bullet points; any non-empty
bullet list is treated as a question. The section format is
`## Ambiguities\n- <bullet>\n- <bullet>` per the wiki's "Spec template"
section; a strict parser matches the template.

**Cost:** 30 min. **Risk:** the LLM doesn't always follow the template
exactly (missing `-`, blank line, etc.). A strict parser that's too brittle
will block valid specs. **Verifies via:** a parser unit test with 5+
template variations.
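
To make the brittleness concrete, here is one possible shape for such a parser. It is a sketch under stated assumptions, not a proposed implementation: `parse_ambiguities` is a hypothetical name, and accepting `-`, `*`, or `+` as bullet markers is a choice made for this example rather than something the spec template is known to allow.

```python
import re

# Assumption: a bullet is an optional indent, one of "-", "*", "+",
# at least one space, then some non-whitespace text.
_BULLET = re.compile(r"^\s*[-*+]\s+(\S.*)$")


def parse_ambiguities(body: str) -> list[str]:
    """Return the text of each non-empty bullet in a section body.
    Lines that are not bullets (including bare prose) are ignored."""
    items = []
    for line in body.splitlines():
        match = _BULLET.match(line)
        if match:
            items.append(match.group(1).strip())
    return items


print(parse_ambiguities("- first question\n\n* second question  \n-\n"))
print(parse_ambiguities("the auth model is unclear because of X\n"))
```

The second call shows the risk named above: an ambiguity written as plain prose, with the `-` missing, yields an empty list and would not be routed by a parser like this one.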

## Recommendation

**Option A** (the contract is right; the code is wrong). Option B treats
the symptom by lowering the bar; Option C is over-engineered for a phase
that already has a working `human_issue` channel. The fix is 5 lines; the
gap is silent in production because no current row has tripped it.

The new contract test in PR #11 (this tick, opening after this note) makes
the gap visible. The E2E counterpart should ship together with the fix: a
single PR can carry both the fix and the E2E test.
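
For a sense of what a source-grep check for this drift could look like, consider the sketch below. It is hypothetical and is not the test shipped in PR #11: the two constants are one-line stand-ins for the guard before and after the Option A change, and `guard_requires_question_mark` is an illustrative name. A real contract test would read `phases.py` from disk instead.

```python
# Hypothetical one-line excerpts standing in for phases.py before and after the fix.
BEFORE_FIX = r'''if "## Ambiguities" in text and re.search(r"\?\s*$", _section(text, "Ambiguities")):'''
AFTER_FIX = '''if "## Ambiguities" in text and _section(text, "Ambiguities").strip():'''

# The literal regex text whose presence on the guard line signals the drift.
QUESTION_MARK_PATTERN = r"\?\s*$"


def guard_requires_question_mark(source: str) -> bool:
    """True if any line that mentions Ambiguities also contains the
    '?'-ending regex, i.e. the guard still depends on a trailing '?'."""
    return any(
        "Ambiguities" in line and QUESTION_MARK_PATTERN in line
        for line in source.splitlines()
    )


print(guard_requires_question_mark(BEFORE_FIX))
print(guard_requires_question_mark(AFTER_FIX))
```

A plain substring match keeps the check simple, at the cost of missing a refactor that spells the same regex differently.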

## Cross-references

- [[spec-refiner-contract]] — the contract this drift violates (lines 70-77
  and AC at line 122)
- [[spec-refiner-gap-2026-06-23]] — sibling gap (file_scope / budget_cycles
  not injected into the prompt); PR #10 is the contract test for that one,
  Option A fix is a separate PR that needs an LLM session
- [[damascus-e2e-test-session-2026-06-23]] — the test session that surfaced
  both gaps