init: damascus-wiki structure + contracts + test session

Hermes (damascus-orchestrator)
2026-06-23 03:55:56 +00:00
parent a12b5f16de
commit 659e788e42
10 changed files with 1069 additions and 0 deletions

_meta/SCHEMA.md Normal file

@@ -0,0 +1,146 @@
---
title: Damascus Wiki Schema
updated: 2026-06-23
type: meta
---
# Damascus Wiki — Schema
## Domain
The local knowledge layer for the **damascus-orchestrator** system. Holds what workers (the spec-refiner, builder, reviewer, and any future agent roles) have learned, what each project is doing right now, and the worker-level scratch that doesn't belong in the source-of-truth MySQL.
**Hard rule:** MySQL is the source of truth on **work-item state** (which story is in which phase, attempts, budget, last verdict, cost). This wiki is the source of truth on **what the workers are seeing, deciding, and learning**. The two are kept in sync by the orchestrator writing to both, but they are not interchangeable. Never write a work-item state fact to the wiki when MySQL has the answer.
## Conventions
- **File names:** lowercase, hyphens, no spaces (e.g. `react-mui-datagrid-patterns.md`)
- **Every wiki page** starts with YAML frontmatter (see below)
- **`[[wikilinks]]`** to link between pages (minimum 2 outbound links per page)
- **Top-level pages, project tag in frontmatter.** No per-project subdirectories. A page about `wh40k-pc` lives at `entities/foo.md` and has `project: wh40k-pc` in frontmatter. This makes cross-project queries (which components, which patterns, which lessons) easy.
- **Update policy:** when updating a page, always bump the `updated` date and append the change to `log.md`
- **Every new page** must be added to `index.md` under the correct section
- **Every action** must be appended to `log.md`
- **Provenance:** when a page's claims trace to a specific source, link it via `sources:` in frontmatter
## Frontmatter
```yaml
---
title: Page Title
created: YYYY-MM-DD
updated: YYYY-MM-DD
type: entity | concept | comparison | query | meta | summary
project: <project-id> | null # e.g. wh40k-pc, restitution, null=global
tags: [from taxonomy below]
sources: [raw/articles/source-name.md]
# Optional quality signals:
confidence: high | medium | low # how well-supported the claims are
contested: true # set when the page has unresolved contradictions
contradictions: [other-page-slug] # pages this one conflicts with
related: [other-page-slug] # non-contradictory cross-links beyond [[wikilinks]]
---
```
## Tag Taxonomy
Add new tags here **before** using them. Freeform tags decay into noise.
**Worker / orchestrator:**
- `agent`, `loop`, `state-machine`, `queue`, `cron`, `heartbeat`
- `spec-refiner`, `builder`, `reviewer`, `claude-code`, `litellm`
**Project / domain:**
- `react`, `mui`, `vite`, `jest`, `wh40k`, `gitea`, `docker`
- `game-design`, `web-app`, `data-grid`
**Operational:**
- `pitfall`, `lesson`, `finding`, `gap`, `decision`, `convention`
- `infra`, `cicd`, `registry`, `proxy`, `auth`
- `blocked`, `parked`, `in-progress`, `done`
**Meta:**
- `architecture`, `design-doc`, `prd`, `plan`
## Page Thresholds
- **Create a page** when an entity/concept appears in 2+ sources OR is central to one source OR is a worker finding worth re-finding
- **Add to existing page** when a new source mentions something already covered
- **DON'T create a page** for passing mentions, transient state, or things outside the domain
- **Split a page** when it exceeds 200 lines
- **Archive a page** when fully superseded — move to `_archive/`, remove from index, fix cross-links
## Types
| Type | Where | Purpose |
|---|---|---|
| `entity` | `entities/` | Things that exist: a project, a library, a worker role, a service |
| `concept` | `concepts/` | Patterns, designs, approaches, conventions |
| `comparison` | `comparisons/` | Side-by-side: which approach won, with dimensions |
| `query` | `queries/` | Filed answers, status reports, run logs worth keeping |
| `meta` | `_meta/` | This schema, the index, the log, topic maps |
| `summary` | `entities/` or `concepts/` | One-shot summarization of a multi-source topic |
## Update Policy (when sources conflict)
1. Check dates — newer generally supersedes older
2. If genuinely contradictory, note both with dates and sources
3. Mark contradiction in frontmatter: `contradictions: [page-name]`
4. Flag in next lint report
## Lint
Run periodically. The agent runs lint when:
- Asked by the human
- `log.md` exceeds 500 entries (then rotate to `log-YYYY.md`)
- Before any bulk-update that touches 10+ pages (ask first)
Lint checks (in order of severity):
1. **Broken wikilinks** — `[[links]]` to missing pages
2. **Orphans** — pages with no inbound `[[wikilinks]]` from anywhere
3. **Index completeness** — every `.md` file appears in `index.md`
4. **Frontmatter validity** — required fields present, tags in taxonomy
5. **Stale content** — `updated` > 90d old AND a newer source mentions the same entity
6. **Contradictions** — `contested: true` or `contradictions:` set
7. **Quality signals** — `confidence: low` pages, single-source pages with no confidence
8. **Source drift** — `raw/` files whose `sha256:` mismatches the recomputed hash
9. **Page size** — over 200 lines
10. **Tag audit** — tags used but not in taxonomy
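The first two lint checks can be sketched in a few lines of Python. This is a minimal illustration, not the orchestrator's actual linter: the function name `lint_links` and the in-memory `pages` mapping (slug to Markdown body) are assumptions made for the example, and `![[image.png]]` embeds are deliberately not counted as links.

```python
import re

# Capture the target of [[target]], [[target|alias]] or [[target#section]].
# The negative lookbehind skips ![[asset]] embeds (assumption: embeds are not page links).
WIKILINK = re.compile(r"(?<!!)\[\[([^\]|#]+)")

def lint_links(pages):
    """pages maps slug -> markdown body (hypothetical input shape).

    Returns (broken, orphans): broken maps a page slug to the missing
    targets it links to; orphans lists slugs with no inbound links.
    """
    inbound = {slug: 0 for slug in pages}
    broken = {}
    for slug, body in pages.items():
        for target in WIKILINK.findall(body):
            target = target.strip()
            if target in pages:
                if target != slug:  # self-links do not rescue an orphan
                    inbound[target] += 1
            else:
                broken.setdefault(slug, []).append(target)
    orphans = sorted(slug for slug, count in inbound.items() if count == 0)
    return broken, orphans
```

In a real run, `index.md` and `log.md` would probably need to be excluded from the inbound count, otherwise the index itself rescues every orphan.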
## What goes in `raw/`
Immutable sources. The agent reads but **never modifies** these.
- `raw/articles/` — web articles, blog posts, reference docs
- `raw/papers/` — PDFs, arxiv papers
- `raw/transcripts/` — meeting notes, interviews, design reviews
- `raw/assets/` — images, diagrams referenced via `![[image.png]]`
`raw/` files get their own frontmatter:
```yaml
---
source_url: https://example.com/article
ingested: YYYY-MM-DD
sha256: <hex digest of the body below the frontmatter>
---
```
`sha256:` is computed over the body only (everything after the closing `---`). On re-ingest: recompute, compare, skip if identical, flag drift if different.
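A minimal Python sketch of the body-only hash and the drift check follows. The helper names are hypothetical, and one detail is an assumption the schema does not settle: here the body starts immediately after the newline that ends the closing `---` line.

```python
import hashlib

def split_frontmatter(text):
    """Split a raw/ file into (frontmatter, body).

    Assumption: the body starts right after the closing '---' line.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "".join(lines[: i + 1]), "".join(lines[i + 1:])
    return "", text  # no closing delimiter: treat the whole file as body

def body_sha256(text):
    """Hex digest of the body only, so frontmatter edits never change it."""
    _, body = split_frontmatter(text)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def has_drifted(text, recorded_digest):
    """True when the recomputed digest differs from the recorded sha256."""
    return body_sha256(text) != recorded_digest.strip().lower()
```

On re-ingest, `has_drifted(...)` returning `False` corresponds to "skip if identical"; `True` is the drift that lint check 8 reports.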
## What does NOT go in the wiki
- **Work-item state.** That's MySQL. Query MySQL.
- **Secrets, tokens, credentials.** Even if mentioned in a source — link to a secrets manager, never inline.
- **Source code diffs.** That's the project repo.
- **The LLM-wiki raw content from the design doc** (the multi-project-orchestration-plan). That lives in the design repo. Link to it; don't duplicate it.
## Working dir convention for agents
Every agent role that reads the wiki must:
1. Read `_meta/SCHEMA.md` (this file)
2. Read `index.md`
3. Scan the last 20-30 entries of `log.md`
4. Then proceed
This is the **orient step** — skipping it causes duplicate pages, missed cross-references, and contradictions.

_meta/index.md Normal file

@@ -0,0 +1,36 @@
---
title: Damascus Wiki Index
updated: 2026-06-23
type: meta
---
# Damascus Wiki — Index
> Content catalog. Every wiki page listed under its type with a one-line summary.
> Read this first to find relevant pages for any query.
> Last updated: 2026-06-23 | Total pages: 8
## Entities
- [[damascus-orchestrator]] — the system being built: MySQL-backed state machine, three loops, typed verdicts
## Concepts
- [[spec-refiner-contract]] — what Loop A takes in, produces, and how build resumes
- [[builder-contract]] — what Loop B takes in, produces, and how review resumes
- [[reviewer-contract]] — what Loop C takes in, produces, and the merge sequence
- [[state-resume-protocol]] — how the system picks up where it left off across restarts
## Comparisons
(none yet)
## Queries
- [[2026-06-23-test-session]] — the E2E test session for design doc §4/§12
- [[spec-refiner-gap-2026-06-23]] — the prompt gap, parked for BMAD interrogation
## Meta
- [[SCHEMA]] — conventions, tag taxonomy, lint rules
- [[log]] — chronological action log

_meta/log.md Normal file

@@ -0,0 +1,41 @@
---
title: Damascus Wiki Log
updated: 2026-06-23
type: meta
---
# Damascus Wiki Log
> Chronological record of all wiki actions. Append-only.
> Format: `## [YYYY-MM-DD] action | subject`
> Actions: create, update, ingest, query, lint, archive, delete
> When this file exceeds 500 entries, rotate: rename to log-YYYY.md, start fresh.
## [2026-06-23] create | Wiki initialized
- Domain: damascus-orchestrator local knowledge layer
- Structure: SCHEMA.md, index.md, log.md, raw/, entities/, concepts/, comparisons/, queries/, _meta/
- Migration: test-session status note moved from `/opt/damascus/llm-wiki/notes/` (image-baked) → shared host bind mount
- Reason: workers (spec-refiner, builder, reviewer) need a shared local context; MySQL is work-item source of truth, this is worker scratch + project state
- Git: initialized and pushed to `kaykayyali/damascus-wiki` on Gitea
- Tags taxonomy: see SCHEMA.md
- First project page: `queries/damascus-orchestrator/2026-06-23-test-session.md`
## [2026-06-23] create | Contract docs (4 pages)
- `concepts/spec-refiner-contract.md` — Loop A contract
- `concepts/builder-contract.md` — Loop B contract
- `concepts/reviewer-contract.md` — Loop C contract
- `concepts/state-resume-protocol.md` — how the process picks up where it left off
- Each page documents: input contract, output contract, side effects on the row, acceptance criteria, how the next phase resumes
- These are the executable form of design doc §4-§5-§12
- E2E tests in `tests/e2e/` codify each acceptance criterion
## [2026-06-23] create | Spec-refiner gap page
- `queries/damascus-orchestrator/spec-refiner-gap-2026-06-23.md`
- Documents the prompt gap found in the 2026-06-23 test (LLM ignored the row's declared `file_scope`)
- Three options: constrain (A), refactor to consume BMAD (B), replace with BMAD (C)
- Recommendation: A first
- The E2E test `test_spec_refiner_honors_declared_file_scope` already encodes the fix acceptance
## [2026-06-23] update | Index
- Added 5 new entries (4 concept + 1 query) to `index.md`
- Total page count: 8


@@ -0,0 +1,138 @@
---
title: Builder Contract
created: 2026-06-23
updated: 2026-06-23
type: concept
project: damascus-orchestrator
tags: [agent, builder, claude-code, contract, git, worktree, pr]
sources: [raw/articles/multi-project-orchestration-plan-1.md]
related: [damascus-orchestrator-overview, spec-refiner-contract, reviewer-contract, state-resume-protocol]
---
# Code Builder — Contract
What the code builder (Loop B) takes in, what it must produce, and how the reviewer resumes from its output. **This is the contract the design doc §4 Loop B and §6 promise; the E2E suite tests adherence to it.**
## Input (what the agent receives)
The builder reads the row from `work_items` where `phase='build'` (atomic claim via `SELECT ... FOR UPDATE SKIP LOCKED`).
| Field | Source | Required? |
|---|---|---|
| `id` | `work_items.id` | yes |
| `project` | `work_items.project` | yes |
| `story_id` | `work_items.story_id` | yes |
| `file_scope` | `work_items.file_scope` (from spec-refiner) | yes |
| `spec_path` | `work_items.spec_path` | yes |
| `base_commit` | `work_items.base_commit` (the main tip) | yes — set by claim |
| `attempts` | `work_items.attempts` | yes |
| `budget_cycles` | `work_items.budget_cycles` | yes |
| Spec file contents | loaded from `spec_path` | yes |
| `## TDD Plan` | parsed from spec | yes |
| `## Test Command` | parsed from spec | yes |
## What the builder must do (sequentially, no skipping)
1. **Worktree isolation** (design doc §6 Layer 1): create a git worktree off `base_commit` at `/workspace/worktrees/<story_id>`, branch `feat/<story_id>`. This is **physical isolation** — concurrent builders cannot overwrite each other.
2. **Hand the spec to Claude Code** via LiteLLM, model `minimax-m3`:
```
# Point Claude Code at the LiteLLM proxy
export ANTHROPIC_BASE_URL=http://host.docker.internal:4000
export ANTHROPIC_API_KEY=sk-dummy
claude --print --model minimax-m3 --max-turns 50 "<full spec as prompt>"
```
**Critical:** `--max-turns 50` (or higher) is the realistic starting point for non-trivial component work. 12 turns is too few; the model burns turns reading files.
3. **Constrain Claude's filesystem access** to the worktree and the `file_scope` paths. Files outside the declared scope → reject (defense in depth; the spec refiner should not have allowed them, but verify).
4. **Run the spec's `## Test Command`** in the worktree. The TDD plan says tests must fail before code exists; after Claude writes, they must pass.
5. **Rebase onto the project's main branch** before opening a PR. Branch detection: try `main`, `master`, `develop` in order. The current default for wh40k-pc is `master`. Rebase conflict → verdict `rebase_conflict`, route back to builder with the conflict as context.
6. **Commit + push** the branch, then **open a PR via Gitea API**:
```
POST /api/v1/repos/<owner>/<repo>/pulls
{ "head": "feat/<story_id>", "base": "<master|main>", "title": "<story title>", "body": "<spec link>" }
```
7. **Persist the `pr_url`** on the row:
```sql
UPDATE work_items SET
phase = 'review',
branch = 'feat/<story_id>',
pr_url = '<url returned by Gitea>',
base_commit = '<new main tip after rebase>',
updated_at = NOW()
WHERE id = '<row id>'
```
8. **Emit `events_outbox`** events:
- `build.committed` (after `git commit`)
- `build.pushed` (after `git push`)
- `build.pr_opened` (after Gitea API returns 201)
- `phase.transitioned` (build → review)
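The branch detection in step 5 (`main`, then `master`, then `develop`) can be sketched as a pure function. The name `detect_base_branch` is hypothetical, and the sketch assumes the caller has already listed the remote's branch names, for example from `git ls-remote --heads origin`.

```python
from typing import Iterable, Optional

# Order matters: the first candidate that exists on the remote wins.
CANDIDATES = ("main", "master", "develop")

def detect_base_branch(remote_branches: Iterable[str]) -> Optional[str]:
    """Return the project's main branch, or None when no candidate exists."""
    existing = set(remote_branches)
    for name in CANDIDATES:
        if name in existing:
            return name
    return None
```

For wh40k-pc, whose default is `master`, this returns `"master"` as long as no `main` branch exists. What the builder should do on `None` is not specified here; failing the attempt rather than guessing seems consistent with the rest of the contract.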
## Side effects on the row
On success:
- `phase = 'review'`
- `branch = 'feat/<story_id>'`
- `pr_url = <real Gitea PR URL>` ← **MUST be non-null before transition**
- `base_commit = <sha after rebase>`
- `last_verdict` not set yet (the reviewer will set it)
- `attempts` does NOT increment on success
On `tests_failed` (Claude wrote code, tests don't pass):
- `phase = 'build'` (stays — retry)
- `attempts++`
- `last_verdict = 'tests_failed'`
- `last_feedback = { "test_output": "...", "files_changed": [...] }`
- The worktree persists so the next attempt can read the partial work
On `rebase_conflict`:
- `phase = 'build'`
- `attempts++`
- `last_verdict = 'rebase_conflict'`
- `last_feedback = { "conflicting_files": [...], "their_sha": "..." }`
On `no_pr` (defensive verdict — Claude claimed success but no PR exists):
- `phase = 'build'`
- `attempts++`
- `last_verdict = 'no_pr'`
- This verdict was added after observing the bug where the build phase returned success-without-opening-a-PR. The reviewer must never merge a row with `pr_url IS NULL`.
On `attempts >= budget_cycles`:
- `phase = 'blocked'`
- `events_outbox` emits `blocked.exhausted`
- The row is parked; surfaced to the human
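The failure side effects above can be sketched as one pure function over a row dict. This is an illustration only: `apply_build_failure` is a hypothetical name, the row is modelled as a plain dict rather than a MySQL row, and checking the budget right after the increment is an assumption about ordering.

```python
from copy import deepcopy
from typing import Optional

RETRY_VERDICTS = {"tests_failed", "rebase_conflict", "no_pr"}

def apply_build_failure(row: dict, verdict: str, feedback: Optional[dict] = None):
    """Return (updated_row, events) for a failed build attempt.

    The input row is not mutated. Assumption: the budget check runs
    immediately after attempts is incremented.
    """
    if verdict not in RETRY_VERDICTS:
        raise ValueError(f"not a builder failure verdict: {verdict!r}")
    new = deepcopy(row)
    new["attempts"] = row["attempts"] + 1
    new["last_verdict"] = verdict
    new["last_feedback"] = feedback or {}
    events = []
    if new["attempts"] >= new["budget_cycles"]:
        new["phase"] = "blocked"           # parked, surfaced to the human
        events.append("blocked.exhausted")
    else:
        new["phase"] = "build"             # stays in build for a retry
    return new, events
```

Keeping the transition pure makes the retry and exhaustion paths easy to cover in the E2E suite without a database.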
## Files-touched audit (the E2E test point)
The builder MUST record every file it modified, in `last_feedback.files_changed`. The reviewer and the metrics analyzer both consume this. This is the data the scope-disjoint check (§6 Layer 2) runs against.
## Acceptance criteria (the contract)
- ✅ Builder claims a row in `build` phase via `SELECT ... FOR UPDATE SKIP LOCKED`
- ✅ Builder creates a worktree on a unique branch per story
- ✅ Builder invokes Claude Code with the spec as prompt, model `minimax-m3`, max-turns ≥ 50
- ✅ Builder runs the spec's `## Test Command` and only succeeds on exit 0
- ✅ Builder rebases onto the project's main branch (`main` → `master` → `develop` in order)
- ✅ Builder opens a real Gitea PR via the Gitea API (not just local commit)
- ✅ Builder persists `pr_url` on the row BEFORE transitioning to `review`
- ✅ Builder records `files_changed` in `last_feedback` for every file it touched
- ✅ Builder NEVER sets `phase='review'` with `pr_url IS NULL` (this was the bug)
- ✅ On test failure: `attempts` increments, `last_verdict='tests_failed'`, row stays in `build`
- ✅ On budget exhaustion: `phase='blocked'`, no further attempts
- ✅ Concurrent builders on different stories cannot overwrite each other (worktree isolation)
## How the next phase resumes
The reviewer, when it claims a row in `review`:
1. Reads `work_items.pr_url` → fetches the PR diff from Gitea API
2. Reads `work_items.branch` → knows which branch to validate
3. Reads `work_items.base_commit` → knows what `master|main` tip the rebase was tested against
4. Re-runs `## Test Command` in the worktree to confirm
5. Looks up the spec from `work_items.spec_path` to know what was promised
If `pr_url IS NULL` → reviewer MUST return verdict `no_pr` and NOT merge. This is the defensive guard against the bug.
## Cross-references
- [[damascus-orchestrator-overview]]
- [[spec-refiner-contract]] — what feeds the builder
- [[reviewer-contract]] — what the builder hands off
- [[state-resume-protocol]] — general resume rules
- [[builder-concurrency-model]] — Layer 1 (worktree isolation) + Layer 2 (scope-disjoint dispatch)


@@ -0,0 +1,156 @@
---
title: Reviewer Contract
created: 2026-06-23
updated: 2026-06-23
type: concept
project: damascus-orchestrator
tags: [agent, reviewer, contract, pr, merge, typed-verdict]
sources: [raw/articles/multi-project-orchestration-plan-1.md]
related: [damascus-orchestrator-overview, spec-refiner-contract, builder-contract, state-resume-protocol]
---
# Reviewer — Contract
What the reviewer (Loop C) takes in, what it must produce, and how the merge completes. **This is the contract the design doc §4 Loop C, §5 (typed verdicts + loop-breaker), and §12 promise; the E2E suite tests adherence to it.**
## Input (what the agent receives)
The reviewer reads the row from `work_items` where `phase='review'` (atomic claim).
| Field | Source | Required? |
|---|---|---|
| `id` | `work_items.id` | yes |
| `project` | `work_items.project` | yes |
| `story_id` | `work_items.story_id` | yes |
| `pr_url` | `work_items.pr_url` | **yes — must be non-null** |
| `branch` | `work_items.branch` | yes |
| `base_commit` | `work_items.base_commit` (the sha the rebase was tested against) | yes |
| `spec_path` | `work_items.spec_path` | yes |
| `file_scope` | `work_items.file_scope` | yes |
| `attempts` | `work_items.attempts` | yes |
| `budget_cycles` | `work_items.budget_cycles` | yes |
| PR diff | Gitea API `GET /api/v1/repos/<owner>/<repo>/pulls/<n>/files` | yes |
| Spec | loaded from `spec_path` | yes |
## Two layers, strictly separated (design doc §4)
### Layer 1 — `validate` (objective, **HARD GATE**)
Mechanical checks. **Failure here = hard block, no LLM needed.**
1. **`pr_url IS NOT NULL`** — defensive guard against the build-without-PR bug
2. **PR is open and mergeable** in Gitea (no merge conflicts, base branch matches)
3. **`## Test Command` exits 0** in the worktree (re-run the test; don't trust the builder's last claim)
4. **Build succeeds** (if `## Test Command` is `npm run build`, that covers it; if not, the spec must include a build step)
5. **Lint clean** (if the project has a linter, the spec's test command should include it)
6. **Branch is rebased onto current main** — run `git fetch origin && git rev-parse origin/main` and verify the builder's `base_commit` is an ancestor. If items merged in the meantime have moved main, rebase again.
Any failure in this layer → verdict `tests_failed` (or `no_pr` for #1, `rebase_conflict` for #6), row goes back to `build` with `attempts++`.
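A minimal sketch of the hard gate as an ordered list of mechanical checks. The `ValidateInput` dataclass and its field names are hypothetical; it assumes the caller has already gathered each fact (from Gitea, the test run, and git). Build and lint are assumed to be folded into the spec's test command, as checks 4 and 5 suggest, and a null `pr_url` maps to `no_pr` per the typed-verdict table.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ValidateInput:
    pr_url: Optional[str]             # work_items.pr_url
    pr_mergeable: bool                # PR open and mergeable in Gitea
    test_exit_code: int               # exit code of the re-run Test Command
    rebased_onto_current_main: bool   # result of the ancestor check

def validate(v: ValidateInput) -> Optional[str]:
    """Return the failing verdict, or None when every check passes.

    No LLM is involved; the first failing check decides the verdict.
    """
    if not v.pr_url:
        return "no_pr"
    if not v.pr_mergeable or v.test_exit_code != 0:
        return "tests_failed"
    if not v.rebased_onto_current_main:
        return "rebase_conflict"
    return None
```

A `None` result is the only case in which the advisory layer should run at all.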
### Layer 2 — `assess` (subjective, **ADVISORY ONLY**)
LLM call. **Cannot block a green build.** The design doc says this in capital letters because it's the most-violated rule in practice.
The assess LLM sees:
1. System prompt: "You are the code-quality reviewer. You CANNOT block a green build. If tests/build/lint fail, the verdict is `tests_failed`, not 'looks bad'."
2. The spec
3. The PR diff
4. The test results (pass/fail + output)
The LLM returns JSON:
```json
{
"verdict": "pass" | "tests_failed" | "spec_ambiguous",
"blocker": "human-readable if not pass, or null",
"comments": ["code-quality comments — advisory, not blocking"],
"suggested_followup_issue": null
}
```
The reviewer only considers `assess` if `validate` passed. If `validate` failed, the verdict is `validate`'s finding, and `assess` is not run (saves tokens).
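How the two layers combine can be sketched as follows. This is an illustration under stated assumptions: `final_verdict` is a hypothetical name, and the handling of an out-of-range assess verdict (downgrade to `pass`, keep the blocker text as an advisory comment) is one possible way to enforce "cannot block a green build", not something the design doc prescribes.

```python
import json
from typing import List, Optional, Tuple

# Verdicts the advisory layer may legitimately return on a green build.
ASSESS_ALLOWED = {"pass", "spec_ambiguous", "spec_wrong"}

def final_verdict(validate_failure: Optional[str],
                  assess_json: Optional[str]) -> Tuple[str, List[str]]:
    """Combine validate and assess into (verdict, advisory_comments)."""
    if validate_failure is not None:
        # Hard gate failed: assess is never consulted (saves tokens).
        return validate_failure, []
    data = json.loads(assess_json or "{}")
    comments = list(data.get("comments") or [])
    verdict = data.get("verdict")
    if verdict not in ASSESS_ALLOWED:
        # Assumption: anything else is treated as advisory noise.
        if data.get("blocker"):
            comments.append(data["blocker"])
        verdict = "pass"
    return verdict, comments
```

The returned comments are what would end up in `last_feedback.comments`; they never change the routing.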
## Typed verdicts (design doc §5)
| Verdict | Routes to | Meaning | Set by |
|---|---|---|---|
| `pass` | merge | validate passed; assess returned pass | reviewer |
| `tests_failed` | builder | validate: test/build/lint failed | reviewer (validate) |
| `rebase_conflict` | builder | main moved (other items merged in the meantime) and the branch no longer rebases cleanly | reviewer (validate) |
| `spec_ambiguous` / `spec_wrong` | spec-refiner | the spec didn't match what was built | reviewer (assess) |
| `no_pr` | builder | `pr_url IS NULL` — defensive | reviewer (validate, no LLM) |
**The `no_pr` verdict is added defensively** after observing the build phase return success-without-opening-a-PR and the reviewer merging anyway. See `[[builder-contract]]` for the full story.
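The routing table above maps naturally onto an enum plus a lookup, which makes an untyped verdict string fail loudly instead of being routed by accident. The `Verdict` enum and `route` function are hypothetical names; budget exhaustion is handled separately and is not modelled here.

```python
from enum import Enum

class Verdict(str, Enum):
    PASS = "pass"
    TESTS_FAILED = "tests_failed"
    REBASE_CONFLICT = "rebase_conflict"
    SPEC_AMBIGUOUS = "spec_ambiguous"
    SPEC_WRONG = "spec_wrong"
    NO_PR = "no_pr"

# Verdict -> next phase, mirroring the "Routes to" column.
ROUTES = {
    Verdict.PASS: "merged",
    Verdict.TESTS_FAILED: "build",
    Verdict.REBASE_CONFLICT: "build",
    Verdict.NO_PR: "build",
    Verdict.SPEC_AMBIGUOUS: "spec",
    Verdict.SPEC_WRONG: "spec",
}

def route(raw_verdict: str) -> str:
    """Return the next phase; raises ValueError for an unknown verdict."""
    return ROUTES[Verdict(raw_verdict)]
```

A free-text judgement such as `"looks bad"` is rejected by `Verdict(...)` before it can reach the table.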
## On `pass` — the merge sequence
1. Call Gitea API: `POST /api/v1/repos/<owner>/<repo>/pulls/<n>/merge`
2. On 200, update the row:
```sql
UPDATE work_items SET
phase = 'merged',
last_verdict = 'pass',
merged_at = NOW(),
updated_at = NOW()
WHERE id = '<row id>'
```
3. Emit `events_outbox`:
- `review.passed`
- `merge.completed` (with merge commit SHA)
- `phase.transitioned` (review → merged)
4. Write the success to the `cost_ledger` (reviewer's LLM call cost)
5. **Merge serialization:** only one merge at a time. The atomic claim on `phase='review'` provides this — once a row is in `merged`, the next cycle's claim won't pick it up.
## Side effects on the row
On `pass`:
- `phase = 'merged'`
- `last_verdict = 'pass'`
- `merged_at` timestamp set
- All advisory comments from `assess` go into `last_feedback.comments` (non-blocking)
On `tests_failed` / `rebase_conflict` / `no_pr`:
- `phase = 'build'`
- `attempts++`
- `last_verdict = <verdict>`
- `last_feedback = { "blocker": "...", "test_output": "...", "comments": [...] }`
On `spec_ambiguous` / `spec_wrong`:
- `phase = 'spec'`
- `attempts++`
- `last_verdict = 'spec_ambiguous'`
- `last_feedback = { "issue": "<what's ambiguous>" }`
- The spec-refiner gets a second chance to refine
On `attempts >= budget_cycles`:
- `phase = 'blocked'`
- `events_outbox` emits `blocked.exhausted`
- **No infinite ping-pong possible** (design doc §5)
## Acceptance criteria (the contract)
- ✅ Reviewer claims a row in `review` phase only
- ✅ If `pr_url IS NULL` → verdict `no_pr`, do NOT merge (this is the defensive guard)
- ✅ `validate` layer runs test_cmd, build, lint; failure → `tests_failed`, no LLM call
- ✅ `assess` layer NEVER blocks a green build — it can only return `pass` or `spec_ambiguous`/`spec_wrong`
- ✅ On `pass`: Gitea merge API is called, row transitions to `merged`, `merged_at` set
- ✅ On non-pass: `attempts` increments, row routes to the correct next phase per the verdict table
- ✅ On `attempts >= budget_cycles`: row transitions to `blocked`
- ✅ Subjective comments are stored in `last_feedback.comments` but do NOT prevent merge
- ✅ Merge is serialized — only one merge per cycle tick (atomic claim)
- ✅ The reviewer writes to `cost_ledger` for its LLM call (even on `pass`)
## What the next phase (post-merge) does
The system is **observational only** after merge:
- The `events_outbox` drainer relays the merge event to the overseer / Discord (Phase 3)
- The `wiki_pin` mechanism (Phase 3) captures the wiki state at merge time, derives facts, writes to `wiki/facts/` — but only facts derived from the merged artifact
There is **no Phase 4+ action** that consumes a merged row. The system has done its job. The merged code lives in the project repo; the merged event lives in the outbox; the audit trail lives in MySQL.
## Cross-references
- [[damascus-orchestrator-overview]]
- [[spec-refiner-contract]]
- [[builder-contract]]
- [[state-resume-protocol]]
- [[typed-verdict-decision-table]] — full verdict → phase routing table


@@ -0,0 +1,143 @@
---
title: Spec Refiner Contract
created: 2026-06-23
updated: 2026-06-23
type: concept
project: damascus-orchestrator
tags: [agent, spec-refiner, contract, litellm, claude-code]
sources: [raw/articles/multi-project-orchestration-plan-1.md]
related: [damascus-orchestrator-overview, builder-contract, reviewer-contract, state-resume-protocol]
---
# Spec Refiner — Contract
What the spec-refiner agent (Loop A) takes in, what it must produce, and how downstream phases resume from its output. **This is the contract the design doc §4 Loop A and §12 promise; the E2E suite tests adherence to it.**
## Input (what the agent receives)
The spec-refiner reads the row from `work_items` where `phase='spec'` and any artifacts the row references.
| Field | Source | Required? |
|---|---|---|
| `id` | `work_items.id` | yes |
| `project` | `work_items.project` | yes |
| `story_id` | `work_items.story_id` | yes |
| `title` | `work_items.title` | yes |
| `file_scope` (declared) | `work_items.file_scope` (JSON array) | **yes — agent MUST honor this** |
| `budget_cycles` | `work_items.budget_cycles` | yes |
| `attempts` | `work_items.attempts` | yes |
| `last_feedback` | `work_items.last_feedback` (JSON) | when attempts > 0 |
| `architecture.md` | `/workspace/projects/<project>/architecture.md` (if exists) | no |
| `bmad_story` | `/opt/damascus/bmad/<project>/_bmad-output/planning-artifacts/<story_id>.md` (if exists) | no — but preferred over the row when present |
**Prompt assembly order** (§4 discipline):
1. System preamble: "You are the spec refiner. Output is markdown with sections: Goal, AC, TDD Plan, File Scope, Test Command, Ambiguities."
2. **Row constraints**: declared `file_scope`, `budget_cycles`, `attempts`
3. **Project context**: `architecture.md` if present
4. **BMAD story**: full body if present, else the row's `title`
5. **Prior attempt feedback**: `last_feedback` if `attempts > 0`
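The assembly order above can be sketched as a small function. This is not the template at `src/damascus/phases.py`; the function name, the section labels, and the dict-shaped `row` are assumptions for illustration.

```python
import json
from typing import Optional

PREAMBLE = (
    "You are the spec refiner. Output is markdown with sections: "
    "Goal, AC, TDD Plan, File Scope, Test Command, Ambiguities."
)

def assemble_prompt(row: dict,
                    architecture: Optional[str] = None,
                    bmad_story: Optional[str] = None) -> str:
    """Build the prompt in the documented order (labels are hypothetical)."""
    parts = [PREAMBLE]
    constraints = {
        "file_scope": row["file_scope"],
        "budget_cycles": row["budget_cycles"],
        "attempts": row["attempts"],
    }
    parts.append("## Row constraints\n" + json.dumps(constraints, indent=2))
    if architecture:
        parts.append("## Project context\n" + architecture)
    # The BMAD story wins over the row's title when it is present.
    parts.append("## Story\n" + (bmad_story or row["title"]))
    if row["attempts"] > 0 and row.get("last_feedback"):
        parts.append("## Prior attempt feedback\n"
                     + json.dumps(row["last_feedback"], indent=2))
    return "\n\n".join(parts)
```

Placing the row constraints directly after the preamble means the declared `file_scope` always reaches the model, whatever optional context follows.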
**Known gap (2026-06-23):** the current prompt template at `src/damascus/phases.py:33-46` does NOT pass the row's declared `file_scope` or `budget_cycles` to the LLM. E2E test `test_spec_refiner_honors_declared_file_scope` codifies the fix. See `[[spec-refiner-gap-2026-06-23]]`.
## Output (what the agent must produce)
A spec file at `/data/specs/<project>/<story_id>.spec.md` (or wherever the orchestrator's spec dir is configured). The file must contain these sections, parseable by the build phase:
```markdown
# Spec: <story_id>
## Goal
<1-2 sentence description>
## Acceptance Criteria
1. <criterion>
2. <criterion>
...
## TDD Plan
- <path/to/test/file.test.js>: <what it tests>
- <path/to/test/file.test.jsx>: <what it tests>
...
## File Scope
- <path/to/file.ext> *(new or edit)*
- <path/to/file.ext>
...
## Test Command
<exact shell command that proves done>
## Ambiguities
- <open question, if any> → will become a human_issue
```
**Parsing rules for the build phase:**
- `## File Scope` section → list of paths
- `## Test Command` section → exact string, run from `/workspace/projects/<project>/`
- `## Ambiguities` non-empty → spec-refiner opens a `human_issue` and sets `phase='awaiting_human'`, **does not** transition to build
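A minimal sketch of these parsing rules, assuming the spec uses exactly the `## ` headings from the template and `- ` bullets. The function names are hypothetical, and stripping a trailing `*(new or edit)*` annotation from scope entries is an assumption about how the builder wants the paths.

```python
import re

def parse_sections(spec_md):
    """Map each '## Heading' to the text below it (up to the next heading)."""
    sections, current = {}, None
    for line in spec_md.splitlines():
        match = re.match(r"^##\s+(.+?)\s*$", line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}

def _bullets(text):
    return [ln.strip()[2:].strip() for ln in text.splitlines()
            if ln.strip().startswith("- ")]

def parse_spec(spec_md):
    """Extract the fields the build phase needs from a spec file."""
    s = parse_sections(spec_md)
    scope = [re.sub(r"\s*\*\(.*?\)\*\s*$", "", item)
             for item in _bullets(s.get("File Scope", ""))]
    return {
        "file_scope": scope,
        "test_command": s.get("Test Command", ""),
        "ambiguities": _bullets(s.get("Ambiguities", "")),
    }

def scope_within_declared(parsed_scope, declared_scope):
    """True when the spec's File Scope is a subset of the row's declared scope."""
    return set(parsed_scope) <= set(declared_scope)
```

A non-empty `ambiguities` list is the signal to open a `human_issue` instead of transitioning to `build`, and an empty `test_command` should be treated as an error rather than defaulted.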
## Side effects on the row (the resume contract)
On success, the spec-refiner MUST:
1. Write the spec file to disk
2. Update the row in MySQL:
```sql
UPDATE work_items SET
phase = 'build',
spec_path = '<path to written file>',
file_scope = <parsed JSON array from the spec>,
updated_at = NOW()
WHERE id = '<row id>'
```
3. Emit an `events_outbox` row:
```
event_type: 'spec.completed'
payload: { spec_path, file_scope, model, tokens_in, tokens_out, usd }
```
4. Write a `cost_ledger` row with the LLM call's tokens + USD
On ambiguity (non-empty `## Ambiguities`):
1. Open a `human_issues` row
2. Set `phase = 'awaiting_human'`
3. Emit `events_outbox` event `human.issue.opened`
4. Do **not** set `phase = 'build'`
On LLM error (transient — network, rate limit, model error):
1. Increment `attempts` by 1
2. Store the error in `last_feedback`
3. Set `phase` back to `spec` (does not advance)
4. If `attempts >= budget_cycles` → `phase = 'blocked'`, emit `blocked.exhausted`
5. Emit `events_outbox` event `spec.error`
## Acceptance criteria (this is the contract)
- ✅ Spec refiner reads from `work_items` with `phase='spec'` only (not other phases)
- ✅ Spec refiner writes spec file to disk
- ✅ Spec refiner's prompt includes the row's declared `file_scope` and `budget_cycles`
- ✅ Spec refiner's spec output's `## File Scope` is a subset (or equal) of the row's declared `file_scope`
- ✅ Spec refiner's spec output's `## Test Command` is a non-empty string parseable as a shell command
- ✅ On non-ambiguous success: row transitions `spec → build`, `spec_path` is set, `events_outbox` has `spec.completed`
- ✅ On ambiguity: row transitions `spec → awaiting_human`, `human_issues` row is created with the question text
- ✅ On LLM error: `attempts` increments, row stays in `spec` (or transitions to `blocked` if exhausted)
- ✅ Spec refiner NEVER guesses: if `## Ambiguities` is empty but the spec is missing required sections, that is an error (verdict `spec_ambiguous`)
## How the next phase resumes from here
The build phase, when it claims a row, reads:
1. `work_items.spec_path` → load the spec file
2. `work_items.file_scope` (the parsed subset) → files the builder may touch
3. `## Test Command` from the spec → command to run after build
4. `## TDD Plan` from the spec → which failing tests must now pass
If any of these are missing, the build phase must return `tests_failed` with the missing field in `last_feedback` — **never invent defaults**.
## Cross-references
- [[damascus-orchestrator-overview]]
- [[builder-contract]] — what consumes the spec-refiner's output
- [[reviewer-contract]] — what the build phase hands to the reviewer
- [[state-resume-protocol]] — the general "pick up where left off" rules
- [[spec-refiner-gap-2026-06-23]] — the known gap in the current implementation


@@ -0,0 +1,152 @@
---
title: State Resume Protocol
created: 2026-06-23
updated: 2026-06-23
type: concept
project: damascus-orchestrator
tags: [state-machine, resume, restart, idempotency, contract]
sources: [raw/articles/multi-project-orchestration-plan-1.md]
related: [damascus-orchestrator-overview, spec-refiner-contract, builder-contract, reviewer-contract]
---
# State Resume Protocol
How the damascus-orchestrator picks up exactly where it left off after any kind of interruption — a single cycle finishing, a worker dying, the host rebooting, or the cron firing late. **This is the contract that makes the system "Demonstrable" (design doc north star).**
## The fundamental rule
> **MySQL is the single source of truth on work-item state. The orchestrator is stateless. The wiki is a derived read layer. Every phase is idempotent given the same input row.**
This means: at any instant, the answer to "what should this row do next?" is fully computable from the row in MySQL. There is no in-memory state, no scheduler-internal queue, no "what was I doing" — just the row.
## What "resume" means in each case
### Case 1 — Single cycle finishes (normal completion)
The cycle ran, transitioned one row from phase X to phase Y, and exited. The next cycle reads the row in phase Y and acts on it. This is the happy path; nothing to resume.
### Case 2 — Worker dies mid-phase
The cycle crashed (OOM, network, kill) partway through a phase. The row is in some intermediate state:
| What the row looks like | What the next cycle must do |
|---|---|
| `phase='build'`, `claimed_at` set, `claimed_by=<dead worker>` | Treat as a **stale claim** once it is older than 30 minutes: the next cycle may re-claim when `claimed_at < NOW() - INTERVAL 30 MINUTE`. The worktree persists; Claude Code's work may be partial. The builder resumes by re-running Claude with the same spec, in the same worktree, with `last_feedback` from the prior attempt as additional context. |
| `phase='build'`, `claimed_at` not set | No claim was taken. The next cycle claims it normally. |
| `phase='spec'`, no `last_feedback` | No spec-refiner call was made. Retry normally. |
| `phase='spec'`, `last_feedback` has `error` | LLM call failed. Increment `attempts`, retry. If `attempts >= budget_cycles`, block. |
| `phase='review'`, `pr_url` set, no `last_verdict` | Reviewer's `validate` ran, `assess` did not. Re-claim; re-run assess. |
| `phase='review'`, `pr_url IS NULL` | This is the bug. The build phase crashed after starting but before opening the PR. Re-route to build with verdict `no_pr`. |
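The stale-claim rule from the first table row can be sketched as a pure predicate. The function name `can_claim` is hypothetical; in the orchestrator the same condition would live in the SQL claim query (`claimed_at < NOW() - INTERVAL 30 MINUTE`), and this version only mirrors it for testing.

```python
from datetime import datetime, timedelta
from typing import Optional

STALE_AFTER = timedelta(minutes=30)

def can_claim(claimed_at: Optional[datetime], now: datetime) -> bool:
    """True when a row is unclaimed or its claim is older than 30 minutes."""
    if claimed_at is None:
        return True  # no claim was ever taken
    return claimed_at < now - STALE_AFTER
```

A claim exactly 30 minutes old is still considered live, matching the strict `<` in the SQL condition.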
### Case 3 — Host reboots, container dies
Same as Case 2: MySQL persists (separate container with its own volume); wiki persists (host bind mount); the orchestrator container can be restarted from scratch. On restart, the next cycle reads `work_items` and resumes. No special recovery code.
### Case 4 — Row is in `awaiting_human`
The row sits in `awaiting_human` until the human answers. The cycle's `claim_for_spec` (and the other claim functions) **do not pick up rows in `awaiting_human`**. The resume happens when:
1. The human calls `damascus answer <issue_id> "<answer>"`
2. That command sets `phase = 'spec'` (back to the refiner) and closes the `human_issues` row
3. The next cycle claims it
The **orchestrator never blocks** waiting for the human — it moves on to other ready rows.
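A minimal sketch of what the `damascus answer` command might do, using an in-memory SQLite database as a stand-in for MySQL so the example runs anywhere. The `human_issues` columns (`work_item_id`, `status`, `answer`) and the function name `answer_issue` are assumptions for illustration; the real schema is not shown on this page.

```python
import sqlite3


def answer_issue(conn: sqlite3.Connection, issue_id: str, answer: str) -> None:
    """Record the human's answer and hand the row back to the spec-refiner."""
    with conn:  # one transaction: both updates land, or neither does
        row = conn.execute(
            "SELECT work_item_id FROM human_issues "
            "WHERE id = ? AND status = 'open'",
            (issue_id,),
        ).fetchone()
        if row is None:
            raise LookupError(f"no open issue {issue_id}")
        # Close the issue and store the answer (hypothetical columns).
        conn.execute(
            "UPDATE human_issues SET status = 'closed', answer = ? WHERE id = ?",
            (answer, issue_id),
        )
        # Only rows still waiting on the human are moved back to 'spec'.
        conn.execute(
            "UPDATE work_items SET phase = 'spec' "
            "WHERE id = ? AND phase = 'awaiting_human'",
            (row[0],),
        )
```

Guarding the second update with `phase = 'awaiting_human'` keeps the command safe to run twice: a repeated answer finds no open issue and changes nothing.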
### Case 5 — Row is `blocked`
The row's attempt budget is exhausted (`attempts >= budget_cycles`). The row does not get re-claimed automatically; the resume is manual:
```sql
UPDATE work_items SET phase = 'spec', attempts = 0 WHERE id = '<id>';
```
This is the human's escape hatch from a stuck row. The next cycle re-runs the spec-refiner from scratch.
## Idempotency contract (per phase)
Every phase function MUST be idempotent given the same input row. Concretely:
### `refine_spec(row)` — idempotent
- Reads: row, BMAD story, architecture, prior `last_feedback`
- Writes: spec file, `phase=build`, `spec_path`, `file_scope`, `cost_ledger` row, `events_outbox` event
- **Same row in twice:** the second invocation produces a new spec file (overwrites if the path is the same), and a new cost/event row. The `phase` is already `build`, so the second call would not even be claimed (the claim query looks for `phase='spec'`). So the spec-refiner is naturally idempotent.
### `build(row)` — must be idempotent across restarts
- Reads: row, spec, project repo
- Writes: worktree, branch, PR, `phase=review`, `branch`, `pr_url`, `base_commit`, `files_changed` in `last_feedback`
- **Idempotency requirement:** if the worktree already exists from a prior attempt, reuse it. If the branch already exists, check it out. If the PR already exists, fetch its URL and set `pr_url`. Only create new artifacts when the prior ones don't exist.
This is the trickiest one. The implementation must check before each write:
- `git worktree list` → if `<story_id>` worktree exists, don't recreate
- `git branch -a` → if `feat/<story_id>` exists, check it out, don't create
- Gitea API `GET /pulls?head=feat/<story_id>` → if PR exists, use its URL
If the cycle crashes after creating the worktree but before opening the PR, the next attempt finds the worktree, sees no PR, and continues from "open PR."
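The check-before-write pattern could be sketched as follows. This is an assumption-laden illustration: `BuildArtifacts`, `worktree_exists`, and `plan_build_writes` are hypothetical names, the worktree path is assumed to end in `/<story_id>`, and the parser only relies on the `worktree <path>` lines that `git worktree list --porcelain` prints.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BuildArtifacts:
    """What already exists for a story, as discovered by the three checks."""
    worktree_exists: bool
    branch_exists: bool
    pr_url: Optional[str]


def worktree_exists(porcelain: str, story_id: str) -> bool:
    """True if the porcelain listing contains a worktree path ending in story_id."""
    for line in porcelain.splitlines():
        if line.startswith("worktree ") and line.rstrip("/").endswith("/" + story_id):
            return True
    return False


def plan_build_writes(found: BuildArtifacts) -> List[str]:
    """Decide, per artifact, whether to reuse it or create it."""
    return [
        "reuse_worktree" if found.worktree_exists else "create_worktree",
        "checkout_branch" if found.branch_exists else "create_branch",
        "reuse_pr" if found.pr_url else "open_pr",
    ]
```

For the crash described above (worktree created, no PR yet), the plan is `["reuse_worktree", "checkout_branch", "open_pr"]` when the branch also survived.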
### `review(row)` — mostly idempotent
- Reads: row, PR diff, test results
- Writes: `phase=merged` (on pass), `last_verdict`, `merged_at`, `cost_ledger`, `events_outbox`
- **Idempotency requirement:** if `phase='merged'` already, the reviewer's claim function won't pick it up (claim looks for `phase='review'`). If a row gets re-claimed accidentally (race), the reviewer must check `merged_at IS NULL` before proceeding and bail out if set.
## What does NOT need to be idempotent
- **The LLM calls themselves.** The spec-refiner may produce a slightly different spec on re-run; that's fine, that's the whole point of "refine." Claude Code may take a different path on re-run. The cost ledger accumulates cost on every call; that is by design.
- **The events_outbox.** It is append-only. Re-running a phase adds new events; the drainer is idempotent on the (work_item_id, event_type, idempotency_key) tuple.
- **The cost_ledger.** It is append-only.
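To show why an append-only outbox is safe to re-run, here is a small sketch of a drainer that deduplicates on the `(work_item_id, event_type, idempotency_key)` tuple. The function name `drain` and the in-memory `delivered` set are hypothetical; a real drainer would presumably keep the delivered keys in durable storage.

```python
from typing import Iterable, List, Set, Tuple

EventKey = Tuple[str, str, str]


def drain(events: Iterable[dict], delivered: Set[EventKey]) -> List[dict]:
    """Return the events not yet delivered and remember their keys."""
    fresh = []
    for event in events:
        key = (event["work_item_id"], event["event_type"], event["idempotency_key"])
        if key in delivered:
            continue  # a re-run phase appended a duplicate; skip it
        delivered.add(key)
        fresh.append(event)
    return fresh
```

Draining the same outbox twice delivers each event once, which is why the phases themselves are free to append without checking.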
## Stale-claim recovery
The atomic claim sets `claimed_by = '<worker_id>'` and `claimed_at = NOW()`. If a worker dies, the row stays claimed. The next cycle's claim function MUST do:
```sql
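-- MySQL rejects an UPDATE whose subquery selects from the table being
-- updated (error 1093), so lock the candidate row first, then update it.
SET @claim_id = NULL;
START TRANSACTION;
SELECT id INTO @claim_id FROM work_items
  WHERE phase = 'build'
    AND (claimed_at IS NULL OR claimed_at < NOW() - INTERVAL 30 MINUTE)
    AND attempts < budget_cycles
  ORDER BY priority ASC, created_at ASC
  LIMIT 1
  FOR UPDATE SKIP LOCKED;
-- If no row qualified, @claim_id stays NULL and the UPDATE matches nothing.
UPDATE work_items SET
  claimed_by = '<new worker>',
  claimed_at = NOW()
WHERE id = @claim_id;
COMMIT;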
```
The 30-minute stale-claim window is a configuration parameter. Default 30m; tunable per project.
## Acceptance criteria (the resume contract)
- ✅ `work_items` is the single source of truth — no in-memory scheduler state
- ✅ Every phase function is idempotent given the same input row (with the worktree-check, branch-check, PR-check pattern in `build`)
- ✅ A row whose `claimed_at` is > 30m stale is reclaimable by the next cycle
- ✅ Rows in `awaiting_human` are NOT auto-claimed (the human must answer first)
- ✅ Rows in `blocked` require manual unblock via SQL
- ✅ The orchestrator container can be killed mid-cycle; on restart, no row is lost
- ✅ The MySQL data volume persists across container restarts (it's a separate named volume)
- ✅ The wiki bind-mount persists across container restarts
- ✅ The Gitea PRs persist (they're in Gitea, not in the orchestrator)
- ✅ The `events_outbox` is append-only and the drainer is idempotent (Phase 3, but the table exists)
- ✅ The `cost_ledger` is append-only; reruns do not corrupt it
## What the E2E tests check
The test suite (`tests/e2e/`) has a `test_state_resume.py` file with tests that:
1. Insert a row in `build` phase with no `claimed_at` → cycle claims it normally
2. Insert a row in `build` phase with `claimed_at = NOW() - INTERVAL 31 MINUTE` → cycle re-claims it
3. Insert a row in `build` phase with `claimed_at = NOW() - INTERVAL 5 MINUTE` → cycle does NOT claim it
4. Insert a row in `awaiting_human` → cycle does NOT claim it
5. Kill the orchestrator container mid-build → restart → verify the next cycle resumes correctly
6. Insert a row in `review` with `pr_url IS NULL` → cycle re-routes it to `build` with verdict `no_pr`
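Cases 1 through 4 can be exercised without a running stack. The sketch below uses in-memory SQLite as a stand-in for MySQL, so it has no `FOR UPDATE SKIP LOCKED` and omits the `priority` ordering; `claim_for_build` and `make_db` are hypothetical names, not the functions in `test_state_resume.py`.

```python
import sqlite3
from datetime import datetime, timedelta
from typing import Optional

SCHEMA = """
CREATE TABLE work_items (
    id TEXT PRIMARY KEY, phase TEXT, claimed_by TEXT, claimed_at TEXT,
    attempts INTEGER DEFAULT 0, budget_cycles INTEGER DEFAULT 5
)
"""


def make_db(phase: str, claimed_at: Optional[datetime]) -> sqlite3.Connection:
    """Create a one-row database in the given state."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    stamp = claimed_at.isoformat() if claimed_at else None
    conn.execute(
        "INSERT INTO work_items (id, phase, claimed_at) VALUES ('w1', ?, ?)",
        (phase, stamp),
    )
    return conn


def claim_for_build(conn: sqlite3.Connection, worker_id: str,
                    now: datetime, stale_minutes: int = 30) -> Optional[str]:
    """Claim one unclaimed or stale build row; return its id, or None."""
    # ISO timestamps in one format compare correctly as strings.
    cutoff = (now - timedelta(minutes=stale_minutes)).isoformat()
    with conn:
        row = conn.execute(
            "SELECT id FROM work_items WHERE phase = 'build' "
            "AND (claimed_at IS NULL OR claimed_at < ?) "
            "AND attempts < budget_cycles LIMIT 1",
            (cutoff,),
        ).fetchone()
        if row is None:
            return None
        conn.execute(
            "UPDATE work_items SET claimed_by = ?, claimed_at = ? WHERE id = ?",
            (worker_id, now.isoformat(), row[0]),
        )
        return row[0]
```

Passing `now` in as a parameter, rather than reading the clock inside the function, is what lets the 31-minute and 5-minute cases run instantly.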
## Cross-references
- [[damascus-orchestrator-overview]]
- [[spec-refiner-contract]]
- [[builder-contract]]
- [[reviewer-contract]]
- [[stale-claim-recovery]] — the 30-minute window and the worker_id pattern


@@ -0,0 +1,71 @@
---
title: Damascus Orchestrator Overview
created: 2026-06-23
updated: 2026-06-23
type: entity
project: damascus-orchestrator
tags: [agent, state-machine, queue, docker, architecture]
sources: [raw/articles/multi-project-orchestration-plan-1.md]
related: [damascus-e2e-test-session-2026-06-23, spec-refiner-gap-2026-06-23]
---
# Damascus Orchestrator — Overview
The system being built: a Postgres/MySQL-backed state machine that schedules BMAD-decomposed stories across multiple project repos. Each story flows `spec-refine → build → review → merge` via atomic queue claims, and every phase transition is gated by an objective signal. Subjective judgment can only produce non-blocking advisory issues, and the human is an async, non-blocking input. A purely observational overseer aggregates metrics and relays them to the human. Attempt budgets and a global spend cap guarantee termination and bounded cost.
## Stack
- **MySQL 8.4** — source of truth on work-item state (`work_items`, `coordination_gates`, `human_issues`, `cost_ledger`, `events_outbox`)
- **Atomic claim:** `SELECT ... FOR UPDATE SKIP LOCKED`
- **Builder:** Claude Code via LiteLLM (routes to Ollama cloud) using `minimax-m3`
- **File-based specs and worktrees** at `/data/specs/<project>/<story_id>.spec.md` and `/workspace/worktrees/<story_id>`
- **Sidecar** for external concurrency view at `http://<host>:9100/status/active.json`
- **Gitea** for PRs, code, wiki (this wiki), registry
## Three loops
- **Loop A — Spec Refiner:** `phase=spec` → reads story + architecture → writes `.spec.md` → declares `file_scope` + `test_cmd` → moves to `build`
- **Loop B — Code Builder:** `phase=build` → worktree + branch → Claude Code writes code → tests locally → rebase → push → open PR → moves to `review`
- **Loop C — Reviewer:** `phase=review` → `validate` (objective gate: tests + build + lint + clean rebase) → `assess` (subjective, advisory) → typed verdict → merge on `pass`
## Typed verdicts
| Verdict | Routes to | Meaning |
|---|---|---|
| `pass` | merge | objective gate clean |
| `tests_failed` | builder | code problem |
| `rebase_conflict` | builder | merged meanwhile |
| `spec_ambiguous` / `spec_wrong` | spec-refiner | escalate up |
| `no_pr` (added defensively) | builder | build claimed success but did not open a PR |
## Loop-breaker
`attempts >= budget_cycles` → `phase=blocked`. Default `budget_cycles=5`. No infinite ping-pong.
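The verdict table and the loop-breaker combine into one small transition function. This is a sketch, assuming that every non-`pass` verdict counts as one attempt and that rows are plain dicts; `apply_verdict` and the `ROUTES` mapping are hypothetical names built from the table above.

```python
# Verdict -> next phase, taken from the typed-verdict table.
ROUTES = {
    "pass": "merged",
    "tests_failed": "build",
    "rebase_conflict": "build",
    "spec_ambiguous": "spec",
    "spec_wrong": "spec",
    "no_pr": "build",
}


def apply_verdict(row: dict, verdict: str) -> dict:
    """Return a copy of the row with the verdict applied."""
    target = ROUTES[verdict]  # unknown verdicts raise KeyError immediately
    updated = dict(row, last_verdict=verdict)
    if verdict == "pass":
        updated["phase"] = target
        return updated
    # Assumption: each failed round trip consumes one attempt.
    updated["attempts"] = row["attempts"] + 1
    if updated["attempts"] >= row["budget_cycles"]:
        updated["phase"] = "blocked"  # loop-breaker: stop the ping-pong
    else:
        updated["phase"] = target
    return updated
```

With the default budget of 5, a row that keeps failing tests returns to `build` four times and is blocked on the fifth failure.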
## What's currently built
- ✅ MySQL schema, atomic claim
- ✅ Bare repo + worktree provisioning
- ✅ Spec refiner (with known gap — see `[[spec-refiner-gap-2026-06-23]]`)
- ✅ Code builder (Claude Code + LiteLLM)
- ✅ Reviewer (with `no_pr` defensive verdict)
- ✅ Sidecar `/status/active.json` on :9100
- ✅ Cost ledger
- ✅ Events outbox
- ✅ `human_issues` inbox + `damascus questions` CLI
- ✅ Wiki on shared volume, versioned in Gitea
## What's NOT built yet (Phases 2-5)
- Cross-project scheduler
- Wiki snapshot-pinning + `wiki_pin` on work items
- Async human channel (plumbing exists, no UI)
- Fair-share scheduling
- Metrics analyzer with threshold triggers
- Coordination gates for cross-story deps
## Cross-references
- [[multi-project-orchestration-plan-1]] — the design doc
- [[damascus-e2e-test-session-2026-06-23]] — current test run
- [[spec-refiner-gap-2026-06-23]] — known gap in the spec-refiner prompt


@@ -0,0 +1,80 @@
---
title: Damascus E2E Test Session — 2026-06-23
created: 2026-06-23
updated: 2026-06-23
type: query
project: wh40k-pc
tags: [agent, in-progress, decision, finding, gap, parked]
sources: [raw/transcripts/session-2026-06-23.md]
confidence: high
related: [damascus-orchestrator-overview, spec-refiner-gap-2026-06-23]
---
# Damascus E2E Test Session — 2026-06-23
**Started:** 2026-06-23 03:35 UTC
**Operator:** Hermes
**Goal:** Falsify design doc §4/§12 — one work item `spec → build → review → merged` with real PR, real test, real merge.
**Current state:** PARKED. Awaiting BMAD-style interrogation of the spec-refiner gap.
## Stories in flight
### lists-1 — useLocalStorage hook (wh40k-pc)
- **DB ID:** b016ea9f-6eb4-11f1-99df-4a942d1e5561
- **Phase at last tick:** `build` (spec phase completed, build attempted, failed, attempt count 1/5)
- **file_scope (row-level):** `src/hooks/useLocalStorage.js`, `src/hooks/useLocalStorage.test.js`
- **budget_cycles:** 5
- **Test command (planned):** `npm test -- --testPathPattern=useLocalStorage` in `/workspace/projects/wh40k-pc`
- **Branch:** `master` (verified, `git_ops.py` tries `main` → `master` → `develop`)
## Cron
- **Orchestrator cycle cron (REMOVED):** was `*/30 * * * * cd /root/damascus-orchestrator && docker compose exec -T orchestrator damascus cycle`. Removed after the user clarified that this cron is not what drives the dev session: the orchestrator cycle is the system under test, not the heart of the dev session.
- **Heartbeat cron (ACTIVE):** `damascus-test-heartbeat` (job id `c0f2194bdfdc`), every 30m, agent-driven, delivers to Discord home channel — this is the "ping me to keep working" cron
## Cycle log (chronological)
| When | Tick # | Action | Verdict | Notes |
|---|---|---|---|---|
| 2026-06-23 03:36 | 1 | claim_for_spec → LLM call → write spec | `pass` | spec-refiner produced a 12-file feature spec ignoring the row's pre-declared 2-file scope |
| 2026-06-23 03:48 | — | heartbeat installed; orchestrator cycle cron removed | — | lists-1 left in `build` phase, parked |
## Decisions made this session
- **Scope:** Picked `lists-1` (useLocalStorage hook) as the smallest unit that proves the loop. Original scope was the full list import feature; reduced to a single hook plus its test file to give one Claude call room to finish in 50 turns.
- **BMAD pre-flight skipped.** The wh40k-pc stories are hand-written rows, not sharded PRD stories. Test is for §4/§12 only, not for the BMAD → orchestrator ingestion path.
- **`no_pr` typed verdict added** as defensive — observed real failure mode where the build phase returned success-without-opening-a-PR, and the review phase merged anyway.
- **Cron split into two roles:** orchestrator cycle (the system being built, runs the loop) vs heartbeat (drives the dev session forward, pings the agent). Initially confused; clarified 2026-06-23 03:50.
- **Wiki now on a shared volume, not baked into the image.** Created `kaykayyali/damascus-wiki` on Gitea. Workers can now mount it. This page lives in the new wiki at `queries/damascus-orchestrator/2026-06-23-test-session.md`.
## Heartbeat log (cron: c0f2194bdfdc, every 30m → Discord home)
| When | What I saw | What I prompted |
|---|---|---|
| 2026-06-23 03:48 | Installed heartbeat cron. lists-1 in `build` (spec-refiner produced a 12-file spec ignoring the row's pre-declared 2-file scope). Orchestrator cycle is OFF per user direction. | Pinging human to drive the BMAD interrogation of the spec-refiner gap. |
## Findings worth keeping
- **Spec-refiner ignores row-level `file_scope`.** The prompt template at `src/damascus/phases.py:33-46` does not pass `work_items.file_scope` or `work_items.budget_cycles` to the LLM. LLM hallucinates scope from the title alone. → see `[[spec-refiner-gap-2026-06-23]]`
- **Reviewer was passing through merges with no PR.** Fixed during the test by adding the `no_pr` typed verdict and the `pr_url IS NULL` guard. → see `[[damascus-orchestrator-overview]]`
- **Wiki must be on a shared volume, not in the image.** Initial bake-in (`/opt/damascus/llm-wiki/` inside the image) made the wiki invisible to other workers and vulnerable to container death. → wiki now at `/root/damascus-orchestrator/wiki/` bind-mounted into containers.
## What's NOT being tested in this run
- Multi-project (§10)
- Wiki facts + snapshot-pinning (§11)
- Async human channel (§9, `awaiting_human` is plumbed but unused)
- Fair-share scheduling (§10)
- Metrics analyzer with threshold triggers (§7)
- Coordination gates for cross-story dependencies (§6 Layer 2, prerequisite items)
- Re-review / reviewer-loop optimization
These are Phases 2-5 in the design doc. This test is Phase 1.
## Cross-references
- [[damascus-orchestrator-overview]] — the system being built
- [[spec-refiner-gap-2026-06-23]] — the spec-refiner prompt gap, parked for BMAD interrogation
- [[multi-project-orchestration-plan-1]] — the design doc this session is testing against


@@ -0,0 +1,106 @@
---
title: Spec Refiner Prompt Gap
created: 2026-06-23
updated: 2026-06-23
type: query
project: damascus-orchestrator
tags: [gap, finding, spec-refiner, parked, decision-needed]
sources: [raw/transcripts/session-2026-06-23.md]
related: [spec-refiner-contract, damascus-e2e-test-session-2026-06-23, damascus-orchestrator-overview]
confidence: high
contested: false
---
# Spec Refiner Prompt Gap — 2026-06-23
**Status:** PARKED. Awaiting decision on refactor-vs-replace-vs-constrain.
## The finding
The spec-refiner's prompt template at `/root/damascus-orchestrator/src/damascus/phases.py:33-46` does **not** pass the row's pre-declared `file_scope` or `budget_cycles` to the LLM. The LLM sees only the project name, the story title, and any BMAD/architecture files. It hallucinates the rest.
**Evidence from the 2026-06-23 test:**
- Row `lists-1` was inserted with `file_scope = ["src/hooks/useLocalStorage.js", "src/hooks/useLocalStorage.test.js"]` and `title = "Add useLocalStorage React hook with cross-tab sync and tests"`.
- The spec-refiner's first call produced a spec covering **12 files** (listStore.js, battlescribeParser.js, ListImport.jsx, ListsPanel.jsx, ConfirmDialog.jsx, App.jsx, main.jsx, package.json, plus 4 test files). It also listed 8 acceptance criteria covering the full list-import feature.
- The spec's `## File Scope` was NOT a subset of the row's declared `file_scope`.
- The `## Test Command` section was truncated/missing.
This is a clear violation of `[[spec-refiner-contract]]`: "Spec refiner's prompt includes the row's declared `file_scope` and `budget_cycles`" and "Spec refiner's spec output's `## File Scope` is a subset (or equal) of the row's declared `file_scope`."
## Why it happened
Looking at the prompt code (`phases.py:33-46`):
```python
user = (
    f"# Project\n{project}\n\n# Story\n{title}\n\n"
    f"# BMAD story file\n{bmad_story or '(missing)'}\n\n"
    f"# Architecture\n{arch or '(missing)'}\n\n"
    "Write a Markdown spec with these sections:\n"
    "## Goal\n## Acceptance Criteria (numbered)\n## TDD Plan (list the failing tests)\n"
    "## File Scope (list of paths/globs the implementation may touch)\n"
    "## Test Command (the exact shell command that proves done)\n"
    "## Ambiguities (any open questions for a human)\n"
)
```
The LLM is told to **list** the file scope, but is not told what the row has already declared, so it lists what it thinks the spec should touch. Given a model that wants to be helpful and a story title that names only a small "useLocalStorage hook", the LLM reasoned that the hook would be used by a list-import feature and expanded the scope to cover the whole feature.
## The three options (BMAD-style interrogation)
### Option A — Constrain (smallest change)
Modify the prompt to inject the row's declared constraints:
```python
user = (
    f"# Project\n{project}\n\n"
    f"# Story\n{title}\n\n"
    f"# Pre-declared file_scope (HARD LIMIT — the spec MUST be a subset of this):\n"
    f"{file_scope}\n\n"
    f"# budget_cycles: {budget_cycles}\n\n"
    f"# BMAD story file\n{bmad_story or '(missing)'}\n\n"
    f"# Architecture\n{arch or '(missing)'}\n\n"
    "Write a Markdown spec with these sections:\n"
    "## Goal\n## Acceptance Criteria (numbered)\n## TDD Plan (list the failing tests)\n"
    "## File Scope (list of paths/globs the implementation may touch — MUST BE A SUBSET OF THE PRE-DECLARED SCOPE)\n"
    "## Test Command (the exact shell command that proves done)\n"
    "## Ambiguities (any open questions for a human)\n"
)
```
Then add a post-parse check: if the spec's `## File Scope` is not a subset of the row's `file_scope`, return verdict `spec_ambiguous` with a note.
**Cost:** 30 min. **Risk:** the LLM still might not honor the constraint. **Verifies via:** the new E2E test `test_spec_refiner_honors_declared_file_scope`.
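The post-parse check in Option A might look like the sketch below. It assumes the spec lists one path per bullet under `## File Scope` and compares paths as exact strings, so globs are not expanded; `parse_file_scope` and `scope_violations` are hypothetical names, not functions in `phases.py`.

```python
import re
from typing import List, Set


def parse_file_scope(spec_md: str) -> Set[str]:
    """Collect bullet items under the '## File Scope' heading."""
    paths: Set[str] = set()
    in_section = False
    for line in spec_md.splitlines():
        if line.startswith("## "):
            # Entering a new section; track whether it is File Scope.
            in_section = line.strip().lower().startswith("## file scope")
            continue
        if in_section:
            # Accept "- path" or "* path", with optional backticks.
            match = re.match(r"^\s*[-*]\s+`?([^`\s]+)`?", line)
            if match:
                paths.add(match.group(1))
    return paths


def scope_violations(spec_md: str, declared: List[str]) -> List[str]:
    """Paths the spec wants to touch that the row did not declare."""
    return sorted(parse_file_scope(spec_md) - set(declared))
```

A non-empty result would map to the `spec_ambiguous` verdict. A spec with a missing or empty `## File Scope` section yields no violations here, so a caller would likely want to reject that case separately.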
### Option B — Refactor (medium change)
Make the spec-refiner consume a BMAD story file (which is what the design doc §2 says it should do) instead of the row's `title`. The BMAD story already has a `## File Scope` section, so the LLM has structure to follow. The row's `file_scope` is **derived from the BMAD story's File Scope section at ingest time**, not hand-written.
**Cost:** 2-3 hours. **Risk:** every row now needs a BMAD story file before it can be ingested; the hand-written row pattern goes away. **Verifies via:** a new `ingest` test that ingests a BMAD project and derives `file_scope` correctly.
### Option C — Replace with BMAD (largest change)
Make the spec-refiner call `bmad-create-story` instead of its own prompt. The output is a proper BMAD story file with the full template. The row's `file_scope` is irrelevant — BMAD enforces the structure.
**Cost:** half a day. **Risk:** tight coupling to BMAD versions, less control over the spec shape. **Verifies via:** the existing BMAD story files (`bmad/wh40k-pc/_bmad-output/planning-artifacts/lists-1.md`) round-trip through the new refiner and produce equivalent specs.
## What the E2E suite already encodes
The test `test_spec_refiner_honors_declared_file_scope` in `tests/e2e/test_spec_refiner.py` codifies the contract. It is RED right now. Any of the three options makes it GREEN.
## Recommendation
**Start with Option A** (constrain) because:
- Smallest change, lowest risk
- Codifies the contract in the prompt AND in the post-parse check
- Doesn't change the data model (rows still have `file_scope` as a column)
- The BMAD path stays a future option (B/C) if A is insufficient
If Option A is insufficient after one round of testing (e.g., the LLM still over-scopes because the story title is too vague), escalate to Option B.
## Cross-references
- [[spec-refiner-contract]] — the contract this gap violates
- [[damascus-e2e-test-session-2026-06-23]] — where the gap was observed
- [[damascus-orchestrator-overview]] — the system