Bind-mount /root/.hermes/scripts/damascus-ntfy-bridge.py into the
damascus-api container at /usr/local/bin/, so a container recreate
(image rebuild) doesn't wipe the bridge script. Add the named
volume damascus_ntfy_state mounted at /var/lib/damascus-ntfy to
persist the bridge's high-water mark, so the phone doesn't get
re-pinged for events it already received after a redeploy.
See ~/.hermes/skills/devops/damascus-ntfy-bridge/SKILL.md for the
deployment contract.
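The high-water-mark idea can be sketched as follows. This is an illustration only. It assumes events carry a monotonically increasing integer `id` and that the mark lives in a small JSON file under the state directory. The file name `hwm.json` and every helper name are hypothetical, not taken from damascus-ntfy-bridge.py.

```python
import json
import os
import tempfile
from pathlib import Path


def load_mark(state_dir: Path) -> int:
    """Return the last event id already pushed, or 0 on first run."""
    path = state_dir / "hwm.json"  # hypothetical file name
    try:
        return int(json.loads(path.read_text())["last_event_id"])
    except FileNotFoundError:
        return 0


def save_mark(state_dir: Path, event_id: int) -> None:
    """Persist the mark atomically: write a temp file, then rename it over the old one."""
    state_dir.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=state_dir)
    with os.fdopen(fd, "w") as fh:
        json.dump({"last_event_id": event_id}, fh)
    os.replace(tmp, state_dir / "hwm.json")


def events_to_push(events: list[dict], state_dir: Path) -> list[dict]:
    """Drop events at or below the persisted mark, then advance the mark."""
    mark = load_mark(state_dir)
    fresh = [e for e in events if e["id"] > mark]
    if fresh:
        # The mark advances before the caller sends. A crash mid-send therefore
        # drops a ping rather than repeating it. The real bridge may choose the
        # opposite trade-off.
        save_mark(state_dir, max(e["id"] for e in fresh))
    return fresh
```

Because the mark sits on the damascus_ntfy_state volume, it survives the container being recreated.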
- react-markdown@9.1.0 + remark-gfm@4.0.1 for question rendering
- AnswerPopover component (shared between drawer + OpenIssues widget)
- OpenIssues: markdown render + inline 'Answer' button per row
- ItemDrawer: markdown render for the answer prompt
- useAskHermes hook + AskHermesResponse schema
- POST /v1/issues/{id}/ask-hermes — emits hermes_ping event
(queued) or echoes existing answer (answered)
- Tests: 4 new API tests for /ask-hermes, updated UI tests
for new popover trigger + mock returns
- docs/human-issue-ux.md — flow + migration notes
The 'Ask Hermes' flow: UI pings the backend, backend emits an
event for the leader (operator session) to pick up, leader drafts
an answer and POSTs back via the existing answer endpoint. UI
prefills the textarea — never auto-submits, the human always
reviews and clicks Submit.
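Below is a minimal sketch of the backend half of that flow. It assumes an in-memory issue store and event list. The field names (`status`, `answer`, `issue_id`) and the plain-dict response are illustrative; they are not the real AskHermesResponse schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Issue:
    id: int
    question: str
    status: str = "open"              # "open" or "answered"
    answer: Optional[str] = None


@dataclass
class Store:
    issues: dict[int, Issue] = field(default_factory=dict)
    events: list[dict] = field(default_factory=list)


def ask_hermes(store: Store, issue_id: int) -> dict:
    """Echo an existing answer, or emit a hermes_ping event for the leader."""
    issue = store.issues[issue_id]  # KeyError stands in for a 404
    if issue.status == "answered":
        return {"status": "answered", "answer": issue.answer}
    # The leader (operator session) picks this event up, drafts an answer and
    # POSTs it via the existing answer endpoint. Nothing is submitted here.
    store.events.append({"type": "hermes_ping", "issue_id": issue_id})
    return {"status": "queued", "answer": None}
```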
scripts/verify.sh — bash E2E smoke that proves 'v1 works' without a browser.
8 sections (preflight, stack-up, mcp-stdio, ingest-via-mcp, ui-shows-it,
drive-cycle, cleanup, summary); exits non-zero on first failure. Drives
phase transitions via direct SQL to bypass the orchestrator worker's claim
loop. Cleans up its own rows so re-runs are idempotent.
scripts/_verify_mcp_helper.py — Python MCP stdio helper used by verify.sh.
Drives python -m damascus.mcp_server via the official mcp SDK client and
frames the JSON-RPC handshake + tools/list + ingest_story so bash does
not have to manage Content-Length headers or heredoc framing.
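For orientation, the sequence the helper drives corresponds roughly to the JSON-RPC 2.0 messages built below. This sketch only constructs the payloads and leaves transport framing to the SDK. The protocol version string, the client name, and the `ingest_story` arguments are placeholders, not values taken from _verify_mcp_helper.py.

```python
def handshake_messages(story: dict) -> list[dict]:
    """Build the request/notification sequence for one ingest_story call."""
    return [
        # 1. The client proposes a protocol version and advertises its capabilities.
        {"jsonrpc": "2.0", "id": 1, "method": "initialize",
         "params": {"protocolVersion": "2024-11-05",
                    "capabilities": {},
                    "clientInfo": {"name": "verify-helper", "version": "0.0.0"}}},
        # 2. A notification (no id): the client is ready for normal traffic.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        # 3. Discover the server's tools.
        {"jsonrpc": "2.0", "id": 2, "method": "tools/list"},
        # 4. Invoke one tool by name with its arguments.
        {"jsonrpc": "2.0", "id": 3, "method": "tools/call",
         "params": {"name": "ingest_story", "arguments": story}},
    ]
```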
docs/VERIFICATION.md — <1 page runnable-by-hand recipe plus architecture
notes (token source, MCP upstream DNS, why direct SQL, failure modes).
Verified end-to-end: bash scripts/verify.sh exits 0 against the live stack
(7/7 sections green; log at .hermes/evidence/p6a/verify.log, gitignored).
tests/contract + tests/unit still 56/56 green.
BMAD-onboarding kit for the Damascus orchestrator:
- docs/adding-a-new-project.md — full onboarding guide covering layout,
required story section headers, common pitfalls (with the four classes
of bug that have cost real cycles here: Path.rglob doesn't follow
symlinks, architecture.md must be at planning-artifacts/architecture.md
exactly, missing section headers burn 3 retries each, etc.)
- bmad/_kit/ — read-only reference material (templates + sample)
  - templates/{prd,architecture,epics,story}.md
  - sample/hello-bmad/_bmad-output/ — one fully-formed worked example
    (2-story FastAPI project, valid end-to-end)
  - README.md — kit-level contract
- scripts/test-ingest.sh — pre-flight validation that catches the four
bug classes before any DB write. Verified against the live orchestrator
container: passes on the sample, fails (correctly) on a hand-broken tree
with both missing-section AND symlink bugs in one run.
- docker-compose.yml — replace /home/kaykayyali/_bmad bind (which
doesn't exist on this server) with ./bmad/_kit. Kit now ships with
the repo.
- .gitignore — re-include bmad/_kit/ so it travels with the repo while
keeping the existing 'bmad/ is ephemeral mount content' contract.
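test-ingest.sh catches two of the bug classes (missing story sections and symlinks) in a single run, and those two checks can be sketched in Python as below. The required-header list and the `stories/` layout are hypothetical stand-ins; only `## Test Command` is named elsewhere in this log, and the real list lives in docs/adding-a-new-project.md. The sketch uses `os.walk(followlinks=False)` so that symlinks are reported instead of being silently skipped.

```python
import os
from pathlib import Path

# Hypothetical required sections; substitute the list from the onboarding guide.
REQUIRED_HEADERS = ("## Story", "## Acceptance Criteria", "## Test Command")


def preflight(root: Path) -> list[str]:
    """Return human-readable problems. An empty list means the tree looks ingestible."""
    problems = []
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in dirnames + filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                # A recursive walk that does not follow links would miss this target.
                problems.append(f"symlink: {path.relative_to(root)}")
        for name in filenames:
            path = Path(dirpath) / name
            # Hypothetical layout: story files are *.md files under a stories/ dir.
            if name.endswith(".md") and "stories" in path.parts and not path.is_symlink():
                text = path.read_text()
                for header in REQUIRED_HEADERS:
                    if header not in text:
                        problems.append(f"{path.relative_to(root)}: missing '{header}'")
    return problems
```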
Verified end-to-end: 'damascus ingest --project hello-bmad' succeeded
on the live orchestrator, _find_bmad_story resolved both stories.
The 'architecture.md is ingested as a work item' quirk is documented in
docs/adding-a-new-project.md §'Common pitfalls' with a one-liner fix.
Refs: t_5aa80e4b (parallel dashboard work — committed separately)
P6 worker hit the 120-iter budget cap twice while finishing the e2e
harness and the verify.sh recipe. The artifacts on disk were correct
and passing — both runs reported 'all 4 phases PASSED' before the
budget ran out — but the worker died before commit/push. Recovered by
running the test suite against merged main (PR #19 landed as 60ec5f6)
and committing the verified artifacts.
What this PR ships:
1. tests/e2e/test_entry_points_e2e.py (668 lines)
Single Playwright + MCP integration test exercising the full v1
entry-points surface against the live docker-compose stack:
Phase 1: ingest_story via MCP server (stdio subprocess) ->
assert WorkItemResponse.phase == 'spec'
Phase 2: navigate UI to /#/items, poll for the new row within 5s,
open the drawer, assert the 4 P5 widgets render non-zero
Phase 3: drive state.set_phase spec -> build -> review -> merged;
reload UI after each transition, assert phase pill updates
Phase 4: open a human_issue via state.open_human_issue; answer it
via MCP.answer_question; assert status -> 'answered';
reload drawer, assert the answer shows
Cleans up its own rows (project='e2e-test' only) so it doesn't
collide with other tests running against the same DB.
2. tests/e2e/conftest.py
Helpers: state.open_human_issue, state.set_phase, state.get_item
wrappers that the e2e test uses to drive the cycle directly without
spinning the orchestrator loop.
3. scripts/verify.sh
30-second manual smoke: /healthz, /v1/items read, /v1/items?group_by=project
(P5 backend), /v1/stats, auth 401 path, smoke ingest with token.
Exits non-zero on any failure.
4. docs/VERIFICATION.md
One-page recipe: 30s check + full cycle walkthrough. Runnable by
Kay without agent help.
5. .gitignore
Add .hermes/evidence/ — e2e screenshots/logs are regenerated by
the test on every run, no need to ship them.
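Phase 2's "poll for the new row within 5s" follows the usual deadline-poll pattern. Here is a generic sketch. `poll_until` is a hypothetical helper, not necessarily what test_entry_points_e2e.py uses; Playwright's own auto-waiting assertions may cover this in the real test.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(probe: Callable[[], Optional[T]], timeout: float = 5.0,
               interval: float = 0.1) -> T:
    """Call probe() until it returns a non-None value or the deadline passes."""
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)
```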
Live verification (post-merge, against main):
bash scripts/verify.sh -> PASSED (7/7 checks green)
pytest tests/e2e/test_entry_points_e2e.py -q -> 1 passed in 32.24s
The worker's self-block reason is noted in t_556485a7: it wrote a
'review-required handoff'-style summary before the budget ran out; the
work itself is complete and verified.
P5 schema (GroupedItemsResponse + ProjectGroup + ListItemsQuery.group_by)
landed in 79d1d74 but the runtime handler never wired it. Without this
commit, the dashboard renders against a 422 on every load.
Handler routing:
- group_by=project -> GroupedItemsResponse (one bucket per project,
per-phase counts, sort/pagination intentionally
not honored in the grouped view)
- group_by=<other> -> 400 bad_request
- group_by absent -> ListItemsResponse (unchanged)
response_model widened to Union[ListItemsResponse, GroupedItemsResponse]
so FastAPI's OpenAPI schema reflects both shapes.
Tests: 4 new cases covering grouped shape, filter interaction, the 400
path, and a regression check that no-group_by stays flat. 34/34 in
tests/api/test_api_endpoints.py, 86/86 across tests/api + tests/contract.
Live-verified: after POSTing 2 items, GET /v1/items?group_by=project
returns a single project bucket holding both items and per-phase counts.
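The routing rules above reduce to a small pure function. This sketch assumes items are dicts with `project` and `phase` keys. Plain dicts stand in for the Pydantic response models, and `ValueError` stands in for the 400 bad_request response.

```python
from collections import Counter, defaultdict
from typing import Optional


def list_items(items: list[dict], group_by: Optional[str] = None) -> dict:
    """Return a flat list when group_by is absent, per-project buckets for group_by=project."""
    if group_by is None:
        return {"items": items, "total": len(items)}             # unchanged flat shape
    if group_by != "project":
        raise ValueError(f"unsupported group_by: {group_by!r}")  # -> 400 bad_request
    buckets: dict[str, list[dict]] = defaultdict(list)
    for item in items:
        buckets[item["project"]].append(item)
    # Sort/pagination are intentionally not applied in the grouped view.
    return {"groups": [
        {"project": project,
         "items": rows,
         "phase_counts": dict(Counter(r["phase"] for r in rows))}
        for project, rows in sorted(buckets.items())
    ]}
```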
Brings the worker-authored Playwright spec into the branch so the PR diff
is complete. Covers: /ingest form redirect, dashboard widget rendering,
project-grouped tabs, ItemDrawer answer form for awaiting_human items,
and a 375x667 mobile-viewport smoke (no fixed pixel widths).
Runs against the fixture_api on :9111 (vite build with
VITE_API_BASE_URL=http://127.0.0.1:9111) seeded with one awaiting_human
item, one open issue, and 7 days of cost data.
TDD: red wrote the test, green extracted the component. Pure
presentation: takes phase_counts + total and renders the stacked Paper
+ Box bar. The width is set as an inline style (not via sx) so the test
can read element.style.width directly; sx routes dynamic values through
emotion's generated stylesheet, where asserting on them is harder. The Dashboard and
any project-grouped sub-view can now mount this in place of the
inline rendering.
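The arithmetic such a component performs can be shown on its own. The component itself is React. This Python sketch only illustrates how `phase_counts` and `total` might map to per-segment `width` strings. The one-decimal formatting and the empty result for a zero total are assumptions.

```python
def segment_widths(phase_counts: dict[str, int], total: int) -> dict[str, str]:
    """Map each non-empty phase to a CSS width string such as '25.0%'."""
    if total <= 0:
        return {}  # nothing to draw; assumed behavior for an empty project
    return {phase: f"{count * 100 / total:.1f}%"
            for phase, count in phase_counts.items() if count > 0}
```

For example, `segment_widths({"spec": 1, "build": 3}, 4)` gives `{"spec": "25.0%", "build": "75.0%"}`.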
- POST /v1/items (in-memory insert, idempotent on (project, story_id),
validates IngestStoryRequest field constraints)
- POST /v1/issues/{id}/answer (sets answer/status='answered'/
answered_at, returns AnswerIssueResponse)
- GET /v1/cost?days=N (synthetic 7-day deterministic data; one spike
day so the CostSparkline widget has a visible shape)
- GET /v1/issues?status=&project=&limit=&offset= (backing the
OpenIssues widget's 'last 5' list)
- GET /v1/items?group_by=project returns GroupedItemsResponse
(GroupedItemsResponse shape matches src/damascus/api_schemas.py
P5 additions); bad group_by values return 400
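In an in-memory fixture, idempotency on `(project, story_id)` can be as simple as a dict keyed on that tuple. The sketch below assumes a repeated POST returns the existing row unchanged; the commit doesn't say whether the fixture updates it instead. The `title` field and the 422 comment are illustrative.

```python
import itertools

_ids = itertools.count(1)
_items: dict[tuple[str, str], dict] = {}


def ingest(project: str, story_id: str, title: str) -> dict:
    """Insert once per (project, story_id); repeats return the stored row."""
    if not project or not story_id:
        # A FastAPI fixture would presumably answer 422 for this.
        raise ValueError("project and story_id are required")
    key = (project, story_id)
    if key not in _items:
        _items[key] = {"id": next(_ids), "project": project,
                       "story_id": story_id, "title": title, "phase": "spec"}
    return _items[key]
```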
Fixture dataset grows from 3 items (v1) to 5 (v2): adds an item in
awaiting_human with an open issue (answer-form target) and a blocked
item with last_verdict+last_feedback (BlockedItems target). Also
adds 2 events for the awaiting_human item so the drawer's recent
events list has something to render.
v1 e2e open-issues-count expectation bumped 1→2 to match the new
fixture.
- useIngestStory / useAnswerIssue: useMutation with onSuccess
invalidation of the relevant query keys (items, stats, item, issues)
- useCostSummary(days): polled every 5s, returns CostSummaryResponse
- useGroupedItems: /v1/items?group_by=project → GroupedItemsResponse
- useOpenIssues(limit): /v1/issues?status=open&limit=N for the OpenIssues widget
Drops dead v1 exports (deriveProjects / matchesPhaseFilter / DEFAULT_*
constants) that had no consumers.
Adds VITE_API_WRITE_TOKEN env var (baked at build time, LAN-trusted
per contract §4). When non-empty, every POST sets
'Authorization: Bearer ***'. GETs remain token-free per the
contract. Empty token (test fixture, read-only deployment) is a
no-op — the bundle still ships and the write fails server-side with
401.
Adds vitest.config.ts + tests/unit/setup.ts + 3 unit tests covering
the header-on-POST / header-absent-on-GET / header-absent-on-empty
paths. TDD: red wrote the tests first, green added the header.
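The header rule those three tests pin down fits in one function. The real implementation is TypeScript in the UI bundle. This Python rendering, including the name `request_headers`, is only an illustration.

```python
def request_headers(method: str, write_token: str) -> dict[str, str]:
    """Attach a bearer token to writes only, and only when a token was baked in."""
    headers: dict[str, str] = {}
    if method.upper() == "POST" and write_token:
        headers["Authorization"] = f"Bearer {write_token}"
    # GETs, and POSTs built with an empty token, go out without the header;
    # the server then rejects such a POST with 401.
    return headers
```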
The v1 e2e suite (npm run test:e2e) hardcoded port 9110 for the
fixture_api.py and VITE_API_BASE_URL. P2's real damascus-api now binds
9110 on the developer host, so reuseExistingServer: true makes the
suite hit the real (empty) API and the tests fail with '0 matching'.
Move the fixture to 9111 by default; CI / clean hosts override with
FIXTURE_API_PORT=9110.
Also adds docs/plans/2026-06-24-p5-damascus-ui-v2.md (the P5 plan
that the worker will execute against), a test:unit script, and the
testing-library devDeps needed by the v2 component tests.
The 'git merge origin/main' auto-merged both P2 and P3 dev-deps blocks
into one pyproject.toml with two [project.optional-dependencies] sections
(TOML does not allow the same table to be declared twice, so the parser
rejects the file). Drop the second copy; both blocks listed the same
pytest + pytest-asyncio pair, just in a different order.
Caught by 'python -m pytest' exiting 1 with a TOMLDecodeError before any
test ran.
Implements wiki/concepts/entry-points-contract.md sections 2 + 4:
- All 10 endpoints wired to existing state.* helpers (no new mutations):
GET /healthz, /v1/items, /v1/items/{id}, /v1/issues,
/v1/events, /v1/cost, /v1/stats
POST /v1/items, /v1/items/bulk, /v1/issues/{id}/answer
- Token check middleware on writes (POST). Empty DAMASCUS_API_TOKEN at
startup fails closed (serve_cmd exits 1 before importing api).
- Token-bucket rate limit per source IP, default 30/min write +
120/min read, configurable via env. Returns 429 + Retry-After.
- psycopg_pool.ConnectionPool(min_size=2, max_size=5) shared across the
FastAPI threadpool (lazy, env-driven).
- StaticFiles mount for UI bundle at /opt/damascus/ui; does not crash
if the dir is empty (P4 ships this).
- 'damascus serve' CLI subcommand with --reload for dev.
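A per-IP token bucket with those defaults might look like the sketch below. The class name, the injected clock, and the rounded-up Retry-After value are assumptions. Only the 30/min write and 120/min read budgets and the 429 + Retry-After behavior come from this commit.

```python
import math
import time
from typing import Callable, Optional


class TokenBucketLimiter:
    """Per-key token bucket: `rate` tokens per `period` seconds, burst of `rate`."""

    def __init__(self, rate: int, period: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = float(rate)
        self.refill_per_sec = rate / period
        self.clock = clock
        self._buckets: dict[str, tuple[float, float]] = {}  # ip -> (tokens, last_seen)

    def check(self, ip: str) -> Optional[int]:
        """Consume one token. Return None if allowed, else Retry-After seconds."""
        now = self.clock()
        tokens, last = self._buckets.get(ip, (self.capacity, now))
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
        if tokens >= 1.0:
            self._buckets[ip] = (tokens - 1.0, now)
            return None
        self._buckets[ip] = (tokens, now)
        # Seconds until one whole token has refilled, rounded up for the header.
        return max(1, math.ceil((1.0 - tokens) / self.refill_per_sec))


writes = TokenBucketLimiter(rate=30)    # POST budget
reads = TokenBucketLimiter(rate=120)    # GET budget
```

With the write budget, the 31st POST inside a minute from one IP gets a Retry-After of 2 seconds, because a token refills every 60/30 s.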
docker-compose: new damascus-api service reuses the existing
damascus-orchestrator image, mounts /opt/damascus/ui from ./ui-bundle
(empty dir is fine), reads /root/.hermes/.env for the token, depends
on db, healthchecks /healthz.
Tests (46 pass against live Postgres at 127.0.0.1:5432):
- tests/api/test_api_auth_and_ratelimit.py (auth, 401, 429, /healthz)
- tests/api/test_api_endpoints.py (every endpoint, all happy/error paths)
- tests/contract/test_api_schemas_match_db.py (enum parity + 3
POST response shape round-trips through real upsert_story + read-back)
Acceptance (live compose service at :9110):
- healthz -> 200 '{"status":"ok"}'
- POST /v1/items no token -> 401 unauthorized
- POST /v1/items wrong token -> 401 unauthorized
- POST /v1/items correct token -> 200
- 31st POST in 60s from same IP -> 429 with Retry-After
- /openapi.json exposes all 10 expected paths
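The two auth behaviors, failing closed at startup and returning 401 on a missing or wrong token, can be sketched as below. The function names are hypothetical. Using `hmac.compare_digest` is an assumption about how the comparison is done; it avoids leaking the token through timing.

```python
import hmac
import os
import sys
from typing import Mapping, Optional


def require_token(env: Mapping[str, str] = os.environ) -> str:
    """Refuse to start (exit 1) when DAMASCUS_API_TOKEN is unset or empty."""
    token = env.get("DAMASCUS_API_TOKEN", "")
    if not token:
        print("DAMASCUS_API_TOKEN is empty; refusing to serve", file=sys.stderr)
        sys.exit(1)
    return token


def authorize(method: str, auth_header: Optional[str], token: str) -> int:
    """Return the HTTP status the middleware would produce: 200 or 401."""
    if method.upper() != "POST":
        return 200  # reads are token-free
    expected = f"Bearer {token}"
    if auth_header is None or not hmac.compare_digest(auth_header.encode(), expected.encode()):
        return 401
    return 200
```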
Companion to PR #14 (the source-grep contract test). The contract at
wiki/concepts/reviewer-contract.md (step 3) requires that ## Test
Command actually exits 0 in the worktree before the reviewer returns
'pass'. The previous implementation had two early-return pass branches
on missing test_cmd or missing worktree — both bypass the actual test
execution and route the row to merged.
Fix: replace both early-return pass branches with tests_failed verdicts
that carry the reason. The cycle's verdict switch routes tests_failed
back to build (retry path), which is the correct behavior for a row
whose validate layer could not actually validate.
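In outline, the fail-closed validate step looks like the sketch below. `_verdict` and the note texts are stand-ins for the helpers in phases.py, not copies of them. Running the command through the shell is an assumption about how ## Test Command is executed.

```python
import subprocess
from pathlib import Path
from typing import Optional


def _verdict(kind: str, note: str) -> dict:
    # Hypothetical stand-in for the orchestrator's verdict constructor.
    return {"verdict": kind, "note": note}


def validate(test_cmd: Optional[str], worktree: Optional[Path]) -> dict:
    """Return 'pass' only when test_cmd actually exits 0 inside the worktree."""
    if not test_cmd:
        return _verdict("tests_failed", "no ## Test Command declared; cannot validate")
    if worktree is None or not worktree.is_dir():
        return _verdict("tests_failed", f"worktree missing: {worktree}")
    proc = subprocess.run(test_cmd, shell=True, cwd=worktree,
                          capture_output=True, text=True)
    if proc.returncode != 0:
        return _verdict("tests_failed", f"exit {proc.returncode}: {proc.stderr[-500:]}")
    return _verdict("pass", "test command exited 0")
```

Every path that cannot run the tests now yields tests_failed, which the cycle routes back to build.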
Option A from the gap note
(wiki/queries/damascus-orchestrator/reviewer-validates-failing-test-cmd-still-merges-2026-06-24.md).
The other options (B: recreate worktree from row.branch, C: typed
validate_skipped verdict) are also valid; this PR picks A for minimum
blast radius. The contract test in PR #14 forbids only the literal
"passing through" phrase, so this fix lands it GREEN.
Verified:
- RED on main: 2 occurrences of "passing through" in phases.py
- GREEN on fix: 0 occurrences after this commit
- contract test on this branch: PASSED (1/1)
- full contract+unit suite: 29/29 pass
- E2E test_reviewer_03: still RED (separate setup bug — manual UPDATE
does not clear claimed_at, so the second cycle cannot re-claim the
row; documented in the gap note, out of scope for this fix)
Refs: PR #14, issue #13, gap note above.
Co-Authored-By: Claude <noreply@anthropic.com>
Companion to PR #10 (spec-refiner file_scope/budget_cycles) and PR #12
(spec-refiner ambiguity routing fix). Source-grep codification of the
reviewer-validate Loop-C bug surfaced by
tests/e2e/test_reviewer.py::test_reviewer_03_validate_layer_runs_test_cmd
on 2026-06-24.
Bug: phases.py:313 and :317 return _verdict('pass') with note
'passing through' when test_cmd is missing or worktree is missing.
The contract at wiki/concepts/reviewer-contract.md (step 3) requires
the actual test_cmd to exit 0 in the worktree before reviewer returns
'pass'. The early-return bypasses the test, routes the row to 'merged',
defeating the validate layer's defense-in-depth purpose.
Fix options (gap note wiki/queries/damascus-orchestrator/reviewer-validates-failing-test-cmd-still-merges-2026-06-24.md):
- A: 5-line fail-closed (replace pass with tests_failed)
- B: recreate worktree from row.branch then validate
- C: new 'validate_skipped' typed verdict
All three remove the 'passing through' literal; test passes on any.
Multi-pattern tolerance: only the exact 'passing through' phrase is
forbidden; fix author can phrase the replacement note however they want.
RED on main today. GREEN on any of the three fix options.
28/29 contract+unit pass; the 1 fail is this new test.
Refs: wiki/queries/damascus-orchestrator/reviewer-validates-failing-test-cmd-still-merges-2026-06-24.md,
issue #13, PR #12 (same source-grep pattern, different contract).
The Playwright webServer boots the fixture FastAPI on 127.0.0.1:9110;
without VITE_API_BASE_URL the SPA fetches /v1/* same-origin from the
vite preview at :4173, gets 500s, and the suite flakes based on
whether dist/ happens to be fresh from a prior build.
- Vite 6 + React 19 + MUI 6 SPA at ui/
- Routes: Dashboard, Items (MUI DataGrid), ItemDrawer
- /v1/items filter+sort+limit wired to URL hash for shareable links
- React Query hooks (useStats, useListItems, useItemDetail, useRecentEvents)
- Playwright e2e suite: 4 tests against fixture API on :9110
(dashboard widgets, items table, drawer with item+open_issues+recent_events, phase filter narrows results)
- Multi-stage Dockerfile (node:22-alpine build -> /bundle)
- Compose service damascus-ui-build: one-shot, writes dist/ to
named volume damascus_ui for the (P2) damascus-api container to mount
- Fixture FastAPI app (tests/e2e/fixture_api.py) for e2e runs without
a live damascus-api (P4 ships ahead of P2 by design)
Acceptance: build green, 4/4 e2e tests green
Implements P3 of the entry-points work. The MCP server is a thin
stdio wrapper around damascus-api: seven tools, each one HTTP call.
No direct Postgres access — all data flows through the API.
Tool catalog (7):
- list_items → GET /v1/items
- get_item → GET /v1/items/{id}
- list_open_questions → GET /v1/issues?status=open
- answer_question → POST /v1/issues/{id}/answer
- ingest_story → POST /v1/items
- bulk_ingest → POST /v1/items/bulk
- system_status → GET /v1/stats
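Because each tool maps to exactly one HTTP call, the catalog is essentially a lookup table. This sketch shows that shape only. The `ROUTES` table, the `{id}` substitution, and the split of arguments between query string and JSON body are assumptions, not the code in mcp_server.py.

```python
from typing import Any, Optional

# tool name -> (HTTP method, path template, fixed query params)
ROUTES: dict[str, tuple[str, str, dict[str, str]]] = {
    "list_items": ("GET", "/v1/items", {}),
    "get_item": ("GET", "/v1/items/{id}", {}),
    "list_open_questions": ("GET", "/v1/issues", {"status": "open"}),
    "answer_question": ("POST", "/v1/issues/{id}/answer", {}),
    "ingest_story": ("POST", "/v1/items", {}),
    "bulk_ingest": ("POST", "/v1/items/bulk", {}),
    "system_status": ("GET", "/v1/stats", {}),
}


def to_request(tool: str, args: dict[str, Any]) -> tuple[str, str, dict, Optional[dict]]:
    """Translate one tool call into (method, path, query, json_body)."""
    method, template, fixed = ROUTES[tool]
    rest = dict(args)
    path = template.format(id=rest.pop("id")) if "{id}" in template else template
    if method == "GET":
        return method, path, {**fixed, **rest}, None
    return method, path, dict(fixed), rest
```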
Tool input schemas are derived from Mcp*Args Pydantic models via
model_json_schema() — single source of truth, no hand-written JSON.
Test test_input_schemas_derived_from_mcp_args_models asserts no drift.
Adds:
- src/damascus/mcp_server.py mcp SDK stdio server + 7 tools
- tests/contract/test_mcp_roundtrip.py 11 round-trip tests via httpx.MockTransport
- tests/contract/test_mcp_cli.py CLI subcommand tests
- McpBulkIngestArgs / McpBulkIngestStoryItem in api_schemas.py
- damascus mcp-serve CLI subcommand
- pyproject.toml: mcp>=1.0 dep, pytest-asyncio dev extra, asyncio_mode=auto
Acceptance:
- python -c 'from damascus.mcp_server import mcp; print(len(mcp.list_tools()))' → 7
- pytest tests/contract/test_mcp_roundtrip.py tests/contract/test_mcp_cli.py → 14/14 pass
- All 6 args-derived tools have zero schema drift
Companion to PR #10. The contract at wiki/concepts/spec-refiner-contract.md
§1 'Prompt assembly order' step 2 requires the prompt to include the row's
declared file_scope + budget_cycles so the LLM honors the row's pre-declared
constraints. Without this, the LLM sees only project + story + BMAD + arch
and hallucinates its own scope (observed 2026-06-23 on row lists-1: declared
2 files, LLM produced a 12-file spec).
Option A from wiki/queries/damascus-orchestrator/spec-refiner-gap-2026-06-23.md
(constrain, ~30 min). The contract test in PR #10 fails while the prompt
construction lacks the literals "file_scope = item" and
"budget_cycles = item"; this fix lands it GREEN.
Verified:
- RED on main: contract test fails (assertion on missing file_scope/budget_cycles)
- GREEN on this branch: contract test passes (29/29 contract+unit pass)
Refs: PR #10, gap note above, issue #? (TBD)
Co-Authored-By: Claude <noreply@anthropic.com>
The spec-refiner contract (wiki/concepts/spec-refiner-contract.md §1,
'Prompt assembly order' step 2) requires the prompt to include the
row's declared file_scope and budget_cycles. The current prompt at
src/damascus/phases.py:37-46 omits both — observed at 2026-06-23 03:36
on row lists-1, where the LLM produced a 12-file spec for a row that
declared a 2-file scope.
This source-grep test codifies the structural end of the contract:
the prompt must reference both row attributes. The E2E test
test_spec_refiner_03_honors_declared_file_scope codifies the
behavioral end. Both fail today; both pass once the spec-refiner
adopts Option A from the gap note (wiki/queries/damascus-orchestrator/
spec-refiner-gap-2026-06-23.md, ~30 min).
Source-grep form (per the skill's contract-test pattern): CI-friendly,
no docker, structural-not-behavioral, narrow scope to the prompt
construction. Negative-checked by reverting phases.py to a known
broken state and confirming the test still fails as expected.
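The standard-library `ast` module can narrow the grep to the prompt construction instead of the whole file. The function name `build_spec_prompt` is hypothetical; the commit only cites phases.py:37-46. The sketch also checks only that both attribute names appear, not the full literals.

```python
import ast


def function_source(module_source: str, func_name: str) -> str:
    """Return the exact source text of one top-level function."""
    tree = ast.parse(module_source)
    for node in tree.body:
        if isinstance(node, ast.FunctionDef) and node.name == func_name:
            segment = ast.get_source_segment(module_source, node)
            if segment is not None:
                return segment
    raise LookupError(f"{func_name} not found")


def missing_references(module_source: str, func_name: str,
                       required: tuple[str, ...] = ("file_scope", "budget_cycles")) -> list[str]:
    """Return the names the prompt builder never mentions. Empty means the contract holds."""
    body = function_source(module_source, func_name)
    return [name for name in required if name not in body]
```

A mention of `file_scope` in some other function no longer satisfies the check, which is the point of narrowing the scope.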