9 Commits

Author SHA1 Message Date
402193e9ab feat(e2e): P6b Playwright + MCP spec (env indirection + pinned deps) (#24)
test / contract-and-unit (push) Failing after 14s
2026-06-27 16:38:37 +00:00
8bf73e255f feat(orchestrator): distinguish transient vs structural tests_failed (ADR-005) (#31)
test / contract-and-unit (push) Has been cancelled
2026-06-27 16:38:32 +00:00
acec3ea7e4 Merge branch 'verify/p6a-recipe' into main: P6a manual verification recipe (closes part of P6)
test / contract-and-unit (push) Failing after 14s
3 files added/changed:
- scripts/verify.sh — bash E2E smoke, 8 sections, 7/7 green
- scripts/_verify_mcp_helper.py — Python MCP stdio helper
- docs/VERIFICATION.md — <1 page operator runbook

P6 is split into P6a (this) + P6b (Playwright e2e, in flight). P6a is the manual
merge-gate proof; P6b adds the automated Playwright spec on top.
2026-06-26 14:18:48 +00:00
Hermes
01607f4d9e feat(dashboard): human-issue UX — markdown + inline answer + Ask Hermes
test / contract-and-unit (pull_request) Failing after 15s
- react-markdown@9.1.0 + remark-gfm@4.0.1 for question rendering
- AnswerPopover component (shared between drawer + OpenIssues widget)
- OpenIssues: markdown render + inline 'Answer' button per row
- ItemDrawer: markdown render for the answer prompt
- useAskHermes hook + AskHermesResponse schema
- POST /v1/issues/{id}/ask-hermes — emits hermes_ping event
  (queued) or echoes existing answer (answered)
- Tests: 4 new API tests for /ask-hermes, updated UI tests
  for new popover trigger + mock returns
- docs/human-issue-ux.md — flow + migration notes

The 'Ask Hermes' flow: the UI pings the backend, the backend emits an
event for the leader (operator session) to pick up, and the leader drafts
an answer and POSTs it back via the existing answer endpoint. The UI
prefills the textarea but never auto-submits; the human always reviews
and clicks Submit.
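
The two outcomes of POST /v1/issues/{id}/ask-hermes described above
(queued vs. answered) can be sketched as a pure function. This is an
illustration only: the `ask_hermes` name, the `emit` callback, and the
`id` / `answer` / `status` field names are assumptions, not the actual
AskHermesResponse schema.

```python
from typing import Callable, Optional, TypedDict


class AskHermesResult(TypedDict):
    status: str            # "queued" or "answered" (hypothetical field)
    answer: Optional[str]  # text the UI could prefill, if one exists


def ask_hermes(issue: dict, emit: Callable[[str, dict], None]) -> AskHermesResult:
    """Decide how to respond to an ask-hermes request for one issue."""
    existing = issue.get("answer")
    if existing:
        # An answer already exists: echo it back and emit nothing.
        return {"status": "answered", "answer": existing}
    # No answer yet: notify the leader and report the request as queued.
    emit("hermes_ping", {"issue_id": issue["id"]})
    return {"status": "queued", "answer": None}
```

In this sketch the function never writes the answer itself, which
matches the rule that the human reviews and submits.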
2026-06-26 14:09:57 +00:00
hermes-kanban
79e3e59ab5 feat(verify): P6a manual verification recipe + verify.sh
scripts/verify.sh — bash E2E smoke that proves 'v1 works' without a browser.
8 sections (preflight, stack-up, mcp-stdio, ingest-via-mcp, ui-shows-it,
drive-cycle, cleanup, summary); exits non-zero on first failure. Drives
phase transitions via direct SQL to bypass the orchestrator worker's claim
loop. Cleans up its own rows so re-runs are idempotent.
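
The fail-fast behaviour ("exits non-zero on first failure") can be
sketched as below. verify.sh itself is bash; this Python version only
illustrates the control flow, and `run_sections` plus the check
callables are hypothetical names.

```python
from typing import Callable, List, Tuple

# A section is a display name plus a check that returns True on success.
Section = Tuple[str, Callable[[], bool]]


def run_sections(sections: List[Section]) -> int:
    """Run sections in order; return 0 if all pass, 1 at the first failure."""
    for name, check in sections:
        if not check():
            print(f"FAIL {name}")
            return 1  # later sections are skipped
        print(f"ok   {name}")
    return 0
```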

scripts/_verify_mcp_helper.py — Python MCP stdio helper used by verify.sh.
Drives python -m damascus.mcp_server via the official mcp SDK client and
frames the JSON-RPC handshake + tools/list + ingest_story so bash does
not have to manage Content-Length headers or heredoc framing.

docs/VERIFICATION.md — <1 page runnable-by-hand recipe plus architecture
notes (token source, MCP upstream DNS, why direct SQL, failure modes).

Verified end-to-end: bash scripts/verify.sh exits 0 against the live stack
(7/7 sections green; log at .hermes/evidence/p6a/verify.log, gitignored).
tests/contract + tests/unit still 56/56 green.
2026-06-26 07:03:45 +00:00
damascus-heartbeat
82b9758be6 feat(bmad): add canonical _kit (templates + sample) + ingest validation
test / contract-and-unit (push) Failing after 14s
BMAD-onboarding kit for the Damascus orchestrator:

- docs/adding-a-new-project.md — full onboarding guide covering layout,
  required story section headers, common pitfalls (with the four classes
  of bug that have cost real cycles here: Path.rglob doesn't follow
  symlinks, architecture.md must be at planning-artifacts/architecture.md
  exactly, missing section headers burn 3 retries each, etc.)
- bmad/_kit/ — read-only reference material (templates + sample)
  - templates/{prd,architecture,epics,story}.md
  - sample/hello-bmad/_bmad-output/ — one fully-formed worked example
    (2-story FastAPI project, valid end-to-end)
  - README.md — kit-level contract
- scripts/test-ingest.sh — pre-flight validation that catches the four
  bug classes before any DB write. Verified against the live orchestrator
  container: passes on the sample, fails (correctly) on a hand-broken tree
  with both missing-section AND symlink bugs in one run.
- docker-compose.yml — replace /home/kaykayyali/_bmad bind (which
  doesn't exist on this server) with ./bmad/_kit. Kit now ships with
  the repo.
- .gitignore — re-include bmad/_kit/ so it travels with the repo while
  keeping the existing 'bmad/ is ephemeral mount content' contract.
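
Two of the bug classes named above (symlinked directories that
Path.rglob will not descend into, and missing story section headers)
can be detected before any DB write. The sketch below is not
scripts/test-ingest.sh; the function names are hypothetical and the
required header list must be supplied by the caller, since the real
list is not given here.

```python
import os
from pathlib import Path
from typing import Iterable, List


def find_symlinked_dirs(root: Path) -> List[Path]:
    """Return directories under root that are symlinks."""
    found = []
    # os.walk lists symlinked directories in dirnames but, by default,
    # does not descend into them, so symlink loops cannot hang the scan.
    for dirpath, dirnames, _ in os.walk(root):
        for name in dirnames:
            candidate = Path(dirpath) / name
            if candidate.is_symlink():
                found.append(candidate)
    return sorted(found)


def missing_headers(story_text: str, required: Iterable[str]) -> List[str]:
    """Return the required markdown headers that story_text lacks."""
    stripped = (line.strip() for line in story_text.splitlines())
    present = {line for line in stripped if line.startswith("#")}
    return [header for header in required if header not in present]
```

A pre-flight run would fail if either function returns a non-empty
list for any story tree.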

Verified end-to-end: 'damascus ingest --project hello-bmad' succeeded
on the live orchestrator, _find_bmad_story resolved both stories.

The 'architecture.md is ingested as a work item' quirk is documented in
docs/adding-a-new-project.md §'Common pitfalls' with a one-liner fix.

Refs: t_5aa80e4b (parallel dashboard work — committed separately)
2026-06-26 06:03:39 +00:00
Hermes
98412abefc test(e2e): P6 entry-points end-to-end merge gate (in-process recovery)
test / contract-and-unit (pull_request) Failing after 14s
P6 worker hit the 120-iter budget cap twice while finishing the e2e
harness and the verify.sh recipe. The artifacts on disk were correct
and passing — both runs reported 'all 4 phases PASSED' before the
budget ran out — but the worker died before commit/push. Recovered by
running the test suite against merged main (PR #19 landed as 60ec5f6)
and committing the verified artifacts.

What this PR ships:

1. tests/e2e/test_entry_points_e2e.py (668 lines)
   Single Playwright + MCP integration test exercising the full v1
   entry-points surface against the live docker-compose stack:
     Phase 1: ingest_story via MCP server (stdio subprocess) ->
              assert WorkItemResponse.phase == 'spec'
     Phase 2: navigate UI to /#/items, poll for the new row within 5s,
              open the drawer, assert the 4 P5 widgets render non-zero
     Phase 3: drive state.set_phase spec -> build -> review -> merged;
              reload UI after each transition, assert phase pill updates
     Phase 4: open a human_issue via state.open_human_issue; answer it
              via MCP.answer_question; assert status -> 'answered';
              reload drawer, assert the answer shows
   Own cleanup (project='e2e-test' only) so it doesn't collide with
   other tests against the same DB.

2. tests/e2e/conftest.py
   Helpers: state.open_human_issue, state.set_phase, state.get_item
   wrappers that the e2e test uses to drive the cycle directly without
   spinning the orchestrator loop.

3. scripts/verify.sh
   30-second manual smoke: /healthz, /v1/items read, /v1/items?group_by=project
   (P5 backend), /v1/stats, auth 401 path, smoke ingest with token.
   Exits non-zero on any failure.

4. docs/VERIFICATION.md
   One-page recipe: 30s check + full cycle walkthrough. Runnable by
   Kay without agent help.

5. .gitignore
   Add .hermes/evidence/ — e2e screenshots/logs are regenerated by
   the test on every run, no need to ship them.
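
Phase 2 above polls for the new row within 5s. A minimal polling helper
for that kind of bounded wait might look like the following; `poll_until`
and its parameters are hypothetical and are not taken from
tests/e2e/conftest.py.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(
    probe: Callable[[], Optional[T]],
    timeout: float = 5.0,
    interval: float = 0.1,
) -> T:
    """Call probe() until it returns a non-None value or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)
```

Using a monotonic clock keeps the deadline stable even if the wall
clock is adjusted during the run.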

Live verification (post-merge, against main):
  bash scripts/verify.sh           -> PASSED (7/7 checks green)
  pytest tests/e2e/test_entry_points_e2e.py -q -> 1 passed in 32.24s

Worker self-block reason noted in t_556485a7: a 'review-required handoff'
style summary was written before the budget ran out. The work is
complete and verified.
2026-06-25 12:33:32 +00:00
damascus-heartbeat
32de1c540c chore(ui): port fixture to 9111 to avoid colliding with live damascus-api
The v1 e2e suite (npm run test:e2e) hardcoded port 9110 for the
fixture_api.py and VITE_API_BASE_URL. P2's real damascus-api now binds
9110 on the developer host, so reuseExistingServer: true makes the
suite hit the real (empty) API and the tests fail with '0 matching'.
Move the fixture to 9111 by default; CI / clean hosts override with
FIXTURE_API_PORT=9110.
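
The default-plus-override rule can be sketched in Python as below. Only
the FIXTURE_API_PORT variable and the 9111 / 9110 values come from this
commit; the `fixture_port` helper is hypothetical and may not match how
fixture_api.py actually reads its port.

```python
import os
from typing import Mapping

DEFAULT_FIXTURE_PORT = 9111  # avoids the live damascus-api on 9110


def fixture_port(env: Mapping[str, str] = os.environ) -> int:
    """Return the fixture API port, honouring FIXTURE_API_PORT if set."""
    raw = env.get("FIXTURE_API_PORT", "")
    return int(raw) if raw.strip() else DEFAULT_FIXTURE_PORT
```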

Also adds docs/plans/2026-06-24-p5-damascus-ui-v2.md (the P5 plan
that the worker will execute against), a test:unit script, and the
testing-library devDeps needed by the v2 component tests.
2026-06-25 04:59:10 +00:00
fddc632d45 docs: add original plan + reviewed amendment doc
test / contract-and-unit (push) Failing after 9s
Adds docs/multi-project-orchestration-plan_1.md (the original design plan,
previously only in Downloads) and
docs/multi-project-orchestration-plan_amendments.md (the six architecture
amendments, reviewed and revised against the post-migration codebase).

Revisions vs the amendment draft:
- SRVG: mechanical rubric checks are the objective gate; the LLM judge is a
  soft, budget-tolerant fallback (not "objective"). Adds a non-code pipeline
  variant.
- Trust threshold / fair-share: unchanged numbers; flags the §7 metrics
  dependency and SKIP LOCKED claim nuance.
- File scope: reframed from "locking granularity" to claim-time scope-overlap
  policy (no lock layer exists; per-item worktrees isolate; rebase_conflict
  is the signal).
- Budget: fixed cap, default N=3 (was 5), kept per-row configurable;
  spec_ambiguous (awaiting_human) does NOT consume the budget.
- Coordination store: reversed the per-node SQLite injection; Postgres stays
  the sole store, with a lighter filesystem/append-table remediation only if
  telemetry volume is demonstrated.
- Wiki fact trigger: added per-project configurable glob list + manual
  override; notes the §11 merge-gate writer is unbuilt.
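
The budget rule above (fixed cap, default N=3, configurable per row,
spec_ambiguous not counted) can be sketched as a small counter. The
`RetryBudget` class and its method names are hypothetical; only the
default of 3 and the spec_ambiguous exemption come from the amendment
notes.

```python
from dataclasses import dataclass

DEFAULT_BUDGET = 3
# Outcomes that park the item for a human instead of spending a retry.
NON_CONSUMING = frozenset({"spec_ambiguous"})


@dataclass
class RetryBudget:
    cap: int = DEFAULT_BUDGET  # per-row override goes here
    used: int = 0

    def record(self, outcome: str) -> bool:
        """Record a failed attempt; return True while retries remain."""
        if outcome not in NON_CONSUMING:
            self.used += 1
        return self.used < self.cap
```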

Co-Authored-By: Claude <noreply@anthropic.com>
2026-06-23 18:55:37 +00:00