damascus-orchestrator/docs/VERIFICATION.md
2026-06-27 16:38:32 +00:00

Damascus Entry Points v1 — Verification

The P6a verification recipe for v1 of the entry points. Short on purpose so an operator can run it without an agent.

TL;DR (30-second check)

The script covers the full happy path (preflight, MCP handshake, ingest, UI reflection, cycle drive, and cleanup). A single run takes ~10 seconds against a warm stack:

bash scripts/verify.sh

Exit code is 0 on full success, non-zero on the first failed check. Re-runs are safe (the script deletes its own rows).

What it checks

| # | Section | Proves |
|---|---------|--------|
| 1 | preflight | damascus-api is healthy; /healthz and /v1/items respond 200 |
| 2 | stack-up | docker compose up -d db damascus-api damascus-ui-build succeeds; /healthz stays responsive (30s budget for cold starts) |
| 3 | mcp-stdio | python -m damascus.mcp_server answers initialize + tools/list over stdio; server.name == "damascus-mcp"; 7 tools visible |
| 4 | ingest-via-mcp | A story is ingested via tools/call ingest_story; the returned item has phase=spec |
| 5 | ui-shows-it | GET /v1/items returns the new row, phase=spec |
| 6 | drive-cycle | Direct SQL UPDATE walks the row spec → build → review → merged; merged_at is populated; /v1/items/{id} reflects each transition |
| 7 | cleanup | DELETE FROM work_items WHERE project='verify-smoke' removes the row(s) so re-runs stay tidy |
| 8 | summary | Green/red checklist of every section above |

Each section gates the next — the script exits on the first failure and prints which section tripped.
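
The gating behaviour described above can be sketched in a few lines of bash. This is a minimal illustration, not the code in verify.sh: run_section and verify_demo are hypothetical names, and the stand-in commands (true, false) only mark where a real check would go.

```bash
# Hypothetical gating helper; names are illustrative, not taken from verify.sh.

# Run one named section. On failure, report which section tripped and
# pass the command's exit status back to the caller.
run_section() {
  name=$1
  shift
  if "$@"; then
    printf 'PASS %s\n' "$name"
  else
    status=$?
    printf 'FAIL %s (exit %s)\n' "$name" "$status" >&2
    return "$status"
  fi
}

# Example wiring with stand-in commands. Chaining with '&&' means a failed
# section skips every section after it.
verify_demo() {
  run_section preflight true &&
  run_section stack-up false &&
  run_section mcp-stdio true
}
```

Calling verify_demo prints PASS for preflight, FAIL for stack-up, and never reaches mcp-stdio; its exit status is the status of the failed section.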

Running the full recipe by hand

If verify.sh flags a regression and you want to walk the same path yourself, here is the equivalent curl + psql sequence:

# Preflight
curl -fsS http://127.0.0.1:9110/healthz
curl -fsS -o /dev/null -w '%{http_code}\n' http://127.0.0.1:9110/v1/items   # expect 200

# Ingest a story (token in /root/.hermes/.env)
# Keep everything after the first '=' so a token that itself contains '=' is not truncated
TOKEN=$(sed -n 's/^DAMASCUS_API_TOKEN=//p' /root/.hermes/.env | tr -d "\"'")
INGEST=$(curl -fsS -X POST http://127.0.0.1:9110/v1/items \
  -H "Authorization: Bearer ${TOKEN}" \
  -H 'Content-Type: application/json' \
  -d '{"project":"manual","story_id":"manual-1","title":"Manual recipe","priority":200}')
ITEM_ID=$(echo "$INGEST" | python3 -c "import sys, json; print(json.load(sys.stdin)['item']['id'])")
echo "phase: $(curl -fsS "http://127.0.0.1:9110/v1/items/$ITEM_ID" | python3 -c "import sys, json; print(json.load(sys.stdin)['item']['phase'])")"

# Drive the cycle via direct SQL (orchestrator worker is bypassed)
for PHASE in build review merged; do
  if [ "$PHASE" = "merged" ]; then
    docker exec damascus-orchestrator-db-1 psql -U damascus -d damascus \
      -c "UPDATE work_items SET phase='$PHASE', claimed_by=NULL, claimed_at=NULL, merged_at=NOW(), updated_at=NOW() WHERE id='$ITEM_ID'"
  else
    docker exec damascus-orchestrator-db-1 psql -U damascus -d damascus \
      -c "UPDATE work_items SET phase='$PHASE', claimed_by=NULL, claimed_at=NULL, updated_at=NOW() WHERE id='$ITEM_ID'"
  fi
done

# Cleanup
docker exec damascus-orchestrator-db-1 psql -U damascus -d damascus \
  -c "DELETE FROM work_items WHERE project='manual'"

What success looks like at each phase

| Phase | UI signal | DB signal |
|-------|-----------|-----------|
| spec (post-ingest) | Phase chip = spec | work_items.phase='spec', no merged_at |
| build | Phase chip = build | work_items.phase='build' |
| review | Phase chip = review | work_items.phase='review' |
| merged | Phase chip = merged | work_items.phase='merged', merged_at set |
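
When walking the cycle by hand, it can help to encode the expected order once and compare each observed phase against it. The sketch below does only that; next_phase and expected_path are hypothetical helpers and do not exist in the repository.

```bash
# Hypothetical helper: map a phase to the next one on the happy path.
next_phase() {
  case "$1" in
    spec)   echo build ;;
    build)  echo review ;;
    review) echo merged ;;
    *)      return 1 ;;   # merged is terminal here; anything else is unknown
  esac
}

# Print every phase the cycle drive is expected to visit, starting at spec.
expected_path() {
  phase=spec
  echo "$phase"
  while next=$(next_phase "$phase"); do
    phase=$next
    echo "$phase"
  done
}
```

After each UPDATE, the phase returned by /v1/items/{id} should equal the next line of expected_path; a mismatch suggests something other than the manual UPDATE touched the row.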

For the human-issue flow (P6: awaiting_human + answer), see tests/e2e/test_entry_points_e2e.py::test_phase4_answer_question. That assertion lives in pytest, not in this bash recipe — verify.sh covers the merge-gate happy path only.

Why direct SQL for the cycle drive (not state.set_phase)

The orchestrator worker is alive and polling. A state.set_phase call on a freshly ingested spec row races the worker's claim loop: the worker can claim the row mid-transition and start refining it. The direct SQL UPDATE never goes through the claim filter (SELECT ... FOR UPDATE SKIP LOCKED) and sets claimed_by=NULL, so the row has the same shape as one the cycle itself produced, and the API reflects the change immediately.

If you want to drive transitions via state.set_phase for debugging, stop the orchestrator first (docker compose stop orchestrator) and restart after.
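
A small wrapper makes it harder to forget the restart. This is a sketch under the assumption that the compose service is named orchestrator, as in the command above; with_orchestrator_stopped is a hypothetical helper, and it uses docker compose start rather than a full up.

```bash
# Hypothetical wrapper: stop the worker, run a debugging command, then
# start the worker again even if the command failed.
with_orchestrator_stopped() {
  docker compose stop orchestrator || return
  status=0
  "$@" || status=$?
  docker compose start orchestrator
  return "$status"
}
```

Usage would look like with_orchestrator_stopped some-debug-command args, where the command is whatever drives state.set_phase in your setup. The wrapper returns that command's exit status.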

Architecture notes (relevant when verify.sh fails)

  • Token source: DAMASCUS_API_TOKEN is read from the shell env, falling back to /root/.hermes/.env (the same source damascus-api reads). The value in the host .env is a placeholder and is ignored; the live value lives in /root/.hermes/.env. See damascus-orchestrator-operator skill pitfall "DAMASCUS_API_TOKEN in host .env is a placeholder."
  • MCP upstream: the helper launches the MCP process via docker compose exec damascus-api python -m damascus.mcp_server with DAMASCUS_API_BASE=http://damascus-api:9110. The upstream hostname is resolved by container DNS, so do NOT replace it with localhost, even though localhost is the address you would use from the host.
  • Idempotency: ingest_story is idempotent on (project, story_id). verify.sh uses a unique timestamped story_id per run so the helper's own re-ingest (during a failure-recovery flow) won't collide.
  • damascus-ui-build: a one-shot (restart: "no") that copies the Vite bundle into the named damascus_ui volume. docker compose up -d on an exited one-shot re-runs it; the cp is idempotent on a populated volume.
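
The token lookup order in the first note (shell env first, then /root/.hermes/.env) can be reproduced by hand. The function below is a sketch, not the code verify.sh uses: resolve_token is a hypothetical name, and it assumes the file holds a plain DAMASCUS_API_TOKEN=value line with optional quotes.

```bash
# Hypothetical helper: prefer the shell env, fall back to the env file.
resolve_token() {
  env_file=${1:-/root/.hermes/.env}
  token=${DAMASCUS_API_TOKEN:-}
  if [ -z "$token" ] && [ -r "$env_file" ]; then
    # Keep everything after the first '=', then strip optional quotes.
    token=$(sed -n 's/^DAMASCUS_API_TOKEN=//p' "$env_file" | head -n 1 | tr -d "\"'")
  fi
  # Fail instead of printing an empty token, mirroring the API's fail-closed startup.
  [ -n "$token" ] || return 1
  printf '%s\n' "$token"
}
```

For example, TOKEN=$(resolve_token) || echo "no token found" >&2 makes an empty or missing token visible before any request is sent.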

Failure modes

  • /healthz returns non-ok: damascus-api failed to boot. Check docker logs damascus-orchestrator-damascus-api-1. Usually means DAMASCUS_API_TOKEN is empty (fail-closed at startup).
  • /v1/items returns 500: the API container is up but cannot reach Postgres. Verify the db container is healthy (docker compose ps db).
  • MCP initialize fails with "no such service": the damascus-api container is not running. Restart via docker compose up -d damascus-api.
  • MCP tools/list returns fewer than 7: MCP server failed to build its catalog (likely a Python import error). Check docker compose logs damascus-api for the traceback.
  • Cycle-drive UPDATE hangs: the db container is unreachable or out of disk. Check docker compose ps db and df -h "$(docker volume inspect damascus-orchestrator_dbdata --format '{{ .Mountpoint }}')".
  • Item not visible in /v1/items after MCP ingest: the orchestrator worker may have already moved the row past spec before section 5 ran. Re-run the script — each run uses a fresh story_id.
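
When triaging the two MCP failures above, it can be useful to send the handshake by hand. The MCP stdio transport carries newline-delimited JSON-RPC 2.0 messages, so the requests can be generated with printf. This is a sketch: mcp_requests is a hypothetical helper, the clientInfo values are made up, and the protocolVersion string is an assumption about what this server accepts.

```bash
# Hypothetical helper: emit the three messages of a minimal MCP handshake,
# one JSON-RPC message per line.
mcp_requests() {
  printf '%s\n' \
    '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"manual-check","version":"0"}}}' \
    '{"jsonrpc":"2.0","method":"notifications/initialized"}' \
    '{"jsonrpc":"2.0","id":2,"method":"tools/list"}'
}
```

Piping the output into the server, for example mcp_requests | docker compose exec -T damascus-api python -m damascus.mcp_server, should yield one response for id 1 (carrying the server name) and one for id 2 (carrying the tool list). Whether the server answers before it sees end-of-input on stdin depends on its implementation, so treat a silent exit as inconclusive.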

Screenshots

UI screenshots are produced by the P6 Playwright spec (tests/e2e/test_entry_points_e2e.py) and saved to .hermes/evidence/p6/screenshots/. verify.sh is bash-only by design — adding Playwright would expand it past the "manual recipe in <1 minute" budget this page targets.

ADR-005: transient vs structural tests_failed

Added 2026-06-27. The build phase classifies 6 known transient error patterns (project repo not found at, worktree setup:, Connection refused, Could not resolve host, TLS handshake timeout, rate limit) and sets feedback.transient = true for matching errors. The cycle function's loop-breaker skips those:

  • Within 24h of first_attempted_at: row stays in the same phase, no human_issue, emits phase.transient_retry event. Stale-claim window (default 30m) provides natural backoff.
  • After 24h of persistent transient retries: row escalates to blocked + human_issue is opened.
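
The decision described above can be sketched as two small functions. This illustrates the rule and is not the orchestrator's implementation: is_transient and decide are hypothetical names, matching is assumed to be a case-sensitive substring test, timestamps are epoch seconds, and treating exactly 24h as already escalated is an assumption.

```bash
# Returns 0 when the message contains one of the six patterns listed above.
is_transient() {
  case "$1" in
    *"project repo not found at"*) return 0 ;;
    *"worktree setup:"*)           return 0 ;;
    *"Connection refused"*)        return 0 ;;
    *"Could not resolve host"*)    return 0 ;;
    *"TLS handshake timeout"*)     return 0 ;;
    *"rate limit"*)                return 0 ;;
    *)                             return 1 ;;
  esac
}

# Print what the loop-breaker would do for one failure.
# Arguments: error message, first_attempted_at, current time.
decide() {
  message=$1
  first_attempted_at=$2
  now=$3
  if ! is_transient "$message"; then
    echo structural          # not a known transient pattern
  elif [ $((now - first_attempted_at)) -lt 86400 ]; then
    echo transient_retry     # stay in phase, no human_issue
  else
    echo blocked             # escalate and open a human_issue
  fi
}
```

For instance, decide 'Connection refused' 0 3600 prints transient_retry, while the same message 24 hours after the first attempt prints blocked.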

The column work_items.first_attempted_at (TIMESTAMPTZ, nullable) is set by state.claim_for_* on the first claim for a row. Migration src/damascus/db/migrations/0007_first_attempted_at.sql adds the column and backfills it from updated_at for existing rows. Forward-compatible: nullable + default NULL, so older orchestrator binaries can still read the table.

Evidence log

To keep the full output of a run, pipe verify.sh through tee into .hermes/evidence/p6a/verify.log:

bash scripts/verify.sh 2>&1 | tee .hermes/evidence/p6a/verify.log

The script prints the absolute log path on success.
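
One caveat with any tee pipeline: by default a shell pipeline reports the exit status of its last command (here tee), so a non-zero exit from the script is lost to the caller. If the exit code matters, for example in CI, a wrapper along these lines keeps both the log and the status. run_logged is a hypothetical helper and relies on bash's pipefail option.

```bash
# Hypothetical wrapper: log a command's combined output and keep its exit status.
# The function body runs in a subshell so pipefail does not leak to the caller.
run_logged() (
  set -o pipefail
  log=$1
  shift
  mkdir -p "$(dirname "$log")"
  "$@" 2>&1 | tee "$log"
)
```

With this in place, run_logged .hermes/evidence/p6a/verify.log bash scripts/verify.sh writes the same log as the one-liner above and returns the script's own exit code.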