Damascus Entry Points v1 — Verification
The P6a verification recipe for v1 of the entry points. Short on purpose so an operator can run it without an agent.
TL;DR (30-second check)
The script covers the full happy path: preflight, MCP handshake, ingest, UI reflection, cycle drive, and cleanup. A single run takes ~10 seconds against a warm stack:
bash scripts/verify.sh
Exit code is 0 on full success, non-zero on the first failed check.
Re-runs are safe (the script deletes its own rows).
What it checks
| # | Section | Proves |
|---|---|---|
| 1 | preflight | damascus-api is healthy; /healthz and /v1/items respond 200 |
| 2 | stack-up | docker compose up -d db damascus-api damascus-ui-build succeeds; /healthz stays responsive (30s budget for cold starts) |
| 3 | mcp-stdio | python -m damascus.mcp_server answers initialize + tools/list over stdio; server.name == "damascus-mcp"; 7 tools visible |
| 4 | ingest-via-mcp | A story is ingested via tools/call ingest_story; the returned item has phase=spec |
| 5 | ui-shows-it | GET /v1/items returns the new row, phase=spec |
| 6 | drive-cycle | Direct SQL UPDATE walks the row spec → build → review → merged; merged_at is populated; /v1/items/{id} reflects each transition |
| 7 | cleanup | DELETE FROM work_items WHERE project='verify-smoke' removes the row(s) so re-runs stay tidy |
| 8 | summary | Green/red checklist of every section above |
Each section gates the next — the script exits on the first failure and prints which section tripped.
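The gating behaviour can be sketched in a few lines of Python. This is an illustration of the fail-fast pattern only, not the contents of verify.sh; the `run_gated` helper and the `[ OK ]` / `[FAIL]` report format are hypothetical.

```python
from typing import Callable, List, Tuple

# A check is a (section name, zero-argument callable returning True on success).
Check = Tuple[str, Callable[[], bool]]


def run_gated(checks: List[Check]) -> Tuple[int, List[str]]:
    """Run checks in order and stop at the first failure.

    Returns (exit_code, report_lines). The exit code is 0 only when every
    check passed; later sections never run once one has failed.
    """
    report: List[str] = []
    for name, check in checks:
        try:
            ok = bool(check())
        except Exception as exc:  # a crashing check counts as a failure
            report.append(f"[FAIL] {name}: {exc}")
            return 1, report
        if not ok:
            report.append(f"[FAIL] {name}")
            return 1, report
        report.append(f"[ OK ] {name}")
    return 0, report


if __name__ == "__main__":
    # Hypothetical stand-ins for the real sections.
    demo = [
        ("preflight", lambda: True),
        ("mcp-stdio", lambda: False),
        ("cleanup", lambda: True),
    ]
    code, lines = run_gated(demo)
    print("\n".join(lines))
    print("exit code:", code)
```

The last line of the report always names the section that tripped, which is the property the table above relies on.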
Running the full recipe by hand
If verify.sh flags a regression and you want to walk the same path
yourself, here is the equivalent curl + psql sequence:
# Preflight
curl -fsS http://127.0.0.1:9110/healthz
curl -fsS -o /dev/null -w '%{http_code}\n' http://127.0.0.1:9110/v1/items  # expect 200
# Ingest a story (token in /root/.hermes/.env)
TOKEN=$(awk -F= '/^DAMASCUS_API_TOKEN/ {print $2}' /root/.hermes/.env | tr -d '"' | tr -d "'")
INGEST=$(curl -fsS -X POST http://127.0.0.1:9110/v1/items \
  -H "Authorization: Bearer ${TOKEN}" \
  -H 'Content-Type: application/json' \
  -d '{"project":"manual","story_id":"manual-1","title":"Manual recipe","priority":200}')
ITEM_ID=$(echo "$INGEST" | python3 -c "import sys, json; print(json.load(sys.stdin)['item']['id'])")
echo "phase:" $(curl -fsS http://127.0.0.1:9110/v1/items/$ITEM_ID | python3 -c "import sys, json; print(json.load(sys.stdin)['item']['phase'])")
# Drive the cycle via direct SQL (orchestrator worker is bypassed)
for PHASE in build review merged; do
  if [ "$PHASE" = "merged" ]; then
    docker exec damascus-orchestrator-db-1 psql -U damascus -d damascus \
      -c "UPDATE work_items SET phase='$PHASE', claimed_by=NULL, claimed_at=NULL, merged_at=NOW(), updated_at=NOW() WHERE id='$ITEM_ID'"
  else
    docker exec damascus-orchestrator-db-1 psql -U damascus -d damascus \
      -c "UPDATE work_items SET phase='$PHASE', claimed_by=NULL, claimed_at=NULL, updated_at=NOW() WHERE id='$ITEM_ID'"
  fi
done
# Cleanup
docker exec damascus-orchestrator-db-1 psql -U damascus -d damascus \
  -c "DELETE FROM work_items WHERE project='manual'"
What success looks like at each phase
| Phase | UI signal | DB signal |
|---|---|---|
| spec (post-ingest) | Phase chip = spec | work_items.phase='spec', no merged_at |
| build | Phase chip = build | work_items.phase='build' |
| review | Phase chip = review | work_items.phase='review' |
| merged | Phase chip = merged | work_items.phase='merged', merged_at set |
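When walking the recipe by hand, the DB-signal column can be checked mechanically. The sketch below assumes the item is available as a dict with `phase` and `merged_at` keys (for example the `item` object returned by /v1/items/{id}; whether the API exposes `merged_at` under that exact key is an assumption). The `check_phase_signals` helper is hypothetical.

```python
from typing import Any, Dict, List

PHASES = ("spec", "build", "review", "merged")


def check_phase_signals(item: Dict[str, Any], expected_phase: str) -> List[str]:
    """Return a list of mismatches between an item and the table's DB signals.

    An empty list means the item looks right for the expected phase.
    """
    if expected_phase not in PHASES:
        raise ValueError(f"unknown phase: {expected_phase!r}")

    problems: List[str] = []
    if item.get("phase") != expected_phase:
        problems.append(
            f"phase is {item.get('phase')!r}, expected {expected_phase!r}"
        )

    merged_at = item.get("merged_at")
    # The table only constrains merged_at for the first and last phases.
    if expected_phase == "spec" and merged_at:
        problems.append("merged_at is set on a spec row")
    if expected_phase == "merged" and not merged_at:
        problems.append("merged_at is not set on a merged row")
    return problems


if __name__ == "__main__":
    print(check_phase_signals({"phase": "spec", "merged_at": None}, "spec"))
    print(check_phase_signals({"phase": "merged", "merged_at": None}, "merged"))
```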
For the human-issue flow (P6: awaiting_human + answer), see
tests/e2e/test_entry_points_e2e.py::test_phase4_answer_question.
That assertion lives in pytest, not in this bash recipe — verify.sh
covers the merge-gate happy path only.
Why direct SQL for the cycle drive (not state.set_phase)
The orchestrator worker is alive and polling. A state.set_phase call
on a freshly-ingested spec row races the worker's claim loop — the
worker can grab the row mid-transition and start refining it. The
SQL UPDATE bypasses the claim filter (SELECT ... FOR UPDATE SKIP LOCKED) entirely and stamps claimed_by=NULL, so the row matches
the shape of one the cycle produced and the API reflects the change
immediately.
If you want to drive transitions via state.set_phase for debugging,
stop the orchestrator first (docker compose stop orchestrator) and
restart after.
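For reference, the shape of the UPDATE used for each transition can be expressed as a small builder. This is a sketch that mirrors the statements in the manual recipe; it assumes a driver that accepts `%s` placeholders (psycopg style), and `build_phase_update` is a hypothetical helper, not part of the damascus codebase.

```python
from typing import Tuple

# Phases the cycle drive walks through after ingest leaves the row in spec.
DRIVE_PHASES = ("build", "review", "merged")


def build_phase_update(item_id: str, phase: str) -> Tuple[str, Tuple[str, str]]:
    """Build a parametrized UPDATE that moves one work item to `phase`.

    The statement always clears the claim columns so the row looks like one
    the cycle produced; merged_at is stamped only on the final transition.
    """
    if phase not in DRIVE_PHASES:
        raise ValueError(f"cannot drive to phase {phase!r}")

    assignments = ["phase = %s", "claimed_by = NULL", "claimed_at = NULL"]
    if phase == "merged":
        assignments.append("merged_at = NOW()")
    assignments.append("updated_at = NOW()")

    sql = f"UPDATE work_items SET {', '.join(assignments)} WHERE id = %s"
    return sql, (phase, item_id)


if __name__ == "__main__":
    for phase in DRIVE_PHASES:
        sql, params = build_phase_update("item-123", phase)
        print(sql, params)
```

Passing the phase and id as parameters rather than interpolating them avoids the quoting pitfalls of the shell version.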
Architecture notes (relevant when verify.sh fails)
- Token source: DAMASCUS_API_TOKEN is read from the shell env, falling back to /root/.hermes/.env (the same source damascus-api reads). The placeholder in the host .env is ignored; the live value lives in the file. See damascus-orchestrator-operator skill pitfall "DAMASCUS_API_TOKEN in host .env is a placeholder."
- MCP upstream: the helper launches the MCP process via docker compose exec damascus-api python -m damascus.mcp_server with DAMASCUS_API_BASE=http://damascus-api:9110. Container DNS resolves the upstream; do NOT change it to localhost from the host perspective.
- Idempotency: ingest_story is idempotent on (project, story_id). verify.sh uses a unique timestamped story_id per run so the helper's own re-ingest (during a failure-recovery flow) won't collide.
- damascus-ui-build: a one-shot (restart: "no") that copies the Vite bundle into the named damascus_ui volume. docker compose up -d on an exited one-shot re-runs it; the cp is idempotent on a populated volume.
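The token lookup order described under "Token source" (shell env first, then the env file) can be sketched as follows. This is not the helper's actual code; `parse_env_token` and `resolve_token` are hypothetical names, and the parser assumes plain KEY=value lines with optional surrounding quotes.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

TOKEN_KEY = "DAMASCUS_API_TOKEN"


def parse_env_token(text: str, key: str = TOKEN_KEY) -> Optional[str]:
    """Extract `key` from dotenv-style text, or None if absent or empty."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values containing "=" stay intact.
        name, _, value = line.partition("=")
        if name.strip() == key:
            value = value.strip().strip('"').strip("'")
            return value or None
    return None


def resolve_token(env: Mapping[str, str], env_file: Path) -> Optional[str]:
    """Prefer the shell environment, then fall back to the env file."""
    token = env.get(TOKEN_KEY)
    if token:
        return token
    if env_file.is_file():
        return parse_env_token(env_file.read_text())
    return None


if __name__ == "__main__":
    found = resolve_token(os.environ, Path("/root/.hermes/.env"))
    print("token found" if found else "token missing")
```

One design note: splitting on the first `=` differs from the `awk -F= '{print $2}'` one-liner in the manual recipe, which would truncate a token that itself contains `=`.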
Failure modes
- /healthz returns non-ok: damascus-api failed to boot. Check docker logs damascus-orchestrator-damascus-api-1. Usually means DAMASCUS_API_TOKEN is empty (fail-closed at startup).
- /v1/items returns 500: the API container is up but cannot reach Postgres. Verify the db container is healthy (docker compose ps db).
- MCP initialize fails with "no such service": the damascus-api container is not running. Restart via docker compose up -d damascus-api.
- MCP tools/list returns fewer than 7: MCP server failed to build its catalog (likely a Python import error). Check docker compose logs damascus-api for the traceback.
- Cycle-drive UPDATE hangs: the db container is unreachable or out of disk. Check docker compose ps db and df -h $(docker volume inspect damascus-orchestrator_dbdata --format '{{ .Mountpoint }}').
- Item not visible in /v1/items after MCP ingest: the orchestrator worker may have already moved the row past spec before section 5 ran. Re-run the script; each run uses a fresh story_id.
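To debug the two MCP failure modes without the script, it can help to build the handshake messages and inspect the responses by hand. The sketch below assumes the standard MCP stdio framing (one JSON-RPC 2.0 message per line) and that the server name appears under `result.serverInfo.name` in the initialize response, as in the MCP specification. The protocol version string and client name are placeholders, and `check_handshake` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List


def initialize_request(request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 `initialize` request as a single line."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2024-11-05",  # assumption: a version the server accepts
            "capabilities": {},
            "clientInfo": {"name": "verify-sketch", "version": "0.0.0"},  # placeholder
        },
    })


def tools_list_request(request_id: int = 2) -> str:
    """Serialize a JSON-RPC 2.0 `tools/list` request as a single line."""
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})


def check_handshake(
    init_response: Dict[str, Any],
    tools_response: Dict[str, Any],
    expected_name: str = "damascus-mcp",
    min_tools: int = 7,
) -> List[str]:
    """Return a list of problems found in the two parsed responses."""
    problems: List[str] = []
    for label, resp in (("initialize", init_response), ("tools/list", tools_response)):
        if "error" in resp:
            problems.append(f"{label} returned an error: {resp['error']}")

    name = (init_response.get("result") or {}).get("serverInfo", {}).get("name")
    if name != expected_name:
        problems.append(f"server name is {name!r}, expected {expected_name!r}")

    tools = (tools_response.get("result") or {}).get("tools", [])
    if len(tools) < min_tools:
        problems.append(f"only {len(tools)} tools visible, expected {min_tools}")
    return problems


if __name__ == "__main__":
    print(initialize_request())
    print(tools_list_request())
```

Each request line can be written to the stdin of the MCP process started as described under "MCP upstream"; the replies, once parsed with `json.loads`, go to `check_handshake`.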
Screenshots
UI screenshots are produced by the P6 Playwright spec
(tests/e2e/test_entry_points_e2e.py) and saved to
.hermes/evidence/p6/screenshots/. verify.sh is bash-only by
design — adding Playwright would expand it past the "manual recipe
in <1 minute" budget this page targets.
ADR-005: transient vs structural tests_failed
Added 2026-06-27. The build phase classifies 6 known transient error patterns
(project repo not found at, worktree setup:, Connection refused,
Could not resolve host, TLS handshake timeout, rate limit) and sets
feedback.transient = true for matching errors. The cycle function's
loop-breaker skips those:
- Within 24h of first_attempted_at: row stays in the same phase, no human_issue, emits phase.transient_retry event. Stale-claim window (default 30m) provides natural backoff.
- After 24h of persistent transient retries: row escalates to blocked + human_issue is opened.
The column work_items.first_attempted_at (TIMESTAMPTZ, nullable) is
set by state.claim_for_* on the first claim for a row. Migration
src/damascus/db/migrations/0007_first_attempted_at.sql adds the column
and backfills it from updated_at for existing rows. Forward-compatible:
nullable + default NULL, so older orchestrator binaries can still read the
table.
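The classification and the 24h rule can be summarised in a short sketch. It is an illustration of the ADR's logic, not the orchestrator's implementation. Three points are assumptions: patterns are matched as case-sensitive substrings, a NULL first_attempted_at is treated as still inside the window, and exactly 24h counts as expired. The function names and return labels are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# The six patterns listed in ADR-005.
TRANSIENT_PATTERNS = (
    "project repo not found at",
    "worktree setup:",
    "Connection refused",
    "Could not resolve host",
    "TLS handshake timeout",
    "rate limit",
)
TRANSIENT_WINDOW = timedelta(hours=24)


def is_transient(error_text: str) -> bool:
    """True if the error matches any known transient pattern (substring match)."""
    return any(pattern in error_text for pattern in TRANSIENT_PATTERNS)


def next_action(
    error_text: str,
    first_attempted_at: Optional[datetime],
    now: datetime,
) -> str:
    """Decide what the loop-breaker does with a tests_failed result.

    Returns "structural" (not transient, handled by the normal loop-breaker),
    "transient_retry" (stay in phase, emit phase.transient_retry), or
    "escalate_blocked" (move to blocked and open a human_issue).
    """
    if not is_transient(error_text):
        return "structural"
    if first_attempted_at is None or now - first_attempted_at < TRANSIENT_WINDOW:
        return "transient_retry"
    return "escalate_blocked"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(next_action("curl: (7) Connection refused", now - timedelta(hours=2), now))
    print(next_action("curl: (7) Connection refused", now - timedelta(hours=30), now))
    print(next_action("AssertionError: expected 3, got 4", now, now))
```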
Evidence log
Each run of verify.sh writes its full output to
.hermes/evidence/p6a/verify.log when piped via tee:
bash scripts/verify.sh 2>&1 | tee .hermes/evidence/p6a/verify.log
The script prints the absolute log path on success.