The P6 worker hit the 120-iteration budget cap twice while finishing the
e2e harness and the verify.sh recipe. The artifacts on disk were correct
and passing (both runs reported 'all 4 phases PASSED' before the budget
ran out), but the worker died before commit/push. Recovered by running
the test suite against merged main (PR #19 landed as 60ec5f6) and
committing the verified artifacts.
What this PR ships:
1. tests/e2e/test_entry_points_e2e.py (668 lines)
Single Playwright + MCP integration test exercising the full v1
entry-points surface against the live docker-compose stack:
Phase 1: ingest_story via MCP server (stdio subprocess) ->
assert WorkItemResponse.phase == 'spec'
Phase 2: navigate UI to /#/items, poll for the new row within 5s,
open the drawer, assert the 4 P5 widgets render non-zero
Phase 3: drive state.set_phase spec -> build -> review -> merged;
reload UI after each transition, assert phase pill updates
Phase 4: open a human_issue via state.open_human_issue; answer it
via MCP.answer_question; assert status -> 'answered';
reload drawer, assert the answer shows
Own cleanup (project='e2e-test' only) so it doesn't collide with
other tests against the same DB.
2. tests/e2e/conftest.py
Helpers: state.open_human_issue, state.set_phase, state.get_item
wrappers that the e2e test uses to drive the cycle directly without
spinning the orchestrator loop.
3. scripts/verify.sh
30-second manual smoke: /healthz, /v1/items read, /v1/items?group_by=project
(P5 backend), /v1/stats, auth 401 path, smoke ingest with token.
Exits non-zero on any failure.
4. docs/VERIFICATION.md
One-page recipe: 30s check + full cycle walkthrough. Runnable by
Kay without agent help.
5. .gitignore
Add .hermes/evidence/ — e2e screenshots/logs are regenerated by
the test on every run, no need to ship them.
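Phase 2 in item 1 above waits up to 5s for the new row to appear. A bounded poll of that kind can be sketched as follows. This is a minimal illustration and is not taken from the test file; the helper name poll_until and its parameters are hypothetical.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(
    probe: Callable[[], Optional[T]],
    timeout_s: float = 5.0,
    interval_s: float = 0.25,
) -> T:
    """Call probe() until it returns a non-None value or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = probe()
        if result is not None:
            return result
        # Check the deadline only after probing, so probe() runs at least once.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s}s")
        time.sleep(interval_s)
```

In a test, probe would typically be a small closure that looks up the row (in the UI or via the API) and returns None until it exists, so a slow stack fails with a clear timeout instead of a flaky assertion.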
Live verification (post-merge, against main):
bash scripts/verify.sh -> PASSED (7/7 checks green)
pytest tests/e2e/test_entry_points_e2e.py -q -> 1 passed in 32.24s
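The "N/N checks green, non-zero exit on any failure" behaviour reported above can be sketched in Python as below. This is only an illustration of the pattern, not a description of how scripts/verify.sh is written; run_checks and the Check alias are hypothetical names, and this sketch runs every check before deciding the exit code.

```python
from typing import Callable, List, Tuple

# A check is a label plus a zero-argument callable returning True on success.
Check = Tuple[str, Callable[[], bool]]


def run_checks(checks: List[Check]) -> int:
    """Run every check, print one line per result, return a process exit code."""
    failed = 0
    for name, check in checks:
        try:
            ok = bool(check())
        except Exception as exc:  # a crashing check counts as a failure
            ok = False
            name = f"{name} ({exc})"
        print(f"{'PASS' if ok else 'FAIL'}  {name}")
        failed += 0 if ok else 1
    print(f"{len(checks) - failed}/{len(checks)} checks green")
    return 1 if failed else 0
```

A caller would pass the result to sys.exit so that CI or a shell wrapper sees the failure.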
Worker self-block reason, as noted in t_556485a7: a 'review-required
handoff'-style summary was written before the budget ran out. The work
itself is complete and verified.
Damascus Orchestrator
A self-hosted work-item state machine that autonomously advances stories through spec → build → review → merge. Designed per multi-project-orchestration-plan_1.md (the design doc this repo implements).
Quick start
# Start the stack (Postgres + Redis + Taskiq worker + scheduler)
docker compose up -d --build
# Apply schema (creates the DB + all tables/types/triggers)
docker compose exec orchestrator damascus init
# Manual one-shot cycle (operators / E2E). The Taskiq worker is the
# automatic trigger — you do not normally need to run this by hand.
docker compose exec orchestrator damascus cycle
# External concurrency view
docker compose exec orchestrator damascus status
# Run the contract + unit suite (needs a Postgres on 127.0.0.1:5432)
pip install -e . pytest "psycopg[binary]"
pytest tests/contract/ tests/unit/ -v
# Run the E2E suite (needs the docker-compose stack up)
pip install pytest psycopg
pytest tests/ -v
What this repo contains
- src/damascus/: the Python package (cycle, phases, state, git_ops, llm, cli, relay, wiki, tasks, config)
- tests/: the suite (contract + unit + E2E; the executable form of the design doc)
- schema.sql: Postgres 16 schema (work_items, coordination_gates, human_issues, cost_ledger, events_outbox)
- docker-compose.yml: the stack (db + redis + orchestrator worker + scheduler + sidecar-status)
- Dockerfile: the orchestrator image (Python 3.12 + git + claude-code + BMAD + LLM-wiki + ollama binary)
- .gitea/workflows/: CI
- skills/SKILL.md: operator-facing skill
What this repo does NOT contain
- The wiki (kaykayyali/damascus-wiki): separate repo with the contract docs
- The test project repos (wh40k-pc, restitution): separate repos with their own BMAD stories
- The lore entry about this project (kaykayyali/lore → Infrastructure/Damascus-Orchestrator)
Architecture (one paragraph)
Postgres is the source of truth on work-item state (design §3). Each story row
flows through three loops: spec-refiner (LLM via LiteLLM writes an implementable
spec), code-builder (Claude Code via LiteLLM writes the code in a git worktree,
opens a real Gitea PR), reviewer (re-runs the spec's test command, gates on
objective pass/fail, merges via Gitea API on pass). Atomic claim uses
SELECT ... FOR UPDATE SKIP LOCKED. Taskiq (a BullMQ-equivalent Python queue,
§13) with a Redis broker is the recurring trigger; the worker's --concurrency N
is the global concurrency cap (§10). Every phase transition emits a typed
verdict and an events_outbox row in the same transaction. An attempt budget
guarantees termination — a non-pass verdict that exhausts the budget parks the
item as blocked and opens a human_issue (§5/§16). The human is async — open
questions become human_issues rows, never synchronous blocks.
Full design + contracts in the wiki: kaykayyali/damascus-wiki.
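The claim, attempt-budget, and same-transaction outbox rules described above can be sketched as follows. This is a rough illustration under stated assumptions, not the repo's implementation: the column names (phase, attempts, blocked, item_id, payload, question), the Outcome type, and the functions apply_verdict and advance_one are hypothetical, and conn is assumed to be a psycopg 3 connection. A real worker would probably not hold the row lock for the duration of an LLM call as this sketch does.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Optional

PHASES = ["spec", "build", "review", "merged"]

# Assumed columns; SKIP LOCKED lets concurrent workers pick different rows.
CLAIM_SQL = """
SELECT id, phase, attempts
FROM work_items
WHERE phase = %s AND NOT blocked
ORDER BY id
LIMIT 1
FOR UPDATE SKIP LOCKED
"""


@dataclass(frozen=True)
class Outcome:
    phase: str
    attempts: int
    blocked: bool
    open_human_issue: bool


def apply_verdict(phase: str, attempts: int, verdict: str, budget: int) -> Outcome:
    """Pure transition rule: pass advances; a non-pass retries until the budget is spent."""
    if phase == "merged":
        raise ValueError("merged is terminal")
    if verdict == "pass":
        return Outcome(PHASES[PHASES.index(phase) + 1], 0, False, False)
    attempts += 1
    exhausted = attempts >= budget
    # Exhausting the budget parks the item and opens a human_issue.
    return Outcome(phase, attempts, exhausted, exhausted)


def advance_one(
    conn: Any, phase: str, run_phase: Callable[[int], str], budget: int = 3
) -> Optional[Outcome]:
    """Claim one item, apply its verdict, and write state + outbox in one transaction."""
    with conn.transaction():
        row = conn.execute(CLAIM_SQL, (phase,)).fetchone()
        if row is None:
            return None  # nothing claimable for this phase right now
        item_id, current, attempts = row
        verdict = run_phase(item_id)
        out = apply_verdict(current, attempts, verdict, budget)
        conn.execute(
            "UPDATE work_items SET phase = %s, attempts = %s, blocked = %s WHERE id = %s",
            (out.phase, out.attempts, out.blocked, item_id),
        )
        conn.execute(
            "INSERT INTO events_outbox (item_id, payload) VALUES (%s, %s)",
            (item_id, json.dumps({"verdict": verdict, "phase": out.phase})),
        )
        if out.open_human_issue:
            conn.execute(
                "INSERT INTO human_issues (item_id, question) VALUES (%s, %s)",
                (item_id, "attempt budget exhausted"),
            )
        return out
```

Keeping apply_verdict free of database access makes the termination guarantee easy to unit-test: for any fixed budget, a run of non-pass verdicts reaches blocked after at most that many attempts.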
Cross-references
- Design doc: kaykayyali/damascus-wiki/raw/articles/multi-project-orchestration-plan-1.md
- Contracts: kaykayyali/damascus-wiki/concepts/spec-refiner-contract.md, builder-contract.md, reviewer-contract.md, state-resume-protocol.md
- Lore entry: kaykayyali/lore → Infrastructure/Damascus-Orchestrator
- Test session: kaykayyali/damascus-wiki/queries/damascus-orchestrator/2026-06-23-test-session.md
- Spec-refiner gap: kaykayyali/damascus-wiki/queries/damascus-orchestrator/spec-refiner-gap-2026-06-23.md
License
Internal.