Damascus Orchestrator — E2E Test Suite
This is the executable form of the design doc (multi-project-orchestration-plan_1.md). Every acceptance criterion in the design doc is encoded as a test in this suite. A passing test = a working claim. A failing test = either a bug, a missing feature, or a contract violation.
The contracts
The wiki at /root/damascus-orchestrator/wiki/ is the canonical contract. Every test in this suite maps to an acceptance criterion in one of:
- concepts/spec-refiner-contract.md: Loop A
- concepts/builder-contract.md: Loop B
- concepts/reviewer-contract.md: Loop C
- concepts/state-resume-protocol.md: resumability
- entities/damascus-orchestrator.md: system overview
How to run
# Install pytest + psycopg (one time)
pip install pytest "psycopg[binary]"
# Run the full suite
cd /root/damascus-orchestrator
pytest tests/ -v
# Run just the spec-refiner tests
pytest tests/e2e/test_spec_refiner.py -v
# Run just the contract tests (fast, no cycles)
pytest tests/contract/ -v
# Run with the slow builder tests included
pytest tests/ -v -m ""
What each test file covers
| File | Phase | What it tests |
|---|---|---|
| tests/contract/test_contracts_match_source.py | All | Code matches the contracts (schema, modules, SQL patterns) |
| tests/unit/test_typed_verdicts.py | All | Verdict → phase routing table, loop-breaker math |
| tests/e2e/test_spec_refiner.py | Phase 1 | Loop A: spec claims, writes spec, honors scope, emits events, costs |
| tests/e2e/test_builder.py | Phase 1 | Loop B: worktree, pr_url before review, rebase, files_changed |
| tests/e2e/test_reviewer.py | Phase 1 | Loop C: no_pr guard, validate/assess separation, loop-breaker |
| tests/e2e/test_state_resume.py | Phase 1 | Stale claim recovery, awaiting_human, blocked, idempotency |
| tests/e2e/test_concurrency.py | Phase 2 | Worktree isolation, scope-disjoint dispatch, exclusive paths |
| tests/e2e/test_human_channel.py | Phase 1/2 | human_issues inbox, non-blocking human input |
| tests/e2e/test_multiproject_and_wiki.py | Phase 3-5 | Multi-project, wiki facts, metrics, fair-share |
TDD discipline
This suite was scaffolded as a living spec (all tests at once, mapped 1-to-1 to the design doc). Going forward, follow the TDD discipline:
- Vertical slices, not horizontal. Don't write all tests for phase N, then implement. One test → one fix → next test.
- RED first. Confirm the test fails for the right reason: the contract is violated, not the test itself is broken.
- GREEN with minimal change. Don't refactor while RED.
- Refactor after GREEN. When all tests for a slice pass, look for cleanup.
State of the suite (as of 2026-06-23)
| Section | Tests | Expected status |
|---|---|---|
| Contract tests | 10 | Mostly green (schema + module structure is in place) |
| Typed verdict unit tests | 8 | Mostly green (verdict table is correct) |
| Spec refiner E2E | 6 | 1 RED (test_spec_refiner_03_honors_declared_file_scope — the known gap) |
| Builder E2E | 5 | Most marked slow; need a real run to confirm |
| Reviewer E2E | 4 | test_reviewer_01_no_pr_defensive_guard should be green; others depend on pipeline |
| State resume E2E | 7 | test_resume_04_blocked_requires_manual_unblock and test_resume_06_already_merged_not_reclaimed should be green; others depend on schema/cycle behavior |
| Concurrency E2E | 6 | 3 xfail (Phase 2 roadmap) |
| Human channel E2E | 4 | Mostly green; some depend on LLM behavior |
| Multi-project + wiki E2E | 11 | All xfail (Phase 3-5 roadmap) |
Total: 61 tests. The xfail count is the build roadmap. Run the suite to see the actual state.
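The totals can be cross-checked against the table with a few lines of Python. This is only a tally of the numbers printed above, not a query against the real suite: the counts are copied from the table, and the xfail column is filled in only where the table states a count (Concurrency and Multi-project + wiki). The names `SUITE_STATE` and `tally` are illustrative.

```python
# (section, total tests, tests stated as xfail) copied from the table above.
# Rows that do not state an xfail count are recorded as 0 here.
SUITE_STATE = [
    ("Contract tests", 10, 0),
    ("Typed verdict unit tests", 8, 0),
    ("Spec refiner E2E", 6, 0),
    ("Builder E2E", 5, 0),
    ("Reviewer E2E", 4, 0),
    ("State resume E2E", 7, 0),
    ("Concurrency E2E", 6, 3),
    ("Human channel E2E", 4, 0),
    ("Multi-project + wiki E2E", 11, 11),
]


def tally(rows):
    """Return (total tests, stated xfail tests) for the given rows."""
    total = sum(count for _, count, _ in rows)
    xfail = sum(x for _, _, x in rows)
    return total, xfail


total, xfail = tally(SUITE_STATE)
print(f"total={total} xfail={xfail}")  # prints "total=61 xfail=14"
```

By this count, 14 of the 61 tests are stated as roadmap items; the live numbers come from running pytest.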
What "passing the suite" means
Phase 1 (single project, single loop, strict serial — design doc §13): all test_spec_refiner.py, test_builder.py, test_reviewer.py, test_state_resume.py tests pass, plus the unit and contract tests.
Phase 2 (concurrency): all test_concurrency.py non-xfail tests pass.
Phase 3-5: all tests in test_multiproject_and_wiki.py that are currently marked xfail pass.
A green E2E suite is the proof that the system works. A red E2E suite is the build list.
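The phase definitions above can be sketched as a small lookup that builds the pytest command for each gate. This is a minimal sketch, not part of the suite: `PHASE_TESTS` and `pytest_args` are hypothetical names, the paths come from the file table, and the `-m ""` switch mirrors the "slow builder tests included" command under How to run.

```python
# Phase -> test paths, taken from the phase definitions and the file table.
# The dict and helper names are illustrative, not part of the repository.
PHASE_TESTS = {
    "phase1": [
        "tests/contract/",
        "tests/unit/",
        "tests/e2e/test_spec_refiner.py",
        "tests/e2e/test_builder.py",
        "tests/e2e/test_reviewer.py",
        "tests/e2e/test_state_resume.py",
    ],
    "phase2": ["tests/e2e/test_concurrency.py"],
    "phase3-5": ["tests/e2e/test_multiproject_and_wiki.py"],
}


def pytest_args(phase, include_slow=False):
    """Build the pytest argument list for one phase gate."""
    if phase not in PHASE_TESTS:
        raise ValueError(f"unknown phase: {phase!r}")
    args = ["pytest", *PHASE_TESTS[phase], "-v"]
    if include_slow:
        # Same override as the "slow builder tests included" command.
        args += ["-m", ""]
    return args


print(" ".join(pytest_args("phase2")))
```

The returned list can be passed to `subprocess.run` from /root/damascus-orchestrator. Because most builder tests are marked slow, a Phase 1 gate would likely want `include_slow=True`.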
Adding new tests
When the design doc adds a new claim:
- Add an acceptance criterion to the appropriate contract wiki page
- Add a test in the appropriate tests/e2e/test_*.py file
- Reference the wiki section in the test docstring
- Run pytest, confirm RED (the feature is missing or not yet built)
- Build the feature, run pytest, confirm GREEN
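As an illustration of the steps above, a new test might look like the sketch below. Everything in it is hypothetical: the helper `out_of_scope`, the test name, and the sample paths are invented for the example, and the real E2E tests presumably exercise the running system, not a pure function. The part that carries over is the docstring, which names the wiki contract page the test encodes.

```python
from fnmatch import fnmatch


def out_of_scope(declared_globs, changed_files):
    """Hypothetical helper: changed files that match no declared glob."""
    return [
        path
        for path in changed_files
        if not any(fnmatch(path, glob) for glob in declared_globs)
    ]


def test_example_changes_stay_in_declared_scope():
    """Changed files must fall inside the declared file scope.

    Contract: concepts/spec-refiner-contract.md
    (cite the specific acceptance criterion when writing a real test).
    """
    declared = ["docs/specs/*.md"]      # illustrative scope
    changed = ["docs/specs/login.md"]   # illustrative change set
    assert out_of_scope(declared, changed) == []
```

pytest collects the function by its `test_` prefix; no pytest import is needed for a plain `assert`.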
The audit trail is: design doc → wiki contract → E2E test → implementation. Anyone (human or LLM) can read the chain end-to-end.