Kay Kayyali 78bdee686f
feat(orchestrator): /v1/performance endpoint + dashboard widgets (P7)
Adds the performance metrics endpoint and React Query hooks for the dashboard.

Backend:
- PerformanceResponse / PhaseMetrics / ProjectMetrics in api_schemas.py
- GET /v1/performance?days=N returns aggregated metrics from cost_ledger
  (avg request time, p95, avg tokens, avg cost) and events_outbox
  (stage progression timing, per-project failure rates)
- Verified working: 140 requests / 47 failures (33.6%), spec p95 9409s,
  build p95 3374s, mindmaps 26.8% failure rate

Frontend:
- usePerformance() hook with TypeScript interfaces
- Ready for widget creation (PerfPhaseTable, PerfStageProgression,
  PerfFailureRates, PerfTokenSparkline) — pending UI build

Build/test infra:
- Dockerfile and docker-compose.yml updates for the perf schema
2026-06-27 16:43:11 +00:00

Damascus Orchestrator — E2E Test Suite

This is the executable form of the design doc (multi-project-orchestration-plan_1.md). Every acceptance criterion in the design doc is encoded as a test in this suite. A passing test = a working claim. A failing test = either a bug, a missing feature, or a contract violation.

The contracts

The wiki at /root/damascus-orchestrator/wiki/ is the canonical contract. Every test in this suite maps to an acceptance criterion in one of:

  • concepts/spec-refiner-contract.md — Loop A
  • concepts/builder-contract.md — Loop B
  • concepts/reviewer-contract.md — Loop C
  • concepts/state-resume-protocol.md — resumability
  • entities/damascus-orchestrator.md — system overview

How to run

# Install pytest + psycopg (one time)
pip install pytest "psycopg[binary]"

# Run the full suite
cd /root/damascus-orchestrator
pytest tests/ -v

# Run just the spec-refiner tests
pytest tests/e2e/test_spec_refiner.py -v

# Run just the contract tests (fast, no cycles)
pytest tests/contract/ -v

# Run with the slow builder tests included (an empty -m expression overrides any default marker filter)
pytest tests/ -v -m ""

What each test file covers

| File | Phase | What it tests |
|---|---|---|
| tests/contract/test_contracts_match_source.py | All | Code matches the contracts (schema, modules, SQL patterns) |
| tests/unit/test_typed_verdicts.py | All | Verdict → phase routing table, loop-breaker math |
| tests/e2e/test_spec_refiner.py | Phase 1 | Loop A: spec claims, writes spec, honors scope, emits events, costs |
| tests/e2e/test_builder.py | Phase 1 | Loop B: worktree, pr_url before review, rebase, files_changed |
| tests/e2e/test_reviewer.py | Phase 1 | Loop C: no_pr guard, validate/assess separation, loop-breaker |
| tests/e2e/test_state_resume.py | Phase 1 | Stale claim recovery, awaiting_human, blocked, idempotency |
| tests/e2e/test_concurrency.py | Phase 2 | Worktree isolation, scope-disjoint dispatch, exclusive paths |
| tests/e2e/test_human_channel.py | Phase 1/2 | human_issues inbox, non-blocking human input |
| tests/e2e/test_multiproject_and_wiki.py | Phase 3-5 | Multi-project, wiki facts, metrics, fair-share |

TDD discipline

This suite was scaffolded as a living spec (all tests at once, mapped 1-to-1 to the design doc). Going forward, follow the TDD discipline:

  1. Vertical slices, not horizontal. Don't write all tests for phase N, then implement. One test → one fix → next test.
  2. RED first. Confirm the test fails for the right reason (because the contract is violated, not because the test itself is broken).
  3. GREEN with minimal change. Don't refactor while RED.
  4. Refactor after GREEN. When all tests for a slice pass, look for cleanup.

State of the suite (as of 2026-06-23)

| Section | Tests | Expected status |
|---|---|---|
| Contract tests | 10 | Mostly green (schema + module structure is in place) |
| Typed verdict unit tests | 8 | Mostly green (verdict table is correct) |
| Spec refiner E2E | 6 | 1 RED (test_spec_refiner_03_honors_declared_file_scope, the known gap) |
| Builder E2E | 5 | Most marked slow; need a real run to confirm |
| Reviewer E2E | 4 | test_reviewer_01_no_pr_defensive_guard should be green; others depend on pipeline |
| State resume E2E | 7 | test_resume_04_blocked_requires_manual_unblock and test_resume_06_already_merged_not_reclaimed should be green; others depend on schema/cycle behavior |
| Concurrency E2E | 6 | 3 xfail (Phase 2 roadmap) |
| Human channel E2E | 4 | Mostly green; some depend on LLM behavior |
| Multi-project + wiki E2E | 11 | All xfail (Phase 3-5 roadmap) |

Total: 61 tests. The xfail count is the build roadmap. Run the suite to see the actual state.
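
The total can be re-derived from the per-section counts in the table above. The following is a minimal Python sketch; the dictionaries only transcribe the table, and their names are illustrative rather than part of the suite.

```python
# Per-section test counts, transcribed from the table above.
SECTION_COUNTS = {
    "Contract tests": 10,
    "Typed verdict unit tests": 8,
    "Spec refiner E2E": 6,
    "Builder E2E": 5,
    "Reviewer E2E": 4,
    "State resume E2E": 7,
    "Concurrency E2E": 6,
    "Human channel E2E": 4,
    "Multi-project + wiki E2E": 11,
}

# xfail counts that the table states explicitly:
# 3 in the concurrency section, all 11 in the multi-project section.
STATED_XFAIL = {"Concurrency E2E": 3, "Multi-project + wiki E2E": 11}

total = sum(SECTION_COUNTS.values())
xfail_total = sum(STATED_XFAIL.values())

print(f"total tests: {total}")         # total tests: 61
print(f"stated xfail: {xfail_total}")  # stated xfail: 14
```

These are the expected figures from the table, not live results; a real pytest run remains the authority on the actual state.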

What "passing the suite" means

Phase 1 (single project, single loop, strict serial; design doc §13): every test in test_spec_refiner.py, test_builder.py, test_reviewer.py, and test_state_resume.py passes, along with the unit and contract tests.

Phase 2 (concurrency): all test_concurrency.py non-xfail tests pass.

Phase 3-5: all test_multiproject_and_wiki.py tests currently marked xfail pass.

A green E2E suite is the proof that the system works. A red E2E suite is the build list.
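
The phase gates above can be turned into concrete pytest invocations. The sketch below is a hypothetical helper, not something that ships with the suite: the names PHASE_TARGETS and phase_gate_args are invented for illustration, and the tests/unit/ directory is inferred from the location of test_typed_verdicts.py. It uses pytest's --runxfail flag for Phase 3-5 so that xfail-marked tests are reported as ordinary passes or failures, and passes -m "" so the slow tests are included.

```python
import shlex

# Hypothetical mapping from phase gate to the test paths named in this README.
PHASE_TARGETS = {
    "phase1": [
        "tests/contract/",
        "tests/unit/",  # assumed directory for the unit tests
        "tests/e2e/test_spec_refiner.py",
        "tests/e2e/test_builder.py",
        "tests/e2e/test_reviewer.py",
        "tests/e2e/test_state_resume.py",
    ],
    "phase2": ["tests/e2e/test_concurrency.py"],
    "phase3-5": ["tests/e2e/test_multiproject_and_wiki.py"],
}


def phase_gate_args(phase: str) -> list[str]:
    """Build the pytest argv for one phase gate (illustrative only)."""
    args = ["pytest", *PHASE_TARGETS[phase], "-v", "-m", ""]
    if phase == "phase3-5":
        # Treat xfail-marked tests as normal tests, so they must really pass.
        args.append("--runxfail")
    return args


if __name__ == "__main__":
    for phase in PHASE_TARGETS:
        print(shlex.join(phase_gate_args(phase)))
```

The Phase 2 gate deliberately omits --runxfail, because that gate is defined in terms of the non-xfail tests only.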

Adding new tests

When the design doc adds a new claim:

  1. Add an acceptance criterion to the appropriate contract wiki page
  2. Add a test in the appropriate tests/e2e/test_*.py file
  3. Reference the wiki section in the test docstring
  4. Run pytest, confirm RED (the new test fails because the feature isn't built yet)
  5. Build the feature, run pytest, confirm GREEN

The audit trail is: design doc → wiki contract → E2E test → implementation. Anyone (human or LLM) can read the chain end-to-end.
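
Step 3 is what keeps the chain readable, so it can help to check it mechanically. The following is a minimal sketch under stated assumptions: the test name, the cited section title, and the missing_wiki_refs helper are all hypothetical, and the regular expression assumes that contract pages live under concepts/ or entities/ as listed earlier in this README.

```python
import re

# Matches a reference to a wiki page such as concepts/builder-contract.md.
WIKI_REF = re.compile(r"\b(?:concepts|entities)/[\w-]+\.md\b")


def test_builder_99_example_claim():
    """Hypothetical test; the name and section title are placeholders.

    Contract: concepts/builder-contract.md, "Acceptance criteria".
    """


def test_without_reference():
    """No contract cited here."""


def missing_wiki_refs(test_functions):
    """Return the names of tests whose docstring cites no wiki contract page."""
    return [
        fn.__name__
        for fn in test_functions
        if not WIKI_REF.search(fn.__doc__ or "")
    ]


print(missing_wiki_refs([test_builder_99_example_claim, test_without_reference]))
# ['test_without_reference']
```

A check like this could itself sit alongside the contract tests, so that a test added without a wiki reference shows up as RED.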