Per docs/plan/exec/07-harness.md sub-slice 7.1:
- tests/harness/questions.yaml — the human-friendly
YAML source. 50 questions across the 5 design-doc
types (10 each): identity, time_fact, world_state,
causal, narrative. Each question pins id, type,
query, expected_tools, expected_answer_shape, and
expected_citations. Targets the Mardonari codex
(the slice 0 fixture) so the harness can run
end-to-end against the real graph.
- tests/harness/questions.json — the compiled JSON
(committed so the runner reads it without rebuilding).
- scripts/harness/build_questions.py — the strict
compiler. Validates the YAML schema, counts questions
per type, enforces id uniqueness, and writes the JSON.
Validation errors fail loudly with field paths.
- tests/harness/test_questions.py — 6 tests pinning the
contract: schema, 50 total, 10 per type, expected_tools
non-empty, ids unique, version set.
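
For illustration, the checks described above could be sketched roughly as
follows. This is not the code in scripts/harness/build_questions.py. The
top-level `version` and `questions` keys, the `ValidationError` class, the
`validate` function, and the error-message wording are all assumptions made
for the sketch. The sketch works on an already-parsed document, so it needs
no YAML library.

```python
from collections import Counter

# Field and type names come from the commit message; everything else
# (document layout, function names, messages) is hypothetical.
QUESTION_TYPES = ("identity", "time_fact", "world_state", "causal", "narrative")
REQUIRED_FIELDS = (
    "id", "type", "query",
    "expected_tools", "expected_answer_shape", "expected_citations",
)
PER_TYPE = 10


class ValidationError(ValueError):
    """Raised with a field path such as 'questions[3].type: ...'."""


def validate(doc: dict) -> list:
    """Check a parsed question document and return its question list."""
    if not doc.get("version"):
        raise ValidationError("version: missing or empty")
    questions = doc.get("questions")
    if not isinstance(questions, list):
        raise ValidationError("questions: expected a list")

    seen_ids = set()
    for i, q in enumerate(questions):
        path = f"questions[{i}]"
        if not isinstance(q, dict):
            raise ValidationError(f"{path}: expected a mapping")
        for field in REQUIRED_FIELDS:
            if field not in q:
                raise ValidationError(f"{path}.{field}: missing")
        if q["type"] not in QUESTION_TYPES:
            raise ValidationError(f"{path}.type: unknown type {q['type']!r}")
        if not q["expected_tools"]:
            raise ValidationError(f"{path}.expected_tools: must be non-empty")
        if q["id"] in seen_ids:
            raise ValidationError(f"{path}.id: duplicate id {q['id']!r}")
        seen_ids.add(q["id"])

    # Per-type quota: 5 types x 10 questions = 50 in total.
    counts = Counter(q["type"] for q in questions)
    for qtype in QUESTION_TYPES:
        if counts[qtype] != PER_TYPE:
            raise ValidationError(
                f"questions: expected {PER_TYPE} of type {qtype!r}, "
                f"found {counts[qtype]}"
            )
    return questions
```

Raising on the first failure with a path such as `questions[3].type` is one
way to get errors that "fail loudly with field paths". A real compiler might
instead collect every error before exiting.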
This change covers Track A only (no API key needed). Track B (executing
against the live LLM) is gated on $OLLAMA_API_KEY.
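
One way such a gate could look, as a minimal sketch only: the helper
`track_b_enabled` and the test class are hypothetical, and the standard
library's unittest stands in here for whatever test runner the repo
actually uses. Only the `OLLAMA_API_KEY` variable name comes from this
message.

```python
import os
import unittest


def track_b_enabled(environ=None) -> bool:
    """Return True when a non-empty OLLAMA_API_KEY is present."""
    env = os.environ if environ is None else environ
    return bool(env.get("OLLAMA_API_KEY"))


# The whole class is skipped when the key is absent, so a Track A run
# never touches the live LLM.
@unittest.skipUnless(track_b_enabled(), "Track B requires OLLAMA_API_KEY")
class TrackBLiveTests(unittest.TestCase):
    def test_key_present(self):
        # A real test would execute a harness question against the LLM here.
        self.assertTrue(track_b_enabled())
```

With the variable unset, the class is reported as skipped rather than
failed, which keeps the Track A suite count stable on machines without a key.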
Suite: 761 → 767 (+6).
Co-Authored-By: Claude <noreply@anthropic.com>