Lore Engine Dev 2367960540 slice 7.1: 50-question reasoning harness test set
Per docs/plan/exec/07-harness.md sub-slice 7.1:

  - tests/harness/questions.yaml — the human-friendly
    YAML source. 50 questions across the 5 design-doc
    types (10 each): identity, time_fact, world_state,
    causal, narrative. Each question pins id, type,
    query, expected_tools, expected_answer_shape, and
    expected_citations. Targets the Mardonari codex
    (the slice 0 fixture) so the harness can run
    end-to-end against the real graph.
  - tests/harness/questions.json — the compiled JSON
    (committed so the runner reads it without rebuilding).
  - scripts/harness/build_questions.py — the strict
    compiler. Validates the YAML schema, counts questions
    per type, enforces unique ids, and writes the JSON.
    Validation errors fail loudly and name the offending
    field path.
  - tests/harness/test_questions.py — 6 tests pinning the
    contract: schema, 50 total, 10 per type, expected_tools
    non-empty, ids unique, version set.
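A minimal sketch of the validation step described above. It assumes the
YAML has already been parsed into a list of dicts (for example with
yaml.safe_load). The names validate_questions, compile_questions and
SchemaError, the field-path format, and the JSON layout
({"version": ..., "questions": [...]}) are hypothetical and are not taken
from build_questions.py.

```python
import json
from collections import Counter

# Field names and types come from the commit message; the real schema in
# scripts/harness/build_questions.py may differ (assumption).
REQUIRED_FIELDS = ("id", "type", "query", "expected_tools",
                   "expected_answer_shape", "expected_citations")
TYPES = ("identity", "time_fact", "world_state", "causal", "narrative")
PER_TYPE = 10  # 5 types x 10 = 50 questions


class SchemaError(ValueError):
    """Hypothetical error carrying a field path like 'questions[3].id'."""


def validate_questions(questions: list[dict]) -> None:
    seen_ids: set[str] = set()
    # Per-question checks first, so each failure names its own path.
    for i, q in enumerate(questions):
        path = f"questions[{i}]"
        for field in REQUIRED_FIELDS:
            if field not in q:
                raise SchemaError(f"{path}.{field}: missing")
        if q["type"] not in TYPES:
            raise SchemaError(f"{path}.type: unknown type {q['type']!r}")
        if not q["expected_tools"]:
            raise SchemaError(f"{path}.expected_tools: must be non-empty")
        if q["id"] in seen_ids:
            raise SchemaError(f"{path}.id: duplicate id {q['id']!r}")
        seen_ids.add(q["id"])
    # Then the per-type counts, which together pin the 50-question total.
    counts = Counter(q["type"] for q in questions)
    for t in TYPES:
        if counts[t] != PER_TYPE:
            raise SchemaError(
                f"type {t}: expected {PER_TYPE} questions, got {counts[t]}")


def compile_questions(questions: list[dict], version: int) -> str:
    """Validate, then serialize with a version field (layout assumed)."""
    validate_questions(questions)
    return json.dumps({"version": version, "questions": questions}, indent=2)
```

Checking every question before the per-type counts means a malformed
entry is reported by its own path instead of surfacing later as a
confusing count mismatch.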

Track A only (no API key needed). Track B (executing
against the live LLM) is gated on $OLLAMA_API_KEY.
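One way to express the Track B gate with the standard library's
unittest.skipUnless, sketched under the assumption that Track B tests are
ordinary unittest cases. track_b_enabled and TrackBLiveHarness are
hypothetical names, and the test body is only a placeholder.

```python
import os
import unittest
from collections.abc import Mapping


def track_b_enabled(env: Mapping[str, str]) -> bool:
    """True only when a non-empty OLLAMA_API_KEY is present (hypothetical)."""
    return bool(env.get("OLLAMA_API_KEY"))


# The condition is evaluated once, at import time.
@unittest.skipUnless(track_b_enabled(os.environ),
                     "Track B needs $OLLAMA_API_KEY")
class TrackBLiveHarness(unittest.TestCase):
    def test_key_available(self):
        # Placeholder: a real Track B test would send harness questions
        # to the live LLM and score the answers.
        self.assertTrue(os.environ["OLLAMA_API_KEY"])
```

Taking the environment as a Mapping keeps the gate testable without
mutating os.environ.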

Suite: 761 → 767 (+6).

Co-Authored-By: Claude <noreply@anthropic.com>
2026-06-19 20:51:18 -04:00