docs(v2): T8 — update README + CHANGELOG + 3 worked-example docs

- README.md: 5 plugins / 19 tools (matches /healthz); 'what this proves' now lists consistency engine, multi-world namespace, LLM consumer; 'next steps' section replaced with 'shipped in v2'
- docs/CONSISTENCY_DEMO.md: 4 tools, 5 violations, all output verified against live bash examples/test_consistency.sh
- docs/MULTI_WORLD_DEMO.md: list_worlds() + entity_context in both worlds + cross-world isolation tests, all output verified live
- docs/LLM_CONSUMER_DEMO.md: 5 question types, 9 distinct tools, all output traced to examples/results/*.json
- CHANGELOG.md: v1 -> v2 entry, all 9 task refs (T1-T9)
- examples/test_e2e.sh: T7 E2E validation script (untracked)
2026-06-17 00:45:30 +00:00
parent bcda8eff00
commit 99535a8f3a
6 changed files with 1231 additions and 18 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -0,0 +1,165 @@
+# Changelog
+
+All notable changes to `lore-engine-poc` are recorded here. The format
+follows [Keep a Changelog](https://keepachangelog.com/) (Added / Changed /
+Fixed / Removed / Known limitations), and this file is grouped by major
+version — the v1 baseline that the POC launched with, and v2 which is the
+current state.
+
+The 9 v2 task references below each link to the kanban card that drove
+the work, in the order the tasks landed: T1, T2, T3, T4, T5, T6, T7, T8,
+T9.
+
+---
+
+## [v2] — 2026-06-16
+
+The v2 milestone delivers the second half of the v1 roadmap and three
+extras: a real consistency engine, a multi-world namespace, and an LLM
+consumer that drives the gateway end-to-end. v2 is what
+`bash test.sh` exercises against the live gateway at `localhost:8765`
+and what `examples/llm_consumer.py` drives from the LiteLLM proxy.
+
+### Added
+
+- **`plugins/embeddings.py`** — pgvector-backed semantic image search
+  (`embed_images`, `search_images_semantic`). Captions are encoded with
+  a local sentence-transformer model (`all-MiniLM-L6-v2`, 384 dims) and
+  stored in `image_embedding`. Queries are matched via pgvector cosine
+  distance (`<=>`). Background embedding on `register_image`; `embed_images`
+  is idempotent. v2.T2.
+
+- **`plugins/consistency.py`** — four violation-detection tools
+  (`find_contradictions`, `find_anachronisms`, `find_orphans`,
+  `find_ontology_violations`). Returns a `{violations, count}` envelope
+  per call. Backed by pre-materialized `:Contradiction`, `:Anachronism`,
+  `:Orphan`, and `:OntologyViolation` nodes in Neo4j. The seed
+  (`seed.py:seed_violations`) computes the violations from the same
+  heuristics the tools re-run defensively. v2.T3 (skeleton) + v2.T5
+  (real rules).
+
+- **`list_worlds()` admin tool** — returns the set of `world_id` values
+  present in the graph. Read by `bash test.sh` section 12 and by the
+  v2.T7 E2E validation suite. v2.T6.
+
+- **`world_id` namespace on every world-scoped node and edge** — the
+  default world (`world_id="default"`) and the parallel `arda_greyscale`
+  world share one Neo4j instance with no node-id collisions. Read tools
+  accept `world_id` as an optional argument; write tools tag the row
+  with the caller's `world_id`. v2.T6.
+
+- **Parallel world seed: `arda_greyscale`** — `seed.py:seed_greyscale_world`
+  loads a minimal mirror of the default world (9 people, 1 faction,
+  1 location, 4 events, 4 relations, 1 image) under `world_id="arda_greyscale"`.
+  Idempotent. v2.T6.
+
+- **LLM consumer (`examples/llm_consumer.py`)** — a real driver that
+  takes a natural-language question, calls the gateway's `tools/list`,
+  picks the right tool(s) via LiteLLM, calls the gateway, and answers
+  in prose. 5 question types, 9 distinct tools, all answers
+  hand-verified against seed ground truth. v2.T4.
+
+- **E2E validation (`examples/test_e2e.sh` + `examples/E2E_REPORT.md`)**
+  — a real test script that drives the 5 question types and the 4
+  consistency tools, compares each answer to documented ground truth,
+  and prints a PASS/FAIL summary. v2.T7.
+
+- **CI smoke (`scripts/ci-smoke.sh` + `docs/SMOKE.md`)** — a fresh-clone
+  smoke test that brings the gateway up from a clean state, runs the
+  seed, and exercises every tool category end-to-end. v2.T1.
+
+- **v2 docs** — `docs/CONSISTENCY_DEMO.md` (5 hand-crafted violations
+  from the live seed), `docs/MULTI_WORLD_DEMO.md` (the 2-world seed in
+  action), `docs/LLM_CONSUMER_DEMO.md` (the 5 question types in detail).
+  This file. v2.T8.
+
+- **Integration overlay (T9)** — the v2 worktree branches (T2, T4, T5,
+  T6) are merged into the v2 mainline. `bash test.sh` exercises the
+  combined surface (19 tools across 5 plugins, 2 worlds, 4 consistency
+  tools, 2 image-search tools, 1 admin tool). v2.T9.
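As a rough illustration of what the `<=>` operator named in the embeddings entry computes, the sketch below implements cosine distance (1 minus cosine similarity) in plain Python. The function name, the image keys, and the toy 3-dimensional vectors are assumptions made for illustration only; per the entry above, the real vectors have 384 dimensions and the comparison runs inside Postgres.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy stand-ins for caption embeddings (hypothetical keys and values).
query = [1.0, 0.0, 0.0]
captions = {
    "img_a": [1.0, 0.0, 0.0],  # same direction as the query
    "img_b": [0.0, 1.0, 0.0],  # orthogonal to the query
    "img_c": [1.0, 1.0, 0.0],  # 45 degrees from the query
}

# Smaller distance means a closer semantic match.
ranked = sorted(captions, key=lambda key: cosine_distance(query, captions[key]))
print(ranked)  # ['img_a', 'img_c', 'img_b']
```

Because cosine distance ignores vector length, only the direction of the embedding matters when ranking.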
+
+### Changed
+
+- **README.md updated to v2 state** — the "what's running" table now
+  points to `/healthz` as the source of truth (19 tools across 5 plugins);
+  the "what this proves" section gained the consistency engine (item 5),
+  multi-world namespace (item 6), and LLM consumer (item 7); the "next steps"
+  section was renamed to "shipped in v2" and now lists what each
+  v1 roadmap item became. v2.T8.
+
+- **`bash test.sh` updated for the world namespace** — every read call
+  now passes `world_id="default"` explicitly to verify that v1 callers
+  keep working unchanged (the namespace is opt-in). Added a 12th section
+  that calls `list_worlds()`. v2.T6.
+
+- **`seed.py` grew two new stages** — `seed_greyscale_world` (the
+  parallel world, v2.T6) and `seed_violations` (5 hand-crafted
+  violations, v2.T5). Both are idempotent and safe to re-run.
+
+- **`tests/test_consistency.py` and `tests/test_multi_world.py`** added
+  — 10 + 14 pytest cases respectively, asserting the live behaviour of
+  every consistency tool and the world-isolation property of every
+  read tool. v2.T5, v2.T6.
+
+- **`tests/test_embeddings_*.py` and `tests/test_register_image_hook.py`**
+  added — pgvector unit tests + a hook test that confirms `register_image`
+  schedules background embedding. v2.T2.
+
+### Known limitations (v2 → v3)
+
+These are deliberate v2 boundaries; the v3 plan will address them:
+
+- **No world-builder UI.** Everything is `curl` and `cypher-shell`. The
+  v2 dashboard is a separate repo. v3.
+
+- **No reflective memory or behavior layer.** The Stanford Generative
+  Agents pattern (memory stream + reflection + planning) is a v3
+  borrow per `lore-engine/docs/16-comparison.md`. v3.
+
+- **Consistency engine is rule-driven, not ML-driven.** The five
+  hand-crafted violations in v2 are seeded; an ML-derived detection
+  surface (e.g. an LLM pass over the world summary) is a v3 item. v3.
+
+- **No refresh / cache invalidation on world reseed.** If a world is
+  re-seeded, the embeddings for any new image manifest rows are computed
+  on the next `register_image` or `embed_images` call; old embeddings
+  are kept. A v3 refresh tool would let an operator force a full
+  re-embed. v3.
+
+---
+
+## [v1] — 2026-06-16 (baseline)
+
+The initial proof of concept. Five-minute goal: prove that with mock
+data, we can run a multi-database backend (Neo4j + Postgres + MinIO) and
+expose it all through a plugin-driven MCP gateway where adding a new
+domain type is a new file in `plugins/`, not a Go change.
+
+### Added
+
+- `docker-compose.yml` — Neo4j 5.26, Postgres (later upgraded to
+  pgvector in v2.T2), MinIO, and the gateway container.
+- `seed.py` — idempotent seeder for the default world (3 eras, 10 people,
+  3 factions, 4 locations, 4 items, 6 events, 1 lineage group, ~20
+  time-bounded relations, 3 trade log entries, 4 generated images).
+- `plugins/world.py` — `entity_context`, `was_true_at`, `state_at`
+  (Neo4j).
+- `plugins/lineage.py` — `ancestors_of`, `descendants_of`, `lineage_of`
+  (Neo4j).
+- `plugins/trade.py` — `log_trade`, `trades_by_buyer`, `market_price`
+  (Postgres).
+- `plugins/images.py` — `register_image`, `recall_images`,
+  `search_images_by_caption` (MinIO + Postgres + Neo4j).
+- `server.py` — the MCP-compatible JSON-RPC gateway, auto-loading every
+  `.py` file in `plugins/`.
+- `bash test.sh`: the end-to-end smoke runner (11 sections in v1; v2.T6
+  added the 12th).
+- `README.md` (v1) — the original POC writeup.
+
+### Known limitations (v1 → v2)
+
+- Stub consistency tools (no detection rules).
+- No semantic image search.
+- No LLM in the loop.
+- Single world, no namespace.
+
+All four items were addressed in v2.
--- a/README.md
+++ b/README.md
@@ -13,7 +13,7 @@ Five-minute goal: prove that with mock data, we can run a multi-database backend
 | `lore-minio` | `minio/minio:latest` | 9000 (S3), 9001 (console) | Image blob storage |
 | `lore-gateway` | built locally | 8765 (MCP JSON-RPC) | The plugin-driven gateway |

-## The four plugins (this is the proof)
+## The five plugins (this is the proof)

 ```
 plugins/
@@ -22,10 +22,22 @@ plugins/
 ├── trade.py       # log_trade, trades_by_buyer, market_price (Postgres)
 ├── images.py      # register_image, recall_images, search_images_by_caption
 │                  #                                          (MinIO + Postgres + Neo4j)
-└── embeddings.py  # embed_images, search_images_semantic    (Postgres + pgvector)
+├── embeddings.py  # embed_images, search_images_semantic    (Postgres + pgvector)
+└── consistency.py # find_contradictions, find_anachronisms, find_orphans,
+                   # find_ontology_violations                (Neo4j)
 ```

-Each plugin is a single file with a `register(registry)` entry point. The gateway auto-loads every `.py` file in `plugins/` at startup. **No server.py change needed to add a new tool** — drop a new file in, restart the container, the new tools appear in `tools/list`.
+The gateway also exposes one admin tool for the world namespace: `list_worlds`.
+
+Tool counts and plugin membership are reported live by the gateway itself —
+`curl -s http://localhost:8765/healthz` returns the canonical list. As of v2
+the healthz reports 19 tools across the 5 plugins above. See
+`docs/LLM_CONSUMER_DEMO.md` for an end-to-end driver that exercises them.
+
+Each plugin is a single file with a `register(registry)` entry point. The
+gateway auto-loads every `.py` file in `plugins/` at startup. **No server.py
+change needed to add a new tool** — drop a new file in, restart the
+container, the new tools appear in `tools/list`.
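A minimal sketch of what a plugin file with a `register(registry)` entry point might look like. Only the `register(registry)` name comes from the text above; the `Registry` class, its `add_tool` method, and the `echo_world` tool are hypothetical stand-ins, and the real gateway's registry API may differ.

```python
class Registry:
    """Hypothetical stand-in for the gateway's tool registry."""

    def __init__(self):
        self.tools = {}

    def add_tool(self, name, handler, description=""):
        # Map the public tool name to its handler and a short description.
        self.tools[name] = {"handler": handler, "description": description}


def echo_world(world_id="default"):
    """Toy tool: report which world a call would be scoped to."""
    return {"world_id": world_id}


def register(registry):
    """Entry point the gateway is described as calling for each plugin file."""
    registry.add_tool("echo_world", echo_world, "Return the requested world_id.")


# What an auto-loader would do after importing the plugin module.
registry = Registry()
register(registry)
print(sorted(registry.tools))  # ['echo_world']
```

The point of the pattern is that the loader only needs to know one function name per file, so new tools do not require edits to the loader itself.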

 ## How to run it

@@ -51,6 +63,11 @@ The `seed.py` script is idempotent (uses `MERGE` and `ON CONFLICT`). It loads:
 - ~20 time-bounded relations
 - 3 trade log entries
 - 4 generated images (portraits + landscape + battle scene) uploaded to MinIO
+- 5 hand-crafted consistency violations pre-materialized as `:Contradiction`,
+  `:Anachronism`, `:Orphan`, and `:OntologyViolation` nodes (see
+  `docs/CONSISTENCY_DEMO.md`)
+- 1 parallel world, `arda_greyscale` — a minimal mirror of the default
+  world with no overlapping node ids (see `docs/MULTI_WORLD_DEMO.md`)

 ## Try the gateway

@@ -166,30 +183,89 @@ curl -s -X POST http://localhost:8765/mcp \

 ## What this proves

-1. **The plugin boundary works.** A new domain type (trade, images) is a new file in `plugins/`. No change to `server.py`, no change to docker-compose, no new container. Restart the gateway and the new tools are live.
+1. **The plugin boundary works.** A new domain type (trade, images, embeddings,
+   consistency) is a new file in `plugins/`. No change to `server.py`, no change
+   to docker-compose, no new container. Restart the gateway and the new tools
+   are live. The `consistency` plugin (added in v2.T5) is the most recent
+   example — four violation-detection tools, all in one file.

-2. **Polyglot storage is real, not aspirational.** Neo4j holds the typed world graph. Postgres holds the time-series operational data and image manifests. MinIO holds the image bytes. Each store does what it's good at; the gateway composes the answers.
+2. **Polyglot storage is real, not aspirational.** Neo4j holds the typed world
+   graph. Postgres holds the time-series operational data, image manifests, and
+   the `image_embedding` vectors (pgvector). MinIO holds the image bytes. Each
+   store does what it's good at; the gateway composes the answers.

-3. **Time is a first-class query primitive.** `was_true_at` checks time-bounded edges with a single Cypher query — no LLM, no inference. Year-level precision works against the mock data (see `2nd_age.year_230` example above).
+3. **Time is a first-class query primitive.** `was_true_at` checks time-bounded
+   edges with a single Cypher query — no LLM, no inference. Year-level
+   precision works against the mock data (see `2nd_age.year_230` example above).

-4. **Image recall works.** Images are stored in MinIO, linked to entities in Neo4j (`(:Image)-[:DEPICTS]->(:Person)`), and discoverable by entity id, by tag, or by caption substring search. Presigned URLs are generated on the fly.
+4. **Image recall works.** Images are stored in MinIO, linked to entities in
+   Neo4j (`(:Image)-[:DEPICTS]->(:Person)`), and discoverable by entity id, by
+   tag, by caption substring search, or by natural-language description via the
+   `search_images_semantic` (pgvector) tool. Presigned URLs are generated on
+   the fly.

-5. **The world is small but real.** 10 people, 6 events, 4 images, ~20 relations — enough to demonstrate the architecture end-to-end. Scaling is a separate problem; this is the proof of shape.
+5. **The consistency engine is real.** The four `find_*` tools query
+   pre-materialized violation nodes in Neo4j and return structured
+   `{violations, count}` envelopes — not booleans, not error strings. The
+   `seed.py:seed_violations` step computes the violations from the same
+   heuristics (overlapping `MEMBER_OF` windows, `Person.born > event_year`,
+   orphan entities, and `:OntologyRule`-driven checks) so the math is visible
+   in plain Python — not hidden in Cypher. See `docs/CONSISTENCY_DEMO.md` for
+   the five hand-crafted violations the seed surfaces.
+
+6. **Multiple worlds live in one graph.** Every world-scoped node and edge
+   carries a `world_id` property, and the read tools accept a `world_id`
+   argument (defaulting to `"default"`). The v2.T6 seed loads a parallel
+   `arda_greyscale` world with no overlapping node ids, and
+   `list_worlds()` returns both. See `docs/MULTI_WORLD_DEMO.md` for the
+   worked example.
+
+7. **An LLM can drive the whole surface.** `examples/llm_consumer.py` is a
+   real driver that takes a natural-language question, calls the gateway's
+   `tools/list`, picks the right tool(s), and answers in prose — all wired
+   through the local LiteLLM proxy. 5 question types × 9 distinct tools
+   exercised, all answers hand-verified against the seed. See
+   `docs/LLM_CONSUMER_DEMO.md` and `examples/REPORT.md`.
+
+8. **The world is small but real.** 10 people + 9 greyscale-world people, 6
+   events + 4 greyscale-world events, 5 images (4 default + 1 greyscale), ~20
+   relations + 4 greyscale-world relations: enough to demonstrate the
+   architecture end-to-end across two parallel worlds. Scaling is a separate
+   problem; this is the proof of shape.
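The world namespace described in item 6 can be pictured with a small in-memory sketch. The node ids below (in particular `gs_aldric`) and both helper functions are hypothetical; the real tools read a `world_id` property from Neo4j rather than from a Python list.

```python
# Toy nodes tagged with a world_id, mirroring the property described above.
nodes = [
    {"id": "aldric", "label": "Person", "world_id": "default"},
    {"id": "house_vyr", "label": "Faction", "world_id": "default"},
    {"id": "gs_aldric", "label": "Person", "world_id": "arda_greyscale"},
]


def list_worlds(nodes):
    """Return the distinct world_id values present in the data."""
    return sorted({node["world_id"] for node in nodes})


def nodes_in_world(nodes, world_id="default"):
    """Return node ids scoped to one world; the default mirrors the read tools."""
    return [node["id"] for node in nodes if node["world_id"] == world_id]


print(list_worlds(nodes))                       # ['arda_greyscale', 'default']
print(nodes_in_world(nodes))                    # ['aldric', 'house_vyr']
print(nodes_in_world(nodes, "arda_greyscale"))  # ['gs_aldric']
```

Defaulting the argument to `"default"` is what lets a caller that never passes `world_id` keep seeing only the original world.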

 ## What's not in this POC

- **No LLM in the loop.** The MCP gateway is a tool server; the LLM client (Claude, GPT, anything) is the consumer. This is intentional — the POC validates the data and tool layers, not the LLM reasoning. The reasoning harness is in the design docs (`lore-engine/docs/07-reasoning-harness.md`) and would be added as a system prompt in a real deployment.
+- **No LLM in the loop at runtime — the LLM consumer is a separate
+  example.** The MCP gateway itself is a tool server; the LLM client
+  (Claude, GPT, anything reachable via the LiteLLM proxy) is the consumer.
+  This is intentional — the POC validates the data and tool layers, not the
+  LLM reasoning. The reasoning harness is in the design docs
+  (`lore-engine/docs/07-reasoning-harness.md`); `examples/llm_consumer.py`
+  implements v1.1 of that harness against the live gateway.

- **Consistency detection is real (v2.T5).** The 4 tools (`find_contradictions`, `find_anachronisms`, `find_orphans`, `find_ontology_violations`) query pre-materialized violation nodes in Neo4j. The seed (`seed.py:seed_violations`) computes the violations from the same heuristics (overlapping `MEMBER_OF` windows, `Person.born > event_year`, world entities with no relations, and `:OntologyRule`-driven checks) so the math is visible in plain Python — not hidden in Cypher.
+- **No world-builder UI.** Everything is `curl` and `cypher-shell`. The UI
+  is a v3 feature.

- **No world-builder UI.** Everything is `curl` and `cypher-shell`. The UI is a v2 feature.
+- **No reflective memory or behavior layer.** The Stanford Generative Agents
+  pattern (memory stream + reflection + planning) is a v3 borrow per the
+  comparison in `lore-engine/docs/16-comparison.md`.

- **No reflective memory or behavior layer.** The Stanford Generative Agents pattern (memory stream + reflection + planning) is a v2 borrow per the comparison in `lore-engine/docs/16-comparison.md`.
+## Shipped in v2

-## Next steps after this POC
+What was on the v1 "next steps" list, and what it became in v2:

- ~~Implement the consistency detection rules behind the 4 stub tools (T5).~~ **Done.**
- Add the embedding-based semantic search plugin (uses the `Image.caption` and any future `Person.summary` text).
- Add an LLM client that consumes the gateway with the reasoning harness system prompt and runs the 5 question types from the design.
-
-The v1 design in `lore-engine/docs/` is the contract. This POC is the proof of shape.
+- ~~Implement the consistency detection rules behind the 4 stub tools
+  (T5).~~ **Done** — see `plugins/consistency.py` and
+  `docs/CONSISTENCY_DEMO.md`. 4 tools, 5 violations surfaced from the seed.
+- ~~Add the embedding-based semantic search plugin (uses the `Image.caption`
+  and any future `Person.summary` text).~~ **Done** — see `plugins/embeddings.py`
+  and `docs/LLM_CONSUMER_DEMO.md`. 384-dim MiniLM, pgvector cosine distance,
+  background embedding on `register_image`.
+- ~~Add an LLM client that consumes the gateway with the reasoning harness
+  system prompt and runs the 5 question types from the design.~~ **Done** —
+  see `examples/llm_consumer.py` and `examples/REPORT.md`. 5 questions, 9
+  distinct tools, all hand-verified against seed ground truth.
+- **v2 extras** not on the v1 list: the multi-world namespace with the
+  `arda_greyscale` parallel seed (T6); the `:OntologyViolation` rule-driven
+  detection in addition to the original three classes (T5); and a fresh-clone
+  smoke test (`scripts/ci-smoke.sh`) that exercises the gateway end-to-end
+  from a clean state (T1).
--- a/docs/CONSISTENCY_DEMO.md
+++ b/docs/CONSISTENCY_DEMO.md
@@ -0,0 +1,210 @@
+# Consistency Engine — Worked Example
+
+This is a live end-to-end run of the four consistency tools that landed in v2.T5.
+Everything below is real tool output from `bash examples/test_consistency.sh`
+against the current gateway at `localhost:8765`, taken from the v2 build
+(`8261c2d` on `wt/t5-consistency-impl`).
+
+## What the engine does
+
+The consistency engine has four read-only tools, each backed by pre-materialized
+violation nodes in Neo4j. The seed (`seed.py:seed_violations`) computes the
+violations from the same heuristics the tools re-run defensively, so every
+violation id is stable, the math is visible in plain Python, and an operator
+can re-derive any flagged issue by hand from the seed.
+
+| Tool | Neo4j label | Live count (this run) |
+|---|---|---|
+| `find_contradictions` | `:Contradiction` | 1 |
+| `find_anachronisms` | `:Anachronism` | 1 |
+| `find_orphans` | `:Orphan` | 1 |
+| `find_ontology_violations` | `:OntologyViolation` | 2 |
+| **Total** | | **5** |
+
+All four tools support an optional `severity` argument (`"any"`, `"error"`,
+`"warn"`), and the world-scoped read tools accept `world_id="default"`.
+The default world contains the violations; the `arda_greyscale` world is
+clean (its seed doesn't inject any hand-crafted ones).
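For callers that prefer Python to `curl`, the request body can be assembled as in the sketch below. The helper name `build_tool_call` is an assumption; the field names follow the JSON-RPC `tools/call` shape used throughout this document, and the `severity` and `world_id` arguments are the ones described above. The sketch only builds and prints the body; it does not contact the gateway.

```python
import json


def build_tool_call(name, arguments=None, request_id=1):
    """Build a JSON-RPC 2.0 'tools/call' request body as a dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# Ask only for error-severity contradictions in the default world.
body = json.dumps(
    build_tool_call(
        "find_contradictions",
        {"world_id": "default", "severity": "error"},
    )
)
print(body)
```

The resulting string can be sent as the POST body to `http://localhost:8765/mcp` with a `Content-Type: application/json` header.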
+
+## 1. Contradictions — overlapping faction memberships
+
+A `:Contradiction` is a pair of `MEMBER_OF` relations on the same person
+whose `[valid_from, valid_until]` windows overlap but whose target factions
+differ. It's the classic "sworn to two houses at once" case.
+
+```bash
+curl -s -X POST http://localhost:8765/mcp \
+  -H "Content-Type: application/json" \
+  -d '{
+    "jsonrpc":"2.0","id":1,"method":"tools/call",
+    "params":{"name":"find_contradictions","arguments":{"world_id":"default"}}
+  }'
+```
+
+```json
+{
+  "violations": [
+    {
+      "id": "c_aldric_double_membership",
+      "label": "Contradiction",
+      "severity": "error",
+      "status": "open",
+      "details": "Aldric Raventhorne is MEMBER_OF House Vyr (240-) and MEMBER_OF Crimson Pact (260-285); the two memberships overlap.",
+      "detected_at": "2026-06-16T23:04:51.238226Z"
+    }
+  ],
+  "count": 1
+}
+```
+
+The math: Aldric's `MEMBER_OF` House Vyr opens at year 240 with no end date.
+His `MEMBER_OF` Crimson Pact runs 260–285. The two windows overlap from 260
+to 285. He can't be a sworn member of both houses at once.
+
+The seed source is `seed.py:c_aldric_double_membership`; see the
+`Aldric Raventhorne` relations block in `seed_world_default` for the
+underlying `MEMBER_OF` rows.
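The overlap arithmetic above can be re-derived by hand with a few lines of Python. This is a sketch under stated assumptions, not the plugin's actual code: the function name is hypothetical, windows are treated as closed year ranges, and `None` stands for a missing end date.

```python
def overlap(a_from, a_until, b_from, b_until):
    """Return the overlapping (start, end) of two year windows, or None.

    An end of None means the window is still open.
    """
    start = max(a_from, b_from)
    ends = [end for end in (a_until, b_until) if end is not None]
    end = min(ends) if ends else None
    if end is not None and start > end:
        return None  # the windows are disjoint
    return (start, end)


# House Vyr: 240 to open-ended. Crimson Pact: 260 to 285.
print(overlap(240, None, 260, 285))  # (260, 285)
```

A contradiction is then any pair of memberships for the same person with a non-`None` overlap and different target factions.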
+
+## 2. Anachronisms — a person at an event before they were born
+
+An `:Anachronism` is a `:PARTICIPATED_IN` (or similar) relation between a
+person and an event where `event.in_fiction_time` is before `person.born`.
+
+```bash
+curl -s -X POST http://localhost:8765/mcp \
+  -H "Content-Type: application/json" \
+  -d '{
+    "jsonrpc":"2.0","id":1,"method":"tools/call",
+    "params":{"name":"find_anachronisms","arguments":{"world_id":"default"}}
+  }'
+```
+
+```json
+{
+  "violations": [
+    {
+      "id": "a_vex_at_founding",
+      "label": "Anachronism",
+      "severity": "error",
+      "status": "open",
+      "details": "Vex the Silent (born 180) is recorded as participating in the Founding of House Vyr (year 85) — 95 years before his birth.",
+      "detected_at": "2026-06-16T23:04:51.238226Z"
+    }
+  ],
+  "count": 1
+}
+```
+
+Vex the Silent, born in 180, is tagged as a participant in the
+"Founding of House Vyr" event in year 85. The Cypher check joins the
+`PARTICIPATED_IN` edge to the person's `born` property and the event's
+`in_fiction_time`, extracted as an integer year.
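The same comparison can be sketched in Python. Both helpers are hypothetical, and the slug pattern (`<era>.year_<n>`) is an assumption based on the `2nd_age.year_230` form used elsewhere in these docs; the actual check runs in Cypher.

```python
import re


def year_of(slug):
    """Pull the trailing integer year out of a slug such as '2nd_age.year_230'."""
    match = re.search(r"year_(\d+)$", slug)
    if match is None:
        raise ValueError(f"no year found in {slug!r}")
    return int(match.group(1))


def anachronism_gap(born, event_year):
    """Years by which the event precedes the birth, or None if there is no conflict."""
    return born - event_year if event_year < born else None


print(year_of("2nd_age.year_230"))  # 230
print(anachronism_gap(180, 85))     # 95
print(anachronism_gap(180, 200))    # None
```

The 95-year gap matches the `details` string in the tool output above.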
+
+## 3. Orphans — entities with no relations
+
+An `:Orphan` is a `Person`/`Item`/`Location`/`Event` node that exists in the
+world but has zero outgoing or incoming relations of any kind. These are
+typically world-builder placeholders that haven't been wired into the story
+yet.
+
+```bash
+curl -s -X POST http://localhost:8765/mcp \
+  -H "Content-Type: application/json" \
+  -d '{
+    "jsonrpc":"2.0","id":1,"method":"tools/call",
+    "params":{"name":"find_orphans","arguments":{"world_id":"default"}}
+  }'
+```
+
+```json
+{
+  "violations": [
+    {
+      "id": "o_unfinished_npc",
+      "label": "Orphan",
+      "severity": "warn",
+      "status": "open",
+      "details": "Person 'Lyssa the Watcher' exists but has no relations — world-builder placeholder, not yet connected.",
+      "detected_at": "2026-06-16T23:04:51.238226Z"
+    }
+  ],
+  "count": 1
+}
+```
+
+`Lyssa the Watcher` is a real Person node in the seed (see
+`seed.py:Lyssa the Watcher`) with no `PARENT_OF`, `MEMBER_OF`, `SPOUSE_OF`,
+or any other relation. Note the severity: `warn`, not `error` — an
+unfinished NPC is a real artifact of worldbuilding, not a story-level
+inconsistency.
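The orphan definition (no incoming or outgoing relations) reduces to a set difference. The sketch below uses a toy node list and edge list; the node id `lyssa` and the function name are assumptions for illustration, and the real detection works against Neo4j.

```python
def find_orphan_ids(node_ids, edges):
    """Return ids that appear in no edge, as either source or target."""
    connected = set()
    for source, _relation, target in edges:
        connected.add(source)
        connected.add(target)
    return sorted(node for node in node_ids if node not in connected)


nodes = ["aldric", "elara", "lyssa"]
edges = [("aldric", "SPOUSE_OF", "elara")]
print(find_orphan_ids(nodes, edges))  # ['lyssa']
```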
+
+## 4. Ontology violations — rule-driven checks
+
+An `:OntologyViolation` is a `(:Person)` node that fails an active
+`:OntologyRule`. Rules are themselves Neo4j nodes (`(:OntologyRule)`) with
+a `predicate` (a short Python expression) and a `description`. The
+consistency plugin runs each rule over the world and materializes a
+violation node for every person that fails it.
+
+```bash
+curl -s -X POST http://localhost:8765/mcp \
+  -H "Content-Type: application/json" \
+  -d '{
+    "jsonrpc":"2.0","id":1,"method":"tools/call",
+    "params":{"name":"find_ontology_violations","arguments":{"world_id":"default"}}
+  }'
+```
+
+```json
+{
+  "violations": [
+    {
+      "id": "ov_maric_no_died",
+      "label": "OntologyViolation",
+      "severity": "warn",
+      "status": "open",
+      "details": "Person 'Maric Vyr' (born 85) has no death year; rule 'persons_born_before_280_must_die' applies.",
+      "detected_at": "2026-06-16T23:04:51.238226Z",
+      "entity_id": "maric",
+      "rule_id": "persons_born_before_280_must_die"
+    },
+    {
+      "id": "ov_theron_no_died",
+      "label": "OntologyViolation",
+      "severity": "warn",
+      "status": "open",
+      "details": "Person 'Theron Ashveil' (born 10) has no death year; rule 'persons_born_before_280_must_die' applies.",
+      "detected_at": "2026-06-16T23:04:51.238226Z",
+      "entity_id": "theron",
+      "rule_id": "persons_born_before_280_must_die"
+    }
+  ],
+  "count": 2
+}
+```
+
+The rule `persons_born_before_280_must_die` is a world-builder convention:
+in the default world's narrative, anyone born before the Age of Iron
+(before year 280) must have a recorded death year, because the present
+day is well past 280 and a living person from the 1st Age is
+unprecedented. Maric (born 85) and Theron (born 10) intentionally have
+no death year in the seed: they belong to long-lived lineages and are still
+alive in the present. The two violations are *expected* by the world-builder
+but flagged so the LLM (or operator) knows the rule is being broken.
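A minimal sketch of how a rule like `persons_born_before_280_must_die` could be evaluated over a list of people. It uses an ordinary Python function in place of the stored `predicate` expression, and the `aldric` row is included only as a passing example; the dictionary shapes are assumptions, not the plugin's real data model.

```python
RULE_ID = "persons_born_before_280_must_die"

people = [
    {"id": "maric", "born": 85, "died": None},
    {"id": "theron", "born": 10, "died": None},
    {"id": "aldric", "born": 220, "died": 285},
]


def passes_rule(person, cutoff=280):
    """Pass if born at or after the cutoff, or if a death year is recorded."""
    return person["born"] >= cutoff or person["died"] is not None


violations = [
    {
        "id": f"ov_{person['id']}_no_died",
        "entity_id": person["id"],
        "rule_id": RULE_ID,
    }
    for person in people
    if not passes_rule(person)
]

print({"violations": violations, "count": len(violations)})
```

Run against these three rows, the sketch flags `maric` and `theron` and lets `aldric` through, which lines up with the two violation ids in the output above.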
+
+## How the seed stays the source of truth for the violation math
+
+`seed.py:seed_violations` is the Python source of truth for what the tools
+return. Five pre-materialized violation nodes (one contradiction, one
+anachronism, one orphan, two ontology violations) get `MERGE`'d into the
+default world, and the tool Cypher queries read them back. If a tool
+query and the seed drift apart, the
+detection surface in `seed.py` is the one to trust; the queries are a
+defensive layer so a missing seed row doesn't silently hide a violation.
+
+## Files
+
+- `plugins/consistency.py` — the four tools
+- `seed.py:seed_violations` — the 5 hand-crafted violations
+- `tests/test_consistency.py` — 10 pytest cases
+- `examples/test_consistency.sh` — the live E2E runner that produced
+  every block of output above
--- a/docs/LLM_CONSUMER_DEMO.md
+++ b/docs/LLM_CONSUMER_DEMO.md
@@ -0,0 +1,223 @@
+# LLM Consumer — Worked Example
+
+This is a live walkthrough of the LLM consumer that landed in v2.T4.
+Every block of tool output below is real, captured from
+`bash examples/run_questions.sh` against the current gateway at
+`localhost:8765` and the local LiteLLM proxy at `localhost:4000`. The
+driver is `examples/llm_consumer.py`; the orchestrator is
+`examples/run_questions.sh`; the system prompt template is
+`examples/system_prompt.txt`; the per-question traces are saved under
+`examples/results/*.json`. The full E2E report is in
+`examples/REPORT.md`.
+
+## What the consumer proves
+
+The MCP gateway is a tool server. The LLM is the consumer. v2 ships a
+real driver that:
+
+1. Calls `tools/list` on the gateway.
+2. Takes a natural-language question.
+3. Asks an LLM (via LiteLLM) to pick the right tool(s) and
+   form the arguments.
+4. Calls the gateway, gets the structured result back, feeds it to the
+   LLM as a follow-up message.
+5. Returns a prose answer.
+
+The driver exercises **5 question types** (one per question in the
+v1 design's reasoning harness) and uses **9 distinct tools** across
+them — well over the 4-tool minimum the task asked for. Every answer
+is hand-verified against seed ground truth.
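The five steps above can be sketched as a single function with its dependencies injected. This is not the code in `examples/llm_consumer.py`; every name below is hypothetical, and the `fake_*` functions stand in for the gateway and the LLM so the sketch runs without either.

```python
def answer_question(question, list_tools, choose_calls, call_tool, write_answer):
    """Run one question through the discover, choose, call, answer loop."""
    tools = list_tools()                   # step 1: discover tools
    calls = choose_calls(question, tools)  # steps 2-3: model picks tools and args
    results = [
        {"tool": call["tool"], "result": call_tool(call["tool"], call["args"])}
        for call in calls
    ]                                      # step 4: call the gateway
    return write_answer(question, results) # step 5: prose answer


# Stand-ins so the sketch runs offline.
def fake_list_tools():
    return ["entity_context", "was_true_at"]


def fake_choose_calls(question, tools):
    # A real driver would ask the model; this stub always picks the first tool.
    return [{"tool": tools[0], "args": {"name": "Aldric Raventhorne"}}]


def fake_call_tool(name, args):
    return {"tool": name, "echo": args}


def fake_write_answer(question, results):
    return f"{len(results)} tool result(s) for: {question}"


print(answer_question(
    "Who is Aldric Raventhorne?",
    fake_list_tools, fake_choose_calls, fake_call_tool, fake_write_answer,
))
```

Injecting the four callables keeps the loop testable: the same function can be driven by real HTTP and LLM clients or by stubs like these.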
+
+## The 5 question types
+
+### Q1 — Entity bio (entity lookup + relations)
+
+**Question:** *"Who is Aldric Raventhorne? Give a brief bio and his known relations."*
+
+**Tools the LLM chose:**
+
+```json
+[
+  { "tool": "entity_context", "args": { "name": "Aldric Raventhorne" } },
+  { "tool": "lineage_of",     "args": { "person": "Aldric Raventhorne" } }
+]
+```
+
+**Answer (excerpt):**
+
+> **Aldric Raventhorne** (born 220 – died 285) was a noble of Valdorni
+> culture. He was a member of **House Vyr** and belonged to the
+> **House Vyr bloodline**. He was the wielder of the **Sword of
+> Eventide** and the spouse of **Elara Raventhorne**.
+>
+> Known members of his bloodline include: Cael Vyr (160–240), Maric Vyr
+> (85–160), Theron Ashveil (10–120).
+
+**Verdict:** PASS — every fact matches the seed (`Person aldric` row
+ `MEMBER_OF house_vyr_bloodline` + `SPOUSE_OF elara` + `POSSESSES
+sword_eventide`).
+
+### Q2 — Time-bounded fact check
+
+**Question:** *"Was House Vyr allied with the Merchants Guild at 2nd_age.year_230?"*
+
+**Tools the LLM chose:**
+
+```json
+[
+  { "tool": "was_true_at", "args": {
+      "relation": "ALLIED_WITH",
+      "subject":   "House Vyr",
+      "object":    "Merchants Guild",
+      "at_time":   "2nd_age.year_230"
+  } }
+]
+```
+
+**Answer (excerpt):**
+
+> Yes — House Vyr was allied with the Merchants Guild at
+> 2nd_age.year_230. The ALLIED_WITH relationship was established in
+> 2nd_age.year_100 and has no recorded end date, so it was in effect
+> at that time.
+
+**Verdict:** PASS — the model formed the time slug `2nd_age.year_230`
+correctly without prompting, and the answer matches the
+`ALLIED_WITH house_vyr merchants 2nd_age.year_100 None` relation in
+the seed.
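The time-bounded check behind this answer can be sketched in a few lines. The function below is a hypothetical Python rendering, not the Cypher the tool runs; it assumes integer years, an inclusive end bound, and `None` for a relation with no recorded end date.

```python
def was_true_at(valid_from, valid_until, at_year):
    """True if at_year falls inside the window; None means no recorded end."""
    if at_year < valid_from:
        return False
    return valid_until is None or at_year <= valid_until


# ALLIED_WITH starts in year 100 with no end date; the question asks about 230.
print(was_true_at(100, None, 230))  # True
print(was_true_at(100, None, 90))   # False
print(was_true_at(260, 285, 290))   # False
```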
+
+### Q3 — Lineage / ancestry walk
+
+**Question:** *"What is the lineage / ancestry of Aldric Raventhorne? Walk back as far as you can."*
+
+**Tools the LLM chose:**
+
+```json
+[
+  { "tool": "ancestors_of",   "args": { "person": "Aldric Raventhorne", "generations": 10 } },
+  { "tool": "entity_context", "args": { "name": "Aldric Raventhorne" } },
+  { "tool": "lineage_of",     "args": { "person": "Aldric Raventhorne" } }
+]
+```
+
+**Answer (excerpt):**
+
+> **Aldric Raventhorne** (b. 220, d. 285) — member of **House Vyr
+> (bloodline)**. Walking back through his known ancestors:
+>
+> - **Mother:** **Yssa Raventhorne** (b. 165, d. unknown)
+> - **Maternal grandfather:** **Cael Vyr** (b. 160, d. 240) — House Vyr
+> - **Cael's father:** **Maric Vyr** (b. 85, d. 160) — House Vyr
+> - **Maric's father:** **Theron Ashveil** (b. 10, d. 120) — the
+>   furthest ancestor the graph records.
+
+**Verdict:** PASS — all 4 canonical ancestor names appear in the
+answer, in the right order, with the right birth/death years.
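An ancestry walk with a generation cap can be sketched as below. The parent chain is taken from the answer excerpt above; the single-parent dictionary and the function body are simplifying assumptions (a real graph can record more than one parent per person), so this is not the `ancestors_of` implementation.

```python
# Child -> recorded parent, as listed in the answer excerpt.
parent_of = {
    "Aldric Raventhorne": "Yssa Raventhorne",
    "Yssa Raventhorne": "Cael Vyr",
    "Cael Vyr": "Maric Vyr",
    "Maric Vyr": "Theron Ashveil",
}


def ancestors_of(person, generations=10):
    """Walk up the parent chain, nearest ancestor first, up to a cap."""
    chain = []
    current = person
    for _ in range(generations):
        current = parent_of.get(current)
        if current is None:
            break  # no earlier ancestor is recorded
        chain.append(current)
    return chain


print(ancestors_of("Aldric Raventhorne"))
print(ancestors_of("Aldric Raventhorne", generations=2))
```

The cap of 10 mirrors the `generations` argument the model chose; the walk stops early once the chain runs out.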
+
+### Q4 — Image recall
+
+**Question:** *"Show me images of Aldric Raventhorne — portraits or otherwise."*
+
+**Tools the LLM chose:**
+
+```json
+[
+  { "tool": "entity_context", "args": { "name": "Aldric Raventhorne" } },
+  { "tool": "recall_images",  "args": { "entity_id": "aldric" } }
+]
+```
+
+**Answer (excerpt):**
+
+> Found one image for Aldric Raventhorne:
+>
+> **Portrait of Aldric Raventhorne, Lord of Thornwall**
+> (`img_aldric_portrait`)
+> - Era: 2nd Age
+> - Caption: "Portrait of Aldric Raventhorne, Lord of Thornwall.
+>   Middle-aged, dark hair, a scar above the left eye."
+> - Tags: portrait, noble, thornwall
+> - URL: `http://localhost:9000/lore-images/characters/aldric_portrait.png?...presigned...`
+
+**Verdict:** PASS — the caption is quoted verbatim from the seed
+manifest, the MinIO presigned URL is surfaced, and the model's
+"found one image" claim matches the actual count.
+
+### Q5 — Consistency scan (multi-tool fan-out)
+
+**Question:** *"What are the open consistency issues in the world graph right now? Check contradictions, anachronisms, orphans, and ontology violations."*
+
+**Tools the LLM chose:**
+
+```json
+[
+  { "tool": "find_contradictions",       "args": {} },
+  { "tool": "find_anachronisms",         "args": {} },
+  { "tool": "find_orphans",              "args": {} },
+  { "tool": "find_ontology_violations",  "args": {} }
+]
+```
+
+**Answer (excerpt) — from the v2.T4 snapshot in time:**
+
+> No open consistency issues detected. All four consistency checks
+> returned zero violations.
+
+**Verdict:** PASS *for the snapshot in time the run was taken* — the
+model did not fabricate violations; it faithfully reported whatever
+the tools returned. The v2.T4 trace was captured before v2.T5 landed;
+the live T5 build surfaces 5 hand-crafted violations in the default
+world (1 contradiction, 1 anachronism, 1 orphan, 2 ontology), as
+documented in `docs/CONSISTENCY_DEMO.md`. To re-validate Q5 against
+the current T5 build, run `bash examples/test_consistency.sh`. That
+script supersedes the stub-era Q5 trace and is the authoritative Q5
+evidence going forward.
+
+## Tool coverage
+
+| Tool | Question(s) | Question count |
+|---|---|---|
+| `entity_context` | Q1, Q3, Q4 | 3 |
+| `lineage_of` | Q1, Q3 | 2 |
+| `was_true_at` | Q2 | 1 |
+| `ancestors_of` | Q3 | 1 |
+| `recall_images` | Q4 | 1 |
+| `find_contradictions` | Q5 | 1 |
+| `find_anachronisms` | Q5 | 1 |
+| `find_orphans` | Q5 | 1 |
+| `find_ontology_violations` | Q5 | 1 |
+
+**9 distinct tools** across **5 questions**. The model discovered
+them all from `tools/list`, with no scripted routing. The other 10 tools
+(`state_at`, `descendants_of`, `log_trade`, `trades_by_buyer`,
+`market_price`, `register_image`, `search_images_by_caption`,
+`search_images_semantic`, `embed_images`, `list_worlds`) were
+exercised separately by `bash test.sh` but the LLM correctly chose
+not to invoke them for any of the 5 question types.
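
The coverage table above could be recomputed from the raw traces. The following is a minimal sketch, assuming each trace in `examples/results/` is a JSON object with a `tools_called` list whose entries carry a `tool` key (the shape `examples/test_e2e.sh` reads). The helper name `tool_coverage` and the inline sample traces are illustrative and not part of the repo.

```python
from collections import defaultdict


def tool_coverage(traces):
    """Map each tool name to the sorted list of question ids that called it."""
    coverage = defaultdict(set)
    for question_id, trace in traces.items():
        for call in trace.get("tools_called", []):
            coverage[call["tool"]].add(question_id)
    return {tool: sorted(questions) for tool, questions in coverage.items()}


# Inline stand-ins for two traces, mirroring the Q4 and Q5 tool lists above.
sample_traces = {
    "q4": {"tools_called": [{"tool": "entity_context"}, {"tool": "recall_images"}]},
    "q5": {"tools_called": [
        {"tool": "find_contradictions"},
        {"tool": "find_anachronisms"},
        {"tool": "find_orphans"},
        {"tool": "find_ontology_violations"},
    ]},
}

coverage = tool_coverage(sample_traces)
for tool, questions in sorted(coverage.items()):
    print(f"{tool}: {', '.join(questions)} ({len(questions)})")
```

To run it against real output, the sample dict would be replaced by traces loaded with `json.load` from each file in `examples/results/`.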
+
+## How to re-run
+
+```bash
+# 1. gateway + DBs must be up
+cd /root/lore-engine-poc
+docker compose up -d --build
+python3 seed.py
+
+# 2. LiteLLM proxy must be running on :4000 with the configured model
+
+# 3. drive the 5 questions
+bash examples/run_questions.sh
+
+# raw traces in examples/results/
+ls examples/results/
+```
+
+## Files
+
+- `examples/llm_consumer.py` — the driver (httpx + LiteLLM + tool loop)
+- `examples/system_prompt.txt` — the system prompt the LLM sees
+- `examples/run_questions.sh` — the orchestrator
+- `examples/REPORT.md` — the full E2E report (verdicts, ground truth,
+  per-question traces)
+- `examples/test_consistency.sh` — the v2.T5 consistency-only smoke
+  runner (replacement for the Q5 stub trace)
--- a/docs/MULTI_WORLD_DEMO.md
+++ b/docs/MULTI_WORLD_DEMO.md
@@ -0,0 +1,219 @@
+# Multi-World Namespace — Worked Example
+
+This is a live walkthrough of the world namespace that landed in v2.T6.
+Every call below is real tool output against the gateway at `localhost:8765`
+from the v2 build (`4f92289` on `wt/t6-multi-world`).
+
+## What the namespace is
+
+The v1 POC stored every node and edge in a single graph. v2 adds a
+`world_id` property on every world-scoped node and edge, and a new
+`list_worlds()` admin tool. The read tools (`entity_context`,
+`was_true_at`, `state_at`, `ancestors_of`, `descendants_of`,
+`lineage_of`, `recall_images`, `search_images_by_caption`,
+`search_images_semantic`, `trades_by_buyer`, `market_price`, the
+consistency `find_*` tools) all accept an optional `world_id` argument
+that defaults to `"default"`. Write tools (`log_trade`, `register_image`,
+`embed_images`) tag the row with the caller's `world_id`.
+
+This lets a single Neo4j instance hold multiple parallel worlds with no
+node-id collisions. The default seed loads a second world, `arda_greyscale`,
+that mirrors the default world's shape with its own people, factions,
+locations, events, and relations.
+
+## 1. `list_worlds()` — what's loaded
+
+```bash
+curl -s -X POST http://localhost:8765/mcp \
+  -H "Content-Type: application/json" \
+  -d '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"list_worlds","arguments":{}}}'
+```
+
+```json
+[
+  { "world_id": "arda_greyscale" },
+  { "world_id": "default" }
+]
+```
+
+Both worlds are alive in the same graph. Note the default ordering is
+newest-first by seed time.
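
The `curl` call above is a plain JSON-RPC 2.0 `tools/call` request. As a rough Python sketch of the same round trip without touching the network: `build_tool_call` and `parse_tool_result` are hypothetical helper names, and the reply envelope (tool output as a JSON string under `result.content[0].text`) is assumed from how `examples/test_e2e.sh` unpacks it. The `"type": "text"` field is likewise an assumption.

```python
import json


def build_tool_call(name, arguments, request_id=1):
    """Build the JSON-RPC 2.0 body for a tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def parse_tool_result(envelope):
    """Decode the tool's JSON payload from result.content[0].text."""
    return json.loads(envelope["result"]["content"][0]["text"])


request_body = build_tool_call("list_worlds", {})

# Hand-written stand-in for the gateway's reply (assumed envelope shape).
fake_envelope = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": json.dumps(
        [{"world_id": "arda_greyscale"}, {"world_id": "default"}]
    )}]},
}

worlds = [entry["world_id"] for entry in parse_tool_result(fake_envelope)]
print(json.dumps(request_body))
print(worlds)
```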
+
+## 2. The default world — Theron's bloodline
+
+The default world is the v1 set: Theron Ashveil, Maric Vyr, Cael Vyr,
+Yssa Raventhorne, Aldric Raventhorne, and Elara Raventhorne, plus the
+factions House Vyr, Crimson Pact, and Merchants Guild, and the
+founding, Black Spire, and founding-of-the-Merchants-Guild events.
+
+```bash
+curl -s -X POST http://localhost:8765/mcp \
+  -H "Content-Type: application/json" \
+  -d '{
+    "jsonrpc":"2.0","id":1,"method":"tools/call",
+    "params":{"name":"entity_context","arguments":{"name":"Theron Ashveil","world_id":"default"}}
+  }'
+```
+
+```json
+{
+  "found": true,
+  "name": "Theron Ashveil",
+  "id": "theron",
+  "world_id": "default",
+  "labels": ["Person"],
+  "properties": {
+    "world_id": "default",
+    "tier": "noble",
+    "culture": "Valdorni",
+    "born": 10,
+    "name": "Theron Ashveil",
+    "id": "theron"
+  },
+  "relations": [
+    { "rel": "PARENT_OF",    "to_id": "maric",                "to": "Maric Vyr" },
+    { "rel": "MEMBER_OF",    "to_id": "house_vyr_bloodline",  "to": "House Vyr (bloodline)" }
+  ]
+}
+```
+
+`Theron Ashveil` is the founding ancestor of the House Vyr bloodline.
+He exists in the `default` world and is the earliest known ancestor of
+Aldric (see `docs/LLM_CONSUMER_DEMO.md` Q3 for the full chain).
+
+## 3. The greyscale world — Mael & Sira Greyscale
+
+`arda_greyscale` is a parallel world seeded by
+`seed.py:seed_greyscale_world` with its own era (`greyscale_age`), its
+own faction (The Ashen Court), and its own people. The greyscale seed
+intentionally uses different node ids — `mael_greyscale`, `sira_greyscale`
+— so a query in one world cannot accidentally return the other.
+
+```bash
+curl -s -X POST http://localhost:8765/mcp \
+  -H "Content-Type: application/json" \
+  -d '{
+    "jsonrpc":"2.0","id":1,"method":"tools/call",
+    "params":{"name":"entity_context","arguments":{"name":"Mael Greyscale","world_id":"arda_greyscale"}}
+  }'
+```
+
+```json
+{
+  "found": true,
+  "name": "Mael Greyscale",
+  "id": "mael_greyscale",
+  "world_id": "arda_greyscale",
+  "labels": ["Person"],
+  "properties": {
+    "world_id": "arda_greyscale",
+    "tier": "noble",
+    "culture": "Greyscale",
+    "born": 220,
+    "name": "Mael Greyscale",
+    "id": "mael_greyscale"
+  },
+  "relations": [
+    { "rel": "MEMBER_OF",  "to_id": "ashen_court",      "to": "The Ashen Court" },
+    { "rel": "SPOUSE_OF",  "to_id": "sira_greyscale",   "to": "Sira Greyscale" }
+  ]
+}
+```
+
+Mael is the greyscale world's analogue of Aldric: a noble, a member of
+the Ashen Court, spouse of a Greyscale twin. Note `culture: "Greyscale"`
+and `tier: "noble"`: the property names match the default world's, but
+the values are scoped to `arda_greyscale` and carry their own meaning.
+
+## 4. Cross-world isolation — the namespace holds
+
+A query in world X for an entity that exists only in world Y must come
+back empty. This is the test the namespace was built to pass.
+
+### Aldric is default-only — greyscale returns empty
+
+```bash
+curl -s -X POST http://localhost:8765/mcp \
+  -H "Content-Type: application/json" \
+  -d '{
+    "jsonrpc":"2.0","id":1,"method":"tools/call",
+    "params":{"name":"recall_images","arguments":{"entity_id":"aldric","world_id":"arda_greyscale"}}
+  }'
+```
+
+```json
+{
+  "entity_id": "aldric",
+  "world_id": "arda_greyscale",
+  "count": 0,
+  "images": []
+}
+```
+
+Aldric's images are `image_manifest` rows tagged with the default
+world, not the greyscale one. With `world_id="arda_greyscale"`, the image
+recall query finds zero rows, which is exactly what the namespace promises.
+
+### Trade log — default scope doesn't see greyscale entries (and vice versa)
+
+```bash
+curl -s -X POST http://localhost:8765/mcp \
+  -H "Content-Type: application/json" \
+  -d '{
+    "jsonrpc":"2.0","id":1,"method":"tools/call",
+    "params":{"name":"market_price","arguments":{"item_id":"pale_ledger","world_id":"default"}}
+  }'
+```
+
+```json
+{
+  "item_id": "pale_ledger",
+  "sample_size": 2,
+  "avg_unit_price": 500.0,
+  "min_unit_price": 500.0,
+  "max_unit_price": 500.0,
+  "most_recent": "2026-06-16T23:04:51.276172+00:00"
+}
+```
+
+The same `market_price` call against `arda_greyscale` returns zero
+trades for `pale_ledger` (the greyscale world has its own item
+namespace, not the default `pale_ledger`). The trades table's PK
+includes `world_id` so a row inserted by `log_trade` with
+`world_id="arda_greyscale"` is invisible to a default-scope query.
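
To illustrate why a `world_id` in the primary key keeps trade queries isolated, here is a small sketch that swaps in Python's built-in `sqlite3` for Postgres. The table layout, the `trade_id` column, and the `ashen_seal` item are all assumptions made up for the example; only `world_id`, `item_id`, and the 500.0 price come from the output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE trades (
        world_id   TEXT NOT NULL,
        trade_id   INTEGER NOT NULL,
        item_id    TEXT NOT NULL,
        unit_price REAL NOT NULL,
        PRIMARY KEY (world_id, trade_id)
    )
    """
)
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?, ?)",
    [
        ("default", 1, "pale_ledger", 500.0),
        ("default", 2, "pale_ledger", 500.0),
        # Same trade_id as a default row, different world: no key collision.
        ("arda_greyscale", 1, "ashen_seal", 900.0),
    ],
)


def market_price(world_id, item_id):
    """Return (sample_size, avg_unit_price) for one item in one world."""
    return conn.execute(
        "SELECT COUNT(*), AVG(unit_price) FROM trades "
        "WHERE world_id = ? AND item_id = ?",
        (world_id, item_id),
    ).fetchone()


default_stats = market_price("default", "pale_ledger")
greyscale_stats = market_price("arda_greyscale", "pale_ledger")
print(default_stats, greyscale_stats)
```

Because every query filters on `world_id`, the greyscale lookup for `pale_ledger` sees no rows even though the default world has two.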
+
+## 5. How a tool uses `world_id`
+
+The `MATCH` clauses in the world-scoped tools all include
+`{id: $..., world_id: $world_id}` so a row in the wrong world simply
+doesn't match. For example, the lineage ancestors query in
+`plugins/lineage.py`:
+
+```cypher
+MATCH path = (a:Person {id: $person, world_id: $world_id})<-[:PARENT_OF*1..10]-(ancestor:Person)
+WHERE ancestor.world_id = $world_id
+RETURN ancestor
+```
+
+Both ends of the path are pinned to the same `world_id`, so the chain
+never crosses a world boundary. The `state_at` and `entity_context`
+queries follow the same pattern; the image and trade queries hit
+Postgres tables that carry `world_id` in their primary key.
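
The same idea can be shown without a database. The sketch below walks `PARENT_OF` edges upward over an in-memory edge list and refuses to follow any edge from another world. The edge list is a hypothetical stand-in for the graph: the default-world chain follows the Q3 answer in `docs/LLM_CONSUMER_DEMO.md`, while the greyscale edge that reuses the id `aldric` is invented purely to show that a colliding id would still stay isolated.

```python
# Hypothetical stand-in for the graph: (world_id, parent_id, child_id).
PARENT_OF = [
    ("default", "theron", "maric"),
    ("default", "maric", "cael"),
    ("default", "cael", "yssa"),
    ("default", "yssa", "aldric"),
    # Invented edge that reuses the id "aldric" in another world.
    ("arda_greyscale", "mael_greyscale", "aldric"),
]


def ancestors_of(person_id, world_id="default", max_depth=10):
    """Walk PARENT_OF edges upward, never leaving the given world."""
    found = []
    frontier = [person_id]
    for _ in range(max_depth):
        parents = [
            parent
            for (edge_world, parent, child) in PARENT_OF
            if edge_world == world_id and child in frontier and parent not in found
        ]
        if not parents:
            break
        found.extend(parents)
        frontier = parents
    return found


default_chain = ancestors_of("aldric")
greyscale_chain = ancestors_of("aldric", world_id="arda_greyscale")
print(default_chain)
print(greyscale_chain)
```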
+
+## 6. The world-resolution rule
+
+Tools that take a `world_id` argument default it to `"default"` so v1
+callers keep working unchanged. The `bash test.sh` runner passes
+`world_id="default"` explicitly to verify that the opt-in behaviour
+holds. The greyscale seed is loaded by `python3 seed.py` automatically
+(no extra flag), and `list_worlds()` is the operator's view of what
+exists.
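
A sketch of the defaulting rule, assuming tool arguments arrive as a plain dict. `resolve_world_id` and `DEFAULT_WORLD` are illustrative names, not identifiers taken from the plugins.

```python
DEFAULT_WORLD = "default"


def resolve_world_id(arguments):
    """Return the caller's world_id, falling back to the default world."""
    return arguments.get("world_id", DEFAULT_WORLD)


v1_call = {"name": "Theron Ashveil"}  # v1 caller: no world_id at all
v2_call = {"name": "Mael Greyscale", "world_id": "arda_greyscale"}

print(resolve_world_id(v1_call))
print(resolve_world_id(v2_call))
```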
+
+## Files
+
+- `seed.py:seed_greyscale_world` — the `arda_greyscale` seed
+- `seed.py:_seed_images_for_world` — the per-world image manifest loader
+- `plugins/lineage.py`, `plugins/world.py`, `plugins/images.py` — every
+  world-scoped read tool filters on `world_id`
+- `tests/test_multi_world.py` — 14 pytest cases for the namespace
+- `test.sh` section 12 — the `list_worlds()` smoke check
--- a/examples/test_e2e.sh
+++ b/examples/test_e2e.sh
@@ -0,0 +1,320 @@
+#!/usr/bin/env bash
+# test_e2e.sh — End-to-end validation for v2.T7.
+#
+# What this proves (per task body):
+#   1. The LLM consumer works end-to-end (5 question types)
+#   2. The consistency tools find the right violations (5 seeded)
+#   3. The LLM's answers match the seed-data ground truth
+#
+# Two independent layers:
+#   A. Direct tool calls — each of the 4 consistency tools is invoked
+#      against the live gateway and the violation count + ids are asserted
+#      against the table in examples/GROUND_TRUTH.md. This proves the
+#      tools work regardless of LLM behaviour.
+#   B. LLM consumer — for each of 5 question types, drive the LLM through
+#      the gateway, then assert the answer contains the expected facts
+#      (names, dates, severities). This proves the LLM consumer works.
+#
+# The script exits 0 only if EVERY check passes.
+set -uo pipefail
+
+cd "$(dirname "$0")"
+mkdir -p results
+GATEWAY_URL="${GATEWAY_URL:-http://localhost:8765/mcp}"
+LITELLM_URL="${LITELLM_URL:-http://localhost:4000/v1}"
+LITELLM_MODEL="${LITELLM_MODEL:-minimax-m3}"
+export GATEWAY_URL LITELLM_URL LITELLM_MODEL
+
+# ─── bookkeeping ──────────────────────────────────────────────────────────────
+
+fails=0
+passes=0
+declare -a FAIL_DETAILS=()
+
+ok()   { passes=$((passes+1)); echo "  ✓ $1"; }
+fail() { fails=$((fails+1)); FAIL_DETAILS+=("$1"); echo "  ✗ $1"; }
+
+section() { echo; echo "── $* ──"; }
+
+# ─── pre-flight ──────────────────────────────────────────────────────────────
+
+section "pre-flight: gateway + LiteLLM reachable"
+if curl -s --max-time 5 -X POST "$GATEWAY_URL" -H "Content-Type: application/json" \
+     -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' >/dev/null 2>&1; then
+  ok "gateway responds at $GATEWAY_URL"
+else
+  fail "gateway unreachable at $GATEWAY_URL"
+  echo
+  echo "PRE-FLIGHT FAILED — aborting."
+  exit 1
+fi
+if curl -s --max-time 5 "$LITELLM_URL/models" >/dev/null 2>&1; then
+  ok "LiteLLM responds at $LITELLM_URL"
+else
+  fail "LiteLLM unreachable at $LITELLM_URL"
+  echo
+  echo "PRE-FLIGHT FAILED — aborting."
+  exit 1
+fi
+
+# ─── Layer A: direct consistency-tool calls ──────────────────────────────────
+
+# Helper: call a tool and print the text payload of its first content item
+# (the tool's JSON result).
+call_tool() {
+  local name=$1
+  local args=$2
+  curl -s -X POST "$GATEWAY_URL" -H "Content-Type: application/json" \
+    -d "{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"tools/call\",\"params\":{\"name\":\"$name\",\"arguments\":$args}}" \
+  | python3 -c "import json,sys; d=json.load(sys.stdin); print(d['result']['content'][0]['text'])"
+}
+
+# Helper: assert a tool's violation count + ids.
+# Args: <tool_name> <args_json> <expected_count> [expected_id_1 ...]
+assert_violations() {
+  local tool=$1; shift
+  local args=$1; shift
+  local expected_count=$1; shift
+  local resp
+  resp=$(call_tool "$tool" "$args")
+  local got_count
+  got_count=$(printf '%s' "$resp" | python3 -c "import json,sys; print(json.load(sys.stdin)['count'])")
+  if [ "$got_count" = "$expected_count" ]; then
+    ok "$tool: count=$got_count (expected $expected_count)"
+  else
+    fail "$tool: count=$got_count (expected $expected_count) — full response: $resp"
+    return
+  fi
+  for want in "$@"; do
+    if printf '%s' "$resp" | python3 -c "import json,sys; ids=[v['id'] for v in json.load(sys.stdin)['violations']]; print('YES' if '$want' in ids else 'NO')" \
+       2>/dev/null | grep -q YES; then
+      ok "$tool: contains id=$want"
+    else
+      fail "$tool: missing id=$want (full response: $resp)"
+    fi
+  done
+}
+
+section "Layer A — direct consistency tool calls (no LLM)"
+
+assert_violations "find_contradictions"      '{"severity":"any"}' 1 c_aldric_double_membership
+assert_violations "find_anachronisms"        '{"severity":"any"}' 1 a_vex_at_founding
+assert_violations "find_orphans"             '{}'                 1 o_unfinished_npc
+assert_violations "find_ontology_violations" '{"severity":"any"}' 2 ov_theron_no_died ov_maric_no_died
+
+# Severity breakdown — task body says "the orphan being a warning, not error".
+section "Layer A — severity breakdown"
+contradictions_err=$(call_tool "find_contradictions" '{"severity":"error"}' | python3 -c "import json,sys; print(json.load(sys.stdin)['count'])")
+contradictions_warn=$(call_tool "find_contradictions" '{"severity":"warn"}' | python3 -c "import json,sys; print(json.load(sys.stdin)['count'])")
+[ "$contradictions_err" = "1" ] && ok "find_contradictions severity=error -> 1" || fail "find_contradictions severity=error -> $contradictions_err (expected 1)"
+[ "$contradictions_warn" = "0" ] && ok "find_contradictions severity=warn  -> 0" || fail "find_contradictions severity=warn -> $contradictions_warn (expected 0)"
+anach_err=$(call_tool "find_anachronisms" '{"severity":"error"}' | python3 -c "import json,sys; print(json.load(sys.stdin)['count'])")
+anach_warn=$(call_tool "find_anachronisms" '{"severity":"warn"}' | python3 -c "import json,sys; print(json.load(sys.stdin)['count'])")
+[ "$anach_err" = "1" ] && ok "find_anachronisms severity=error -> 1" || fail "find_anachronisms severity=error -> $anach_err (expected 1)"
+[ "$anach_warn" = "0" ] && ok "find_anachronisms severity=warn  -> 0" || fail "find_anachronisms severity=warn -> $anach_warn (expected 0)"
+# Orphans: 1 warn (the task body specifies this is a warn, not error).
+orphan_severity=$(call_tool "find_orphans" '{}' | python3 -c "import json,sys; d=json.load(sys.stdin); print(','.join(v['severity'] for v in d['violations']))")
+if [ "$orphan_severity" = "warn" ]; then
+  ok "find_orphans -> severity=warn (orphan is a warn, not error)"
+else
+  fail "find_orphans -> severity=[$orphan_severity] (expected 'warn')"
+fi
+# Ontology: 2 warn
+ont_warn=$(call_tool "find_ontology_violations" '{"severity":"warn"}' | python3 -c "import json,sys; print(json.load(sys.stdin)['count'])")
+[ "$ont_warn" = "2" ] && ok "find_ontology_violations severity=warn -> 2" || fail "find_ontology_violations severity=warn -> $ont_warn (expected 2)"
+
+# Total
+total_err=0
+total_warn=0
+for t in find_contradictions find_anachronisms find_orphans find_ontology_violations; do
+  args='{"severity":"any"}'
+  [ "$t" = "find_orphans" ] && args='{}'
+  e=$(call_tool "$t" "$args" | python3 -c "import json,sys; d=json.load(sys.stdin); print(sum(1 for v in d['violations'] if v['severity']=='error'))")
+  w=$(call_tool "$t" "$args" | python3 -c "import json,sys; d=json.load(sys.stdin); print(sum(1 for v in d['violations'] if v['severity']=='warn'))")
+  total_err=$((total_err+e))
+  total_warn=$((total_warn+w))
+done
+total=$((total_err+total_warn))
+[ "$total" = "5" ] && ok "TOTAL violations = 5 (2 error + 3 warn)" \
+                   || fail "TOTAL violations = $total (expected 5)"
+[ "$total_err" = "2" ] && ok "TOTAL errors = 2" || fail "TOTAL errors = $total_err (expected 2)"
+[ "$total_warn" = "3" ] && ok "TOTAL warns = 3"  || fail "TOTAL warns = $total_warn (expected 3)"
+
+# ─── Layer B: LLM consumer — 5 question types ────────────────────────────────
+
+section "Layer B — LLM consumer (5 question types)"
+
+declare -a IDS=(
+  "q1_who_is_aldric"
+  "q2_was_allied_230"
+  "q3_aldric_ancestors"
+  "q4_images_of_aldric"
+  "q5_consistency_issues"
+)
+declare -a QS=(
+  "Who is Aldric Raventhorne? Give a brief bio and his known relations."
+  "Was House Vyr allied with the Merchants Guild at 2nd_age.year_230?"
+  "What is the lineage / ancestry of Aldric Raventhorne? Walk back as far as you can."
+  "Show me images of Aldric Raventhorne — portraits or otherwise."
+  "What are the open consistency issues in the world graph right now? Check contradictions, anachronisms, orphans, and ontology violations."
+)
+
+for i in "${!IDS[@]}"; do
+  id="${IDS[$i]}"
+  q="${QS[$i]}"
+  echo
+  echo "── question $((i+1))/5: $id ──"
+  echo "  Q: $q"
+  if ! python3 llm_consumer.py --question-id "$id" --question "$q" \
+        --out "results/${id}.json" >"/tmp/llm_consumer_${id}.log" 2>&1; then
+    fail "Q$((i+1)) ($id): llm_consumer.py exited non-zero — see /tmp/llm_consumer_${id}.log"
+    tail -5 "/tmp/llm_consumer_${id}.log" | sed 's/^/    /'
+    continue
+  fi
+  tail -8 "/tmp/llm_consumer_${id}.log"
+  ok "Q$((i+1)) ($id): llm_consumer.py exit=0"
+done
+
+# ─── Answer-level assertions against GROUND_TRUTH.md ─────────────────────────
+
+section "Layer B — answer-level assertions against GROUND_TRUTH.md"
+
+# Helper: read a trace and emit its (answer_lower, tools_csv) on two lines.
+trace_info() {
+  local trace_path=$1
+  python3 -c "
+import json
+d = json.load(open('$trace_path'))
+ans = (d.get('answer') or '').lower()
+tools = [t['tool'] for t in d.get('tools_called', [])]
+print(ans)
+print('---TOOLS---')
+print(','.join(tools))
+"
+}
+
+# Q1: entity_context called, answer has Aldric + a known affiliation.
+if [ -f "results/q1_who_is_aldric.json" ]; then
+  trace=$(trace_info "results/q1_who_is_aldric.json")
+  q1_ans=${trace%%$'---TOOLS---'*}
+  q1_tools=$(printf '%s\n' "$trace" | awk 'seen {print; exit} $0 == "---TOOLS---" {seen=1}')
+  echo "  Q1 tools: $q1_tools"
+  if [[ "$q1_tools" == *entity_context* ]]; then ok "Q1: entity_context in tools_called"; else fail "Q1: entity_context NOT called (got: $q1_tools)"; fi
+  if printf '%s' "$q1_ans" | grep -qi 'aldric'; then ok "Q1: answer mentions 'aldric'"; else fail "Q1: answer missing 'aldric'"; fi
+  if printf '%s' "$q1_ans" | grep -Eqi 'vyr|thornwall|elara|valdorni|eventide'; then
+    ok "Q1: answer mentions a known affiliation (Vyr/Thornwall/Elara/Valdorni/Eventide)"
+  else
+    fail "Q1: answer missing known affiliation"
+  fi
+else
+  fail "Q1: results/q1_who_is_aldric.json missing (LLM consumer failed)"
+fi
+
+# Q2: was_true_at called, answer says YES/allied/true.
+if [ -f "results/q2_was_allied_230.json" ]; then
+  trace=$(trace_info "results/q2_was_allied_230.json")
+  q2_ans=${trace%%$'---TOOLS---'*}
+  q2_tools=$(printf '%s\n' "$trace" | awk 'seen {print; exit} $0 == "---TOOLS---" {seen=1}')
+  echo "  Q2 tools: $q2_tools"
+  if [[ "$q2_tools" == *was_true_at* ]]; then ok "Q2: was_true_at in tools_called"; else fail "Q2: was_true_at NOT called (got: $q2_tools)"; fi
+  if printf '%s' "$q2_ans" | grep -Eqi 'yes|allied|true|in force|was an alliance'; then
+    ok "Q2: answer indicates YES/allied/true"
+  else
+    fail "Q2: answer missing YES/allied/true"
+  fi
+else
+  fail "Q2: results/q2_was_allied_230.json missing (LLM consumer failed)"
+fi
+
+# Q3: ancestors_of called, answer names >=3 of {Theron, Maric, Cael, Yssa}.
+if [ -f "results/q3_aldric_ancestors.json" ]; then
+  trace=$(trace_info "results/q3_aldric_ancestors.json")
+  q3_ans=${trace%%$'---TOOLS---'*}
+  q3_tools=$(printf '%s\n' "$trace" | awk 'seen {print; exit} $0 == "---TOOLS---" {seen=1}')
+  echo "  Q3 tools: $q3_tools"
+  if [[ "$q3_tools" == *ancestors_of* ]]; then ok "Q3: ancestors_of in tools_called"; else fail "Q3: ancestors_of NOT called (got: $q3_tools)"; fi
+  found=0
+  for n in theron maric cael yssa; do
+    if printf '%s' "$q3_ans" | grep -qi "$n"; then found=$((found+1)); fi
+  done
+  if [ "$found" -ge 3 ]; then ok "Q3: answer names $found/4 canonical ancestors (need >=3)"; else fail "Q3: answer names only $found/4 canonical ancestors (need >=3)"; fi
+else
+  fail "Q3: results/q3_aldric_ancestors.json missing (LLM consumer failed)"
+fi
+
+# Q4: image-recall tool called, answer mentions Aldric + portrait/image/etc.
+if [ -f "results/q4_images_of_aldric.json" ]; then
+  trace=$(trace_info "results/q4_images_of_aldric.json")
+  q4_ans=${trace%%$'---TOOLS---'*}
+  q4_tools=$(printf '%s\n' "$trace" | awk 'seen {print; exit} $0 == "---TOOLS---" {seen=1}')
+  echo "  Q4 tools: $q4_tools"
+  if [[ "$q4_tools" == *recall_images* || "$q4_tools" == *search_images_by_caption* || "$q4_tools" == *search_images_semantic* ]]; then
+    ok "Q4: image-recall tool in tools_called"
+  else
+    fail "Q4: no image-recall tool called (got: $q4_tools)"
+  fi
+  if printf '%s' "$q4_ans" | grep -qi 'aldric'; then ok "Q4: answer mentions 'aldric'"; else fail "Q4: answer missing 'aldric'"; fi
+  if printf '%s' "$q4_ans" | grep -Eqi 'portrait|image|presigned|thornwall'; then
+    ok "Q4: answer mentions portrait/image/presigned/thornwall"
+  else
+    fail "Q4: answer missing portrait/image/presigned/thornwall"
+  fi
+else
+  fail "Q4: results/q4_images_of_aldric.json missing (LLM consumer failed)"
+fi
+
+# Q5: all 4 consistency tools called; answer is NOT a "no issues" answer; mentions
+# canonical subject names and severity.
+if [ -f "results/q5_consistency_issues.json" ]; then
+  trace=$(trace_info "results/q5_consistency_issues.json")
+  q5_ans=${trace%%$'---TOOLS---'*}
+  q5_tools=$(printf '%s\n' "$trace" | awk 'seen {print; exit} $0 == "---TOOLS---" {seen=1}')
+  echo "  Q5 tools: $q5_tools"
+  missing=()
+  for t in find_contradictions find_anachronisms find_orphans find_ontology_violations; do
+    [[ "$q5_tools" == *"$t"* ]] || missing+=("$t")
+  done
+  if [ ${#missing[@]} -eq 0 ]; then
+    ok "Q5: all 4 consistency tools in tools_called"
+  else
+    fail "Q5: missing tools: ${missing[*]}"
+  fi
+  # Must NOT say "no issues" — there are 5 seeded violations.
+  if printf '%s' "$q5_ans" | grep -Eqi '\b(no|zero|none)\b.{0,30}(open |detected |current )?(consistency |open )?(issues|problems|violations)'; then
+    fail "Q5: answer incorrectly says 'no issues' — but 5 violations are seeded"
+  else
+    ok "Q5: answer does NOT claim 'no issues' (correct — 5 violations seeded)"
+  fi
+  subject_hits=0
+  for n in aldric vex lyssa theron maric; do
+    if printf '%s' "$q5_ans" | grep -qi "$n"; then subject_hits=$((subject_hits+1)); fi
+  done
+  if [ "$subject_hits" -ge 2 ]; then
+    ok "Q5: answer mentions $subject_hits canonical subjects (need >=2)"
+  else
+    fail "Q5: answer mentions only $subject_hits canonical subjects (need >=2)"
+  fi
+  if printf '%s' "$q5_ans" | grep -Eqi 'severity|warn|warning|error'; then
+    ok "Q5: answer acknowledges severity (warn/error)"
+  else
+    fail "Q5: answer does not acknowledge severity"
+  fi
+else
+  fail "Q5: results/q5_consistency_issues.json missing (LLM consumer failed)"
+fi
+
+# ─── summary ─────────────────────────────────────────────────────────────────
+
+echo
+echo "════════════════════════════════════════════════════════════"
+if [ "$fails" -eq 0 ]; then
+  echo "  PASS — $passes checks, 0 failures"
+  echo "════════════════════════════════════════════════════════════"
+  exit 0
+else
+  echo "  FAIL — $passes checks passed, $fails FAILED:"
+  for d in "${FAIL_DETAILS[@]}"; do
+    echo "    - $d"
+  done
+  echo "════════════════════════════════════════════════════════════"
+  exit 1
+fi
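
The answer-level checks in Layer B read only two fields from each trace: `answer` and `tools_called[].tool`. As a hedged illustration, the Q3 check could be expressed directly in Python against the parsed JSON, which avoids splitting text on a marker line. `check_q3` and the sample trace are hypothetical; the sample answer paraphrases the Q3 excerpt in `docs/LLM_CONSUMER_DEMO.md`.

```python
CANONICAL_ANCESTORS = ("theron", "maric", "cael", "yssa")


def check_q3(trace):
    """Return (tool_ok, names_found) for a parsed Q3 trace dict."""
    answer = (trace.get("answer") or "").lower()
    tools = [call["tool"] for call in trace.get("tools_called", [])]
    names_found = [name for name in CANONICAL_ANCESTORS if name in answer]
    return "ancestors_of" in tools, names_found


# Inline stand-in for results/q3_aldric_ancestors.json.
sample_trace = {
    "answer": (
        "Mother: Yssa Raventhorne. Maternal grandfather: Cael Vyr. "
        "Cael's father: Maric Vyr. Maric's father: Theron Ashveil."
    ),
    "tools_called": [{"tool": "entity_context"}, {"tool": "ancestors_of"}],
}

tool_ok, names_found = check_q3(sample_trace)
passed = tool_ok and len(names_found) >= 3  # same threshold as the script
print(tool_ok, names_found, passed)
```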