docs(v2): T8 — update README + CHANGELOG + 3 worked-example docs
- README.md: 5 plugins / 19 tools (matches /healthz); 'what this proves' now lists consistency engine, multi-world namespace, LLM consumer; 'next steps' section replaced with 'shipped in v2'
- docs/CONSISTENCY_DEMO.md: 4 tools, 5 violations, all output verified against live bash examples/test_consistency.sh
- docs/MULTI_WORLD_DEMO.md: list_worlds() + entity_context in both worlds + cross-world isolation tests, all output verified live
- docs/LLM_CONSUMER_DEMO.md: 5 question types, 9 distinct tools, all output traced to examples/results/*.json
- CHANGELOG.md: v1 -> v2 entry, all 9 task refs (T1-T9)
- examples/test_e2e.sh: T7 E2E validation script (untracked)
165
CHANGELOG.md
Normal file
@@ -0,0 +1,165 @@
# Changelog

All notable changes to `lore-engine-poc` are recorded here. The format
follows [Keep a Changelog](https://keepachangelog.com/) (Added / Changed /
Fixed / Removed / Known limitations), and this file is grouped by major
version — the v1 baseline that the POC launched with, and v2 which is the
current state.

The 9 v2 task references below each correspond to the kanban card that drove
the work, in the order the tasks landed: T1, T2, T3, T4, T5, T6, T7, T8,
T9.

---

## [v2] — 2026-06-16

The v2 milestone delivers the second half of the v1 roadmap plus a few
extras. Its three headline features are a real consistency engine, a
multi-world namespace, and an LLM consumer that drives the gateway
end-to-end. v2 is what `bash test.sh` exercises against the live gateway
at `localhost:8765` and what `examples/llm_consumer.py` drives from the
LiteLLM proxy.

### Added

- **`plugins/embeddings.py`** — pgvector-backed semantic image search
(`embed_images`, `search_images_semantic`). Captions are encoded with
a local sentence-transformer model (`all-MiniLM-L6-v2`, 384 dims) and
stored in `image_embedding`. Queries are matched via pgvector cosine
distance (`<=>`). Background embedding on `register_image`; `embed_images`
is idempotent. v2.T2.

- **`plugins/consistency.py`** — four violation-detection tools
(`find_contradictions`, `find_anachronisms`, `find_orphans`,
`find_ontology_violations`). Each tool returns a `{violations, count}`
envelope per call. Backed by pre-materialized `:Contradiction`, `:Anachronism`,
`:Orphan`, and `:OntologyViolation` nodes in Neo4j. The seed
(`seed.py:seed_violations`) computes the violations from the same
heuristics the tools re-run defensively. v2.T3 (skeleton) + v2.T5
(real rules).

- **`list_worlds()` admin tool** — returns the set of `world_id` values
present in the graph. Called by `bash test.sh` section 12 and by the
v2.T7 E2E validation suite. v2.T6.

- **`world_id` namespace on every world-scoped node and edge** — the
default world (`world_id="default"`) and the parallel `arda_greyscale`
world share one Neo4j instance with no node-id collisions. Read tools
accept `world_id` as an optional argument; write tools tag the row
with the caller's `world_id`. v2.T6.

- **Parallel world seed: `arda_greyscale`** — `seed.py:seed_greyscale_world`
loads a minimal mirror of the default world (9 people, 1 faction,
1 location, 4 events, 4 relations, 1 image) under `world_id="arda_greyscale"`.
Idempotent. v2.T6.

- **LLM consumer (`examples/llm_consumer.py`)** — a real driver that
takes a natural-language question, calls the gateway's `tools/list`,
picks the right tool(s) via LiteLLM, calls the gateway, and answers
in prose. 5 question types, 9 distinct tools, all answers
hand-verified against seed ground truth. v2.T4.

- **E2E validation (`examples/test_e2e.sh` + `examples/E2E_REPORT.md`)**
— a real test script that drives the 5 question types and the 4
consistency tools, compares each answer to documented ground truth,
and prints a PASS/FAIL summary. v2.T7.

- **CI smoke (`scripts/ci-smoke.sh` + `docs/SMOKE.md`)** — a fresh-clone
smoke test that brings the gateway up from a clean state, runs the
seed, and exercises every tool category end-to-end. v2.T1.

- **v2 docs** — `docs/CONSISTENCY_DEMO.md` (5 hand-crafted violations
from the live seed), `docs/MULTI_WORLD_DEMO.md` (the 2-world seed in
action), `docs/LLM_CONSUMER_DEMO.md` (the 5 question types in detail),
and this file. v2.T8.

- **Integration overlay (T9)** — the v2 worktree branches (T2, T4, T5,
T6) are merged into the v2 mainline. `bash test.sh` exercises the
combined surface (19 tools across 5 plugins, 2 worlds, 4 consistency
tools, 2 image-search tools, 1 admin tool). v2.T9.

### Changed

- **README.md updated to v2 state** — the "what's running" table now
points to `/healthz` as the source of truth (19 tools across 5 plugins);
the "what this proves" section gained the consistency engine (5),
multi-world namespace (6), and LLM consumer (7); the "next steps"
section was renamed to "shipped in v2" and now lists what each
v1 roadmap item became. v2.T8.

- **`bash test.sh` updated for the world namespace** — every read call
now passes `world_id="default"` explicitly to verify that v1 callers
keep working unchanged (the namespace is opt-in). Added a 12th section
that calls `list_worlds()`. v2.T6.

- **`seed.py` grew two new stages** — `seed_greyscale_world` (the
parallel world, v2.T6) and `seed_violations` (5 hand-crafted
violations, v2.T5). Both are idempotent and safe to re-run.

- **`tests/test_consistency.py` and `tests/test_multi_world.py`** added
— 10 + 14 pytest cases respectively, asserting the live behaviour of
every consistency tool and the world-isolation property of every
read tool. v2.T5, v2.T6.

- **`tests/test_embeddings_*.py` and `tests/test_register_image_hook.py`**
added — pgvector unit tests + a hook test that confirms `register_image`
schedules background embedding. v2.T2.

### Known limitations (v2 → v3)

These are deliberate v2 boundaries; the v3 plan will address them:

- **No world-builder UI.** Everything is `curl` and `cypher-shell`. The
v2 dashboard is a separate repo. v3.

- **No reflective memory or behavior layer.** The Stanford Generative
Agents pattern (memory stream + reflection + planning) is a v3
borrow per `lore-engine/docs/16-comparison.md`. v3.

- **Consistency engine is rule-driven, not ML-driven.** The five
hand-crafted violations in v2 are seeded; an ML-derived detection
surface (e.g. an LLM pass over the world summary) is a v3 item. v3.

- **No refresh / cache invalidation on world reseed.** If a world is
re-seeded, the embeddings for any new image manifest rows are computed
on the next `register_image` or `embed_images` call; old embeddings
are kept. A v3 refresh tool would let an operator force a full
re-embed. v3.

---

## [v1] — 2026-06-16 (baseline)

The initial proof of concept. Five-minute goal: prove that with mock
data, we can run a multi-database backend (Neo4j + Postgres + MinIO) and
expose it all through a plugin-driven MCP gateway where adding a new
domain type is a new file in `plugins/`, not a Go change.

### Added

- `docker-compose.yml` — Neo4j 5.26, Postgres (later upgraded to
pgvector in v2.T2), MinIO, and the gateway container.
- `seed.py` — idempotent seeder for the default world (3 eras, 10 people,
3 factions, 4 locations, 4 items, 6 events, 1 lineage group, ~20
time-bounded relations, 3 trade log entries, 4 generated images).
- `plugins/world.py` — `entity_context`, `was_true_at`, `state_at`
(Neo4j).
- `plugins/lineage.py` — `ancestors_of`, `descendants_of`, `lineage_of`
(Neo4j).
- `plugins/trade.py` — `log_trade`, `trades_by_buyer`, `market_price`
(Postgres).
- `plugins/images.py` — `register_image`, `recall_images`,
`search_images_by_caption` (MinIO + Postgres + Neo4j).
- `server.py` — the MCP-compatible JSON-RPC gateway, auto-loading every
`.py` file in `plugins/`.
- `bash test.sh` — the 11-section end-to-end smoke runner.
- `README.md` (v1) — the original POC writeup.

### Known limitations (v1 → v2)

- Stub consistency tools (no detection rules).
- No semantic image search.
- No LLM in the loop.
- Single world, no namespace.

All four items were addressed in v2.
112
README.md
@@ -13,7 +13,7 @@ Five-minute goal: prove that with mock data, we can run a multi-database backend
| `lore-minio` | `minio/minio:latest` | 9000 (S3), 9001 (console) | Image blob storage |
| `lore-gateway` | built locally | 8765 (MCP JSON-RPC) | The plugin-driven gateway |

## The four plugins (this is the proof)
## The five plugins (this is the proof)

```
plugins/
@@ -22,10 +22,22 @@ plugins/
├── trade.py # log_trade, trades_by_buyer, market_price (Postgres)
├── images.py # register_image, recall_images, search_images_by_caption
│ # (MinIO + Postgres + Neo4j)
└── embeddings.py # embed_images, search_images_semantic (Postgres + pgvector)
├── embeddings.py # embed_images, search_images_semantic (Postgres + pgvector)
└── consistency.py # find_contradictions, find_anachronisms, find_orphans,
# find_ontology_violations (Neo4j)
```

Each plugin is a single file with a `register(registry)` entry point. The gateway auto-loads every `.py` file in `plugins/` at startup. **No server.py change needed to add a new tool** — drop a new file in, restart the container, the new tools appear in `tools/list`.
The gateway also exposes one admin tool for the world namespace: `list_worlds`.

Tool counts and plugin membership are reported live by the gateway itself —
`curl -s http://localhost:8765/healthz` returns the canonical list. As of v2
the healthz reports 19 tools across the 5 plugins above. See
`docs/LLM_CONSUMER_DEMO.md` for an end-to-end driver that exercises them.

Each plugin is a single file with a `register(registry)` entry point. The
gateway auto-loads every `.py` file in `plugins/` at startup. **No server.py
change needed to add a new tool** — drop a new file in, restart the
container, the new tools appear in `tools/list`.

## How to run it

@@ -51,6 +63,11 @@ The `seed.py` script is idempotent (uses `MERGE` and `ON CONFLICT`). It loads:
- ~20 time-bounded relations
- 3 trade log entries
- 4 generated images (portraits + landscape + battle scene) uploaded to MinIO
- 5 hand-crafted consistency violations pre-materialized as `:Contradiction`,
`:Anachronism`, `:Orphan`, and `:OntologyViolation` nodes (see
`docs/CONSISTENCY_DEMO.md`)
- 1 parallel world, `arda_greyscale` — a minimal mirror of the default
world with no overlapping node ids (see `docs/MULTI_WORLD_DEMO.md`)

## Try the gateway

@@ -166,30 +183,89 @@ curl -s -X POST http://localhost:8765/mcp \

## What this proves

1. **The plugin boundary works.** A new domain type (trade, images) is a new file in `plugins/`. No change to `server.py`, no change to docker-compose, no new container. Restart the gateway and the new tools are live.
1. **The plugin boundary works.** A new domain type (trade, images, embeddings,
consistency) is a new file in `plugins/`. No change to `server.py`, no change
to docker-compose, no new container. Restart the gateway and the new tools
are live. The `consistency` plugin (added in v2.T5) is the most recent
example — four violation-detection tools, all in one file.

2. **Polyglot storage is real, not aspirational.** Neo4j holds the typed world graph. Postgres holds the time-series operational data and image manifests. MinIO holds the image bytes. Each store does what it's good at; the gateway composes the answers.
2. **Polyglot storage is real, not aspirational.** Neo4j holds the typed world
graph. Postgres holds the time-series operational data, image manifests, and
the `image_embedding` vectors (pgvector). MinIO holds the image bytes. Each
store does what it's good at; the gateway composes the answers.

3. **Time is a first-class query primitive.** `was_true_at` checks time-bounded edges with a single Cypher query — no LLM, no inference. Year-level precision works against the mock data (see `2nd_age.year_230` example above).
3. **Time is a first-class query primitive.** `was_true_at` checks time-bounded
edges with a single Cypher query — no LLM, no inference. Year-level
precision works against the mock data (see `2nd_age.year_230` example above).

4. **Image recall works.** Images are stored in MinIO, linked to entities in Neo4j (`(:Image)-[:DEPICTS]->(:Person)`), and discoverable by entity id, by tag, or by caption substring search. Presigned URLs are generated on the fly.
4. **Image recall works.** Images are stored in MinIO, linked to entities in
Neo4j (`(:Image)-[:DEPICTS]->(:Person)`), and discoverable by entity id, by
tag, by caption substring search, or by natural-language description via the
`search_images_semantic` (pgvector) tool. Presigned URLs are generated on
the fly.

5. **The world is small but real.** 10 people, 6 events, 4 images, ~20 relations — enough to demonstrate the architecture end-to-end. Scaling is a separate problem; this is the proof of shape.
5. **The consistency engine is real.** The four `find_*` tools query
pre-materialized violation nodes in Neo4j and return structured
`{violations, count}` envelopes — not booleans, not error strings. The
`seed.py:seed_violations` step computes the violations from the same
heuristics (overlapping `MEMBER_OF` windows, `Person.born > event_year`,
orphan entities, and `:OntologyRule`-driven checks) so the math is visible
in plain Python — not hidden in Cypher. See `docs/CONSISTENCY_DEMO.md` for
the five hand-crafted violations the seed surfaces.

6. **Multiple worlds live in one graph.** Every world-scoped node and edge
carries a `world_id` property, and the read tools accept a `world_id`
argument (defaulting to `"default"`). The v2.T6 seed loads a parallel
`arda_greyscale` world with no overlapping node ids, and
`list_worlds()` returns both. See `docs/MULTI_WORLD_DEMO.md` for the
worked example.

7. **An LLM can drive the whole surface.** `examples/llm_consumer.py` is a
real driver that takes a natural-language question, calls the gateway's
`tools/list`, picks the right tool(s), and answers in prose — all wired
through the local LiteLLM proxy. 5 question types × 9 distinct tools
exercised, all answers hand-verified against the seed. See
`docs/LLM_CONSUMER_DEMO.md` and `examples/REPORT.md`.

8. **The world is small but real.** 10 people + 9 greyscale-world people, 6
events, 5 images (4 default + 1 greyscale), ~20 relations — enough to
demonstrate the architecture end-to-end across two parallel worlds.
Scaling is a separate problem; this is the proof of shape.

## What's not in this POC

- **No LLM in the loop.** The MCP gateway is a tool server; the LLM client (Claude, GPT, anything) is the consumer. This is intentional — the POC validates the data and tool layers, not the LLM reasoning. The reasoning harness is in the design docs (`lore-engine/docs/07-reasoning-harness.md`) and would be added as a system prompt in a real deployment.
- **No LLM in the loop at runtime — the LLM consumer is a separate
example.** The MCP gateway itself is a tool server; the LLM client
(Claude, GPT, anything reachable via the LiteLLM proxy) is the consumer.
This is intentional — the POC validates the data and tool layers, not the
LLM reasoning. The reasoning harness is in the design docs
(`lore-engine/docs/07-reasoning-harness.md`); `examples/llm_consumer.py`
implements v1.1 of that harness against the live gateway.

- **Consistency detection is real (v2.T5).** The 4 tools (`find_contradictions`, `find_anachronisms`, `find_orphans`, `find_ontology_violations`) query pre-materialized violation nodes in Neo4j. The seed (`seed.py:seed_violations`) computes the violations from the same heuristics (overlapping `MEMBER_OF` windows, `Person.born > event_year`, world entities with no relations, and `:OntologyRule`-driven checks) so the math is visible in plain Python — not hidden in Cypher.
- **No world-builder UI.** Everything is `curl` and `cypher-shell`. The UI
is a v3 feature.

- **No world-builder UI.** Everything is `curl` and `cypher-shell`. The UI is a v2 feature.
- **No reflective memory or behavior layer.** The Stanford Generative Agents
pattern (memory stream + reflection + planning) is a v3 borrow per the
comparison in `lore-engine/docs/16-comparison.md`.

- **No reflective memory or behavior layer.** The Stanford Generative Agents pattern (memory stream + reflection + planning) is a v2 borrow per the comparison in `lore-engine/docs/16-comparison.md`.
## Shipped in v2

## Next steps after this POC
What was on the v1 "next steps" list, and what it became in v2:

- ~~Implement the consistency detection rules behind the 4 stub tools (T5).~~ **Done.**
- Add the embedding-based semantic search plugin (uses the `Image.caption` and any future `Person.summary` text).
- Add an LLM client that consumes the gateway with the reasoning harness system prompt and runs the 5 question types from the design.

The v1 design in `lore-engine/docs/` is the contract. This POC is the proof of shape.
- ~~Implement the consistency detection rules behind the 4 stub tools
(T5).~~ **Done** — see `plugins/consistency.py` and
`docs/CONSISTENCY_DEMO.md`. 4 tools, 5 violations surfaced from the seed.
- ~~Add the embedding-based semantic search plugin (uses the `Image.caption`
and any future `Person.summary` text).~~ **Done** — see `plugins/embeddings.py`
and `docs/LLM_CONSUMER_DEMO.md`. 384-dim MiniLM, pgvector cosine distance,
background embedding on `register_image`.
- ~~Add an LLM client that consumes the gateway with the reasoning harness
system prompt and runs the 5 question types from the design.~~ **Done** —
see `examples/llm_consumer.py` and `examples/REPORT.md`. 5 questions, 9
distinct tools, all hand-verified against seed ground truth.
- **v2 extras** not on the v1 list: the multi-world namespace with the
`arda_greyscale` parallel seed (T6); the `:OntologyViolation` rule-driven
detection in addition to the original three classes (T5); and a fresh-clone
smoke test (`scripts/ci-smoke.sh`) that exercises the gateway end-to-end
from a clean state (T1).
210
docs/CONSISTENCY_DEMO.md
Normal file
@@ -0,0 +1,210 @@
# Consistency Engine — Worked Example

This is a live end-to-end run of the four consistency tools that landed in v2.T5.
Everything below is real tool output from `bash examples/test_consistency.sh`
against the current gateway at `localhost:8765`, taken from the v2 build
(`8261c2d` on `wt/t5-consistency-impl`).

## What the engine does

The consistency engine has four read-only tools, each backed by pre-materialized
violation nodes in Neo4j. The seed (`seed.py:seed_violations`) computes the
violations from the same heuristics the tools re-run defensively, so every
violation id is stable, the math is visible in plain Python, and an operator
can re-derive any flagged issue by hand from the seed.

| Tool | Neo4j label | Live count (this run) |
|---|---|---|
| `find_contradictions` | `:Contradiction` | 1 |
| `find_anachronisms` | `:Anachronism` | 1 |
| `find_orphans` | `:Orphan` | 1 |
| `find_ontology_violations` | `:OntologyViolation` | 2 |
| **Total** | | **5** |

All four tools support an optional `severity` argument (`"any"`, `"error"`,
or `"warn"`) and, like the other world-scoped read tools, an optional
`world_id` that defaults to `"default"`. The default world contains all five
violations; the `arda_greyscale` world is
clean (its seed doesn't inject any hand-crafted ones).
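
To make the envelope and the `severity` argument concrete, here is a minimal Python sketch that filters violation rows and wraps them in a `{violations, count}` envelope. It assumes `"any"` keeps every row and that `"error"` and `"warn"` match the `severity` field exactly. `filter_by_severity` and `DEFAULT_WORLD_VIOLATIONS` are illustrative names, not the plugin's code, which may well apply the filter in Cypher instead.

```python
from typing import Any

VALID_SEVERITIES = {"any", "error", "warn"}

# The five rows from the live run in this doc, trimmed to the fields the filter reads.
DEFAULT_WORLD_VIOLATIONS: list[dict[str, Any]] = [
    {"id": "c_aldric_double_membership", "severity": "error"},
    {"id": "a_vex_at_founding", "severity": "error"},
    {"id": "o_unfinished_npc", "severity": "warn"},
    {"id": "ov_maric_no_died", "severity": "warn"},
    {"id": "ov_theron_no_died", "severity": "warn"},
]


def filter_by_severity(violations: list[dict[str, Any]], severity: str = "any") -> dict[str, Any]:
    """Illustrative helper (not the plugin's code): wrap matching rows in an envelope.

    "any" keeps every row; "error" and "warn" keep rows whose severity matches exactly.
    """
    if severity not in VALID_SEVERITIES:
        raise ValueError(f"severity must be one of {sorted(VALID_SEVERITIES)}, got {severity!r}")
    kept = [v for v in violations if severity == "any" or v.get("severity") == severity]
    return {"violations": kept, "count": len(kept)}


if __name__ == "__main__":
    print(filter_by_severity(DEFAULT_WORLD_VIOLATIONS, "error")["count"])  # 2
    print(filter_by_severity(DEFAULT_WORLD_VIOLATIONS)["count"])           # 5
```

Rejecting unknown severities up front keeps a typo such as `"warning"` from silently returning an empty envelope that looks like a clean world.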

## 1. Contradictions — overlapping faction memberships

A `:Contradiction` is a pair of `MEMBER_OF` relations on the same person
whose `[valid_from, valid_until]` windows overlap but whose target factions
differ. It's the classic "sworn to two houses at once" case.

```bash
curl -s -X POST http://localhost:8765/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc":"2.0","id":1,"method":"tools/call",
    "params":{"name":"find_contradictions","arguments":{"world_id":"default"}}
  }'
```

```json
{
  "violations": [
    {
      "id": "c_aldric_double_membership",
      "label": "Contradiction",
      "severity": "error",
      "status": "open",
      "details": "Aldric Raventhorne is MEMBER_OF House Vyr (240-) and MEMBER_OF Crimson Pact (260-285); the two memberships overlap.",
      "detected_at": "2026-06-16T23:04:51.238226Z"
    }
  ],
  "count": 1
}
```

The math: Aldric's `MEMBER_OF` House Vyr opens at year 240 with no end date.
His `MEMBER_OF` Crimson Pact runs 260–285. The two windows overlap from 260
to 285, and he can't be a sworn member of two factions at once.

The seed source is `seed.py:c_aldric_double_membership`; see the
`Aldric Raventhorne` relations block in `seed_world_default` for the
underlying `MEMBER_OF` rows.
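
The overlap rule can be sketched in plain Python as below. This is an illustration, not the code in `seed.py`: the `Membership` dataclass, the `crimson_pact` id, and the choice to treat windows as inclusive on both ends are assumptions, and `valid_until=None` stands in for an open-ended membership such as Aldric's House Vyr row.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass(frozen=True)
class Membership:
    person: str
    faction: str
    valid_from: int
    valid_until: Optional[int]  # None = still open (no end date)


def shared_window(a: Membership, b: Membership) -> Optional[tuple[int, Optional[int]]]:
    """Return the inclusive [start, end] overlap of two windows, or None if they are disjoint."""
    start = max(a.valid_from, b.valid_from)
    ends = [e for e in (a.valid_until, b.valid_until) if e is not None]
    end = min(ends) if ends else None  # both open-ended -> the overlap is open-ended too
    if end is not None and start > end:
        return None
    return (start, end)


def contradictions(memberships: list[Membership]) -> list[tuple[str, str, str, tuple[int, Optional[int]]]]:
    """Same person, different factions, overlapping windows (sketch of the rule above)."""
    found = []
    for a, b in combinations(memberships, 2):
        if a.person != b.person or a.faction == b.faction:
            continue
        window = shared_window(a, b)
        if window is not None:
            found.append((a.person, a.faction, b.faction, window))
    return found


if __name__ == "__main__":
    aldric = [
        Membership("aldric", "house_vyr", 240, None),
        Membership("aldric", "crimson_pact", 260, 285),  # hypothetical faction id
    ]
    print(contradictions(aldric))  # [('aldric', 'house_vyr', 'crimson_pact', (260, 285))]
```

The pairwise scan is quadratic per person, which is fine at seed scale; a larger world would sort each person's memberships by `valid_from` first.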

## 2. Anachronisms — a person at an event before they were born

An `:Anachronism` flags a `:PARTICIPATED_IN` (or similar) relation between a
person and an event where `event.in_fiction_time` is before `person.born`.

```bash
curl -s -X POST http://localhost:8765/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc":"2.0","id":1,"method":"tools/call",
    "params":{"name":"find_anachronisms","arguments":{"world_id":"default"}}
  }'
```

```json
{
  "violations": [
    {
      "id": "a_vex_at_founding",
      "label": "Anachronism",
      "severity": "error",
      "status": "open",
      "details": "Vex the Silent (born 180) is recorded as participating in the Founding of House Vyr (year 85) — 95 years before his birth.",
      "detected_at": "2026-06-16T23:04:51.238226Z"
    }
  ],
  "count": 1
}
```

Vex the Silent, born in 180, is tagged as a participant in the
"Founding of House Vyr" event in year 85. The Cypher check joins the
`PARTICIPATED_IN` edge to the person's `born` property and the event's
`in_fiction_time`, extracted as an integer year.
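
A minimal sketch of the anachronism test follows. It assumes time slugs end in `year_<n>` (as `2nd_age.year_230` does elsewhere in these docs) and that the extracted years compare on a single scale, which is what "extracted as an integer year" implies. The function names and the `1st_age.year_85` slug used for the Founding are hypothetical.

```python
import re
from typing import Optional

# Assumed slug shape, e.g. "2nd_age.year_230": the integer after "year_" is the year.
_YEAR_RE = re.compile(r"year_(\d+)$")


def fiction_year(in_fiction_time: str) -> Optional[int]:
    """Extract the integer year from an in-fiction time slug, or None if it has none."""
    match = _YEAR_RE.search(in_fiction_time)
    return int(match.group(1)) if match else None


def years_before_birth(person_born: int, event_time: str) -> Optional[int]:
    """Years between the event and the person's birth; None means it is not an anachronism."""
    year = fiction_year(event_time)
    if year is None or year >= person_born:
        return None
    return person_born - year


if __name__ == "__main__":
    # Vex the Silent (born 180) at an event in year 85; this slug is hypothetical.
    print(years_before_birth(180, "1st_age.year_85"))  # 95
```

Returning the gap rather than a boolean lets the caller build a `details` string like the one in the output above.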

## 3. Orphans — entities with no relations

An `:Orphan` flags a `Person`, `Item`, `Location`, or `Event` node that exists
in the world but has no incoming or outgoing relations of any kind. These are
typically world-builder placeholders that haven't been wired into the story
yet.

```bash
curl -s -X POST http://localhost:8765/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc":"2.0","id":1,"method":"tools/call",
    "params":{"name":"find_orphans","arguments":{"world_id":"default"}}
  }'
```

```json
{
  "violations": [
    {
      "id": "o_unfinished_npc",
      "label": "Orphan",
      "severity": "warn",
      "status": "open",
      "details": "Person 'Lyssa the Watcher' exists but has no relations — world-builder placeholder, not yet connected.",
      "detected_at": "2026-06-16T23:04:51.238226Z"
    }
  ],
  "count": 1
}
```

`Lyssa the Watcher` is a real Person node in the seed (see
`seed.py:Lyssa the Watcher`) with no `PARENT_OF`, `MEMBER_OF`, `SPOUSE_OF`,
or any other relation. Note the severity: `warn`, not `error`. An
unfinished NPC is a normal artifact of worldbuilding, not a story-level
inconsistency.
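
The orphan rule reduces to "candidate nodes that never appear in an edge". The sketch below works on an in-memory `{id: label}` map and `(source, rel_type, target)` triples rather than a Neo4j query; `find_orphan_ids` and the `lyssa` id are hypothetical names, and the candidate labels are the four listed above.

```python
from collections.abc import Iterable

# Labels this doc lists as orphan candidates; factions and other labels are not flagged.
ORPHAN_LABELS = {"Person", "Item", "Location", "Event"}


def find_orphan_ids(nodes: dict[str, str], edges: Iterable[tuple[str, str, str]]) -> list[str]:
    """Hypothetical helper: ids of candidate nodes that appear in no edge, as source or target.

    nodes maps node id -> label; edges are (source_id, rel_type, target_id) triples.
    """
    connected: set[str] = set()
    for source, _rel_type, target in edges:
        connected.add(source)
        connected.add(target)
    return sorted(
        node_id
        for node_id, label in nodes.items()
        if label in ORPHAN_LABELS and node_id not in connected
    )


if __name__ == "__main__":
    nodes = {"aldric": "Person", "elara": "Person", "lyssa": "Person", "merchants": "Faction"}
    edges = [("aldric", "SPOUSE_OF", "elara")]
    print(find_orphan_ids(nodes, edges))  # ['lyssa']
```

Note that `merchants` is also unconnected here but is not reported, because `Faction` is outside the candidate labels.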

## 4. Ontology violations — rule-driven checks

An `:OntologyViolation` records a `(:Person)` node that fails an active
`:OntologyRule`. Rules are themselves Neo4j nodes (`(:OntologyRule)`) with
a `predicate` (a short Python expression) and a `description`. The
consistency plugin runs each rule over the world and materializes a
violation node for every person that fails it.

```bash
curl -s -X POST http://localhost:8765/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc":"2.0","id":1,"method":"tools/call",
    "params":{"name":"find_ontology_violations","arguments":{"world_id":"default"}}
  }'
```

```json
{
  "violations": [
    {
      "id": "ov_maric_no_died",
      "label": "OntologyViolation",
      "severity": "warn",
      "status": "open",
      "details": "Person 'Maric Vyr' (born 85) has no death year; rule 'persons_born_before_280_must_die' applies.",
      "detected_at": "2026-06-16T23:04:51.238226Z",
      "entity_id": "maric",
      "rule_id": "persons_born_before_280_must_die"
    },
    {
      "id": "ov_theron_no_died",
      "label": "OntologyViolation",
      "severity": "warn",
      "status": "open",
      "details": "Person 'Theron Ashveil' (born 10) has no death year; rule 'persons_born_before_280_must_die' applies.",
      "detected_at": "2026-06-16T23:04:51.238226Z",
      "entity_id": "theron",
      "rule_id": "persons_born_before_280_must_die"
    }
  ],
  "count": 2
}
```

The rule `persons_born_before_280_must_die` is a world-builder convention:
in the default world's narrative, anyone born before the Age of Iron
(before year 280) must have a recorded death year, because the present
day is well past 280 and a living person from the 1st Age is
unprecedented. Maric (born 85) and Theron (born 10) are intentionally
left without a death year in the seed: they are long-lived figures who
are still alive in the present. The two violations are *expected* by the
world-builder but flagged so the LLM (or operator) knows the rule is being broken.
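
Since the `predicate` is described as a short Python expression, one plausible way to evaluate it is to `compile(..., "eval")` it once and `eval` it per person, with that person's fields as local names. This is a sketch under that assumption only. The predicate text `born >= 280 or died is not None` is a guess at what `persons_born_before_280_must_die` encodes, and `evaluate_rule` is not the plugin's function.

```python
from typing import Any


def evaluate_rule(rule_id: str, predicate: str, persons: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Hypothetical evaluator: one violation record per person whose predicate is falsy.

    Each person's fields (e.g. born, died) become local names inside the expression.
    Emptying __builtins__ narrows what the expression can call, but it is NOT a security
    sandbox: only evaluate predicates written by trusted world-builders.
    """
    code = compile(predicate, f"<rule {rule_id}>", "eval")
    violations = []
    for person in persons:
        if not eval(code, {"__builtins__": {}}, dict(person)):
            violations.append({"entity_id": person["id"], "rule_id": rule_id})
    return violations


if __name__ == "__main__":
    persons = [
        {"id": "maric", "born": 85, "died": None},
        {"id": "theron", "born": 10, "died": None},
        {"id": "aldric", "born": 220, "died": 285},
    ]
    # Assumed predicate text; the real rule node may phrase it differently.
    rule = "born >= 280 or died is not None"
    print(evaluate_rule("persons_born_before_280_must_die", rule, persons))
```

Compiling once per rule rather than once per person keeps the loop cheap and surfaces a syntax error in the rule before any person is checked.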

## How the seed owns the violation math

`seed.py:seed_violations` is the Python source of truth for what the tools
return. Five pre-materialized violation nodes (one `:Contradiction`, one
`:Anachronism`, one `:Orphan`, and two `:OntologyViolation`) get `MERGE`'d
into the default world, and the tool Cypher queries read them back. If a
tool query and the seed drift apart, the detection surface in `seed.py` is
the one to trust; the queries are a defensive layer so a missing seed row
doesn't silently hide a violation.

## Files

- `plugins/consistency.py` — the four tools
- `seed.py:seed_violations` — the 5 hand-crafted violations
- `tests/test_consistency.py` — 10 pytest cases
- `examples/test_consistency.sh` — the live E2E runner that produced
every block of output above
223
docs/LLM_CONSUMER_DEMO.md
Normal file
@@ -0,0 +1,223 @@
# LLM Consumer — Worked Example

This is a live walkthrough of the LLM consumer that landed in v2.T4.
Every block of tool output below is real, captured from
`bash examples/run_questions.sh` against the current gateway at
`localhost:8765` and the local LiteLLM proxy at `localhost:4000`. The
driver is `examples/llm_consumer.py`; the orchestrator is
`examples/run_questions.sh`; the system prompt template is
`examples/system_prompt.txt`; the per-question traces are saved under
`examples/results/*.json`. The full E2E report is in
`examples/REPORT.md`.

## What the consumer proves

The MCP gateway is a tool server. The LLM is the consumer. v2 ships a
real driver that:

1. Calls `tools/list` on the gateway.
2. Takes a natural-language question.
3. Asks an LLM (via LiteLLM) to pick the right tool(s) and
form the arguments.
4. Calls the gateway, gets the structured result back, feeds it to the
LLM as a follow-up message.
5. Returns a prose answer.

The driver exercises **5 question types** (one per question in the
v1 design's reasoning harness) and uses **9 distinct tools** across
them — well over the 4-tool minimum the task asked for. Every answer
is hand-verified against seed ground truth.
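
Steps 1 and 4 are plain JSON-RPC 2.0 POSTs to the gateway, the same requests the `curl` examples in `docs/CONSISTENCY_DEMO.md` send. A standard-library sketch of that transport is below. It leaves out the LiteLLM tool-selection step entirely, the helper names (`rpc_payload`, `rpc_call`, `call_tool`) are hypothetical rather than the ones in `examples/llm_consumer.py`, and the response is returned as decoded JSON without assuming its shape.

```python
import json
import urllib.request
from itertools import count
from typing import Any, Optional

GATEWAY_URL = "http://localhost:8765/mcp"
_request_ids = count(1)  # monotonically increasing JSON-RPC ids


def rpc_payload(method: str, params: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 request body shaped like the curl examples in the docs."""
    body: dict[str, Any] = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        body["params"] = params
    return body


def rpc_call(method: str, params: Optional[dict[str, Any]] = None,
             url: str = GATEWAY_URL, timeout: float = 30.0) -> Any:
    """POST one request to the gateway and return the decoded JSON response unchanged."""
    data = json.dumps(rpc_payload(method, params)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def call_tool(name: str, arguments: dict[str, Any], url: str = GATEWAY_URL) -> Any:
    """Step 4 in miniature: invoke one gateway tool by name (hypothetical helper)."""
    return rpc_call("tools/call", {"name": name, "arguments": arguments}, url=url)


if __name__ == "__main__":
    # Requires the gateway to be running locally.
    print(rpc_call("tools/list"))
    print(call_tool("find_contradictions", {"world_id": "default"}))
```

Keeping payload construction separate from the HTTP call makes the request shape testable without a running gateway.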

## The 5 question types

### Q1 — Entity bio (entity lookup + relations)

**Question:** *"Who is Aldric Raventhorne? Give a brief bio and his known relations."*

**Tools the LLM chose:**

```json
[
  { "tool": "entity_context", "args": { "name": "Aldric Raventhorne" } },
  { "tool": "lineage_of", "args": { "person": "Aldric Raventhorne" } }
]
```

**Answer (excerpt):**

> **Aldric Raventhorne** (born 220 – died 285) was a noble of Valdorni
> culture. He was a member of **House Vyr** and belonged to the
> **House Vyr bloodline**. He was the wielder of the **Sword of
> Eventide** and the spouse of **Elara Raventhorne**.
>
> Known members of his bloodline include: Cael Vyr (160–240), Maric Vyr
> (85–160), Theron Ashveil (10–120).

**Verdict:** PASS — every fact matches the seed (the `Person aldric` row,
`MEMBER_OF house_vyr_bloodline`, `SPOUSE_OF elara`, and
`POSSESSES sword_eventide`).

### Q2 — Time-bounded fact check

**Question:** *"Was House Vyr allied with the Merchants Guild at 2nd_age.year_230?"*

**Tools the LLM chose:**

```json
[
  { "tool": "was_true_at", "args": {
      "relation": "ALLIED_WITH",
      "subject": "House Vyr",
      "object": "Merchants Guild",
      "at_time": "2nd_age.year_230"
  } }
]
```

**Answer (excerpt):**

> Yes — House Vyr was allied with the Merchants Guild at
> 2nd_age.year_230. The ALLIED_WITH relationship was established in
> 2nd_age.year_100 and has no recorded end date, so it was in effect
> at that time.
|
||||
|
||||
**Verdict:** PASS — the model formed the time slug `2nd_age.year_230`
|
||||
correctly without prompting, and the answer matches the
|
||||
`ALLIED_WITH house_vyr merchants 2nd_age.year_100 None` relation in
|
||||
the seed.
|
||||
|
||||
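The answer's reasoning (established at `2nd_age.year_100`, no recorded end, therefore in effect at `2nd_age.year_230`) is a validity-interval check. The following is a minimal Python sketch of that check, not the `was_true_at` implementation. It assumes slugs always look like `<age>.year_<n>` and that an end date, when present, is exclusive; `parse_slug` and `holds_at` are hypothetical helpers, and comparing years across different ages is deliberately left out because this document does not describe how ages are ordered.

```python
import re
from typing import Optional, Tuple

# Assumed slug shape, inferred from examples like "2nd_age.year_230".
_SLUG = re.compile(r"^(?P<age>\w+)\.year_(?P<year>\d+)$")


def parse_slug(slug: str) -> Tuple[str, int]:
    """Split a time slug into (age, year); raise ValueError if it does not parse."""
    m = _SLUG.match(slug)
    if m is None:
        raise ValueError(f"unrecognised time slug: {slug!r}")
    return m.group("age"), int(m.group("year"))


def _year_in(age: str, slug: str) -> int:
    """Year of `slug`, which must be in `age` (cross-age ordering is out of scope)."""
    slug_age, year = parse_slug(slug)
    if slug_age != age:
        raise ValueError(f"{slug!r} is not in age {age!r}")
    return year


def holds_at(start: str, end: Optional[str], at_time: str) -> bool:
    """True if a relation valid from `start` (inclusive) to `end` (exclusive;
    None means no recorded end) is in effect at `at_time`."""
    age, t = parse_slug(at_time)
    if t < _year_in(age, start):
        return False
    return end is None or t < _year_in(age, end)
```

For example, `holds_at("2nd_age.year_100", None, "2nd_age.year_230")` is `True`, which is the Q2 answer; giving the same relation an end of `2nd_age.year_200` would make it `False`.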
### Q3 — Lineage / ancestry walk

**Question:** *"What is the lineage / ancestry of Aldric Raventhorne? Walk back as far as you can."*

**Tools the LLM chose:**

```json
[
  { "tool": "ancestors_of", "args": { "person": "Aldric Raventhorne", "generations": 10 } },
  { "tool": "entity_context", "args": { "name": "Aldric Raventhorne" } },
  { "tool": "lineage_of", "args": { "person": "Aldric Raventhorne" } }
]
```

**Answer (excerpt):**

> **Aldric Raventhorne** (b. 220, d. 285) — member of **House Vyr
> (bloodline)**. Walking back through his known ancestors:
>
> - **Mother:** **Yssa Raventhorne** (b. 165, d. unknown)
> - **Maternal grandfather:** **Cael Vyr** (b. 160, d. 240) — House Vyr
> - **Cael's father:** **Maric Vyr** (b. 85, d. 160) — House Vyr
> - **Maric's father:** **Theron Ashveil** (b. 10, d. 120) — the
>   furthest ancestor the graph records.

**Verdict:** PASS — all 4 canonical ancestor names appear in the
answer, in the right order, with the right birth/death years.

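`ancestors_of` is asked for up to 10 generations. The bounded upward walk behind that kind of tool can be sketched in Python as a breadth-first search over child-to-parent links. This is an illustration, not the `plugins/lineage.py` code: the `PARENTS` map is hand-copied from the Q3 answer rather than read from Neo4j, and `ancestors` is a hypothetical helper.

```python
from collections import deque
from typing import Dict, List, Tuple

# Child -> parents, hand-copied from the Q3 answer. In the graph these are
# PARENT_OF edges pointing from parent to child.
PARENTS: Dict[str, List[str]] = {
    "Aldric Raventhorne": ["Yssa Raventhorne"],
    "Yssa Raventhorne": ["Cael Vyr"],
    "Cael Vyr": ["Maric Vyr"],
    "Maric Vyr": ["Theron Ashveil"],
}


def ancestors(person: str, generations: int = 10) -> List[Tuple[int, str]]:
    """Walk up parent links breadth-first and return (generation, name) pairs,
    nearest generation first, going at most `generations` hops."""
    found: List[Tuple[int, str]] = []
    seen = {person}
    queue = deque([(person, 0)])
    while queue:
        name, depth = queue.popleft()
        if depth == generations:
            continue  # depth limit reached; do not expand further
        for parent in PARENTS.get(name, []):
            if parent not in seen:  # skip repeats, so cyclic data cannot loop forever
                seen.add(parent)
                found.append((depth + 1, parent))
                queue.append((parent, depth + 1))
    return found
```

`ancestors("Aldric Raventhorne")` yields Yssa, Cael, Maric, and Theron in the same order as the answer; lowering `generations` truncates the walk, and the `seen` set keeps malformed data with a `PARENT_OF` cycle from looping forever.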
### Q4 — Image recall

**Question:** *"Show me images of Aldric Raventhorne — portraits or otherwise."*

**Tools the LLM chose:**

```json
[
  { "tool": "entity_context", "args": { "name": "Aldric Raventhorne" } },
  { "tool": "recall_images", "args": { "entity_id": "aldric" } }
]
```

**Answer (excerpt):**

> Found one image for Aldric Raventhorne:
>
> **Portrait of Aldric Raventhorne, Lord of Thornwall**
> (`img_aldric_portrait`)
> - Era: 2nd Age
> - Caption: "Portrait of Aldric Raventhorne, Lord of Thornwall.
>   Middle-aged, dark hair, a scar above the left eye."
> - Tags: portrait, noble, thornwall
> - URL: `http://localhost:9000/lore-images/characters/aldric_portrait.png?...presigned...`

**Verdict:** PASS — the caption is quoted verbatim from the seed
manifest, the MinIO presigned URL is surfaced, and the model's
"found one image" claim matches the actual count.

### Q5 — Consistency scan (multi-tool fan-out)

**Question:** *"What are the open consistency issues in the world graph right now? Check contradictions, anachronisms, orphans, and ontology violations."*

**Tools the LLM chose:**

```json
[
  { "tool": "find_contradictions", "args": {} },
  { "tool": "find_anachronisms", "args": {} },
  { "tool": "find_orphans", "args": {} },
  { "tool": "find_ontology_violations", "args": {} }
]
```

**Answer (excerpt) — from the v2.T4 snapshot in time:**

> No open consistency issues detected. All four consistency checks
> returned zero violations.

**Verdict:** PASS *for the snapshot the run was taken against*: the
model did not fabricate violations and faithfully reported what the
tools returned. The v2.T4 trace was captured before v2.T5 landed. The
live T5 build surfaces 5 hand-crafted violations in the default world
(1 contradiction, 1 anachronism, 1 orphan, 2 ontology), as documented
in `docs/CONSISTENCY_DEMO.md`, so this "no issues" answer no longer
holds. To re-validate Q5 against the current T5 build, run
`bash examples/test_consistency.sh`; that script replaces the
stub-trace era of the consumer and is the authoritative Q5 evidence
going forward.

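Re-checking a Q5 answer against live tool output comes down to two steps: total the violations the four `find_*` tools return, then flag an answer that claims there are none while the total is non-zero. A minimal Python sketch, assuming the `{"count": ..., "violations": [{"id": ..., "severity": ...}]}` response shape that `examples/test_e2e.sh` parses. The five ids and severities are the ones that script expects from the T5 build; `severity_tally` and `answer_matches_tools` are hypothetical helpers, and the "no issues" regex is a word-bounded variant of the script's heuristic, not its exact pattern.

```python
import re
from collections import Counter
from typing import Dict

# Tool outputs in the shape examples/test_e2e.sh parses, filled with the
# five violations that script expects from the T5 build.
LIVE_RESULTS: Dict[str, dict] = {
    "find_contradictions": {"count": 1, "violations": [
        {"id": "c_aldric_double_membership", "severity": "error"}]},
    "find_anachronisms": {"count": 1, "violations": [
        {"id": "a_vex_at_founding", "severity": "error"}]},
    "find_orphans": {"count": 1, "violations": [
        {"id": "o_unfinished_npc", "severity": "warn"}]},
    "find_ontology_violations": {"count": 2, "violations": [
        {"id": "ov_theron_no_died", "severity": "warn"},
        {"id": "ov_maric_no_died", "severity": "warn"}]},
}

# Word boundaries keep "known issues" or "not ... issues" from reading as a denial.
NO_ISSUES = re.compile(r"\b(no|zero|none)\b.{0,30}\b(issues|problems|violations)\b",
                       re.IGNORECASE)


def severity_tally(results: Dict[str, dict]) -> Counter:
    """Count violations per severity across all consistency tools."""
    return Counter(v["severity"] for r in results.values() for v in r["violations"])


def answer_matches_tools(answer: str, results: Dict[str, dict]) -> bool:
    """True when the answer claims "no issues" exactly when the tools found none."""
    total = sum(r["count"] for r in results.values())
    return (NO_ISSUES.search(answer) is not None) == (total == 0)
```

Applied to the T4 answer and the T5 results, `answer_matches_tools` returns `False`, which is the staleness described in the verdict; the same answer against four empty results returns `True`.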
## Tool coverage

| Tool | Question(s) | # of questions |
|---|---|---|
| `entity_context` | Q1, Q3, Q4 | 3 |
| `lineage_of` | Q1, Q3 | 2 |
| `was_true_at` | Q2 | 1 |
| `ancestors_of` | Q3 | 1 |
| `recall_images` | Q4 | 1 |
| `find_contradictions` | Q5 | 1 |
| `find_anachronisms` | Q5 | 1 |
| `find_orphans` | Q5 | 1 |
| `find_ontology_violations` | Q5 | 1 |

**9 distinct tools** across **5 questions**. The model discovered
them all from `tools/list`; there is no scripted routing. The other 10
tools (`state_at`, `descendants_of`, `log_trade`, `trades_by_buyer`,
`market_price`, `register_image`, `search_images_by_caption`,
`search_images_semantic`, `embed_images`, `list_worlds`) were
exercised separately by `bash test.sh`, and the LLM correctly chose
not to invoke them for any of the 5 question types.

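The coverage table can be regenerated from the saved traces instead of by hand. A sketch, assuming each `examples/results/*.json` trace carries a `tools_called` list of `{"tool": ...}` objects (the shape `examples/test_e2e.sh` reads) and that the directory holds only the five question traces; `tool_coverage` and `load_traces` are hypothetical helpers.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List


def tool_coverage(traces: Iterable[dict]) -> Dict[str, int]:
    """Map each tool name to the number of traces (questions) that called it."""
    coverage: Dict[str, int] = {}
    for trace in traces:
        # A set, so a tool called twice in one answer still counts once for that question.
        called = {call["tool"] for call in trace.get("tools_called", [])}
        for tool in called:
            coverage[tool] = coverage.get(tool, 0) + 1
    return coverage


def load_traces(results_dir: str = "examples/results") -> List[dict]:
    """Read every *.json trace in results_dir, in filename order."""
    return [json.loads(p.read_text()) for p in sorted(Path(results_dir).glob("*.json"))]
```

`tool_coverage(load_traces())` gives the last column of the table, and `len(...)` of the result gives the distinct-tool count. Counting per trace rather than per call is the design choice that keeps the column a per-question figure.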
## How to re-run

```bash
# 1. gateway + DBs must be up
cd /root/lore-engine-poc
docker compose up -d --build
python3 seed.py

# 2. LiteLLM proxy must be running on :4000 with the configured model

# 3. drive the 5 questions
bash examples/run_questions.sh

# raw traces in examples/results/
ls examples/results/
```

## Files

- `examples/llm_consumer.py` — the driver (httpx + LiteLLM + tool loop)
- `examples/system_prompt.txt` — the system prompt the LLM sees
- `examples/run_questions.sh` — the orchestrator
- `examples/REPORT.md` — the full E2E report (verdicts, ground truth,
  per-question traces)
- `examples/test_consistency.sh` — the v2.T5 consistency-only smoke
  runner (replacement for the Q5 stub trace)
219
docs/MULTI_WORLD_DEMO.md
Normal file
@@ -0,0 +1,219 @@
# Multi-World Namespace — Worked Example

This is a live walkthrough of the world namespace that landed in v2.T6.
Every response below is real tool output from the gateway at
`localhost:8765`, running the v2 build (`4f92289` on `wt/t6-multi-world`).

## What the namespace is

The v1 POC stored every node and edge in a single graph. v2 adds a
`world_id` property on every world-scoped node and edge, and a new
`list_worlds()` admin tool. The read tools (`entity_context`,
`was_true_at`, `state_at`, `ancestors_of`, `descendants_of`,
`lineage_of`, `recall_images`, `search_images_by_caption`,
`search_images_semantic`, `trades_by_buyer`, `market_price`, and the
consistency `find_*` tools) all accept an optional `world_id` argument
that defaults to `"default"`. Write tools (`log_trade`, `register_image`,
`embed_images`) tag each row they write with the caller's `world_id`.

This lets a single Neo4j instance hold multiple parallel worlds with no
node-id collisions. `python3 seed.py` also loads a second world,
`arda_greyscale`, that mirrors the default world's shape with its own
people, factions, locations, events, and relations.

## 1. `list_worlds()` — what's loaded

```bash
curl -s -X POST http://localhost:8765/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"list_worlds","arguments":{}}}'
```

```json
[
  { "world_id": "arda_greyscale" },
  { "world_id": "default" }
]
```

The JSON blocks in this walkthrough show each tool's output after
unwrapping it from the JSON-RPC envelope (`result.content[0].text`).
Both worlds live in the same graph, and the results are ordered
newest-first by seed time.

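The curl calls in this walkthrough can also be made from Python. A minimal sketch using only the standard library, assuming the gateway accepts a plain JSON-RPC 2.0 POST exactly as the curl examples send it, and that the tool's output is JSON text in `result.content[0].text`, as `examples/test_e2e.sh` assumes. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

GATEWAY_URL = "http://localhost:8765/mcp"  # endpoint used by the curl examples


def tool_call_payload(name: str, arguments: Optional[Dict[str, Any]] = None,
                      req_id: int = 1) -> Dict[str, Any]:
    """Build the JSON-RPC 2.0 tools/call body that the curl examples send."""
    return {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


def unwrap_result(response: Dict[str, Any]) -> Any:
    """Return the tool's own JSON output, which arrives as a string in
    result.content[0].text (the same unwrapping examples/test_e2e.sh does)."""
    if "error" in response:
        raise RuntimeError(f"gateway returned an error: {response['error']}")
    return json.loads(response["result"]["content"][0]["text"])


def call_tool(name: str, arguments: Optional[Dict[str, Any]] = None,
              url: str = GATEWAY_URL, timeout: float = 10.0) -> Any:
    """POST one tools/call request to the gateway and decode the tool output."""
    body = json.dumps(tool_call_payload(name, arguments)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return unwrap_result(json.load(resp))
```

With the gateway up, `call_tool("entity_context", {"name": "Mael Greyscale", "world_id": "arda_greyscale"})` should reproduce the section 3 response; leaving `world_id` out targets `"default"`.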
## 2. The default world — Theron's bloodline

The default world is the v1 set: Theron Ashveil, Maric Vyr, Cael Vyr,
Yssa Raventhorne, Aldric Raventhorne, and Elara Raventhorne; the
factions House Vyr, the Crimson Pact, and the Merchants Guild; and the
era's events: the founding event, the Black Spire event, and the
founding of the Merchants Guild.

```bash
curl -s -X POST http://localhost:8765/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc":"2.0","id":1,"method":"tools/call",
    "params":{"name":"entity_context","arguments":{"name":"Theron Ashveil","world_id":"default"}}
  }'
```

```json
{
  "found": true,
  "name": "Theron Ashveil",
  "id": "theron",
  "world_id": "default",
  "labels": ["Person"],
  "properties": {
    "world_id": "default",
    "tier": "noble",
    "culture": "Valdorni",
    "born": 10,
    "name": "Theron Ashveil",
    "id": "theron"
  },
  "relations": [
    { "rel": "PARENT_OF", "to_id": "maric", "to": "Maric Vyr" },
    { "rel": "MEMBER_OF", "to_id": "house_vyr_bloodline", "to": "House Vyr (bloodline)" }
  ]
}
```

`Theron Ashveil` is the founding ancestor of the House Vyr bloodline.
He exists in the `default` world and is the earliest known ancestor of
Aldric (see `docs/LLM_CONSUMER_DEMO.md` Q3 for the full chain).

## 3. The greyscale world — Mael & Sira Greyscale

`arda_greyscale` is a parallel world seeded by
`seed.py:seed_greyscale_world` with its own era (`greyscale_age`), its
own faction (The Ashen Court), and its own people. The greyscale seed
intentionally uses different node ids (`mael_greyscale`,
`sira_greyscale`), so a query in one world cannot accidentally return
the other.

```bash
curl -s -X POST http://localhost:8765/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc":"2.0","id":1,"method":"tools/call",
    "params":{"name":"entity_context","arguments":{"name":"Mael Greyscale","world_id":"arda_greyscale"}}
  }'
```

```json
{
  "found": true,
  "name": "Mael Greyscale",
  "id": "mael_greyscale",
  "world_id": "arda_greyscale",
  "labels": ["Person"],
  "properties": {
    "world_id": "arda_greyscale",
    "tier": "noble",
    "culture": "Greyscale",
    "born": 220,
    "name": "Mael Greyscale",
    "id": "mael_greyscale"
  },
  "relations": [
    { "rel": "MEMBER_OF", "to_id": "ashen_court", "to": "The Ashen Court" },
    { "rel": "SPOUSE_OF", "to_id": "sira_greyscale", "to": "Sira Greyscale" }
  ]
}
```

Mael is the greyscale world's analogue of Aldric: a noble, a member of
the Ashen Court, and spouse of a Greyscale twin. Note `culture: "Greyscale"`
and `tier: "noble"`: the property names are the same as in the default
world, but the node and its values belong to `arda_greyscale` alone.

## 4. Cross-world isolation — the namespace holds

A query in world X for an entity that exists only in world Y must come
back empty. This is the test the namespace was built to pass.

### Aldric is default-only — greyscale returns empty

```bash
curl -s -X POST http://localhost:8765/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc":"2.0","id":1,"method":"tools/call",
    "params":{"name":"recall_images","arguments":{"entity_id":"aldric","world_id":"arda_greyscale"}}
  }'
```

```json
{
  "entity_id": "aldric",
  "world_id": "arda_greyscale",
  "count": 0,
  "images": []
}
```

Aldric's rows in the `image_manifest` table are tagged
`world_id="default"`. With `world_id="arda_greyscale"`, the image recall
query finds zero, exactly what the namespace promises.

### Trade log — default scope doesn't see greyscale entries (and vice versa)

```bash
curl -s -X POST http://localhost:8765/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc":"2.0","id":1,"method":"tools/call",
    "params":{"name":"market_price","arguments":{"item_id":"pale_ledger","world_id":"default"}}
  }'
```

```json
{
  "item_id": "pale_ledger",
  "sample_size": 2,
  "avg_unit_price": 500.0,
  "min_unit_price": 500.0,
  "max_unit_price": 500.0,
  "most_recent": "2026-06-16T23:04:51.276172+00:00"
}
```

The same `market_price` call against `arda_greyscale` returns zero
trades for `pale_ledger` (the greyscale world has its own item
namespace, not the default `pale_ledger`). The trades table carries
`world_id` in its primary key and the trade queries filter on it, so a
row inserted by `log_trade` with `world_id="arda_greyscale"` is
invisible to a default-scope query.

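The isolation comes from two things working together: `world_id` is part of the row's key, and every read filters on it. A self-contained sketch with Python's built-in `sqlite3` standing in for Postgres. The table layout, column names, and sample rows are hypothetical (this document only states that `world_id` is in the primary key), and `market_price` here mirrors the response fields shown above rather than the plugin's code.

```python
import sqlite3
from typing import Any, Dict

# Hypothetical trades layout; the document only says world_id is in the primary key.
SCHEMA = """
CREATE TABLE trades (
    world_id   TEXT    NOT NULL,
    trade_id   INTEGER NOT NULL,
    item_id    TEXT    NOT NULL,
    unit_price REAL    NOT NULL,
    logged_at  TEXT    NOT NULL,
    PRIMARY KEY (world_id, trade_id)
)
"""


def market_price(conn: sqlite3.Connection, item_id: str,
                 world_id: str = "default") -> Dict[str, Any]:
    """Aggregate one item's trades inside a single world (fields as in the response above)."""
    count, avg, lo, hi, recent = conn.execute(
        "SELECT COUNT(*), AVG(unit_price), MIN(unit_price), MAX(unit_price), MAX(logged_at) "
        "FROM trades WHERE item_id = ? AND world_id = ?",
        (item_id, world_id),
    ).fetchone()
    return {"item_id": item_id, "sample_size": count, "avg_unit_price": avg,
            "min_unit_price": lo, "max_unit_price": hi, "most_recent": recent}


# Illustrative rows only; not the seeded trade log.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?, ?, ?)",
    [
        ("default", 1, "pale_ledger", 500.0, "2026-06-16T23:00:00+00:00"),
        ("default", 2, "pale_ledger", 500.0, "2026-06-16T23:04:51+00:00"),
        # Reusing trade_id 1 is allowed because the key is (world_id, trade_id).
        ("arda_greyscale", 1, "ash_token", 40.0, "2026-06-16T23:05:00+00:00"),
    ],
)
```

Because the key is `(world_id, trade_id)`, both worlds can hold a trade with id 1, and `market_price(conn, "pale_ledger", "arda_greyscale")` reports a `sample_size` of 0 while the default scope sees both rows.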
## 5. How a tool uses `world_id`

The `MATCH` clauses in the world-scoped tools all include
`{id: $..., world_id: $world_id}`, so a node in the wrong world simply
doesn't match. For example, the lineage ancestors query in
`plugins/lineage.py`:

```cypher
// PARENT_OF points from parent to child, so ancestors sit at the start of the path.
MATCH path = (ancestor:Person)-[:PARENT_OF*1..10]->(a:Person {id: $person, world_id: $world_id})
WHERE ancestor.world_id = $world_id
RETURN ancestor
```

Both ends of the path are pinned to the same `world_id`, so the chain
never crosses a world boundary. The `state_at` and `entity_context`
queries follow the same pattern; the image and trade queries hit
Postgres tables that carry `world_id` in their primary key.

## 6. The world-resolution rule

Tools that take a `world_id` argument default it to `"default"`, so v1
callers keep working unchanged. The `bash test.sh` runner also passes
`world_id="default"` explicitly, to check that the opt-in form behaves
the same way. The greyscale seed is loaded by `python3 seed.py`
automatically (no extra flag), and `list_worlds()` is the operator's
view of what exists.

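The resolution rule fits in a few lines. A Python sketch of how a tool handler might apply it; `resolve_world` is hypothetical, and rejecting a `world_id` that `list_worlds()` does not report is a design choice of this sketch, since this document does not say what the gateway does with an unknown world.

```python
from typing import Any, Iterable, Mapping

DEFAULT_WORLD = "default"


def resolve_world(arguments: Mapping[str, Any], known_worlds: Iterable[str]) -> str:
    """Pick the world a tool call targets.

    A missing world_id falls back to "default", so v1-style calls that never
    pass it keep working; an explicit world_id must be one of known_worlds
    (for example, the ids returned by list_worlds()).
    """
    world_id = arguments.get("world_id", DEFAULT_WORLD)
    if not isinstance(world_id, str) or world_id not in set(known_worlds):
        raise ValueError(f"unknown world_id: {world_id!r}")
    return world_id
```

`resolve_world({}, worlds)` and `resolve_world({"world_id": "default"}, worlds)` both return `"default"`, which is the equivalence that passing `world_id="default"` explicitly is meant to confirm.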
## Files

- `seed.py:seed_greyscale_world` — the `arda_greyscale` seed
- `seed.py:_seed_images_for_world` — the per-world image manifest loader
- `plugins/lineage.py`, `plugins/world.py`, `plugins/images.py` — every
  world-scoped read tool filters on `world_id`
- `tests/test_multi_world.py` — 14 pytest cases for the namespace
- `test.sh` section 12 — the `list_worlds()` smoke check
320
examples/test_e2e.sh
Executable file
@@ -0,0 +1,320 @@
#!/usr/bin/env bash
# test_e2e.sh — End-to-end validation for v2.T7.
#
# What this proves (per task body):
#   1. The LLM consumer works end-to-end (5 question types)
#   2. The consistency tools find the right violations (5 seeded)
#   3. The LLM's answers match the seed-data ground truth
#
# Two independent layers:
#   A. Direct tool calls — each of the 4 consistency tools is invoked
#      against the live gateway and the violation count + ids are asserted
#      against the table in examples/GROUND_TRUTH.md. This proves the
#      tools work regardless of LLM behaviour.
#   B. LLM consumer — for each of 5 question types, drive the LLM through
#      the gateway, then assert the answer contains the expected facts
#      (names, dates, severities). This proves the LLM consumer works.
#
# The script exits 0 only if EVERY check passes.
set -uo pipefail

cd "$(dirname "$0")"
mkdir -p results
GATEWAY_URL="${GATEWAY_URL:-http://localhost:8765/mcp}"
LITELLM_URL="${LITELLM_URL:-http://localhost:4000/v1}"
LITELLM_MODEL="${LITELLM_MODEL:-minimax-m3}"
export GATEWAY_URL LITELLM_URL LITELLM_MODEL

# ─── bookkeeping ──────────────────────────────────────────────────────────────

fails=0
passes=0
declare -a FAIL_DETAILS=()

ok()   { passes=$((passes+1)); echo " ✓ $1"; }
fail() { fails=$((fails+1)); FAIL_DETAILS+=("$1"); echo " ✗ $1"; }

section() { echo; echo "── $* ──"; }

# ─── pre-flight ──────────────────────────────────────────────────────────────

section "pre-flight: gateway + LiteLLM reachable"
if curl -s --max-time 5 -X POST "$GATEWAY_URL" -H "Content-Type: application/json" \
    -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' >/dev/null 2>&1; then
  ok "gateway responds at $GATEWAY_URL"
else
  fail "gateway unreachable at $GATEWAY_URL"
  echo
  echo "PRE-FLIGHT FAILED — aborting."
  exit 1
fi
if curl -s --max-time 5 "$LITELLM_URL/models" >/dev/null 2>&1; then
  ok "LiteLLM responds at $LITELLM_URL"
else
  fail "LiteLLM unreachable at $LITELLM_URL"
  echo
  echo "PRE-FLIGHT FAILED — aborting."
  exit 1
fi

# ─── Layer A: direct consistency-tool calls ──────────────────────────────────

# Helper: call a tool and print its JSON output (result.content[0].text),
# unwrapped from the JSON-RPC envelope.
call_tool() {
  local name=$1
  local args=$2
  curl -s -X POST "$GATEWAY_URL" -H "Content-Type: application/json" \
    -d "{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"tools/call\",\"params\":{\"name\":\"$name\",\"arguments\":$args}}" \
    | python3 -c "import json,sys; d=json.load(sys.stdin); print(d['result']['content'][0]['text'])"
}

# Helper: assert a tool's violation count + ids.
# Args: <tool_name> <args_json> <expected_count> [expected_id_1 ...]
assert_violations() {
  local tool=$1; shift
  local args=$1; shift
  local expected_count=$1; shift
  local resp
  resp=$(call_tool "$tool" "$args")
  local got_count
  got_count=$(printf '%s' "$resp" | python3 -c "import json,sys; print(json.load(sys.stdin)['count'])")
  if [ "$got_count" = "$expected_count" ]; then
    ok "$tool: count=$got_count (expected $expected_count)"
  else
    fail "$tool: count=$got_count (expected $expected_count) — full response: $resp"
    return
  fi
  for want in "$@"; do
    if printf '%s' "$resp" | python3 -c "import json,sys; ids=[v['id'] for v in json.load(sys.stdin)['violations']]; print('YES' if '$want' in ids else 'NO')" \
        2>/dev/null | grep -q YES; then
      ok "$tool: contains id=$want"
    else
      fail "$tool: missing id=$want (full response: $resp)"
    fi
  done
}

section "Layer A — direct consistency tool calls (no LLM)"

assert_violations "find_contradictions" '{"severity":"any"}' 1 c_aldric_double_membership
assert_violations "find_anachronisms" '{"severity":"any"}' 1 a_vex_at_founding
assert_violations "find_orphans" '{}' 1 o_unfinished_npc
assert_violations "find_ontology_violations" '{"severity":"any"}' 2 ov_theron_no_died ov_maric_no_died

# Severity breakdown — task body says "the orphan being a warning, not error".
section "Layer A — severity breakdown"
contradictions_err=$(call_tool "find_contradictions" '{"severity":"error"}' | python3 -c "import json,sys; print(json.load(sys.stdin)['count'])")
contradictions_warn=$(call_tool "find_contradictions" '{"severity":"warn"}' | python3 -c "import json,sys; print(json.load(sys.stdin)['count'])")
[ "$contradictions_err" = "1" ] && ok "find_contradictions severity=error -> 1" || fail "find_contradictions severity=error -> $contradictions_err (expected 1)"
[ "$contradictions_warn" = "0" ] && ok "find_contradictions severity=warn -> 0" || fail "find_contradictions severity=warn -> $contradictions_warn (expected 0)"
anach_err=$(call_tool "find_anachronisms" '{"severity":"error"}' | python3 -c "import json,sys; print(json.load(sys.stdin)['count'])")
anach_warn=$(call_tool "find_anachronisms" '{"severity":"warn"}' | python3 -c "import json,sys; print(json.load(sys.stdin)['count'])")
[ "$anach_err" = "1" ] && ok "find_anachronisms severity=error -> 1" || fail "find_anachronisms severity=error -> $anach_err (expected 1)"
[ "$anach_warn" = "0" ] && ok "find_anachronisms severity=warn -> 0" || fail "find_anachronisms severity=warn -> $anach_warn (expected 0)"
# Orphans: 1 warn (the task body specifies this is a warn, not error).
orphan_severity=$(call_tool "find_orphans" '{}' | python3 -c "import json,sys; d=json.load(sys.stdin); print(','.join(v['severity'] for v in d['violations']))")
if [ "$orphan_severity" = "warn" ]; then
  ok "find_orphans -> severity=warn (orphan is a warn, not error)"
else
  fail "find_orphans -> severity=[$orphan_severity] (expected 'warn')"
fi
# Ontology: 2 warn
ont_warn=$(call_tool "find_ontology_violations" '{"severity":"warn"}' | python3 -c "import json,sys; print(json.load(sys.stdin)['count'])")
[ "$ont_warn" = "2" ] && ok "find_ontology_violations severity=warn -> 2" || fail "find_ontology_violations severity=warn -> $ont_warn (expected 2)"

# Total: one call per tool, counting errors and warns from the same response.
total_err=0
total_warn=0
for t in find_contradictions find_anachronisms find_orphans find_ontology_violations; do
  args='{"severity":"any"}'
  [ "$t" = "find_orphans" ] && args='{}'
  read -r e w < <(call_tool "$t" "$args" | python3 -c "import json,sys; s=[v['severity'] for v in json.load(sys.stdin)['violations']]; print(s.count('error'), s.count('warn'))")
  # A failed call leaves e/w empty; treat that as 0 so the arithmetic stays valid.
  total_err=$((total_err + ${e:-0}))
  total_warn=$((total_warn + ${w:-0}))
done
total=$((total_err+total_warn))
[ "$total" = "5" ] && ok "TOTAL violations = 5 (2 error + 3 warn)" \
  || fail "TOTAL violations = $total (expected 5)"
[ "$total_err" = "2" ] && ok "TOTAL errors = 2" || fail "TOTAL errors = $total_err (expected 2)"
[ "$total_warn" = "3" ] && ok "TOTAL warns = 3" || fail "TOTAL warns = $total_warn (expected 3)"

# ─── Layer B: LLM consumer — 5 question types ────────────────────────────────

section "Layer B — LLM consumer (5 question types)"

declare -a IDS=(
  "q1_who_is_aldric"
  "q2_was_allied_230"
  "q3_aldric_ancestors"
  "q4_images_of_aldric"
  "q5_consistency_issues"
)
declare -a QS=(
  "Who is Aldric Raventhorne? Give a brief bio and his known relations."
  "Was House Vyr allied with the Merchants Guild at 2nd_age.year_230?"
  "What is the lineage / ancestry of Aldric Raventhorne? Walk back as far as you can."
  "Show me images of Aldric Raventhorne — portraits or otherwise."
  "What are the open consistency issues in the world graph right now? Check contradictions, anachronisms, orphans, and ontology violations."
)

for i in "${!IDS[@]}"; do
  id="${IDS[$i]}"
  q="${QS[$i]}"
  echo
  echo "── question $((i+1))/5: $id ──"
  echo " Q: $q"
  if ! python3 llm_consumer.py --question-id "$id" --question "$q" \
      --out "results/${id}.json" >"/tmp/llm_consumer_${id}.log" 2>&1; then
    fail "Q$((i+1)) ($id): llm_consumer.py exited non-zero — see /tmp/llm_consumer_${id}.log"
    tail -5 "/tmp/llm_consumer_${id}.log" | sed 's/^/ /'
    continue
  fi
  tail -8 "/tmp/llm_consumer_${id}.log"
  ok "Q$((i+1)) ($id): llm_consumer.py exit=0"
done

# ─── Answer-level assertions against GROUND_TRUTH.md ─────────────────────────

section "Layer B — answer-level assertions against GROUND_TRUTH.md"

# Helper: read a trace and print its lowercased answer (possibly several lines),
# then a ---TOOLS--- separator line, then the comma-separated tools it called.
trace_info() {
  local trace_path=$1
  python3 -c "
import json
d = json.load(open('$trace_path'))
ans = (d.get('answer') or '').lower()
tools = [t['tool'] for t in d.get('tools_called', [])]
print(ans)
print('---TOOLS---')
print(','.join(tools))
"
}

# Q1: entity_context called, answer has Aldric + a known affiliation.
if [ -f "results/q1_who_is_aldric.json" ]; then
  trace=$(trace_info "results/q1_who_is_aldric.json")
  q1_ans=${trace%%$'---TOOLS---'*}
  # The tools CSV is the line *after* the ---TOOLS--- separator line.
  q1_tools=$(printf '%s\n' "$trace" | awk 'seen { print; exit } $0 == "---TOOLS---" { seen = 1 }')
  echo " Q1 tools: $q1_tools"
  if [[ "$q1_tools" == *entity_context* ]]; then ok "Q1: entity_context in tools_called"; else fail "Q1: entity_context NOT called (got: $q1_tools)"; fi
  if printf '%s' "$q1_ans" | grep -qi 'aldric'; then ok "Q1: answer mentions 'aldric'"; else fail "Q1: answer missing 'aldric'"; fi
  if printf '%s' "$q1_ans" | grep -Eqi 'vyr|thornwall|elara|valdorni|eventide'; then
    ok "Q1: answer mentions a known affiliation (Vyr/Thornwall/Elara/Valdorni/Eventide)"
  else
    fail "Q1: answer missing known affiliation"
  fi
else
  fail "Q1: results/q1_who_is_aldric.json missing (LLM consumer failed)"
fi

# Q2: was_true_at called, answer says YES/allied/true.
if [ -f "results/q2_was_allied_230.json" ]; then
  trace=$(trace_info "results/q2_was_allied_230.json")
  q2_ans=${trace%%$'---TOOLS---'*}
  q2_tools=$(printf '%s\n' "$trace" | awk 'seen { print; exit } $0 == "---TOOLS---" { seen = 1 }')
  echo " Q2 tools: $q2_tools"
  if [[ "$q2_tools" == *was_true_at* ]]; then ok "Q2: was_true_at in tools_called"; else fail "Q2: was_true_at NOT called (got: $q2_tools)"; fi
  if printf '%s' "$q2_ans" | grep -Eqi 'yes|allied|true|in force|was an alliance'; then
    ok "Q2: answer indicates YES/allied/true"
  else
    fail "Q2: answer missing YES/allied/true"
  fi
else
  fail "Q2: results/q2_was_allied_230.json missing (LLM consumer failed)"
fi

# Q3: ancestors_of called, answer names >=3 of {Theron, Maric, Cael, Yssa}.
if [ -f "results/q3_aldric_ancestors.json" ]; then
  trace=$(trace_info "results/q3_aldric_ancestors.json")
  q3_ans=${trace%%$'---TOOLS---'*}
  q3_tools=$(printf '%s\n' "$trace" | awk 'seen { print; exit } $0 == "---TOOLS---" { seen = 1 }')
  echo " Q3 tools: $q3_tools"
  if [[ "$q3_tools" == *ancestors_of* ]]; then ok "Q3: ancestors_of in tools_called"; else fail "Q3: ancestors_of NOT called (got: $q3_tools)"; fi
  found=0
  for n in theron maric cael yssa; do
    if printf '%s' "$q3_ans" | grep -qi "$n"; then found=$((found+1)); fi
  done
  if [ "$found" -ge 3 ]; then ok "Q3: answer names $found/4 canonical ancestors (need >=3)"; else fail "Q3: answer names only $found/4 canonical ancestors (need >=3)"; fi
else
  fail "Q3: results/q3_aldric_ancestors.json missing (LLM consumer failed)"
fi

# Q4: image-recall tool called, answer mentions Aldric + portrait/image/etc.
if [ -f "results/q4_images_of_aldric.json" ]; then
  trace=$(trace_info "results/q4_images_of_aldric.json")
  q4_ans=${trace%%$'---TOOLS---'*}
  q4_tools=$(printf '%s\n' "$trace" | awk 'seen { print; exit } $0 == "---TOOLS---" { seen = 1 }')
  echo " Q4 tools: $q4_tools"
  if [[ "$q4_tools" == *recall_images* || "$q4_tools" == *search_images_by_caption* || "$q4_tools" == *search_images_semantic* ]]; then
    ok "Q4: image-recall tool in tools_called"
  else
    fail "Q4: no image-recall tool called (got: $q4_tools)"
  fi
  if printf '%s' "$q4_ans" | grep -qi 'aldric'; then ok "Q4: answer mentions 'aldric'"; else fail "Q4: answer missing 'aldric'"; fi
  if printf '%s' "$q4_ans" | grep -Eqi 'portrait|image|presigned|thornwall'; then
    ok "Q4: answer mentions portrait/image/presigned/thornwall"
  else
    fail "Q4: answer missing portrait/image/presigned/thornwall"
  fi
else
  fail "Q4: results/q4_images_of_aldric.json missing (LLM consumer failed)"
fi

# Q5: all 4 consistency tools called; answer is NOT a "no issues" answer; mentions
# canonical subject names and severity.
if [ -f "results/q5_consistency_issues.json" ]; then
  trace=$(trace_info "results/q5_consistency_issues.json")
  q5_ans=${trace%%$'---TOOLS---'*}
  q5_tools=$(printf '%s\n' "$trace" | awk 'seen { print; exit } $0 == "---TOOLS---" { seen = 1 }')
  echo " Q5 tools: $q5_tools"
  missing=()
  for t in find_contradictions find_anachronisms find_orphans find_ontology_violations; do
    [[ "$q5_tools" == *"$t"* ]] || missing+=("$t")
  done
  if [ ${#missing[@]} -eq 0 ]; then
    ok "Q5: all 4 consistency tools in tools_called"
  else
    fail "Q5: missing tools: ${missing[*]}"
  fi
  # Must NOT say "no issues" — there are 5 seeded violations.
  if printf '%s' "$q5_ans" | grep -Eqi '(no|zero|none).{0,30}(open |detected |current )?(consistency |open )?(issues|problems|violations)'; then
    fail "Q5: answer incorrectly says 'no issues' — but 5 violations are seeded"
  else
    ok "Q5: answer does NOT claim 'no issues' (correct — 5 violations seeded)"
  fi
  subject_hits=0
  for n in aldric vex lyssa theron maric; do
    if printf '%s' "$q5_ans" | grep -qi "$n"; then subject_hits=$((subject_hits+1)); fi
  done
  if [ "$subject_hits" -ge 2 ]; then
    ok "Q5: answer mentions $subject_hits canonical subjects (need >=2)"
  else
    fail "Q5: answer mentions only $subject_hits canonical subjects (need >=2)"
  fi
  if printf '%s' "$q5_ans" | grep -Eqi 'severity|warn|warning|error'; then
    ok "Q5: answer acknowledges severity (warn/error)"
  else
    fail "Q5: answer does not acknowledge severity"
  fi
else
  fail "Q5: results/q5_consistency_issues.json missing (LLM consumer failed)"
fi

# ─── summary ─────────────────────────────────────────────────────────────────

echo
echo "════════════════════════════════════════════════════════════"
if [ "$fails" -eq 0 ]; then
  echo " PASS — $passes checks, 0 failures"
  echo "════════════════════════════════════════════════════════════"
  exit 0
else
  echo " FAIL — $passes checks passed, $fails FAILED:"
  for d in "${FAIL_DETAILS[@]}"; do
    echo " - $d"
  done
  echo "════════════════════════════════════════════════════════════"
  exit 1
fi