Changelog
All notable changes to lore-engine-poc are recorded here. The format
follows Keep a Changelog (Added / Changed /
Fixed / Removed / Known limitations), and this file is grouped by major
version — the v1 baseline that the POC launched with, and v2 which is the
current state.
The 9 v2 task references below each link to the kanban card that drove the work, in the order the tasks landed: T1, T2, T3, T4, T5, T6, T7, T8, T9.
[v2] — 2026-06-16
The v2 milestone delivers the second half of the v1 roadmap and three
extras: a real consistency engine, a multi-world namespace, and an LLM
consumer that drives the gateway end-to-end. v2 is what
bash test.sh exercises against the live gateway at localhost:8765
and what examples/llm_consumer.py drives from the LiteLLM proxy.
Added
- plugins/embeddings.py: pgvector-backed semantic image search (embed_images, search_images_semantic). Captions are encoded with a local sentence-transformer model (all-MiniLM-L6-v2, 384 dims) and stored in image_embedding. Queries are matched via pgvector cosine distance (<=>). Embedding runs in the background on register_image; embed_images is idempotent. v2.T2.
- plugins/consistency.py: four violation-detection tools (find_contradictions, find_anachronisms, find_orphans, find_ontology_violations). Each call returns a {violations, count} envelope. Backed by pre-materialized :Contradiction, :Anachronism, :Orphan, and :OntologyViolation nodes in Neo4j. The seed (seed.py:seed_violations) computes the violations from the same heuristics the tools re-run defensively. v2.T3 (skeleton) + v2.T5 (real rules).
- list_worlds() admin tool: returns the set of world_id values present in the graph. Read by bash test.sh section 12 and by the v2.T7 E2E validation suite. v2.T6.
- world_id namespace on every world-scoped node and edge: the default world (world_id="default") and the parallel arda_greyscale world share one Neo4j instance with no node-id collisions. Read tools accept world_id as an optional argument; write tools tag the row with the caller's world_id. v2.T6.
- Parallel world seed, arda_greyscale: seed.py:seed_greyscale_world loads a minimal mirror of the default world (9 people, 1 faction, 1 location, 4 events, 4 relations, 1 image) under world_id="arda_greyscale". Idempotent. v2.T6.
- LLM consumer (examples/llm_consumer.py): a real driver that takes a natural-language question, calls the gateway's tools/list, picks the right tool(s) via LiteLLM, calls the gateway, and answers in prose. 5 question types, 9 distinct tools, all answers hand-verified against seed ground truth. v2.T4.
- E2E validation (examples/test_e2e.sh + examples/E2E_REPORT.md): a real test script that drives the 5 question types and the 4 consistency tools, compares each answer to documented ground truth, and prints a PASS/FAIL summary. v2.T7.
- CI smoke (scripts/ci-smoke.sh + docs/SMOKE.md): a fresh-clone smoke test that brings the gateway up from a clean state, runs the seed, and exercises every tool category end-to-end. v2.T1.
- v2 docs: docs/CONSISTENCY_DEMO.md (5 hand-crafted violations from the live seed), docs/MULTI_WORLD_DEMO.md (the 2-world seed in action), docs/LLM_CONSUMER_DEMO.md (the 5 question types in detail), and this file. v2.T8.
- Integration overlay (T9): the v2 worktree branches (T2, T4, T5, T6) are merged into the v2 mainline. bash test.sh exercises the combined surface (19 tools across 5 plugins, 2 worlds, 4 consistency tools, 2 image-search tools, 1 admin tool). v2.T9.
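The cosine-distance matching mentioned for plugins/embeddings.py can be illustrated without a database. The sketch below is not the plugin's code: it is a minimal pure-Python stand-in, assuming toy 3-dimensional vectors in place of the 384-dimensional caption embeddings, and the helper names cosine_distance and rank_captions are hypothetical. It computes the same quantity pgvector's <=> operator returns (1 minus cosine similarity) and orders rows nearest first.

```python
import math


def cosine_distance(a, b):
    """Cosine distance, i.e. 1 - cosine similarity (what pgvector's <=> computes)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumes neither vector is all zeros; a real embedding never should be.
    return 1.0 - dot / (norm_a * norm_b)


def rank_captions(query_vec, rows, limit=3):
    """rows is an iterable of (image_id, embedding); returns ids, nearest first."""
    scored = sorted(rows, key=lambda row: cosine_distance(query_vec, row[1]))
    return [image_id for image_id, _ in scored[:limit]]


# Hypothetical toy data: three "caption embeddings" in 3 dimensions.
rows = [
    ("img_a", [1.0, 0.0, 0.0]),
    ("img_b", [0.0, 1.0, 0.0]),
    ("img_c", [0.9, 0.1, 0.0]),
]
print(rank_captions([1.0, 0.0, 0.0], rows, limit=2))  # ['img_a', 'img_c']
```

In the real plugin the ordering would presumably be done by Postgres in an ORDER BY clause rather than in Python; the sketch only shows what the ordering means.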
Changed
- README.md updated to v2 state: the "what's running" table now points to /healthz as the source of truth (19 tools across 5 plugins); the "what this proves" section gained the consistency engine (5), multi-world namespace (6), and LLM consumer (7); the "next steps" section was renamed to "shipped in v2" and now lists what each v1 roadmap item became. v2.T8.
- bash test.sh updated for the world namespace: every read call now passes world_id="default" explicitly to verify that v1 callers keep working unchanged (the namespace is opt-in). Added a 12th section that calls list_worlds(). v2.T6.
- seed.py grew two new stages: seed_greyscale_world (the parallel world, v2.T6) and seed_violations (5 hand-crafted violations, v2.T5). Both are idempotent and safe to re-run.
- tests/test_consistency.py and tests/test_multi_world.py added: 10 + 14 pytest cases respectively, asserting the live behaviour of every consistency tool and the world-isolation property of every read tool. v2.T5, v2.T6.
- tests/test_embeddings_*.py and tests/test_register_image_hook.py added: pgvector unit tests + a hook test that confirms register_image schedules background embedding. v2.T2.
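The opt-in namespace described above (read tools take world_id as an optional argument, so v1 callers keep working) can be sketched with an in-memory list standing in for Neo4j. Everything here is illustrative: the node ids, the NODES list, and the *_sketch function names are assumptions, not the gateway's actual signatures.

```python
DEFAULT_WORLD = "default"

# Hypothetical stand-in for world-scoped nodes. The same entity id may exist
# in both worlds because lookups are keyed by (world_id, id).
NODES = [
    {"id": "person_1", "world_id": "default", "name": "Example Person"},
    {"id": "person_1", "world_id": "arda_greyscale", "name": "Example Person (greyscale)"},
    {"id": "person_2", "world_id": "default", "name": "Default-only Person"},
]


def entity_context_sketch(entity_id, world_id=DEFAULT_WORLD):
    """Read tool: world_id is optional, so a v1-style call lands in the default world."""
    for node in NODES:
        if node["id"] == entity_id and node["world_id"] == world_id:
            return node
    return None


def list_worlds_sketch():
    """Admin tool: the distinct world_id values present in the data."""
    return sorted({node["world_id"] for node in NODES})


# A v1-style call (no world_id) and an explicit default call agree.
print(entity_context_sketch("person_1") == entity_context_sketch("person_1", "default"))
# An entity that exists only in the default world is invisible from the other one.
print(entity_context_sketch("person_2", "arda_greyscale"))
print(list_worlds_sketch())
```

Defaulting the parameter, rather than requiring it, is what lets existing callers stay unchanged while test.sh passes world_id="default" explicitly to confirm both paths return the same result.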
Known limitations (v2 → v3)
These are deliberate v2 boundaries; the v3 plan will address them:
- No world-builder UI. Everything is curl and cypher-shell. The v2 dashboard is a separate repo. v3.
- No reflective memory or behavior layer. The Stanford Generative Agents pattern (memory stream + reflection + planning) is slated to be borrowed in v3, per lore-engine/docs/16-comparison.md. v3.
- Consistency engine is rule-driven, not ML-driven. The five hand-crafted violations in v2 are seeded; an ML-derived detection surface (e.g. an LLM pass over the world summary) is a v3 item. v3.
- No refresh / cache invalidation on world reseed. If a world is re-seeded, the embeddings for any new image manifest rows are computed on the next register_image or embed_images call; old embeddings are kept. A v3 refresh tool would let an operator force a full re-embed. v3.
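To make "rule-driven" concrete, here is a minimal sketch of what one such rule could look like, returning the {violations, count} envelope the consistency tools use. The rule itself (an event dated outside a participant's lifespan) and all data are hypothetical illustrations; the actual heuristics in plugins/consistency.py and seed.py are not shown in this changelog.

```python
def find_anachronisms_sketch(people, events):
    """Flag events whose year falls outside a participant's lifespan.

    Hypothetical rule for illustration only; returns a {violations, count} envelope.
    """
    violations = []
    for event in events:
        for person_id in event["participants"]:
            person = people[person_id]
            if not person["born"] <= event["year"] <= person["died"]:
                violations.append({
                    "kind": "Anachronism",
                    "event": event["id"],
                    "person": person_id,
                    "reason": f"event year {event['year']} is outside "
                              f"{person['born']}-{person['died']}",
                })
    return {"violations": violations, "count": len(violations)}


# Hypothetical data: p1 dies in year 180 but is listed at an event in year 200.
people = {
    "p1": {"born": 100, "died": 180},
    "p2": {"born": 150, "died": 230},
}
events = [
    {"id": "e1", "year": 160, "participants": ["p1", "p2"]},
    {"id": "e2", "year": 200, "participants": ["p1", "p2"]},
]
result = find_anachronisms_sketch(people, events)
print(result["count"])                   # 1
print(result["violations"][0]["event"])  # e2
```

A rule like this is deterministic and cheap to re-run, which is the trade-off against an ML-derived pass: it only finds the classes of violation someone thought to encode.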
[v1] — 2026-06-16 (baseline)
The initial proof of concept. Five-minute goal: prove that with mock
data, we can run a multi-database backend (Neo4j + Postgres + MinIO) and
expose it all through a plugin-driven MCP gateway where adding a new
domain type is a new file in plugins/, not a Go change.
Added
- docker-compose.yml: Neo4j 5.26, Postgres (later upgraded to pgvector in v2.T2), MinIO, and the gateway container.
- seed.py: idempotent seeder for the default world (3 eras, 10 people, 3 factions, 4 locations, 4 items, 6 events, 1 lineage group, ~20 time-bounded relations, 3 trade log entries, 4 generated images).
- plugins/world.py: entity_context, was_true_at, state_at (Neo4j).
- plugins/lineage.py: ancestors_of, descendants_of, lineage_of (Neo4j).
- plugins/trade.py: log_trade, trades_by_buyer, market_price (Postgres).
- plugins/images.py: register_image, recall_images, search_images_by_caption (MinIO + Postgres + Neo4j).
- server.py: the MCP-compatible JSON-RPC gateway, auto-loading every .py file in plugins/.
- bash test.sh: the 12-section end-to-end smoke runner.
- README.md (v1): the original POC writeup.
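The "new file in plugins/" property comes from the gateway auto-loading every .py file in that directory. One way such a loader could work is sketched below with importlib. This is an assumption about the mechanism, not server.py's code: the load_plugins name and the convention that each plugin exposes a module-level TOOLS dict are both hypothetical.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_plugins(plugin_dir):
    """Import every .py file in plugin_dir and merge its TOOLS dict into one registry.

    The TOOLS convention is a hypothetical stand-in for however plugins register tools.
    """
    registry = {}
    for path in sorted(Path(plugin_dir).glob("*.py")):
        if path.name.startswith("_"):
            continue  # skip __init__.py and private helpers
        spec = importlib.util.spec_from_file_location(f"plugin_{path.stem}", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        registry.update(getattr(module, "TOOLS", {}))
    return registry


# Demo: drop a new file into a temporary plugins directory and load it.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "dice.py").write_text(
        "def roll_max(sides):\n"
        "    return sides\n"
        "\n"
        "TOOLS = {'roll_max': roll_max}\n"
    )
    tools = load_plugins(tmp)

print(sorted(tools))         # ['roll_max']
print(tools["roll_max"](6))  # 6
```

With a loader of this shape, adding a domain type means adding one file; the gateway's dispatch code does not change.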
Known limitations (v1 → v2)
- Stub consistency tools (no detection rules).
- No semantic image search.
- No LLM in the loop.
- Single world, no namespace.
All four items were addressed in v2.