lore-engine-poc/CHANGELOG.md
kanban-dev 99535a8f3a docs(v2): T8 — update README + CHANGELOG + 3 worked-example docs
- README.md: 5 plugins / 19 tools (matches /healthz); 'what this proves'
  now lists consistency engine, multi-world namespace, LLM consumer;
  'next steps' section replaced with 'shipped in v2'
- docs/CONSISTENCY_DEMO.md: 4 tools, 5 violations, all output verified
  against live bash examples/test_consistency.sh
- docs/MULTI_WORLD_DEMO.md: list_worlds() + entity_context in both
  worlds + cross-world isolation tests, all output verified live
- docs/LLM_CONSUMER_DEMO.md: 5 question types, 9 distinct tools, all
  output traced to examples/results/*.json
- CHANGELOG.md: v1 -> v2 entry, all 9 task refs (T1-T9)
- examples/test_e2e.sh: T7 E2E validation script (untracked)
2026-06-17 00:45:30 +00:00


Changelog

All notable changes to lore-engine-poc are recorded here. The format follows Keep a Changelog (Added / Changed / Fixed / Removed), with an extra Known limitations section per version. Entries are grouped by major version: v1, the baseline the POC launched with, and v2, the current state.

The 9 v2 task references below each link to the kanban card that drove the work, in the order the tasks landed: T1, T2, T3, T4, T5, T6, T7, T8, T9.


[v2] — 2026-06-16

The v2 milestone delivers the second half of the v1 roadmap and three extras: a real consistency engine, a multi-world namespace, and an LLM consumer that drives the gateway end-to-end. v2 is what bash test.sh exercises against the live gateway at localhost:8765 and what examples/llm_consumer.py drives from the LiteLLM proxy.

Added

  • plugins/embeddings.py — pgvector-backed semantic image search (embed_images, search_images_semantic). Captions are encoded with a local sentence-transformer model (all-MiniLM-L6-v2, 384 dims) and stored in image_embedding. Queries are matched via pgvector cosine distance (<=>). Background embedding on register_image; embed_images is idempotent. v2.T2.

  • plugins/consistency.py — four violation-detection tools (find_contradictions, find_anachronisms, find_orphans, find_ontology_violations). Returns a {violations, count} envelope per call. Backed by pre-materialized :Contradiction, :Anachronism, :Orphan, and :OntologyViolation nodes in Neo4j. The seed (seed.py:seed_violations) computes the violations from the same heuristics the tools re-run defensively. v2.T3 (skeleton) + v2.T5 (real rules).

  • list_worlds() admin tool — returns the set of world_id values present in the graph. Read by bash test.sh section 12 and by the v2.T7 E2E validation suite. v2.T6.

  • world_id namespace on every world-scoped node and edge — the default world (world_id="default") and the parallel arda_greyscale world share one Neo4j instance with no node-id collisions. Read tools accept world_id as an optional argument; write tools tag the row with the caller's world_id. v2.T6.

  • Parallel world seed (arda_greyscale): seed.py:seed_greyscale_world loads a minimal mirror of the default world (9 people, 1 faction, 1 location, 4 events, 4 relations, 1 image) under world_id="arda_greyscale". Idempotent. v2.T6.

  • LLM consumer (examples/llm_consumer.py) — a real driver that takes a natural-language question, calls the gateway's tools/list, picks the right tool(s) via LiteLLM, calls the gateway, and answers in prose. 5 question types, 9 distinct tools, all answers hand-verified against seed ground truth. v2.T4.

  • E2E validation (examples/test_e2e.sh + examples/E2E_REPORT.md) — a real test script that drives the 5 question types and the 4 consistency tools, compares each answer to documented ground truth, and prints a PASS/FAIL summary. v2.T7.

  • CI smoke (scripts/ci-smoke.sh + docs/SMOKE.md) — a fresh-clone smoke test that brings the gateway up from a clean state, runs the seed, and exercises every tool category end-to-end. v2.T1.

  • v2 docs: docs/CONSISTENCY_DEMO.md (5 hand-crafted violations from the live seed), docs/MULTI_WORLD_DEMO.md (the 2-world seed in action), docs/LLM_CONSUMER_DEMO.md (the 5 question types in detail), and this file. v2.T8.

  • Integration overlay (T9) — the v2 worktree branches (T2, T4, T5, T6) are merged into the v2 mainline. bash test.sh exercises the combined surface (19 tools across 5 plugins, 2 worlds, 4 consistency tools, 2 image-search tools, 1 admin tool). v2.T9.
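
The embeddings entry above says queries are matched by pgvector cosine distance (<=>), which is 1 minus the cosine similarity of the two vectors. The following is a minimal pure-Python sketch of that ranking, not the code in plugins/embeddings.py: the function names, the image ids, and the 3-dimensional toy vectors (standing in for the 384-dimensional caption embeddings) are all illustrative assumptions.

```python
import math


def cosine_distance(a, b):
    """Cosine distance as pgvector's <=> defines it: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_distance(query_vec, rows, limit=3):
    """Return (image_id, distance) pairs, nearest first.

    Mirrors the effect of ORDER BY embedding <=> query LIMIT n in SQL.
    """
    scored = [(image_id, cosine_distance(query_vec, vec)) for image_id, vec in rows]
    return sorted(scored, key=lambda pair: pair[1])[:limit]


# Hypothetical rows: toy vectors, not real caption embeddings.
rows = [
    ("img-a", [1.0, 0.0, 0.0]),
    ("img-b", [0.0, 1.0, 0.0]),
    ("img-c", [1.0, 1.0, 0.0]),
]
ranked = rank_by_distance([1.0, 0.0, 0.0], rows)
for image_id, distance in ranked:
    print(f"{image_id}: {distance:.4f}")
```

A distance of 0 means the vectors point in the same direction and 1 means they are orthogonal, so the smallest values are the best matches.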

Changed

  • README.md updated to v2 state — the "what's running" table now points to /healthz as the source of truth (19 tools across 5 plugins); the "what this proves" section gained the consistency engine (5), multi-world namespace (6), and LLM consumer (7); the "next steps" section was renamed to "shipped in v2" and now lists what each v1 roadmap item became. v2.T8.

  • bash test.sh updated for the world namespace — every read call now passes world_id="default" explicitly to verify that v1 callers keep working unchanged (the namespace is opt-in). Added a 12th section that calls list_worlds(). v2.T6.

  • seed.py grew two new stages: seed_greyscale_world (the parallel world, v2.T6) and seed_violations (5 hand-crafted violations, v2.T5). Both are idempotent and safe to re-run.

  • tests/test_consistency.py and tests/test_multi_world.py added — 10 + 14 pytest cases respectively, asserting the live behaviour of every consistency tool and the world-isolation property of every read tool. v2.T5, v2.T6.

  • tests/test_embeddings_*.py and tests/test_register_image_hook.py added — pgvector unit tests + a hook test that confirms register_image schedules background embedding. v2.T2.
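
Because the world namespace is opt-in, the same read tool can be called with or without world_id. The sketch below builds the three request bodies, assuming the gateway accepts the standard MCP tools/call shape over JSON-RPC 2.0. That assumption comes from the gateway being described as MCP-compatible; the helper name, the entity_id argument, and the "person-1" value are hypothetical and not taken from the plugin signatures.

```python
import json


def build_tool_call(name, arguments, world_id=None, request_id=1):
    """Build a JSON-RPC 2.0 request body in the MCP tools/call shape."""
    args = dict(arguments)  # copy so the caller's dict is not mutated
    if world_id is not None:
        # Namespaced callers send world_id; v1-style callers simply omit it.
        args["world_id"] = world_id
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": args},
    }


# Hypothetical argument name and value, for illustration only.
lookup = {"entity_id": "person-1"}

v1_style = build_tool_call("entity_context", lookup)
explicit = build_tool_call("entity_context", lookup, world_id="default")
greyscale = build_tool_call("entity_context", lookup, world_id="arda_greyscale")

print(json.dumps(explicit, indent=2))
```

Sending the v1_style and explicit bodies and comparing the responses is one way to check that omitting world_id behaves the same as passing "default".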

Known limitations (v2 → v3)

These are deliberate v2 boundaries; the v3 plan will address them:

  • No world-builder UI. Everything is curl and cypher-shell. The v2 dashboard is a separate repo. v3.

  • No reflective memory or behavior layer. The Stanford Generative Agents pattern (memory stream + reflection + planning) is a v3 borrow per lore-engine/docs/16-comparison.md. v3.

  • Consistency engine is rule-driven, not ML-driven. The five hand-crafted violations in v2 are seeded; an ML-derived detection surface (e.g. an LLM pass over the world summary) is a v3 item. v3.

  • No refresh / cache invalidation on world reseed. If a world is re-seeded, the embeddings for any new image manifest rows are computed on the next register_image or embed_images call; old embeddings are kept. A v3 refresh tool would let an operator force a full re-embed. v3.


[v1] — 2026-06-16 (baseline)

The initial proof of concept. Five-minute goal: prove that with mock data, we can run a multi-database backend (Neo4j + Postgres + MinIO) and expose it all through a plugin-driven MCP gateway where adding a new domain type is a new file in plugins/, not a Go change.

Added

  • docker-compose.yml — Neo4j 5.26, Postgres (later upgraded to pgvector in v2.T2), MinIO, and the gateway container.
  • seed.py — idempotent seeder for the default world (3 eras, 10 people, 3 factions, 4 locations, 4 items, 6 events, 1 lineage group, ~20 time-bounded relations, 3 trade log entries, 4 generated images).
  • plugins/world.py: entity_context, was_true_at, state_at (Neo4j).
  • plugins/lineage.py: ancestors_of, descendants_of, lineage_of (Neo4j).
  • plugins/trade.py: log_trade, trades_by_buyer, market_price (Postgres).
  • plugins/images.py: register_image, recall_images, search_images_by_caption (MinIO + Postgres + Neo4j).
  • server.py — the MCP-compatible JSON-RPC gateway, auto-loading every .py file in plugins/.
  • bash test.sh: the end-to-end smoke runner (the 12th section, which calls list_worlds(), arrived with v2.T6).
  • README.md (v1) — the original POC writeup.
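
The "new file in plugins/" property rests on the gateway importing every .py file it finds in that directory. A minimal sketch of such a loader follows. It is not the code in server.py: the load_plugins helper, the TOOLS dict convention, and the weather.py demo plugin are assumptions made for illustration.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_plugins(plugin_dir):
    """Import every .py file in plugin_dir and collect the tools each declares."""
    tools = {}
    for path in sorted(Path(plugin_dir).glob("*.py")):
        if path.name.startswith("_"):
            continue  # skip __init__.py and private helper modules
        spec = importlib.util.spec_from_file_location(f"plugin_{path.stem}", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # Hypothetical convention: a plugin exposes TOOLS, a dict of name -> callable.
        for name, func in getattr(module, "TOOLS", {}).items():
            tools[name] = func
    return tools


# Demonstrate with a throwaway plugin directory holding one made-up plugin.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "weather.py").write_text(
        "def forecast(region):\n"
        "    return {'region': region, 'forecast': 'clear'}\n"
        "\n"
        "TOOLS = {'forecast': forecast}\n"
    )
    registry = load_plugins(tmp)
    result = registry["forecast"]("north")

print(sorted(registry), result)
```

With a loader of this kind, adding a domain means dropping one more file into the directory; the loader itself does not change.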

Known limitations (v1 → v2)

  • Stub consistency tools (no detection rules).
  • No semantic image search.
  • No LLM in the loop.
  • Single world, no namespace.

All four items were addressed in v2.