
kanban-dev 99535a8f3a docs(v2): T8 — update README + CHANGELOG + 3 worked-example docs

- README.md: 5 plugins / 19 tools (matches /healthz); 'what this proves'
  now lists consistency engine, multi-world namespace, LLM consumer;
  'next steps' section replaced with 'shipped in v2'
- docs/CONSISTENCY_DEMO.md: 4 tools, 5 violations, all output verified
  against live bash examples/test_consistency.sh
- docs/MULTI_WORLD_DEMO.md: list_worlds() + entity_context in both
  worlds + cross-world isolation tests, all output verified live
- docs/LLM_CONSUMER_DEMO.md: 5 question types, 9 distinct tools, all
  output traced to examples/results/*.json
- CHANGELOG.md: v1 -> v2 entry, all 9 task refs (T1-T9)
- examples/test_e2e.sh: T7 E2E validation script (untracked)

2026-06-17 00:45:30 +00:00

7.2 KiB


Changelog

All notable changes to lore-engine-poc are recorded here. The format follows Keep a Changelog (Added / Changed / Fixed / Removed), with an extra Known limitations section per version. Entries are grouped by major version: the v1 baseline the POC launched with, and v2, the current state.

The nine v2 task references below (v2.T1 to v2.T9) each link to the kanban card that drove the work. The tasks landed in numerical order: T1, T2, T3, T4, T5, T6, T7, T8, T9.

[v2] — 2026-06-16

The v2 milestone delivers the second half of the v1 roadmap and three extras: a real consistency engine, a multi-world namespace, and an LLM consumer that drives the gateway end-to-end. v2 is the surface that bash test.sh exercises against the live gateway at localhost:8765 and that examples/llm_consumer.py drives through the LiteLLM proxy.

Added

plugins/embeddings.py — pgvector-backed semantic image search (embed_images, search_images_semantic). Captions are encoded with a local sentence-transformer model (all-MiniLM-L6-v2, 384 dimensions) and stored in image_embedding. Queries are ranked by pgvector cosine distance (the <=> operator). register_image schedules embedding in the background, and embed_images is idempotent. v2.T2.
plugins/consistency.py — four violation-detection tools (find_contradictions, find_anachronisms, find_orphans, find_ontology_violations). Returns a {violations, count} envelope per call. Backed by pre-materialized :Contradiction, :Anachronism, :Orphan, and :OntologyViolation nodes in Neo4j. The seed (seed.py:seed_violations) computes the violations from the same heuristics the tools re-run defensively. v2.T3 (skeleton) + v2.T5 (real rules).
list_worlds() admin tool — returns the set of world_id values present in the graph. Called by bash test.sh section 12 and by the v2.T7 E2E validation suite. v2.T6.
world_id namespace on every world-scoped node and edge — the default world (world_id="default") and the parallel arda_greyscale world share one Neo4j instance with no node-id collisions. Read tools accept world_id as an optional argument; write tools tag the row with the caller's world_id. v2.T6.
Parallel world seed: arda_greyscale — seed.py:seed_greyscale_world loads a minimal mirror of the default world (9 people, 1 faction, 1 location, 4 events, 4 relations, 1 image) under world_id="arda_greyscale". Idempotent. v2.T6.
LLM consumer (examples/llm_consumer.py) — a real driver that takes a natural-language question, fetches the tool catalogue with the gateway's tools/list, has the model behind LiteLLM pick the right tool(s), calls them through the gateway, and answers in prose. 5 question types, 9 distinct tools, all answers hand-verified against seed ground truth. v2.T4.
E2E validation (examples/test_e2e.sh + examples/E2E_REPORT.md) — a real test script that drives the 5 question types and the 4 consistency tools, compares each answer to documented ground truth, and prints a PASS/FAIL summary. v2.T7.
CI smoke (scripts/ci-smoke.sh + docs/SMOKE.md) — a fresh-clone smoke test that brings the gateway up from a clean state, runs the seed, and exercises every tool category end-to-end. v2.T1.
v2 docs — docs/CONSISTENCY_DEMO.md (the 5 hand-crafted violations from the live seed), docs/MULTI_WORLD_DEMO.md (the 2-world seed in action), docs/LLM_CONSUMER_DEMO.md (the 5 question types in detail), and this CHANGELOG. v2.T8.
Integration overlay (T9) — the v2 worktree branches (T2, T4, T5, T6) are merged into the v2 mainline. bash test.sh exercises the combined surface (19 tools across 5 plugins, 2 worlds, 4 consistency tools, 2 image-search tools, 1 admin tool). v2.T9.
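
The plugins/embeddings.py entry relies on pgvector's <=> operator, which returns cosine distance (1 minus cosine similarity), so smaller values mean closer captions. The following is a minimal pure-Python sketch of that ranking, not the plugin's code. It assumes captions are already encoded. Toy 3-dimensional vectors stand in for the 384-dimensional all-MiniLM-L6-v2 output, and the function names and the SQL column names in the docstring are hypothetical.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance, the quantity pgvector's <=> operator returns: 1 - cos(a, b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def search_semantic(query_vec: list[float],
                    rows: list[tuple[str, list[float]]],
                    limit: int = 3) -> list[str]:
    """Return the ids of the `limit` rows closest to query_vec, nearest first.

    The SQL equivalent would look roughly like this (column names hypothetical):
        SELECT image_id FROM image_embedding
        ORDER BY embedding <=> %(query)s
        LIMIT %(limit)s;
    """
    ranked = sorted(rows, key=lambda row: cosine_distance(query_vec, row[1]))
    return [image_id for image_id, _ in ranked[:limit]]


# Toy 3-dimensional vectors stand in for 384-dimensional caption embeddings.
rows = [
    ("img-a", [1.0, 0.0, 0.0]),
    ("img-b", [0.0, 1.0, 0.0]),
    ("img-c", [0.9, 0.1, 0.0]),
]
print(search_semantic([1.0, 0.0, 0.0], rows, limit=2))  # → ['img-a', 'img-c']
```

Identical directions score 0, orthogonal ones 1, and opposite ones 2. That is why ordering ascending by <=> puts the best caption match first.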
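
The {violations, count} envelope returned by the plugins/consistency.py tools can be illustrated with a toy anachronism rule. Everything in this sketch is hypothetical: the in-memory people/events dictionaries, the "event outside a participant's lifespan" heuristic, and the violation field names. The real tools read pre-materialized :Anachronism nodes from Neo4j rather than recomputing over Python data.

```python
from typing import Any


def find_anachronisms(people: dict[str, tuple[int, int]],
                      events: list[dict[str, Any]]) -> dict[str, Any]:
    """Hypothetical rule: flag an event whose year falls outside a participant's lifespan.

    `people` maps person id -> (birth_year, death_year); each event has
    'id', 'year', and 'participants'. Returns the {violations, count} envelope.
    """
    violations = []
    for event in events:
        for person_id in event["participants"]:
            born, died = people[person_id]
            if not born <= event["year"] <= died:
                violations.append({
                    "kind": "anachronism",
                    "event": event["id"],
                    "person": person_id,
                    "detail": f"year {event['year']} outside {born}-{died}",
                })
    return {"violations": violations, "count": len(violations)}


people = {"p1": (100, 180), "p2": (150, 230)}
events = [
    {"id": "e1", "year": 120, "participants": ["p1"]},
    {"id": "e2", "year": 200, "participants": ["p1", "p2"]},
]
result = find_anachronisms(people, events)
print(result["count"])  # → 1 (p1 died in 180, before event e2 in year 200)
```

Returning the count alongside the list lets a caller such as the E2E script check the number of violations without walking the list.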
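
The world_id namespace entry can be pictured as a read tool that filters every node and edge on one world_id. It falls back to "default" when the caller omits the argument, which is one way to keep the namespace opt-in for v1 callers. This is a minimal sketch under that assumption. The function name, the Cypher pattern, and the entity id are hypothetical. The sketch only builds the query and its parameters, so it runs without a Neo4j connection.

```python
from typing import Optional

DEFAULT_WORLD = "default"


def entity_context_query(entity_id: str,
                         world_id: Optional[str] = None) -> tuple[str, dict]:
    """Build a world-scoped Cypher read for an entity and its neighbours.

    The query shape is hypothetical. The entity, the edge, and the neighbour
    are all filtered on the same $world_id, so rows from other worlds never
    match. Omitting world_id falls back to the default world.
    """
    query = (
        "MATCH (e {id: $entity_id, world_id: $world_id})"
        "-[r {world_id: $world_id}]-"
        "(n {world_id: $world_id}) "
        "RETURN e, r, n"
    )
    params = {"entity_id": entity_id, "world_id": world_id or DEFAULT_WORLD}
    return query, params


_, params = entity_context_query("person-1")
print(params["world_id"])  # → default
_, params = entity_context_query("person-1", world_id="arda_greyscale")
print(params["world_id"])  # → arda_greyscale
```

Passing world_id as a query parameter rather than formatting it into the string keeps the Cypher text identical across worlds and avoids injection.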
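
The LLM consumer's round trip maps onto two JSON-RPC 2.0 requests from the Model Context Protocol: tools/list to fetch the catalogue, and tools/call with a tool name and its arguments. The sketch below only builds those request bodies. The helper name and the choice of list_worlds as the example call are assumptions. The gateway endpoint path is not stated here, so no HTTP call is made, and the LiteLLM step that chooses the tool is left out.

```python
import itertools
import json
from typing import Optional

_request_ids = itertools.count(1)


def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request body with an auto-incrementing id."""
    body = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        body["params"] = params
    return body


# Step 1: ask the gateway for its tool catalogue (names, descriptions, input schemas).
list_request = jsonrpc_request("tools/list")

# Step 2 (not shown): hand the catalogue and the question to the model via LiteLLM.

# Step 3: invoke the tool the model picked, by name, with its arguments.
call_request = jsonrpc_request("tools/call", {"name": "list_worlds", "arguments": {}})

print(json.dumps(list_request))  # → {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
print(json.dumps(call_request))
```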

Changed

README.md updated to the v2 state — the "what's running" table now points to /healthz as the source of truth (19 tools across 5 plugins); the "what this proves" section gained items 5 (consistency engine), 6 (multi-world namespace), and 7 (LLM consumer); the "next steps" section was renamed "shipped in v2" and now lists what each v1 roadmap item became. v2.T8.
bash test.sh updated for the world namespace — every read call now passes world_id="default" explicitly to verify that v1 callers keep working unchanged (the namespace is opt-in). Added a 12th section that calls list_worlds(). v2.T6.
seed.py grew two new stages — seed_greyscale_world (the parallel world, v2.T6) and seed_violations (5 hand-crafted violations, v2.T5). Both are idempotent and safe to re-run.
tests/test_consistency.py and tests/test_multi_world.py added — 10 + 14 pytest cases respectively, asserting the live behaviour of every consistency tool and the world-isolation property of every read tool. v2.T5, v2.T6.
tests/test_embeddings_*.py and tests/test_register_image_hook.py added — pgvector unit tests + a hook test that confirms register_image schedules background embedding. v2.T2.

Known limitations (v2 → v3)

These are deliberate v2 boundaries; the v3 plan will address them:

No world-builder UI. Everything is curl and cypher-shell. The v2 dashboard is a separate repo. v3.
No reflective memory or behavior layer. The Stanford Generative Agents pattern (memory stream + reflection + planning) is slated to be borrowed in v3, per lore-engine/docs/16-comparison.md. v3.
Consistency engine is rule-driven, not ML-driven. The five hand-crafted violations in v2 are seeded; an ML-derived detection surface (e.g. an LLM pass over the world summary) is a v3 item. v3.
No refresh / cache invalidation on world reseed. If a world is re-seeded, the embeddings for any new image manifest rows are computed on the next register_image or embed_images call; old embeddings are kept. A v3 refresh tool would let an operator force a full re-embed. v3.

[v1] — 2026-06-16 (baseline)

The initial proof of concept. Five-minute goal: prove that with mock data, we can run a multi-database backend (Neo4j + Postgres + MinIO) and expose it all through a plugin-driven MCP gateway where adding a new domain type is a new file in plugins/, not a Go change.

Added

docker-compose.yml — Neo4j 5.26, Postgres (later upgraded to pgvector in v2.T2), MinIO, and the gateway container.
seed.py — idempotent seeder for the default world (3 eras, 10 people, 3 factions, 4 locations, 4 items, 6 events, 1 lineage group, ~20 time-bounded relations, 3 trade log entries, 4 generated images).
plugins/world.py — entity_context, was_true_at, state_at (Neo4j).
plugins/lineage.py — ancestors_of, descendants_of, lineage_of (Neo4j).
plugins/trade.py — log_trade, trades_by_buyer, market_price (Postgres).
plugins/images.py — register_image, recall_images, search_images_by_caption (MinIO + Postgres + Neo4j).
server.py — the MCP-compatible JSON-RPC gateway, auto-loading every .py file in plugins/.
bash test.sh — the 12-section end-to-end smoke runner.
README.md (v1) — the original POC writeup.
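
The server.py entry's "auto-loading every .py file in plugins/" is what makes a new domain type a new file rather than a gateway change. A minimal sketch of that discovery step follows, using importlib. It assumes a hypothetical registration convention: each plugin defines a module-level TOOLS dict of {tool_name: callable}, and files starting with "_" are skipped. server.py's actual interface may differ.

```python
import importlib.util
import pathlib
from typing import Callable


def load_plugins(plugin_dir: str = "plugins") -> dict[str, Callable]:
    """Import every .py file in plugin_dir and merge their TOOLS registries.

    Assumes (hypothetically) that each plugin defines a module-level TOOLS
    dict of {tool_name: callable}. Files starting with "_" are skipped.
    """
    registry: dict[str, Callable] = {}
    for path in sorted(pathlib.Path(plugin_dir).glob("*.py")):
        if path.name.startswith("_"):
            continue
        # Load the file as a standalone module without touching sys.path.
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        for name, fn in getattr(module, "TOOLS", {}).items():
            if name in registry:
                raise ValueError(f"duplicate tool name {name!r} in {path.name}")
            registry[name] = fn
    return registry
```

Sorting the paths makes load order deterministic. Rejecting duplicate names stops one plugin from silently shadowing another plugin's tool.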

Known limitations (v1 → v2)

Stub consistency tools (no detection rules).
No semantic image search.
No LLM in the loop.
Single world, no namespace.

All four items were addressed in v2.
