Ports the GraphMCP-Example substrate into lore-engine-poc:

- 8 Go workers under workers/ (discord-connector, discord-filter, lore-watcher, ingestion-worker, entity-extractor, lore-extractor, encounter-processor, mcp-server), each with Dockerfile + go.mod
- 3 Go unit-test files (encounter-processor, ingestion-worker, lore-extractor); the other 5 workers rely on integration tests via the live stack
- plugins/nsc.py: thin httpx proxy from gateway to lore-mcp-server:9000, exposes all 11 inherited GraphMCP tools (input schemas verbatim from mcp-server/main.go)
- docker-compose.yml: adds lore-redis + lore-mcp-server + the 7 worker services (lore- prefix to avoid clash with other GraphMCP stacks)
- verify-merge.sh (171 LOC, 7 pass conditions) + docs/VERIFICATION.md
- tests/contract/test_graphmcp_tool_contracts.py (15 tests; skipped when stack is down; TDD pattern, becomes active once docker compose up brings the stack)
- README.md + test.sh updated for the merged service inventory

Leader notes (2026-06-27 03:50):

- Worker self-blocked review-required after 2 runs (run #7 hit 120/120 iteration budget; run #8 staged 40 files and reported shippable).
- Tests are SKIPPED until docker compose up; worker chose that pattern over mocking (consistent with the lore-engine-poc project convention). To activate, run `docker compose up -d --build && pytest tests/contract/`.
- File Scope reconciliation: story said gateway/plugins/nsc/__init__.py; worker shipped plugins/nsc.py (flat file). Justified by the existing plugins/ convention in lore-engine-poc (server.py glob("*.py")). A future PR could split nsc into a package once server.py learns __init__.py discovery.
- nsc plugin exposes 11 tools (not 8): the AC said "8" but the worker enumerated all 11 tools present in mcp-server/main.go. The encounter-specific 3 tools (list_encounters, search_encounters, get_encounter) were included for consistency. Story AC #2 reads "≥ 8 GraphMCP tools" so this exceeds AC.

Refs: S2-phase-1-substrate-merge, milestone #64 P1 — Substrate merge
292 lines
13 KiB
Markdown
# lore-engine-poc

**Proof of concept for the Lore Engine v1.1 architecture.**

Five-minute goal: prove that with mock data, we can run a multi-database backend (Neo4j for the world graph, Postgres for operational records, MinIO for blob/image storage) and expose it all through a **plugin-driven MCP gateway**, where adding a new domain type is a new file in `plugins/`, not a Go change.

## What's running

| Container | Image | Port | Role |
|---|---|---|---|
| `lore-neo4j` | `neo4j:5.26-community` | 7475 (browser), 7688 (bolt) | The world graph: people, factions, eras, events, lineage, time-bounded relations |
| `lore-postgres` | `pgvector/pgvector:pg16` | 5434 | Trade log, image manifests, audit, image embeddings |
| `lore-minio` | `minio/minio:latest` | 9002 (S3), 9003 (console) | Image blob storage |
| `lore-redis` | `redis:7-alpine` | 6379 | Stream broker: 4 streams (raw.discord / raw.messages / raw.lore / raw.encounters) |
| `lore-gateway` | built locally | 8766 (MCP JSON-RPC) | The plugin-driven gateway: 31 tools across 7 plugins |
| `lore-mcp-server` | built from `workers/mcp-server/` (Go) | 9004 | The Go MCP server backing the `nsc` plugin |
| `lore-discord-filter` | Go | - | `raw.discord` → `raw.messages` (relevance filter) |
| `lore-ingestion-worker` | Go | 8081 | `raw.messages` → Chunk + LoreDocument + `raw.lore` |
| `lore-entity-extractor` + `-2` | Go | - | `raw.messages` → Entity (LLM-backed, twin-replica arbitration) |
| `lore-lore-extractor` + `-2` | Go | - | `raw.lore` → Entity (LLM-backed) |
| `lore-encounter-processor` + `-2` | Go | - | `raw.encounters` → Encounter + WITNESSED edges |
| `lore-lore-watcher` | Go | - | Filesystem watcher → POST `/ingest/lore` |
| `lore-discord-connector` | Go | - | Discord gateway → `raw.discord` (Phase 1: disabled) |

Port remap note: the host already runs the damascus stack on 5432/5433,
7474, 7687, 9000, 9001. The lore stack uses 5434, 7475, 7688, 9002, 9003,
8766, 6379 to coexist. Containers communicate on the internal Docker
network using the in-network service names (neo4j, postgres, minio, redis).

## The plugins (this is the proof)

```
plugins/
├── world.py         # entity_context, was_true_at, state_at (Neo4j)
├── lineage.py       # ancestors_of, descendants_of, lineage_of (Neo4j)
├── trade.py         # log_trade, trades_by_buyer, market_price (Postgres)
├── images.py        # register_image, recall_images, search_images_by_caption
│                    # (MinIO + Postgres + Neo4j)
├── embeddings.py    # embed_images, search_images_semantic (Postgres + pgvector)
├── consistency.py   # find_contradictions, find_anachronisms, find_orphans,
│                    # find_ontology_violations (Neo4j)
└── nsc.py           # semantic_search, graph_traverse, get_context, get_person_profile,
                     # query_as_npc, log_encounter, get_unresolved, get_contradictions,
                     # list_encounters, search_encounters, get_encounter
                     # (NPC Scoping: proxies to the Go mcp-server; Phase 1 of the
                     # Lore Engine × GraphMCP substrate merge)
```

The gateway also exposes one admin tool for the world namespace: `list_worlds`.

Tool counts and plugin membership are reported live by the gateway itself:
`curl -s http://localhost:8766/healthz` returns the canonical list. After
Phase 1 (S2 substrate merge) the gateway reports **31 tools** across the 7
plugins above. See `docs/VERIFICATION.md` for the per-tool contract test
suite and `docs/LLM_CONSUMER_DEMO.md` for an end-to-end driver.

Each plugin is a single file with a `register(registry)` entry point. The
gateway auto-loads every `.py` file in `plugins/` at startup. **No server.py
change needed to add a new tool**: drop a new file in, restart the
container, and the new tools appear in `tools/list`.

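As a rough illustration of the `register(registry)` convention described above, a plugin file might look like the sketch below. Only the `register(registry)` entry point comes from this README. The `Registry` class, its `add_tool` method, the `dice.py` file name, and the `roll_dice` tool are hypothetical stand-ins, and the real registry API in `server.py` may differ.

```python
"""Hypothetical plugins/dice.py: a sketch of the single-file plugin shape."""
import random
from typing import Any, Callable, Dict


class Registry:
    """Stand-in for the gateway's registry (assumed API, not the real one)."""

    def __init__(self) -> None:
        self.tools: Dict[str, Dict[str, Any]] = {}

    def add_tool(self, name: str, schema: Dict[str, Any],
                 handler: Callable[..., Any]) -> None:
        # Store the JSON schema and the callable under the tool name.
        self.tools[name] = {"inputSchema": schema, "handler": handler}


def roll_dice(sides: int = 6) -> Dict[str, int]:
    """Toy tool: roll one die with the given number of sides."""
    if sides < 2:
        raise ValueError("sides must be >= 2")
    return {"sides": sides, "result": random.randint(1, sides)}


def register(registry: Registry) -> None:
    """Entry point the gateway would call once for this plugin file."""
    registry.add_tool(
        "roll_dice",
        {"type": "object",
         "properties": {"sides": {"type": "integer", "minimum": 2}}},
        roll_dice,
    )


if __name__ == "__main__":
    registry = Registry()
    register(registry)
    print(sorted(registry.tools))
```

Under these assumptions, everything a tool needs (schema, handler, registration) lives in one file, which is what lets the gateway discover it without a `server.py` change.
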
## How to run it

```bash
cd /root/lore-engine-poc
docker compose up -d --build
# wait ~30s for neo4j + postgres + minio to be ready
docker exec -i lore-neo4j cypher-shell -u neo4j -p lore-dev-password < neo4j/init.cypher
docker compose exec -T postgres psql -U lore -d lore < postgres/init.sql
python3 seed.py
# gateway is now live on :8766
```

The `seed.py` script is idempotent (uses `MERGE` and `ON CONFLICT`). It loads:

- 3 eras (1st Age, 2nd Age, Age of Iron)
- 10 people (Theron, Maric, Aldric, Elara, Cael, Yssa, Vex, Alessia, Kael, Guildmaster Torren)
- 3 factions (House Vyr, The Crimson Pact, Merchants Guild)
- 4 locations (Valdorn, Mardsville, Thornwall Keep, Black Spire Pass)
- 4 items (Sword of Eventide, The Pale Ledger, Ruby Eye of Kael, Elara's Locket)
- 6 events
- 1 lineage group
- ~20 time-bounded relations
- 3 trade log entries
- 4 generated images (portraits + landscape + battle scene) uploaded to MinIO
- 5 hand-crafted consistency violations pre-materialized as `:Contradiction`,
  `:Anachronism`, `:Orphan`, and `:OntologyViolation` nodes (see
  `docs/CONSISTENCY_DEMO.md`)
- 1 parallel world, `arda_greyscale`, a minimal mirror of the default
  world with no overlapping node ids (see `docs/MULTI_WORLD_DEMO.md`)

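The idempotency claim for `seed.py` rests on upserts. Below is a minimal sketch of the `ON CONFLICT` half, using Python's built-in `sqlite3` as a stand-in for Postgres so it runs without the stack. The `trade_log` table, its columns, the row values, and the `seed_trades` helper are assumptions for illustration, not the actual schema in `postgres/init.sql` or the code in `seed.py`.

```python
import sqlite3

# Hypothetical rows; the ids and prices are made up for this sketch.
TRADES = [
    ("trade-1", "pale_ledger", 120),
    ("trade-2", "pale_ledger", 135),
    ("trade-3", "sword_of_eventide", 900),
]


def seed_trades(conn: sqlite3.Connection) -> int:
    """Upsert the seed rows and return the resulting row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trade_log ("
        "id TEXT PRIMARY KEY, item_id TEXT NOT NULL, price INTEGER NOT NULL)"
    )
    # ON CONFLICT turns a repeated insert into an update of the same row.
    conn.executemany(
        "INSERT INTO trade_log (id, item_id, price) VALUES (?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET item_id = excluded.item_id, "
        "price = excluded.price",
        TRADES,
    )
    conn.commit()
    (count,) = conn.execute("SELECT COUNT(*) FROM trade_log").fetchone()
    return count


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    first = seed_trades(conn)
    second = seed_trades(conn)  # re-running must not duplicate rows
    print(first, second)
```

Running the seed twice leaves the row count unchanged, which is the property that makes re-running a seed script safe.
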
## Try the gateway

### List all tools

```bash
curl -s -X POST http://localhost:8766/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' | python3 -m json.tool
```

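For scripted use, the same JSON-RPC envelope can be sent from Python. This is a minimal sketch using only the standard library. The endpoint URL is taken from the curl example, while the helper names `build_request` and `call_gateway` are hypothetical and not part of the repository.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

GATEWAY_URL = "http://localhost:8766/mcp"  # same endpoint as the curl example


def build_request(method: str, params: Optional[Dict[str, Any]] = None,
                  request_id: int = 1) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 envelope like the one the curl examples send."""
    body: Dict[str, Any] = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        body["params"] = params
    return body


def call_gateway(method: str, params: Optional[Dict[str, Any]] = None,
                 url: str = GATEWAY_URL, timeout: float = 10.0) -> Dict[str, Any]:
    """POST the envelope and decode the JSON response (needs the live stack)."""
    data = json.dumps(build_request(method, params)).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the stack up, `call_gateway("tools/list")` should correspond to the curl call above, and `call_gateway("tools/call", {"name": ..., "arguments": {...}})` to the tool calls in the following sections.
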
### Look up Aldric

```bash
curl -s -X POST http://localhost:8766/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc":"2.0","id":1,"method":"tools/call",
    "params":{"name":"entity_context","arguments":{"name":"Aldric Raventhorne"}}
  }' | python3 -m json.tool
```

### Time-bounded query: was House Vyr allied with the Merchants Guild in 230 TA?

```bash
curl -s -X POST http://localhost:8766/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc":"2.0","id":1,"method":"tools/call",
    "params":{
      "name":"was_true_at",
      "arguments":{
        "relation":"ALLIED_WITH",
        "subject":"House Vyr",
        "object":"Merchants Guild",
        "at_time":"2nd_age.year_230"
      }
    }
  }' | python3 -m json.tool
```

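The interval check behind a time-bounded query like this one can be sketched in plain Python. This is an illustration only: the real tool runs a Cypher query against Neo4j, and the `Edge` fields (`era`, `valid_from`, `valid_to`), the inclusive bounds, and the parsing of the `<era>.year_<n>` format are assumptions inferred from the example argument, not the plugin's actual code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Edge:
    relation: str
    subject: str
    obj: str
    era: str
    valid_from: int          # first year the relation holds (assumed inclusive)
    valid_to: Optional[int]  # last year it holds; None means still open


def parse_at_time(at_time: str) -> Tuple[str, int]:
    """Split a value like '2nd_age.year_230' into ('2nd_age', 230)."""
    era, _, year_part = at_time.partition(".")
    if not era or not year_part.startswith("year_"):
        raise ValueError(f"unsupported at_time: {at_time!r}")
    return era, int(year_part[len("year_"):])


def was_true_at(edges: List[Edge], relation: str, subject: str,
                obj: str, at_time: str) -> bool:
    """True if any matching edge's validity window contains the given year."""
    era, year = parse_at_time(at_time)
    return any(
        e.relation == relation and e.subject == subject and e.obj == obj
        and e.era == era
        and e.valid_from <= year
        and (e.valid_to is None or year <= e.valid_to)
        for e in edges
    )


if __name__ == "__main__":
    # Made-up validity window; the seed's actual dates are not shown here.
    edges = [Edge("ALLIED_WITH", "House Vyr", "Merchants Guild", "2nd_age", 200, 250)]
    print(was_true_at(edges, "ALLIED_WITH", "House Vyr",
                      "Merchants Guild", "2nd_age.year_230"))
```

The point of the sketch is that the answer is a pure range comparison on edge properties, so no model inference is involved.
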
### Lineage: Aldric's ancestors

```bash
curl -s -X POST http://localhost:8766/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc":"2.0","id":1,"method":"tools/call",
    "params":{"name":"ancestors_of","arguments":{"person":"Aldric Raventhorne","generations":5}}
  }' | python3 -m json.tool
```

### Image recall: show me pictures of Aldric

```bash
curl -s -X POST http://localhost:8766/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc":"2.0","id":1,"method":"tools/call",
    "params":{"name":"recall_images","arguments":{"entity_id":"aldric"}}
  }' | python3 -m json.tool
```

The response includes a `presigned_url`, a MinIO URL valid for 1 hour. The LLM (or the calling client) can fetch the actual PNG from there.

### Search images by caption

```bash
curl -s -X POST http://localhost:8766/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc":"2.0","id":1,"method":"tools/call",
    "params":{"name":"search_images_by_caption","arguments":{"q":"aldric"}}
  }' | python3 -m json.tool
```

### Semantic image search (pgvector)

The embeddings plugin encodes each image's caption into a 384-dim vector
with a local sentence-transformer model (`all-MiniLM-L6-v2`) and stores it
in Postgres via the `pgvector` extension. Queries are encoded the same
way and ranked by cosine distance. Unlike `search_images_by_caption`, this
works on natural-language descriptions and doesn't require keyword overlap.

```bash
curl -s -X POST http://localhost:8766/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc":"2.0","id":1,"method":"tools/call",
    "params":{"name":"search_images_semantic","arguments":{"q":"a noble lord with a scar"}}
  }' | python3 -m json.tool
```

Returns Aldric's portrait as the top match. Try `"a sneaky thief in a hood"`
for Vex. The first call triggers a one-time ~80 MB model download on the
gateway host; the model is then cached in `~/.cache/torch`, so subsequent
calls skip the download.

If you add new images via `register_image`, embeddings are computed in
the background by a daemon thread on the gateway, so no separate job queue
is needed. Re-running `embed_images` is a no-op for images that already have
embeddings.

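The ranking step described in this section (order stored vectors by cosine distance to the query vector) can be sketched without the model or the database. The code below is an illustration under assumptions: it uses toy 3-dim vectors instead of 384-dim MiniLM embeddings, and the function names and image ids are hypothetical. In Postgres, pgvector exposes cosine distance as the `<=>` operator; whether the plugin uses exactly that operator is not confirmed here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0.0 means the vectors point the same way."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def rank_images(query: Sequence[float],
                embeddings: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (image_id, distance) pairs, nearest first."""
    scored = [(image_id, cosine_distance(query, vec))
              for image_id, vec in embeddings.items()]
    return sorted(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy vectors standing in for caption embeddings; values are made up.
    toy = {
        "aldric_portrait": [0.9, 0.1, 0.0],
        "vex_portrait": [0.1, 0.9, 0.0],
        "landscape": [0.0, 0.2, 0.9],
    }
    for image_id, dist in rank_images([1.0, 0.0, 0.0], toy):
        print(f"{image_id}: {dist:.3f}")
```

Because the query and the captions are embedded by the same model, nearby vectors correspond to similar meaning, which is why no keyword overlap is required.
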
### Market price for the Pale Ledger

```bash
curl -s -X POST http://localhost:8766/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc":"2.0","id":1,"method":"tools/call",
    "params":{"name":"market_price","arguments":{"item_id":"pale_ledger"}}
  }' | python3 -m json.tool
```

## What this proves

1. **The plugin boundary works.** A new domain type (trade, images, embeddings,
   consistency) is a new file in `plugins/`. No change to `server.py`, no change
   to docker-compose, no new container. Restart the gateway and the new tools
   are live. The `consistency` plugin (added in v2.T5) is the most recent
   example: four violation-detection tools, all in one file.

2. **Polyglot storage is real, not aspirational.** Neo4j holds the typed world
   graph. Postgres holds the time-series operational data, image manifests, and
   the `image_embedding` vectors (pgvector). MinIO holds the image bytes. Each
   store does what it's good at; the gateway composes the answers.

3. **Time is a first-class query primitive.** `was_true_at` checks time-bounded
   edges with a single Cypher query: no LLM, no inference. Year-level
   precision works against the mock data (see the `2nd_age.year_230` example above).

4. **Image recall works.** Images are stored in MinIO, linked to entities in
   Neo4j (`(:Image)-[:DEPICTS]->(:Person)`), and discoverable by entity id, by
   tag, by caption substring search, or by natural-language description via the
   `search_images_semantic` (pgvector) tool. Presigned URLs are generated on
   the fly.

5. **The consistency engine is real.** The four `find_*` tools query
   pre-materialized violation nodes in Neo4j and return structured
   `{violations, count}` envelopes, not booleans or error strings. The
   `seed.py:seed_violations` step computes the violations from the same
   heuristics (overlapping `MEMBER_OF` windows, `Person.born > event_year`,
   orphan entities, and `:OntologyRule`-driven checks) so the math is visible
   in plain Python rather than hidden in Cypher. See `docs/CONSISTENCY_DEMO.md` for
   the five hand-crafted violations the seed surfaces.

6. **Multiple worlds live in one graph.** Every world-scoped node and edge
   carries a `world_id` property, and the read tools accept a `world_id`
   argument (defaulting to `"default"`). The v2.T6 seed loads a parallel
   `arda_greyscale` world with no overlapping node ids, and
   `list_worlds()` returns both. See `docs/MULTI_WORLD_DEMO.md` for the
   worked example.

7. **An LLM can drive the whole surface.** `examples/llm_consumer.py` is a
   real driver that takes a natural-language question, calls the gateway's
   `tools/list`, picks the right tool(s), and answers in prose, all wired
   through the local LiteLLM proxy. It covers 5 question types exercising 9
   distinct tools, with all answers hand-verified against the seed. See
   `docs/LLM_CONSUMER_DEMO.md` and `examples/REPORT.md`.

8. **The world is small but real.** 10 people + 9 greyscale-world people, 6
   events, 5 images (4 default + 1 greyscale), ~20 relations: enough to
   demonstrate the architecture end-to-end across two parallel worlds.
   Scaling is a separate problem; this is the proof of shape.

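One of the heuristics named in point 5, overlapping `MEMBER_OF` windows, can be sketched as follows. This is an assumption-laden illustration rather than the code in `seed.py:seed_violations`: the `Membership` fields, the closed-interval semantics, the rule that only different factions count as a conflict, and the sample rows are all hypothetical. Only the `{violations, count}` envelope shape is taken from the text above.

```python
from itertools import combinations
from typing import Any, Dict, List, NamedTuple, Optional


class Membership(NamedTuple):
    person: str
    faction: str
    start: int
    end: Optional[int]  # None means the membership is still open


def windows_overlap(a: Membership, b: Membership) -> bool:
    """Closed-interval overlap, treating a missing end as unbounded."""
    a_end = float("inf") if a.end is None else a.end
    b_end = float("inf") if b.end is None else b.end
    return a.start <= b_end and b.start <= a_end


def find_overlapping_memberships(rows: List[Membership]) -> Dict[str, Any]:
    """Flag people who belong to two different factions at the same time."""
    violations = []
    for a, b in combinations(rows, 2):
        if (a.person == b.person and a.faction != b.faction
                and windows_overlap(a, b)):
            violations.append({
                "person": a.person,
                "factions": sorted([a.faction, b.faction]),
                "overlap_start": max(a.start, b.start),
            })
    return {"violations": violations, "count": len(violations)}


if __name__ == "__main__":
    # Made-up rows for illustration; not the seed's actual data.
    rows = [
        Membership("vex", "House Vyr", 200, 220),
        Membership("vex", "The Crimson Pact", 215, None),
        Membership("yssa", "Merchants Guild", 180, 240),
    ]
    print(find_overlapping_memberships(rows))
```

Returning a structured envelope instead of a boolean lets a caller report which person and which factions are in conflict.
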
## What's not in this POC

- **No LLM in the loop at runtime; the LLM consumer is a separate
  example.** The MCP gateway itself is a tool server; the LLM client
  (Claude, GPT, anything reachable via the LiteLLM proxy) is the consumer.
  This is intentional: the POC validates the data and tool layers, not the
  LLM reasoning. The reasoning harness is in the design docs
  (`lore-engine/docs/07-reasoning-harness.md`); `examples/llm_consumer.py`
  implements v1.1 of that harness against the live gateway.

- **No world-builder UI.** Everything is `curl` and `cypher-shell`. The UI
  is a v3 feature.

- **No reflective memory or behavior layer.** The Stanford Generative Agents
  pattern (memory stream + reflection + planning) is a v3 borrow per the
  comparison in `lore-engine/docs/16-comparison.md`.

## Shipped in v2

What was on the v1 "next steps" list, and what it became in v2:

- ~~Implement the consistency detection rules behind the 4 stub tools
  (T5).~~ **Done**: see `plugins/consistency.py` and
  `docs/CONSISTENCY_DEMO.md`. 4 tools, 5 violations surfaced from the seed.
- ~~Add the embedding-based semantic search plugin (uses the `Image.caption`
  and any future `Person.summary` text).~~ **Done**: see `plugins/embeddings.py`
  and `docs/LLM_CONSUMER_DEMO.md`. 384-dim MiniLM, pgvector cosine distance,
  background embedding on `register_image`.
- ~~Add an LLM client that consumes the gateway with the reasoning harness
  system prompt and runs the 5 question types from the design.~~ **Done**:
  see `examples/llm_consumer.py` and `examples/REPORT.md`. 5 questions, 9
  distinct tools, all hand-verified against seed ground truth.
- **v2 extras** not on the v1 list: the multi-world namespace with the
  `arda_greyscale` parallel seed (T6); the `:OntologyViolation` rule-driven
  detection in addition to the original three classes (T5); and a fresh-clone
  smoke test (`scripts/ci-smoke.sh`) that exercises the gateway end-to-end
  from a clean state (T1).