Replaces the v1.1 'world_id is a string' model with a graph-of-planes. Driven by Kay's Q2 ('worlds are planes') and the v1.2 design review.
- 17-planes.md (NEW): the plane taxonomy, the four relation types, Cypher patterns, migration from world_id, open questions
- 01-ontology.md: Plane and Setting as first-class nodes; the 6 plane-relation edge types
- 02-time-model.md: plane-aware time (entity_planes_at_time as the 6th time-aware primitive)
- 08-architecture.md: data flow for plane questions (the 'can I get from X to Y' pattern)
- 11-extensibility.md: how to add custom planes and plane-relations without code
- 12-storage-strategy.md: planes are pure graph (no Postgres/Redis/Qdrant/S3 changes)
- 14-examples.md: example 5 — full Setting + planes + Roland + Asmodeus + LLM tool calls
- README.md: v1.2 entry + doc 17 in the table of contents
The POC rebuild (T10) is the next step: migrate the existing 4 world_id values to Setting/Plane nodes and update the plugin queries to use EXISTS_IN/LOCATED_IN.
08 — Architecture
The Lore Engine is a thin layer on top of the existing GraphMCP-Example stack. Same transport, same data substrate, same worker model — extended with new schema, new workers, new tools, and a consistency engine.
System diagram
                           ┌─────────────────────────────────────┐
                           │       World-Builder Authoring       │
                           │   markdown · YAML · dialogue JSON   │
                           └──────────────┬──────────────────────┘
                                          │
             ┌────────────────────────────┼────────────────────────────┐
             │                            │                            │
             ▼                            ▼                            ▼
     ┌───────────────┐          ┌────────────────────┐       ┌────────────────────┐
     │ prose path    │          │ structured path    │       │ dialogue path      │
     │ (LLM extract) │          │ (YAML parse)       │       │ (HTTP POST)        │
     └───────┬───────┘          └─────────┬──────────┘       └─────────┬──────────┘
             │                            │                            │
             ▼                            ▼                            ▼
      Redis Streams:               Redis Streams:               Redis Streams:
         raw.lore                  raw.structured                raw.dialogue
             │                            │                            │
             ▼                            ▼                            ▼
    ┌────────────────┐          ┌────────────────────┐       ┌────────────────────┐
    │ lore-extractor │          │ structured-ingest  │       │ dialogue-processor │
    │ (Go worker)    │          │ (Go worker)        │       │ (Go worker)        │
    │ LLM extract    │          │ YAML → Cypher      │       │ Cypher write       │
    │ fuzzy          │          │ exact              │       │ exact              │
    └────────┬───────┘          └─────────┬──────────┘       └─────────┬──────────┘
             │                            │                            │
             └────────────────────────────┼────────────────────────────┘
                                          │
                                          ▼
                             ┌─────────────────────────┐
                             │  Neo4j Graph            │
                             │  ─────────────────────  │
                             │  Person, Faction,       │
                             │  Location, Era,         │
                             │  Lineage, MagicSystem,  │
                             │  Item, ...              │
                             │  ─────────────────────  │
                             │  + LoreChunk (vectors)  │
                             │  + Encounter            │
                             │  + Contradiction        │
                             │  + Anachronism          │
                             │  + Orphan               │
                             └────────────┬────────────┘
                                          │
                                          │ Cypher / Vector queries
                                          │
                             ┌────────────▼────────────┐
                             │  mcp-server (Go)        │
                             │  HTTP :9000             │
                             │  SSE :9000              │
                             │  ─────────────────────  │
                             │  8 inherited tools      │
                             │  22 new tools           │
                             └────────────┬────────────┘
                                          │ JSON-RPC over HTTP
                                          │
                             ┌────────────▼────────────┐
                             │  LLM Client             │
                             │  (Claude, gpt-4, etc.)  │
                             │  with system prompt     │
                             │  from reasoning harness │
                             └─────────────────────────┘
    Background jobs:
    ┌─────────────────────────┐     ┌─────────────────────────┐
    │  consistency-runner     │     │  consistency-monitor    │
    │  (cron, 03:00 daily)    │     │  (HTTP /run-check)      │
    │  runs all rules         │     │  ad-hoc rule run        │
    │  materializes nodes     │     │  for live verification  │
    └─────────────────────────┘     └─────────────────────────┘
Data flow: a question
1. User → LLM Client
"Did House Vyr rule Valdorn in 340 TA?"
2. LLM Client → LLM (with system prompt + active context)
LLM picks tool: was_true_at(RULED, "House Vyr", "Valdorn", "3rd_age.year_340")
3. LLM Client → MCP Server (JSON-RPC POST /mcp)
{"method": "tools/call", "params": {"name": "was_true_at", "arguments": {...}}}
4. MCP Server → Neo4j
Cypher:
MATCH (f:Faction {name: "House Vyr"})-[r:RULED]->(l:Location {name: "Valdorn"})
WHERE time_in_window("3rd_age.year_340", r.valid_from, r.valid_until)
RETURN r, r.valid_from, r.valid_until, sources
5. Neo4j → MCP Server
[{valid_from: "3rd_age.year_312", valid_until: "3rd_age.year_360", sources: [...]}]
6. MCP Server → LLM Client (JSON-RPC response)
{"was_true": true, "valid_from": "...", "valid_until": "...", "sources": [...]}
7. LLM Client → LLM
Adds the result to its context.
8. LLM → User
"Yes — House Vyr ruled Valdorn from 312 TA to 360 TA, which covers 340 TA.
Sources: chronicles-vyr.md."
End-to-end latency target: <500ms for a single-tool call, <2s for a 3-tool chain.
Data flow: a plane question (v1.2)
The plane model (per 17-planes.md) is the new substrate for multi-setting, multi-plane worlds. A question that traverses planes looks like this:
1. User → LLM Client
"Can Asmodeus reach the Material Plane from the Nine Hells? And what planes
is the Roland of Mardonari connected to in 430 TA?"
2. LLM Client → LLM (with system prompt + active context)
LLM picks tools: list_accessible_targets(plane='nine_hells'),
entity_planes_at_time(entity='roland_raventhorne', at='3rd_age.year_430')
3. LLM Client → MCP Server (JSON-RPC POST /mcp)
Two tool calls, possibly chained.
4. MCP Server → Neo4j
Cypher for accessibility:
MATCH (start:Plane {id: 'mardonari.nine_hells'})-[:ACCESSIBLE_VIA|ADJACENT_TO*1..2]->(target:Plane)
RETURN DISTINCT target.id, target.kind
Cypher for time-bounded location:
MATCH (p:Person {id: 'roland_raventhorne'})-[r:LOCATED_IN]->(plane:Plane)
WHERE time_in_window('3rd_age.year_430', r.valid_from, r.valid_until)
RETURN plane.id, plane.kind, plane.name
5. Neo4j → MCP Server
Combined response: { reachable_planes: [...], rolands_planes_at_430: [...] }
6. MCP Server → LLM Client (JSON-RPC response)
7. LLM Client → LLM
Adds both results to its context.
8. LLM → User
"Yes — Plane Shift (a 7th-level spell) connects the Nine Hells to the Material.
And in 430 TA, Roland was in the Mardonari Material Plane (pottery workshop) and
had recently returned from a 2-year stint in Voldramir (a Mardonari demiplane)."
Plane relations (REFLECTS, LAYER_OF, ADJACENT_TO, ACCESSIBLE_VIA) make these questions a single Cypher traversal instead of a multi-step string-matching exercise.
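To make the traversal semantics concrete, here is a small in-memory Go sketch of what the accessibility query computes: every plane reachable in one or two hops over `ACCESSIBLE_VIA` or `ADJACENT_TO` edges. It is not how the engine runs the query (Neo4j does that), and every plane id except `mardonari.nine_hells` is invented for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// edge is one directed plane relation, e.g. ACCESSIBLE_VIA or ADJACENT_TO.
type edge struct {
	from, kind, to string
}

// reachable returns the plane ids reachable from start in 1..maxHops hops,
// following only edges whose kind is in allowed. It mirrors the
// variable-length pattern [:ACCESSIBLE_VIA|ADJACENT_TO*1..2] in memory.
func reachable(edges []edge, start string, allowed map[string]bool, maxHops int) map[string]bool {
	adj := map[string][]string{}
	for _, e := range edges {
		if allowed[e.kind] {
			adj[e.from] = append(adj[e.from], e.to)
		}
	}
	found := map[string]bool{}
	frontier := []string{start}
	for hop := 0; hop < maxHops; hop++ {
		var next []string
		for _, p := range frontier {
			for _, q := range adj[p] {
				if !found[q] {
					found[q] = true
					next = append(next, q)
				}
			}
		}
		frontier = next
	}
	return found
}

func main() {
	// Hypothetical edges, for illustration only.
	edges := []edge{
		{"mardonari.nine_hells", "ACCESSIBLE_VIA", "mardonari.material"},
		{"mardonari.material", "ADJACENT_TO", "mardonari.feywild"},
		{"mardonari.feywild", "ADJACENT_TO", "mardonari.far_realm"}, // three hops away
		{"mardonari.nine_hells", "REFLECTS", "mardonari.mirror"},    // kind not followed
	}
	allowed := map[string]bool{"ACCESSIBLE_VIA": true, "ADJACENT_TO": true}
	found := reachable(edges, "mardonari.nine_hells", allowed, 2)

	ids := make([]string, 0, len(found))
	for id := range found {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	fmt.Println(ids) // [mardonari.feywild mardonari.material]
}
```

The hop limit is what keeps `mardonari.far_realm` out of the result, and the kind filter is what keeps `REFLECTS` from counting as a travel route.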
Data flow: structured ingestion
1. World-builder writes timeline.yaml, family_tree.yaml, gazetteer.yaml.
2. World-builder → POST /ingest/structured (multipart)
Files attached, source_type per file.
3. HTTP server (mcp-server or a thin gateway) → Redis Stream raw.structured
Each file becomes a stream entry with the YAML body and source_type tag.
4. structured-ingest worker consumes the entry.
- Reads source_type, dispatches to the right parser.
- YAML parser validates against the per-type schema.
- Materializes Cypher (MERGE nodes, MERGE edges with time bounds).
- Tags the LoreSource with source_type: <type>.
5. consistency-runner (live mode) gets the write event.
- Runs the relevant anachronism + contradiction checks on the new edges.
- Materializes any new :Contradiction / :Anachronism / :Orphan nodes.
6. Neo4j state is now consistent.
The next MCP tool call sees the new data.
Services layout (extends GraphMCP-Example)
services/
├── ingestion-worker/ [inherited] markdown chunks + embeddings
├── lore-extractor/ [inherited] LLM entity extraction
├── entity-extractor/ [inherited] message entity extraction
├── encounter-processor/ [inherited] encounter graph writes
├── lore-watcher/ [inherited] ./lore-data/ watcher
├── discord-connector/ [inherited] Discord → raw.messages
├── mcp-server/ [extended] +22 new tools
│
├── structured-ingestor/ [NEW] YAML → Cypher
│ ├── main.go # dispatcher
│ ├── parsers/timeline.go # timeline.yaml
│ ├── parsers/family_tree.go # family_tree.yaml
│ ├── parsers/gazetteer.go # gazetteer.yaml
│ ├── parsers/bestiary.go # bestiary.yaml
│ ├── parsers/magic_system.go # magic_system.yaml
│ ├── parsers/culture.go # culture.yaml
│ └── parsers/validator.go # strict YAML validation
│
├── dialogue-processor/ [NEW] POST /ingest/dialogue → Cypher
│ └── main.go
│
├── consistency-runner/ [NEW] nightly batch
│ ├── main.go # cron entry point
│ ├── rules/source_contradictions.go # category A
│ ├── rules/anachronism.go # category B
│ ├── rules/ontology.go # category C (declarative)
│ └── rules/orphans.go # category D
│
├── consistency-monitor/ [NEW] HTTP /run-check
│ └── main.go # exposes run_consistency_check tool
│
└── era-tagger/ [NEW, optional] LLM-assisted era inference
└── main.go # backfills temporal_hint on prose-extracted edges
The era-tagger is optional and used only during the bootstrap phase, when prose has been ingested without canonical era tags. It calls an LLM to backfill temporal_hint on events whose temporal_hint currently reads Era: unknown. Once the structured corpus is in, this worker can be disabled.
Schema bootstrap (new Cypher)
The full schema lives in schema/init.cypher (to be generated during the build phase). Key additions to the GraphMCP-Example schema:
// New label constraints
CREATE CONSTRAINT era_slug IF NOT EXISTS FOR (e:Era) REQUIRE e.slug IS UNIQUE;
CREATE CONSTRAINT calendar_name IF NOT EXISTS FOR (c:Calendar) REQUIRE c.name IS UNIQUE;
CREATE CONSTRAINT date_slug IF NOT EXISTS FOR (d:Date) REQUIRE d.slug IS UNIQUE;
CREATE CONSTRAINT lineage_name IF NOT EXISTS FOR (l:Lineage) REQUIRE l.name IS UNIQUE;
CREATE CONSTRAINT culture_name IF NOT EXISTS FOR (c:Culture) REQUIRE c.name IS UNIQUE;
CREATE CONSTRAINT magic_system_name IF NOT EXISTS FOR (m:MagicSystem) REQUIRE m.name IS UNIQUE;
CREATE CONSTRAINT language_name IF NOT EXISTS FOR (l:Language) REQUIRE l.name IS UNIQUE;
CREATE CONSTRAINT deity_name IF NOT EXISTS FOR (d:Deity) REQUIRE d.name IS UNIQUE;
CREATE CONSTRAINT spell_name IF NOT EXISTS FOR (s:Spell) REQUIRE s.name IS UNIQUE;
CREATE CONSTRAINT title_name IF NOT EXISTS FOR (t:Title) REQUIRE (t.name, t.domain) IS UNIQUE;
CREATE CONSTRAINT region_name IF NOT EXISTS FOR (r:Region) REQUIRE r.name IS UNIQUE;
CREATE CONSTRAINT material_name IF NOT EXISTS FOR (m:Material) REQUIRE m.name IS UNIQUE;
// New violation label constraints
CREATE CONSTRAINT anachronism_id IF NOT EXISTS FOR (a:Anachronism) REQUIRE a.id IS UNIQUE;
CREATE CONSTRAINT ontology_violation_id IF NOT EXISTS FOR (o:OntologyViolation) REQUIRE o.id IS UNIQUE;
CREATE CONSTRAINT orphan_id IF NOT EXISTS FOR (o:Orphan) REQUIRE o.id IS UNIQUE;
CREATE CONSTRAINT ontology_rule_id IF NOT EXISTS FOR (r:OntologyRule) REQUIRE r.id IS UNIQUE;
CREATE CONSTRAINT consistency_run_id IF NOT EXISTS FOR (c:ConsistencyRun) REQUIRE c.id IS UNIQUE;
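// Composite index for the most common query: "what was X like at T?"
// A Neo4j range index covers a single relationship type, so repeat this statement
// once per type: RULED, CONTROLS, LOCATED_IN, MEMBER_OF, PARTICIPATED_IN, ALLIED_WITH,
// ENEMY_OF, POSSESSES, SPOUSE_OF, PARENT_OF, WORSHIPS, PRACTICES, SPEAKS, BELONGS_TO,
// CLAIMS_TITLE. Shown for RULED; the per-type index name is illustrative.
CREATE INDEX ruled_time_window IF NOT EXISTS
FOR ()-[r:RULED]-()
ON (r.valid_from, r.valid_until);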
// Index for era-tree traversal
CREATE INDEX era_parent IF NOT EXISTS FOR (e:Era) ON (e.parent_era);
CREATE INDEX era_slug_idx IF NOT EXISTS FOR (e:Era) ON (e.slug);
// Index for violation queries
CREATE INDEX anachronism_flagged IF NOT EXISTS FOR (a:Anachronism) ON (a.flagged);
CREATE INDEX ontology_violation_flagged IF NOT EXISTS FOR (o:OntologyViolation) ON (o.flagged);
CREATE INDEX orphan_flagged IF NOT EXISTS FOR (o:Orphan) ON (o.flagged);
User-defined functions (UDFs)
Two critical UDFs. They live in schema/udfs/:
time_in_window(t, valid_from, valid_until) → bool
The heart of the time model. See 02-time-model.md for the full specification.
// pseudocode, not runnable Cypher
RETURN time_in_window('3rd_age.year_345', '3rd_age.year_340', '3rd_age.year_352');
// → true
RETURN time_in_window('3rd_age.year_360', '3rd_age.year_340', '3rd_age.year_352');
// → false
Implementation notes:
- Resolves `current` against the `:Now` config node.
- Walks the era tree for parent-era membership.
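- Treats `null` as open-ended.
- Implementation will live in a Neo4j user-defined function registered at startup. Neo4j loads user-defined functions as JVM plugins, so this means Java or another JVM language rather than Python.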
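The core comparison can be sketched outside Neo4j in Go. This is a deliberately reduced illustration: it handles only slugs of the form `<era>.year_<n>`, refuses to compare across eras instead of walking the era tree, does not resolve `current`, and assumes both bounds are inclusive (the full rules are in 02-time-model.md). A nil pointer stands in for `null`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSlug splits a slug such as "3rd_age.year_345" into its era
// ("3rd_age") and year (345). Only this one slug shape is handled.
func parseSlug(slug string) (string, int, error) {
	era, rest, ok := strings.Cut(slug, ".year_")
	if !ok {
		return "", 0, fmt.Errorf("unsupported time slug %q", slug)
	}
	year, err := strconv.Atoi(rest)
	if err != nil {
		return "", 0, fmt.Errorf("bad year in %q: %w", slug, err)
	}
	return era, year, nil
}

// yearInEra parses a bound and insists it belongs to the given era,
// because this sketch does not walk the era tree.
func yearInEra(bound, era string) (int, error) {
	e, y, err := parseSlug(bound)
	if err != nil {
		return 0, err
	}
	if e != era {
		return 0, fmt.Errorf("cross-era comparison (%s vs %s) is out of scope here", e, era)
	}
	return y, nil
}

// timeInWindow reports whether t falls inside [validFrom, validUntil].
// A nil bound stands in for null and is treated as open-ended.
// Inclusive bounds are an assumption of this sketch.
func timeInWindow(t string, validFrom, validUntil *string) (bool, error) {
	era, year, err := parseSlug(t)
	if err != nil {
		return false, err
	}
	if validFrom != nil {
		from, err := yearInEra(*validFrom, era)
		if err != nil {
			return false, err
		}
		if year < from {
			return false, nil
		}
	}
	if validUntil != nil {
		until, err := yearInEra(*validUntil, era)
		if err != nil {
			return false, err
		}
		if year > until {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	from, until := "3rd_age.year_340", "3rd_age.year_352"
	in, _ := timeInWindow("3rd_age.year_345", &from, &until)
	out, _ := timeInWindow("3rd_age.year_360", &from, &until)
	open, _ := timeInWindow("3rd_age.year_360", &from, nil)
	fmt.Println(in, out, open) // true false true
}
```

The first two results match the pseudocode examples above; the third shows the open-ended case, where a missing `valid_until` never excludes a later date.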
time_windows_overlap(from_a, until_a, from_b, until_b) → bool
For contradiction detection. Two windows overlap unless one ends before the other starts.
RETURN time_windows_overlap('3rd_age.year_340', '3rd_age.year_360',
'3rd_age.year_345', '3rd_age.year_370');
// → true
The time_in_window and time_windows_overlap UDFs are the only ones the engine needs. Everything else is regular Cypher.
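The overlap rule stated above ("two windows overlap unless one ends before the other starts") reduces to one boolean expression. The Go sketch below shows it under simplifying assumptions: all four bounds are taken to lie in the same era (the era prefix is ignored), an empty string stands in for `null`, and windows that merely touch at an endpoint count as overlapping.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// bound converts a slug like "3rd_age.year_340" to its year. An empty
// string stands in for null and maps to the supplied open-ended value.
// The era prefix is ignored: same-era input is an assumption of this sketch.
func bound(slug string, open int) (int, error) {
	if slug == "" {
		return open, nil
	}
	_, rest, ok := strings.Cut(slug, ".year_")
	if !ok {
		return 0, fmt.Errorf("unsupported time slug %q", slug)
	}
	return strconv.Atoi(rest)
}

// timeWindowsOverlap reports whether [fromA, untilA] and [fromB, untilB]
// share at least one year.
func timeWindowsOverlap(fromA, untilA, fromB, untilB string) (bool, error) {
	slugs := [4]string{fromA, untilA, fromB, untilB}
	opens := [4]int{math.MinInt, math.MaxInt, math.MinInt, math.MaxInt}
	var y [4]int
	for i := range slugs {
		v, err := bound(slugs[i], opens[i])
		if err != nil {
			return false, err
		}
		y[i] = v
	}
	aFrom, aUntil, bFrom, bUntil := y[0], y[1], y[2], y[3]
	// Disjoint only when one window ends before the other starts.
	disjoint := aUntil < bFrom || bUntil < aFrom
	return !disjoint, nil
}

func main() {
	ok, err := timeWindowsOverlap("3rd_age.year_340", "3rd_age.year_360",
		"3rd_age.year_345", "3rd_age.year_370")
	fmt.Println(ok, err) // true <nil>
}
```

Whether touching endpoints should count as a contradiction is a modelling decision for the time model; flipping `<` to `<=` gives the exclusive variant.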
Hosting & deployment
The Lore Engine adds no new infrastructure dependencies on top of GraphMCP-Example:
- Neo4j 5.x: same instance, more schema. APOC plugin required for `apoc.create.addLabels` and `apoc.merge.relationship`. Already enabled in GraphMCP-Example.
- Redis 7: same instance, two new streams (`raw.structured`, `raw.dialogue`).
- LiteLLM proxy at `localhost:4000`: used by the lore-extractor worker. New LLM calls go through the same proxy. The `era-tagger` worker is the only new LLM consumer.
- Go MCP server: same binary, new tool registrations.
The deployment story is: add the new workers to docker-compose.yml, add the new schema migration, restart, ingest. No new infrastructure.
Resource budget (delta from GraphMCP-Example)
| Component | Memory | Notes |
|---|---|---|
| `structured-ingestor` | ~50MB | YAML parsing, no LLM. Light. |
| `consistency-runner` (nightly) | ~200MB | Cypher eval, short-lived. |
| `consistency-monitor` | ~30MB | HTTP, stateless. |
| `era-tagger` (bootstrap only) | ~200MB | LLM calls, only during initial backfill. |
| Neo4j (additional indexes) | ~+200MB | New indexes are small. |
| Total delta | ~700MB | Negligible on the 58GB host. |
The Lore Engine is cheaper to run than the existing stack it extends.
What is intentionally not in this architecture
- No real-time updates. The consistency engine runs on a schedule and on demand. Streaming consistency (per-write checks) is for a future v2.
- No LLM in the read path. Tool calls are pure Cypher. Only `summarize_chain` and `narrate_arc` call an LLM, and only on the generation side.
- No external world-data sources. Wikipedia imports, fantasy-name generators, etc. are explicitly out. The engine reasons about the world as defined by its sources, not the world at large.
- No user authentication. The MCP server is internal. The reasoning harness runs in a trusted context.
- No graph versioning. Edits overwrite. If you need history, that's a v2 feature (and a real cost in storage and Cypher complexity).
The architecture is a deliberate floor. The features we don't have are features we can add when we have evidence they're needed, not features we can remove once added.