
Hermes Agent 7d81a761f9 docs(v1.2): planes as first-class graph nodes (Setting, Plane, EXISTS_IN, REFLECTS, LAYER_OF, ADJACENT_TO, ACCESSIBLE_VIA)

Replaces the v1.1 'world_id is a string' model with a graph-of-planes. Driven by Kay's Q2 ('worlds are planes') and the v1.2 design review.

- 17-planes.md (NEW): the plane taxonomy, the four relation types, Cypher patterns, migration from world_id, open questions
- 01-ontology.md: Plane and Setting as first-class nodes; the 6 plane-relation edge types
- 02-time-model.md: plane-aware time (entity_planes_at_time as the 6th time-aware primitive)
- 08-architecture.md: data flow for plane questions (the 'can I get from X to Y' pattern)
- 11-extensibility.md: how to add custom planes and plane-relations without code
- 12-storage-strategy.md: planes are pure graph (no Postgres/Redis/Qdrant/S3 changes)
- 14-examples.md: example 5 — full Setting + planes + Roland + Asmodeus + LLM tool calls
- README.md: v1.2 entry + doc 17 in the table of contents

The POC rebuild (T10) is the next step: migrate the existing 4 world_id values to Setting/Plane nodes and update the plugin queries to use EXISTS_IN/LOCATED_IN.

2026-06-17 03:17:15 +00:00

17 KiB


12 — Storage Strategy: Which Data Goes Where

The v1 design treats Neo4j as the universal substrate. For v1.1, with polymorphic domain entities, vector embeddings, time-series events, and high-volume operational logs (trade lots, mission outcomes, campaign movements), Neo4j alone is the wrong tool for the job. Different data has different access patterns, and forcing all of it into one graph makes the graph worse even at the traversals it is good at.

This document is the storage role split: which database stores which kind of data, why, and how the engine queries across them.

The principle: pick the right tool for the access pattern

Neo4j is excellent at: relationship traversal, graph pattern matching, recursive lineage, spatial aggregation. It is mediocre at: full-text search, large property blobs, high-volume time-series ingestion, free-form JSON querying.

If we force high-volume operational data (every trade, every mission step, every army movement) into Neo4j properties, the graph bloats, indexes fragment, and queries slow down. The right move is to store each kind of data where it naturally lives, and have the engine compose across stores.

The five stores

Store	What it holds	Why
Neo4j	The world graph. People, factions, locations, lineage, era trees, type templates, domain entities with relationships, time-bounded edges, ontology rules, violation nodes.	Graph traversal is the primary access pattern.
PostgreSQL	Operational records with structured schemas. Trade logs, mission step logs, campaign event streams, audit trails, world-builder write history, session state, MCP tool call logs.	Relational, high-volume, time-series-friendly, transactional.
Qdrant (or pgvector)	Vector embeddings of `LoreChunk`, `Message`, and `DomainEntity.summary` text.	Semantic search is the primary access pattern.
Redis	Active MCP session state, per-session `world_time`, tool-call rate-limit counters, in-flight transaction state, ephemeral caches.	Sub-millisecond, ephemeral.
S3-compatible object store (MinIO)	Full text of lore sources, images, audio, large attachments, archival snapshots.	Blob storage, cheap, durable.

Existing GraphMCP-Example has Neo4j + Redis + the LLM proxy. We add PostgreSQL (the big new one), Qdrant (or pgvector, for self-contained deployments), and MinIO (or any S3 bucket).

What goes in Neo4j

The macro world graph. Anything where the LLM will say "traverse from A" or "find all X related to Y" or "is X connected to Y?" — that lives in Neo4j.

Core entities: Person, Faction, Location, Item, Era, Date, Lineage, Culture, Deity, Language, MagicSystem, Title, Region, Material.
Planes (v1.2): Setting and Plane are first-class graph nodes. Plane relations (REFLECTS, LAYER_OF, ADJACENT_TO, ACCESSIBLE_VIA) are first-class edges. The EXISTS_IN and LOCATED_IN relations between entities and planes are stored here too. See 17-planes.md.
Time-bounded relations between core entities: RULED, MEMBER_OF, LOCATED_IN, PARTICIPATED_IN, ALLIED_WITH, POSSESSES, etc. Always time-bounded. Always queryable via time_in_window.
Polymorphic domain entities (:DomainEntity with a template_id): a thieves-guild Mission, a war Campaign, a Spellbook, a TradeLot, a Ritual. The entity itself and its relations to other entities (Person, Faction, Location, other DomainEntities) live in Neo4j.
Type templates (:TypeTemplate): the YAML-defined schemas, stored as parsed JSON for the consistency engine and LLM to query.
Violation nodes (:Contradiction, :Anachronism, :Orphan, :OntologyViolation, :ConsistencyRun): the consistency engine's output.
Lore source metadata (:LoreSource): title, source_type, author, ingested_at, version. The text lives in object storage; the metadata is in Neo4j.
Indexes: all property indexes from 01-ontology.md and 08-architecture.md. Plus (:DomainEntity).type, (:DomainEntity).world_id, (:Relation).type, (:Relation).valid_from/until.
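Every time-bounded relation above is said to be queryable via time_in_window. That primitive is defined in 02-time-model.md and is not reproduced here. What follows is a hypothetical Python sketch of the interval-overlap test it implies. It assumes two things: canonical time strings have already been parsed into sortable integers, and a missing valid_until means the relation still holds. The `TimedEdge` shape and the half-open interval convention are assumptions, not the documented API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimedEdge:
    """A time-bounded relation such as RULED or MEMBER_OF (hypothetical shape)."""
    rel_type: str
    start_id: str
    end_id: str
    valid_from: int             # canonical time, assumed already parsed to a sortable int
    valid_until: Optional[int]  # None means "still valid"


def time_in_window(edge: TimedEdge, window_start: int, window_end: int) -> bool:
    """True if the edge's validity overlaps [window_start, window_end).

    Assumes half-open intervals; the real primitive in 02-time-model.md may differ.
    """
    edge_end = edge.valid_until if edge.valid_until is not None else float("inf")
    return edge.valid_from < window_end and window_start < edge_end


edges = [
    TimedEdge("RULED", "roland", "mardsville", 1200, 1230),
    TimedEdge("MEMBER_OF", "roland", "house_vyr", 1190, None),
]
# Which relations held at any point during [1225, 1240)?
active = [e.rel_type for e in edges if time_in_window(e, 1225, 1240)]
print(active)  # ['RULED', 'MEMBER_OF']
```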

What does NOT go in Neo4j:

The full text of a lore source. (Goes in S3, with a pointer in Neo4j.)
The full text of a domain entity's summary above some length threshold. (Goes in S3 with a pointer in Neo4j; short summaries stay in Neo4j and are the text that gets embedded for semantic search.)
The step-by-step log of a mission. (Goes in Postgres; only the aggregate outcome lives in Neo4j as the Mission node.)
Vector embeddings. (Go in the vector store, Qdrant or pgvector; Neo4j's vector index is OK but not great for high-volume semantic search.)
High-volume time-series operational data. (Goes in Postgres.)
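The summary split can be sketched as a placement function. Several details are hypothetical: the 2,000-character threshold, the `summaries/<entity_id>/<sha256>.txt` object-key layout, and the `place_summary` name. The design itself only says "above some length threshold" and "a pointer in Neo4j".

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold: the design only says "above some length threshold".
SUMMARY_INLINE_MAX = 2_000  # characters


@dataclass
class SummaryPlacement:
    inline_text: Optional[str]  # stored as (:DomainEntity).summary in Neo4j
    object_key: Optional[str]   # pointer stored in Neo4j; full text lives in S3/MinIO
    body: Optional[bytes]       # bytes to PUT into the object store, if any


def place_summary(entity_id: str, summary: str) -> SummaryPlacement:
    """Decide whether a summary stays in Neo4j or moves to object storage."""
    if len(summary) <= SUMMARY_INLINE_MAX:
        return SummaryPlacement(inline_text=summary, object_key=None, body=None)
    body = summary.encode("utf-8")
    digest = hashlib.sha256(body).hexdigest()
    # Content-addressed key (hypothetical layout): re-uploading identical text
    # yields the same key, so retries of the PUT are idempotent.
    key = f"summaries/{entity_id}/{digest}.txt"
    return SummaryPlacement(inline_text=None, object_key=key, body=body)


short = place_summary("mission_042", "Lift the ledger from the Mardsville counting-house.")
long_ = place_summary("mission_042", "A long account. " * 200)
print(short.object_key, long_.object_key.startswith("summaries/mission_042/"))  # None True
```

The content-addressed key is a design choice made for this sketch. It lets a saga retry the S3 step without creating duplicate objects.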

What goes in PostgreSQL

Operational records that are append-mostly, high-volume, and not primarily about relationships.

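The shape that Postgres handles well: rows of typed columns, indexed on time, with TEXT columns holding Neo4j IDs. Those are logical references, not foreign keys. Postgres cannot enforce a constraint against another database, so the cross-store consistency rules check them instead.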

Schema overview

-- World, version, and migration state
CREATE TABLE world (
  id              TEXT PRIMARY KEY,
  name            TEXT NOT NULL,
  current_era     TEXT NOT NULL,            -- canonical time string
  schema_version  TEXT NOT NULL,
  created_at      TIMESTAMPTZ DEFAULT now()
);

-- Operational event log (every meaningful state change)
CREATE TABLE lore_event (
  id              BIGSERIAL PRIMARY KEY,
  world_id        TEXT REFERENCES world(id),
  event_type      TEXT NOT NULL,            -- 'mission_logged', 'trade_completed', 'army_moved', ...
  entity_id       TEXT,                      -- DomainEntity.id from Neo4j
  entity_type     TEXT,                      -- discriminator
  occurred_at     TIMESTAMPTZ NOT NULL,
  in_fiction_time TEXT,                      -- canonical time string
  payload         JSONB NOT NULL,            -- type-specific structured data
  sources         TEXT[],
  actor_id        TEXT,                      -- who/what triggered this
  created_at      TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX ON lore_event (world_id, occurred_at DESC);
CREATE INDEX ON lore_event (entity_id, occurred_at DESC);
CREATE INDEX ON lore_event (event_type, occurred_at DESC);
CREATE INDEX ON lore_event USING GIN (payload);

-- Trade log (every lot, every transaction)
CREATE TABLE trade_log (
  id              BIGSERIAL PRIMARY KEY,
  world_id        TEXT REFERENCES world(id),
  lot_id          TEXT NOT NULL,
  item_id         TEXT,                      -- DomainEntity.id of the Item or Material
  buyer_id        TEXT,                      -- Person or Faction id
  seller_id       TEXT,
  quantity        NUMERIC,
  unit            TEXT,                      -- 'gp', 'soulglass_shards', etc.
  unit_price      NUMERIC,
  total_price     NUMERIC,
  occurred_at     TIMESTAMPTZ NOT NULL,
  in_fiction_time TEXT,
  location_id     TEXT,                      -- Location.id
  secrecy         TEXT,                      -- 'public', 'faction_internal', ...
  payload         JSONB,                     -- type-specific extras
  sources         TEXT[]
);
CREATE INDEX ON trade_log (world_id, occurred_at DESC);
CREATE INDEX ON trade_log (lot_id);
CREATE INDEX ON trade_log (buyer_id, occurred_at DESC);
CREATE INDEX ON trade_log (seller_id, occurred_at DESC);
CREATE INDEX ON trade_log (item_id, occurred_at DESC);
CREATE INDEX ON trade_log (location_id, occurred_at DESC);
CREATE INDEX ON trade_log USING GIN (payload);

-- Mission step log (per-mission timeline of events)
CREATE TABLE mission_log (
  id              BIGSERIAL PRIMARY KEY,
  mission_id      TEXT NOT NULL,             -- DomainEntity.id
  step_no         INT NOT NULL,
  step_type       TEXT,                      -- 'planned', 'briefed', 'infiltrated', 'completed', 'botched', 'paid'
  occurred_at     TIMESTAMPTZ NOT NULL,
  in_fiction_time TEXT,
  party           TEXT[],                    -- Person ids present
  location_id     TEXT,
  outcome         TEXT,
  notes           TEXT,
  sources         TEXT[],
  UNIQUE (mission_id, step_no)
);

-- War campaign movement log
CREATE TABLE campaign_event (
  id              BIGSERIAL PRIMARY KEY,
  campaign_id     TEXT NOT NULL,             -- DomainEntity.id of the Campaign
  event_type      TEXT,                      -- 'army_moved', 'battle', 'siege_begun', 'siege_lifted', ...
  occurred_at     TIMESTAMPTZ NOT NULL,
  in_fiction_time TEXT,
  faction_id      TEXT,
  location_id     TEXT,
  army_size       INT,
  casualties      INT,
  outcome         TEXT,
  payload         JSONB,
  sources         TEXT[]
);
CREATE INDEX ON campaign_event (campaign_id, occurred_at DESC);
CREATE INDEX ON campaign_event (faction_id, occurred_at DESC);
CREATE INDEX ON campaign_event (location_id, occurred_at DESC);

-- MCP tool call log (for the consistency monitor + audit)
CREATE TABLE tool_call (
  id              BIGSERIAL PRIMARY KEY,
  session_id      TEXT,
  tool_name       TEXT NOT NULL,
  arguments       JSONB,
  result          JSONB,
  duration_ms     INT,
  error           TEXT,
  called_at       TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX ON tool_call (tool_name, called_at DESC);
CREATE INDEX ON tool_call (session_id, called_at DESC);

-- Retcon history (Kay's Q4)
CREATE TABLE retcon (
  id              BIGSERIAL PRIMARY KEY,
  world_id        TEXT REFERENCES world(id),
  target_kind     TEXT,                      -- 'entity' | 'relation' | 'property'
  target_id       TEXT NOT NULL,
  before          JSONB,                     -- snapshot of what was there
  after           JSONB,                     -- what it was changed to
  reason          TEXT,
  actor_id        TEXT,                      -- world-builder id
  retconned_at    TIMESTAMPTZ DEFAULT now(),
  sources         TEXT[]
);
CREATE INDEX ON retcon (target_id, retconned_at DESC);

-- NPC dialogue history (for NPC knowledge scoping)
CREATE TABLE dialogue_log (
  id              BIGSERIAL PRIMARY KEY,
  world_id        TEXT REFERENCES world(id),
  npc_id          TEXT NOT NULL,             -- Person.id
  session_id      TEXT,
  message         TEXT NOT NULL,
  in_fiction_time TEXT,
  occurred_at     TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX ON dialogue_log (npc_id, occurred_at DESC);

These tables are the operational backbone. They're what gets high-volume writes, transactional integrity, and time-series queries.
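The following makes the mission_log shape concrete. It is a runnable approximation that uses Python's built-in sqlite3 rather than Postgres. SQLite has no TIMESTAMPTZ, TEXT[], or JSONB, so timestamps are ISO-8601 text and the party array is a JSON string. The mission and party IDs are made-up sample data. The point is to show two things: the UNIQUE (mission_id, step_no) guarantee, and the ORDER BY step_no timeline read that the cross-store query relies on.

```python
import sqlite3

# SQLite stand-in for the Postgres mission_log table: TIMESTAMPTZ becomes
# ISO-8601 TEXT and TEXT[] becomes a JSON string, since SQLite has neither type.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE mission_log (
      id          INTEGER PRIMARY KEY AUTOINCREMENT,
      mission_id  TEXT NOT NULL,
      step_no     INT NOT NULL,
      step_type   TEXT,
      occurred_at TEXT NOT NULL,
      party       TEXT,
      outcome     TEXT,
      UNIQUE (mission_id, step_no)
    )
""")
rows = [
    ("mission_042", 1, "planned", "2026-06-01 10:00:00", '["roland"]', None),
    ("mission_042", 2, "infiltrated", "2026-06-02 22:00:00", '["roland"]', None),
    ("mission_042", 3, "completed", "2026-06-03 01:00:00", '["roland"]', "success"),
]
conn.executemany(
    "INSERT INTO mission_log (mission_id, step_no, step_type, occurred_at, party, outcome) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    rows,
)

# UNIQUE (mission_id, step_no) rejects a second step 3 for the same mission.
try:
    conn.execute(
        "INSERT INTO mission_log (mission_id, step_no, step_type, occurred_at) VALUES (?, ?, ?, ?)",
        ("mission_042", 3, "botched", "2026-06-03 02:00:00"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Per-mission timeline, read in step order.
timeline = [r[0] for r in conn.execute(
    "SELECT step_type FROM mission_log WHERE mission_id = ? ORDER BY step_no",
    ("mission_042",),
)]
print(duplicate_rejected, timeline)  # True ['planned', 'infiltrated', 'completed']
```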

What goes in Qdrant (or pgvector)

Vector embeddings for semantic search. Three collections:

Collection	Source	Dimension	Use
`lore_chunks`	`(:LoreChunk).text`	768	Semantic search over lore documents.
`messages`	`(:Message).content`	768	Semantic search over dialogue.
`domain_summaries`	`(:DomainEntity).summary` (if present)	768	Semantic search over domain entities.

The first two inherit from GraphMCP-Example. The third is new and only populated for domain entities that opt in (the embedded: true field in the template).

Qdrant vs pgvector: Qdrant is faster and has better filtering. pgvector is simpler (one less service to run) and stays in the same DB family. For self-hosted homelab deployments where minimizing moving parts matters, pgvector is the right call for v1.1. We can swap to Qdrant later without changing the engine's API.

The lore-engine-Example already uses Neo4j's vector index. We can keep that for lore_chunks and add pgvector for domain_summaries, or migrate everything to pgvector. Decision: pgvector for everything in v1.1. Neo4j's vector index is fine but pgvector keeps our vector storage in one place.
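This is a toy illustration of the opt-in rule for domain_summaries: only entities whose template sets embedded: true get a vector. It is an in-memory cosine-similarity stand-in, not the Qdrant or pgvector client API. The `Collection` class, `index_domain_entities`, and the 3-dimensional vectors (used instead of 768) are assumptions made to keep it self-contained.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class Collection:
    """In-memory stand-in for one vector collection, e.g. domain_summaries."""

    def __init__(self, dim: int):
        self.dim = dim
        self.points: Dict[str, List[float]] = {}

    def upsert(self, point_id: str, vector: List[float]) -> None:
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        self.points[point_id] = vector

    def search(self, query: List[float], limit: int = 3) -> List[Tuple[str, float]]:
        scored = [(pid, cosine(query, vec)) for pid, vec in self.points.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:limit]


def index_domain_entities(coll: Collection, entities: List[dict],
                          embedded_by_template: Dict[str, bool]) -> None:
    """Upsert only entities whose template opts in with embedded: true."""
    for ent in entities:
        if embedded_by_template.get(ent["template_id"], False) and ent.get("summary_vec"):
            coll.upsert(ent["id"], ent["summary_vec"])


summaries = Collection(dim=3)  # the real collections use 768; 3 keeps the example readable
templates = {"ThievesGuildMission": True, "TradeLot": False}  # each template's `embedded` flag
entities = [
    {"id": "mission_042", "template_id": "ThievesGuildMission", "summary_vec": [0.9, 0.1, 0.0]},
    {"id": "mission_043", "template_id": "ThievesGuildMission", "summary_vec": [0.1, 0.9, 0.0]},
    {"id": "lot_7", "template_id": "TradeLot", "summary_vec": [0.9, 0.1, 0.0]},
]
index_domain_entities(summaries, entities, templates)
hits = summaries.search([1.0, 0.0, 0.0], limit=2)
print([pid for pid, _ in hits])  # ['mission_042', 'mission_043']
```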

What goes in Redis

Ephemeral state. Lost on restart, not backed up.

Active session context (replaces the in-process sessionRegistry in the existing MCP server). Each MCP session gets a Redis key with the active entity context, world_time override, and tool-call budget.
Tool-call rate limits (per session, per IP).
Embedding cache (frequently-searched queries → cached embedding).
Pub/sub for hot-reload notifications ("new template registered").
In-flight saga state for multi-step writes that span Neo4j + Postgres (these writes are coordinated, not atomic; see the saga pattern below).
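The per-session state above can be sketched with a small in-memory stand-in that mimics the two Redis behaviours involved: key/value storage and key expiry. The `session:<id>` key layout, the 30-minute TTL, and the function names are hypothetical. The fake clock only makes expiry testable.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class EphemeralStore:
    """In-memory stand-in for the two Redis behaviours used here: get/set and key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._data: Dict[str, Tuple[dict, float]] = {}

    def set(self, key: str, value: dict, ttl_s: float) -> None:
        self._data[key] = (value, self._clock() + ttl_s)

    def get(self, key: str) -> Optional[dict]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._data[key]  # expired keys behave as if they never existed
            return None
        return value


SESSION_TTL_S = 1800  # hypothetical idle timeout


def open_session(store: EphemeralStore, session_id: str, world_time: str, budget: int) -> None:
    state = {"entity_context": [], "world_time": world_time, "tool_budget": budget}
    store.set(f"session:{session_id}", state, SESSION_TTL_S)


def spend_tool_call(store: EphemeralStore, session_id: str) -> bool:
    """Consume one unit of the session's tool-call budget; False if exhausted or expired."""
    key = f"session:{session_id}"
    state = store.get(key)
    if state is None or state["tool_budget"] <= 0:
        return False
    state["tool_budget"] -= 1
    store.set(key, state, SESSION_TTL_S)  # activity refreshes the TTL
    return True


now = [0.0]
store = EphemeralStore(clock=lambda: now[0])
open_session(store, "abc", world_time="1230", budget=2)
calls = [spend_tool_call(store, "abc") for _ in range(3)]
now[0] += SESSION_TTL_S + 1
print(calls, store.get("session:abc"))  # [True, True, False] None
```

Against real Redis, the session would more naturally be a hash, with HINCRBY decrementing the budget and EXPIRE setting the TTL. The read-modify-write above would race under concurrent tool calls in the same session.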

What goes in S3 (MinIO)

Full text of every LoreSource (the YAML/markdown the world-builder wrote).
Full text of every long DomainEntity.summary.
Attachments: images, audio, videos, large files referenced by lore.
Snapshots: world state exports, consistency engine reports, retcon bundles.

MinIO is the self-hosted S3. Same protocol, no AWS dependency.

The cross-store query layer

The MCP tools the LLM uses compose across stores. The engine handles the cross-store joins; the LLM sees a unified response.

Example: "What was the Crimson Hand's biggest heist in Mardsville last year?"

LLM → tool: list_missions(filter_by={faction: "crimson_hand", location: "mardsville", since: "1_year_ago"}, sort_by="payout_gp", limit=5)

Engine:
  1. Neo4j: MATCH (m:DomainEntity {type: "ThievesGuildMission"})-[:TARGETS]->(:Location {id: "mardsville"}),
                  (m)-[:GIVEN_BY]->(npc:Person)-[:MEMBER_OF]->(:Faction {id: "crimson_hand"})
            OPTIONAL MATCH (m)-[:LOGGED_IN]->(lot:DomainEntity {type: "TradeLot"})
            RETURN m, lot, npc
  2. Postgres: SELECT * FROM mission_log WHERE mission_id IN (...) ORDER BY step_no
  3. Postgres: SELECT * FROM trade_log WHERE lot_id IN (...) AND occurred_at > ...
  4. Compose: return top 5 by payout_gp, with mission step timeline + trade details

LLM: gets a unified response. Doesn't know it crossed 3 stores.
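Step 4 of the heist example ("compose") is an in-memory join over the three result sets. Here is a minimal sketch under three assumptions. First, the Neo4j step returns one row per (mission, lot, npc). Second, payout_gp is the sum of total_price over a mission's lots whose unit is 'gp'. Third, the `compose_missions` name is hypothetical, as is the payout definition.

```python
from collections import defaultdict
from typing import Dict, List


def compose_missions(graph_rows: List[dict], mission_steps: List[dict],
                     trades: List[dict], limit: int = 5) -> List[dict]:
    """Join the three result sets in memory and rank missions by gold payout.

    graph_rows:    [{"mission_id", "lot_id", "npc_id"}] from the Neo4j MATCH (step 1)
    mission_steps: mission_log rows (step 2)
    trades:        trade_log rows (step 3)
    """
    steps_by_mission: Dict[str, List[dict]] = defaultdict(list)
    for s in mission_steps:
        steps_by_mission[s["mission_id"]].append(s)

    gp_by_lot: Dict[str, float] = defaultdict(float)
    for t in trades:
        if t["unit"] == "gp":  # hypothetical: payout counts only gold-denominated trades
            gp_by_lot[t["lot_id"]] += t["total_price"]

    missions: Dict[str, dict] = {}
    for row in graph_rows:
        m = missions.setdefault(row["mission_id"], {
            "mission_id": row["mission_id"], "given_by": row["npc_id"],
            "payout_gp": 0.0, "lots": [],
        })
        if row["lot_id"] is not None and row["lot_id"] not in m["lots"]:
            m["lots"].append(row["lot_id"])
            m["payout_gp"] += gp_by_lot[row["lot_id"]]

    for m in missions.values():
        ordered = sorted(steps_by_mission[m["mission_id"]], key=lambda s: s["step_no"])
        m["timeline"] = [s["step_type"] for s in ordered]

    return sorted(missions.values(), key=lambda m: m["payout_gp"], reverse=True)[:limit]


graph_rows = [
    {"mission_id": "m1", "lot_id": "lot_a", "npc_id": "npc_1"},
    {"mission_id": "m2", "lot_id": "lot_b", "npc_id": "npc_2"},
    {"mission_id": "m2", "lot_id": "lot_c", "npc_id": "npc_2"},
]
trades = [
    {"lot_id": "lot_a", "unit": "gp", "total_price": 300},
    {"lot_id": "lot_b", "unit": "gp", "total_price": 200},
    {"lot_id": "lot_c", "unit": "gp", "total_price": 250},
]
steps = [
    {"mission_id": "m2", "step_no": 2, "step_type": "completed"},
    {"mission_id": "m2", "step_no": 1, "step_type": "planned"},
]
top = compose_missions(graph_rows, steps, trades)
print([(m["mission_id"], m["payout_gp"]) for m in top])  # [('m2', 450.0), ('m1', 300.0)]
print(top[0]["timeline"])  # ['planned', 'completed']
```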

Example: "What battles did the Vyrs lose?"

LLM → tool: list_campaign_events(filter_by={faction: "house_vyr", outcome: "loss"})

Engine:
  1. Neo4j: get the Campaign nodes tied to house_vyr
  2. Postgres: SELECT * FROM campaign_event 
            WHERE campaign_id IN (...) AND outcome = 'loss'
            ORDER BY occurred_at DESC
  3. Compose: return list with Neo4j faction details + Postgres battle details

LLM: unified response.

The engine exposes composed tools like list_missions, list_campaign_events. The LLM calls one tool; the engine fans out across stores.
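The fan-out has one ordering constraint. The Neo4j query must run first, because its IDs parameterize the Postgres queries. The Postgres queries are independent of each other, so they can run concurrently. Here is a sketch with asyncio. The three store functions are hypothetical stubs standing in for real driver calls.

```python
import asyncio
from typing import List


# Hypothetical store clients: in the engine these would wrap the Neo4j and Postgres drivers.
async def neo4j_missions(faction: str, location: str) -> List[dict]:
    await asyncio.sleep(0)  # stand-in for network I/O
    return [{"mission_id": "m1", "lot_id": "lot_a"}, {"mission_id": "m2", "lot_id": "lot_b"}]


async def pg_mission_steps(mission_ids: List[str]) -> List[dict]:
    await asyncio.sleep(0)
    return [{"mission_id": mid, "step_no": 1, "step_type": "completed"} for mid in mission_ids]


async def pg_trades(lot_ids: List[str]) -> List[dict]:
    await asyncio.sleep(0)
    return [{"lot_id": lid, "unit": "gp", "total_price": 100} for lid in lot_ids]


async def list_missions(faction: str, location: str) -> dict:
    # Step 1 must finish first: its IDs parameterize the Postgres queries.
    graph_rows = await neo4j_missions(faction, location)
    mission_ids = sorted({r["mission_id"] for r in graph_rows})
    lot_ids = sorted({r["lot_id"] for r in graph_rows if r["lot_id"]})
    # Steps 2 and 3 do not depend on each other, so they run concurrently.
    steps, trades = await asyncio.gather(pg_mission_steps(mission_ids), pg_trades(lot_ids))
    return {"graph": graph_rows, "steps": steps, "trades": trades}


result = asyncio.run(list_missions("crimson_hand", "mardsville"))
print(len(result["steps"]), len(result["trades"]))  # 2 2
```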

The cross-store consistency story

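The consistency engine operates across stores. A :Contradiction node in Neo4j can reference a Postgres row. An OntologyRule in Neo4j can include Cypher that joins with a Postgres query via apoc.load.jdbc. In Neo4j 5 that procedure ships in APOC Extended rather than APOC Core, and it needs the PostgreSQL JDBC driver on the Neo4j plugin path.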

The rules that go cross-store:

"A :DomainEntity of type TradeLot referenced in Neo4j must have a corresponding row in trade_log."
"A mission marked status: 'completed' in Neo4j must have a step_type = 'completed' row in mission_log."
"A campaign event's army_size in Postgres must be within 10% of the :DomainEntity aggregate of the participating factions' Person.count."

These rules are written in Cypher with apoc.load.jdbc calls. They run in the nightly batch.
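The design runs these rules as Cypher with apoc.load.jdbc. As a swapped-in alternative for comparison, the same checks could run engine-side in Python once each store has returned its IDs (for example, `SELECT DISTINCT lot_id FROM trade_log` on the Postgres side). The function names and the violation dict shape are hypothetical. The Orphan and Contradiction kinds mirror the violation node labels.

```python
from typing import Iterable, List, Set


def orphan_trade_lots(graph_lot_ids: Iterable[str], trade_log_lot_ids: Iterable[str]) -> List[dict]:
    """Rule 1: every TradeLot in Neo4j needs at least one trade_log row."""
    logged: Set[str] = set(trade_log_lot_ids)
    return [{"kind": "Orphan", "target_id": lot, "rule": "tradelot_has_trade_log_row"}
            for lot in sorted(set(graph_lot_ids)) if lot not in logged]


def unbacked_completions(completed_mission_ids: Iterable[str],
                         completed_step_mission_ids: Iterable[str]) -> List[dict]:
    """Rule 2: a mission with status 'completed' needs a step_type = 'completed' row."""
    backed = set(completed_step_mission_ids)
    return [{"kind": "Contradiction", "target_id": m, "rule": "completed_mission_has_completed_step"}
            for m in sorted(set(completed_mission_ids)) if m not in backed]


def within_tolerance(reported: int, expected: int, tolerance: float = 0.10) -> bool:
    """Rule 3's numeric test: |reported - expected| must be at most 10% of expected."""
    return abs(reported - expected) <= tolerance * expected


print(orphan_trade_lots(["lot_a", "lot_b"], ["lot_a"]))
# [{'kind': 'Orphan', 'target_id': 'lot_b', 'rule': 'tradelot_has_trade_log_row'}]
print(within_tolerance(1080, 1000), within_tolerance(1120, 1000))  # True False
```

Running the rules in Cypher keeps them as data in the graph. Running them engine-side avoids depending on APOC Extended. Either way, the set logic is the same.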

Why this is better than one big Neo4j

Concern	Neo4j-only	Polyglot
High-volume writes (mission steps)	Bloats graph, slows down traversal	Postgres handles it cleanly
Time-series queries (battles over time)	Requires traversal every query	Postgres `GROUP BY date_trunc('month', occurred_at)` is fast
Semantic search over millions of words of lore	Works via the vector index, but competes with traversal for memory and CPU	pgvector or Qdrant, designed for it
Vector search	OK, but coupled to graph	Dedicated vector store, decoupled
Blob storage (full lore text, attachments)	Don't do this in Neo4j	S3, cheap, durable
Sub-millisecond ephemeral state	Possible but ugly	Redis, designed for it
Graph traversal and pattern matching	Excellent	Still excellent (Neo4j)

The graph stays the graph. Operational data lives where it belongs. The LLM gets unified responses via composed tools.
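The time-series row of the comparison table assumes bucketing by a truncated timestamp. Here is a runnable sketch with sqlite3, where strftime('%Y-%m', ...) plays the role that Postgres's date_trunc('month', occurred_at) would. The trimmed table and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed SQLite stand-in for campaign_event (TIMESTAMPTZ stored as ISO-8601 TEXT).
conn.execute(
    "CREATE TABLE campaign_event ("
    "campaign_id TEXT, faction_id TEXT, event_type TEXT, occurred_at TEXT, outcome TEXT)"
)
conn.executemany("INSERT INTO campaign_event VALUES (?, ?, ?, ?, ?)", [
    ("camp_1", "house_vyr", "battle", "2026-03-02 09:00:00", "loss"),
    ("camp_1", "house_vyr", "battle", "2026-03-20 09:00:00", "loss"),
    ("camp_1", "house_vyr", "battle", "2026-04-05 09:00:00", "victory"),
    ("camp_2", "house_vyr", "battle", "2026-04-18 09:00:00", "loss"),
])
# Postgres equivalent: GROUP BY date_trunc('month', occurred_at)
rows = conn.execute("""
    SELECT strftime('%Y-%m', occurred_at) AS month, COUNT(*) AS losses
    FROM campaign_event
    WHERE faction_id = ? AND outcome = 'loss'
    GROUP BY month
    ORDER BY month
""", ("house_vyr",)).fetchall()
print(rows)  # [('2026-03', 2), ('2026-04', 1)]
```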

The cost: cross-store transactions

When the world-builder writes a new mission, we touch Neo4j (entity, relations), Postgres (mission_log row), and S3 (any attachments). These three writes are not atomic. A partial failure leaves the world in an inconsistent state.

Mitigation: the saga pattern.

saga: log_mission:
  step 1: Postgres INSERT INTO mission_log
  step 2: Neo4j MERGE (:DomainEntity) + relations
  step 3: S3 PUT attachments (if any)
  step 4: Neo4j MERGE (r:ConsistencyRun {saga_id: ...}) SET r.status = 'committed'

  on failure at step 1: nothing to compensate; abort the saga
  on failure at step 2: rollback step 1 (Postgres DELETE)
  on failure at step 3: mark mission as 'attachments_pending', retry later
  on failure at step 4: log to dead-letter queue, alert world-builder

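Sagas are more code than a single transaction, and they do not make the three writes atomic. What they do guarantee is that every partial failure has a defined compensation, retry, or alert. The alternative, putting everything in Neo4j and hoping, is the trap.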
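The log_mission saga can be sketched as a small orchestration-style runner. Each step carries an action, an optional compensation, and one of three failure policies, matching the pseudocode: roll back earlier steps, defer and retry later, or dead-letter. The step names, the policy strings, and the in-memory dead-letter list are hypothetical stand-ins for the real Postgres, Neo4j, and MinIO calls. The sketch does not handle a compensation that itself fails. A production version would make compensations idempotent and retry them.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Step:
    name: str
    action: Callable[[], None]
    compensate: Optional[Callable[[], None]] = None  # undo for this step if a later step fails
    on_failure: str = "compensate"  # "compensate" | "defer" | "dead_letter"


@dataclass
class SagaResult:
    status: str = "committed"
    completed: List[str] = field(default_factory=list)
    deferred: List[str] = field(default_factory=list)


def run_saga(steps: List[Step], dead_letter: List[str]) -> SagaResult:
    """Run steps in order; on failure apply that step's recovery policy."""
    result = SagaResult()
    for i, step in enumerate(steps):
        try:
            step.action()
            result.completed.append(step.name)
        except Exception:
            if step.on_failure == "defer":
                # Forward recovery, e.g. mark the mission 'attachments_pending' and retry later.
                result.deferred.append(step.name)
                continue
            if step.on_failure == "dead_letter":
                dead_letter.append(step.name)
                result.status = "dead_lettered"
                return result
            # Backward recovery: compensate the steps that already succeeded, newest first.
            for done in reversed(steps[:i]):
                if done.name in result.completed and done.compensate is not None:
                    done.compensate()
            result.status = "rolled_back"
            return result
    return result


# Usage: Neo4j is down at step 2, so the Postgres insert from step 1 is undone.
pg_rows: List[str] = []


def neo4j_merge_mission() -> None:
    raise ConnectionError("neo4j unavailable")


steps = [
    Step("pg_insert_mission_log", lambda: pg_rows.append("mission_042:1"),
         compensate=lambda: pg_rows.remove("mission_042:1")),
    Step("neo4j_merge_mission", neo4j_merge_mission),
    Step("s3_put_attachments", lambda: None, on_failure="defer"),
    Step("neo4j_mark_committed", lambda: None, on_failure="dead_letter"),
]
dlq: List[str] = []
res = run_saga(steps, dlq)
print(res.status, pg_rows)  # rolled_back []
```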

What this is not

Not a microservices overhaul. The 5 stores run in 1 docker-compose stack. The engine still looks like one system to the LLM.
Not eventual-consistency-everywhere. Most operations are single-store. The saga is for the multi-store cases.
Not a "use Postgres for everything" anti-pattern. We use Postgres for what it's good at, Neo4j for what it's good at, and the cross-store compose layer for the rest.
Not free. Postgres + MinIO is two more services, three if we later swap pgvector for Qdrant. On the 58GB host, this is fine (~1-2GB extra). On a Raspberry Pi, it would be wrong.

Summary

The storage strategy is the part of the design that lets the engine scale to the whole world, not just the macro structure. Neo4j is the nervous system — the relations, the time model, the consistency engine. Postgres is the muscle memory — the high-volume operational data. Qdrant/pgvector is the cortex — the semantic search. Redis is the short-term memory — the session state. S3 is the archive — the durable storage.

Each store is the right tool for its job. The engine is the integration layer that makes them feel like one world.
