# 12 — Storage Strategy: Which Data Goes Where The Lore Engine runs on [Cognee](https://github.com/topoteretes/cognee), which manages the storage abstraction. Cognee's default backend is **Neo4j 5.x** (graph) + **a vector store** (Qdrant or pgvector, Cognee chooses based on the deployment) + **Postgres** (metadata, sessions, task state). The Lore Engine does not manage the graph, vector, or metadata stores directly — it adds a domain layer that uses them. For v1.1, the Lore Engine **does** add its own Postgres tables (operational event logs, the `setting` table, retcon history, the template registry) on top of the Cognee-managed instance. This document describes which Lore Engine data goes where and why. ## The principle: Cognee manages substrate; Lore Engine manages domain Cognee provides the storage role split. The Lore Engine adds: - A **typed ontology** (the 36 labels and their constraints) registered as a Cognee data-model extension. - A **time model** (`time_in_window` UDF, era-tree helpers) on top of Cognee's query layer. - A **consistency engine** (4 rule categories) as a Cognee data-pipeline that materializes violation nodes. - A **template registry** (the `TypeTemplate` nodes, the template-watcher data-pipeline) that hot-reloads new domain types. - **Operational tables** in Postgres (the `setting` table, `lore_event`, `retcon`, `dialogue_log`, etc.) on top of Cognee's metadata Postgres. ## The three layers | Layer | What it holds | Who manages | |---|---|---| | **Cognee substrate** (Neo4j + vector store + Postgres metadata) | The graph, the embeddings, the sessions, the task state. Cognee adds its own `DataPoint`, `Chunk`, `Entity` types automatically. | Cognee (we don't manage these directly) | | **Lore Engine data-model extension** (added on top of Cognee's substrate) | The 36 typed labels (Person, Faction, Location, ..., Setting, Plane, DomainEntity), constraints, indexes, the `time_in_window` and `time_windows_overlap` UDFs. | The Lore Engine registers these as a Cognee data-model extension at startup. | | **Lore Engine operational tables** (in Cognee's Postgres) | The `setting` table, `lore_event`, `retcon`, `dialogue_log`, the template registry. The world's operational event history. | The Lore Engine writes to its own tables through Cognee's Postgres adapter. | The distinction matters: we don't manage Neo4j or the vector store; we manage the typed labels and the operational tables on top. Cognee handles connection pooling, schema migrations for its own types, embedding refresh, and the cross-store query layer. ## What the Lore Engine adds on top of the Cognee graph The macro world graph lives in the Cognee-managed graph store (**Neo4j** — ADR 0008). The Lore Engine's data-model extension adds the typed labels and constraints on top. Anything where the LLM will say *"traverse from A"* or *"find all X related to Y"* or *"is X connected to Y?"* is a typed node or edge in the graph. - **Core entities (v1):** Person, Faction, Location, Item, Era, Date, Lineage, Culture, Deity, Language, MagicSystem, Title, Region, Material. - **Planes (v1.2):** `Setting` and `Plane` are first-class graph nodes. Plane relations (`REFLECTS`, `LAYER_OF`, `ADJACENT_TO`, `ACCESSIBLE_VIA`) are first-class edges. The `EXISTS_IN` and `LOCATED_IN` relations between entities and planes are stored here too. See `17-planes.md`. - **Time-bounded relations** between core entities: `RULED`, `MEMBER_OF`, `LOCATED_IN`, `PARTICIPATED_IN`, `ALLIED_WITH`, `POSSESSES`, etc. Always time-bounded. Always queryable via `time_in_window`. - **Polymorphic domain entities** (`:DomainEntity` with a `template_id`): a thieves-guild Mission, a war Campaign, a Spellbook, a TradeLot, a Ritual. The entity *itself* and its relations to other entities (Person, Faction, Location, other DomainEntities) live in the graph. - **Type templates** (`:TypeTemplate`): the YAML-defined schemas, stored as parsed JSON for the consistency engine and LLM to query. - **Violation nodes** (`:Contradiction`, `:Anachronism`, `:Orphan`, `:OntologyViolation`, `:ConsistencyRun`): the consistency engine's output. - **Lore source metadata** (`:LoreSource`): title, source_type, author, ingested_at, version. The *text* lives in Cognee-managed chunk storage; the metadata is in the graph. - **Indexes:** all property indexes from `01-ontology.md` and `08-architecture.md`. Plus `(:DomainEntity).type`, `(:DomainEntity).setting_id`, `(:Relation).type`, `(:Relation).valid_from/until`. **What does NOT live as a typed graph node:** - The full text of a lore source. (Cognee chunks it for embedding; we don't store the text in our typed graph.) - The step-by-step log of a mission. (Goes in our Postgres operational tables; only the *aggregate outcome* lives in the graph as the Mission node.) - High-volume time-series operational data. (Goes in Postgres; not in the graph.) ## What goes in PostgreSQL Operational records that are *append-mostly*, *high-volume*, and *not primarily about relationships*. The shape that Postgres handles well: rows of typed columns, indexed on time, with foreign keys back to Neo4j IDs. > **v1.2 note:** the `setting` table (renamed from the v1.1 `world` table) is the v1.2 namespace that backs the `(:Setting)` and `(:Plane)` graph nodes. The `world_id` string property is deprecated in v1.2 in favor of structured `EXISTS_IN` graph edges; Postgres follows the same migration. See `17-planes.md`. ### Schema overview ```sql -- Setting (a v1.2 setting; renamed from v1.1's `world` table), version, and migration state CREATE TABLE setting ( id TEXT PRIMARY KEY, name TEXT NOT NULL, kind TEXT NOT NULL DEFAULT 'single_plane', -- 'single_plane' | 'multi_plane' current_era TEXT NOT NULL, -- canonical time string schema_version TEXT NOT NULL, created_at TIMESTAMPTZ DEFAULT now() ); -- Operational event log (every meaningful state change) CREATE TABLE lore_event ( id BIGSERIAL PRIMARY KEY, setting_id TEXT REFERENCES setting(id), event_type TEXT NOT NULL, -- 'mission_logged', 'trade_completed', 'army_moved', ... entity_id TEXT, -- DomainEntity.id from Neo4j entity_type TEXT, -- discriminator occurred_at TIMESTAMPTZ NOT NULL, in_fiction_time TEXT, -- canonical time string payload JSONB NOT NULL, -- type-specific structured data sources TEXT[], actor_id TEXT, -- who/what triggered this created_at TIMESTAMPTZ DEFAULT now() ); CREATE INDEX ON lore_event (setting_id, occurred_at DESC); CREATE INDEX ON lore_event (entity_id, occurred_at DESC); CREATE INDEX ON lore_event (event_type, occurred_at DESC); CREATE INDEX ON lore_event USING GIN (payload); -- Trade log (every lot, every transaction) CREATE TABLE trade_log ( id BIGSERIAL PRIMARY KEY, setting_id TEXT REFERENCES setting(id), lot_id TEXT NOT NULL, item_id TEXT, -- DomainEntity.id of the Item or Material buyer_id TEXT, -- Person or Faction id seller_id TEXT, quantity NUMERIC, unit TEXT, -- 'gp', 'soulglass_shards', etc. unit_price NUMERIC, total_price NUMERIC, occurred_at TIMESTAMPTZ NOT NULL, in_fiction_time TEXT, location_id TEXT, -- Location.id secrecy TEXT, -- 'public', 'faction_internal', ... payload JSONB, -- type-specific extras sources TEXT[] ); CREATE INDEX ON trade_log (setting_id, occurred_at DESC); CREATE INDEX ON trade_log (lot_id); CREATE INDEX ON trade_log (buyer_id, occurred_at DESC); CREATE INDEX ON trade_log (seller_id, occurred_at DESC); CREATE INDEX ON trade_log (item_id, occurred_at DESC); CREATE INDEX ON trade_log (location_id, occurred_at DESC); CREATE INDEX ON trade_log USING GIN (payload); -- Mission step log (per-mission timeline of events) CREATE TABLE mission_log ( id BIGSERIAL PRIMARY KEY, mission_id TEXT NOT NULL, -- DomainEntity.id step_no INT NOT NULL, step_type TEXT, -- 'planned', 'briefed', 'infiltrated', 'completed', 'botched', 'paid' occurred_at TIMESTAMPTZ NOT NULL, in_fiction_time TEXT, party TEXT[], -- Person ids present location_id TEXT, outcome TEXT, notes TEXT, sources TEXT[], UNIQUE (mission_id, step_no) ); -- War campaign movement log CREATE TABLE campaign_event ( id BIGSERIAL PRIMARY KEY, campaign_id TEXT NOT NULL, -- DomainEntity.id of the Campaign event_type TEXT, -- 'army_moved', 'battle', 'siege_begun', 'siege_lifted', ... occurred_at TIMESTAMPTZ NOT NULL, in_fiction_time TEXT, faction_id TEXT, location_id TEXT, army_size INT, casualties INT, outcome TEXT, payload JSONB, sources TEXT[] ); CREATE INDEX ON campaign_event (campaign_id, occurred_at DESC); CREATE INDEX ON campaign_event (faction_id, occurred_at DESC); CREATE INDEX ON campaign_event (location_id, occurred_at DESC); -- MCP tool call log (for the consistency monitor + audit) CREATE TABLE tool_call ( id BIGSERIAL PRIMARY KEY, session_id TEXT, tool_name TEXT NOT NULL, arguments JSONB, result JSONB, duration_ms INT, error TEXT, called_at TIMESTAMPTZ DEFAULT now() ); CREATE INDEX ON tool_call (tool_name, called_at DESC); CREATE INDEX ON tool_call (session_id, called_at DESC); -- Retcon history (Kay's Q4) CREATE TABLE retcon ( id BIGSERIAL PRIMARY KEY, setting_id TEXT REFERENCES setting(id), target_kind TEXT, -- 'entity' | 'relation' | 'property' target_id TEXT NOT NULL, before JSONB, -- snapshot of what was there after JSONB, -- what it was changed to reason TEXT, actor_id TEXT, -- world-builder id retconned_at TIMESTAMPTZ DEFAULT now(), sources TEXT[] ); CREATE INDEX ON retcon (target_id, retconned_at DESC); -- NPC dialogue history (for NPC knowledge scoping) CREATE TABLE dialogue_log ( id BIGSERIAL PRIMARY KEY, setting_id TEXT REFERENCES setting(id), npc_id TEXT NOT NULL, -- Person.id session_id TEXT, message TEXT NOT NULL, in_fiction_time TEXT, occurred_at TIMESTAMPTZ DEFAULT now() ); CREATE INDEX ON dialogue_log (npc_id, occurred_at DESC); ``` These tables are **the Lore Engine's operational backbone** on top of Cognee's Postgres. They're what gets high-volume writes, transactional integrity, and time-series queries. ## What Cognee handles for us (out of our control) Cognee manages the following stores directly; the Lore Engine does not configure them. They are mentioned here for completeness so the LLM-tool author knows where the data is. - **Vector embeddings.** Cognee chunks the lore source text, embeds it, and indexes it. The Lore Engine queries this through `cognee.recall()` for the `lore_about` and `cite` tools. Cognee supports pgvector and Qdrant under the hood; the Lore Engine does not need to know which. - **Session state.** Cognee manages active MCP sessions, per-session context, and tool-call rate limits in its own store (typically Redis or in-memory, depending on the deployment). The Lore Engine's MCP server uses Cognee's session API. - **Ephemeral caches.** Cognee handles embedding caches, hot-reload pub/sub, and any other ephemeral state. - **Blob storage for source text.** Cognee stores the full text of ingested chunks. The Lore Engine never sees the full text — it only sees the typed graph nodes that Cognee extracted from the text. - **Cross-store transactions.** Cognee manages its own transaction model for writes that span the graph + Postgres + vector store. The Lore Engine calls `cognee.add()` / `cognee.cognify()` and Cognee handles atomicity. We do not need a custom saga layer. The implication: the Lore Engine's storage strategy is **simpler than the v1.1 plan called for**. The polyglot-persistence story collapses to "Cognee manages the substrate, the Lore Engine owns the typed ontology and the operational tables." ## The cross-store compose layer (Lore Engine's job) The MCP tools the LLM uses compose across the Lore Engine's typed graph + the operational Postgres tables. The engine handles the joins; the LLM sees a unified response. ### Example: "What was the Crimson Hand's biggest heist in Mardsville last year?" ``` LLM → tool: list_missions(filter_by={faction: "crimson_hand", location: "mardsville", since: "1_year_ago"}, sort_by="payout_gp", limit=5) Lore Engine: 1. Graph (Cognee): MATCH (m:DomainEntity {type: "ThievesGuildMission"}) -[:TARGETS]-> (loc:Location {name: "Mardsville"}) -[:GIVEN_BY]-> (npc:Person {name: "Vex the Silent"}) RETURN m 2. Operational Postgres: SELECT * FROM mission_log WHERE mission_id IN (...) ORDER BY step_no 3. Operational Postgres: SELECT * FROM trade_log WHERE lot_id IN (...) AND occurred_at > ... 4. Compose: return top 5 by payout_gp, with mission step timeline + trade details LLM: gets a unified response. Doesn't know it crossed the graph + Postgres. ``` ### Example: "What battles did the Vyrs lose?" ``` LLM → tool: list_campaign_events(filter_by={faction: "house_vyr", outcome: "loss"}) Lore Engine: 1. Graph: get the Campaign nodes tied to house_vyr 2. Operational Postgres: SELECT * FROM campaign_event WHERE campaign_id IN (...) AND outcome = 'loss' ORDER BY occurred_at DESC 3. Compose: return list with graph faction details + Postgres battle details LLM: unified response. ``` The engine exposes **composed tools** like `list_missions`, `list_campaign_events`. The LLM calls one tool; the engine fans out across the typed graph and the operational tables. ## The cross-store consistency story The consistency engine runs as a Cognee data-pipeline. A `:Contradiction` node in the graph can reference a row in `mission_log` or `campaign_event`. The rules that go cross-store: - *"A `:DomainEntity` of type `TradeLot` referenced in the graph must have a corresponding row in `trade_log`."* - *"A mission marked `status: 'completed'` in the graph must have a `step_type = 'completed'` row in `mission_log`."* - *"A campaign event's `army_size` in Postgres must be within 10% of the `:DomainEntity` aggregate of the participating factions' `Person.count`."* These rules are written in Cypher with Postgres lookups via Cognee's query API. They run in the nightly consistency pipeline. ## Why this is simpler than the 5-store v1.1 plan | Concern | Old v1.1 plan (5 stores) | Cognee-based v1.1 | |---|---|---| | Graph (people, factions, edges) | Hand-managed Neo4j | Cognee-managed Neo4j (ADR 0008) | | Operational event log | Hand-managed Postgres | Cognee-managed Postgres + Lore Engine tables | | Vector search | Hand-managed Qdrant OR pgvector | Cognee-managed (pgvector or Qdrant) | | Session state | Hand-managed Redis | Cognee-managed (Redis or in-memory) | | Source text blob storage | Hand-managed MinIO | Cognee-managed | | Cross-store transactions | Hand-written saga pattern | Cognee's transaction model | | Embedding refresh | Hand-written pipeline | Cognee-managed | The 5-store plan was a real engineering project. The Cognee plan collapses it to: **Cognee does the substrate, the Lore Engine adds the domain layer.** The Lore Engine only owns the typed ontology (registered as a Cognee data-model extension) and the operational tables (a handful of Postgres tables on top of Cognee's Postgres). ## What this is not - **Not a microservices overhaul.** Cognee runs in 1 Docker container; the Lore Engine extension is in-process. The engine still looks like one system to the LLM. - **Not "the Lore Engine does not own storage."** The Lore Engine owns the typed ontology, the operational tables, and the consistency rules. Cognee owns the substrate (graph, vector, sessions, source text, transactions). - **Not free.** Cognee is a real piece of infrastructure. On the existing 58GB dev host it's fine. On a Raspberry Pi it would be wrong (use a hosted Cognee instance or pick a lighter substrate). ## Summary The storage strategy is the part of the design that lets the engine scale to *the whole world*, not just the macro structure. Cognee is the **substrate** — the graph, the vectors, the sessions, the source text, the transactions. The Lore Engine is the **domain layer** — the 36 typed labels, the time model, the consistency engine, the operational tables, the TypeTemplate registry. Each piece is the right tool for its job. The Lore Engine is the integration layer that makes the substrate feel like one world.