# 10 — Critique

Pressure-test of the design: what could break, what's weak, where it could fail. I tried to find the holes; here they are, ranked by severity.

## Severity 1 (blockers)

### S1.1 — The `current` token is a global mutable

The `current` reserved time token resolves against a single `:Now` config node. **This is a single point of failure and a synchronization nightmare in multi-user scenarios.**

- If two sessions are running in different in-fiction times (e.g., a flashback scene and a present-day scene), they cannot both use `current` correctly.
- If the world-builder forgets to update `:Now` after a time skip, every `current` query is wrong.

**Fix:** every `current` query must accept an optional `world_time` parameter that overrides the `:Now` node. The LLM is told to use it for flashback scenes. The MCP server tracks per-session `world_time` in the active context.

**Status:** acknowledged, fix in scope of Phase 2.

### S1.2 — `lore_verified: false` is a boolean, but reality is a spectrum

A fact from one provisional source should be weighted differently from a fact from five contradictory sources. The boolean is too coarse.

**Fix:** add a `source_confidence` float on every node and every edge, weighted by source document confidence, source agreement, and rule-engine consistency results. The LLM sees the float and phrases its answer accordingly ("high confidence" / "reported by one source, unconfirmed").

**Status:** partially fixed in `01-ontology.md` (we have `source_confidence` as a property). The `lore_verified` boolean stays as a coarse filter; the float is for nuanced reasoning. The weighting formula still needs to be formalized.

### S1.3 — Entity resolution at scale

`loadKnownEntities` in the existing extractor loads 100 names and injects them into the prompt. **At 10,000 entities, a 100-name list misses most of the world, and injecting every name won't fit. At 100,000, the prompt is unusable.**

**Fix:** the structured ingestion path bypasses this entirely (YAML is exact).
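
A minimal sketch of why the structured path sidesteps the prompt ceiling, assuming YAML records have already been parsed into Python dicts whose reference fields hold entity slugs. Each reference is an exact dictionary lookup, and an unknown slug is rejected instead of guessed, so nothing is injected into a prompt no matter how large the world grows. `Entity`, `build_index`, `resolve_refs`, `UnknownEntityError`, and the sample slugs are hypothetical names for illustration, not the extractor's API.

```python
from dataclasses import dataclass


class UnknownEntityError(KeyError):
    """A structured record referenced a slug that is not in the entity index."""


@dataclass(frozen=True)
class Entity:
    slug: str
    label: str  # e.g. "Person", "Faction"


def build_index(entities: list[Entity]) -> dict[str, Entity]:
    # One entry per known entity, keyed by its exact slug.
    return {e.slug: e for e in entities}


def resolve_refs(record: dict, index: dict[str, Entity], ref_fields: tuple[str, ...]) -> dict:
    """Replace each slug in `ref_fields` with its Entity; never fuzzy-match.

    Each reference costs one dict lookup regardless of how many entities
    the world holds, and no entity list is injected into an LLM prompt.
    """
    resolved = dict(record)
    for field in ref_fields:
        slug = record.get(field)
        if slug is None:
            continue  # optional reference left empty
        if slug not in index:
            raise UnknownEntityError(f"{field}: unknown entity slug {slug!r}")
        resolved[field] = index[slug]
    return resolved


if __name__ == "__main__":
    index = build_index([Entity("kael", "Person"), Entity("iron_pact", "Faction")])
    record = {"slug": "kael_oath", "actor": "kael", "target": "iron_pact"}
    print(resolve_refs(record, index, ("actor", "target"))["target"].label)  # Faction
```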
For the prose path, we need a different strategy:

- Pre-compute embeddings of entity names and retrieve the top-K by similarity to the chunk being extracted.
- Or: extract first, then resolve in a second step with a separate entity-linking model (a cheaper LLM call).
- Or: accept that the prose path doesn't scale beyond ~10K entities and require the world-builder to use YAML for the long tail.

**Status:** known; fix required before Phase 2 ships at scale. Mitigation: structured YAML is exact, prose is fuzzy, and the design is robust as long as the high-stakes data goes through structured paths.

## Severity 2 (design risks)

### S2.1 — Time model precision vs. lore granularity mismatch

We chose `{era}.{year}` as the canonical format. Most prose says things like *"in the late Third Age"*, with no year. If we force precision, the prose extractor either guesses (hallucination risk) or stamps everything as `3rd_age` (lossy).

**Fix:** the prose extractor is told to use the most specific time the source actually supports, and no more. "Late Third Age" → `3rd_age` with a `precision: low` flag on the edge. The LLM is told that low-precision edges are not safe to use for narrow time-window claims.

**Status:** documented, not yet implemented in the extraction prompt. Add to the prose-extractor update in Phase 2.

### S2.2 — The Consistency Engine will over-flag

High-fantasy worlds are full of valid temporal overlaps (a person ruling two kingdoms through marriage, a faction that is both allied with and at war with a third party via different treaties). The Category A rules will produce a flood of `Contradiction` nodes.

**Fix:**

- Default severity is `warn`, not `error`. A `warn` contradiction means "the world-builder should look at this," not "this is wrong."
- The world-builder can mark a `warn` as `acknowledged` (a property on the `Contradiction` node), which suppresses future flagging.
- Rules have a `confidence_threshold` parameter; below it, no violation is created.
- A `disable_rules[]` list on the world config silences specific rules per era or per region.

**Status:** fix design complete, implementation in Phase 7.

### S2.3 — LLM cost on `summarize_chain` and `narrate_arc`

These are the only LLM-in-the-loop read tools. They make multiple Cypher queries, then call an LLM to render prose. **At session scale, this is the single biggest cost driver.**

**Fix:**

- Default to no internal-LLM path. The LLM the user is talking to can do its own narrative synthesis from raw tool output.
- `summarize_chain` is opt-in: the LLM must explicitly request it.
- Future: cache `summarize_chain` results per `(entity, depth, style, world_time)` tuple. The world doesn't change for 95% of queries.

**Status:** documented, gated behind explicit LLM request.

### S2.4 — The 45-tool surface is past the LLM's tool-use ceiling

Empirically, LLMs start making poor tool choices past ~25 tools in the same system prompt. The current catalog is 8 inherited + 37 new = **45 tools**, well past the ceiling.

**Fix:**

- Phase 6 test: measure tool-selection accuracy with all 45 vs. with the 8 most-used. If 8 is dramatically better, collapse the long tail.
- Group tools by function in the system prompt (we already do this) and instruct the LLM to look at the relevant group first.
- If still bad: collapse `state_at` into `entity_context` (with optional `comprehensive: true`), and `summarize_chain` into `narrate_arc` (with optional `style: bullets`).

**Status:** acknowledged, in scope of Phase 10.

## Severity 3 (known limitations)

### S3.1 — Prophecy and unreliable narration aren't first-class

A claim like "the prophecy says the Crimson Throne will fall" reaches the graph, if at all, only as an ordinary extracted fact. The engine doesn't model who said it, how reliable they are, or whether it has come true.

**Fix (v2):** add a `Claim` node label with `claimant`, `reliability`, `verification_status`, and `claimed_event` edges.
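
As a rough illustration of the proposed v2 `Claim` node, the sketch below models it as a Python dataclass and renders the hedged, attributed answer format this section calls for. The field types, the `VerificationStatus` values, and the `render_claim` helper are assumptions for illustration; they are not the v2 schema.

```python
from dataclasses import dataclass
from enum import Enum


class VerificationStatus(Enum):
    # Hypothetical states; the v2 design may choose different ones.
    UNVERIFIED = "no verification"
    CONFIRMED = "confirmed"
    REFUTED = "refuted"


@dataclass(frozen=True)
class Claim:
    claimed_event: str  # stands in for the target of the `claimed_event` edge
    claimant: str       # who or what made the claim
    reliability: str    # e.g. "contested"; free text in this sketch
    verification_status: VerificationStatus = VerificationStatus.UNVERIFIED


def render_claim(kind: str, claim: Claim) -> str:
    """Phrase a claim as attributed and hedged, never as settled fact."""
    return (
        f"the {kind} claims {claim.claimed_event}, "
        f"source: {claim.claimant}, "
        f"reliability: {claim.reliability}, "
        f"{claim.verification_status.value}"
    )


if __name__ == "__main__":
    prophecy = Claim("the Crimson Throne will fall", "Aelar's temple", "contested")
    print(render_claim("prophecy", prophecy))
```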
The `cite` tool can return claims, not just chunks, and the LLM can answer "is the prophecy true?" with "the prophecy claims X, source: Aelar's temple, reliability: contested, no verification."

**Status:** out of scope for v1. Documented as a v2 feature.

### S3.2 — Cross-world queries

The engine is per-world. A future version might want to query across two worlds (for a multi-world campaign or a comparison). The v1 schema doesn't support this: the `:Era` slugs aren't namespaced.

**Fix (v1.2, resolved):** the v1.2 `Setting` and `Plane` graph nodes + `EXISTS_IN` edges replace the v1.1 flat `world_id` string namespace. Multi-setting queries are now supported via `Setting` filters and `EXISTS_IN` traversal. The "deferred to v2" framing in the v1 review is no longer accurate; the resolution is the v1.2 plane model. See `17-planes.md`.

**Status:** resolved in v1.2.

### S3.3 — The reasoning harness depends on the LLM reading it

The system prompt is *instruction*, not *constraint*. The LLM can ignore it, especially under adversarial user pressure ("just give me an answer, don't worry about citations").

**Fix:**

- The MCP server can enforce some rules (e.g., refuse `cite`-less answers via a "force citation" mode).
- A "consistency-required" mode that rejects LLM tool calls inconsistent with the latest `:ConsistencyRun` result.
- A user-facing UI that shows the LLM's tool-call trace, so a human can audit violations.

**Status:** enforcement is a v2 feature. v1 relies on the LLM being well-behaved.

### S3.4 — The structured YAML format is a maintenance burden

A world-builder has to learn YAML, follow a strict schema, and update it as the world evolves. The prose path is much easier: just write a story.

**Fix:**

- Phase 5: build a CLI `tea worldbuilder` with autocomplete, validation, and preview.
- Phase 5: a web UI for editing YAML with type-ahead from existing entity names.
- Phase 5: import-from-prose via the LLM (read a markdown chapter, propose a YAML diff, world-builder approves).
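
The validation and type-ahead items above can be prototyped with the Python standard library. This is a minimal sketch, not the `tea worldbuilder` implementation: it assumes a record has already been parsed from YAML into a dict, and `REQUIRED_FIELDS`, the `related` field, `suggest`, and `validate_record` are hypothetical. `difflib.get_close_matches` stands in for whatever matching a real type-ahead would use.

```python
import difflib

# Hypothetical minimal schema: field name -> expected Python type after YAML parsing.
REQUIRED_FIELDS = {"slug": str, "label": str, "era": str}


def suggest(name: str, known_names: list[str], n: int = 3) -> list[str]:
    """Type-ahead: prefix matches first, then fuzzy matches for likely typos."""
    prefix = sorted(k for k in known_names if k.startswith(name))
    fuzzy = difflib.get_close_matches(name, known_names, n=n, cutoff=0.6)
    # Keep first-seen order, drop duplicates, cap at n suggestions.
    return list(dict.fromkeys(prefix + fuzzy))[:n]


def validate_record(record: dict, known_names: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means the record is valid."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing required field {field!r}")
        elif not isinstance(record[field], expected):
            problems.append(f"{field!r} should be {expected.__name__}")
    for ref in record.get("related", []):
        if ref not in known_names:
            hints = suggest(ref, known_names)
            hint = f" (did you mean {', '.join(hints)}?)" if hints else ""
            problems.append(f"unknown entity {ref!r}{hint}")
    return problems
```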

**Status:** tooling is in scope but not the core design.

## Severity 4 (philosophical issues)

### S4.1 — The engine models the *written* world, not the *imagined* world

A world-builder's mental model of their world is always richer than what's in any document. The engine can only reason about what's been ingested. **The LLM can never answer "what is the secret history of the Vyrs that the world-builder hasn't written down?", because the engine has no record of unwritten facts.**

This isn't a bug; it's a feature. The engine is *bounded* by its sources. The LLM should never invent to fill the gap.

**Status:** explicit design choice. The system prompt says so.

### S4.2 — The "best" tool for the LLM is the one it actually uses

We designed 45 tools (8 inherited + 37 new). The LLM might use 8 of them 95% of the time. The other 37 are dead weight: they bloat the system prompt and confuse the tool-selection logic.

**Fix:** measure tool usage in Phase 6. Tools with <2% usage in test sessions are either folded into a higher-level tool or pruned. The design is a *floor*, not a *ceiling*. We add tools; we don't take them away unless evidence says we should.

**Status:** ongoing. Re-evaluate after Phase 10.

### S4.3 — "Historically accurate" is a moving target

A world-builder changes the lore. The engine must absorb the change without breaking prior reasoning. We don't have a versioning model.

**Fix (v2):** every node and edge has a `valid_from_version` / `valid_until_version` pair. Old queries can be replayed against a snapshot. The consistency engine can diff two versions and surface what changed.

**Status:** deferred. v1 expects the world to evolve by `MERGE`, not by version.

## Open questions

These are decisions I couldn't make alone. The world-builder should answer them before Phase 1.

1. ~~**How granular is the time model in practice?**~~ **Resolved (Q1):** year-level precision is the default, with optional month/day/event precision when the source supports it. The UDF and the storage cost are unchanged.
2. ~~**Are there multi-world / planar structures?**~~ **Resolved (Q2):** yes. The engine adds `Setting` and `Plane` graph nodes (v1.2); the v1.1 flat `world_id` string namespace is deprecated. Multi-setting queries are supported via `Setting` filters; planar relationships via `Plane`, `EXISTS_IN`, and the four plane-relation edge types (`REFLECTS`, `LAYER_OF`, `ADJACENT_TO`, `ACCESSIBLE_VIA`). See `17-planes.md`.
3. ~~**How are NPCs and PC players modeled?**~~ **Resolved (Q3):** separately. The `NPC`, `PC`, and `Human` labels in `01-ontology.md` cover this. The in-fiction `Person` is canonical; the wrappers track who controls it.
4. ~~**What's the policy on retconning?**~~ **Resolved (Q4):** preserve history by default. Old edges/nodes are marked `retconned` with a snapshot in the `retcon` Postgres table (`12-storage-strategy.md#postgres-schema`). Explicit `DELETE` is the only way to remove something permanently.
5. ~~**How is the world bootstrapped?**~~ **Resolved (Q5):** organically, over a long period. The engine supports partial worlds (some eras defined, some not), and the consistency engine surfaces missing structural data as `:Orphan` nodes. There is no need to pre-define everything.
6. ~~**What's the confidence weighting formula?**~~ **Resolved (Q6):** the more recent source wins. The `source_uploaded_at` (or `source_published_at` when known) is the tiebreaker. The engine stores both. When two prose sources disagree and both are recent, the rule engine surfaces the contradiction; it does not pick a winner automatically.
7. ~~**Are contradiction nodes user-facing?**~~ **Resolved (Q7):** the local engine is read-only for contradictions: the world-builder reviews them in a queue. An external source *may* be authorized to resolve contradictions later (e.g.
a community lore-council with write access). The local engine never auto-resolves.

## Resolved-by-Kay decisions in v1.1

All 7 open questions are now resolved and reflected in:

- `01-ontology.md`: adds `Plane`, `NPC`, `PC`, `Human` labels
- `02-time-model.md`: year-level precision is the default
- `12-storage-strategy.md`: `retcon` Postgres table for retcon history
- `09-roadmap.md`: Phase 0 (pre-flight) now includes resolving these

## What this design is good at

For balance:

- **Time-aware queries.** The time model is the strongest part. The `time_in_window` UDF + era-tree membership + `current` resolution is a real primitive that solves the most common failure mode.
- **Source attribution.** Every claim traces to a document. The LLM is told to cite.
- **Structured ingestion.** The YAML path makes high-stakes data (lineage, era boundaries, faction rules) exact, not fuzzy.
- **Modular tools.** Each tool does one job. Higher-level patterns are compositions, not mega-tools.
- **Consistency surfacing.** The engine reports what it doesn't know as loudly as what it does.
- **Polymorphic extension.** v1.1's `DomainEntity` + `TypeTemplate` model lets the world-builder add new domain types (thieves-guild missions, war campaigns, black markets) without code changes.

## What this design is not good at (yet)

- **Scaling beyond ~10K entities on the prose path.** Entity resolution via prompt injection doesn't scale. The structured path scales; the prose path doesn't.
- **Prophecy, deception, unreliable narration.** v1 doesn't model these as first-class.
- **Forcing the LLM to behave.** The reasoning harness is a contract, not an enforcement mechanism.
- **User experience for world-builders.** v1 is CLI + YAML. A UI is a v2 feature.
- **Versioning and retcon handling at the v1 level.** v1 mutates in place; v1.1's `retcon` table preserves history, but the in-graph nodes still get `MERGE`'d. A v2 might use temporal versioning on the graph itself.
- **Auto-resolution of cross-source conflicts.** v1.1 surfaces them; the world-builder resolves them.

## v1.1 critique additions

After the v1 review, the modularization question surfaced four new design risks worth recording.

### S1.4 (NEW, blocker) — Closed-world ontology ceiling

The v1 ontology has roughly 36 hard-coded labels (7 base + v1 core incl. `Relation` per ADR 0009 + 2 v1.2 planes + 5 v1.1 polymorphic + 5 consistency). A thieves-guild mission is forced into `Event`, a war campaign is forced into `Faction`-with-properties, and a black-market trade log is forced into `Item`-with-properties. The LLM can *talk* about these things, but the engine can't *reason* over their structure.

**Fix:** the polymorphic `DomainEntity` wrapper + `TypeTemplate` data-defined schemas. See `11-extensibility.md`. This is the load-bearing change for "arbitrary new concept, define how it associates with larger constructs, but also have flexibility to get as detailed as we need."

**Status:** resolved in v1.2 design. The polymorphic extension model **ships in v1** (it's how the v1 ontology becomes extensible without code), though not in the Phases 0–3 MVP. The template-watcher is a Cognee data-pipeline; the dynamic tool generator is part of the Lore Engine extension. Implementation is Phase 5 of the Cognee roadmap in `09-roadmap.md`.

### S1.5 (NEW, blocker) — Single mcp-server binary blocks iteration

The original GraphMCP-Example `mcp-server/main.go` was a 1144-line single file. Adding a new tool meant editing `main.go`, recompiling, and redeploying. For a world that's going to grow indefinitely, that iteration loop dominates the cost of the entire program.

**Fix:** switch to Cognee as the substrate. Cognee is the gateway; the Lore Engine is one in-process Python extension (one tool per Group file, registered at startup). Adding a new tool is a Python edit + Cognee restart (~5 minutes). Adding a new domain type is a YAML file + hot-reload (no restart). See `13-microservice-decomposition.md`.

**Status:** N/A in v1.2.
The substrate switch resolves this completely. The Lore Engine does not own the mcp-server; Cognee does.

### S2.5 (NEW, design risk) — The polymorphic wrapper adds query complexity

Every `DomainEntity` query is now polymorphic: the engine has to look up the template, get the field names, and build the right query. The performance overhead is small for typed queries, but for `expand_context` and `graph_traverse`, the engine has to follow relations through the `Relation` label and re-resolve the template at each step.

**Fix:** Cognee caches `TypeTemplate` lookups in its in-process store. The first time a template is referenced, its spec is loaded; subsequent queries use the cached version. Cache invalidation happens on template reload (a hot-reload event from the template-watcher data-pipeline). Cognee's caching layer handles this without us writing a custom cache.

**Status:** acknowledged, fix designed, implementation in Phase 5 of the Cognee roadmap.

### S2.6 (NEW, design risk) — Cross-store consistency is genuinely hard

When the world-builder writes a new mission, we touch the Cognee graph (entity, relations) and the operational Postgres tables (`mission_log` row). These two writes are not atomic. A partial failure leaves the world in an inconsistent state.

**Fix:** the saga pattern is **no longer needed**. Cognee manages its own transaction model for the graph + Postgres + vector store. The Lore Engine's operational tables live in Cognee's Postgres, so writes that touch both the graph and the operational tables are covered by Cognee's atomicity guarantees. We do not need a custom saga layer.

**Status:** N/A in v1.2. Cognee handles this. The v1.1 saga-pattern section in `12-storage-strategy.md` has been removed.

## Conclusion

The design is **viable for v1 on Cognee**, with a clear scope of 16 days for the MVP (Phases 0–3) and 33 days for the full v1 + extensions (Phases 0–6). The 7 open questions are resolved.
The biggest remaining risks are scale (entity resolution), over-flagging (consistency engine), Cognee-specific substrate quirks, and LLM misbehavior (harness enforcement). Each has a documented mitigation.

I would build the Cognee spike first (Phase 0, 2 days), validate the substrate, then proceed to the MVP (Phases 1–3, 14 days). The polymorphic extension model (Phase 5) and the consistency engine (Phase 4) are the highest-leverage v1.1 additions and ship in the same ~33-day window.