# 04 — Consistency Engine

The Consistency Engine is the part of the Lore Engine that *catches the LLM before it lies*. It runs a set of Cypher-defined rules over the graph and surfaces violations as first-class nodes. Cognee ships **no** contradiction machinery of its own. `Contradiction`, `Anachronism`, `Orphan`, and `OntologyViolation` are all Lore Engine types, built from scratch in slice 2 on top of Cognee's `DataPoint`.

This document catalogs the rule categories, the node types they produce, and the MCP tools the LLM uses to inspect them.

## The three failure modes we catch

1. **Contradictions:** two sources make incompatible claims about the same fact. (Built from scratch on Cognee. There is no inherited contradiction handler to generalize.)
2. **Anachronisms:** a claim requires a person, faction, or thing to exist at a time when it could not have. Example: Aldric is at the Battle of Black Spire, but Black Spire happened 200 years before his birth.
3. **Ontology violations:** the graph is internally inconsistent in ways that violate domain rules. Examples: a region is inside two non-overlapping kingdoms, or a spell belongs to a magic system that doesn't exist in this era.

We also surface a fourth, weaker failure mode: **gaps**. Examples are a noble with no recorded parents, a battle with no recorded location, or a faction that appears in lore but has no `FOUNDED` event. We don't claim these are errors, but we make them visible.

## Node types for violations

```cypher
(:Contradiction {
  id,
  subject, predicate,
  claim_a, doc_a,
  claim_b, doc_b,
  severity,      // "error" | "warn"
  flagged,       // bool: has the LLM been told?
  detected_at
})

(:Anachronism {
  id,
  entity_name, event_name, era,
  claim: "EXISTED_BEFORE" | "EXISTED_AFTER" | "EXISTED_DURING_MISMATCH",
  expected, actual,
  sources[],
  flagged,
  detected_at
})

(:OntologyViolation {
  id,
  rule_id,       // references :OntologyRule.id
  subject, predicate, claim,
  sources[],
  severity,
  flagged,
  detected_at
})

(:Orphan {
  id,
  entity_name, entity_type,
  reason,        // e.g. "Person with no recorded parents" or "Location not in any Region"
  flagged,
  detected_at
})
```

All four are *first-class nodes*, not flags. The LLM can query them, the user can review them, and a UI can render them. We don't bury violations in property strings.

## Rule categories

### Category A: Source-claim contradictions

In this category, a `LoreSource` makes a claim about an entity that another `LoreSource` contradicts. Cognee's extraction prompt provides only coreference resolution, so we have to build the contradiction detector ourselves. We cover:

- `MEMBER_OF`: a Person can't be in two factions at once, unless the `valid_from`/`valid_until` windows differ.
- `RULES`: a Location can't have two simultaneous rulers, so the `valid_from`/`valid_until` windows must not overlap.
- `POSSESSES`: an Item can't be in two places at once.
- `SPOUSE_OF`: a Person can't have two concurrent spouses.
- `PARENT_OF`: a Person can have at most two parents, and the parents' genders must differ unless the world allows otherwise.
- `BELONGS_TO`: a Person belongs to one culture at a time, unless the `valid_from`/`valid_until` windows differ.
- `EXISTED_DURING`: a Person, Faction, Location, or Item normally has one existence window. Multiple non-contiguous windows are valid for reincarnated deities and similar cases, but they must not overlap.

**Cypher (general pattern):** edges are reified `:Relation` nodes per ADR 0009. The contradiction check therefore matches Relation nodes by `type` and looks for overlapping time windows:

```cypher
MATCH (r1:Relation {type: "RELATION_TYPE"})
MATCH (r2:Relation {type: "RELATION_TYPE"})
WHERE r2.from_id = r1.from_id
  AND r1.id < r2.id                 // visit each pair once, not (r1, r2) and (r2, r1)
  AND r1.to_id <> r2.to_id
  AND time_windows_overlap(r1.valid_from, r1.valid_until, r2.valid_from, r2.valid_until)
MATCH (a {id: r1.from_id})
MERGE (contra:Contradiction {subject: a.name, predicate: "RELATION_TYPE",
                             claim_a: r1.to_id, claim_b: r2.to_id})
  ON CREATE SET contra.id = randomUUID(), contra.detected_at = timestamp(),
                contra.severity = "warn", contra.flagged = false
WITH a, r1, r2, contra
// Mark both edges disputed; append each id only once so nightly re-runs stay idempotent.
SET r1.is_disputed = true, r2.is_disputed = true,
    r1.disputed_with = CASE WHEN r2.id IN coalesce(r1.disputed_with, []) THEN r1.disputed_with
                            ELSE coalesce(r1.disputed_with, []) + [r2.id] END,
    r2.disputed_with = CASE WHEN r1.id IN coalesce(r2.disputed_with, []) THEN r2.disputed_with
                            ELSE coalesce(r2.disputed_with, []) + [r1.id] END
MERGE (a)-[:HAS_CONTRADICTION]->(contra)
```

(Note: the `is_disputed` / `disputed_with` mutation here is the consistency engine materializing ADR 0002's disputed-edge state. The LLM does not invent it.)

The pattern groups claims by subject (`from_id`). It fits predicates where the subject can hold only one target at a time, such as `MEMBER_OF`, `SPOUSE_OF`, and `BELONGS_TO`.

Some conflicts sit on the target side instead: one Item with two concurrent holders (`POSSESSES`), or one Location with two concurrent rulers (`RULES`). Assuming those edges point at the contested entity, the same query groups by `to_id`. The two-parent limit on `PARENT_OF` is different again: it is a count, not a pairwise check.

### Category B: Anachronism detection

This category covers every edge of type `PARTICIPATED_IN`, `WITNESSED`, `LOCATED_IN`, `POSSESSES`, `CAUSED`, and `CREATED`. For each one, verify that the subject's existence window contains the time of the event or object.

**Cypher (anachronism: event outside the participant's lifetime):** `PARTICIPATED_IN` is a reified `:Relation` (ADR 0009).

```cypher
MATCH (p:Person) WHERE p.birth IS NOT NULL
MATCH (r:Relation {type: "PARTICIPATED_IN"}) WHERE r.from_id = p.id
MATCH (e:Event {id: r.to_id})
WHERE e.in_fiction_date IS NOT NULL
  AND time_in_window(e.in_fiction_date, p.birth, p.death) = false
WITH p, e,
     CASE WHEN e.in_fiction_date < p.birth THEN "EXISTED_BEFORE" ELSE "EXISTED_AFTER" END AS kind
MERGE (an:Anachronism {entity_name: p.name, event_name: e.name, claim: kind})
  ON CREATE SET an.detected_at = timestamp(),
                an.expected = CASE kind WHEN "EXISTED_BEFORE" THEN p.birth ELSE p.death END,
                an.actual = e.in_fiction_date,
                an.flagged = false
WITH p, an
MERGE (p)-[:HAS_ANACHRONISM]->(an)
```

The same pattern applies to `Faction` (vs. `OCCURRED_AT`), `Item` (vs. `POSSESSES`), and `Creature` (vs. `LOCATED_IN`).

### Category C: Ontology rules (declarative)

The user can define arbitrary rules as Cypher strings. They are stored as `:OntologyRule` nodes:

```cypher
(:OntologyRule {
  id: "no-overlapping-kingdoms",
  cypher: "MATCH (c1:Relation {type: 'CONTROLS'}), (c2:Relation {type: 'CONTROLS'}) " +
          "WHERE c1.to_id = c2.to_id AND c1.from_id < c2.from_id " +
          "AND time_windows_overlap(c1.valid_from, c1.valid_until, c2.valid_from, c2.valid_until) " +
          "MATCH (k1:Kingdom {id: c1.from_id}), (k2:Kingdom {id: c2.from_id}), " +
          "(loc:Location {id: c1.to_id}) RETURN k1, k2, loc",
  description: "A location cannot be controlled by two kingdoms at the same time.",
  severity: "error"
})
```

A nightly batch job iterates over all `:OntologyRule` nodes and runs each one. Any rule that returns a non-empty result produces `:OntologyViolation` nodes.
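The nightly loop over `:OntologyRule` nodes can be sketched in Python. This is a minimal illustration under stated assumptions, not the engine's code:

- `run_query` is a hypothetical adapter that executes a Cypher string and returns rows as dicts.
- Each returned row is assumed to become one violation.
- The `disabled` set and the thread pool stand in for the rule-disable and parallelization mitigations listed under the self-pressure-test.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical adapter: takes a rule's Cypher string and returns result rows as dicts.
# In a real deployment this would wrap a graph-database session.
RunQuery = Callable[[str], List[Dict[str, Any]]]


@dataclass(frozen=True)
class OntologyRule:
    id: str
    cypher: str
    description: str
    severity: str  # "error" | "warn"


@dataclass
class ViolationRecord:
    rule_id: str
    severity: str
    row: Dict[str, Any]  # the offending row, kept so explain_violation can show it


def run_rule(rule: OntologyRule, run_query: RunQuery) -> List[ViolationRecord]:
    """Run one rule; every returned row becomes one violation record."""
    return [ViolationRecord(rule.id, rule.severity, row) for row in run_query(rule.cypher)]


def run_ontology_rules(
    rules: Iterable[OntologyRule],
    run_query: RunQuery,
    disabled: FrozenSet[str] = frozenset(),
    max_workers: int = 4,
) -> Tuple[int, List[ViolationRecord]]:
    """Run every enabled rule in parallel and return (rules_run, violations)."""
    enabled = [rule for rule in rules if rule.id not in disabled]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        batches = list(pool.map(lambda rule: run_rule(rule, run_query), enabled))
    violations = [v for batch in batches for v in batch]
    return len(enabled), violations


if __name__ == "__main__":
    rules = [
        OntologyRule("no-overlapping-kingdoms", "MATCH ... RETURN k1, k2, loc",
                     "A location cannot be controlled by two kingdoms at the same time.", "error"),
        # Hypothetical second rule, for illustration only.
        OntologyRule("spell-needs-system", "MATCH ... RETURN s",
                     "Every spell belongs to a magic system.", "warn"),
    ]
    canned = {rules[0].cypher: [{"k1": "Vael", "k2": "Orm", "loc": "Greyford"}]}
    print(run_ontology_rules(rules, lambda cypher: canned.get(cypher, [])))
```

The sketch leaves two writes to the caller:

- persisting each `ViolationRecord` as an `:OntologyViolation` node;
- recording `rules_run` and the violation count on the run's `:ConsistencyRun` node.

Keeping each rule as a plain Cypher query is what keeps user rules declarative.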
Category C leans on Cognee's documented pattern for graph-level consistency (Cypher rules with APOC's `apoc.util.validate`). That makes it the one part of the consistency engine that rides a Cognee-native mechanism rather than being built entirely from scratch.

**Out of the box**, we ship a starter set of about 10 rules. See `05-mcp-tools.md#starter-rules` for the list.

### Category D: Orphan detection

Surfacing missing data matters as much as catching errors. A daily scan flags:

- `:Person` nodes with no `PARENT_OF`/`DESCENDED_FROM` connections and no `birth` property → "Person of unknown lineage."
- `:Faction` nodes with no `FOUNDED` connection → "Faction of unknown origin."
- `:Location` nodes with no `PART_OF` connection to a `Region` → "Unmapped location."
- `:Event` nodes with no `OCCURRED_AT` → "Event with no location."
- `:Event` nodes with no `OCCURRED_DURING` → "Event with no era."
- `:Item` nodes with no `POSSESSES` connection (i.e. not held by anyone) → "Unowned artifact."
- `:Spell` nodes with no `PART_OF_SYSTEM` → "Spell with no magic system."

These produce `:Orphan` nodes with a `reason` field. The LLM is told "we don't know X about this entity," not "this entity has no X."

## Running the consistency engine

Cognee exposes no built-in post-`cognify` hook (ADR 0008), so the consistency engine attaches through two mechanisms.

### Live (in-process, post-ingest)

A fast sweep runs as the final `Task` in `run_custom_pipeline(tasks=[...])`, right after extraction. It:

- scans only the entities touched by the ingest;
- materializes `Contradiction`/`Anachronism`/`Orphan` nodes while the pipeline context is warm;
- targets a latency under 100 ms.

This sweep is what makes a freshly ingested chunk's violations visible before the LLM is ever asked about them.
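The orphan half of the live sweep can be sketched in Python over an in-memory copy of the graph, restricted to the entities an ingest touched. Everything here is hypothetical:

- `Entity`, `Relation`, `OrphanRecord`, `orphan_reasons`, and `live_orphan_sweep` are illustrative names.
- `Relation` only mirrors the `type`/`from_id`/`to_id` fields of the reified `:Relation` nodes.
- "Lineage" is read as a recorded parent, meaning an incoming `PARENT_OF` or an outgoing `DESCENDED_FROM`.

The sketch leaves out contradiction and anachronism checks, and it does not write `:Orphan` nodes back to the graph.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass
class Entity:
    id: str
    name: str
    type: str  # "Person", "Faction", "Location", "Event", "Item", "Spell", "Region", ...
    props: Dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class Relation:
    # Mirrors the reified :Relation node shape (type, from_id, to_id) from ADR 0009.
    type: str
    from_id: str
    to_id: str


@dataclass
class OrphanRecord:
    entity_name: str
    entity_type: str
    reason: str


def orphan_reasons(e: Entity, entities: Dict[str, Entity], relations: List[Relation]) -> List[str]:
    """Return the Category D reasons that apply to one entity (empty if none)."""
    touching = {r.type for r in relations if e.id in (r.from_id, r.to_id)}
    reasons: List[str] = []
    if e.type == "Person":
        # Assumption: lineage = a recorded parent (incoming PARENT_OF) or an outgoing
        # DESCENDED_FROM; being someone else's parent does not count.
        has_lineage = any(
            (r.type == "PARENT_OF" and r.to_id == e.id)
            or (r.type == "DESCENDED_FROM" and r.from_id == e.id)
            for r in relations
        )
        if not has_lineage and e.props.get("birth") is None:
            reasons.append("Person of unknown lineage.")
    elif e.type == "Faction" and "FOUNDED" not in touching:
        reasons.append("Faction of unknown origin.")
    elif e.type == "Location":
        in_region = any(
            r.type == "PART_OF" and r.from_id == e.id
            and r.to_id in entities and entities[r.to_id].type == "Region"
            for r in relations
        )
        if not in_region:
            reasons.append("Unmapped location.")
    elif e.type == "Event":
        if "OCCURRED_AT" not in touching:
            reasons.append("Event with no location.")
        if "OCCURRED_DURING" not in touching:
            reasons.append("Event with no era.")
    elif e.type == "Item" and "POSSESSES" not in touching:
        reasons.append("Unowned artifact.")
    elif e.type == "Spell" and "PART_OF_SYSTEM" not in touching:
        reasons.append("Spell with no magic system.")
    return reasons


def live_orphan_sweep(
    touched_ids: Iterable[str], entities: Dict[str, Entity], relations: List[Relation]
) -> List[OrphanRecord]:
    """Check only the entities an ingest touched, as the live sweep does."""
    records: List[OrphanRecord] = []
    for eid in sorted(set(touched_ids)):
        e = entities.get(eid)
        if e is None:  # touched id not (or no longer) in the graph
            continue
        for reason in orphan_reasons(e, entities, relations):
            records.append(OrphanRecord(e.name, e.type, reason))
    return records
```

The sketch scans every relation once per entity. A sweep that has to stay under 100 ms would index relations by endpoint first.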
### Batch (nightly, external)

The full rule set runs at 03:00 wall-clock time as an external scheduled job (cron), independent of any single ingest. It re-scans the whole graph for long-tail rules and cross-entity patterns that the live sweep doesn't cover. Each run records a new `:ConsistencyRun` node and materializes any new violations.

Both modes write the same `:ConsistencyRun` shape:

```cypher
(:ConsistencyRun {
  id,
  started_at, finished_at, duration_ms,
  rules_run,
  violations_found, anachronisms_found, orphans_found
})
```

## MCP tools for the LLM

These are the tools the LLM uses to *ask* about consistency. They are separate from the machinery the engine uses to *find* violations.

| Tool | Purpose |
|---|---|
| `get_contradictions(subject?, severity?, limit)` | List flagged contradictions, optionally filtered. **Built in slice 2 (no Cognee equivalent).** |
| `get_anachronisms(entity?, limit)` | List flagged anachronisms. |
| `get_ontology_violations(rule_id?, severity?, limit)` | List ontology rule violations. |
| `get_orphans(reason?, limit)` | List entities with missing structural data. |
| `flag_for_review(node_id, reason)` | Mark a node as suspicious. The node goes to a review queue. |
| `explain_violation(node_id)` | Return the Cypher rule that produced a violation, the offending edges, and the source documents. Critical for LLM transparency. |
| `run_consistency_check(scope)` | Force-run the consistency engine over a given scope (single entity, single era, or full graph). The LLM uses this when it suspects a problem and wants confirmation. |
| `latest_run()` | Return the most recent `ConsistencyRun` summary. |
| `add_ontology_rule(id, cypher, description, severity)` | Add a new declarative rule. Used by world-builders, not the LLM. |
| `list_ontology_rules()` | Browse existing rules. |

## How the LLM should use these

The reasoning harness (`07-reasoning-harness.md`) **explicitly instructs** the LLM to:

1. **Before answering a historical question**, call `latest_run()`. If the most recent run is stale (>24h) or had a high violation count, the LLM caveats its answer: "Based on the consistency check from 2 days ago, which flagged 3 unresolved issues..."
2. **After making a claim that introduces a new entity**, call `run_consistency_check` with that entity as the scope. If a violation is found, retract or qualify the claim.
3. **When citing a source**, check `get_contradictions` for entries whose `doc_a` or `doc_b` is that source. If the source has flagged contradictions, the LLM says so: "Source X has 2 flagged contradictions; this claim may not reflect the canonical view."
4. **Never assert something that an `Orphan` node marks as unknown.** If a Person has no recorded parents, the LLM must say "of unknown lineage," not invent parents.

This is the contract. Responses that violate it are surfaced in conversation review.

## What the engine does not catch

- **Narrative contradictions that don't map to graph edges.** Take "the book says the sky was red" vs. "the book says the sky was blue". We can ingest both, and they live as separate `LoreChunk` nodes with embeddings. Unless someone writes an `OntologyRule` to flag them, though, they sit as parallel claims. **This is a known limitation.** Mitigation: encourage writers to use a structured format (YAML, see `06-ingestion.md`) that produces typed graph edges, not just text chunks.
- **Subjective claims.** We don't flag "Aldric was brave" vs. "Aldric was cowardly". These are characterizations, not facts. The engine tracks *who* said it, not whether it's true.
- **Prophecy and unreliable narration.** The engine can model "the prophecy says X" as a `Claim` node with `reliability: "prophetic"`, but it does not adjudicate. The LLM must surface the unreliability to the user.

## Self-pressure-test

What could break here?

1. **`time_in_window` / `time_windows_overlap` bugs.** If either UDF is wrong, every temporal check built on it is wrong. **Mitigation:** unit test both UDFs against 50+ known cases, and never change them without a regression test.
2. **Over-flagging.** A world with rich, layered history will have many temporal overlaps that aren't real contradictions. **Mitigation:** severity defaults to `warn`, and `error` requires explicit human review. False positives erode trust.
3. **Rule explosion.** Users adding 100 custom `OntologyRule` nodes will slow the nightly batch. **Mitigation:** the run is parallelized, and a user can disable rules by ID.
4. **LLM ignores the warnings.** If the LLM just answers anyway, the consistency engine is theater. **Mitigation:** the reasoning harness makes ignoring `get_contradictions` an explicit failure mode, and a future UI can show the violation count alongside the LLM's response.

The biggest risk is #4. The engine is a *tool*, not a *guard*. The LLM has to want to use it.
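Pressure-test #1 depends on the semantics of `time_in_window` and `time_windows_overlap`. The Category A to C Cypher calls both functions, but this document never pins down how they behave. A small Python reference model makes the intended behavior testable. It is a sketch under assumptions, not the production UDFs, which live in the graph database and are not shown. The assumptions are:

- In-fiction dates are integer years on one calendar.
- Windows are closed.
- A `None` bound means open-ended, and a `None` point means unknown.
- Shared endpoints count as overlap. This is a conservative choice: a handover year gets flagged, at `warn` severity.

```python
import math
from typing import Optional

# Assumption: in-fiction dates are integer years on a single world calendar.
# None as a bound means "open-ended"; None as the point being tested means "unknown".
Year = Optional[int]


def time_in_window(t: Year, start: Year, end: Year) -> Optional[bool]:
    """Is t inside the closed window [start, end]?

    Returns None when t is unknown, mirroring Cypher null propagation: a
    `time_in_window(...) = false` filter then skips the row instead of flagging it.
    """
    if t is None:
        return None
    if start is not None and t < start:
        return False
    if end is not None and t > end:
        return False
    return True


def time_windows_overlap(a_from: Year, a_until: Year, b_from: Year, b_until: Year) -> bool:
    """Do the closed windows [a_from, a_until] and [b_from, b_until] share any year?

    Shared endpoints count as overlap. Missing bounds are unbounded, so two
    undated claims always overlap (and get flagged for review).
    """
    lo = max(-math.inf if a_from is None else a_from, -math.inf if b_from is None else b_from)
    hi = min(math.inf if a_until is None else a_until, math.inf if b_until is None else b_until)
    return lo <= hi


# Table-driven regression cases, in the style a 50+ case suite would use.
IN_WINDOW_CASES = [
    ((150, 100, 200), True),
    ((100, 100, 200), True),    # the birth year itself is inside
    ((50, 100, 200), False),    # event before birth
    ((250, 100, None), True),   # no recorded death: open-ended window
    ((None, 100, 200), None),   # unknown event date: no verdict
]
OVERLAP_CASES = [
    ((100, 200, 150, 300), True),
    ((100, 200, 201, 300), False),
    ((100, 200, 200, 300), True),   # shared handover year counts
    ((None, None, 5, 6), True),     # undated claim overlaps everything
]

if __name__ == "__main__":
    for args, expected in IN_WINDOW_CASES:
        assert time_in_window(*args) is expected, args
    for args, expected in OVERLAP_CASES:
        assert time_windows_overlap(*args) is expected, args
    print("all window cases pass")
```

Whether a handover year should count as overlap is a world-design decision. Whichever way it goes, the regression suite should lock it in.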