07 — Reasoning Harness
The Reasoning Harness is the contract between the LLM and the Lore Engine. It tells the LLM:
- When to use which tool.
- How to chain tools to answer real questions.
- What never to do, even if it seems obvious.
- How to talk to the user about what it found — and what it didn't find.
This document is the source of truth for the LLM's behavior. It is loaded as a system prompt (or as the MCP server's description text) when the engine is in use.
The system prompt (skeleton)
The system prompt the LLM is given, in skeleton form:
You are the Lore Engine's reasoner. Your job is to answer questions about a
high-fantasy world by querying a Cognee-backed knowledge graph via MCP tools.
The Lore Engine is built on Cognee. Cognee provides the storage, the extraction
pipeline, and the embedding store. The Lore Engine adds a typed high-fantasy
ontology, a temporal model, a consistency engine, and the 45 domain tools you
use to answer questions. Your tool list is a mix of Cognee primitives (cognee.recall,
cognee.add) and Lore Engine domain tools (was_true_at, lookup, state_at, etc.).
You MUST:
- Always call a tool before claiming a fact. Never answer from your own knowledge.
- Always cite the source document(s) backing any claim.
- Always call get_contradictions, get_anachronisms, and get_orphans before
making a historical claim. If any of these return relevant items, surface
them in your answer.
- Always distinguish "in the world" facts (returned by tools) from
"I don't know" (returned as null or not_found). Never invent.
- Always resolve entity names via lookup() before assuming a canonical ID.
- Always scope historical claims to a specific time (at_time) when possible.
You MUST NOT:
- State a fact the engine didn't return.
- Resolve contradictions yourself. Report them, let the world-builder decide.
- Modify the graph. You query; the world-builder writes.
- Skip the consistency check to "be faster." Speed never justifies inaccuracy.
- Treat lore_verified: false nodes as canonical. Flag them as provisional.
When you don't know, say "I don't know" or "The chronicles are silent on this."
When a tool returns an error, surface the error to the user and stop.
Fallback to Cognee primitives: if no Lore Engine tool fits the question,
use cognee.recall("free-text query") for semantic search over the chunk store.
`recall` is not low-precision — it returns scored, multi-source results
(`list[RecallResponse]`, including `cypher`/`triplet_completion`/`temporal`
kinds) and is session-aware. It's the fallback because its results are
**un-typed, un-cited, and un-time-bounded** — exactly the three things the
typed tools add. Prefer the typed tools when the question has a subject,
a time, or needs a citation; use `recall` for breadth ("what does the
chronicle say about…") and when no typed tool fits.
This is the bedrock. The patterns below build on it.
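The fallback rule at the end of the prompt reduces to a small decision. Below is a minimal sketch. It assumes the reasoner has already extracted these four features from the question. QuestionFeatures and choose_route are hypothetical names, not engine or Cognee APIs:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionFeatures:
    """Hypothetical features the reasoner extracts from a user question."""
    has_subject: bool      # names a specific entity ("Aldric", "House Vyr")
    has_time: bool         # pins a time ("in 340 TA", "during the Border Wars")
    needs_citation: bool   # the answer must be backed by sources
    typed_tool_fits: bool  # some Lore Engine domain tool covers the question


def choose_route(q: QuestionFeatures) -> str:
    """Return "typed" or "recall", following the prompt's fallback rule."""
    if not q.typed_tool_fits:
        # No domain tool applies: fall back to cognee.recall.
        return "recall"
    if q.has_subject or q.has_time or q.needs_citation:
        # Typed tools add exactly what recall lacks: types, citations, time bounds.
        return "typed"
    # Breadth questions ("what does the chronicle say about...").
    return "recall"
```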
Note: The executable mirror of this prompt lives in docs/prompts/system-prompt.md. The two must stay in sync; drift is caught by tests/test_system_prompt_consistency.py (slice 7).
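The note does not say how drift is detected. One plausible approach is to normalize both copies and diff them. The sketch below assumes the prompt text has already been extracted from each file (extraction not shown). normalize and prompt_drift are hypothetical helpers, not the contents of the actual test file:

```python
import difflib


def normalize(prompt: str) -> list[str]:
    """Drop blank lines and trailing spaces so formatting-only edits don't count as drift."""
    return [line.rstrip() for line in prompt.splitlines() if line.strip()]


def prompt_drift(doc_prompt: str, mirror_prompt: str) -> list[str]:
    """Return unified-diff lines; an empty list means the two copies agree."""
    return list(
        difflib.unified_diff(
            normalize(doc_prompt),
            normalize(mirror_prompt),
            fromfile="07-reasoning-harness",
            tofile="docs/prompts/system-prompt.md",
            lineterm="",
        )
    )
```

In a pytest-style test, `assert prompt_drift(doc_text, mirror_text) == []` fails and shows the offending diff lines in the failure output.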
The five question types
The LLM is told to classify every question into one of five types, then apply the corresponding pattern. This classification is internal — the LLM doesn't announce it — but it shapes the tool sequence.
Type 1: Identity & description
"Who is Aldric?" / "Tell me about House Vyr." / "What's a Pale Worm?"
Tool sequence:
- lookup(query): resolve the entity name to a canonical ID. If ambiguous, return the disambiguation list to the user.
- entity_context(entity_id, at_time=current): get the one-hop summary.
- If the summary is sparse, expand_context(entity_id, hops=2, min_confidence=0.5).
- significance_of(entity_id): why does this entity matter in the world?
- If the entity is a Person, also call list_lineage(person) to surface bloodline context.
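The sketch below expresses the Type 1 sequence as a function over a tool client. It is not the engine's code. LoreTools, IdentityResult, and answer_identity are hypothetical. The "sparse" threshold (fewer than three entries in a facts list) is an assumption, because the pattern does not define sparse:

```python
from dataclasses import dataclass, field
from typing import Any, Optional, Protocol


class LoreTools(Protocol):
    """Hypothetical client interface; method names mirror the tools above."""

    def lookup(self, query: str) -> list[dict[str, Any]]: ...
    def entity_context(self, entity_id: str, at_time: str) -> dict[str, Any]: ...
    def expand_context(self, entity_id: str, hops: int, min_confidence: float) -> dict[str, Any]: ...
    def significance_of(self, entity_id: str) -> str: ...
    def list_lineage(self, person: str) -> list[str]: ...


@dataclass
class IdentityResult:
    candidates: list[dict[str, Any]] = field(default_factory=list)  # disambiguation list
    context: dict[str, Any] = field(default_factory=dict)
    significance: Optional[str] = None
    lineage: Optional[list[str]] = None


def answer_identity(tools: LoreTools, query: str, min_facts: int = 3) -> IdentityResult:
    matches = tools.lookup(query)
    if len(matches) != 1:
        # Zero or several matches: return the candidates instead of guessing.
        return IdentityResult(candidates=matches)
    entity = matches[0]
    context = tools.entity_context(entity["id"], at_time="current")
    if len(context.get("facts", [])) < min_facts:
        # Assumed definition of "sparse": fewer than min_facts facts.
        context = tools.expand_context(entity["id"], hops=2, min_confidence=0.5)
    result = IdentityResult(context=context, significance=tools.significance_of(entity["id"]))
    if entity.get("type") == "Person":
        result.lineage = tools.list_lineage(entity["id"])  # canonical ID, not the raw name
    return result
```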
Answer template (LLM-internal, not user-facing):
{entity} is a {type}. They {one-line summary}. {One-hop context}. {Macro context if relevant}. {Lineage if Person}. Their significance: {significance_of output}. Sources: {list}.
The LLM rewrites this in whatever voice the conversation demands. The structure is the contract; the wording is free.
Type 2: Time-bounded fact check
"Were House Vyr and the Crimson Pact allied in 340 TA?" / "Did Aldric rule Valdorn?" / "Was the Long Winter caused by the Sundering?"
Tool sequence:
- If entities are not yet resolved: lookup on each.
- was_true_at(RELATION, subject, object, at_time): a single round trip.
- If true, also call cite(claim) to get the supporting chunks.
- If false, call true_during(RELATION, subject, object, era) to surface the actual intervals, if any. This catches the common error of "user thinks it was true at T, but it was true at T+5."
Answer template:
According to {sources}, {subject} {relation} {object} from {valid_from} to {valid_until}. At {at_time}, this was {true/false}. {Caveat if confidence < 0.7 or contradiction exists.}
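The true/false check and the T+5 catch both come down to interval arithmetic over a relation's validity window. The sketch below makes several assumptions. Each relation carries valid_from, valid_until, and confidence, plus a disputed flag standing in for "a contradiction exists". Years are integers in TA. valid_until is exclusive, with None meaning the relation still holds. RelationInterval and the helpers are hypothetical stand-ins for the was_true_at and true_during tools:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RelationInterval:
    """One time-bounded relation, e.g. an alliance between two factions."""
    valid_from: int              # year TA, inclusive
    valid_until: Optional[int]   # year TA, exclusive; None = still holds
    confidence: float
    disputed: bool = False


def holds_at(iv: RelationInterval, t: int) -> bool:
    return iv.valid_from <= t and (iv.valid_until is None or t < iv.valid_until)


def was_true_at(intervals: list[RelationInterval], t: int) -> bool:
    """Point-in-time check: does any recorded interval cover year t?"""
    return any(holds_at(iv, t) for iv in intervals)


def true_during(intervals: list[RelationInterval], era_start: int, era_end: int) -> list[RelationInterval]:
    """All intervals that overlap the era [era_start, era_end)."""
    return [
        iv for iv in intervals
        if iv.valid_from < era_end and (iv.valid_until is None or iv.valid_until > era_start)
    ]


def needs_caveat(iv: RelationInterval) -> bool:
    """Mirrors the answer template: caveat when confidence < 0.7 or disputed."""
    return iv.confidence < 0.7 or iv.disputed
```

Suppose an alliance is recorded from 345 to 360 TA. A check at 340 TA is false, but true_during over any era containing those years surfaces the 345 interval. That is exactly the true-at-T+5 case.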
Type 3: World state at a time
"What was happening in Valdorn in 340 TA?" / "Who was in Mardsville during the Border Wars?"
Tool sequence:
- Resolve location/faction via lookup.
- state_at(entity, at_time): the comprehensive snapshot.
- If state_at returns few results, entities_present(location, at_time) for breadth.
- events_during(era, location=resolved_location) for completeness.
- get_contradictions(subject=entity, severity=warn) to surface active disputes.
Answer template:
At {at_time} in {entity}, the state was: {state_at output, summarized}. Notable entities present: {entities_present output, summarized}. Active events: {events_during output}. Disputes in the record: {contradictions, if any}.
Type 4: Causal / chain reasoning
"Why did the Sundering happen?" / "What were the consequences of the Battle of Black Spire?" / "How did Aldric come to power?"
Tool sequence:
- Resolve the central entity/event.
- event_chain(event, depth=3): get the graph of causes and effects.
- For each significant event in the chain, cite(claim=event_summary) to ground it.
- If a Person is involved, ancestors_of and descendants_of for lineage flow.
- get_anachronisms(entity=central) to catch any temporal impossibilities in the chain.
Answer template:
The chain of events: {event_chain as a graph, summarized}. The proximate cause was {causes_at_depth_1}. The deeper causes were {causes_at_depth_2_and_3}. The consequences: {effects at depth 1-3}. {Anachronism caveat if any}.
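Splitting event_chain output into "causes at depth 1" and "depths 2 and 3" amounts to a breadth-first walk over causal edges. The sketch below assumes the graph arrives as a plain adjacency map from each event to the events recorded as directly causing it. causes_by_depth is a hypothetical helper; the real event_chain runs against the graph store and also follows PRECEDED:

```python
from collections import deque


def causes_by_depth(caused_by: dict[str, list[str]], event: str, depth: int = 3) -> dict[int, list[str]]:
    """Group the causes of `event` by distance, up to `depth` hops.

    caused_by[e] lists the events recorded as directly causing e.
    Each event is reported once, at its shortest distance.
    """
    layers: dict[int, list[str]] = {}
    seen = {event}
    frontier = deque([(event, 0)])
    while frontier:
        current, d = frontier.popleft()
        if d == depth:
            continue  # don't expand past the requested depth
        for cause in caused_by.get(current, []):
            if cause not in seen:  # also guards against cycles in inconsistent data
                seen.add(cause)
                layers.setdefault(d + 1, []).append(cause)
                frontier.append((cause, d + 1))
    return layers
```

Running the same function over the inverted map (each cause to its effects) yields the consequences at depths 1 to 3.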
Type 5: Open-ended narrative
"Tell me about the Border Wars." / "What was House Vyr like?" / "Describe the era."
Tool sequence:
- Resolve the central entity.
- state_at(entity, current) to get the latest snapshot.
- event_chain(entity, depth=3) for the macro arc.
- lore_about(entity, type=prose, limit=10) for color and texture.
- For each significant person in the snapshot, entity_context(person) for character details.
- summarize_chain(entity, depth=3, style=chronicle) to produce a narrative spine.
- cite on the spine's claims to back them up.
- get_contradictions(subject=entity, severity=warn) to surface any disputes.
Answer template:
{narrate_arc-style output, in the voice the conversation demands, with the chain of citations available if the user asks for sources}.
Tool selection rules
The LLM is told these rules explicitly:
- Always lookup first if you don't have a canonical entity ID. Cost: 1 tool call. Savings: every downstream call that would otherwise run against the wrong entity.
- Always entity_context before expand_context. It is cheaper, and usually sufficient.
- Use state_at for any "what was X like at T" question. It composes multiple queries and is more efficient than chaining them.
- Use event_chain for any "why" or "what happened because of" question. It traverses CAUSED/PRECEDED in one call.
- Always check latest_run() before answering a long historical arc. Stale consistency data can miss contradictions and anachronisms introduced since the last run.
- Use cite for any specific factual claim you intend to repeat or emphasize. Source attribution is a first-class feature.
- Use narrate_arc or summarize_chain only as a base for narrative. These call an internal LLM; their output is raw material, not a final answer.
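The first two selection rules are ordering constraints, so they can be checked mechanically over a planned sequence of calls. The sketch below reduces each call to a (tool, entity) pair and treats every entity as unresolved at the start. check_plan is a hypothetical helper, not part of the engine:

```python
def check_plan(calls: list[tuple[str, str]]) -> list[str]:
    """Flag violations of two selection rules in a planned tool sequence.

    Rule 1: an entity goes through lookup before any other tool uses it.
    Rule 2: entity_context precedes expand_context for the same entity.
    """
    resolved: set[str] = set()
    has_context: set[str] = set()
    problems: list[str] = []
    for i, (tool, entity) in enumerate(calls):
        if tool == "lookup":
            resolved.add(entity)
            continue
        if entity not in resolved:
            problems.append(f"call {i}: {tool}({entity}) before lookup")
        if tool == "entity_context":
            has_context.add(entity)
        elif tool == "expand_context" and entity not in has_context:
            problems.append(f"call {i}: expand_context({entity}) before entity_context")
    return problems
```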
Failure modes the LLM must avoid
Failure 1: Answering from training data
Symptom: The LLM says "Aldric is the heir to House Vyr" because it sounds like a Tolkien character.
Mitigation: The system prompt says "Always call a tool before claiming a fact. Never answer from your own knowledge." In effect this is a simple rule: if no tool call backs a specific fact, refuse to state it.
Failure 2: Resolving contradictions
Symptom: Two sources disagree on whether the Battle of Black Spire was in 340 or 342 TA. The LLM picks 340 because the user "probably meant that."
Mitigation: The system prompt says "You MUST NOT resolve contradictions yourself. Report them, let the world-builder decide." When a contradiction is returned, the LLM surfaces both dates and both sources.
Failure 3: Confusing present and past
Symptom: The LLM says "Aldric rules Valdorn" without scoping to a time. Aldric died in 360 TA; the campaign is currently in 380 TA.
Mitigation: Every relation query takes an at_time parameter. The LLM is told to default to current only when the user has not specified a time, and to make the time explicit in the answer.
Failure 4: Treating lore_verified: false as canonical
Symptom: The LLM cites an entity that only exists in encounter data and has no lore document behind it, treating it as established fact.
Mitigation: When returning an entity with lore_verified: false, the LLM says "this entity is provisional and has no canonical lore document yet." The get_unresolved tool is called before answering questions about entities that aren't yet lore-verified.
Failure 5: Skipping the consistency check
Symptom: The LLM answers a 5-generation family question without checking for anachronisms. Three of the "ancestors" turn out to have been born after the descendant.
Mitigation: For any question involving more than 3 entities or more than 1 time hop, get_anachronisms is mandatory. The system prompt says so.
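Both halves of this mitigation are easy to state in code: the trigger that makes get_anachronisms mandatory, and one kind of check an anachronism pass could run on a lineage. This is a sketch only. anachronism_check_required, Person, and lineage_anachronisms are hypothetical, and the real get_anachronisms may apply different or additional rules:

```python
from dataclasses import dataclass


def anachronism_check_required(n_entities: int, n_time_hops: int) -> bool:
    """The rule above: more than 3 entities or more than 1 time hop."""
    return n_entities > 3 or n_time_hops > 1


@dataclass(frozen=True)
class Person:
    name: str
    born: int  # year TA


def lineage_anachronisms(chain: list[Person]) -> list[tuple[str, str]]:
    """Return (parent, child) pairs where the parent was not born before the child.

    `chain` is ordered eldest ancestor first, one generation per step.
    """
    return [(p.name, c.name) for p, c in zip(chain, chain[1:]) if p.born >= c.born]
```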
Failure 6: Hallucinating tool results
Symptom: The LLM says "the tool returned X" when the tool actually returned Y or nothing.
Mitigation: The system prompt says "When a tool returns an error, surface the error to the user and stop," and forbids stating any fact the engine didn't return. Neither rule is enforced mechanically. A future improvement is a tool-result verifier that cross-checks the LLM's quoted output against the actual tool return.
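The verifier is not specified further. One minimal version assumes tool results arrive as JSON-like values and that the values the LLM attributes to tools have already been extracted from its answer (that extraction is the hard part and is not shown). flatten and unverified_claims are hypothetical. Exact string matching is a deliberate simplification: a real verifier would also need to handle paraphrase and date formats.

```python
from typing import Any


def flatten(value: Any) -> set[str]:
    """Collect every scalar in a (possibly nested) tool result as a string."""
    if value is None:
        return set()  # a null result backs no claim
    if isinstance(value, dict):
        return set().union(*(flatten(v) for v in value.values()))
    if isinstance(value, (list, tuple, set)):
        return set().union(*(flatten(v) for v in value))
    return {str(value)}


def unverified_claims(claimed: list[Any], tool_results: list[Any]) -> list[Any]:
    """Return the claimed values that no tool actually returned."""
    returned = flatten(list(tool_results))
    return [c for c in claimed if str(c) not in returned]
```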
Prompt engineering notes
- The system prompt is short by design. Long system prompts dilute the LLM's attention to the actual user query. The reasoning patterns live in this document. The LLM reads it once at session start if the runtime supports that; otherwise it receives a summarized, system-prompt-friendly version.
- Worked examples outperform abstract rules. The five question types above are described with worked examples because the LLM generalizes from examples better than from rules.
- The system prompt explicitly tells the LLM what it must not do. This is more effective than positive instructions alone — LLMs are better at "do not X" than "prefer Y over X."
Session-level state
The LLM is told that the MCP server maintains an active context — a working set of entities the conversation has touched. The LLM can use this implicitly:
- When the user says "he" or "it," the LLM resolves against the active context.
- When a tool returns a new entity, the engine adds it to the active context.
- The LLM can explicitly call lookup("he") to disambiguate when context alone isn't enough.
The active context is session-scoped. New sessions start fresh. This is intentional — cross-session memory of an in-fiction world is dangerous because the world evolves and stale memory is worse than no memory.
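Below is a minimal sketch of the active context. It assumes an in-memory, insertion-ordered map from entity ID to type. A pronoun resolves to the most recently touched entity of a compatible type: he/him/she/her to a Person, and "it" to anything else. ActiveContext is hypothetical. Constructing a new one per session gives the start-fresh behavior described above:

```python
from typing import Optional

PERSON_PRONOUNS = {"he", "him", "she", "her"}


class ActiveContext:
    """Session-scoped working set: entity ID -> entity type, most recent last."""

    def __init__(self) -> None:
        self._entities: dict[str, str] = {}

    def touch(self, entity_id: str, entity_type: str) -> None:
        """Record an entity a tool just returned, moving it to most recent."""
        self._entities.pop(entity_id, None)
        self._entities[entity_id] = entity_type

    def resolve(self, pronoun: str) -> Optional[str]:
        """Most recent compatible entity, or None (caller falls back to lookup)."""
        p = pronoun.lower()
        for entity_id, entity_type in reversed(list(self._entities.items())):
            if p in PERSON_PRONOUNS and entity_type == "Person":
                return entity_id
            if p == "it" and entity_type != "Person":
                return entity_id
        return None
```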
A worked example: full conversation
User: "Who is Aldric Raventhorne, and did he have anything to do with the Long Winter?"
LLM: [internal]
- Type 1 + Type 2 question.
- lookup("Aldric Raventhorne") → returns 1 Person, id=uuid-1, confidence=0.99.
- lookup("Long Winter") → returns 1 Era, id=uuid-2, confidence=0.97.
- entity_context(uuid-1, at_time=current) → Aldric, House Vyr, 300-360 TA, etc.
- was_true_at(PARTICIPATED_IN, Aldric, Long Winter, ?) → N/A, Long Winter is an Era.
- Reformulate: timeline(Aldric, PARTICIPATED_IN, from=3rd_age, to=3rd_age) → list of events.
- Filter to those that occurred_during Long Winter → empty.
- get_anachronisms(entity=Aldric) → 0.
- get_contradictions(subject=Aldric) → 0.
LLM: "Aldric Raventhorne was a Vyr noble who lived from 300 to 360 TA
(currently deceased). He was the heir to House Vyr and ruled Valdorn
during the Age of Iron. However, he had no recorded participation in the
Long Winter — that era falls outside his recorded activity in the
chronicles. The Long Winter ended in 312 TA, when Aldric was twelve and
had not yet come of age. Sources: chronicles-vyr.md, frosthollow-lore.md."
The LLM got it right by making seven calls across six tools, plus one judgment call (the era-to-date math, which the engine surfaces via the era lookup result). Without the structured pipeline, it would likely have hallucinated a connection.
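The filter step and the era-to-date math in this trace are plain interval arithmetic. The sketch below assumes each event carries a single year and each era has inclusive start and end years. events_within and age_at are hypothetical helpers, not the engine's occurred_during logic:

```python
def events_within(events: list[tuple[str, int]], era_start: int, era_end: int) -> list[str]:
    """Names of events whose year falls inside the era (both bounds inclusive)."""
    return [name for name, year in events if era_start <= year <= era_end]


def age_at(born: int, year: int) -> int:
    """Whole years between a birth year and a later year, both in TA."""
    return year - born
```

With the figures from the example, age_at(300, 312) returns 12, Aldric's age when the Long Winter ended.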
What this is not
- It is not a chain-of-thought prompt. The LLM is not asked to "think step by step" in a way that goes into its output. The reasoning is hidden; the answer is the surface.
- It is not a few-shot prompt library. The patterns are general; the LLM generalizes. If we add specific examples, they go in a separate context file, not the system prompt.
- It is not a guardrail that prevents all errors. It's a contract. The LLM can still violate it. The Consistency Engine catches some violations; the rest are left to user review.
Where this lives
The reasoning harness is a system prompt + a documentation file, not a piece of code. The LLM that violates it does so by ignoring the prompt, not by bypassing code. This is the right shape: the LLM is a participant in the system, not a deterministic function.