Two companion docs answering 'how does a host module drive the Lore Engine correctly?'. INTEGRATION.md — the practical guide. Audience: anyone who has the engine and wants to wrap it. 12 sections: TL;DR (30-line integration module), mental model, transports, 50-tool surface, 24 read tools + 12 write tools, template-generated tools, 7 integration rules, 6 failure modes, 4 metrics, adding a new domain type, worked end-to-end example. integration-module-contract.md — the formal, testable contract. Audience: host-app authors. The 7 rules + their tests + their failure modes. Versions with the system prompt (v1.0/v1.1/v1.2). The host is 'good' when its 50-question harness run scores: tool-selection accuracy >=80%, citation rate >=90%, hallucination rate <5%, time-window violation rate <5%. Per the slice 7 doc deliverable (slice 7 Track A, blocked on the API key for the LLM execution half). These are the hand-off artefacts for any future host module author. Co-Authored-By: Claude <noreply@anthropic.com>
Integration Guide
Audience: developers who have the Lore Engine POC installed
(~/projects/lore-engine-poc/) and want to wire it into a host
application — an LLM agent, a chat UI, an IDE plugin, a Discord
bot, a CLI tool, anything that needs to ask questions about a
fictional world.
What this doc is: the practical "how to drive the engine" guide. The 22 design docs in this repo describe the engine from the inside out (ontology, time model, consistency rules, planes, templates, ADRs). This doc is the outside-in view: what the host sends, what the engine returns, what the host must do in between to satisfy the engine's contract.
What this doc is not: it does not duplicate the design
rationale (see docs/00-overview.md for that). It also does
not cover the engine's internal code path — for that, the
test files in tests/ are the canonical examples.
TL;DR — the 30-line integration module
```python
import json, subprocess, sys

# 1. Spawn the MCP server (stdio transport)
server = subprocess.Popen(
    [sys.executable, "-m", "lore_engine_poc.mcp_stdio_entry"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE,
    text=True, bufsize=1,
)

def rpc(method, params=None, id_=None):
    """Send one JSON-RPC message; return the reply, or None for a notification."""
    msg = {"jsonrpc": "2.0", "method": method, "params": params or {}}
    if id_ is not None:
        msg["id"] = id_
    server.stdin.write(json.dumps(msg) + "\n")
    server.stdin.flush()
    if id_ is None:
        return None  # notifications get no reply, so reading would block
    return json.loads(server.stdout.readline())

# 2. Discover the tools
rpc("initialize", id_=1)
tools = rpc("tools/list", id_=2)["result"]["tools"]
# tools is a list of {name, description, inputSchema}

# 3. Call one
result = rpc("tools/call",
             params={"name": "entity_context",
                     "arguments": {"name": "Roland Raventhorne",
                                   "at_time": "3rd_age.year_345"}},
             id_=3)["result"]
# result is {content: [...], isError: bool}
```
That's the whole shape. The rest of this doc explains what the 50 tools do, what their responses mean, and the rules the host must follow to use them correctly.
Table of contents
- The mental model
- Transports: stdio vs Streamable HTTP
- The 50-tool surface
- Read tools: the 24 read patterns
- Write tools: the 12 mutation patterns
- Template-generated tools: 14 polymorphic tools
- The 7 integration rules
- The 6 failure modes the host must avoid
- The 4 metrics a good integration module measures
- Adding a new domain type via templates/
- Worked end-to-end example
- Where to go next
1. The mental model
The Lore Engine is a typed, time-aware, multi-setting knowledge graph with a reified :Relation layer and a polymorphic :DomainEntity substrate. The host sees it as a single JSON-RPC service. The five concepts the host must internalize:
Setting. A campaign/world scope. Every entity belongs to
exactly one Setting via an EXISTS_IN edge (the slice 6
setting filter consumes this). The default Mardonari codex
lives in setting="mardonari". The Wild Dream (slice 6.5
test target) lives in setting="the_wild_dream".
Plane. A layer of existence within a Setting (Material,
Shadowfell, demiplane, Outer Plane, transit, etc.). Planes
are first-class nodes since slice 6.1. They have relations
to other planes (LAYER_OF, REFLECTS, ADJACENT_TO,
ACCESSIBLE_VIA). The Voldramir demiplane is a child of
Mardonari's Material Plane via LAYER_OF.
Entity. A typed node. There are about 36 core labels, including
Person, Faction, Location, Region, Item, Era, Date, Lineage, Culture,
Deity, Language, MagicSystem, Title, Material, Event, Creature,
Spell, NPC, PC, Human, LoreSource, LoreVerified, Plus, ItemSlot,
DomainEntity, TypeTemplate, Setting, and Plane (some were added in
slices 5T and 6). Every entity has a canonical name (the by_name
key) and a type (the add_entity_of_type index).
Edge. A typed relation between two entities. Most edges
are time-bounded (valid_from / valid_until); some are
timeless type-assertions (EXISTS_IN). Each edge carries a
source list (the documents that asserted the fact) and a
two-dimensional confidence score
(extraction_confidence × source_confidence). Two sources
that disagree create a disputed edge pair — slice 2's
consistency engine surfaces these as Contradiction nodes.
Template (slice 5T). A YAML schema for a polymorphic
domain type (thieves-guild mission, war campaign, black-market
lot, NPC secret knowledge, etc.). The engine reads the YAML
and automatically registers N read-only MCP tools
(list_missions, get_mission, missions_by_target, etc.).
This needs no Python change and no server restart: the host
calls reload_templates to pick up new templates.
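To make the Edge concept concrete, here is a minimal host-side sketch of how one edge could be mirrored in Python. The `Edge` class, its field layout, the integer years, and the inclusive bounds are all assumptions for illustration; the engine's real response shape may differ.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Edge:
    """Hypothetical host-side mirror of one engine edge (not the engine's own type)."""
    relation: str
    subject: str
    object: str
    valid_from: Optional[int] = None    # None means no lower bound (timeless)
    valid_until: Optional[int] = None   # None means no upper bound
    sources: list = field(default_factory=list)
    extraction_confidence: float = 1.0
    source_confidence: float = 1.0

    @property
    def confidence(self) -> float:
        # The two confidence dimensions are multiplied together.
        return self.extraction_confidence * self.source_confidence

    def holds_at(self, year: int) -> bool:
        # Assumption: both bounds are inclusive.
        after_start = self.valid_from is None or year >= self.valid_from
        before_end = self.valid_until is None or year <= self.valid_until
        return after_start and before_end


edge = Edge("ALLIED_WITH", "House Vyr", "Crimson Pact",
            valid_from=312, valid_until=345,
            sources=["chronicles-vyr.md", "pact-treaties.md"],
            extraction_confidence=0.9, source_confidence=0.8)
timeless = Edge("EXISTS_IN", "House Vyr", "mardonari")
```

A host that keeps edges in this form can filter by `confidence` and `holds_at` before rendering, but the engine's own answer to was_true_at remains the authority.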
2. Transports: stdio vs Streamable HTTP
The engine ships two MCP transports. Choose by deployment context, not by preference.
stdio — for local development, IDE plugins, in-process
agents. The host spawns the server as a subprocess and pipes
JSON-RPC messages over stdin/stdout. See
scripts/05_mcp_server.py. Latency is ~1ms per call; no
network, no auth.
Streamable HTTP (slice 11) — for production deployments
where the host is a remote service (web app, multi-user chat
backend). The server runs in a hardened Docker container with
a 1 MiB body cap, non-root user, and read-only filesystem.
The host speaks HTTP+JSON-RPC against the POST /mcp endpoint.
See scripts/06_mcp_http_server.py and the
docker-compose.yml profile. Latency is ~5–50ms depending
on host network.
The wire protocol is the same in both. The host can write the integration code once and switch transports by swapping the RPC adapter. The only thing that changes is how the bytes get from the host to the engine.
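One way to keep the integration code transport-agnostic is to hide the byte-moving behind a small callable. The sketch below is an illustration only: `stdio_transport`, `http_transport`, and `EngineClient` are hypothetical names, and the HTTP variant assumes the server answers each POST with a plain JSON body (a real Streamable HTTP server may need extra headers or reply with an event stream).

```python
import json
import urllib.request
from typing import Any, Callable, Optional

# A transport takes one JSON-RPC message (dict) and returns the decoded reply.
Transport = Callable[[dict], dict]


def stdio_transport(proc) -> Transport:
    """Wrap a subprocess that speaks newline-delimited JSON-RPC on stdin/stdout."""
    def send(msg: dict) -> dict:
        proc.stdin.write(json.dumps(msg) + "\n")
        proc.stdin.flush()
        return json.loads(proc.stdout.readline())
    return send


def http_transport(url: str) -> Transport:
    """POST each message to the endpoint; assumes a plain JSON reply body."""
    def send(msg: dict) -> dict:
        req = urllib.request.Request(
            url, data=json.dumps(msg).encode("utf-8"),
            headers={"Content-Type": "application/json"}, method="POST")
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read().decode("utf-8"))
    return send


class EngineClient:
    """Integration code talks to this class and never sees the transport."""

    def __init__(self, transport: Transport):
        self._send = transport
        self._next_id = 0

    def request(self, method: str, params: Optional[dict] = None) -> Any:
        self._next_id += 1
        reply = self._send({"jsonrpc": "2.0", "id": self._next_id,
                            "method": method, "params": params or {}})
        if "error" in reply:
            raise RuntimeError(reply["error"])
        return reply["result"]

    def call_tool(self, name: str, arguments: dict) -> Any:
        return self.request("tools/call", {"name": name, "arguments": arguments})
```

Switching deployment context then means constructing `EngineClient(stdio_transport(proc))` or `EngineClient(http_transport(url))`; nothing else in the host changes.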
3. The 50-tool surface
tools/list returns one entry per tool with name,
description, and inputSchema (a JSON Schema). The full
surface as of slice 6.7 + 5T.5 + 10 + 11:
| Group | Count | Examples |
|---|---|---|
| Read | 12 | lookup, entity_context, was_true_at, true_during, entities_present, events_during, timeline, ancestors_of, descendants_of, event_chain, lore_about, significance_of |
| List/expand | 6 | list_lineage, list_offspring, location_hierarchy, expand_context, recent_changes, list_lore_sources |
| Read (consistency) | 5 | run_consistency_check, latest_run, get_contradictions, get_anachronisms, get_orphans |
| Read (ontology) | 3 | get_ontology_violations, list_ontology_rules, explain_violation |
| Write (entity) | 6 | add_entity, add_relation, add_lore_source, set_alias, update_entity, delete_entity |
| Write (workflow) | 4 | retcon, mark_verified, merge_entities, flag_for_review |
| Write (time) | 3 | define_calendar, define_era, define_date |
| Template-generated | ~14 | list_missions, get_mission, missions_by_target, etc. (one per entry under queries: in each template) |
| Meta | 2 | list_template_tools, reload_templates |
The tool list is dynamic. Every time the host calls
tools/list, the engine returns the current registry
including any templates that have been loaded. The host
should re-fetch on reload_templates completion, not
rely on a cached list.
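A small cache that drops itself after reload_templates is one way to honour this. The sketch below is hypothetical host code: `ToolCatalog` and the `request(method, params)` callable are assumed names, and the callable is expected to return the JSON-RPC result object.

```python
from typing import Callable, Optional


class ToolCatalog:
    """Caches tools/list and invalidates the cache after reload_templates."""

    def __init__(self, request: Callable[[str, dict], dict]):
        self._request = request          # request(method, params) -> result
        self._tools: Optional[dict] = None

    def tools(self) -> dict:
        # Fetch lazily; a None cache means "ask the engine again".
        if self._tools is None:
            listing = self._request("tools/list", {})
            self._tools = {t["name"]: t for t in listing["tools"]}
        return self._tools

    def call(self, name: str, arguments: dict) -> dict:
        if name not in self.tools():
            raise KeyError(f"unknown tool: {name}")
        result = self._request("tools/call",
                               {"name": name, "arguments": arguments})
        if name == "reload_templates":
            self._tools = None           # registry changed, force a re-fetch
        return result
```

Routing every tool call through `call` means the host cannot forget the re-fetch, because the invalidation happens in the same place as the reload.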
4. Read tools: the 24 read patterns
The 24 read tools fall into 5 design-doc question types. The
host's LLM caller should pick a type and follow the canonical
tool sequence (see docs/07-reasoning-harness.md §"The five
question types"):
Type 1 — Identity & description. "Who is Aldric?"
```text
lookup(query)
entity_context(entity_id, at_time=current)
expand_context(entity_id, hops=2, min_confidence=0.5)   # if sparse
significance_of(entity_id)
list_lineage(person)                                     # if Person
```
Type 2 — Time-bounded fact check. "Was X true at T?"
```text
lookup(subject) + lookup(object)                 # if not resolved
was_true_at(RELATION, subject, object, at_time)
cite(claim)                                      # if true
true_during(RELATION, subject, object, era)      # if false
```
Type 3 — World state at a time. "What was X like at T?"
```text
lookup(entity)
entities_present(location, at_time)
events_during(era, location=resolved)
get_contradictions(subject=entity, severity=warn)
```
Type 4 — Causal / chain reasoning. "Why did X happen?"
```text
lookup(event/event_chain_target)
event_chain(event, depth=3)
ancestors_of(person) + descendants_of(person)    # if Person
get_anachronisms(entity=central)
```
Type 5 — Open-ended narrative. "Tell me about X."
```text
lookup(entity)
entity_context(entity)                           # state snapshot
event_chain(entity, depth=3)
lore_about(entity, type=prose, limit=10)
narrate_arc(entity, style=chronicle)
cite(claim)                                      # back the spine
get_contradictions(subject=entity, severity=warn)
```
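The five sequences above can be encoded as data so the host can check an LLM's tool trace against them. This is a sketch under assumptions: the type keys are made-up labels, only the unconditional steps are treated as required (the "# if ..." steps are omitted), and "matches" is interpreted as "the required tools appear in order, with other calls allowed in between".

```python
# Required tools per question type, in canonical order.
# Conditional steps from the sequences above are left out on purpose.
CANONICAL = {
    "identity": ["lookup", "entity_context", "significance_of"],
    "fact_check": ["lookup", "was_true_at"],
    "world_state": ["lookup", "entities_present", "events_during",
                    "get_contradictions"],
    "causal": ["lookup", "event_chain", "get_anachronisms"],
    "narrative": ["lookup", "entity_context", "event_chain", "lore_about",
                  "narrate_arc", "cite", "get_contradictions"],
}


def follows_canonical(question_type: str, called: list) -> bool:
    """True if the required tools occur in `called` as an ordered subsequence."""
    remaining = iter(called)
    # `tool in remaining` consumes the iterator up to the first match,
    # so each later tool must appear after the earlier ones.
    return all(tool in remaining for tool in CANONICAL[question_type])
```

For example, `follows_canonical("fact_check", ["lookup", "lookup", "was_true_at", "cite"])` passes, while calling was_true_at before any lookup does not.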
Critical: every read tool returns a sources list. A
good integration module extracts the sources from each
tool response and includes them in the final answer. A
claim without a source is a hallucination (per the slice 7.2
system prompt's Rule 2).
Critical: every read tool respects at_time. A claim
that "X was true" with no time scope is wrong by
default. The host should pass at_time on every fact query;
the engine's reserved `current` token resolves to the
setting's current_era.
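Both requirements can be enforced in two small helpers on the host side. This is a sketch, not engine code: the helper names are invented, and it assumes each decoded tool payload is a dict that carries its citations under a `sources` key.

```python
def with_time_scope(arguments: dict, at_time=None) -> dict:
    """Return a copy of the arguments that always carries at_time.

    An at_time already present in `arguments` wins; otherwise the given
    value is used, falling back to the reserved token "current".
    """
    scoped = dict(arguments)
    scoped.setdefault("at_time", at_time or "current")
    return scoped


def collect_sources(payloads) -> list:
    """Gather the sources of several tool payloads, de-duplicated, in order."""
    seen = []
    for payload in payloads:
        for src in payload.get("sources", []):
            if src not in seen:
                seen.append(src)
    return seen


def render(answer_text: str, at_time: str, payloads) -> str:
    """Attach the time scope and citations; refuse to emit an unsourced claim."""
    sources = collect_sources(payloads)
    if not sources:
        raise ValueError("refusing to render an unsourced claim")
    return f"{answer_text} (as of {at_time}). Sources: {', '.join(sources)}"
```

Raising on an empty source list is a deliberate choice here: it turns a silent hallucination into a visible failure the host can handle.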
5. Write tools: the 12 mutation patterns
The 12 write tools (slice 10) are world-builder tools, not
LLM tools. The integration module should generally not
let the LLM call these — the LLM is a reader, not an editor.
Allow them only behind an explicit confirmation flow (see
docs/19-retcon-policy.md for the retcon workflow):
```python
# 1. The world-builder wants to retcon "Roland married Aldric".
#    That edge is wrong: the two were actually "allied with" each other.
add_relation(subject="Roland", relation="ALLIED_WITH", object="Aldric")  # or
retcon(edge_id=..., new_object="Aldric", note="...")

# 2. The world-builder wants to mark an edge as verified
#    after a human read the source.
mark_verified(edge_id=..., verified_by="world_builder", note="checked chronicles")
```
The two most important write tools are retcon and
mark_verified (slice 10.2). Both stamp the edge with an
audit log entry; both are append-only at the audit-log
level, even when they mutate the edge itself. Every other
write tool is a simpler add_* / update_* /
delete_* variant.
Integration module must: log every write tool call to the world-builder's audit log (timestamp, tool, args, caller). The audit log is the safety net — if a bad write ever lands, the roll-back path is to read the log.
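A thin wrapper can combine the confirmation gate and the audit log. The sketch below is hypothetical host code: `guarded_call`, the `confirm` callback, and the JSON-lines log format are assumptions, and the write-tool names are copied from the tool table in section 3.

```python
import json
import time

# Write tools as listed in the tool table (entity, workflow, time groups).
WRITE_TOOLS = {
    "add_entity", "add_relation", "add_lore_source", "set_alias",
    "update_entity", "delete_entity",
    "retcon", "mark_verified", "merge_entities", "flag_for_review",
    "define_calendar", "define_era", "define_date",
}


def guarded_call(call_tool, name, arguments, *, caller, confirm, audit_log):
    """Run a tool call; gate and log it first if it is a write tool.

    call_tool(name, arguments) performs the actual call.
    confirm(name, arguments) returns True only after explicit approval.
    audit_log is any object with write() and flush(), e.g. an open file.
    """
    if name in WRITE_TOOLS:
        if not confirm(name, arguments):
            raise PermissionError(f"write tool {name!r} was not confirmed")
        entry = {"timestamp": time.time(), "tool": name,
                 "args": arguments, "caller": caller}
        # Log before calling, so even a write that fails midway is recorded.
        audit_log.write(json.dumps(entry) + "\n")
        audit_log.flush()
    return call_tool(name, arguments)
```

Read tools pass straight through, so the same wrapper can sit in front of every call the LLM makes.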
6. Template-generated tools: 14 polymorphic tools
Slice 5T shipped 4 example templates (thieves-guild mission,
war campaign, black-market lot, NPC secret knowledge). Each
template has 3-4 entries under queries:, each of which becomes
an MCP tool at registration time. The total template-generated
surface is ~14 tools, and it grows when the world-builder
adds more templates/*.yaml files.
The template tools are read-only; they run a Cypher query
(allowlist-validated per slice 5T.3) against the
:DomainEntity nodes the engine has ingested. The full
killer demo walkthrough is in docs/14-examples.md §"Example
5: Planes of existence" and the slice 5T ADR (docs/adr/0012-typetemplate-polymorphism.md).
Integration module must: re-discover the tool list
after every reload_templates call. A cached list from
before a template was added will return
method_not_found for the new tool.
7. The 7 integration rules
These are the rules a good integration module follows. They
come from the system prompt (prompts/system_prompt.md,
slice 7.2), the design docs, and the ADRs. The
tests/harness/test_questions.py 50-question test set
checks that the LLM's tool sequence satisfies them.
Rule 1 — Always lookup first. Don't guess entity
IDs. The cost of one lookup is 1ms; the cost of a wrong
guess is a hallucinated answer.
Rule 2 — Cite every claim. Every specific factual claim in the host's response must cite at least one source returned by a tool. A claim without a source is a hallucination.
Rule 3 — Time-window every fact query. Pass at_time
on every fact query (was_true_at, true_during, etc.).
Default to current only when the user has not specified
a time. Make the time explicit in the answer.
Rule 4 — Never resolve contradictions yourself. If two sources disagree, surface both with both sources. The world-builder decides.
Rule 5 — setting= is mandatory for cross-setting
questions. When a question could mix multiple settings,
the host must pass setting=<id> explicitly. Omitting the
filter is correct only for single-setting worlds; for
multi-setting worlds, the slice 6.5 cross-setting filter
is the safe default.
Rule 6 — Re-discover tools/list after reload_templates.
A cached list from before a template was added will
return method_not_found for the new tool. The
reload_templates tool's response is the contract that
"the registry is now what you saw".
Rule 7 — For long historical arcs, check
latest_run() first. Stale consistency data is
dangerous — a contradiction that the consistency engine
found 2 weeks ago may have been resolved by a retcon
since. latest_run() returns the timestamp and counts of
the most recent consistency pass.
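Rule 7 reduces to a timestamp comparison. The sketch below is hypothetical: it assumes the latest_run payload exposes an ISO 8601 `timestamp` field and that naive timestamps are UTC, and the one-day threshold is an arbitrary example rather than an engine default.

```python
from datetime import datetime, timedelta, timezone


def consistency_is_stale(latest_run: dict, now: datetime,
                         max_age: timedelta = timedelta(days=1)) -> bool:
    """True if the most recent consistency pass is older than max_age.

    `now` must be timezone-aware.
    """
    ran_at = datetime.fromisoformat(latest_run["timestamp"])
    if ran_at.tzinfo is None:
        # Assumption: a timestamp without an offset is in UTC.
        ran_at = ran_at.replace(tzinfo=timezone.utc)
    return now - ran_at > max_age
```

When this returns True, the host can call run_consistency_check before trusting get_contradictions or get_anachronisms for a long historical arc.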
8. The 6 failure modes the host must avoid
These come from docs/07-reasoning-harness.md §"Failure
modes the LLM must avoid" and are the same rules the
host's LLM caller is told. The integration module should
detect each and reject the response:
F1 — Answering from training data. Symptom: the LLM
says "Aldric is the heir to House Vyr" without calling
entity_context first. The host's audit log should flag
any tool-using turn that produces a specific fact claim
without a corresponding tool call in the trace.
F2 — Resolving contradictions. Symptom: the LLM
picks one of two disagreeing sources. The host should
reject any response that mentions an is_disputed: true
edge and presents the answer as settled.
F3 — Confusing present and past. Symptom: "Aldric
rules Valdorn" without a time scope. The host should
require at_time on every fact query and surface the
time in the answer.
F4 — Treating lore_verified: false as canonical.
Symptom: the LLM cites an entity that only exists in
encounter data and has no lore document. The host
should mark provisional entities explicitly in the
response.
F5 — Skipping the consistency check. Symptom: the
LLM answers a 5-generation family question without
calling get_anachronisms. The host should make
get_anachronisms mandatory for any question involving
3+ entities or 1+ time hop.
F6 — Hallucinating tool results. Symptom: the LLM says "the tool returned X" when the tool actually returned Y or nothing. The host should verify every quoted tool result against the actual tool return (cross-check the trace).
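Three of these checks (F1, F2, F6) can be approximated mechanically once the response has been split into claims. The sketch below is an illustration under assumptions: the `Claim` structure and the claim-extraction step that fills it are hypothetical, and `trace_sources` is the set of every source that actually appeared in a tool result during the turn.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """One factual claim extracted from the LLM's response (hypothetical shape)."""
    text: str
    sources: list = field(default_factory=list)
    disputed: bool = False              # rests on an is_disputed: true edge
    presented_as_settled: bool = True   # the response did not surface the dispute


def audit_claims(claims, trace_sources: set) -> list:
    """Return (failure_mode, claim_text) pairs for every detected problem."""
    problems = []
    for claim in claims:
        if not claim.sources:
            # F1: a specific fact with nothing from the tool trace behind it.
            problems.append(("F1", claim.text))
        elif not set(claim.sources) <= trace_sources:
            # F6: cites something no tool actually returned.
            problems.append(("F6", claim.text))
        if claim.disputed and claim.presented_as_settled:
            # F2: a disputed edge presented as if it were settled.
            problems.append(("F2", claim.text))
    return problems
```

A non-empty result is the signal to reject the response; F3, F4, and F5 need the tool arguments and call sequence rather than the claims, so they are not covered here.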
9. The 4 metrics a good integration module measures
A "good integration module" is one that catches its own regressions. The 4 metrics (slice 7.3) are the regression net:
Tool-selection accuracy (per type). What fraction of the LLM's tool sequences match the canonical sequence for each question type. AC 7.3: ≥80% on the 50-question test set.
Citation rate. What fraction of claims cite ≥1 source. AC 7.4: ≥90%.
Hallucination rate. Average number of unsourced facts per question. AC 7.5: <5%.
Time-window violation rate. What fraction of answers
made claims outside the question's at_time window.
AC 7.6: <5%.
The integration module should run the harness
(tests/harness/questions.json) before each release and
fail the build if any metric regresses. The
scripts/harness/run_questions.py runner (slice 7.3,
Track B — needs $OLLAMA_API_KEY) is the canonical
way to measure.
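The release gate itself is simple arithmetic over per-question results. The sketch below is not the run_questions.py runner: `QuestionResult` and its fields are hypothetical, accuracy is computed overall rather than per question type, and the hallucination rate follows the definition above (unsourced facts per question).

```python
from dataclasses import dataclass


@dataclass
class QuestionResult:
    """Scored outcome of one harness question (hypothetical shape)."""
    sequence_ok: bool             # tool sequence matched the canonical one
    claims_total: int
    claims_cited: int
    unsourced_facts: int
    time_window_violation: bool


def compute_metrics(results: list) -> dict:
    n = len(results)
    claims = sum(r.claims_total for r in results)
    return {
        "tool_selection_accuracy": sum(r.sequence_ok for r in results) / n,
        "citation_rate": (sum(r.claims_cited for r in results) / claims
                          if claims else 1.0),
        "hallucination_rate": sum(r.unsourced_facts for r in results) / n,
        "time_window_violation_rate":
            sum(r.time_window_violation for r in results) / n,
    }


def failing_metrics(metrics: dict) -> list:
    """Names of the metrics that miss the acceptance criteria listed above."""
    failures = []
    if metrics["tool_selection_accuracy"] < 0.80:
        failures.append("tool_selection_accuracy")
    if metrics["citation_rate"] < 0.90:
        failures.append("citation_rate")
    if metrics["hallucination_rate"] >= 0.05:
        failures.append("hallucination_rate")
    if metrics["time_window_violation_rate"] >= 0.05:
        failures.append("time_window_violation_rate")
    return failures
```

A build script can exit non-zero whenever `failing_metrics(compute_metrics(results))` is non-empty.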
10. Adding a new domain type via templates/
The killer demo (slice 5T.5). A new domain type is one YAML file away. Walkthrough:
```bash
# 1. Drop a template YAML
cat > lore_engine_poc/seed/templates/npc_quirk.yaml <<'EOF'
template:
  id: npc_quirk
  version: 1.0.0
  label: NPCQuirk
  description: A persistent behavioral quirk for an NPC.
entity:
  properties:
    - {name: trigger, type: string, required: true}
    - {name: response, type: string, required: true}
    - {name: severity, type: enum, values: [minor, major, defining]}
  relations:
    - {to_type: Person, type: QUIRK_OF}
queries:
  - id: list_quirks
    description: List every quirk, sorted by severity.
    cypher: |
      MATCH (n:DomainEntity {type: 'NPCQuirk'})
      RETURN n ORDER BY n.severity
    parameters: {}
  - id: quirks_of
    description: All quirks of a given NPC.
    cypher: |
      MATCH (n:DomainEntity {type: 'NPCQuirk'})-[:QUIRK_OF]->(p {name: $name})
      RETURN n
    parameters:
      name: {type: string, required: true}
EOF

# 2. Reload templates (no restart)
python3 scripts/01_ingest.py --reload-templates --skip-cognee

# 3. Ingest an instance
cat > lore_engine_poc/seed/instances/aldric_quirks.yaml <<'EOF'
template_id: npc_quirk
instances:
  - name: Aldric's coin flip
    properties:
      trigger: asked for a side
      response: flips a Valdorni silver piece; calls in the air
      severity: major
    relations:
      - {to: Aldric Raventhorne, type: QUIRK_OF}
EOF
python3 scripts/01_ingest.py --ingest-instance \
  lore_engine_poc/seed/instances/aldric_quirks.yaml --skip-cognee

# 4. Use the generated tool
python3 scripts/05_mcp_server.py --port 18765 &
curl -s http://127.0.0.1:18765/mcp \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/call",
       "params":{"name":"quirks_of",
                 "arguments":{"name":"Aldric Raventhorne"}}}'
```
The 2 new tools (list_quirks, quirks_of) appeared with
no Python change and no engine restart. The same pattern
works for any domain type the world-builder wants to model.
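To see why no Python change is needed, it helps to picture how a template's queries could map onto tool descriptors. The sketch below is a guess at the idea, not the engine's registration code: `tools_from_template` is a hypothetical function, and the template is given as an already-parsed dict so the example needs no YAML library.

```python
def tools_from_template(template: dict) -> list:
    """Build one {name, description, inputSchema} descriptor per query."""
    tools = []
    for query in template.get("queries", []):
        params = query.get("parameters") or {}
        tools.append({
            "name": query["id"],
            "description": query.get("description", ""),
            "inputSchema": {
                "type": "object",
                "properties": {key: {"type": spec["type"]}
                               for key, spec in params.items()},
                "required": [key for key, spec in params.items()
                             if spec.get("required")],
            },
        })
    return tools


# The npc_quirk template from the walkthrough, reduced to the fields used here.
npc_quirk = {
    "template": {"id": "npc_quirk", "label": "NPCQuirk"},
    "queries": [
        {"id": "list_quirks",
         "description": "List every quirk, sorted by severity.",
         "parameters": {}},
        {"id": "quirks_of",
         "description": "All quirks of a given NPC.",
         "parameters": {"name": {"type": "string", "required": True}}},
    ],
}

generated = tools_from_template(npc_quirk)
```

Each descriptor has the same three fields that tools/list returns, which is why a host that re-fetches the list can call the new tools without any special handling.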
11. Worked end-to-end example
A 30-line host that asks "Was House Vyr allied with the Crimson Pact in 340 TA?" and gets a cited answer back:
```python
import json, subprocess, sys

server = subprocess.Popen(
    [sys.executable, "-m", "lore_engine_poc.mcp_stdio_entry"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE,
    text=True, bufsize=1,
)

def rpc(method, params=None, id_=1):
    msg = {"jsonrpc": "2.0", "id": id_, "method": method,
           "params": params or {}}
    server.stdin.write(json.dumps(msg) + "\n")
    server.stdin.flush()
    return json.loads(server.stdout.readline())

def call(name, arguments, id_):
    """Call one tool and decode its payload.

    Assumption: the payload is JSON text in the first content item of the
    {content: [...], isError: bool} result. Adjust if the engine differs.
    """
    result = rpc("tools/call",
                 params={"name": name, "arguments": arguments},
                 id_=id_)["result"]
    if result.get("isError"):
        raise RuntimeError(result["content"])
    return json.loads(result["content"][0]["text"])

# 1. Initialize + discover.
rpc("initialize", id_=1)
tools = {t["name"]: t for t in rpc("tools/list", id_=2)["result"]["tools"]}

# 2. Resolve both entities (Rule 1).
call("lookup", {"query": "House Vyr"}, id_=3)
call("lookup", {"query": "Crimson Pact"}, id_=4)

# 3. Time-bounded fact query (Rule 3).
fact = call("was_true_at",
            {"relation": "ALLIED_WITH",
             "subject": "House Vyr",
             "object": "Crimson Pact",
             "at_time": "3rd_age.year_340"},
            id_=5)

# 4. Render the answer with citations (Rule 2).
if fact["was_true"]:
    answer = (f"Yes, House Vyr was allied with the Crimson Pact "
              f"from {fact['valid_from']} to {fact['valid_until']}. "
              f"Sources: {', '.join(fact['sources'])}")
else:
    answer = ("No, they were not allied at that time. "
              f"Edges examined: {fact['edges_examined']}")
print(answer)
```
Expected output (Mardonari codex, slice 0 fixture; printed as one line, wrapped here for readability):
```text
Yes, House Vyr was allied with the Crimson Pact
from 3rd_age.year_312 to 3rd_age.year_345.
Sources: chronicles-vyr.md, pact-treaties.md
```
12. Where to go next
- integration-module-contract.md: the formal contract a host module must satisfy to be "good"
- docs/00-overview.md: engine overview
- docs/05-mcp-tools.md: the full tool catalog
- docs/07-reasoning-harness.md: the 5 question types and 6 failure modes
- docs/11-extensibility.md: the TypeTemplate polymorphic layer
- docs/17-planes.md: the Setting/Plane model
- docs/19-retcon-policy.md: retcon + mark_verified audit policy
- docs/20-multi-setting-policy.md: cross-setting rules
- docs/21-quickstart.md: 5-minute setup
- docs/adr/: the 13 ADRs that pin the design decisions
- prompts/system_prompt.md in the poc repo: the system prompt the LLM caller is told
- tests/harness/questions.yaml in the poc repo: the 50-question regression net