Kaysser Kayyali 7d2fe1f97e docs: integration module how-to (INTEGRATION.md) + formal contract

Two companion docs answering 'how does a host module drive the
Lore Engine correctly?'.

INTEGRATION.md — the practical guide. Audience: anyone who has
the engine and wants to wrap it. 12 sections: TL;DR (30-line
integration module), mental model, transports, 50-tool surface,
24 read tools + 12 write tools, template-generated tools, 7
integration rules, 6 failure modes, 4 metrics, adding a new
domain type, worked end-to-end example.

integration-module-contract.md — the formal, testable contract.
Audience: host-app authors. The 7 rules + their tests + their
failure modes. Versions with the system prompt (v1.0/v1.1/v1.2).
The host is 'good' when its 50-question harness run scores:
tool-selection accuracy >=80%, citation rate >=90%, hallucination
rate <5%, time-window violation rate <5%.

Per the slice 7 doc deliverable (slice 7 Track A, blocked on
the API key for the LLM execution half). These are the
hand-off artefacts for any future host module author.

Co-Authored-By: Claude <noreply@anthropic.com>

2026-06-19 23:11:04 -04:00


Integration Guide

Audience: developers who have the Lore Engine POC installed (~/projects/lore-engine-poc/) and want to wire it into a host application — an LLM agent, a chat UI, an IDE plugin, a Discord bot, a CLI tool, anything that needs to ask questions about a fictional world.

What this doc is: the practical "how to drive the engine" guide. The 22 design docs in this repo describe the engine from the inside out (ontology, time model, consistency rules, planes, templates, ADRs). This doc is the outside-in view: what the host sends, what the engine returns, what the host must do in between to satisfy the engine's contract.

What this doc is not: it does not duplicate the design rationale (see docs/00-overview.md for that). It also does not cover the engine's internal code path — for that, the test files in tests/ are the canonical examples.

TL;DR — the 30-line integration module

import json, subprocess, sys

# 1. Spawn the MCP server (stdio transport)
server = subprocess.Popen(
    [sys.executable, "-m", "lore_engine_poc.mcp_stdio_entry"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE,
    text=True, bufsize=1,
)

def rpc(method, params=None, id_=None):
    msg = {"jsonrpc": "2.0", "method": method, "params": params or {}}
    if id_ is not None:
        msg["id"] = id_
    server.stdin.write(json.dumps(msg) + "\n")
    server.stdin.flush()
    return json.loads(server.stdout.readline())

# 2. Discover the tools
rpc("initialize", id_=1)
tools = rpc("tools/list", id_=2)["result"]["tools"]
# tools is a list of {name, description, inputSchema}

# 3. Call one
result = rpc("tools/call",
    params={"name": "entity_context",
            "arguments": {"name": "Roland Raventhorne",
                          "at_time": "3rd_age.year_345"}},
    id_=3)["result"]
# result is {content: [...], isError: bool}

That's the whole shape. The rest of this doc explains what the 50 tools do, what their responses mean, and the rules the host must follow to use them correctly.
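The `result` envelope above still has to be unpacked before the host can read fields from it. The following is a minimal sketch, assuming each content item follows the MCP text shape (`{"type": "text", "text": ...}`) and that the engine serializes its payload as JSON in the first text item. `unwrap_tool_result` and `ToolCallError` are hypothetical helper names, not part of the engine.

```python
import json


class ToolCallError(RuntimeError):
    """Raised when a tools/call result has isError set (hypothetical helper)."""


def unwrap_tool_result(result):
    """Return the decoded payload of a tools/call result.

    Assumes result looks like
    {"content": [{"type": "text", "text": "..."}], "isError": bool}
    and that the text is JSON. Falls back to the raw text otherwise.
    """
    texts = [item.get("text", "") for item in result.get("content", [])
             if item.get("type") == "text"]
    body = texts[0] if texts else ""
    if result.get("isError"):
        raise ToolCallError(body or "tool call failed")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body
```

Raising on `isError` keeps error text from being mistaken for a tool payload further down the pipeline.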

The mental model
Transports: stdio vs Streamable HTTP
The 50-tool surface
Read tools: the 24 read patterns
Write tools: the 12 mutation patterns
Template-generated tools: 14 polymorphic tools
The 7 integration rules
The 6 failure modes the host must avoid
The 4 metrics a good integration module measures
Adding a new domain type via templates/
Worked end-to-end example
Where to go next

1. The mental model

The Lore Engine is a typed, time-aware, multi-setting knowledge graph with a reified :Relation layer and a polymorphic :DomainEntity substrate. The host sees it as a single JSON-RPC service. The five concepts the host must internalize:

Setting. A campaign/world scope. Every entity belongs to exactly one Setting via an EXISTS_IN edge (the slice 6 setting filter consumes this). The default Mardonari codex lives in setting="mardonari". The Wild Dream (slice 6.5 test target) lives in setting="the_wild_dream".

Plane. A layer of existence within a Setting (Material, Shadowfell, demiplane, Outer Plane, transit, etc.). Planes are first-class nodes since slice 6.1. They have relations to other planes (LAYER_OF, REFLECTS, ADJACENT_TO, ACCESSIBLE_VIA). The Voldramir demiplane is a child of Mardonari's Material Plane via LAYER_OF.

Entity. A typed node. There are about 36 core labels (some added in slice 5T/6), including: Person, Faction, Location, Region, Item, Era, Date, Lineage, Culture, Deity, Language, MagicSystem, Title, Material, Event, Creature, Spell, NPC, PC, Human, LoreSource, LoreVerified, Plus, ItemSlot, DomainEntity, TypeTemplate, Setting, Plane, … Every entity has a canonical name (the by_name key) and a type (the add_entity_of_type index).

Edge. A typed relation between two entities. Most edges are time-bounded (valid_from / valid_until); some are timeless type-assertions (EXISTS_IN). Each edge carries a source list (the documents that asserted the fact) and a two-dimensional confidence score (extraction_confidence × source_confidence). Two sources that disagree create a disputed edge pair — slice 2's consistency engine surfaces these as Contradiction nodes.

Template (slice 5T). A YAML schema for a polymorphic domain type (thieves-guild mission, war campaign, black-market lot, NPC secret knowledge, etc.). The engine reads the YAML and automatically registers one read-only MCP tool per query in the template (list_missions, get_mission, missions_by_target, etc.). No Python change and no server restart are needed: the host calls reload_templates to pick up new templates.
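To make the Edge concept concrete, here is a small sketch of how a host might model an edge it has received. The field names mirror the ones mentioned above (valid_from, valid_until, sources, extraction_confidence, source_confidence). The `Edge` class, the integer year encoding, the inclusive upper bound, and the choice to multiply the two confidence dimensions into one score are all assumptions for illustration, not the engine's actual response schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Edge:
    relation: str
    subject: str
    object: str
    valid_from: Optional[int] = None    # None = open start (assumed encoding)
    valid_until: Optional[int] = None   # None = open end, or a timeless edge
    sources: List[str] = field(default_factory=list)
    extraction_confidence: float = 1.0
    source_confidence: float = 1.0

    def confidence(self) -> float:
        # Assumption: the two dimensions combine by multiplication.
        return self.extraction_confidence * self.source_confidence

    def holds_at(self, year: int) -> bool:
        # Timeless edges (both bounds None) hold at every time.
        after_start = self.valid_from is None or self.valid_from <= year
        before_end = self.valid_until is None or year <= self.valid_until
        return after_start and before_end


alliance = Edge("ALLIED_WITH", "House Vyr", "Crimson Pact",
                valid_from=312, valid_until=345,
                sources=["chronicles-vyr.md", "pact-treaties.md"],
                extraction_confidence=0.9, source_confidence=0.8)
```

A host that filters by a `min_confidence` threshold would compare it against `alliance.confidence()` under this assumed model.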

2. Transports: stdio vs Streamable HTTP

The engine ships two MCP transports. Choose by deployment context, not by preference.

stdio — for local development, IDE plugins, in-process agents. The host spawns the server as a subprocess and pipes JSON-RPC messages over stdin/stdout. See scripts/05_mcp_server.py. Latency is ~1ms per call; no network, no auth.

Streamable HTTP (slice 11) — for production deployments where the host is a remote service (web app, multi-user chat backend). The server runs in a hardened Docker container with a 1 MiB body cap, non-root user, and read-only filesystem. The host speaks HTTP+JSON-RPC against the POST /mcp endpoint. See scripts/06_mcp_http_server.py and the docker-compose.yml profile. Latency is ~5–50ms depending on host network.

The wire protocol is the same in both. The host can write the integration code once and switch transports by swapping the RPC adapter. The only thing that changes is how the bytes get from the host to the engine.
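One way to structure the "swap the RPC adapter" idea is a small transport interface with two implementations. This is a sketch, not the engine's client library: `Transport`, `StdioTransport`, `HttpTransport`, and `McpClient` are hypothetical names, and the HTTP adapter assumes the server answers each POST with a plain JSON body rather than an event stream.

```python
import json
import subprocess
import urllib.request
from typing import Any, Dict, Optional, Protocol


class Transport(Protocol):
    def send(self, message: Dict[str, Any]) -> Dict[str, Any]: ...


class StdioTransport:
    """Newline-delimited JSON-RPC over a subprocess's stdin/stdout."""

    def __init__(self, argv):
        self.proc = subprocess.Popen(argv, stdin=subprocess.PIPE,
                                     stdout=subprocess.PIPE,
                                     text=True, bufsize=1)

    def send(self, message):
        self.proc.stdin.write(json.dumps(message) + "\n")
        self.proc.stdin.flush()
        return json.loads(self.proc.stdout.readline())


class HttpTransport:
    """POSTs each message to the /mcp endpoint (assumes a JSON reply)."""

    def __init__(self, url):
        self.url = url

    def send(self, message):
        req = urllib.request.Request(
            self.url, data=json.dumps(message).encode("utf-8"),
            headers={"Content-Type": "application/json",
                     "Accept": "application/json"},
            method="POST")
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read().decode("utf-8"))


class McpClient:
    """Transport-agnostic caller; assigns increasing request ids."""

    def __init__(self, transport: Transport):
        self.transport = transport
        self.next_id = 1

    def request(self, method: str, params: Optional[dict] = None):
        message = {"jsonrpc": "2.0", "id": self.next_id,
                   "method": method, "params": params or {}}
        self.next_id += 1
        return self.transport.send(message)
```

The rest of the host only ever talks to `McpClient`, so moving from local development to a remote deployment changes one constructor call.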

3. The 50-tool surface

tools/list returns one entry per tool with name, description, and inputSchema (a JSON Schema). The full surface as of slice 6.7 + 5T.5 + 10 + 11:

| Group | Count | Examples |
|---|---|---|
| Read | 12 | `lookup`, `entity_context`, `was_true_at`, `true_during`, `entities_present`, `events_during`, `timeline`, `ancestors_of`, `descendants_of`, `event_chain`, `lore_about`, `significance_of` |
| List/expand | 6 | `list_lineage`, `list_offspring`, `location_hierarchy`, `expand_context`, `recent_changes`, `list_lore_sources` |
| Read (consistency) | 5 | `run_consistency_check`, `latest_run`, `get_contradictions`, `get_anachronisms`, `get_orphans` |
| Read (ontology) | 3 | `get_ontology_violations`, `list_ontology_rules`, `explain_violation` |
| Write (entity) | 6 | `add_entity`, `add_relation`, `add_lore_source`, `set_alias`, `update_entity`, `delete_entity` |
| Write (workflow) | 4 | `retcon`, `mark_verified`, `merge_entities`, `flag_for_review` |
| Write (time) | 3 | `define_calendar`, `define_era`, `define_date` |
| Template-generated | ~14 | `list_missions`, `get_mission`, `missions_by_target`, etc. (1 per `query:` in each template) |
| Meta | 2 | `list_template_tools`, `reload_templates` |

The tool list is dynamic. Every time the host calls tools/list, the engine returns the current registry including any templates that have been loaded. The host should re-fetch on reload_templates completion, not rely on a cached list.
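A host that still wants to avoid a tools/list round trip on every call can cache the list and drop the cache whenever reload_templates completes. The sketch below assumes the host already has an `rpc(method, params)` function returning the JSON-RPC response dict; `ToolRegistry` is a hypothetical name.

```python
class ToolRegistry:
    """Caches tools/list and drops the cache after reload_templates."""

    def __init__(self, rpc):
        # rpc(method, params) -> JSON-RPC response dict (assumed shape).
        self._rpc = rpc
        self._tools = None

    def tools(self):
        if self._tools is None:
            listed = self._rpc("tools/list", {})["result"]["tools"]
            self._tools = {tool["name"]: tool for tool in listed}
        return self._tools

    def call(self, name, arguments):
        if name not in self.tools():
            raise KeyError(f"unknown tool: {name}")
        response = self._rpc("tools/call",
                             {"name": name, "arguments": arguments})
        if name == "reload_templates":
            self._tools = None  # force a fresh tools/list on next use
        return response
```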

4. Read tools: the 24 read patterns

The 24 read tools fall into 5 design-doc question types. The host's LLM caller should pick a type and follow the canonical tool sequence (see docs/07-reasoning-harness.md §"The five question types"):

Type 1 — Identity & description. "Who is Aldric?"

lookup(query)
entity_context(entity_id, at_time=current)
expand_context(entity_id, hops=2, min_confidence=0.5)   # if sparse
significance_of(entity_id)
list_lineage(person)                                    # if Person

Type 2 — Time-bounded fact check. "Was X true at T?"

lookup(subject) + lookup(object)              # if not resolved
was_true_at(RELATION, subject, object, at_time)
cite(claim)                                    # if true
true_during(RELATION, subject, object, era)    # if false

Type 3 — World state at a time. "What was X like at T?"

lookup(entity)
entities_present(location, at_time)
events_during(era, location=resolved)
get_contradictions(subject=entity, severity=warn)

Type 4 — Causal / chain reasoning. "Why did X happen?"

lookup(event/event_chain_target)
event_chain(event, depth=3)
ancestors_of(person) + descendants_of(person)  # if Person
get_anachronisms(entity=central)

Type 5 — Open-ended narrative. "Tell me about X."

lookup(entity)
entity_context(entity)                          # state snapshot
event_chain(entity, depth=3)
lore_about(entity, type=prose, limit=10)
narrate_arc(entity, style=chronicle)
cite(claim)                                     # back the spine
get_contradictions(subject=entity, severity=warn)

Critical: every read tool returns a sources list. A good integration module extracts the sources from each tool response and includes them in the final answer. A claim without a source is a hallucination (per the slice 7.2 system prompt's Rule 2).

Critical: every read tool respects at_time. A claim that "X was true" without a time scope is wrong by default. The host should pass at_time on every fact query; the reserved at_time value "current" resolves to the setting's current_era.
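Both requirements can be enforced in the host rather than left to the LLM. The sketch below fills in at_time before a call goes out and gathers sources from the decoded payloads afterwards. It assumes each decoded payload is a dict with a top-level `sources` list; the `TIME_SCOPED_TOOLS` set is an illustrative subset and both function names are hypothetical.

```python
# Tools shown taking at_time in the sequences above (illustrative subset).
TIME_SCOPED_TOOLS = {"was_true_at", "entity_context", "entities_present"}


def with_time_scope(tool, arguments, at_time=None):
    """Return a copy of arguments with at_time filled in for fact tools."""
    scoped = dict(arguments)
    if tool in TIME_SCOPED_TOOLS and "at_time" not in scoped:
        scoped["at_time"] = at_time or "current"
    return scoped


def collect_sources(payloads):
    """Gather unique sources from decoded payloads, in first-seen order."""
    seen = []
    for payload in payloads:
        for source in payload.get("sources", []):
            if source not in seen:
                seen.append(source)
    return seen
```

Keeping first-seen order means the citation list in the final answer follows the order in which the tools were called.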

5. Write tools: the 12 mutation patterns

The 12 write tools (slice 10) are world-builder tools, not LLM tools. The integration module should generally not let the LLM call these — the LLM is a reader, not an editor. Allow them only behind an explicit confirmation flow (see docs/19-retcon-policy.md for the retcon workflow):

# 1. The world-builder wants to retcon "Roland married Aldric":
#    the relation is wrong, it was actually "allied with".
#    Either add the corrected relation ...
add_relation(subject="Roland", relation="ALLIED_WITH", object="Aldric")
#    ... or retcon the existing edge in place.
retcon(edge_id=..., new_object="Aldric", note="...")

# 2. The world-builder wants to mark an edge as verified
#    after a human read the source.
mark_verified(edge_id=..., verified_by="world_builder", note="checked chronicles")

The two most important write tools are retcon and mark_verified (slice 10.2). Both stamp the edge with an audit log entry; both are append-only at the audit-log level, even when they mutate the edge itself. Every other write tool is a simpler add_* / update_* / delete_* variant.

Integration module must: log every write tool call to the world-builder's audit log (timestamp, tool, args, caller). The audit log is the safety net — if a bad write ever lands, the roll-back path is to read the log.
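A minimal sketch of that logging requirement combined with the confirmation gate described above. The write-tool names are the ones listed in the tool table; `guarded_write`, the JSON Lines log format, and the `confirmed` flag are assumptions about how a host might wire this, not an engine API.

```python
import json
import time

WRITE_TOOLS = {"add_entity", "add_relation", "add_lore_source", "set_alias",
               "update_entity", "delete_entity", "retcon", "mark_verified",
               "merge_entities", "flag_for_review", "define_calendar",
               "define_era", "define_date"}


def guarded_write(call_tool, log_file, tool, arguments, caller, confirmed):
    """Log a write-tool call, then run it only if a human confirmed it.

    call_tool(tool, arguments) is the host's own dispatch function.
    log_file is any writable text file object (one JSON object per line).
    """
    if tool not in WRITE_TOOLS:
        raise ValueError(f"{tool} is not a write tool")
    entry = {"timestamp": time.time(), "tool": tool, "args": arguments,
             "caller": caller, "confirmed": bool(confirmed)}
    log_file.write(json.dumps(entry) + "\n")
    log_file.flush()
    if not confirmed:
        raise PermissionError(f"{tool} requires explicit confirmation")
    return call_tool(tool, arguments)
```

The entry is written before the call is dispatched, so rejected attempts and writes that crash midway still leave a trace.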

6. Template-generated tools: 14 polymorphic tools

Slice 5T shipped 4 example templates (thieves-guild mission, war campaign, black-market lot, NPC secret knowledge). Each template has 3-4 query: blocks, each of which becomes an MCP tool at registration time. The total template-generated surface is ~14 tools, and it grows when the world-builder adds more templates/*.yaml files.

The template tools are read-only; they run a Cypher query (allowlist-validated per slice 5T.3) against the :DomainEntity nodes the engine has ingested. The full killer demo walkthrough is in docs/14-examples.md §"Example 5: Planes of existence" and the slice 5T ADR (docs/adr/0012-typetemplate-polymorphism.md).

Integration module must: re-discover the tool list after every reload_templates call. A cached list from before a template was added will return method_not_found for the new tool.

7. The 7 integration rules

These are the rules a good integration module follows. They come from the system prompt (prompts/system_prompt.md, slice 7.2), the design docs, and the ADRs. The tests/harness/test_questions.py 50-question test set checks that the LLM's tool sequence satisfies them.

Rule 1 — Always lookup first. Don't guess entity IDs. The cost of one lookup is 1ms; the cost of a wrong guess is a hallucinated answer.

Rule 2 — Cite every claim. Every specific factual claim in the host's response must cite at least one source returned by a tool. A claim without a source is a hallucination.

Rule 3 — Time-window every fact query. Pass at_time on every fact query (was_true_at, true_during, etc.). Default to current only when the user has not specified a time. Make the time explicit in the answer.

Rule 4 — Never resolve contradictions yourself. If two sources disagree, surface both with both sources. The world-builder decides.

Rule 5 — setting= is mandatory for cross-setting questions. When the user asks a question that could mix multiple settings, the host should pass setting=<id> explicitly. The default behaviour (no filter) is correct for single-setting worlds; the slice 6.5 cross-setting filter is the safe default for multi-setting worlds.

Rule 6 — Re-discover tools/list after reload_templates. A cached list from before a template was added will return method_not_found for the new tool. The reload_templates tool's response is the contract that "the registry is now what you saw".

Rule 7 — For long historical arcs, check latest_run() first. Stale consistency data is dangerous — a contradiction that the consistency engine found 2 weeks ago may have been resolved by a retcon since. latest_run() returns the timestamp and counts of the most recent consistency pass.
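Rule 7 can be reduced to a freshness check on the latest_run() payload. The sketch below assumes the decoded payload carries an ISO 8601 `timestamp` field; that field name, the one-day default threshold, and the function name are illustrative, not taken from the engine.

```python
from datetime import datetime, timedelta, timezone


def consistency_is_fresh(latest_run, now=None, max_age=timedelta(days=1)):
    """True when the most recent consistency pass is recent enough.

    latest_run is the decoded latest_run() payload (assumed shape:
    {"timestamp": "<ISO 8601>", ...}).
    """
    now = now or datetime.now(timezone.utc)
    ran_at = datetime.fromisoformat(latest_run["timestamp"])
    if ran_at.tzinfo is None:
        ran_at = ran_at.replace(tzinfo=timezone.utc)  # treat naive as UTC
    return now - ran_at <= max_age
```

When the check fails, the host can call run_consistency_check before answering, or state in the answer that the consistency data may be stale.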

8. The 6 failure modes the host must avoid

These come from docs/07-reasoning-harness.md §"Failure modes the LLM must avoid" and are the same rules the host's LLM caller is told. The integration module should detect each and reject the response:

F1 — Answering from training data. Symptom: the LLM says "Aldric is the heir to House Vyr" without calling entity_context first. The host's audit log should flag any tool-using turn that produces a specific fact claim without a corresponding tool call in the trace.

F2 — Resolving contradictions. Symptom: the LLM picks one of two disagreeing sources. The host should reject any response that mentions an is_disputed: true edge and presents the answer as settled.

F3 — Confusing present and past. Symptom: "Aldric rules Valdorn" without a time scope. The host should require at_time on every fact query and surface the time in the answer.

F4 — Treating lore_verified: false as canonical. Symptom: the LLM cites an entity that only exists in encounter data and has no lore document. The host should mark provisional entities explicitly in the response.

F5 — Skipping the consistency check. Symptom: the LLM answers a 5-generation family question without calling get_anachronisms. The host should make get_anachronisms mandatory for any question involving 3+ entities or 1+ time hop.

F6 — Hallucinating tool results. Symptom: the LLM says "the tool returned X" when the tool actually returned Y or nothing. The host should verify every quoted tool result against the actual tool return (cross-check the trace).
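Some of these checks can be automated from the tool trace and the decoded payloads. The sketch below flags F2, F4, and F5 only. The payload shapes are hypothetical: it assumes edges arrive under an `edges` key with an `is_disputed` flag and entities under an `entities` key with a `lore_verified` flag, and it simplifies the F5 condition to the entity-count half.

```python
def detect_failures(trace, payloads, entity_count):
    """Flag failure modes F2, F4 and F5 for one answered question.

    trace: tool names called during the turn, in order.
    payloads: decoded tool payloads (assumed shapes, see lead-in).
    entity_count: how many entities the question involves.
    """
    flags = []
    edges = [e for p in payloads for e in p.get("edges", [])]
    entities = [e for p in payloads for e in p.get("entities", [])]
    if any(e.get("is_disputed") for e in edges):
        flags.append("F2: disputed edge present; answer must show both sides")
    if any(e.get("lore_verified") is False for e in entities):
        flags.append("F4: provisional entity present; mark it in the answer")
    if entity_count >= 3 and "get_anachronisms" not in trace:
        flags.append("F5: get_anachronisms was not called")
    return flags
```

The flags are hints for a second pass, either by the LLM or by a reviewer. F1, F3, and F6 need the answer text as well as the trace, so they are left out here.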

9. The 4 metrics a good integration module measures

A "good integration module" is one that catches its own regressions. The 4 metrics (slice 7.3) are the regression net:

Tool-selection accuracy (per type). What fraction of the LLM's tool sequences match the canonical sequence for each question type. AC 7.3: ≥80% on the 50-question test set.

Citation rate. What fraction of claims cite ≥1 source. AC 7.4: ≥90%.

Hallucination rate. Average number of unsourced facts per question. AC 7.5: <5%.

Time-window violation rate. What fraction of answers made claims outside the question's at_time window. AC 7.6: <5%.

The integration module should run the harness (tests/harness/questions.json) before each release and fail the build if any metric regresses. The scripts/harness/run_questions.py runner (slice 7.3, Track B — needs $OLLAMA_API_KEY) is the canonical way to measure.
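A build gate over the four metrics might look like the sketch below. The thresholds are the acceptance criteria quoted above; the per-question record keys are hypothetical, and the hallucination rate is computed here as unsourced claims divided by total claims, which is one possible reading rather than the harness's confirmed definition.

```python
THRESHOLDS = {"tool_selection_accuracy": (">=", 0.80),
              "citation_rate": (">=", 0.90),
              "hallucination_rate": ("<", 0.05),
              "time_window_violation_rate": ("<", 0.05)}


def score(records):
    """Compute the four metrics from per-question harness records.

    Each record is a dict with (hypothetical) keys: sequence_ok (bool),
    claims (int), cited_claims (int), unsourced_claims (int),
    time_violation (bool).
    """
    total = len(records) or 1                        # avoid divide-by-zero
    claims = sum(r["claims"] for r in records) or 1  # avoid divide-by-zero
    return {
        "tool_selection_accuracy":
            sum(r["sequence_ok"] for r in records) / total,
        "citation_rate":
            sum(r["cited_claims"] for r in records) / claims,
        "hallucination_rate":
            sum(r["unsourced_claims"] for r in records) / claims,
        "time_window_violation_rate":
            sum(r["time_violation"] for r in records) / total,
    }


def failing_metrics(metrics):
    """Return the names of metrics that miss their threshold."""
    failed = []
    for name, (op, limit) in THRESHOLDS.items():
        ok = metrics[name] >= limit if op == ">=" else metrics[name] < limit
        if not ok:
            failed.append(name)
    return failed
```

A release script would exit non-zero whenever `failing_metrics(score(records))` is non-empty.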

10. Adding a new domain type via templates/

The killer demo (slice 5T.5). A new domain type is one YAML file away. Walkthrough:

# 1. Drop a template YAML
cat > lore_engine_poc/seed/templates/npc_quirk.yaml <<'EOF'
template:
  id: npc_quirk
  version: 1.0.0
  label: NPCQuirk
  description: A persistent behavioral quirk for an NPC.

entity:
  properties:
    - {name: trigger, type: string, required: true}
    - {name: response, type: string, required: true}
    - {name: severity, type: enum, values: [minor, major, defining]}

relations:
  - {to_type: Person, type: QUIRK_OF}

queries:
  - id: list_quirks
    description: List every quirk, sorted by severity.
    cypher: |
      MATCH (n:DomainEntity {type: 'NPCQuirk'})
      RETURN n ORDER BY n.severity
    parameters: {}

  - id: quirks_of
    description: All quirks of a given NPC.
    cypher: |
      MATCH (n:DomainEntity {type: 'NPCQuirk'})-[:QUIRK_OF]->(p {name: $name})
      RETURN n
    parameters:
      name: {type: string, required: true}
EOF

# 2. Reload templates (no restart)
python3 scripts/01_ingest.py --reload-templates --skip-cognee

# 3. Ingest an instance
cat > lore_engine_poc/seed/instances/aldric_quirks.yaml <<'EOF'
template_id: npc_quirk
instances:
  - name: Aldric's coin flip
    properties:
      trigger: asked for a side
      response: flips a Valdorni silver piece; calls in the air
      severity: major
    relations:
      - {to: Aldric Raventhorne, type: QUIRK_OF}
EOF

python3 scripts/01_ingest.py --ingest-instance \
    lore_engine_poc/seed/instances/aldric_quirks.yaml --skip-cognee

# 4. Use the generated tool
python3 scripts/05_mcp_server.py --port 18765 &
curl -s http://127.0.0.1:18765/mcp \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/call",
       "params":{"name":"quirks_of",
                 "arguments":{"name":"Aldric Raventhorne"}}}'

The 2 new tools (list_quirks, quirks_of) appeared with no Python change and no engine restart. The same pattern works for any domain type the world-builder wants to model.
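After a reload, a host can verify that every query in a template actually shows up in tools/list. The sketch below assumes the generated tool name equals the query id, as it does for list_quirks and quirks_of in the walkthrough. The template is written as a Python dict to keep the example free of a YAML dependency, and both function names are hypothetical.

```python
def expected_tools(template):
    """List the tool names a template should register: one per query id."""
    return [query["id"] for query in template.get("queries", [])]


def missing_tools(template, listed_tools):
    """Return generated tools that tools/list does not (yet) advertise."""
    advertised = {tool["name"] for tool in listed_tools}
    return [name for name in expected_tools(template)
            if name not in advertised]


# The npc_quirk template from the walkthrough, reduced to the relevant keys.
npc_quirk = {"template": {"id": "npc_quirk"},
             "queries": [{"id": "list_quirks"}, {"id": "quirks_of"}]}
```

A non-empty result from `missing_tools` after a reload suggests the template failed validation or the host is reading a stale tool list.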

11. Worked end-to-end example

A short stdio host that asks "Was House Vyr allied with the Crimson Pact in 340 TA?" and gets a cited answer back:

import json, subprocess, sys

server = subprocess.Popen(
    [sys.executable, "-m", "lore_engine_poc.mcp_stdio_entry"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE,
    text=True, bufsize=1,
)

def rpc(method, params=None, id_=1):
    msg = {"jsonrpc": "2.0", "id": id_, "method": method,
           "params": params or {}}
    server.stdin.write(json.dumps(msg) + "\n")
    server.stdin.flush()
    return json.loads(server.stdout.readline())

# 1. Initialize + discover.
rpc("initialize", id_=1)
tools = {t["name"]: t for t in rpc("tools/list", id_=2)["result"]["tools"]}

# 2. Resolve both entities (Rule 1).
rpc("tools/call",
    params={"name": "lookup", "arguments": {"query": "House Vyr"}}, id_=3)
rpc("tools/call",
    params={"name": "lookup", "arguments": {"query": "Crimson Pact"}}, id_=4)

# 3. Time-bounded fact query (Rule 3).
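result = rpc("tools/call",
    params={"name": "was_true_at",
            "arguments": {"relation": "ALLIED_WITH",
                          "subject": "House Vyr",
                          "object": "Crimson Pact",
                          "at_time": "3rd_age.year_340"}},
    id_=5)["result"]
# result is {content: [...], isError: bool}. Assumption: the tool's
# payload is JSON text in the first content item.
fact = json.loads(result["content"][0]["text"])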

# 4. Render the answer with citations (Rule 2).
if fact["was_true"]:
    answer = (f"Yes — House Vyr was allied with the Crimson Pact "
              f"from {fact['valid_from']} to {fact['valid_until']}. "
              f"Sources: {', '.join(fact['sources'])}")
else:
    answer = ("No — they were not allied at that time. "
              f"Edges examined: {fact['edges_examined']}")

print(answer)

Expected output (Mardonari codex, slice 0 fixture):

Yes — House Vyr was allied with the Crimson Pact
from 3rd_age.year_312 to 3rd_age.year_345.
Sources: chronicles-vyr.md, pact-treaties.md

12. Where to go next

integration-module-contract.md — the formal contract a host module must satisfy to be "good"
docs/00-overview.md — engine overview
docs/05-mcp-tools.md — the full tool catalog
docs/07-reasoning-harness.md — the 5 question types and 6 failure modes
docs/11-extensibility.md — the TypeTemplate polymorphic layer
docs/17-planes.md — the Setting/Plane model
docs/19-retcon-policy.md — retcon + mark_verified audit policy
docs/20-multi-setting-policy.md — cross-setting rules
docs/21-quickstart.md — 5-minute setup
docs/adr/ — the 13 ADRs that pin the design decisions
prompts/system_prompt.md in the poc repo — the system prompt the LLM caller is told
tests/harness/questions.yaml in the poc repo — the 50-question regression net
