docs: integration module how-to (INTEGRATION.md) + formal contract
Two companion docs answering 'how does a host module drive the Lore Engine correctly?'.

INTEGRATION.md: the practical guide. Audience: anyone who has the engine and wants to wrap it. 12 sections: TL;DR (30-line integration module), mental model, transports, 50-tool surface, 24 read tools + 12 write tools, template-generated tools, 7 integration rules, 6 failure modes, 4 metrics, adding a new domain type, worked end-to-end example.

integration-module-contract.md: the formal, testable contract. Audience: host-app authors. The 7 rules + their tests + their failure modes. Versions with the system prompt (v1.0/v1.1/v1.2). The host is 'good' when its 50-question harness run scores: tool-selection accuracy >=80%, citation rate >=90%, hallucination rate <5%, time-window violation rate <5%.

Per the slice 7 doc deliverable (slice 7 Track A, blocked on the API key for the LLM execution half). These are the hand-off artefacts for any future host module author.

Co-Authored-By: Claude <noreply@anthropic.com>
552
docs/INTEGRATION.md
Normal file
@@ -0,0 +1,552 @@
# Integration Guide

**Audience:** developers who have the Lore Engine POC installed
(`~/projects/lore-engine-poc/`) and want to wire it into a host
application: an LLM agent, a chat UI, an IDE plugin, a Discord
bot, a CLI tool, anything that needs to ask questions about a
fictional world.

**What this doc is:** the practical "how to drive the engine"
guide. The 22 design docs in this repo describe the engine from
the inside out (ontology, time model, consistency rules, planes,
templates, ADRs). This doc is the outside-in view: what the
host sends, what the engine returns, what the host must do in
between to satisfy the engine's contract.

**What this doc is not:** it does not duplicate the design
rationale (see `docs/00-overview.md` for that). It also does
not cover the engine's *internal* code path; for that, the
test files in `tests/` are the canonical examples.

## TL;DR: the 30-line integration module

```python
import json, subprocess, sys

# 1. Spawn the MCP server (stdio transport)
server = subprocess.Popen(
    [sys.executable, "-m", "lore_engine_poc.mcp_stdio_entry"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE,
    text=True, bufsize=1,
)

def rpc(method, params=None, id_=None):
    msg = {"jsonrpc": "2.0", "method": method, "params": params or {}}
    if id_ is not None:
        msg["id"] = id_
    server.stdin.write(json.dumps(msg) + "\n")
    server.stdin.flush()
    if id_ is None:
        return None  # a notification (no id) gets no response to read
    return json.loads(server.stdout.readline())

# 2. Discover the tools
rpc("initialize", id_=1)
tools = rpc("tools/list", id_=2)["result"]["tools"]
# tools is a list of {name, description, inputSchema}

# 3. Call one
result = rpc("tools/call",
             params={"name": "entity_context",
                     "arguments": {"name": "Roland Raventhorne",
                                   "at_time": "3rd_age.year_345"}},
             id_=3)["result"]
# result is {content: [...], isError: bool}
```

That's the whole shape. The rest of this doc explains what
the 50 tools do, what their responses mean, and the rules
the host must follow to use them correctly.

## Table of contents

1. [The mental model](#1-the-mental-model)
2. [Transports: stdio vs Streamable HTTP](#2-transports-stdio-vs-streamable-http)
3. [The 50-tool surface](#3-the-50-tool-surface)
4. [Read tools: the 24 read patterns](#4-read-tools-the-24-read-patterns)
5. [Write tools: the 12 mutation patterns](#5-write-tools-the-12-mutation-patterns)
6. [Template-generated tools: 14 polymorphic tools](#6-template-generated-tools-14-polymorphic-tools)
7. [The 7 integration rules](#7-the-7-integration-rules)
8. [The 6 failure modes the host must avoid](#8-the-6-failure-modes-the-host-must-avoid)
9. [The 4 metrics a good integration module measures](#9-the-4-metrics-a-good-integration-module-measures)
10. [Adding a new domain type via templates/](#10-adding-a-new-domain-type-via-templates)
11. [Worked end-to-end example](#11-worked-end-to-end-example)
12. [Where to go next](#12-where-to-go-next)

## 1. The mental model

The Lore Engine is a typed, time-aware, multi-setting knowledge
graph with a reified :Relation layer and a polymorphic
:DomainEntity substrate. The host sees it as a single JSON-RPC
service. The five concepts the host must internalize:

**Setting.** A campaign/world scope. Every entity belongs to
exactly one Setting via an `EXISTS_IN` edge (the slice 6
setting filter consumes this). The default Mardonari codex
lives in `setting="mardonari"`. The Wild Dream (slice 6.5
test target) lives in `setting="the_wild_dream"`.

**Plane.** A layer of existence within a Setting (Material,
Shadowfell, demiplane, Outer Plane, transit, etc.). Planes
are first-class nodes since slice 6.1. They have relations
to other planes (`LAYER_OF`, `REFLECTS`, `ADJACENT_TO`,
`ACCESSIBLE_VIA`). The Voldramir demiplane is a child of
Mardonari's Material Plane via `LAYER_OF`.

**Entity.** A typed node. The roughly 36 core labels include
Person, Faction, Location, Region, Item, Era, Date, Lineage,
Culture, Deity, Language, MagicSystem, Title, Material, Event,
Creature, Spell, NPC, PC, Human, LoreSource, LoreVerified, Plus,
ItemSlot, DomainEntity, TypeTemplate, Setting, and Plane (some
were added in slice 5T/6). Every entity has a canonical name
(the `by_name` key) and a type (the `add_entity_of_type`
index).

**Edge.** A typed relation between two entities. Most edges
are time-bounded (`valid_from` / `valid_until`); some are
timeless type-assertions (`EXISTS_IN`). Each edge carries a
source list (the documents that asserted the fact) and a
two-dimensional confidence score
(`extraction_confidence × source_confidence`). Two sources
that disagree create a *disputed* edge pair; slice 2's
consistency engine surfaces these as `Contradiction` nodes.

**Template (slice 5T).** A YAML schema for a polymorphic
domain type (thieves-guild mission, war campaign, black-market
lot, NPC secret knowledge, etc.). The engine reads the YAML
and registers N read-only MCP tools (`list_missions`,
`get_mission`, `missions_by_target`, etc.) automatically.
No Python change and no server restart are needed: the host
calls `reload_templates` to pick up new templates.

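
The Edge concept above can be pictured on the host side with a small data class. This is only a sketch for reasoning about responses, not the engine's storage model: the class name `Edge`, the use of plain integer years, the inclusive bounds, and the choice to combine the two confidence dimensions by multiplying them are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Edge:
    """Hypothetical host-side mirror of a time-bounded, sourced edge."""
    relation: str
    subject: str
    object: str
    valid_from: Optional[int] = None    # None means "no lower bound" (assumed)
    valid_until: Optional[int] = None   # None means "no upper bound" (assumed)
    sources: List[str] = field(default_factory=list)
    extraction_confidence: float = 1.0
    source_confidence: float = 1.0

    def holds_at(self, year: int) -> bool:
        """True if `year` falls inside the (inclusive) validity window."""
        after_start = self.valid_from is None or self.valid_from <= year
        before_end = self.valid_until is None or year <= self.valid_until
        return after_start and before_end

    @property
    def confidence(self) -> float:
        """Product of the two confidence dimensions (an assumed combination)."""
        return self.extraction_confidence * self.source_confidence


alliance = Edge("ALLIED_WITH", "House Vyr", "Crimson Pact",
                valid_from=312, valid_until=345,
                sources=["chronicles-vyr.md"],
                extraction_confidence=0.9, source_confidence=0.8)
print(alliance.holds_at(340), alliance.holds_at(350))
```

A timeless edge such as `EXISTS_IN` would simply leave both bounds unset in this sketch.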
## 2. Transports: stdio vs Streamable HTTP

The engine ships two MCP transports. Choose by deployment
context, not by preference.

**stdio** is for local development, IDE plugins, in-process
agents. The host spawns the server as a subprocess and pipes
JSON-RPC messages over stdin/stdout. See
`scripts/05_mcp_server.py`. Latency is ~1ms per call; no
network, no auth.

**Streamable HTTP** (slice 11) is for production deployments
where the host is a remote service (web app, multi-user chat
backend). The server runs in a hardened Docker container with
a 1 MiB body cap, non-root user, and read-only filesystem.
The host speaks HTTP+JSON-RPC against the `POST /mcp` endpoint.
See `scripts/06_mcp_http_server.py` and the
`docker-compose.yml` profile. Latency is ~5–50ms depending
on host network.

**The wire protocol is the same in both.** The host can
write the integration code once and switch transports by
swapping the RPC adapter. The only thing that changes is
how the bytes get from the host to the engine.

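
One way to keep the integration code transport-agnostic is to hide each transport behind a single "send one message, get one reply" function. The sketch below is illustrative only: `LoreClient`, `make_stdio_send`, and `make_http_send` are hypothetical names, and the HTTP variant assumes the server answers each POST with a plain JSON body (a Streamable HTTP server may instead reply with an event stream, which this sketch does not handle).

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Optional

Send = Callable[[Dict[str, Any]], Dict[str, Any]]


def make_stdio_send(proc) -> Send:
    """Adapter for a subprocess.Popen opened with text=True pipes."""
    def send(msg: Dict[str, Any]) -> Dict[str, Any]:
        proc.stdin.write(json.dumps(msg) + "\n")
        proc.stdin.flush()
        return json.loads(proc.stdout.readline())
    return send


def make_http_send(url: str) -> Send:
    """Adapter for the HTTP endpoint; assumes a plain JSON reply body."""
    def send(msg: Dict[str, Any]) -> Dict[str, Any]:
        req = urllib.request.Request(
            url,
            data=json.dumps(msg).encode("utf-8"),
            headers={"Content-Type": "application/json",
                     "Accept": "application/json, text/event-stream"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read().decode("utf-8"))
    return send


class LoreClient:
    """Transport-independent JSON-RPC caller (hypothetical helper)."""

    def __init__(self, send: Send) -> None:
        self._send = send
        self._next_id = 0

    def call(self, method: str, params: Optional[Dict[str, Any]] = None) -> Any:
        self._next_id += 1
        reply = self._send({"jsonrpc": "2.0", "id": self._next_id,
                            "method": method, "params": params or {}})
        if "error" in reply:
            raise RuntimeError(reply["error"])
        return reply["result"]
```

Switching transports is then a one-line change at construction time, for example `LoreClient(make_http_send("http://127.0.0.1:18765/mcp"))` instead of `LoreClient(make_stdio_send(server))`.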
## 3. The 50-tool surface

`tools/list` returns one entry per tool with `name`,
`description`, and `inputSchema` (a JSON Schema). The full
surface as of slice 6.7 + 5T.5 + 10 + 11:

| Group | Count | Examples |
|---|---|---|
| Read | 12 | `lookup`, `entity_context`, `was_true_at`, `true_during`, `entities_present`, `events_during`, `timeline`, `ancestors_of`, `descendants_of`, `event_chain`, `lore_about`, `significance_of` |
| List/expand | 6 | `list_lineage`, `list_offspring`, `location_hierarchy`, `expand_context`, `recent_changes`, `list_lore_sources` |
| Read (consistency) | 5 | `run_consistency_check`, `latest_run`, `get_contradictions`, `get_anachronisms`, `get_orphans` |
| Read (ontology) | 3 | `get_ontology_violations`, `list_ontology_rules`, `explain_violation` |
| Write (entity) | 6 | `add_entity`, `add_relation`, `add_lore_source`, `set_alias`, `update_entity`, `delete_entity` |
| Write (workflow) | 4 | `retcon`, `mark_verified`, `merge_entities`, `flag_for_review` |
| Write (time) | 3 | `define_calendar`, `define_era`, `define_date` |
| Template-generated | ~14 | `list_missions`, `get_mission`, `missions_by_target`, etc. (1 per `query:` in each template) |
| Meta | 2 | `list_template_tools`, `reload_templates` |

**The tool list is dynamic.** Every time the host calls
`tools/list`, the engine returns the current registry
including any templates that have been loaded. The host
should re-fetch on `reload_templates` completion, not
rely on a cached list.

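
A host-side registry that re-fetches the list whenever `reload_templates` completes could look roughly like the following. This is a sketch under assumptions: `ToolRegistry` is a hypothetical name, and the injected `rpc` callable is assumed to take `(method, params)` and return the `result` part of the JSON-RPC reply.

```python
from typing import Any, Callable, Dict

Rpc = Callable[[str, Dict[str, Any]], Dict[str, Any]]


class ToolRegistry:
    """Host-side view of the engine's tool list, refreshed on demand."""

    def __init__(self, rpc: Rpc) -> None:
        self._rpc = rpc
        self.tools: Dict[str, Dict[str, Any]] = {}
        self.refresh()

    def refresh(self) -> None:
        """Replace the cached list with the engine's current registry."""
        listing = self._rpc("tools/list", {})
        self.tools = {tool["name"]: tool for tool in listing["tools"]}

    def call(self, name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
        if name not in self.tools:
            raise KeyError(f"unknown tool: {name}")
        result = self._rpc("tools/call", {"name": name, "arguments": arguments})
        if name == "reload_templates":
            # The registry may have changed; never trust the old cache.
            self.refresh()
        return result
```

Routing every call through one object like this means the refresh cannot be forgotten by individual call sites.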
## 4. Read tools: the 24 read patterns

The 24 read tools fall into 5 design-doc question types. The
host's LLM caller should pick a type and follow the canonical
tool sequence (see `docs/07-reasoning-harness.md` §"The five
question types"):

**Type 1: Identity & description.** *"Who is Aldric?"*
```
lookup(query)
entity_context(entity_id, at_time=current)
expand_context(entity_id, hops=2, min_confidence=0.5)  # if sparse
significance_of(entity_id)
list_lineage(person)                                   # if Person
```

**Type 2: Time-bounded fact check.** *"Was X true at T?"*
```
lookup(subject) + lookup(object)               # if not resolved
was_true_at(RELATION, subject, object, at_time)
cite(claim)                                    # if true
true_during(RELATION, subject, object, era)    # if false
```

**Type 3: World state at a time.** *"What was X like at T?"*
```
lookup(entity)
entities_present(location, at_time)
events_during(era, location=resolved)
get_contradictions(subject=entity, severity=warn)
```

**Type 4: Causal / chain reasoning.** *"Why did X happen?"*
```
lookup(event/event_chain_target)
event_chain(event, depth=3)
ancestors_of(person) + descendants_of(person)  # if Person
get_anachronisms(entity=central)
```

**Type 5: Open-ended narrative.** *"Tell me about X."*
```
lookup(entity)
entity_context(entity)                         # state snapshot
event_chain(entity, depth=3)
lore_about(entity, type=prose, limit=10)
narrate_arc(entity, style=chronicle)
cite(claim)                                    # back the spine
get_contradictions(subject=entity, severity=warn)
```

**Critical: every read tool returns a `sources` list.** A
good integration module extracts the `sources` from each
tool response and includes them in the final answer. A
claim without a source is a hallucination (per the slice 7.2
system prompt's Rule 2).

**Critical: every read tool respects `at_time`.** A claim
about "X was true" without a time scope is wrong by
default. The host should pass `at_time` on every fact query;
the engine's `current` reserved token resolves to the
setting's `current_era`.

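
Extracting `sources` from tool responses can be done generically, without knowing each tool's exact payload layout. The helper below is a minimal sketch: `collect_sources` is a hypothetical name, and it assumes the decoded payload is ordinary JSON (dicts and lists) in which every `sources` value is a list of strings.

```python
from typing import Any, List


def collect_sources(payload: Any) -> List[str]:
    """Gather every string found under a `sources` key, in first-seen order."""
    found: List[str] = []

    def walk(node: Any) -> None:
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "sources" and isinstance(value, list):
                    for source in value:
                        if isinstance(source, str) and source not in found:
                            found.append(source)
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(payload)
    return found


example = {
    "was_true": True,
    "sources": ["a.md", "b.md"],
    "edges": [{"sources": ["b.md", "c.md"]}],
}
print(collect_sources(example))
```

The host can then append the collected list to the final answer, or refuse to emit an answer when the list is empty.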
## 5. Write tools: the 12 mutation patterns

The 12 write tools (slice 10) are world-builder tools, not
LLM tools. The integration module should generally **not**
let the LLM call these; the LLM is a reader, not an editor.
Allow them only behind an explicit confirmation flow (see
`docs/19-retcon-policy.md` for the retcon workflow):

```
# 1. The world-builder wants to retcon "Roland married
#    Aldric": this is wrong, it was actually "allied with".
add_relation(subject="Roland", relation="ALLIED_WITH", object="Aldric")  # or
retcon(edge_id=..., new_object="Aldric", note="...")

# 2. The world-builder wants to mark an edge as verified
#    after a human read the source.
mark_verified(edge_id=..., verified_by="world_builder", note="checked chronicles")
```

The two most important write tools are `retcon` and
`mark_verified` (slice 10.2). Both stamp the edge with an
audit log entry; both are append-only at the audit-log
level, even when they mutate the edge itself. Every other
write tool is a simpler `add_*` / `update_*` /
`delete_*` variant.

**Integration module must:** log every write tool call to
the world-builder's audit log (timestamp, tool, args,
caller). The audit log is the safety net: if a bad write
ever lands, the roll-back path is to read the log.

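
The confirmation flow and the audit-log requirement can be combined in one small gate in front of the dispatcher. The following is a sketch, not the engine's own mechanism: `guarded_call`, the `confirm` callback, and the in-memory `audit_log` list are hypothetical, and `WRITE_TOOLS` is simply the set of write tool names listed in section 3.

```python
import time
from typing import Any, Callable, Dict, List

# Write tool names as listed in the section 3 table.
WRITE_TOOLS = {
    "add_entity", "add_relation", "add_lore_source", "set_alias",
    "update_entity", "delete_entity",
    "retcon", "mark_verified", "merge_entities", "flag_for_review",
    "define_calendar", "define_era", "define_date",
}


def guarded_call(
    name: str,
    arguments: Dict[str, Any],
    caller: str,
    call_tool: Callable[[str, Dict[str, Any]], Any],
    confirm: Callable[[str, Dict[str, Any]], bool],
    audit_log: List[Dict[str, Any]],
) -> Any:
    """Dispatch a tool call; writes need confirmation and are logged first."""
    if name in WRITE_TOOLS:
        if not confirm(name, arguments):
            raise PermissionError(f"write tool {name!r} was not confirmed")
        audit_log.append({
            "timestamp": time.time(),
            "tool": name,
            "args": arguments,
            "caller": caller,
        })
    return call_tool(name, arguments)
```

Logging before the call (rather than after) means a write that fails halfway still leaves a trace to investigate.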
## 6. Template-generated tools: 14 polymorphic tools

Slice 5T shipped 4 example templates (thieves-guild mission,
war campaign, black-market lot, NPC secret knowledge). Each
template has 3-4 `query:` blocks, each of which becomes an
MCP tool at registration time. The total template-generated
surface is ~14 tools, and it grows when the world-builder
adds more `templates/*.yaml` files.

The template tools are read-only; they run a Cypher query
(allowlist-validated per slice 5T.3) against the
`:DomainEntity` nodes the engine has ingested. The full
killer demo walkthrough is in `docs/14-examples.md` §"Example
5: Planes of existence" and the slice 5T ADR (`docs/adr/0012-typetemplate-polymorphism.md`).

**Integration module must:** re-discover the tool list
after every `reload_templates` call. With a list cached
before the reload, the host will not see the tools of a newly
added template, and calls to tools of a removed template
will fail with `method_not_found`.

## 7. The 7 integration rules

These are the rules a good integration module follows. They
come from the system prompt (`prompts/system_prompt.md`,
slice 7.2), the design docs, and the ADRs. The
`tests/harness/test_questions.py` 50-question test set
checks that the LLM's tool sequence satisfies them.

**Rule 1: Always `lookup` first.** Don't guess entity
IDs. The cost of one `lookup` is 1ms; the cost of a wrong
guess is a hallucinated answer.

**Rule 2: Cite every claim.** Every specific factual
claim in the host's response must cite at least one source
returned by a tool. A claim without a source is a
hallucination.

**Rule 3: Time-window every fact query.** Pass `at_time`
on every fact query (`was_true_at`, `true_during`, etc.).
Default to `current` only when the user has not specified
a time. Make the time explicit in the answer.

**Rule 4: Never resolve contradictions yourself.** If
two sources disagree, surface both claims, each with its
sources. The world-builder decides.

**Rule 5: `setting=` is mandatory for cross-setting
questions.** When the user asks a question that could mix
multiple settings, the host should pass `setting=<id>`
explicitly. The default behaviour (no filter) is correct
for single-setting worlds; the slice 6.5 cross-setting
filter is the safe default for multi-setting worlds.

**Rule 6: Re-discover `tools/list` after `reload_templates`.**
With a list cached before the reload, the host will not see
the tools of a newly added template, and calls to tools of
a removed template will fail with `method_not_found`. The
`reload_templates` tool's response is the contract that
"the registry is now what you saw".

**Rule 7: For long historical arcs, check
`latest_run()` first.** Stale consistency data is
dangerous: a contradiction that the consistency engine
found 2 weeks ago may have been resolved by a retcon
since. `latest_run()` returns the timestamp and counts of
the most recent consistency pass.

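
Rule 3 is easy to enforce mechanically at the dispatch layer. The sketch below is illustrative: `with_time_window` is a hypothetical helper, and `FACT_TOOLS` is an assumed set built from the fact-query tools this guide and its companion contract name (the real list may be longer).

```python
from typing import Any, Dict, Optional

# Assumed set of fact-query tools; extend to match the actual registry.
FACT_TOOLS = {"was_true_at", "true_during", "entities_present", "events_during"}


def with_time_window(
    tool: str,
    arguments: Dict[str, Any],
    user_time: Optional[str] = None,
) -> Dict[str, Any]:
    """Return a copy of `arguments` that always carries `at_time` for fact queries.

    An explicit `at_time` in the arguments wins; otherwise the user's stated
    time is used, and only if neither exists does it fall back to `current`.
    """
    scoped = dict(arguments)
    if tool in FACT_TOOLS and "at_time" not in scoped:
        scoped["at_time"] = user_time or "current"
    return scoped


print(with_time_window("was_true_at", {"subject": "House Vyr"}))
```

Because the helper returns the resolved `at_time`, the host can reuse the same value when it states the time scope in the final answer.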
## 8. The 6 failure modes the host must avoid

These come from `docs/07-reasoning-harness.md` §"Failure
modes the LLM must avoid" and are the same rules the
host's LLM caller is told. The integration module should
detect each and reject the response:

**F1: Answering from training data.** Symptom: the LLM
says "Aldric is the heir to House Vyr" without calling
`entity_context` first. The host's audit log should flag
any tool-using turn that produces a specific fact claim
without a corresponding tool call in the trace.

**F2: Resolving contradictions.** Symptom: the LLM
picks one of two disagreeing sources. The host should
reject any response that mentions an `is_disputed: true`
edge and presents the answer as settled.

**F3: Confusing present and past.** Symptom: "Aldric
rules Valdorn" without a time scope. The host should
require `at_time` on every fact query and surface the
time in the answer.

**F4: Treating `lore_verified: false` as canonical.**
Symptom: the LLM cites an entity that only exists in
encounter data and has no lore document. The host
should mark provisional entities explicitly in the
response.

**F5: Skipping the consistency check.** Symptom: the
LLM answers a 5-generation family question without
calling `get_anachronisms`. The host should make
`get_anachronisms` mandatory for any question involving
3+ entities or 1+ time hop.

**F6: Hallucinating tool results.** Symptom: the LLM
says "the tool returned X" when the tool actually
returned Y or nothing. The host should verify every
quoted tool result against the actual tool return
(cross-check the trace).

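
Several of these checks reduce to inspecting the tool trace of a turn. As one example, the F5 condition can be sketched as below. The function names are hypothetical, and how the host counts entities and time hops for a question is left open here; both counts are simply passed in.

```python
from typing import Sequence


def needs_anachronism_check(entity_count: int, time_hops: int) -> bool:
    """F5 threshold from the text: 3+ entities or 1+ time hop."""
    return entity_count >= 3 or time_hops >= 1


def violates_f5(trace: Sequence[str], entity_count: int, time_hops: int) -> bool:
    """True if the check was required but `get_anachronisms` never ran."""
    required = needs_anachronism_check(entity_count, time_hops)
    return required and "get_anachronisms" not in trace


# A five-generation family question answered without the check.
print(violates_f5(["lookup", "ancestors_of"], entity_count=5, time_hops=4))
```

A host would typically run such checks after the LLM finishes its turn and before the answer is shown to the user.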
## 9. The 4 metrics a good integration module measures

A "good integration module" is one that catches its own
regressions. The 4 metrics (slice 7.3) are the
regression net:

**Tool-selection accuracy** (per type). What fraction
of the LLM's tool sequences match the canonical sequence
for each question type. AC 7.3: ≥80% on the 50-question
test set.

**Citation rate.** What fraction of claims cite ≥1
source. AC 7.4: ≥90%.

**Hallucination rate.** Average number of unsourced
facts per question. AC 7.5: <5%.

**Time-window violation rate.** What fraction of answers
made claims outside the question's `at_time` window.
AC 7.6: <5%.

The integration module should run the harness
(`tests/harness/questions.yaml`) before each release and
fail the build if any metric regresses. The
`scripts/harness/run_questions.py` runner (slice 7.3,
Track B; needs `$OLLAMA_API_KEY`) is the canonical
way to measure.

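
To make the thresholds concrete, here is a minimal scoring sketch over per-question results. It is not the harness runner: `QuestionResult`, `score`, and `passes` are hypothetical names, and the hallucination rate is computed as the share of claims with no source, which is an assumption (the text above phrases it per question, so the real harness may normalise differently).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class QuestionResult:
    """Hypothetical per-question record produced by a harness run."""
    tools_matched: bool   # tool sequence matched the canonical one
    claims: int           # factual claims in the answer
    cited_claims: int     # claims backed by at least one source
    out_of_window: bool   # answer made a claim outside the at_time window


def score(results: List[QuestionResult]) -> Dict[str, float]:
    if not results:
        raise ValueError("no results to score")
    n = len(results)
    total = sum(r.claims for r in results)
    cited = sum(r.cited_claims for r in results)
    return {
        "tool_selection_accuracy": sum(r.tools_matched for r in results) / n,
        "citation_rate": cited / total if total else 1.0,
        # Assumed definition: unsourced claims as a share of all claims.
        "hallucination_rate": (total - cited) / total if total else 0.0,
        "time_window_violation_rate": sum(r.out_of_window for r in results) / n,
    }


def passes(metrics: Dict[str, float]) -> bool:
    """Apply the AC 7.3 to 7.6 thresholds quoted above."""
    return (
        metrics["tool_selection_accuracy"] >= 0.80
        and metrics["citation_rate"] >= 0.90
        and metrics["hallucination_rate"] < 0.05
        and metrics["time_window_violation_rate"] < 0.05
    )
```

A release gate would call `passes(score(results))` and fail the build on `False`.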
## 10. Adding a new domain type via templates/

The killer demo (slice 5T.5). A new domain type is one
YAML file away. Walkthrough:

```bash
# 1. Drop a template YAML
cat > lore_engine_poc/seed/templates/npc_quirk.yaml <<'EOF'
template:
  id: npc_quirk
  version: 1.0.0
  label: NPCQuirk
  description: A persistent behavioral quirk for an NPC.

entity:
  properties:
    - {name: trigger, type: string, required: true}
    - {name: response, type: string, required: true}
    - {name: severity, type: enum, values: [minor, major, defining]}

relations:
  - {to_type: Person, type: QUIRK_OF}

queries:
  - id: list_quirks
    description: List every quirk, sorted by severity.
    cypher: |
      MATCH (n:DomainEntity {type: 'NPCQuirk'})
      RETURN n ORDER BY n.severity
    parameters: {}

  - id: quirks_of
    description: All quirks of a given NPC.
    cypher: |
      MATCH (n:DomainEntity {type: 'NPCQuirk'})-[:QUIRK_OF]->(p {name: $name})
      RETURN n
    parameters:
      name: {type: string, required: true}
EOF

# 2. Reload templates (no restart)
python3 scripts/01_ingest.py --reload-templates --skip-cognee

# 3. Ingest an instance
cat > lore_engine_poc/seed/instances/aldric_quirks.yaml <<'EOF'
template_id: npc_quirk
instances:
  - name: Aldric's coin flip
    properties:
      trigger: asked for a side
      response: flips a Valdorni silver piece; calls in the air
      severity: major
    relations:
      - {to: Aldric Raventhorne, type: QUIRK_OF}
EOF

python3 scripts/01_ingest.py --ingest-instance \
  lore_engine_poc/seed/instances/aldric_quirks.yaml --skip-cognee

# 4. Use the generated tool (over the HTTP transport from section 2)
python3 scripts/06_mcp_http_server.py --port 18765 &
curl -s http://127.0.0.1:18765/mcp \
  -H 'Content-Type: application/json' \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/call",
       "params":{"name":"quirks_of",
                 "arguments":{"name":"Aldric Raventhorne"}}}'
```

The 2 new tools (`list_quirks`, `quirks_of`) appeared with
no Python change and no engine restart. The same pattern
works for any domain type the world-builder wants to model.

## 11. Worked end-to-end example

A 30-line host that asks "Was House Vyr allied with the
Crimson Pact in 340 TA?" and gets a cited answer back:

```python
import json, subprocess, sys

server = subprocess.Popen(
    [sys.executable, "-m", "lore_engine_poc.mcp_stdio_entry"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE,
    text=True, bufsize=1,
)

def rpc(method, params=None, id_=1):
    msg = {"jsonrpc": "2.0", "id": id_, "method": method,
           "params": params or {}}
    server.stdin.write(json.dumps(msg) + "\n")
    server.stdin.flush()
    return json.loads(server.stdout.readline())

# 1. Initialize + discover.
rpc("initialize", id_=1)
tools = {t["name"]: t for t in rpc("tools/list", id_=2)["result"]["tools"]}

# 2. Resolve both entities (Rule 1).
rpc("tools/call",
    params={"name": "lookup", "arguments": {"query": "House Vyr"}}, id_=3)
rpc("tools/call",
    params={"name": "lookup", "arguments": {"query": "Crimson Pact"}}, id_=4)

# 3. Time-bounded fact query (Rule 3).
result = rpc("tools/call",
             params={"name": "was_true_at",
                     "arguments": {"relation": "ALLIED_WITH",
                                   "subject": "House Vyr",
                                   "object": "Crimson Pact",
                                   "at_time": "3rd_age.year_340"}},
             id_=5)["result"]
# result is the envelope {content: [...], isError: bool}. Assumption:
# the tool's JSON payload is carried in the first text content item.
fact = json.loads(result["content"][0]["text"])

# 4. Render the answer with citations (Rule 2).
if fact["was_true"]:
    answer = (f"Yes, House Vyr was allied with the Crimson Pact "
              f"from {fact['valid_from']} to {fact['valid_until']}. "
              f"Sources: {', '.join(fact['sources'])}")
else:
    answer = ("No, they were not allied at that time. "
              f"Edges examined: {fact['edges_examined']}")

print(answer)
```

Expected output (Mardonari codex, slice 0 fixture; printed as
one line, wrapped here for readability):
```
Yes, House Vyr was allied with the Crimson Pact
from 3rd_age.year_312 to 3rd_age.year_345.
Sources: chronicles-vyr.md, pact-treaties.md
```

## 12. Where to go next

- [`integration-module-contract.md`](./integration-module-contract.md): the
  formal contract a host module must satisfy to be "good"
- [`docs/00-overview.md`](./00-overview.md): engine overview
- [`docs/05-mcp-tools.md`](./05-mcp-tools.md): the full tool catalog
- [`docs/07-reasoning-harness.md`](./07-reasoning-harness.md): the
  5 question types and 6 failure modes
- [`docs/11-extensibility.md`](./11-extensibility.md): the
  TypeTemplate polymorphic layer
- [`docs/17-planes.md`](./17-planes.md): the Setting/Plane
  model
- [`docs/19-retcon-policy.md`](./19-retcon-policy.md):
  retcon + mark_verified audit policy
- [`docs/20-multi-setting-policy.md`](./20-multi-setting-policy.md):
  cross-setting rules
- [`docs/21-quickstart.md`](./21-quickstart.md): 5-minute
  setup
- [`docs/adr/`](./adr/): the 13 ADRs that pin the design
  decisions
- `prompts/system_prompt.md` in the poc repo: the system
  prompt the LLM caller is told
- `tests/harness/questions.yaml` in the poc repo: the
  50-question regression net

358
docs/integration-module-contract.md
Normal file
@@ -0,0 +1,358 @@
# Integration Module Contract

**Audience:** authors of host modules (LLM agents, chat UIs,
IDE plugins, Discord bots, CLIs, anything that wraps the Lore
Engine's MCP server).

**What this doc is:** the formal contract a host module must
satisfy. The 7 rules in [`INTEGRATION.md`](./INTEGRATION.md) are
the same rules; this doc is the version that's machine-checkable
(every rule has a test, every test is in `tests/harness/`).

**The contract is one-way.** The engine promises a fixed wire
protocol (JSON-RPC over stdio or HTTP) and a fixed tool surface
(name, description, JSON Schema per tool). The host promises to
satisfy these rules; if it doesn't, the engine will produce
wrong answers and the LLM caller will hallucinate.

## The contract, version 1.2

This contract is versioned alongside the system prompt
(`prompts/system_prompt.md`, slice 7.2). When the prompt
version bumps, this contract bumps; old hosts that
satisfy v1.0 may not satisfy v1.2.

| Rule | Test | What the host must do |
|---|---|---|
| R1: Discover | `test_7_2_registry_well_formed` | Re-fetch `tools/list` after every `reload_templates` |
| R2: Lookup first | `test_7_1_every_question_has_expected_tools` | Call `lookup` before any `entity_context` / `was_true_at` / `entity_about` |
| R3: Cite | `test_7_2_prompt_citation_rule_present` (plus host-side audit) | Include the `sources` from every tool response in the final answer |
| R4: Time-window | `test_7_2_prompt_time_window_rule_present` (plus host-side audit) | Pass `at_time` on every fact query; surface the time in the answer |
| R5: Don't resolve contradictions | `test_7_2_prompt_citation_rule_present` | Reject any response that mentions `is_disputed: true` and presents it as settled |
| R6: Setting filter | (host-side audit) | Pass `setting=<id>` when the user asks a cross-setting question |
| R7: Reload contract | (host-side audit) | Treat `reload_templates`'s response as the new registry state |

## Rule 1: Discover (test: `test_7_2_registry_well_formed`)

**What:** the host's tool registry must reflect the engine's
current tool list at the time of the call.

**Why:** templates are hot-reloadable. A host that caches the
tool list from a previous `tools/list` will call tools that
no longer exist (after a template was removed) or miss tools
that were just added (after a template was added).

**Test:** `test_7_2_registry_well_formed` (slice 7.2) pins the
*server-side* contract: the registry must be well-formed.
The *client-side* contract is host-side: the host must
re-fetch `tools/list` after every `reload_templates` call.

**Failure mode:** the host calls `get_mission` after
`thieves_guild_mission.yaml` was removed. The engine returns
`method_not_found`; the host's LLM caller hallucinates an
answer.

**Mitigation:** every `reload_templates` response includes the
new tool list. The host should store it as the canonical
"current tools" and re-resolve on every dispatch.

## Rule 2: Lookup first (test: `test_7_1_every_question_has_expected_tools`)

**What:** every question that resolves to an entity must
call `lookup` before any other read tool.

**Why:** entity names are ambiguous. "The dagger" is one of
many; the LLM cannot know which one. `lookup` returns a
canonical id (or a disambiguation list). The LLM picks one
(or asks the user).

**Test:** the 50-question set in
`tests/harness/questions.yaml` requires `lookup` to appear
in the canonical tool sequence for every question that names
an entity.

**Failure mode:** the LLM guesses the entity. The guess
resolves to the wrong id. The tool returns "unknown entity"
or a wrong entity's context. The LLM hallucinates an answer.

**Mitigation:** the host's LLM caller must include `lookup`
in every Type 1-4 question's tool sequence. The
`test_7_1_every_question_has_expected_tools` test pins this
on the server side; the host-side pin is "include `lookup`
or your test suite fails".

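
The host-side pin for this rule can be a simple check over the ordered tool trace of a turn. This is a sketch with assumed names: `lookup_comes_first` is hypothetical, and `ENTITY_READS` just copies the tool names from the R2 row of the table above (a real host would likely extend it to every entity-scoped read tool).

```python
from typing import Sequence

# Tool names taken from the R2 row of the contract table.
ENTITY_READS = {"entity_context", "was_true_at", "entity_about"}


def lookup_comes_first(trace: Sequence[str]) -> bool:
    """True if no entity read tool is called before the first `lookup`."""
    for name in trace:
        if name == "lookup":
            return True
        if name in ENTITY_READS:
            return False
    # No entity reads at all: nothing to violate.
    return True


print(lookup_comes_first(["lookup", "entity_context"]))
print(lookup_comes_first(["entity_context", "lookup"]))
```

A host that fails this check can re-prompt the LLM or reject the turn before any answer is rendered.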
## Rule 3: Cite every claim (test: `test_7_2_prompt_citation_rule_present`)

**What:** every specific factual claim in the host's
response must cite at least one source returned by a tool.

**Why:** a claim without a source is a hallucination. The
engine returns a `sources` list on every edge-bearing tool
response; the host's job is to forward those sources
through to the final answer.

**Test:** `test_7_2_prompt_citation_rule_present` pins the
*server-side* contract: the system prompt must contain
the citation rule. The *client-side* pin is the citation
rate metric (AC 7.4): ≥90% of claims cite ≥1 source.

**Failure mode:** the LLM says "Aldric is the heir to House
Vyr" with no source. The user can't verify; the answer
might be from training data, not the codex.

**Mitigation:** every tool response includes a `sources`
list. The host should pass this list through to the LLM
caller and require the LLM to include ≥1 source per claim
in its response. A claim without a source is a
hallucination and should be rejected.

|
||||
## Rule 4 — Time-window every fact query (test: `test_7_2_prompt_time_window_rule_present`)
|
||||
|
||||
**What:** every fact query must pass `at_time`, and the
|
||||
host's response must surface the time in the answer.
|
||||
|
||||
**Why:** "Was X true?" is incomplete without "When?". The
|
||||
codex is time-bounded; an answer about the past presented
|
||||
as the present is wrong by default.
|
||||
|
||||
**Test:** `test_7_2_prompt_time_window_rule_present` pins the
|
||||
server-side rule. The client-side pin is the time-window
|
||||
violation rate metric (AC 7.6): <5% of answers make claims
|
||||
outside the question's `at_time`.
|
||||
|
||||
**Failure mode:** "Aldric rules Valdorn" (he died in 360
|
||||
TA; the campaign is in 380 TA). The LLM should have
|
||||
scoped to 350 TA or earlier.
|
||||
|
||||
**Mitigation:** the host's LLM caller should pass `at_time`
|
||||
on every `was_true_at`, `true_during`, `entities_present`,
|
||||
and `events_during` call. If the user didn't specify a
|
||||
time, default to the setting's `current_era`.
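
The defaulting step can be sketched as a small argument filter that the host applies before dispatching a tool call. The tool names come from the mitigation above; the function name, the `claim` argument, and the `"380 TA"` time format are assumptions for illustration.

```python
# Tools named in Rule 4 as requiring a time window.
TIME_SCOPED_TOOLS = {"was_true_at", "true_during", "entities_present", "events_during"}


def with_time_window(tool_name: str, arguments: dict, current_era: str) -> dict:
    """Return a copy of `arguments` with `at_time` filled in when it is missing.

    Tools outside TIME_SCOPED_TOOLS are passed through unchanged.
    """
    scoped = dict(arguments)
    if tool_name in TIME_SCOPED_TOOLS and scoped.get("at_time") is None:
        # Assumption: the host already read `current_era` from the setting.
        scoped["at_time"] = current_era
    return scoped
```

The function never overwrites a time the user or the LLM supplied, so an explicit "in 350 TA" question keeps its own window.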

## Rule 5 — Don't resolve contradictions (test: slice 2 consistency engine tests)

**What:** the host must surface contradictions, not
resolve them.

**Why:** when two sources disagree, the LLM cannot know
which is right; the world-builder decides. The engine marks
the edge as `is_disputed: true` and points at the
disagreeing edges via `disputed_with`. The host's job
is to forward both sides.

**Test:** the slice 2 consistency engine tests pin the
server-side rule (the engine returns disputed edges
with both sources). The client-side rule is "any
response that draws on an `is_disputed: true` edge and
presents the answer as settled is a bug".

**Failure mode:** the LLM picks the more recent source.
The world-builder's source (older, authoritative) is
silently dropped. The user gets a wrong answer.

**Mitigation:** the host's LLM caller is told (in
`prompts/system_prompt.md` Rule 4) to never resolve
contradictions. As its own enforcement layer, the host
should also reject any response that draws on an edge
marked `is_disputed: true` and presents the answer as
settled.
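
One way to forward both sides is to render a disputed edge as a list of competing versions instead of a single statement. This is a sketch under stated assumptions: the `claim` field is hypothetical, and `disputed_with` is assumed to hold the disagreeing edges inline (the engine may return ids that need a second lookup).

```python
def format_sources(edge: dict) -> str:
    """Join an edge's sources, or say so when there are none."""
    return ", ".join(edge.get("sources", [])) or "no source"


def describe_edge(edge: dict) -> str:
    """Render one edge; a disputed edge lists every competing version."""
    if not edge.get("is_disputed"):
        return f"{edge['claim']} (sources: {format_sources(edge)})"
    # Assumption: `disputed_with` contains full edge dicts, not ids.
    versions = [edge, *edge.get("disputed_with", [])]
    lines = ["Sources disagree and the codex has not settled this:"]
    lines += [f"- {v['claim']} (sources: {format_sources(v)})" for v in versions]
    return "\n".join(lines)
```

Because the host builds the disputed text itself, the LLM never gets the chance to silently pick the more recent source.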

## Rule 6 — Setting filter for cross-setting questions

**What:** when the user asks a question that could mix
multiple settings, the host must pass `setting=<id>`
explicitly.

**Why:** the slice 6.5 setting filter exists exactly to
prevent cross-setting bleed. A query for "events in the
3rd Age" should not return events from both `mardonari`
and `the_wild_dream` if the user only meant one.

**Test:** `test_6_5_setting_filter_on_was_true_at` (slice
6.5) pins the server-side rule: the filter is additive,
and `setting=None` (the default) keeps the single-setting
behaviour. The client-side rule is "any question whose
answer could cross settings should pass `setting=<id>`".

**Failure mode:** the user asks "What happened in the
3rd Age?" and the LLM returns events from both settings
without distinction. The user doesn't know which
setting each event belongs to.

**Mitigation:** the host should track the "active
setting" in the conversation context. When the user
mentions a setting name (e.g. "in Mardonari"), the host
sets the active setting. When the user asks a question
without naming a setting, the host either asks "which
setting?" or passes the conversation's active setting
explicitly.
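
The active-setting bookkeeping might look like the sketch below. `SettingTracker` and its methods are hypothetical names, and the substring match on a normalised message is a deliberately naive stand-in for whatever entity recognition the host already has.

```python
import re


class SettingTracker:
    """Tracks the conversation's active setting (hypothetical helper)."""

    def __init__(self, known_settings):
        self.known = set(known_settings)
        self.active = None

    def observe(self, user_text: str):
        """Update the active setting if the message names exactly one."""
        normalized = re.sub(r"[^a-z0-9]+", "_", user_text.lower())
        mentioned = [s for s in self.known if s in normalized]
        if len(mentioned) == 1:
            self.active = mentioned[0]
        return self.active

    def scope(self, arguments: dict) -> dict:
        """Return a copy of the tool arguments with `setting` made explicit."""
        if self.active is None:
            raise LookupError("no active setting; ask the user which setting they mean")
        return {**arguments, "setting": self.active}
```

Raising instead of falling back to `setting=None` forces the "which setting?" question rather than letting an unscoped query go out silently.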

## Rule 7 — Reload contract

**What:** after every `reload_templates` call, the host
must treat the response's tool list as the new canonical
state.

**Why:** the template registry may have added, removed,
or modified tools. A host that holds a stale tool list
will dispatch to non-existent tools or miss new ones.

**Test:** no automated test on the server side (the
server's `reload_templates` always returns the new list).
The client-side test is "after every `reload_templates`
call, re-fetch `tools/list` and re-validate the host's
tool registry".

**Failure mode:** the world-builder adds a new template
and calls `reload_templates`. The host doesn't re-fetch
the tool list, so `list_missions` is missing from its
registry. The LLM caller tries to call `list_missions`
and gets `method_not_found`. The LLM hallucinates an
answer.

**Mitigation:** the host's `reload_templates` handler
should:

1. Call `reload_templates` on the engine.
2. Re-call `tools/list`.
3. Replace the local tool registry.
4. Re-validate any in-flight conversations (or surface
   a "tools have changed" notice to the user).
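
The four steps above can be sketched as a small registry class. The `rpc` callable is a hypothetical stand-in for the host's transport, and invoking `reload_templates` through `tools/call` is an assumption; the `tools/list` result shape (`{"tools": [{"name": ...}]}`) follows the MCP specification.

```python
from typing import Callable


class ToolRegistry:
    """Host-side copy of the engine's tool list (hypothetical helper)."""

    def __init__(self, rpc: Callable[[str, dict], dict]):
        self._rpc = rpc  # rpc(method, params) -> result dict
        self.tools: dict[str, dict] = {}

    def refresh(self) -> None:
        """Steps 2 and 3: re-fetch `tools/list` and replace the local registry."""
        listed = self._rpc("tools/list", {})
        self.tools = {tool["name"]: tool for tool in listed["tools"]}

    def reload(self) -> set[str]:
        """Step 1, then refresh; returns the names that were added or removed."""
        before = set(self.tools)
        self._rpc("tools/call", {"name": "reload_templates", "arguments": {}})
        self.refresh()
        return before ^ set(self.tools)
```

A non-empty return value is the host's cue for step 4: re-validate in-flight conversations or show the "tools have changed" notice. Note that the symmetric difference only detects added and removed names, not a changed `inputSchema`.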

## Acceptance criteria

A host module is "good" when it satisfies all 7 rules.
The minimum acceptance suite:

```python
# test_host_compliance.py
import json
import subprocess
import sys

def test_host_uses_lookup_first():
    """Every Type 1-4 question's tool trace must include lookup."""
    ...

def test_host_cites_every_claim():
    """Every claim in the response must include ≥1 source."""
    ...

def test_host_time_windows_every_fact_query():
    """Every fact query must pass at_time; the response surfaces it."""
    ...

def test_host_does_not_resolve_contradictions():
    """Any response mentioning is_disputed and presenting as settled is rejected."""
    ...

def test_host_passes_setting_for_cross_setting():
    """Cross-setting questions must pass setting=<id> explicitly."""
    ...

def test_host_refetches_tools_list_on_reload():
    """After reload_templates, the host's tool registry must match the engine's."""
    ...
```

The full harness (`tests/harness/test_questions.py` +
`tests/harness/test_system_prompt.py` + the slice 7.3
runner) is the regression net. The host is "good" when
its 50-question run scores:

- tool-selection accuracy ≥80%
- citation rate ≥90%
- hallucination rate <5%
- time-window violation rate <5%
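
The four thresholds can be checked mechanically once a harness run has produced its scores. A minimal sketch, assuming the scores arrive as fractions in a dict; the metric key names and the `failing_metrics` function are hypothetical, not the slice 7.3 runner's actual output format.

```python
import operator

# (comparison, bound) per metric, mirroring the list above.
THRESHOLDS = {
    "tool_selection_accuracy": (operator.ge, 0.80),
    "citation_rate": (operator.ge, 0.90),
    "hallucination_rate": (operator.lt, 0.05),
    "time_window_violation_rate": (operator.lt, 0.05),
}


def failing_metrics(scores: dict[str, float]) -> list[str]:
    """Return the metrics that miss their threshold or are absent from the run."""
    failures = []
    for name, (passes, bound) in THRESHOLDS.items():
        value = scores.get(name)
        if value is None or not passes(value, bound):
            failures.append(name)
    return failures
```

A missing metric counts as a failure, so a partially executed run cannot pass by omission. The rate bounds are strict: a hallucination rate of exactly 5% fails.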

## What the engine promises

The engine is a fixed-target service from the host's
point of view. The promises:

- **Wire protocol is JSON-RPC 2.0** (per the MCP
  specification). Every tool call is a single
  request/response. No streaming, no async.
- **Tool names are stable** within a major version.
  A tool's `name` and `inputSchema` are versioned
  together; a host that calls a v1.2 tool against a
  v1.1 engine gets `invalid_params` (schema mismatch)
  or `method_not_found` (tool absent).
- **Tool responses are JSON objects with a stable
  shape.** The `sources`, `at_time`, `valid_from`,
  `valid_until`, `is_disputed` fields are guaranteed.
  New fields may be added in minor versions; the host
  should ignore unknown fields.
- **Errors are JSON-RPC standard.** Invalid params,
  method not found, internal error: each maps to a
  standard `code` (per the JSON-RPC spec) and a
  human-readable `message`. The host can branch on
  the code without parsing the message.
- **Idempotency:** `lookup`, `entity_context`,
  `was_true_at`, and other read tools are pure. The
  same arguments always return the same response as
  long as the graph has not changed. Write tools are
  idempotent only when the args are the same:
  re-running `add_entity` with the same args is a
  no-op; re-running with different args is an error.
- **Hot-reload:** the engine supports `reload_templates`
  at any time. The response is the new tool list. The
  host can call this between conversations or even
  mid-conversation (the active conversation's next
  `tools/list` call will return the new list).
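
Branching on the error code can be sketched as follows. The three numeric codes are the ones the JSON-RPC 2.0 specification reserves for these conditions; the `classify_response` name and the action labels it returns are hypothetical host-side conventions.

```python
# Reserved error codes from the JSON-RPC 2.0 specification.
METHOD_NOT_FOUND = -32601
INVALID_PARAMS = -32602
INTERNAL_ERROR = -32603


def classify_response(response: dict) -> str:
    """Map a JSON-RPC response to a host action without parsing `message`."""
    error = response.get("error")
    if error is None:
        return "ok"
    code = error.get("code")
    if code == METHOD_NOT_FOUND:
        return "refresh_tools"    # the host's tool list is probably stale
    if code == INVALID_PARAMS:
        return "fix_arguments"    # arguments do not match the tool's inputSchema
    if code == INTERNAL_ERROR:
        return "retry_or_report"
    return "unknown_error"
```

Treating `method_not_found` as a signal to re-fetch `tools/list` ties this promise back to the reload contract in Rule 7.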

## Versioning

| Engine version | Contract version | Notes |
|---|---|---|
| v1.0 (slice 0–3) | 1.0 | Initial tool surface (12 read tools) |
| v1.1 (slice 4–5T) | 1.1 | + 12 read tools + 14 template tools = 38 |
| v1.2 (slice 6–11) | 1.2 | + 6 read tools + 4 write tools + setting filter + Streamable HTTP transport |
| v2.0 (planned) | 2.0 | Cypher-write templates, cross-LLM benchmarks, UI |

The contract version is the same as the engine's schema
version (the `schema_version` field on the `Setting`
node, per slice 6.4). Hosts that target v1.2 will not
work against a v1.1 engine (missing tools) or a v2.0
engine (renamed/removed tools). The mismatch surfaces
as a `method_not_found` or `invalid_params` error on
the first call.
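
A host can fail fast instead of waiting for the first error by comparing versions at startup. The sketch below assumes the host has already read `schema_version` as a string such as `"1.2"`; how it is fetched is not shown. It also assumes that a newer minor version within the same major stays compatible, which is an inference from the "tool names are stable within a major version" promise, not a stated guarantee.

```python
def parse_version(version: str) -> tuple[int, int]:
    """Parse "1.2" or "v1.2" into (major, minor)."""
    major, minor = version.strip().lstrip("v").split(".")[:2]
    return int(major), int(minor)


def is_compatible(host_target: str, engine_schema_version: str) -> bool:
    """True when the engine has the same major and at least the host's minor."""
    host_major, host_minor = parse_version(host_target)
    engine_major, engine_minor = parse_version(engine_schema_version)
    return host_major == engine_major and engine_minor >= host_minor
```

With this rule a v1.2 host rejects both a v1.1 engine (missing tools) and a v2.0 engine (renamed or removed tools), matching the two cases described above.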

## Out of scope (deferred)

- **Streaming.** The engine does not support
  `tools/call` with server-sent events. Long-running
  queries (e.g. cross-codex searches) block until
  complete. Streaming is a v2.0 follow-up.
- **Authentication.** The stdio transport is local-only
  (no auth). The Streamable HTTP transport runs in a
  Docker container with a 1 MiB body cap and a
  loopback bind by default; production deployments
  should add a reverse proxy with auth.
- **Multi-tenant.** A single engine instance holds one
  graph. Multi-tenant (one engine, multiple worlds)
  is a v2.0 follow-up; the v1.2 model is
  multi-*setting* within one world.
- **UI for failure-mode review.** The slice 7.4
  red-team suite produces a failure-mode log; a UI to
  review it is a v2.0 follow-up.

## Cross-references

- [`INTEGRATION.md`](./INTEGRATION.md): the practical
  how-to guide (companion to this contract)
- [`docs/07-reasoning-harness.md`](./07-reasoning-harness.md): the
  5 question types and 6 failure modes
- [`docs/05-mcp-tools.md`](./05-mcp-tools.md): the full
  tool catalog with response shapes
- [`docs/19-retcon-policy.md`](./19-retcon-policy.md):
  the retcon + mark_verified audit policy
- [`docs/20-multi-setting-policy.md`](./20-multi-setting-policy.md):
  the cross-setting rules
- [`docs/adr/0011-graph-backend-protocol.md`](./adr/0011-graph-backend-protocol.md):
  the `GraphBackend` Protocol (informs the engine's
  substrate promise)
- [`docs/adr/0012-typetemplate-polymorphism.md`](./adr/0012-typetemplate-polymorphism.md):
  the slice 5T TypeTemplate layer
- `prompts/system_prompt.md` in the poc repo: the
  system prompt given to the LLM caller
- `tests/harness/questions.yaml` in the poc repo: the
  50-question regression net