docs: integration module how-to (INTEGRATION.md) + formal contract

Two companion docs answering 'how does a host module drive the Lore Engine correctly?'.

INTEGRATION.md: the practical guide. Audience: anyone who has the engine and wants to wrap it. 12 sections: TL;DR (30-line integration module), mental model, transports, 50-tool surface, 24 read tools + 12 write tools, template-generated tools, 7 integration rules, 6 failure modes, 4 metrics, adding a new domain type, worked end-to-end example.

integration-module-contract.md: the formal, testable contract. Audience: host-app authors. The 7 rules + their tests + their failure modes. Versioned with the system prompt (v1.0/v1.1/v1.2). The host is 'good' when its 50-question harness run scores: tool-selection accuracy >=80%, citation rate >=90%, hallucination rate <5%, time-window violation rate <5%.

Per the slice 7 doc deliverable (slice 7 Track A, blocked on the API key for the LLM execution half). These are the hand-off artefacts for any future host module author.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-06-19 23:11:04 -04:00
parent 122ce88295
commit 7d2fe1f97e
2 changed files with 910 additions and 0 deletions
--- a/docs/INTEGRATION.md
+++ b/docs/INTEGRATION.md
@@ -0,0 +1,552 @@
+# Integration Guide
+
+**Audience:** developers who have the Lore Engine POC installed
+(`~/projects/lore-engine-poc/`) and want to wire it into a host
+application — an LLM agent, a chat UI, an IDE plugin, a Discord
+bot, a CLI tool, anything that needs to ask questions about a
+fictional world.
+
+**What this doc is:** the practical "how to drive the engine"
+guide. The 22 design docs in this repo describe the engine from
+the inside out (ontology, time model, consistency rules, planes,
+templates, ADRs). This doc is the outside-in view: what the
+host sends, what the engine returns, what the host must do in
+between to satisfy the engine's contract.
+
+**What this doc is not:** it does not duplicate the design
+rationale (see `docs/00-overview.md` for that). It also does
+not cover the engine's *internal* code path — for that, the
+test files in `tests/` are the canonical examples.
+
+## TL;DR — the 30-line integration module
+
+```python
+import json, subprocess, sys
+
+# 1. Spawn the MCP server (stdio transport)
+server = subprocess.Popen(
+    [sys.executable, "-m", "lore_engine_poc.mcp_stdio_entry"],
+    stdin=subprocess.PIPE, stdout=subprocess.PIPE,
+    text=True, bufsize=1,
+)
+
+def rpc(method, params=None, id_=None):
+    msg = {"jsonrpc": "2.0", "method": method, "params": params or {}}
+    if id_ is not None:
+        msg["id"] = id_
+    server.stdin.write(json.dumps(msg) + "\n")
+    server.stdin.flush()
+    if id_ is None:
+        return None  # notification: JSON-RPC sends no response to wait for
+    return json.loads(server.stdout.readline())
+
+# 2. Discover the tools
+rpc("initialize", id_=1)
+tools = rpc("tools/list", id_=2)["result"]["tools"]
+# tools is a list of {name, description, inputSchema}
+
+# 3. Call one
+result = rpc("tools/call",
+    params={"name": "entity_context",
+            "arguments": {"name": "Roland Raventhorne",
+                          "at_time": "3rd_age.year_345"}},
+    id_=3)["result"]
+# result is {content: [...], isError: bool}
+```
+
+That's the whole shape. The rest of this doc explains what
+the 50 tools do, what their responses mean, and the rules
+the host must follow to use them correctly.
+
+## Table of contents
+
+1. [The mental model](#1-the-mental-model)
+2. [Transports: stdio vs Streamable HTTP](#2-transports-stdio-vs-streamable-http)
+3. [The 50-tool surface](#3-the-50-tool-surface)
+4. [Read tools: the 24 read patterns](#4-read-tools-the-24-read-patterns)
+5. [Write tools: the 12 mutation patterns](#5-write-tools-the-12-mutation-patterns)
+6. [Template-generated tools: 14 polymorphic tools](#6-template-generated-tools-14-polymorphic-tools)
+7. [The 7 integration rules](#7-the-7-integration-rules)
+8. [The 6 failure modes the host must avoid](#8-the-6-failure-modes-the-host-must-avoid)
+9. [The 4 metrics a good integration module measures](#9-the-4-metrics-a-good-integration-module-measures)
+10. [Adding a new domain type via templates/](#10-adding-a-new-domain-type-via-templates)
+11. [Worked end-to-end example](#11-worked-end-to-end-example)
+12. [Where to go next](#12-where-to-go-next)
+
+## 1. The mental model
+
+The Lore Engine is a typed, time-aware, multi-setting knowledge
+graph with a reified :Relation layer and a polymorphic
+:DomainEntity substrate. The host sees it as a single JSON-RPC
+service. The five concepts the host must internalize:
+
+**Setting.** A campaign/world scope. Every entity belongs to
+exactly one Setting via an `EXISTS_IN` edge (the slice 6
+setting filter consumes this). The default Mardonari codex
+lives in `setting="mardonari"`. The Wild Dream (slice 6.5
+test target) lives in `setting="the_wild_dream"`.
+
+**Plane.** A layer of existence within a Setting (Material,
+Shadowfell, demiplane, Outer Plane, transit, etc.). Planes
+are first-class nodes since slice 6.1. They have relations
+to other planes (`LAYER_OF`, `REFLECTS`, `ADJACENT_TO`,
+`ACCESSIBLE_VIA`). The Voldramir demiplane is a child of
+Mardonari's Material Plane via `LAYER_OF`.
+
+**Entity.** A typed node. There are about 36 core labels
+(some added in slice 5T/6), including: Person,
+Faction, Location, Region, Item, Era, Date, Lineage, Culture,
+Deity, Language, MagicSystem, Title, Material, Event, Creature,
+Spell, NPC, PC, Human, LoreSource, LoreVerified, Plus, ItemSlot,
+DomainEntity, TypeTemplate, Setting, Plane, …. Every entity has a
+canonical name (the `by_name` key) and a type (the `add_entity_of_type`
+index).
+
+**Edge.** A typed relation between two entities. Most edges
+are time-bounded (`valid_from` / `valid_until`); some are
+timeless type-assertions (`EXISTS_IN`). Each edge carries a
+source list (the documents that asserted the fact) and a
+two-dimensional confidence score
+(`extraction_confidence × source_confidence`). Two sources
+that disagree create a *disputed* edge pair — slice 2's
+consistency engine surfaces these as `Contradiction` nodes.
+
+**Template (slice 5T).** A YAML schema for a polymorphic
+domain type (thieves-guild mission, war campaign, black-market
+lot, NPC secret knowledge, etc.). The engine reads the YAML
+and registers N read-only MCP tools (`list_missions`,
+`get_mission`, `missions_by_target`, etc.) automatically.
+No Python change, no server restart: the host calls
+`reload_templates` to pick up new templates.
+
+## 2. Transports: stdio vs Streamable HTTP
+
+The engine ships two MCP transports. Choose by deployment
+context, not by preference.
+
+**stdio** — for local development, IDE plugins, in-process
+agents. The host spawns the server as a subprocess and pipes
+JSON-RPC messages over stdin/stdout. See
+`scripts/05_mcp_server.py`. Latency is ~1ms per call; no
+network, no auth.
+
+**Streamable HTTP** (slice 11) — for production deployments
+where the host is a remote service (web app, multi-user chat
+backend). The server runs in a hardened Docker container with
+a 1 MiB body cap, non-root user, and read-only filesystem.
+The host speaks HTTP+JSON-RPC against the `POST /mcp` endpoint.
+See `scripts/06_mcp_http_server.py` and the
+`docker-compose.yml` profile. Latency is ~5–50ms depending
+on host network.
+
+**The wire protocol is the same in both.** The host can
+write the integration code once and switch transports by
+swapping the RPC adapter. The only thing that changes is
+how the bytes get from the host to the engine.
+
+## 3. The 50-tool surface
+
+`tools/list` returns one entry per tool with `name`,
+`description`, and `inputSchema` (a JSON Schema). The full
+surface as of slice 6.7 + 5T.5 + 10 + 11:
+
+| Group | Count | Examples |
+|---|---|---|
+| Read | 12 | `lookup`, `entity_context`, `was_true_at`, `true_during`, `entities_present`, `events_during`, `timeline`, `ancestors_of`, `descendants_of`, `event_chain`, `lore_about`, `significance_of` |
+| List/expand | 6 | `list_lineage`, `list_offspring`, `location_hierarchy`, `expand_context`, `recent_changes`, `list_lore_sources` |
+| Read (consistency) | 5 | `run_consistency_check`, `latest_run`, `get_contradictions`, `get_anachronisms`, `get_orphans` |
+| Read (ontology) | 3 | `get_ontology_violations`, `list_ontology_rules`, `explain_violation` |
+| Write (entity) | 6 | `add_entity`, `add_relation`, `add_lore_source`, `set_alias`, `update_entity`, `delete_entity` |
+| Write (workflow) | 4 | `retcon`, `mark_verified`, `merge_entities`, `flag_for_review` |
+| Write (time) | 3 | `define_calendar`, `define_era`, `define_date` |
+| Template-generated | ~14 | `list_missions`, `get_mission`, `missions_by_target`, etc. (1 per `query:` in each template) |
+| Meta | 2 | `list_template_tools`, `reload_templates` |
+
+**The tool list is dynamic.** Every time the host calls
+`tools/list`, the engine returns the current registry
+including any templates that have been loaded. The host
+should re-fetch on `reload_templates` completion, not
+rely on a cached list.
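+
+One way to honour this is to keep the host's registry
+behind a small wrapper that refreshes itself whenever
+`reload_templates` is called. The sketch below is
+illustrative, not engine code: `ToolRegistry` and its
+method names are hypothetical, and `rpc` is assumed to be
+a callable shaped like the TL;DR helper
+(`rpc(method, params=None, id_=None)` returning the decoded
+JSON-RPC response).
+
+```python
+import itertools
+
+
+class ToolRegistry:
+    """Host-side cache of the engine's tool list (hypothetical helper)."""
+
+    def __init__(self, rpc):
+        self._rpc = rpc                  # callable shaped like the TL;DR rpc()
+        self._ids = itertools.count(1)   # fresh JSON-RPC id per request
+        self.tools = {}
+        self.refresh()
+
+    def refresh(self):
+        """Replace the cached registry with the engine's current tools/list."""
+        response = self._rpc("tools/list", id_=next(self._ids))
+        self.tools = {t["name"]: t for t in response["result"]["tools"]}
+
+    def reload_templates(self):
+        """Ask the engine to reload templates, then re-fetch the tool list."""
+        self._rpc("tools/call",
+                  params={"name": "reload_templates", "arguments": {}},
+                  id_=next(self._ids))
+        self.refresh()
+
+    def call(self, name, arguments):
+        """Dispatch a tool call, refusing names the engine no longer lists."""
+        if name not in self.tools:
+            raise KeyError(f"unknown tool: {name}")
+        response = self._rpc("tools/call",
+                             params={"name": name, "arguments": arguments},
+                             id_=next(self._ids))
+        return response["result"]
+```
+
+Routing every dispatch through `call` means a stale name
+fails in the host before it ever reaches the LLM as a
+confusing engine error.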
+
+## 4. Read tools: the 24 read patterns
+
+The 24 read tools fall into 5 design-doc question types. The
+host's LLM caller should pick a type and follow the canonical
+tool sequence (see `docs/07-reasoning-harness.md` §"The five
+question types"):
+
+**Type 1 — Identity & description.** *"Who is Aldric?"*
+```
+lookup(query)
+entity_context(entity_id, at_time=current)
+expand_context(entity_id, hops=2, min_confidence=0.5)   # if sparse
+significance_of(entity_id)
+list_lineage(person)                                    # if Person
+```
+
+**Type 2 — Time-bounded fact check.** *"Was X true at T?"*
+```
+lookup(subject) + lookup(object)              # if not resolved
+was_true_at(RELATION, subject, object, at_time)
+cite(claim)                                    # if true
+true_during(RELATION, subject, object, era)    # if false
+```
+
+**Type 3 — World state at a time.** *"What was X like at T?"*
+```
+lookup(entity)
+entities_present(location, at_time)
+events_during(era, location=resolved)
+get_contradictions(subject=entity, severity=warn)
+```
+
+**Type 4 — Causal / chain reasoning.** *"Why did X happen?"*
+```
+lookup(event/event_chain_target)
+event_chain(event, depth=3)
+ancestors_of(person) + descendants_of(person)  # if Person
+get_anachronisms(entity=central)
+```
+
+**Type 5 — Open-ended narrative.** *"Tell me about X."*
+```
+lookup(entity)
+entity_context(entity)                          # state snapshot
+event_chain(entity, depth=3)
+lore_about(entity, type=prose, limit=10)
+narrate_arc(entity, style=chronicle)
+cite(claim)                                     # back the spine
+get_contradictions(subject=entity, severity=warn)
+```
+
+**Critical: every read tool returns a `sources` list.** A
+good integration module extracts the `sources` from each
+tool response and includes them in the final answer. A
+claim without a source is a hallucination (per the slice 7.2
+system prompt's Rule 2).
+
+**Critical: every read tool respects `at_time`.** A claim
+about "X was true" without a time scope is wrong by
+default. The host should pass `at_time` on every fact query;
+the engine's `current` reserved token resolves to the
+setting's `current_era`.
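+
+The citation requirement can be enforced mechanically
+before an answer is rendered. The helpers below are a
+minimal sketch under stated assumptions: each payload is
+taken to be an already-decoded tool result (a dict) whose
+`sources` key holds a list of strings, as in the worked
+example in section 11. The function names are hypothetical.
+
+```python
+def collect_sources(payloads):
+    """Return de-duplicated sources from tool payloads, in first-seen order."""
+    seen = []
+    for payload in payloads:
+        for source in payload.get("sources", []):
+            if source not in seen:
+                seen.append(source)
+    return seen
+
+
+def render_with_citations(claim, payloads):
+    """Attach sources to a claim, or refuse when no tool result backs it."""
+    sources = collect_sources(payloads)
+    if not sources:
+        raise ValueError("claim has no source; treat it as a hallucination")
+    return f"{claim} Sources: {', '.join(sources)}"
+```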
+
+## 5. Write tools: the 12 mutation patterns
+
+The 12 write tools (slice 10) are world-builder tools, not
+LLM tools. The integration module should generally **not**
+let the LLM call these — the LLM is a reader, not an editor.
+Allow them only behind an explicit confirmation flow (see
+`docs/19-retcon-policy.md` for the retcon workflow):
+
+```
+# 1. A source recorded "Roland married Aldric". That is
+#    wrong; the two were actually allied.
+add_relation(subject="Roland", relation="ALLIED_WITH", object="Aldric")   # or
+retcon(edge_id=..., new_object="Aldric", note="...")
+
+# 2. The world-builder wants to mark an edge as verified
+#    after a human read the source.
+mark_verified(edge_id=..., verified_by="world_builder", note="checked chronicles")
+```
+
+The two most important write tools are `retcon` and
+`mark_verified` (slice 10.2). Both stamp the edge with an
+audit log entry; both are append-only at the audit-log
+level, even when they mutate the edge itself. Every other
+write tool is a simpler `add_*` / `update_*` /
+`delete_*` variant.
+
+**Integration module must:** log every write tool call to
+the world-builder's audit log (timestamp, tool, args,
+caller). The audit log is the safety net — if a bad write
+ever lands, the roll-back path is to read the log.
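+
+A minimal sketch of such a logging wrapper follows. It is
+an illustration, not engine code: `audited_write`, the
+JSON Lines layout, and the `call_tool` callable are all
+assumptions; the tool names are the write tools listed in
+the section 3 table.
+
+```python
+import json
+import time
+
+# Write-tool names as listed in the section 3 table.
+WRITE_TOOLS = {
+    "add_entity", "add_relation", "add_lore_source", "set_alias",
+    "update_entity", "delete_entity", "retcon", "mark_verified",
+    "merge_entities", "flag_for_review", "define_calendar",
+    "define_era", "define_date",
+}
+
+
+def audited_write(call_tool, log_file, caller, name, arguments):
+    """Log a write-tool call, then forward it to the engine.
+
+    call_tool is a hypothetical callable (name, arguments) -> result;
+    log_file is any text file object opened for appending.
+    """
+    if name not in WRITE_TOOLS:
+        raise ValueError(f"{name} is not a write tool")
+    entry = {
+        "timestamp": time.time(),
+        "tool": name,
+        "args": arguments,
+        "caller": caller,
+    }
+    # Log before calling so a crash mid-write still leaves a trace.
+    log_file.write(json.dumps(entry) + "\n")
+    log_file.flush()
+    return call_tool(name, arguments)
+```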
+
+## 6. Template-generated tools: 14 polymorphic tools
+
+Slice 5T shipped 4 example templates (thieves-guild mission,
+war campaign, black-market lot, NPC secret knowledge). Each
+template has 3-4 `query:` blocks, each of which becomes an
+MCP tool at registration time. The total template-generated
+surface is ~14 tools, and it grows when the world-builder
+adds more `templates/*.yaml` files.
+
+The template tools are read-only; they run a Cypher query
+(allowlist-validated per slice 5T.3) against the
+`:DomainEntity` nodes the engine has ingested. The full
+killer demo walkthrough is in `docs/14-examples.md` §"Example
+5: Planes of existence" and the slice 5T ADR (`docs/adr/0012-typetemplate-polymorphism.md`).
+
+**Integration module must:** re-discover the tool list
+after every `reload_templates` call. A list cached before
+the reload misses newly added tools, and calling a tool
+whose template was removed returns `method_not_found`.
+
+## 7. The 7 integration rules
+
+These are the rules a good integration module follows. They
+come from the system prompt (`prompts/system_prompt.md`,
+slice 7.2), the design docs, and the ADRs. The
+`tests/harness/test_questions.py` 50-question test set
+checks that the LLM's tool sequence satisfies them.
+
+**Rule 1 — Always `lookup` first.** Don't guess entity
+IDs. The cost of one `lookup` is 1ms; the cost of a wrong
+guess is a hallucinated answer.
+
+**Rule 2 — Cite every claim.** Every specific factual
+claim in the host's response must cite at least one source
+returned by a tool. A claim without a source is a
+hallucination.
+
+**Rule 3 — Time-window every fact query.** Pass `at_time`
+on every fact query (`was_true_at`, `true_during`, etc.).
+Default to `current` only when the user has not specified
+a time. Make the time explicit in the answer.
+
+**Rule 4 — Never resolve contradictions yourself.** If
+two sources disagree, surface both with both sources.
+The world-builder decides.
+
+**Rule 5 — `setting=` is mandatory for cross-setting
+questions.** When the user asks a question that could mix
+multiple settings, the host should pass `setting=<id>`
+explicitly. The default behaviour (no filter) is correct
+for single-setting worlds; the slice 6.5 cross-setting
+filter is the safe default for multi-setting worlds.
+
+**Rule 6 — Re-discover `tools/list` after `reload_templates`.**
+A list cached before the reload misses newly added tools,
+and calling a tool whose template was removed returns
+`method_not_found`. The `reload_templates` tool's response
+is the contract that "the registry is now what you saw".
+
+**Rule 7 — For long historical arcs, check
+`latest_run()` first.** Stale consistency data is
+dangerous — a contradiction that the consistency engine
+found 2 weeks ago may have been resolved by a retcon
+since. `latest_run()` returns the timestamp and counts of
+the most recent consistency pass.
+
+## 8. The 6 failure modes the host must avoid
+
+These come from `docs/07-reasoning-harness.md` §"Failure
+modes the LLM must avoid" and are the same rules the
+host's LLM caller is told. The integration module should
+detect each and reject the response:
+
+**F1 — Answering from training data.** Symptom: the LLM
+says "Aldric is the heir to House Vyr" without calling
+`entity_context` first. The host's audit log should flag
+any tool-using turn that produces a specific fact claim
+without a corresponding tool call in the trace.
+
+**F2 — Resolving contradictions.** Symptom: the LLM
+picks one of two disagreeing sources. The host should
+reject any response that mentions an `is_disputed: true`
+edge and presents the answer as settled.
+
+**F3 — Confusing present and past.** Symptom: "Aldric
+rules Valdorn" without a time scope. The host should
+require `at_time` on every fact query and surface the
+time in the answer.
+
+**F4 — Treating `lore_verified: false` as canonical.**
+Symptom: the LLM cites an entity that only exists in
+encounter data and has no lore document. The host
+should mark provisional entities explicitly in the
+response.
+
+**F5 — Skipping the consistency check.** Symptom: the
+LLM answers a 5-generation family question without
+calling `get_anachronisms`. The host should make
+`get_anachronisms` mandatory for any question involving
+3+ entities or 1+ time hop.
+
+**F6 — Hallucinating tool results.** Symptom: the LLM
+says "the tool returned X" when the tool actually
+returned Y or nothing. The host should verify every
+quoted tool result against the actual tool return
+(cross-check the trace).
+
+## 9. The 4 metrics a good integration module measures
+
+A "good integration module" is one that catches its own
+regressions. The 4 metrics (slice 7.3) are the
+regression net:
+
+**Tool-selection accuracy** (per type). What fraction
+of the LLM's tool sequences match the canonical sequence
+for each question type. AC 7.3: ≥80% on the 50-question
+test set.
+
+**Citation rate.** What fraction of claims cite ≥1
+source. AC 7.4: ≥90%.
+
+**Hallucination rate.** Average number of unsourced
+facts per question. AC 7.5: <5%.
+
+**Time-window violation rate.** What fraction of answers
+made claims outside the question's `at_time` window.
+AC 7.6: <5%.
+
+The integration module should run the harness
+(`tests/harness/questions.yaml`) before each release and
+fail the build if any metric regresses. The
+`scripts/harness/run_questions.py` runner (slice 7.3,
+Track B — needs `$OLLAMA_API_KEY`) is the canonical
+way to measure.
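+
+The release gate itself can be sketched as a plain
+threshold check. The function name and the shape of
+`scores` are assumptions for illustration (the runner's
+real output format is not described here); the thresholds
+are the four acceptance criteria above, with each score
+expressed as a fraction between 0 and 1.
+
+```python
+# Thresholds from AC 7.3 to 7.6: "min" means higher is better, "max" lower.
+THRESHOLDS = {
+    "tool_selection_accuracy": ("min", 0.80),
+    "citation_rate": ("min", 0.90),
+    "hallucination_rate": ("max", 0.05),
+    "time_window_violation_rate": ("max", 0.05),
+}
+
+
+def failed_metrics(scores):
+    """Return the names of metrics that miss their acceptance threshold."""
+    failures = []
+    for name, (kind, limit) in THRESHOLDS.items():
+        value = scores[name]
+        if kind == "min" and value < limit:
+            failures.append(name)
+        elif kind == "max" and value >= limit:  # criterion is strictly below 5%
+            failures.append(name)
+    return failures
+```
+
+A CI step could call `failed_metrics` on the harness
+output and exit non-zero whenever the returned list is
+non-empty.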
+
+## 10. Adding a new domain type via templates/
+
+The killer demo (slice 5T.5). A new domain type is one
+YAML file away. Walkthrough:
+
+```bash
+# 1. Drop a template YAML
+cat > lore_engine_poc/seed/templates/npc_quirk.yaml <<'EOF'
+template:
+  id: npc_quirk
+  version: 1.0.0
+  label: NPCQuirk
+  description: A persistent behavioral quirk for an NPC.
+
+entity:
+  properties:
+    - {name: trigger, type: string, required: true}
+    - {name: response, type: string, required: true}
+    - {name: severity, type: enum, values: [minor, major, defining]}
+
+relations:
+  - {to_type: Person, type: QUIRK_OF}
+
+queries:
+  - id: list_quirks
+    description: List every quirk, sorted by severity.
+    cypher: |
+      MATCH (n:DomainEntity {type: 'NPCQuirk'})
+      RETURN n ORDER BY n.severity
+    parameters: {}
+
+  - id: quirks_of
+    description: All quirks of a given NPC.
+    cypher: |
+      MATCH (n:DomainEntity {type: 'NPCQuirk'})-[:QUIRK_OF]->(p {name: $name})
+      RETURN n
+    parameters:
+      name: {type: string, required: true}
+EOF
+
+# 2. Reload templates (no restart)
+python3 scripts/01_ingest.py --reload-templates --skip-cognee
+
+# 3. Ingest an instance
+cat > lore_engine_poc/seed/instances/aldric_quirks.yaml <<'EOF'
+template_id: npc_quirk
+instances:
+  - name: Aldric's coin flip
+    properties:
+      trigger: asked for a side
+      response: flips a Valdorni silver piece; calls in the air
+      severity: major
+    relations:
+      - {to: Aldric Raventhorne, type: QUIRK_OF}
+EOF
+
+python3 scripts/01_ingest.py --ingest-instance \
+    lore_engine_poc/seed/instances/aldric_quirks.yaml --skip-cognee
+
+# 4. Use the generated tool (HTTP transport, see section 2)
+python3 scripts/06_mcp_http_server.py --port 18765 &
+curl -s http://127.0.0.1:18765/mcp \
+  -H 'Content-Type: application/json' \
+  -d '{"jsonrpc":"2.0","id":1,"method":"tools/call",
+       "params":{"name":"quirks_of",
+                 "arguments":{"name":"Aldric Raventhorne"}}}'
+```
+
+The 2 new tools (`list_quirks`, `quirks_of`) appeared with
+no Python change and no engine restart. The same pattern
+works for any domain type the world-builder wants to model.
+
+## 11. Worked end-to-end example
+
+A short stdio host that asks "Was House Vyr allied with the
+Crimson Pact in 340 TA?" and gets a cited answer back:
+
+```python
+import json, subprocess, sys
+
+server = subprocess.Popen(
+    [sys.executable, "-m", "lore_engine_poc.mcp_stdio_entry"],
+    stdin=subprocess.PIPE, stdout=subprocess.PIPE,
+    text=True, bufsize=1,
+)
+
+def rpc(method, params=None, id_=1):
+    msg = {"jsonrpc": "2.0", "id": id_, "method": method,
+           "params": params or {}}
+    server.stdin.write(json.dumps(msg) + "\n")
+    server.stdin.flush()
+    return json.loads(server.stdout.readline())
+
+# 1. Initialize + discover.
+rpc("initialize", id_=1)
+tools = {t["name"]: t for t in rpc("tools/list", id_=2)["result"]["tools"]}
+
+# 2. Resolve both entities (Rule 1).
+rpc("tools/call",
+    params={"name": "lookup", "arguments": {"query": "House Vyr"}}, id_=3)
+rpc("tools/call",
+    params={"name": "lookup", "arguments": {"query": "Crimson Pact"}}, id_=4)
+
+# 3. Time-bounded fact query (Rule 3).
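+result = rpc("tools/call",
+    params={"name": "was_true_at",
+            "arguments": {"relation": "ALLIED_WITH",
+                          "subject": "House Vyr",
+                          "object": "Crimson Pact",
+                          "at_time": "3rd_age.year_340"}},
+    id_=5)["result"]
+# Assumption: the tool payload is JSON text in the first content item.
+fact = json.loads(result["content"][0]["text"])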
+
+# 4. Render the answer with citations (Rule 2).
+if fact["was_true"]:
+    answer = (f"Yes — House Vyr was allied with the Crimson Pact "
+              f"from {fact['valid_from']} to {fact['valid_until']}. "
+              f"Sources: {', '.join(fact['sources'])}")
+else:
+    answer = ("No — they were not allied at that time. "
+              f"Edges examined: {fact['edges_examined']}")
+
+print(answer)
+```
+
+Expected output (Mardonari codex, slice 0 fixture):
+```
+Yes — House Vyr was allied with the Crimson Pact
+from 3rd_age.year_312 to 3rd_age.year_345.
+Sources: chronicles-vyr.md, pact-treaties.md
+```
+
+## 12. Where to go next
+
+- [`integration-module-contract.md`](./integration-module-contract.md) — the
+  formal contract a host module must satisfy to be "good"
+- [`docs/00-overview.md`](./00-overview.md) — engine overview
+- [`docs/05-mcp-tools.md`](./05-mcp-tools.md) — the full tool catalog
+- [`docs/07-reasoning-harness.md`](./07-reasoning-harness.md) — the
+  5 question types and 6 failure modes
+- [`docs/11-extensibility.md`](./11-extensibility.md) — the
+  TypeTemplate polymorphic layer
+- [`docs/17-planes.md`](./17-planes.md) — the Setting/Plane
+  model
+- [`docs/19-retcon-policy.md`](./19-retcon-policy.md) —
+  retcon + mark_verified audit policy
+- [`docs/20-multi-setting-policy.md`](./20-multi-setting-policy.md) —
+  cross-setting rules
+- [`docs/21-quickstart.md`](./21-quickstart.md) — 5-minute
+  setup
+- [`docs/adr/`](./adr/) — the 13 ADRs that pin the design
+  decisions
+- `prompts/system_prompt.md` in the poc repo — the system
+  prompt the LLM caller is told
+- `tests/harness/questions.yaml` in the poc repo — the
+  50-question regression net
--- a/docs/integration-module-contract.md
+++ b/docs/integration-module-contract.md
@@ -0,0 +1,358 @@
+# Integration Module Contract
+
+**Audience:** authors of host modules — LLM agents, chat UIs,
+IDE plugins, Discord bots, CLIs, anything that wraps the Lore
+Engine's MCP server.
+
+**What this doc is:** the formal contract a host module must
+satisfy. The 7 rules in [`INTEGRATION.md`](./INTEGRATION.md) are
+the same rules; this doc is the version that's machine-checkable
+(every rule has a test, every test is in `tests/harness/`).
+
+**The contract is one-way.** The engine promises a fixed wire
+protocol (JSON-RPC over stdio or HTTP) and a fixed tool surface
+(name, description, JSON Schema per tool). The host promises to
+satisfy these rules; if it doesn't, the engine will produce
+wrong answers and the LLM caller will hallucinate.
+
+## The contract, version 1.2
+
+This contract is versioned alongside the system prompt
+(`prompts/system_prompt.md`, slice 7.2). When the prompt
+version bumps, this contract bumps; old hosts that
+satisfy v1.0 may not satisfy v1.2.
+
+| Rule | Test | What the host must do |
+|---|---|---|
+| R1 — Discover | `test_7_2_registry_well_formed` | Re-fetch `tools/list` after every `reload_templates` |
+| R2 — Lookup first | `test_7_1_every_question_has_expected_tools` | Call `lookup` before any `entity_context` / `was_true_at` / `lore_about` |
+| R3 — Cite | `test_7_2_prompt_citation_rule_present` | Include the `sources` from every tool response in the final answer |
+| R4 — Time-window | `test_7_2_prompt_time_window_rule_present` | Pass `at_time` on every fact query; surface the time in the answer |
+| R5 — Don't resolve contradictions | `test_7_2_prompt_citation_rule_present` | Reject any response that mentions `is_disputed: true` and presents it as settled |
+| R6 — Setting filter | `test_6_5_setting_filter_on_was_true_at` | Pass `setting=<id>` when the user asks a cross-setting question |
+| R7 — Reload contract | (host-side audit) | Treat `reload_templates`'s response as the new registry state |
+
+## Rule 1 — Discover (test: `test_7_2_registry_well_formed`)
+
+**What:** the host's tool registry must reflect the engine's
+current tool list at the time of the call.
+
+**Why:** templates are hot-reloadable. A host that caches the
+tool list from a previous `tools/list` will call tools that
+no longer exist (after a template was removed) or miss tools
+that were just added (after a template was added).
+
+**Test:** `test_7_2_registry_well_formed` (slice 7.2) pins the
+*server-side* contract — the registry must be well-formed.
+The *client-side* contract is host-side: the host must
+re-fetch `tools/list` after every `reload_templates` call.
+
+**Failure mode:** the host calls `get_mission` after
+`thieves_guild_mission.yaml` was removed. The engine returns
+`method_not_found`; the host's LLM caller hallucinates an
+answer.
+
+**Mitigation:** every `reload_templates` response includes the
+new tool list. The host should store it as the canonical
+"current tools" and re-resolve on every dispatch.
+
+## Rule 2 — Lookup first (test: `test_7_1_every_question_has_expected_tools`)
+
+**What:** every question that resolves to an entity must
+call `lookup` before any other read tool.
+
+**Why:** entity names are ambiguous. "The dagger" is one of
+many; the LLM cannot know which one. `lookup` returns a
+canonical id (or a disambiguation list). The LLM picks one
+(or asks the user).
+
+**Test:** the 50-question set in
+`tests/harness/questions.yaml` requires `lookup` to appear
+in the canonical tool sequence for every question that names
+an entity.
+
+**Failure mode:** the LLM guesses the entity. The guess
+resolves to the wrong id. The tool returns "unknown entity"
+or a wrong entity's context. The LLM hallucinates an answer.
+
+**Mitigation:** the host's LLM caller must include `lookup`
+in every Type 1-4 question's tool sequence. The
+`test_7_1_every_question_has_expected_tools` test pins this
+on the server side; the host-side pin is "include `lookup`
+or your test suite fails".
+
+## Rule 3 — Cite every claim (test: `test_7_2_prompt_citation_rule_present`)
+
+**What:** every specific factual claim in the host's
+response must cite at least one source returned by a tool.
+
+**Why:** a claim without a source is a hallucination. The
+engine returns a `sources` list on every edge-bearing tool
+response; the host's job is to forward those sources
+through to the final answer.
+
+**Test:** `test_7_2_prompt_citation_rule_present` pins the
+*server-side* contract — the system prompt must contain
+the citation rule. The *client-side* pin is the citation
+rate metric (AC 7.4): ≥90% of claims cite ≥1 source.
+
+**Failure mode:** the LLM says "Aldric is the heir to House
+Vyr" with no source. The user can't verify; the answer
+might be from training data, not the codex.
+
+**Mitigation:** every tool response includes a `sources`
+list. The host should pass this list through to the LLM
+caller and require the LLM to include ≥1 source per claim
+in its response. A claim without a source is a
+hallucination and should be rejected.
+
+## Rule 4 — Time-window every fact query (test: `test_7_2_prompt_time_window_rule_present`)
+
+**What:** every fact query must pass `at_time`, and the
+host's response must surface the time in the answer.
+
+**Why:** "Was X true?" is incomplete without "When?". The
+codex is time-bounded; an answer about the past presented
+as the present is wrong by default.
+
+**Test:** `test_7_2_prompt_time_window_rule_present` pins the
+server-side rule. The client-side pin is the time-window
+violation rate metric (AC 7.6): <5% of answers make claims
+outside the question's `at_time`.
+
+**Failure mode:** "Aldric rules Valdorn" (he died in 360
+TA; the campaign is in 380 TA). The LLM should have
+scoped the claim to a time when Aldric was still alive.
+
+**Mitigation:** the host's LLM caller should pass `at_time`
+on every `was_true_at`, `true_during`, `entities_present`,
+and `events_during` call. If the user didn't specify a
+time, default to the setting's `current_era`.
+
+## Rule 5 — Don't resolve contradictions (test: `test_7_2_prompt_citation_rule_present`)
+
+**What:** the host must surface contradictions, not
+resolve them.
+
+**Why:** two sources disagree. The LLM cannot know which
+is right — the world-builder decides. The engine marks
+the edge as `is_disputed: true` and points at the
+disagreeing edges via `disputed_with`. The host's job
+is to forward both sides.
+
+**Test:** the slice 2 consistency engine tests pin the
+server-side rule (the engine returns disputed edges
+with both sources). The client-side rule is "any
+response that mentions `is_disputed: true` and presents
+the answer as settled is a bug".
+
+**Failure mode:** the LLM picks the more recent source.
+The world-builder's source (older, authoritative) is
+silently dropped. The user gets a wrong answer.
+
+**Mitigation:** the host's LLM caller is told (in
+`prompts/system_prompt.md` Rule 4) to never resolve
+contradictions. The host should also reject any
+response that mentions `is_disputed: true` and presents
+the answer as settled — that's the host's enforcement
+layer for the rule.
+
+## Rule 6 — Setting filter for cross-setting questions
+
+**What:** when the user asks a question that could mix
+multiple settings, the host must pass `setting=<id>`
+explicitly.
+
+**Why:** the slice 6.5 setting filter exists exactly to
+prevent cross-setting bleed. A query for "events in the
+3rd Age" should not return events from both `mardonari`
+and `the_wild_dream` if the user only meant one.
+
+**Test:** `test_6_5_setting_filter_on_was_true_at` (slice
+6.5) pins the server-side rule — the filter is
+additive, `setting=None` (default) keeps the
+single-setting behaviour. The client-side rule is "any
+question whose answer could cross settings should
+pass `setting=<id>`".
+
+**Failure mode:** the user asks "What happened in the
+3rd Age?" and the LLM returns events from both settings
+without distinction. The user doesn't know which
+setting each event belongs to.
+
+**Mitigation:** the host should track the "active
+setting" in the conversation context. When the user
+mentions a setting name (e.g. "in Mardonari"), the host
+sets the active setting. When the user asks a question
+without a setting, the host either asks "which
+setting?" or uses the conversation's active setting
+explicitly.
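+
+The active-setting bookkeeping described above could look
+like the sketch below. Everything here is hypothetical
+(`SettingContext`, its methods, and the naive substring
+match on setting names); a real host would likely resolve
+setting mentions more carefully.
+
+```python
+class SettingContext:
+    """Tracks the conversation's active setting (hypothetical helper)."""
+
+    def __init__(self, known_settings):
+        # Map lowercase display name -> setting id.
+        self._known = {name.lower(): sid for name, sid in known_settings.items()}
+        self.active = None
+
+    def observe(self, user_text):
+        """Switch the active setting when the user names one."""
+        lowered = user_text.lower()
+        for name, setting_id in self._known.items():
+            if name in lowered:
+                self.active = setting_id
+
+    def arguments_for(self, arguments):
+        """Return tool arguments with setting=<id> added.
+
+        Returns None when no setting is active, which signals that the
+        host should ask the user "which setting?" before querying.
+        """
+        if self.active is None:
+            return None
+        return {**arguments, "setting": self.active}
+```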
+
+## Rule 7 — Reload contract
+
+**What:** after every `reload_templates` call, the host
+must treat the response's tool list as the new canonical
+state.
+
+**Why:** the template registry may have added, removed,
+or modified tools. A host that holds a stale tool list
+will dispatch to non-existent tools or miss new ones.
+
+**Test:** no automated test on the server side (the
+server's `reload_templates` always returns the new list).
+The client-side test is "after every `reload_templates`
+call, re-fetch `tools/list` and re-validate the host's
+tool registry".
+
+**Failure mode:** the world-builder adds a new template
+and calls `reload_templates`. The host doesn't re-fetch
+the tool list. The LLM caller tries to call
+`list_missions` and gets `method_not_found`. The LLM
+hallucinates an answer.
+
+**Mitigation:** the host's `reload_templates` handler
+should:
+
+1. Call `reload_templates` on the engine.
+2. Re-call `tools/list`.
+3. Replace the local tool registry.
+4. Re-validate any in-flight conversations (or surface
+   a "tools have changed" notice to the user).
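+
+The four steps above can be sketched as follows. This
+is an illustration, not the engine's client API: the
+`call` parameter is a hypothetical function that sends
+one JSON-RPC request and returns its `result`, and
+invoking `reload_templates` through `tools/call` is an
+assumption about how the host reaches it.
+
+```python
+from typing import Any, Callable
+
+# Assumption: `call(method, params)` performs one JSON-RPC round trip.
+RpcCall = Callable[[str, dict[str, Any]], Any]
+
+
+def handle_reload(call: RpcCall, registry: dict[str, dict[str, Any]]) -> set[str]:
+    """Reload templates, re-fetch tools/list, and swap the local registry.
+
+    Returns the names of tools that were added, removed, or modified, so the
+    caller can surface a "tools have changed" notice to in-flight conversations.
+    """
+    # Step 1: ask the engine to reload its templates.
+    call("tools/call", {"name": "reload_templates", "arguments": {}})
+    # Step 2: re-fetch the canonical tool list.
+    listing = call("tools/list", {})
+    fresh = {tool["name"]: tool for tool in listing["tools"]}
+    # Compare old and new definitions before overwriting the registry.
+    changed = {
+        name
+        for name in set(registry) | set(fresh)
+        if registry.get(name) != fresh.get(name)
+    }
+    # Step 3: replace the local registry in place.
+    registry.clear()
+    registry.update(fresh)
+    # Step 4 is left to the caller, driven by the returned names.
+    return changed
+```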
+
+## Acceptance criteria
+
+A host module is "good" when it satisfies all 7 rules.
+The minimum acceptance suite:
+
+```python
+# test_host_compliance.py
+import json, subprocess, sys
+
+def test_host_uses_lookup_first():
+    """Every Type 1-4 question's tool trace must include lookup."""
+    ...
+
+def test_host_cites_every_claim():
+    """Every claim in the response must include ≥1 source."""
+    ...
+
+def test_host_time_windows_every_fact_query():
+    """Every fact query must pass at_time; the response surfaces it."""
+    ...
+
+def test_host_does_not_resolve_contradictions():
+    """Any response mentioning is_disputed and presenting as settled is rejected."""
+    ...
+
+def test_host_passes_setting_for_cross_setting():
+    """Cross-setting questions must pass setting=<id> explicitly."""
+    ...
+
+def test_host_refetches_tools_list_on_reload():
+    """After reload_templates, the host's tool registry must match the engine's."""
+    ...
+```
+
+The full harness (`tests/harness/test_questions.py` +
+`tests/harness/test_system_prompt.py` + the slice 7.3
+runner) is the regression net. The host is "good" when
+its 50-question run scores:
+
+- tool-selection accuracy ≥80%
+- citation rate ≥90%
+- hallucination rate <5%
+- time-window violation rate <5%
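+
+A small sketch of a pass/fail gate over those four
+thresholds. The threshold values are the ones listed
+above; the metric key names and the function
+`failed_metrics` are hypothetical and need not match
+the slice 7.3 runner's output format.
+
+```python
+# Thresholds from this document; the metric key names are hypothetical.
+MINIMUMS = {"tool_selection_accuracy": 0.80, "citation_rate": 0.90}
+MAXIMUMS = {"hallucination_rate": 0.05, "time_window_violation_rate": 0.05}
+
+
+def failed_metrics(scores: dict[str, float]) -> list[str]:
+    """Return the metrics that miss their threshold (empty list means pass)."""
+    # Accuracy-style metrics must reach the floor (>=).
+    failed = [name for name, floor in MINIMUMS.items() if scores[name] < floor]
+    # Error-rate metrics must stay strictly below the ceiling (<).
+    failed += [name for name, ceiling in MAXIMUMS.items() if scores[name] >= ceiling]
+    return failed
+```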
+
+## What the engine promises
+
+The engine is a fixed-target service from the host's
+point of view. The promises:
+
+- **Wire protocol is JSON-RPC 2.0** (per the MCP
+  specification). Every tool call is a single
+  request/response. No streaming, no async.
+- **Tool names are stable** within a major version.
+  A tool's `name` and `inputSchema` are versioned
+  together; a host that calls a v1.2 tool against a
+  v1.1 engine gets `invalid_params` (schema mismatch)
+  or `method_not_found` (tool absent from that engine).
+- **Tool responses are JSON objects with a stable
+  shape.** The `sources`, `at_time`, `valid_from`,
+  `valid_until`, `is_disputed` fields are guaranteed.
+  New fields may be added in minor versions; the host
+  should ignore unknown fields.
+- **Errors are JSON-RPC standard.** Invalid params,
+  method not found, internal error — each maps to a
+  standard `code` (per the JSON-RPC spec) and a
+  human-readable `message`. The host can branch on
+  the code without parsing the message.
+- **Idempotency:** `lookup`, `entity_context`,
+  `was_true_at`, and the other read tools are pure: the
+  same arguments always return the same response (modulo
+  graph updates). Write tools are idempotent only for
+  identical arguments: re-running `add_entity` with
+  the same args is a no-op; re-running with different
+  args is an error.
+- **Hot-reload:** the engine supports `reload_templates`
+  at any time. The response is the new tool list. The
+  host can call this between conversations or even
+  mid-conversation (the active conversation's next
+  `tools/list` call will return the new list).
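+
+Branching on the error code rather than the message
+might look like the sketch below. The three numeric
+codes are the standard ones from the JSON-RPC 2.0
+specification; the function `classify_error` and the
+action names it returns are hypothetical host-side
+choices.
+
+```python
+from typing import Any
+
+# Standard error codes defined by the JSON-RPC 2.0 specification.
+METHOD_NOT_FOUND = -32601
+INVALID_PARAMS = -32602
+INTERNAL_ERROR = -32603
+
+
+def classify_error(response: dict[str, Any]) -> str:
+    """Map a JSON-RPC response to a host action (action names are hypothetical)."""
+    error = response.get("error")
+    if error is None:
+        return "ok"
+    code = error.get("code")
+    if code == METHOD_NOT_FOUND:
+        # The local tool registry is probably stale.
+        return "refresh_tool_registry"
+    if code == INVALID_PARAMS:
+        # The arguments do not match the tool's inputSchema.
+        return "fix_arguments"
+    # INTERNAL_ERROR and anything unrecognised: surface the failure.
+    return "report_failure"
+```
+
+Returning `refresh_tool_registry` on a missing method
+ties this promise back to Rule 7: a stale tool list is
+the most likely cause of that error.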
+
+## Versioning
+
+| Engine version | Contract version | Notes |
+|---|---|---|
+| v1.0 (slice 0–3) | 1.0 | Initial tool surface (12 read tools) |
+| v1.1 (slice 4–5T) | 1.1 | + 12 read tools + 14 template tools = 38 |
+| v1.2 (slice 6–11) | 1.2 | + 6 read tools + 4 write tools + setting filter + Streamable HTTP transport |
+| v2.0 (planned) | 2.0 | Cypher-write templates, cross-LLM benchmarks, UI |
+
+The contract version is the same as the engine's schema
+version (the `schema_version` field on the `Setting`
+node, per slice 6.4). Hosts that target v1.2 will not
+work against a v1.1 engine (missing tools) or a v2.0
+engine (renamed/removed tools). The mismatch surfaces
+as a `method_not_found` or `invalid_params` error on
+the first call to an affected tool.
+
+## Out of scope (deferred)
+
+- **Streaming.** The engine does not support
+  `tools/call` with server-sent events. Long-running
+  queries (e.g. cross-codex searches) block until
+  complete. Streaming is a v2.0 follow-up.
+- **Authentication.** The stdio transport is local-only
+  (no auth). The Streamable HTTP transport runs in a
+  Docker container with a 1 MiB body cap and a
+  loopback bind by default; production deployments
+  should add a reverse proxy with auth.
+- **Multi-tenant.** A single engine instance holds one
+  graph. Multi-tenant (one engine, multiple worlds)
+  is a v2.0 follow-up; the v1.2 model is
+  multi-*setting* within one world.
+- **UI for failure-mode review.** The slice 7.4
+  red-team suite produces a failure-mode log; a UI to
+  review it is a v2.0 follow-up.
+
+## Cross-references
+
+- [`INTEGRATION.md`](./INTEGRATION.md) — the practical
+  how-to guide (companion to this contract)
+- [`docs/07-reasoning-harness.md`](./07-reasoning-harness.md) — the
+  5 question types and 6 failure modes
+- [`docs/05-mcp-tools.md`](./05-mcp-tools.md) — the full
+  tool catalog with response shapes
+- [`docs/19-retcon-policy.md`](./19-retcon-policy.md) —
+  the retcon + mark_verified audit policy
+- [`docs/20-multi-setting-policy.md`](./20-multi-setting-policy.md) —
+  the cross-setting rules
+- [`docs/adr/0011-graph-backend-protocol.md`](./adr/0011-graph-backend-protocol.md) —
+  the `GraphBackend` Protocol (informs the engine's
+  substrate promise)
+- [`docs/adr/0012-typetemplate-polymorphism.md`](./adr/0012-typetemplate-polymorphism.md) —
+  the slice 5T TypeTemplate layer
+- `prompts/system_prompt.md` in the poc repo — the
+  system prompt given to the LLM caller
+- `tests/harness/questions.yaml` in the poc repo — the
+  50-question regression net