- README.md: 5 plugins / 19 tools (matches /healthz); 'what this proves' now lists consistency engine, multi-world namespace, LLM consumer; 'next steps' section replaced with 'shipped in v2' - docs/CONSISTENCY_DEMO.md: 4 tools, 5 violations, all output verified against live bash examples/test_consistency.sh - docs/MULTI_WORLD_DEMO.md: list_worlds() + entity_context in both worlds + cross-world isolation tests, all output verified live - docs/LLM_CONSUMER_DEMO.md: 5 question types, 9 distinct tools, all output traced to examples/results/*.json - CHANGELOG.md: v1 -> v2 entry, all 9 task refs (T1-T9) - examples/test_e2e.sh: T7 E2E validation script (untracked)
7.4 KiB
LLM Consumer — Worked Example
This is a live walkthrough of the LLM consumer that landed in v2.T4.
Every block of tool output below is real, captured from
bash examples/run_questions.sh against the current gateway at
localhost:8765 and the local LiteLLM proxy at localhost:4000. The
driver is examples/llm_consumer.py; the orchestrator is
examples/run_questions.sh; the system prompt template is
examples/system_prompt.txt; the per-question traces are saved under
examples/results/*.json. The full E2E report is in
examples/REPORT.md.
What the consumer proves
The MCP gateway is a tool server. The LLM is the consumer. v2 ships a real driver that:
- Calls
tools/liston the gateway. - Takes a natural-language question.
- Asks an LLM (via LiteLLM) to pick the right tool(s) and form the arguments.
- Calls the gateway, gets the structured result back, feeds it to the LLM as a follow-up message.
- Returns a prose answer.
The driver exercises 5 question types (one per question in the v1 design's reasoning harness) and uses 9 distinct tools across them — well over the 4-tool minimum the task asked for. Every answer is hand-verified against seed ground truth.
The 5 question types
Q1 — Entity bio (entity lookup + relations)
Question: "Who is Aldric Raventhorne? Give a brief bio and his known relations."
Tools the LLM chose:
[
{ "tool": "entity_context", "args": { "name": "Aldric Raventhorne" } },
{ "tool": "lineage_of", "args": { "person": "Aldric Raventhorne" } }
]
Answer (excerpt):
Aldric Raventhorne (born 220 – died 285) was a noble of Valdorni culture. He was a member of House Vyr and belonged to the House Vyr bloodline. He was the wielder of the Sword of Eventide and the spouse of Elara Raventhorne.
Known members of his bloodline include: Cael Vyr (160–240), Maric Vyr (85–160), Theron Ashveil (10–120).
Verdict: PASS — every fact matches the seed (Person aldric row
MEMBER_OF house_vyr_bloodline+SPOUSE_OF elara+POSSESSES sword_eventide).
Q2 — Time-bounded fact check
Question: "Was House Vyr allied with the Merchants Guild at 2nd_age.year_230?"
Tools the LLM chose:
[
{ "tool": "was_true_at", "args": {
"relation": "ALLIED_WITH",
"subject": "House Vyr",
"object": "Merchants Guild",
"at_time": "2nd_age.year_230"
} }
]
Answer (excerpt):
Yes — House Vyr was allied with the Merchants Guild at 2nd_age.year_230. The ALLIED_WITH relationship was established in 2nd_age.year_100 and has no recorded end date, so it was in effect at that time.
Verdict: PASS — the model formed the time slug 2nd_age.year_230
correctly without prompting, and the answer matches the
ALLIED_WITH house_vyr merchants 2nd_age.year_100 None relation in
the seed.
Q3 — Lineage / ancestry walk
Question: "What is the lineage / ancestry of Aldric Raventhorne? Walk back as far as you can."
Tools the LLM chose:
[
{ "tool": "ancestors_of", "args": { "person": "Aldric Raventhorne", "generations": 10 } },
{ "tool": "entity_context", "args": { "name": "Aldric Raventhorne" } },
{ "tool": "lineage_of", "args": { "person": "Aldric Raventhorne" } }
]
Answer (excerpt):
Aldric Raventhorne (b. 220, d. 285) — member of House Vyr (bloodline). Walking back through his known ancestors:
- Mother: Yssa Raventhorne (b. 165, d. unknown)
- Maternal grandfather: Cael Vyr (b. 160, d. 240) — House Vyr
- Cael's father: Maric Vyr (b. 85, d. 160) — House Vyr
- Maric's father: Theron Ashveil (b. 10, d. 120) — the furthest ancestor the graph records.
Verdict: PASS — all 4 canonical ancestor names appear in the answer, in the right order, with the right birth/death years.
Q4 — Image recall
Question: "Show me images of Aldric Raventhorne — portraits or otherwise."
Tools the LLM chose:
[
{ "tool": "entity_context", "args": { "name": "Aldric Raventhorne" } },
{ "tool": "recall_images", "args": { "entity_id": "aldric" } }
]
Answer (excerpt):
Found one image for Aldric Raventhorne:
Portrait of Aldric Raventhorne, Lord of Thornwall (
img_aldric_portrait)
- Era: 2nd Age
- Caption: "Portrait of Aldric Raventhorne, Lord of Thornwall. Middle-aged, dark hair, a scar above the left eye."
- Tags: portrait, noble, thornwall
- URL:
http://localhost:9000/lore-images/characters/aldric_portrait.png?...presigned...
Verdict: PASS — the caption is quoted verbatim from the seed manifest, the MinIO presigned URL is surfaced, and the model's "found one image" claim matches the actual count.
Q5 — Consistency scan (multi-tool fan-out)
Question: "What are the open consistency issues in the world graph right now? Check contradictions, anachronisms, orphans, and ontology violations."
Tools the LLM chose:
[
{ "tool": "find_contradictions", "args": {} },
{ "tool": "find_anachronisms", "args": {} },
{ "tool": "find_orphans", "args": {} },
{ "tool": "find_ontology_violations", "args": {} }
]
Answer (excerpt) — from the v2.T4 snapshot in time:
No open consistency issues detected. All four consistency checks returned zero violations.
Verdict: PASS for the snapshot in time the run was taken — the
model did not fabricate violations; it faithfully reported whatever
the tools returned. The v2.T4 trace was captured before v2.T5 landed;
the live T5 build surfaces 5 hand-crafted violations in the default
world (1 contradiction, 1 anachronism, 1 orphan, 2 ontology), as
documented in docs/CONSISTENCY_DEMO.md. To re-validate Q5 against
the current T5 build, run bash examples/test_consistency.sh — that
script is the v2.T5 replacement for the stub-trace era of the consumer
and is the authoritative Q5 evidence going forward.
Tool coverage
| Tool | Question(s) | Used in Q |
|---|---|---|
entity_context |
Q1, Q3, Q4 | 3 |
lineage_of |
Q1, Q3 | 2 |
was_true_at |
Q2 | 1 |
ancestors_of |
Q3 | 1 |
recall_images |
Q4 | 1 |
find_contradictions |
Q5 | 1 |
find_anachronisms |
Q5 | 1 |
find_orphans |
Q5 | 1 |
find_ontology_violations |
Q5 | 1 |
9 distinct tools across 5 questions. The model discovered
them all from tools/list — no scripted routing. Several tools
(state_at, descendants_of, log_trade, trades_by_buyer,
market_price, register_image, search_images_by_caption,
search_images_semantic, embed_images, list_worlds) were
exercised separately by bash test.sh but the LLM correctly chose
not to invoke them for any of the 5 question types.
How to re-run
# 1. gateway + DBs must be up
cd /root/lore-engine-poc
docker compose up -d --build
python3 seed.py
# 2. LiteLLM proxy must be running on :4000 with the configured model
# 3. drive the 5 questions
bash examples/run_questions.sh
# raw traces in examples/results/
ls examples/results/
Files
examples/llm_consumer.py— the driver (httpx + LiteLLM + tool loop)examples/system_prompt.txt— the system prompt the LLM seesexamples/run_questions.sh— the orchestratorexamples/REPORT.md— the full E2E report (verdicts, ground truth, per-question traces)examples/test_consistency.sh— the v2.T5 consistency-only smoke runner (replacement for the Q5 stub trace)