lore-engine-poc/docs/LLM_CONSUMER_DEMO.md
kanban-dev 99535a8f3a docs(v2): T8 — update README + CHANGELOG + 3 worked-example docs
- README.md: 5 plugins / 19 tools (matches /healthz); 'what this proves'
  now lists consistency engine, multi-world namespace, LLM consumer;
  'next steps' section replaced with 'shipped in v2'
- docs/CONSISTENCY_DEMO.md: 4 tools, 5 violations, all output verified
  against live bash examples/test_consistency.sh
- docs/MULTI_WORLD_DEMO.md: list_worlds() + entity_context in both
  worlds + cross-world isolation tests, all output verified live
- docs/LLM_CONSUMER_DEMO.md: 5 question types, 9 distinct tools, all
  output traced to examples/results/*.json
- CHANGELOG.md: v1 -> v2 entry, all 9 task refs (T1-T9)
- examples/test_e2e.sh: T7 E2E validation script (untracked)
2026-06-17 00:45:30 +00:00


LLM Consumer — Worked Example

This is a live walkthrough of the LLM consumer that landed in v2.T4. Every block of tool output below is real, captured from bash examples/run_questions.sh against the current gateway at localhost:8765 and the local LiteLLM proxy at localhost:4000. The driver is examples/llm_consumer.py; the orchestrator is examples/run_questions.sh; the system prompt template is examples/system_prompt.txt; the per-question traces are saved under examples/results/*.json. The full E2E report is in examples/REPORT.md.

What the consumer proves

The MCP gateway is a tool server. The LLM is the consumer. v2 ships a real driver that:

  1. Calls tools/list on the gateway.
  2. Takes a natural-language question.
  3. Asks an LLM (via LiteLLM) to pick the right tool(s) and form the arguments.
  4. Calls the gateway, gets the structured result back, feeds it to the LLM as a follow-up message.
  5. Returns a prose answer.
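The five steps above can be sketched as a small loop. This is a minimal illustration and not the code in examples/llm_consumer.py: the function name `answer_question` and the four injected callables are hypothetical stand-ins for the gateway and LiteLLM calls, so the sketch runs without either service.

```python
from typing import Any, Callable

ToolCall = dict[str, Any]  # shape used in this doc: {"tool": <name>, "args": {...}}


def answer_question(
    question: str,
    list_tools: Callable[[], list[str]],
    choose_calls: Callable[[str, list[str]], list[ToolCall]],
    call_tool: Callable[[str, dict[str, Any]], Any],
    write_answer: Callable[[str, list[dict[str, Any]]], str],
) -> tuple[str, list[dict[str, Any]]]:
    """Run one question through discover -> choose -> call -> answer."""
    tools = list_tools()                       # step 1: tools/list
    calls = choose_calls(question, tools)      # steps 2-3: LLM picks tools and args
    trace = []
    for call in calls:                         # step 4: call the gateway, keep results
        if call["tool"] not in tools:
            raise ValueError(f"unknown tool: {call['tool']}")
        result = call_tool(call["tool"], call["args"])
        trace.append({**call, "result": result})
    return write_answer(question, trace), trace  # step 5: prose answer


# Offline stand-ins (all hypothetical) so the sketch is runnable as-is.
def fake_list_tools() -> list[str]:
    return ["entity_context", "was_true_at"]


def fake_choose(question: str, tools: list[str]) -> list[ToolCall]:
    return [{"tool": "entity_context", "args": {"name": "Aldric Raventhorne"}}]


def fake_call(tool: str, args: dict[str, Any]) -> Any:
    return {"tool": tool, "echo": args}


def fake_write(question: str, trace: list[dict[str, Any]]) -> str:
    return f"Answered using {len(trace)} tool call(s)."


answer, trace = answer_question(
    "Who is Aldric Raventhorne?", fake_list_tools, fake_choose, fake_call, fake_write
)
print(answer)
```

Injecting the callables keeps the control flow testable without a gateway or a model. A real driver may also loop more than once, letting the LLM request further tools after seeing earlier results; this sketch does a single round.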

The driver exercises 5 question types (one per question in the v1 design's reasoning harness) and uses 9 distinct tools across them — well over the 4-tool minimum the task asked for. Every answer is hand-verified against seed ground truth.

The 5 question types

Q1 — Entity bio (entity lookup + relations)

Question: "Who is Aldric Raventhorne? Give a brief bio and his known relations."

Tools the LLM chose:

[
  { "tool": "entity_context", "args": { "name": "Aldric Raventhorne" } },
  { "tool": "lineage_of",     "args": { "person": "Aldric Raventhorne" } }
]

Answer (excerpt):

Aldric Raventhorne (born 220 died 285) was a noble of Valdorni culture. He was a member of House Vyr and belonged to the House Vyr bloodline. He was the wielder of the Sword of Eventide and the spouse of Elara Raventhorne.

Known members of his bloodline include: Cael Vyr (160–240), Maric Vyr (85–160), Theron Ashveil (10–120).

Verdict: PASS. Every fact matches the seed (Person aldric row + MEMBER_OF house_vyr_bloodline + SPOUSE_OF elara + POSSESSES sword_eventide).

Q2 — Time-bounded fact check

Question: "Was House Vyr allied with the Merchants Guild at 2nd_age.year_230?"

Tools the LLM chose:

[
  { "tool": "was_true_at", "args": {
      "relation": "ALLIED_WITH",
      "subject":   "House Vyr",
      "object":    "Merchants Guild",
      "at_time":   "2nd_age.year_230"
  } }
]

Answer (excerpt):

Yes — House Vyr was allied with the Merchants Guild at 2nd_age.year_230. The ALLIED_WITH relationship was established in 2nd_age.year_100 and has no recorded end date, so it was in effect at that time.

Verdict: PASS — the model formed the time slug 2nd_age.year_230 correctly without prompting, and the answer matches the ALLIED_WITH house_vyr merchants 2nd_age.year_100 None relation in the seed.
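The reasoning behind the Q2 answer (a relation with a start time and no end time holds at any later point) can be sketched as an interval check. This is an illustration only, not the gateway's was_true_at implementation: the helper names `parse_slug` and `holds_at`, the `<age>.year_<n>` slug grammar, the inclusive end bound, and the same-age restriction are all assumptions.

```python
import re
from typing import Optional

# Assumed slug grammar: "<age>.year_<n>", e.g. "2nd_age.year_230".
SLUG = re.compile(r"^(?P<age>.+)\.year_(?P<year>\d+)$")


def parse_slug(slug: str) -> tuple[str, int]:
    """Split '2nd_age.year_230' into ('2nd_age', 230)."""
    match = SLUG.match(slug)
    if match is None:
        raise ValueError(f"not a time slug: {slug!r}")
    return match["age"], int(match["year"])


def holds_at(start: str, end: Optional[str], at_time: str) -> bool:
    """True if a relation valid from `start` to `end` covers `at_time`.

    `end=None` means the relation has no recorded end date.
    Assumption: all slugs share one age; ordering across ages is out of scope.
    """
    age, year = parse_slug(at_time)
    start_age, start_year = parse_slug(start)
    if start_age != age:
        raise ValueError("cross-age comparison is not handled in this sketch")
    if year < start_year:
        return False
    if end is None:
        return True
    end_age, end_year = parse_slug(end)
    if end_age != age:
        raise ValueError("cross-age comparison is not handled in this sketch")
    return year <= end_year  # inclusive end bound is an assumption


# The seed relation quoted above: ALLIED_WITH from 2nd_age.year_100, no end.
print(holds_at("2nd_age.year_100", None, "2nd_age.year_230"))
```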

Q3 — Lineage / ancestry walk

Question: "What is the lineage / ancestry of Aldric Raventhorne? Walk back as far as you can."

Tools the LLM chose:

[
  { "tool": "ancestors_of",   "args": { "person": "Aldric Raventhorne", "generations": 10 } },
  { "tool": "entity_context", "args": { "name": "Aldric Raventhorne" } },
  { "tool": "lineage_of",     "args": { "person": "Aldric Raventhorne" } }
]

Answer (excerpt):

Aldric Raventhorne (b. 220, d. 285) — member of House Vyr (bloodline). Walking back through his known ancestors:

  • Mother: Yssa Raventhorne (b. 165, d. unknown)
  • Maternal grandfather: Cael Vyr (b. 160, d. 240) — House Vyr
  • Cael's father: Maric Vyr (b. 85, d. 160) — House Vyr
  • Maric's father: Theron Ashveil (b. 10, d. 120) — the furthest ancestor the graph records.

Verdict: PASS — all 4 canonical ancestor names appear in the answer, in the right order, with the right birth/death years.
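The ancestry walk in Q3 amounts to following parent links until the graph runs out or a generation cap is hit. The sketch below is a hypothetical stand-in, not the gateway's ancestors_of tool: the `PARENT_OF` map is rebuilt by hand from the answer above, and it assumes a single recorded parent per person, which the real graph may not.

```python
def ancestors_of(person: str, parent_of: dict[str, str], generations: int = 10) -> list[str]:
    """Walk parent links upward, nearest ancestor first, up to `generations` steps."""
    chain: list[str] = []
    seen = {person}          # guards against cycles in malformed data
    current = person
    for _ in range(generations):
        parent = parent_of.get(current)
        if parent is None or parent in seen:
            break            # reached the furthest recorded ancestor
        chain.append(parent)
        seen.add(parent)
        current = parent
    return chain


# Hand-built from the Q3 answer; one parent per person is a simplification.
PARENT_OF = {
    "Aldric Raventhorne": "Yssa Raventhorne",
    "Yssa Raventhorne": "Cael Vyr",
    "Cael Vyr": "Maric Vyr",
    "Maric Vyr": "Theron Ashveil",
}

print(ancestors_of("Aldric Raventhorne", PARENT_OF))
# -> ['Yssa Raventhorne', 'Cael Vyr', 'Maric Vyr', 'Theron Ashveil']
```

With `generations=10`, as in the LLM's call, the walk stops early at Theron Ashveil because no parent is recorded for him.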

Q4 — Image recall

Question: "Show me images of Aldric Raventhorne — portraits or otherwise."

Tools the LLM chose:

[
  { "tool": "entity_context", "args": { "name": "Aldric Raventhorne" } },
  { "tool": "recall_images",  "args": { "entity_id": "aldric" } }
]

Answer (excerpt):

Found one image for Aldric Raventhorne:

Portrait of Aldric Raventhorne, Lord of Thornwall (img_aldric_portrait)

  • Era: 2nd Age
  • Caption: "Portrait of Aldric Raventhorne, Lord of Thornwall. Middle-aged, dark hair, a scar above the left eye."
  • Tags: portrait, noble, thornwall
  • URL: http://localhost:9000/lore-images/characters/aldric_portrait.png?...presigned...

Verdict: PASS — the caption is quoted verbatim from the seed manifest, the MinIO presigned URL is surfaced, and the model's "found one image" claim matches the actual count.

Q5 — Consistency scan (multi-tool fan-out)

Question: "What are the open consistency issues in the world graph right now? Check contradictions, anachronisms, orphans, and ontology violations."

Tools the LLM chose:

[
  { "tool": "find_contradictions",       "args": {} },
  { "tool": "find_anachronisms",         "args": {} },
  { "tool": "find_orphans",              "args": {} },
  { "tool": "find_ontology_violations",  "args": {} }
]

Answer (excerpt) — from the v2.T4 snapshot in time:

No open consistency issues detected. All four consistency checks returned zero violations.

Verdict: PASS for the v2.T4 snapshot. The model did not fabricate violations; it faithfully reported what the tools returned at the time. That trace was captured before v2.T5 landed. The live T5 build surfaces 5 hand-crafted violations in the default world (1 contradiction, 1 anachronism, 1 orphan, 2 ontology), as documented in docs/CONSISTENCY_DEMO.md. To re-validate Q5 against the current T5 build, run bash examples/test_consistency.sh. That script supersedes the stub-era Q5 trace and is the authoritative Q5 evidence going forward.
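The Q5 fan-out can be sketched as calling each of the four checks with empty arguments and tallying what comes back. This is an illustration under assumptions: the real tools' response shape is not shown in this document, so the sketch assumes each check returns a list of violations, and `make_fake_gateway`, `scan`, and `summarize` are hypothetical names.

```python
from typing import Callable

CHECKS = [
    "find_contradictions",
    "find_anachronisms",
    "find_orphans",
    "find_ontology_violations",
]


def make_fake_gateway(counts: dict[str, int]) -> Callable[[str, dict], list]:
    """Build a stand-in for the gateway that returns placeholder violations."""
    def call_tool(name: str, args: dict) -> list:
        return [{"check": name}] * counts.get(name, 0)
    return call_tool


def scan(call_tool: Callable[[str, dict], list]) -> dict[str, int]:
    """Run every check with empty args and count the violations each returns."""
    return {name: len(call_tool(name, {})) for name in CHECKS}


def summarize(counts: dict[str, int]) -> str:
    total = sum(counts.values())
    if total == 0:
        return "No open consistency issues detected."
    parts = ", ".join(f"{n} from {name}" for name, n in counts.items() if n)
    return f"{total} open consistency issue(s): {parts}."


# T4-era behaviour: every check comes back empty.
t4_gateway = make_fake_gateway({})
# T5 counts as documented: 1 contradiction, 1 anachronism, 1 orphan, 2 ontology.
t5_gateway = make_fake_gateway({
    "find_contradictions": 1,
    "find_anachronisms": 1,
    "find_orphans": 1,
    "find_ontology_violations": 2,
})

print(summarize(scan(t4_gateway)))
print(summarize(scan(t5_gateway)))
```

The same summarizing code yields "no issues" or a five-violation report depending only on what the tools return, which is the property the verdict checks.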

Tool coverage

| Tool | Question(s) | Number of questions |
|---|---|---|
| entity_context | Q1, Q3, Q4 | 3 |
| lineage_of | Q1, Q3 | 2 |
| was_true_at | Q2 | 1 |
| ancestors_of | Q3 | 1 |
| recall_images | Q4 | 1 |
| find_contradictions | Q5 | 1 |
| find_anachronisms | Q5 | 1 |
| find_orphans | Q5 | 1 |
| find_ontology_violations | Q5 | 1 |

9 distinct tools across 5 questions. The model discovered them all from tools/list, with no scripted routing. The other 10 tools (state_at, descendants_of, log_trade, trades_by_buyer, market_price, register_image, search_images_by_caption, search_images_semantic, embed_images, list_worlds) were exercised separately by bash test.sh; the LLM correctly chose not to invoke them for any of the 5 question types.
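The coverage figures can be re-derived from the tool lists shown for each question. The sketch below hard-codes those lists from this document rather than reading examples/results/*.json, since the trace file layout is not described here; `TOOLS_BY_QUESTION` and `coverage` are hypothetical names.

```python
# Tool selections copied from the five question sections above.
TOOLS_BY_QUESTION = {
    "Q1": ["entity_context", "lineage_of"],
    "Q2": ["was_true_at"],
    "Q3": ["ancestors_of", "entity_context", "lineage_of"],
    "Q4": ["entity_context", "recall_images"],
    "Q5": [
        "find_contradictions",
        "find_anachronisms",
        "find_orphans",
        "find_ontology_violations",
    ],
}


def coverage(tools_by_question: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each tool to the sorted list of questions that used it."""
    used: dict[str, list[str]] = {}
    for question, tools in tools_by_question.items():
        for tool in set(tools):  # count a tool once per question
            used.setdefault(tool, []).append(question)
    return {tool: sorted(questions) for tool, questions in used.items()}


table = coverage(TOOLS_BY_QUESTION)
# Most-used tools first, then alphabetical.
for tool, questions in sorted(table.items(), key=lambda kv: (-len(kv[1]), kv[0])):
    print(f"{tool:26} {', '.join(questions):12} {len(questions)}")
print(len(table), "distinct tools")
```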

How to re-run

# 1. gateway + DBs must be up
cd /root/lore-engine-poc
docker compose up -d --build
python3 seed.py

# 2. LiteLLM proxy must be running on :4000 with the configured model

# 3. drive the 5 questions
bash examples/run_questions.sh

# raw traces in examples/results/
ls examples/results/

Files

  • examples/llm_consumer.py — the driver (httpx + LiteLLM + tool loop)
  • examples/system_prompt.txt — the system prompt the LLM sees
  • examples/run_questions.sh — the orchestrator
  • examples/REPORT.md — the full E2E report (verdicts, ground truth, per-question traces)
  • examples/test_consistency.sh — the v2.T5 consistency-only smoke runner (replacement for the Q5 stub trace)