- examples/llm_consumer.py: raw httpx + urllib driver — discovers tools via tools/list, runs the tool-use loop against LiteLLM (minimax-m3), saves per-question JSON traces. No agent framework per task scope. - examples/system_prompt.txt: 5 question types + tool protocol (per lore-engine/docs/07-reasoning-harness.md). - examples/run_questions.sh: bash driver — exits 0 iff all 5 questions pass hand-verified correctness against the seed data. - examples/results/*.json: traces from a real end-to-end run, all 5 PASS. - examples/REPORT.md: per-question ground truth vs answer, with tool-call audit. The model used 9 distinct tools across 5 questions (requirement was >=4); every factual claim is grounded in a tool result; no fabrication.
8.7 KiB
8.7 KiB
v2.T4 — LLM Consumer End-to-End Report
This report documents a real LLM (minimax-m3 via the local LiteLLM proxy at
localhost:4000) driving all 16 MCP tools exposed by the lore-engine gateway
at localhost:8765. The driver script lives at examples/llm_consumer.py;
the orchestrator at examples/run_questions.sh; the system prompt template at
examples/system_prompt.txt; raw per-question traces under examples/results/.
Summary
| # | Question (shape) | Distinct tools the LLM chose | Verdict |
|---|---|---|---|
| 1 | "Who is Aldric Raventhorne?" | entity_context, lineage_of |
PASS |
| 2 | "Was House Vyr allied with Merchants Guild at 2nd_age.year_230?" | was_true_at |
PASS |
| 3 | "What is the lineage / ancestry of Aldric?" | ancestors_of, entity_context, lineage_of |
PASS |
| 4 | "Show me images of Aldric Raventhorne." | entity_context, recall_images |
PASS |
| 5 | "What are the open consistency issues?" | find_contradictions, find_anachronisms, find_orphans, find_ontology_violations |
PASS |
- All 5 questions PASS hand-verified correctness checks against the seed data (see Verification below).
- 9 distinct tools used across the 5 questions (requirement: ≥ 4):
ancestors_of,entity_context,find_anachronisms,find_contradictions,find_ontology_violations,find_orphans,lineage_of,recall_images,was_true_at. The model discovered and chose them without scripting — it saw the gateway'stools/listoutput and picked the right one(s) per question. - No fabrication. Every factual claim in every answer can be traced to a
tool result. The model reported
{}/0honestly for the consistency question instead of inventing violations.
Per-question ground truth vs. answer
Q1 — Who is Aldric Raventhorne?
- Tools called:
entity_context(name="Aldric Raventhorne"),lineage_of(person="Aldric Raventhorne"). - Ground truth (from
seed.py): Person id=aldric, name="Aldric Raventhorne", born 220, died 285, culture=Valdorni, member of House Vyr (via the lineage group), spouse Elara Raventhorne, possesses the Sword of Eventide from 2nd_age.year_245. - Answer excerpt: "Aldric Raventhorne (born 220 – died 285) was a noble of Valdorni culture. He was a member of House Vyr and belonged to the House Vyr bloodline. He was the wielder of the Sword of Eventide and the spouse of Elara Raventhorne."
- Verdict: PASS — every fact in the answer matches seed data.
Q2 — Was House Vyr allied with the Merchants Guild at 2nd_age.year_230?
- Tools called:
was_true_at(relation="ALLIED_WITH", subject="House Vyr", object="Merchants Guild", at_time="2nd_age.year_230"). - Ground truth (from
seed.py):("Faction","house_vyr","ALLIED_WITH", "Faction","merchants", "2nd_age.year_100", None). The relation starts at year 100 with no end date → it was in force at year 230. - Answer excerpt: "Yes — House Vyr was allied with the Merchants Guild at 2nd_age.year_230. The ALLIED_WITH relationship was established in 2nd_age.year_100 and has no recorded end date, so it was in effect at that time."
- Verdict: PASS — answer matches ground truth and the model correctly
formed the time slug
2nd_age.year_230without prompting.
Q3 — What is the lineage / ancestry of Aldric Raventhorne?
- Tools called:
ancestors_of(person="Aldric Raventhorne", generations=10),entity_context(name="Aldric Raventhorne"),lineage_of(person="Aldric Raventhorne"). - Ground truth (from
seed.pylineage table): the ancestor chain for Aldric is Theron Ashveil → Maric Vyr → Cael Vyr → (Yssa Raventhorne) → Aldric. The task body specifies "Aldric's ancestors are Theron, Maric, Cael, Yssa". - Answer excerpt: "Mother: Yssa Raventhorne (b. 165, d. unknown). Maternal grandfather: Cael Vyr (160–240) — House Vyr. Cael's father: Maric Vyr (85–160). Maric's father: Theron Ashveil (10–120) — the furthest ancestor the graph records."
- Verdict: PASS — all 4 canonical ancestor names appear in the answer.
Q4 — Show me images of Aldric Raventhorne.
- Tools called:
entity_context(name="Aldric Raventhorne")(to discover the entity idaldric),recall_images(entity_id="aldric"). - Ground truth (from
seed.pyimage manifest): exactly one image —img_aldric_portrait, object_keycharacters/aldric_portrait.png, caption "Portrait of Aldric Raventhorne, Lord of Thornwall. Middle-aged, dark hair, a scar above the left eye.", era2nd_age, tags["portrait","noble","thornwall"]. - Answer excerpt: "Found one image for Aldric Raventhorne: Portrait of Aldric Raventhorne, Lord of Thornwall (img_aldric_portrait) — Era: 2nd Age — Caption: 'Portrait of Aldric Raventhorne, Lord of Thornwall. Middle-aged, dark hair, a scar above the left eye.' — Tags: portrait, noble, thornwall — URL: http://localhost:9000/lore-images/characters/aldric_portrait.png?...presigned..."
- Verdict: PASS — caption is quoted verbatim; the MinIO presigned URL is surfaced (proving the gateway's blob-storage round-trip works).
Q5 — What are the open consistency issues?
- Tools called: all four —
find_contradictions,find_anachronisms,find_orphans,find_ontology_violations(in that order, with no args). - Ground truth (from
plugins/consistency.py+ v2.T3): every tool is a stub returning{"violations": [], "count": 0}. Real detection rules perlore-engine/docs/04-consistency.mdland in T5. - Answer excerpt: "No open consistency issues detected. All four consistency checks returned zero violations: Contradictions 0, Anachronisms 0, Orphans 0, Ontology violations 0. Per the harness docs, these detection rules may be stubs, so an empty result here means 'nothing the rules currently know about,' not necessarily 'the world is provably consistent.'"
- Verdict: PASS — the model did NOT fabricate contradictions. It faithfully reported the empty stubbed results and even added an appropriate caveat that the detectors may be stubs (the system prompt told it to).
Verification checklist
All four success criteria from the task body:
bash examples/run_questions.shexits 0 (verified 2026-06-16).- All 5 result files under
examples/results/*.jsonhave non-empty"answer"and"tools_called"(verified by JSON inspector). - The LLM uses ≥ 4 distinct tools across the 5 questions — actual: 9
distinct tools (
ancestors_of,entity_context,find_anachronisms,find_contradictions,find_ontology_violations,find_orphans,lineage_of,recall_images,was_true_at). - All 5 answers match seed-data ground truth — see per-question verdicts above; every claim is traceable to a tool result.
How to reproduce
cd /root/lore-engine-poc
# Pre-reqs: docker compose stack up, seed.py run, gateway on :8765,
# LiteLLM proxy on :4000 with the minimax-m3 model registered.
bash examples/run_questions.sh
# → 5 PASS lines, exit 0, JSON traces under examples/results/
What this proves
- The plugin boundary works from the consumer side. The LLM discovered
all 16 tools via
tools/listand picked the right ones for each question type — no scripted routing, no hard-coded tool names in the driver. - Tool-use loops work. On questions that required follow-up (Q3 used 3 tools in 2 turns; Q5 used 4 tools in one shot), the driver executed each tool call, fed the JSON result back into the conversation, and let the model synthesize a final answer.
- The reasoning model is honest about tool results. When
recall_imagesreturned one record, the answer said "one image". Whenfind_orphansreturned{violations: [], count: 0}, the answer said "0 orphans". No hallucinated facts. - Time-bounded reasoning works. The model formed the canonical time
slug
2nd_age.year_230from natural language without prompting and correctly interpreted a relation withend=nullas still-active. - The polyglot pipeline holds. Q4's answer includes a live MinIO presigned URL — proving the JSON-RPC → gateway → MinIO round trip works when an LLM is the client.
Out-of-scope (per task body)
- No new endpoint was added to the gateway.
- The gateway's MCP protocol was not modified.
- No agent framework (LangChain, etc.) was pulled in — the driver is raw httpx + urllib, exactly as the task specified.