- 00-overview: goals, what we inherit from GraphMCP-Example, naming
- 01-ontology: 14 node labels, 40+ edge types, time-bound properties
- 02-time-model: era hierarchy, {era}.{year} canonical format, time_in_window UDF
- 03-macro-micro: 3 association patterns, lookup+active context, expand_context
- 04-consistency: Contradiction/Anachronism/Orphan/OntologyViolation, 4 rule categories
- 05-mcp-tools: 30 tools (8 inherited + 22 new), 5 composition patterns, 10 starter rules
- 06-ingestion: 3 paths (prose, structured YAML, dialogue), YAML schemas for 6 source types
- 07-reasoning-harness: 5 question types, system prompt, failure modes, worked example
- 08-architecture: system diagram, services layout, UDFs, schema bootstrap
- 09-roadmap: 11 phases, MVP = 19 days end of phase 4
- 10-critique: pressure-test, S1-S4 severity, open questions
# 03 — Macro ↔ Micro Association

A high-fantasy world is full of micro details that only make sense in macro context:

- *Aldric carries a dagger* (micro) is meaningless without *Aldric is a Vyr* (macro) and *the Vyrs are a noble house* (more macro).
- *The tavern is on fire* (micro) is meaningless without *the tavern is in Mardsville* (macro) and *Mardsville is contested in the Border Wars* (more macro).

The engine's job is to make these associations *navigable*, so that the LLM does not have to traverse a five-edge chain by hand every time. This document describes how we do that.

## The principle: every node knows where it lives

Every node in the engine is implicitly connected, via a small number of well-indexed edges, to the macro structures it belongs to. The LLM can ask a question about *anything* and reach *everything* relevant in one to three hops.

For a `Person`, those connections are:

```
Person
├── MEMBER_OF       → Faction(s)       (which house / order / company)
├── BELONGS_TO      → Culture(s)       (which people they are)
├── WORSHIPS        → Deity/Deities    (what they believe)
├── PRACTICES       → MagicSystem(s)   (what magic they can use)
├── SPEAKS          → Language(s)      (what they speak)
├── LOCATED_IN      → Location         (where they are)
├── CLAIMS_TITLE    → Title            (what office they hold)
├── PARENT_OF       → Person(s)        (children)
├── SPOUSE_OF       → Person(s)        (partner)
├── BURIED_AT       → Location         (final rest)
└── EXISTED_DURING  → Era(s)           (when they lived)
```

For a `Faction`:

```
Faction
├── MEMBER_OF       → Faction(s)           (sub-group, vassal, parent org)
├── RULES           → Location / Faction   (sovereignty)
├── CONTROLS        → Location(s)          (military hold)
├── LOCATED_IN      → Location             (headquarters)
├── BELONGS_TO      → Culture(s)
├── POSSESSES       → Item(s)              (holdings, relics)
├── CREATED         → Item(s)              (artifacts forged)
├── ALLIED_WITH     → Faction(s)
├── ENEMY_OF        → Faction(s)
└── EXISTED_DURING  → Era(s)
```

For a `Location` (`←` marks an incoming edge: `RULES` and `CONTROLS` point from the ruler or occupier to the place, as in the `Faction` listing):

```
Location
├── PART_OF     → Location / Region   (geographic hierarchy)
├── LOCATED_IN  → Location / Region
├── RULES       ← Person / Faction    (sovereign)
├── CONTROLS    ← Faction(s)          (occupier)
├── NEAR        → Location(s)         (geographic proximity)
└── CULTURE_OF  → Culture(s)          (homeland of)
```

For an `Item`:

```
Item
├── POSSESSED_BY     → Person / Faction   (current holder)
├── CREATED_BY       → Person / Faction   (maker)
├── FORGED_FROM      → Material(s)
├── INHERITED_BY     → Person(s)          (lineage of ownership)
└── ORIGINATES_FROM  → Location           (where it was forged)
```

These are the **default association paths**. The MCP tool layer exposes them as composable queries.

## Three macro-association patterns

### Pattern 1: Direct association (one hop)

The simplest case. The LLM asks about Aldric and gets his faction, his location, his culture, and his titles, all in a single `entity_context` call.

**Cypher:**

```cypher
// One-hop summary. Each OPTIONAL MATCH multiplies rows, so DISTINCT removes the duplicates.
MATCH (p:Person {name: "Aldric Raventhorne"})
OPTIONAL MATCH (p)-[:MEMBER_OF]->(f:Faction)
OPTIONAL MATCH (p)-[:LOCATED_IN]->(loc:Location)
OPTIONAL MATCH (p)-[:BELONGS_TO]->(c:Culture)
OPTIONAL MATCH (p)-[:CLAIMS_TITLE]->(t:Title)
RETURN p, collect(DISTINCT f) AS factions,
       collect(DISTINCT loc) AS locations,
       collect(DISTINCT c) AS cultures,
       collect(DISTINCT t) AS titles
```

**Tool:** `entity_context(name)` returns the one-hop summary. See `05-mcp-tools.md`.

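
As a rough illustration of what a one-hop summary contains, here is a minimal Python sketch that groups an entity's outgoing edges by relation type. It runs against a hypothetical in-memory edge list rather than Neo4j; the sample edges and the `edges` parameter are assumptions made for the example, not the actual `entity_context` implementation.

```python
from collections import defaultdict

# Hypothetical sample data: (source, relation, target) triples.
EDGES = [
    ("Aldric Raventhorne", "MEMBER_OF", "House Vyr"),
    ("Aldric Raventhorne", "LOCATED_IN", "Thornwall Keep"),
    ("Aldric Raventhorne", "BELONGS_TO", "Valdorni"),
    ("House Vyr", "RULES", "Valdorn"),
]


def entity_context(name, edges=EDGES):
    """Group the outgoing one-hop edges of `name` by relation type."""
    summary = defaultdict(list)
    for source, relation, target in edges:
        # Skip edges of other entities and targets already recorded.
        if source == name and target not in summary[relation]:
            summary[relation].append(target)
    return dict(summary)


context = entity_context("Aldric Raventhorne")
print(context)
```

The `House Vyr` edge is left out of the result because it is two hops from Aldric; reaching it is the job of the deeper patterns that follow.
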
### Pattern 2: Lineage chain (variable depth)

Aldric → his father → his grandfather → the Vyr bloodline. The LLM asks "what bloodline does Aldric belong to?" and gets an answer without traversing `PARENT_OF` 12 times.

This is why we have `Lineage` as a *node*, not just a property on `Person`. `Lineage` is a typed, queryable group with:

- `founding_ancestor` (the `Person` who started the bloodline)
- `cadet_branches[]` (sub-`Lineage` nodes)
- `MEMBER_OF` connections from `Person` to `Lineage`

The LLM's question *"What is Aldric's lineage, and who else is in it?"* becomes:

```cypher
MATCH (a:Person {name: "Aldric Raventhorne"})
      -[:MEMBER_OF]->(lin:Lineage)
      <-[:MEMBER_OF]-(relative:Person)
// Compare nodes rather than names: two distinct people may share a name.
WHERE relative <> a
RETURN lin, collect(relative.name) AS bloodline_members
```

**Tool:** `list_lineage(person)` returns the bloodline, its cadet branches, the founding ancestor, and all known members.

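
The Cypher above returns only direct members, while `list_lineage` is described as also returning cadet branches and the founding ancestor. The following Python sketch shows one way that result could be assembled, assuming `Lineage` is mirrored as a small dataclass. The class layout, the placeholder names, and the choice to fold cadet-branch members into the member list are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Lineage:
    name: str
    founding_ancestor: str
    members: list = field(default_factory=list)
    cadet_branches: list = field(default_factory=list)  # nested Lineage objects


def all_members(lineage):
    """Collect members of a lineage and, recursively, of its cadet branches."""
    names = list(lineage.members)
    for branch in lineage.cadet_branches:
        names.extend(all_members(branch))
    return names


def list_lineage(person, lineages):
    """Summarise the first lineage that lists `person` directly, or return None."""
    for lineage in lineages:
        if person in lineage.members:
            return {
                "lineage": lineage.name,
                "founding_ancestor": lineage.founding_ancestor,
                "cadet_branches": [b.name for b in lineage.cadet_branches],
                "members": [m for m in all_members(lineage) if m != person],
            }
    return None


# Placeholder data; none of these names besides Aldric come from the world itself.
CADET = Lineage("Cadet branch (placeholder)", "Cadet founder (placeholder)",
                members=["Cousin (placeholder)"])
VYR = Lineage("Vyr bloodline", "Founder (placeholder)",
              members=["Aldric Raventhorne", "Aldric's father"],
              cadet_branches=[CADET])

result = list_lineage("Aldric Raventhorne", [VYR, CADET])
print(result)
```
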
### Pattern 3: Geographic / political hierarchy (region tree)

`Region` and `Location` form a tree via `PART_OF`. To go from *Aldric's dagger* (micro) to *the Kingdom of Valdorn's stance in the Border Wars* (macro), the engine traverses:

```
Aldric's dagger (Item)
  → CREATED_BY Aldric (Person)
  → MEMBER_OF House Vyr (Faction)
  → RULES Valdorn (Location)
  → PART_OF Northern Reaches (Region)
  → CULTURE_OF Valdorni (Culture)
  → LOCATED_IN King Aelric's Court (Location)
  → CONTESTED_BY Crimson Pact (Faction) in the Border Wars (Event)
```

**Seven hops.** With proper indexing, we expect that to take roughly 10–50 ms in Neo4j. We make it a single tool call: `expand_context(entity, hops=7)`.

**Tool:** `expand_context(entity, hops, relation_filter)` returns the n-hop neighborhood filtered to specified relation types.

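
To make the n-hop neighborhood concrete, here is a minimal breadth-first sketch in Python. It follows outgoing edges only and runs on a hypothetical in-memory edge list that mirrors the first few links of the chain above; the data layout and the `edges` parameter are assumptions, and the real tool would run against the graph database.

```python
from collections import deque

# Hypothetical sample data: (source, relation, target) triples.
EDGES = [
    ("Aldric's dagger", "CREATED_BY", "Aldric"),
    ("Aldric", "MEMBER_OF", "House Vyr"),
    ("House Vyr", "RULES", "Valdorn"),
    ("Valdorn", "PART_OF", "Northern Reaches"),
]


def expand_context(entity, hops=2, relation_filter=None, edges=EDGES):
    """Return {node: distance} for every node within `hops` outgoing edges."""
    allowed = set(relation_filter) if relation_filter else None
    distance = {entity: 0}
    queue = deque([entity])
    while queue:
        node = queue.popleft()
        if distance[node] == hops:
            continue  # hop limit reached, do not expand this node
        for source, relation, target in edges:
            if source != node or target in distance:
                continue  # unrelated edge, or node already visited
            if allowed is not None and relation not in allowed:
                continue  # relation type filtered out
            distance[target] = distance[node] + 1
            queue.append(target)
    del distance[entity]  # the start node is not part of its own neighborhood
    return distance


print(expand_context("Aldric's dagger"))
print(expand_context("Aldric's dagger", hops=4))
```

Because the search is breadth-first, each node is recorded at its shortest distance from the start, and a relation filter prunes whole branches rather than just hiding nodes from the result.
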
## The micro-anchoring problem

Here's the design risk: in a world with thousands of `Item` nodes, "Aldric's dagger" is one of many daggers. The LLM shouldn't have to say "Aldric's dagger"; it should be able to say *"the dagger"* and have the engine infer which one is meant.

**Solution: the active context.** Every LLM query in the engine happens against an *active context*, a working set of entities the LLM has been talking about (not to be confused with the LLM's own token context window). The MCP server tracks the active context per session:

```json
{
  "active_context": [
    {"type": "Person", "name": "Aldric Raventhorne", "id": "uuid-1"},
    {"type": "Location", "name": "Thornwall Keep", "id": "uuid-2"},
    {"type": "Item", "name": "Sword of Eventide", "id": "uuid-3"}
  ]
}
```

When the LLM calls `lookup("the dagger")`, the engine:

1. First checks the active context for `Item` nodes.
2. If exactly one `Item` among them matches "dagger", returns it.
3. If several match, returns a disambiguation list and asks the LLM to pick.
4. If none match, falls back to a global fuzzy search.

The active context is updated automatically by other tools: every `state_at`, every `query_faction_at_time`, every `get_context` adds entities to the working set. The LLM doesn't manage it; the engine does.

**Tool:** `lookup(query)` is the disambiguating entry point.

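
The four lookup steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the engine's implementation: the article-stripping keyword extraction, the `status` values, the restriction to `Item` entries, and the use of `difflib.get_close_matches` from the standard library as a stand-in for the global fuzzy search are all choices made for the example, and the two dagger entries are hypothetical.

```python
import difflib

ARTICLES = {"the", "a", "an"}


def lookup(query, active_context, all_entities):
    """Resolve a loose reference such as "the dagger" to one or more entities."""
    # Crude keyword extraction: lower-case the query and drop articles.
    keyword = " ".join(w for w in query.lower().split() if w not in ARTICLES)

    # Steps 1-2: search Item entries in the active context first.
    in_context = [e for e in active_context
                  if e["type"] == "Item" and keyword in e["name"].lower()]
    if len(in_context) == 1:
        return {"status": "resolved", "matches": in_context}
    # Step 3: several candidates, so hand the list back for disambiguation.
    if len(in_context) > 1:
        return {"status": "ambiguous", "matches": in_context}
    # Step 4: nothing in context, fall back to a global fuzzy search by name.
    by_name = {e["name"].lower(): e for e in all_entities}
    close = difflib.get_close_matches(keyword, list(by_name), n=5, cutoff=0.5)
    return {"status": "global_search", "matches": [by_name[n] for n in close]}


aldric = {"type": "Person", "name": "Aldric Raventhorne", "id": "uuid-1"}
sword = {"type": "Item", "name": "Sword of Eventide", "id": "uuid-3"}
dagger = {"type": "Item", "name": "Aldric's Dagger", "id": "uuid-4"}       # hypothetical
other = {"type": "Item", "name": "Ceremonial Dagger", "id": "uuid-5"}      # hypothetical
world = [aldric, sword, dagger, other]

print(lookup("the dagger", [aldric, dagger], world)["status"])
print(lookup("the dagger", [dagger, other], world)["status"])
print(lookup("the dagger", [aldric, sword], world)["status"])
```

Checking the active context before the global index is what keeps the common case cheap and unambiguous: the fuzzy search only runs when the conversation has not yet touched a matching item.
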
## The "why does this matter" link

Some micro details matter only because of a macro context. Aldric's dagger matters because it's *the dagger that killed the Emperor*. The engine models this as a `SIGNIFICANCE_OF` edge from the item to a specific `Event`:

```cypher
// Attach the significance edge between two nodes that already exist.
MATCH (i:Item {name: "Aldric's Dagger"}),
      (e:Event {name: "Assassination of Emperor Vael"})
MERGE (i)-[:SIGNIFICANCE_OF {
  role: "weapon",
  context: "the assassination of Emperor Vael of the Crimson Throne"
}]->(e)
```

When the LLM asks "why is this dagger famous?", it gets the answer from a single tool call: `significance_of(item)`.

**Tool:** `significance_of(entity)` returns the historical events, cultural significance, or macro context that makes this entity noteworthy.

## Composition patterns (what the LLM does, not what the schema has)

These are the *recipes* the LLM uses. They are documented in `07-reasoning-harness.md` and previewed here:

| Question | Tool sequence |
|---|---|
| "Who is Aldric?" | `entity_context(Aldric)` + `list_lineage(Aldric)` + `significance_of(Aldric)` |
| "What was happening in Valdorn in 340 TA?" | `state_at(Valdorn, "3rd_age.year_340")` + `entities_present(Valdorn, "3rd_age.year_340")` |
| "Why does Aldric care about the Crimson Throne?" | `entity_context(Aldric)` → get `MEMBER_OF House Vyr` → `expand_context(House Vyr, hops=2)` → find `RULES Crimson Throne` |
| "Is the dagger in the museum the real one?" | `lookup("the dagger")` → `significance_of(dagger)` → if Event-tagged, `get_event_chain(dagger)` to verify the chain of possession |
| "Was the Long Winter caused by the Sundering?" | `get_event_chain(Sundering)` → check for `CAUSED` edges to Long Winter |

## The risk: traversal explosion

A naive `expand_context` with `hops=10` on a dense world will return thousands of nodes, and the LLM's context window will overflow. We mitigate in four ways:

1. **Default hops=2.** The LLM must explicitly request deeper traversal, and the tool warns at hops=4.
2. **Relation filters.** `expand_context(Aldric, hops=2, relation_filter=["MEMBER_OF", "RULES"])` returns only those.
3. **Confidence thresholds.** Nodes with `source_confidence < 0.5` are excluded unless requested.
4. **Result caps.** A hard cap of 200 nodes per call. The LLM paginates.

**Tool:** `expand_context(entity, hops, relation_filter, min_confidence, limit)` is rate-limited and capped.

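
The mitigations listed above can be sketched as a guard layer applied to an already-expanded neighborhood. This is a hedged illustration: the node dictionary keys, the `offset` pagination parameter, the `next_offset` return field, and the decision to warn at hops=4 and deeper are assumptions made for the example, and rate limiting is not shown.

```python
import warnings

DEFAULT_HOPS = 2
WARN_HOPS = 4          # assumption: warn at this depth and deeper
MAX_NODES = 200        # hard cap per call
MIN_CONFIDENCE = 0.5


def apply_guards(nodes, hops=DEFAULT_HOPS, relation_filter=None,
                 min_confidence=MIN_CONFIDENCE, limit=MAX_NODES, offset=0):
    """Filter, cap, and paginate a list of neighborhood nodes.

    Each node is a dict with hypothetical keys:
    name, hops, relation, source_confidence.
    """
    if hops >= WARN_HOPS:
        warnings.warn(f"hops={hops} may return a very large neighborhood")
    limit = min(limit, MAX_NODES)  # callers cannot raise the hard cap
    kept = [
        n for n in nodes
        if n["hops"] <= hops
        and (relation_filter is None or n["relation"] in relation_filter)
        and n["source_confidence"] >= min_confidence
    ]
    page = kept[offset:offset + limit]
    more = offset + limit < len(kept)
    return {"nodes": page, "next_offset": offset + limit if more else None}


# 250 confident one-hop neighbours, one low-confidence rumor, one distant node.
sample = [{"name": f"n{i}", "hops": 1, "relation": "MEMBER_OF",
           "source_confidence": 0.9} for i in range(250)]
sample.append({"name": "rumor", "hops": 1, "relation": "MEMBER_OF",
               "source_confidence": 0.2})
sample.append({"name": "far", "hops": 3, "relation": "RULES",
               "source_confidence": 0.9})

first = apply_guards(sample)
second = apply_guards(sample, offset=first["next_offset"])
print(len(first["nodes"]), len(second["nodes"]))
```

Returning a `next_offset` cursor instead of silently truncating lets the caller know that more nodes exist and fetch them only when needed.
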
## Summary

The macro↔micro layer is the part of the engine that makes the difference between *"a graph full of facts"* and *"a world you can reason about."* The ontology is the data; this is the access pattern.

Three rules of thumb:

- If the LLM has to do a 3+ hop traversal to answer a basic question, the ontology is missing an edge.
- If the LLM can't disambiguate *"the dagger"*, the active context is broken.
- If the LLM doesn't know *why* a fact matters, the `SIGNIFICANCE_OF` pattern is missing.