docs(plan): dual-confidence model + LoreSource as first-class node

Slice 0 acceptance criteria now distinguish three things the slice
proves (time filter, integration, dual-confidence). 5 new criteria
(0.11-0.15) verify the dual-confidence model in tests.

The model:
- extraction_confidence: did we extract this edge correctly?
  Frontmatter = 1.0, body-text heuristic = 0.6.
- source_confidence: how reliable is the source document? Lives on a
  LoreSource node as reliability (canonical=1.0 | factional=0.75 |
  rumor=0.5 | dialogue=0.4 | fanon=0.3).
- aggregate confidence returned to callers = min(extraction * source)
  across all sources on the edge.

Slice 1 picks up LoreSource as a first-class graph node and
SOURCED_FROM edges from every typed edge. Path-based reliability
inference (Quests/Random/ -> rumor) ships in slice 0; slice 1 adds
YAML frontmatter override and the graph node itself.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-06-17 12:19:49 -04:00
parent e0085e4c61
commit 55bea31fa2
2 changed files with 58 additions and 5 deletions
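
The aggregate rule in the commit message can be read as "multiply extraction by source per document, then take the minimum across documents". A minimal sketch of that reading follows. The numeric weights come from the commit message; every name here (`SourceConfidence`, `infer_reliability`, `make_source`, `aggregate_confidence`) is hypothetical and not taken from the repository, and the fallback to `canonical` for paths outside `Quests/Random/` is an assumption made only for illustration.

```python
from dataclasses import dataclass

# Reliability weights as listed in the commit message.
RELIABILITY = {
    "canonical": 1.0,
    "factional": 0.75,
    "rumor": 0.5,
    "dialogue": 0.4,
    "fanon": 0.3,
}

FRONTMATTER_CONFIDENCE = 1.0  # edge declared in frontmatter
BODY_TEXT_CONFIDENCE = 0.6    # edge inferred by the body-text heuristic


@dataclass(frozen=True)
class SourceConfidence:
    """Per-source pair of confidences kept on an edge (hypothetical shape)."""
    path: str
    extraction: float
    source: float

    @property
    def combined(self) -> float:
        return self.extraction * self.source


def infer_reliability(path: str) -> str:
    """Path-based inference; only the Quests/Random/ rule is from the commit.

    Falling back to 'canonical' for every other path is an assumption.
    """
    return "rumor" if "Quests/Random/" in path else "canonical"


def make_source(path: str, from_frontmatter: bool) -> SourceConfidence:
    extraction = FRONTMATTER_CONFIDENCE if from_frontmatter else BODY_TEXT_CONFIDENCE
    return SourceConfidence(path, extraction, RELIABILITY[infer_reliability(path)])


def aggregate_confidence(sources: list[SourceConfidence]) -> float:
    """Minimum of extraction * source across all sources on one edge."""
    if not sources:
        raise ValueError("an edge needs at least one source")
    return min(s.combined for s in sources)


# Mirrors criteria 0.12-0.14, using made-up file paths.
frontmatter_edge = [make_source("Codex/People/a.md", from_frontmatter=True)]
body_text_edge = [make_source("Codex/People/a.md", from_frontmatter=False)]
rumor_edge = [make_source("Quests/Random/b.md", from_frontmatter=True)]
```

Under this reading, an edge backed by both a canonical frontmatter source and a rumor source reports the weaker of the two products, while both per-source pairs remain available for inspection (criterion 0.15).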
--- a/docs/plan/00-slice-0-poc.md
+++ b/docs/plan/00-slice-0-poc.md
@@ -36,6 +36,45 @@ time-bounded edges, the `was_true_at` query, source attribution.
 | 0.8 | Every positive result has a non-empty `sources[]` pointing to a real file | ✅ |
 | 0.9 | Cognee import works, `cognee.cognify()` reaches the LLM-call step | ✅ (fails on missing key, gracefully) |
 | 0.10 | `scripts/03_reset.py` wipes the in-memory cache and (best-effort) the Cognee dataset | ✅ |
+| 0.11 | Dual-confidence model: extraction and source dimensions are tracked separately | ✅ `tests/test_confidence.py` 6/6 |
+| 0.12 | A frontmatter edge reports `extraction=1.0, source=1.0, aggregate=1.0` | ✅ |
+| 0.13 | A body-text-inferred edge reports `extraction=0.6, source=1.0, aggregate=0.6` | ✅ |
+| 0.14 | A rumor-sourced edge reports `extraction=1.0, source=0.5, aggregate=0.5` | ✅ |
+| 0.15 | Two agreeing sources on the same fact merge into one Edge with both per-source confidences preserved | ✅ |
+
+### What this slice proves vs. what it doesn't
+
+The acceptance criteria above prove three **independent** things, and
+it's worth being explicit about which is which so slice 1 doesn't
+duplicate effort:
+
+- **0.3 proves the time filter.** `time_in_window` is the
+  load-bearing primitive. 13 self-tests cover era-tree membership,
+  `current` resolution, sub-era windows, and open bounds. **This is
+  the only place in the slice where the time logic is actually
+  exercised end-to-end** — the demo queries pass `at_time` to
+  `was_true_at`, but every edge in the POC has
+  `valid_from = valid_until = null`, so the time filter accepts
+  everything by default.
+- **0.1–0.2, 0.4–0.10 prove the integration.** Cognee substrate is
+  installable, the codex parser produces typed triples, the
+  `was_true_at` tool resolves names, walks the graph, returns the
+  documented response shape, and cites sources.
+- **0.11–0.15 prove the dual-confidence model.** Two dimensions
+  are tracked: **extraction confidence** (did we extract this edge
+  correctly? Frontmatter=1.0, body-text heuristic=0.6) and
+  **source confidence** (how reliable is the document? lives on a
+  `LoreSource` node as `reliability: canonical | factional | rumor
+  | dialogue | fanon`). The aggregate confidence returned to
+  callers is `min(extraction * source)` across all sources on the
+  edge. This unblocks the `family_tree.yaml` (slice 1) and the
+  `LoreSource` node (slice 1) without retrofitting.
+
+**Slice 1 is what couples the three.** When `family_tree.yaml`
+ships with `valid_from` / `valid_until` per edge, the demo queries
+will exercise the time filter end-to-end. When `LoreSource` ships
+as a first-class node, the `reliability` field becomes structured
+data instead of a path-inferred heuristic.

 ## Test plan

--- a/docs/plan/01-slice-structured-yaml.md
+++ b/docs/plan/01-slice-structured-yaml.md
@@ -43,11 +43,22 @@ fuzziness. Every edge traces to a YAML line.
   `Spell` with `PRACTICES` edges.
 6. `lore_engine_poc/parsers/culture.py` — `Culture`, `Language`,
   `Deity` with `WORSHIPS` and `SPEAKS` edges.
-7. Schema validation: strict, fails loudly with line numbers (YAML
-   "gotchas" — `NO: false` parsing as `True`, tab/space sensitivity).
-8. `time_model.py` test suite grows: era-tree membership, month/day
-   precision, `current` token resolution against `:Now` config node,
-   null bounds semantics.
+7. **`LoreSource` as a first-class node** — every YAML file
+   becomes a `LoreSource` node with a `reliability` field
+   (`canonical | factional | rumor | dialogue | fanon`). Each
+   edge points to one or more `LoreSource` nodes via a
+   `SOURCED_FROM` edge. This is the structured-data form of
+   the dual-confidence model the POC already implements
+   (`tests/test_confidence.py`). The `reliability` field is
+   overridable per-file via YAML frontmatter; the default is
+   `canonical` for `*.yaml` files and is path-inferred for
+   prose files (Quests/Random/ → `rumor`).
+8. Schema validation: strict, fails loudly with line numbers
+   (YAML "gotchas" — `NO: false` parsing as `True`,
+   tab/space sensitivity).
+9. `time_model.py` test suite grows: era-tree membership,
+   month/day precision, `current` token resolution against
+   `:Now` config node, null bounds semantics.

 ## Acceptance criteria

@@ -61,6 +72,9 @@ fuzziness. Every edge traces to a YAML line.
 | 1.6 | Anachronism check flags a parent whose death precedes a child's birth |
 | 1.7 | Re-ingest is idempotent (`MERGE`, not `CREATE`) |
 | 1.8 | Three example YAMLs ship in `seed/` for demo |
+| 1.9 | `LoreSource` is a first-class node with `reliability` field, `SOURCED_FROM` edges from every typed edge |
+| 1.10 | YAML files default to `reliability: canonical`; frontmatter can override |
+| 1.11 | Time-bounded edges (from `family_tree.yaml` PARENT_OF) carry `valid_from` and `valid_until`; the demo's `was_true_at` queries actually exercise `time_in_window` |

 ## Test plan