docs(plan): dual-confidence model + LoreSource as first-class node

Slice 0 acceptance criteria now distinguish three things the slice
proves (time filter, integration, dual-confidence). 5 new criteria
(0.11-0.15) verify the dual-confidence model in tests.

The model:
  - extraction_confidence: did we extract this edge correctly?
    Frontmatter = 1.0, body-text heuristic = 0.6.
  - source_confidence: how reliable is the source document?
    Lives on a LoreSource node as reliability
    (canonical=1.0 | factional=0.75 | rumor=0.5 | dialogue=0.4
    | fanon=0.3).
  - aggregate confidence returned to callers = min(extraction * source)
    across all sources on the edge.

Slice 1 picks up LoreSource as a first-class graph node and
SOURCED_FROM edges from every typed edge. Path-based reliability
inference (Quests/Random/ -> rumor) ships in slice 0; slice 1
adds YAML frontmatter override and the graph node itself.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-06-17 12:19:49 -04:00
parent e0085e4c61
commit 55bea31fa2
2 changed files with 58 additions and 5 deletions

@@ -36,6 +36,45 @@ time-bounded edges, the `was_true_at` query, source attribution.
| 0.8 | Every positive result has a non-empty `sources[]` pointing to a real file | ✅ |
| 0.9 | Cognee import works, `cognee.cognify()` reaches the LLM-call step | ✅ (fails on missing key, gracefully) |
| 0.10 | `scripts/03_reset.py` wipes the in-memory cache and (best-effort) the Cognee dataset | ✅ |
| 0.11 | Dual-confidence model: extraction and source dimensions are tracked separately | ✅ `tests/test_confidence.py` 6/6 |
| 0.12 | A frontmatter edge reports `extraction=1.0, source=1.0, aggregate=1.0` | ✅ |
| 0.13 | A body-text-inferred edge reports `extraction=0.6, source=1.0, aggregate=0.6` | ✅ |
| 0.14 | A rumor-sourced edge reports `extraction=1.0, source=0.5, aggregate=0.5` | ✅ |
| 0.15 | Two agreeing sources on the same fact merge into one Edge with both per-source confidences preserved | ✅ |
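
Criterion 0.15 can be pictured with a minimal sketch. It assumes each extracted fact arrives as a flat record; `Claim` and `merge_claims` are hypothetical names, not the POC's actual API. Claims that agree on the same triple collapse into one entry, and each source keeps its own extraction and source confidence.

```python
from collections import defaultdict
from typing import NamedTuple


class Claim(NamedTuple):
    """One extracted fact from one file (hypothetical record shape)."""
    subject: str
    predicate: str
    obj: str
    source_path: str
    extraction: float  # how confident we are in the extraction itself
    source: float      # how reliable the source document is


def merge_claims(claims):
    """Group claims by (subject, predicate, object), keeping per-source confidences."""
    merged = defaultdict(list)
    for c in claims:
        merged[(c.subject, c.predicate, c.obj)].append(
            {"path": c.source_path, "extraction": c.extraction, "source": c.source}
        )
    return dict(merged)


claims = [
    Claim("Aldric", "PARENT_OF", "Bera", "codex/aldric.md", 1.0, 1.0),
    Claim("Aldric", "PARENT_OF", "Bera", "Quests/Random/tavern.md", 0.6, 0.5),
]
edges = merge_claims(claims)
```

Here `edges` holds a single key for the shared triple, with a two-element list carrying both sources' confidences unchanged.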
### What this slice proves vs. what it doesn't
The acceptance criteria above prove three **independent** things, and
it's worth being explicit about which is which so slice 1 doesn't
duplicate effort:
- **0.3 proves the time filter.** `time_in_window` is the
load-bearing primitive. 13 self-tests cover era-tree membership,
`current` resolution, sub-era windows, and open bounds. **These
self-tests are the only place in the slice where the time logic is
actually exercised.** The demo queries pass `at_time` to
`was_true_at`, but every edge in the POC has
`valid_from = valid_until = null`, so the time filter accepts
every edge regardless of `at_time`.
- **0.1-0.2, 0.4-0.10 prove the integration.** Cognee substrate is
installable, the codex parser produces typed triples, the
`was_true_at` tool resolves names, walks the graph, returns the
documented response shape, and cites sources.
- **0.11-0.15 prove the dual-confidence model.** Two dimensions
are tracked: **extraction confidence** (did we extract this edge
correctly? Frontmatter=1.0, body-text heuristic=0.6) and
**source confidence** (how reliable is the document? lives on a
`LoreSource` node as `reliability: canonical | factional | rumor
| dialogue | fanon`). The aggregate confidence returned to
callers is `min(extraction * source)` across all sources on the
edge. This lets `family_tree.yaml` and the `LoreSource` node land
in slice 1 without retrofitting the confidence model.
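
The aggregate rule can be sketched as follows. The reliability weights are the ones given in the commit message; the class and field names (`SourceConfidence`, `Edge`, `aggregate`) are illustrative assumptions, not the POC's actual types.

```python
from dataclasses import dataclass, field

# Reliability weights as listed in the commit message for this change.
RELIABILITY = {
    "canonical": 1.0,
    "factional": 0.75,
    "rumor": 0.5,
    "dialogue": 0.4,
    "fanon": 0.3,
}


@dataclass(frozen=True)
class SourceConfidence:
    """Confidence for one source of an edge (hypothetical shape)."""
    path: str
    extraction: float  # 1.0 for frontmatter, 0.6 for the body-text heuristic
    reliability: str   # key into RELIABILITY

    @property
    def source(self) -> float:
        return RELIABILITY[self.reliability]

    @property
    def combined(self) -> float:
        return self.extraction * self.source


@dataclass
class Edge:
    subject: str
    predicate: str
    obj: str
    sources: list = field(default_factory=list)

    def aggregate(self) -> float:
        # Minimum of extraction * source over every source on the edge.
        return min(s.combined for s in self.sources)


frontmatter = SourceConfidence("codex/aldric.md", 1.0, "canonical")
body_text = SourceConfidence("codex/bera.md", 0.6, "canonical")
rumor = SourceConfidence("Quests/Random/tavern.md", 1.0, "rumor")

edge = Edge("Aldric", "PARENT_OF", "Bera", [frontmatter, rumor])
```

With these inputs, an edge backed only by `frontmatter` aggregates to 1.0, one backed only by `body_text` to 0.6, and `edge` above to 0.5, matching criteria 0.12-0.14. Taking the minimum means the weakest source caps what the caller sees.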
**Slice 1 is what couples the three.** When `family_tree.yaml`
ships with `valid_from` / `valid_until` per edge, the demo queries
will exercise the time filter end-to-end. When `LoreSource` ships
as a first-class node, the `reliability` field becomes structured
data instead of a path-inferred heuristic.
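
To see why null bounds make the filter a no-op today, here is a deliberately simplified stand-in for `time_in_window`. It assumes plain integer years and inclusive bounds; the real primitive also handles era trees and the `current` token, which this sketch ignores.

```python
from typing import Optional


def time_in_window_sketch(
    at_time: int,
    valid_from: Optional[int],
    valid_until: Optional[int],
) -> bool:
    """Return True if at_time falls inside [valid_from, valid_until].

    A bound of None is treated as open, so an edge with both bounds
    unset accepts every at_time. Inclusive bounds are an assumption.
    """
    if valid_from is not None and at_time < valid_from:
        return False
    if valid_until is not None and at_time > valid_until:
        return False
    return True


unbounded = time_in_window_sketch(500, None, None)   # slice 0: always accepted
expired = time_in_window_sketch(500, 400, 450)       # slice 1: window has closed
open_ended = time_in_window_sketch(500, 400, None)   # slice 1: still valid
```

Only once edges carry real `valid_from` / `valid_until` values can a query return a negative result because of time.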
## Test plan

@@ -43,11 +43,22 @@ fuzziness. Every edge traces to a YAML line.
`Spell` with `PRACTICES` edges.
6. `lore_engine_poc/parsers/culture.py`: `Culture`, `Language`,
`Deity` with `WORSHIPS` and `SPEAKS` edges.
7. Schema validation: strict, fails loudly with line numbers (YAML
"gotchas": an unquoted `NO` parsing as boolean `False`, tab/space sensitivity).
8. `time_model.py` test suite grows: era-tree membership, month/day
precision, `current` token resolution against `:Now` config node,
null bounds semantics.
7. **`LoreSource` as a first-class node** — every YAML file
becomes a `LoreSource` node with a `reliability` field
(`canonical | factional | rumor | dialogue | fanon`). Each
edge points to one or more `LoreSource` nodes via a
`SOURCED_FROM` edge. This is the structured-data form of
the dual-confidence model the POC already implements
(`tests/test_confidence.py`). The `reliability` field is
overridable per-file via YAML frontmatter; the default is
`canonical` for `*.yaml` files and is path-inferred for
prose files (Quests/Random/ → `rumor`).
8. Schema validation: strict, fails loudly with line numbers
(YAML "gotchas": an unquoted `NO` parsing as boolean `False`,
tab/space sensitivity).
9. `time_model.py` test suite grows: era-tree membership,
month/day precision, `current` token resolution against
`:Now` config node, null bounds semantics.
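
The reliability resolution in item 7 could look roughly like this. The function name, the rule table, the substring match on the path, and the fallback for prose files that match no rule are all assumptions made for illustration; only the `*.yaml` default, the frontmatter override, and the `Quests/Random/` rule come from the plan.

```python
from pathlib import PurePosixPath
from typing import Mapping, Optional

ALLOWED = ("canonical", "factional", "rumor", "dialogue", "fanon")

# Path fragments mapped to a reliability level. Only the first rule is
# taken from the plan; a real table would presumably hold more entries.
PATH_RULES = (("Quests/Random/", "rumor"),)


def resolve_reliability(
    path: str,
    frontmatter: Optional[Mapping[str, object]] = None,
) -> str:
    """Pick a reliability level: frontmatter override, then file type, then path."""
    override = (frontmatter or {}).get("reliability")
    if override is not None:
        if override not in ALLOWED:
            raise ValueError(f"{path}: unknown reliability {override!r}")
        return str(override)
    if PurePosixPath(path).suffix == ".yaml":
        return "canonical"
    for fragment, level in PATH_RULES:
        if fragment in path:
            return level
    # Assumption: prose files that match no rule fall back to canonical.
    return "canonical"


yaml_default = resolve_reliability("seed/family_tree.yaml")
prose_rumor = resolve_reliability("vault/Quests/Random/tavern.md")
overridden = resolve_reliability("seed/legends.yaml", {"reliability": "fanon"})
```

Checking the override first keeps the path heuristic as a fallback only, so an author can always correct a misclassified file from inside the file itself.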
## Acceptance criteria
@@ -61,6 +72,9 @@ fuzziness. Every edge traces to a YAML line.
| 1.6 | Anachronism check flags a parent whose death precedes a child's birth |
| 1.7 | Re-ingest is idempotent (`MERGE`, not `CREATE`) |
| 1.8 | Three example YAMLs ship in `seed/` for demo |
| 1.9 | `LoreSource` is a first-class node with `reliability` field, `SOURCED_FROM` edges from every typed edge |
| 1.10 | YAML files default to `reliability: canonical`; frontmatter can override |
| 1.11 | Time-bounded edges (from `family_tree.yaml` PARENT_OF) carry `valid_from` and `valid_until`; the demo's `was_true_at` queries actually exercise `time_in_window` |
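
As an illustration of criterion 1.6, a minimal anachronism check might compare death and birth years along each `PARENT_OF` edge. `Person` and `find_anachronisms` are hypothetical names, and integer years stand in for whatever the time model actually uses.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Person:
    name: str
    born: Optional[int] = None
    died: Optional[int] = None


def find_anachronisms(parent_of: List[Tuple[Person, Person]]) -> List[str]:
    """Flag PARENT_OF pairs where the parent's death precedes the child's birth."""
    problems = []
    for parent, child in parent_of:
        # Unknown dates cannot be checked, so they are skipped rather than flagged.
        if parent.died is None or child.born is None:
            continue
        if parent.died < child.born:
            problems.append(
                f"{parent.name} died in {parent.died}, "
                f"before {child.name} was born in {child.born}"
            )
    return problems


aldric = Person("Aldric", born=300, died=350)
bera = Person("Bera", born=340)
corin = Person("Corin", born=360)

issues = find_anachronisms([(aldric, bera), (aldric, corin)])
```

In this example only the Aldric/Corin pair is flagged, since Bera was born while Aldric was still alive.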
## Test plan