docs(plan): 01-slice-impl-plan.md — TDD sub-slices for structured YAML ingest

2026-06-18 00:30:07 -04:00
parent c3fa2f7ce4
commit 121ef2b761


@@ -0,0 +1,176 @@
# Slice 1 — TDD Implementation Plan
**Owner:** this loop (Claude).
**Scope:** `docs/plan/01-slice-structured-yaml.md` (the AC table is
the contract). Implementation lives in
`~/projects/lore-engine-poc/` alongside slice 0.
**TDD rule:** every new behaviour ships with a failing test first;
the test names follow `test_<AC>_<description>` so a single
`pytest --collect-only` makes AC coverage visible.
## Decision points locked before coding starts
| # | Decision | Source | Implication |
|---|---|---|---|
| D1 | Edges with time bounds are reified `:Relation` nodes (not native edges) | ADR 0009 | The in-memory `Edge` from slice 0 stays the substrate; YAML parsers emit time-bounded edges with `valid_from`/`valid_until` populated. |
| D2 | `Lineage` ≠ `Faction` | ADR 0003 | `family_tree.yaml` produces `Lineage` nodes only; `factions.yaml` produces `Faction` nodes only. No "House" leakage between them. |
| D3 | `LoreSource` is a first-class node | slice 1 AC 1.9, 1.10 | Move the existing `LoreSource` dataclass out of `parsers.py` into a graph-side module so it's not just an attribute on triples. |
| D4 | `time_in_window` is the only time predicate | 02-time-model.md + slice 0 | All structured-YAML edges go through `time_in_window` (which the demo's `was_true_at` already calls). |
| D5 | YAML loader: PyYAML `safe_load` only | 06-ingestion.md §YAML | Reject the Norway problem (PyYAML resolves an unquoted `NO` to boolean `False`, so `NO: false` loads as `{False: False}`) via strict schema, not the YAML parser. |
| D6 | File-relative validity: a YAML file is one source | slice 0 dual-confidence | Every triple in `family_tree.yaml` shares the file's `reliability` (default `canonical`). |
## Sub-slice ordering and parallelisation
The seven sub-slices below (1.1 to 1.7) respect the dependency order
`parsers → LoreSource → validation → time tests → seed YAMLs → demo`.
Independent items can fan out to sub-agents in the same iteration.
```
1.1 family_tree parser ───┐
                          ├──> 1.3 LoreSource as graph node ──┐
1.2 factions parser ──────┘                                   │
                                                              ├──> 1.5 schema validation + idempotent re-ingest
                          ┌──> 1.4 timeline/gazetteer/        │
                          │        bestiary/magic/culture     │
                          │                                   │
                          └──> 1.6 time_model ≥30 cases       │
                                                              ├──> 1.7 seed/ YAMLs + demo extension
```
- **1.1 ∥ 1.2** (no shared code) → sub-agent A and B in parallel.
- **1.4** (5 parsers) splits internally into 5 sub-tasks; one parser per sub-agent, all reading the same schema-validation contract from 1.5's draft.
- **1.6** is independent of 1.1–1.5: it only touches `time_model.py`. Run it as a stand-alone sub-agent.
- **1.3, 1.5, 1.7** are integration points; do them in the main loop.
## AC → test map
Each AC row gets at least one pytest. Tests live in
`tests/test_parsers/test_<parser>.py` and
`tests/test_time_model.py`. The mapping:
| AC | Test name | Sub-slice |
|---|---|---|
| 1.1 | `test_all_six_parsers_emit_expected_edge_shape` | 1.4 |
| 1.2 | `test_family_tree_edges_carry_non_null_bounds` | 1.1 |
| 1.3 | (pytest parametrize, 30+ cases) `test_time_model_<case>` | 1.6 |
| 1.4 | `test_was_true_at_filters_by_window` | 1.1 (integration) |
| 1.5 | `test_schema_validator_rejects_malformed_with_line` | 1.5 |
| 1.6 | `test_parent_dies_before_child_birth_raises` | 1.1 |
| 1.7 | `test_reingest_is_idempotent` | 1.5 |
| 1.8 | `test_seed_yaml_files_exist` (filesystem fixture) | 1.7 |
| 1.9 | `test_lore_source_is_graph_node` | 1.3 |
| 1.10 | `test_yaml_default_reliability_canonical` | 1.3 |
| 1.11 | `test_demo_queries_exercise_time_in_window` | 1.7 (integration) |
| 1.12 | `test_family_tree_emits_lineage_not_faction` | 1.1 |
| 1.13 | `test_faction_member_has_reason_field` | 1.2 |
| 1.14 | `test_multiple_memberships_non_overlapping` | 1.2 |
| 1.15 | `test_cross_lineage_marriage_child_in_named_lineage` | 1.1 |
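
The map above can be checked mechanically against the collected test names. The following is a minimal sketch, not part of the plan itself: `AC_TESTS` and `uncovered_acs` are hypothetical names, the dict mirrors only three rows of the table, and the collected node IDs are assumed to look like the output of `pytest --collect-only -q`. The parametrized AC 1.3 is matched as a prefix.

```python
# Hypothetical mirror of a few rows of the AC table; the real map would list all 15.
AC_TESTS = {
    "1.2": "test_family_tree_edges_carry_non_null_bounds",
    "1.3": "test_time_model_",  # parametrized: matched as a prefix
    "1.7": "test_reingest_is_idempotent",
}


def uncovered_acs(ac_tests: dict[str, str], collected: list[str]) -> list[str]:
    """Return the ACs whose expected test name does not appear in `collected`.

    `collected` holds pytest node IDs such as
    "tests/test_time_model.py::test_time_model_half_open[case0]".
    """
    # Keep only the function part of each node ID.
    names = [node_id.split("::")[-1] for node_id in collected]
    missing = []
    for ac, expected in ac_tests.items():
        if not any(name.startswith(expected) for name in names):
            missing.append(ac)
    return missing
```

An empty return value means every listed AC has at least one collected test.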
## Sub-slice briefs
### 1.1 family_tree.yaml parser (TDD-first)
**First failing test** (this is the gate for the whole slice):
```python
import textwrap

# Assumed import path; adjust to wherever the slice 1 entry point lands.
from lore_engine_poc.parsers import parse_structured_yaml


def test_family_tree_emits_lineage_not_faction(tmp_path):
    yaml_file = tmp_path / "ashveil.yaml"
    yaml_file.write_text(textwrap.dedent('''
        lineage: "ashveil_bloodline"
        founding_ancestor: "theron_ashveil"
        members:
          - id: "theron_ashveil"
            name: "Theron Ashveil"
            born: "1st_age.year_412"
            died: "2nd_age.year_87"
            parents: []
          - id: "aldric_raventhorne"
            name: "Aldric Raventhorne"
            born: "2nd_age.year_60"  # before Theron's death, so AC 1.6 does not fire
            parents: ["theron_ashveil"]
    '''))
    entities, triples = parse_structured_yaml(str(tmp_path))
    node_labels = {e.type for e in entities}
    assert "Lineage" in node_labels
    assert "Faction" not in node_labels
    # ...assert the exact triple list
```
Then make it pass with the smallest code possible.
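
As a rough idea of what "smallest code possible" might look like, here is a sketch that works on an already-loaded mapping rather than on a directory of files. Everything in it is an assumption for illustration: the `Entity` and `Triple` shapes, the `Person` label, the relation names `IN_LINEAGE` and `PARENT_OF`, and the helper name `family_tree_to_graph`. The slice 0 types are the real contract.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    id: str
    type: str  # node label, e.g. "Lineage" or "Person" (labels assumed)


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    object: str
    valid_from: Optional[str] = None
    valid_until: Optional[str] = None


def family_tree_to_graph(data: dict) -> tuple[list[Entity], list[Triple]]:
    """Turn one loaded family-tree mapping into entities and triples."""
    lineage_id = data["lineage"]
    entities = [Entity(lineage_id, "Lineage")]
    triples = []
    for member in data["members"]:
        entities.append(Entity(member["id"], "Person"))
        # Assumption: lineage membership is bounded by the member's lifetime.
        triples.append(Triple(member["id"], "IN_LINEAGE", lineage_id,
                              member.get("born"), member.get("died")))
        for parent_id in member.get("parents", []):
            # Assumption: parenthood starts at the child's birth, with no end bound.
            triples.append(Triple(parent_id, "PARENT_OF", member["id"],
                                  member.get("born"), None))
    return entities, triples
```

The function never creates a `Faction` entity, which is what the gate test checks (D2).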
### 1.2 factions.yaml parser
Same shape, parallel track. Emit one `Faction` node per file, plus
`MEMBER_OF(Faction)` edges with `valid_from = member.joined`,
`valid_until = member.left`, and a `reason` field stored as a
property on the triple (slice 4 will lift it into the tool layer).
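
A sketch of the membership mapping described above, under stated assumptions: the `joined` and `left` keys come from the brief, while the top-level `faction` key, the member `id` key, the `Triple` shape and the helper name `faction_member_triples` are placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    object: str
    valid_from: Optional[str] = None
    valid_until: Optional[str] = None
    # (key, value) pairs; a tuple keeps the frozen dataclass hashable.
    properties: tuple = ()


def faction_member_triples(data: dict) -> list[Triple]:
    """Emit one MEMBER_OF triple per membership entry in a loaded factions file."""
    faction_id = data["faction"]  # assumed key name
    triples = []
    for member in data["members"]:
        triples.append(Triple(
            subject=member["id"],
            relation="MEMBER_OF",
            object=faction_id,
            valid_from=member.get("joined"),
            valid_until=member.get("left"),  # None is read as "still a member"
            properties=(("reason", member.get("reason")),),
        ))
    return triples
```

Keeping `reason` as a property on the triple leaves it available for slice 4 without changing the edge shape.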
### 1.3 LoreSource as a graph node
Move `LoreSource` from `parsers.py` into a new
`lore_engine_poc/lore_source.py` module, expose it on `Graph` as a
separate index `graph.sources_by_path: dict[str, LoreSource]`, and
add a `SOURCED_FROM` triple to the output list (currently the
`source_path` is a property — make it a first-class edge so slice
2's consistency engine can reason about it). Tests assert that
every `Triple` produced by `extract_triples` has a matching
`SOURCED_FROM` triple pointing at a `LoreSource` node.
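
One possible shape for this, as a sketch only: the `Graph`, `Triple` and `LoreSource` classes below are simplified stand-ins for the slice 0 types, and `triple_id`, `add_sourced` and `unsourced` are hypothetical helpers. It assumes each content triple can be addressed by a derived ID so that a `SOURCED_FROM` edge can point from it to the source path.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LoreSource:
    path: str
    reliability: str = "canonical"  # D6: default for structured YAML


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    object: str


def triple_id(triple: Triple) -> str:
    """Derive a stable ID so another triple can refer to this one (assumed scheme)."""
    return f"{triple.subject}|{triple.relation}|{triple.object}"


@dataclass
class Graph:
    triples: list = field(default_factory=list)
    sources_by_path: dict = field(default_factory=dict)

    def add_sourced(self, triple: Triple, source: LoreSource) -> None:
        # Register the source once per path, then link the triple to it.
        self.sources_by_path.setdefault(source.path, source)
        self.triples.append(triple)
        self.triples.append(Triple(triple_id(triple), "SOURCED_FROM", source.path))


def unsourced(graph: Graph) -> list:
    """Return content triples that lack a SOURCED_FROM edge to a known source."""
    sourced = {t.subject for t in graph.triples
               if t.relation == "SOURCED_FROM" and t.object in graph.sources_by_path}
    return [t for t in graph.triples
            if t.relation != "SOURCED_FROM" and triple_id(t) not in sourced]
```

A test for AC 1.9 could then assert that `unsourced(graph)` is empty after ingest.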
### 1.4 Five more parsers
Same TDD-first pattern. Each parser takes a `tmp_path` fixture with
one YAML file and asserts:
1. The expected node labels appear.
2. The expected (subject, relation, object, valid_from, valid_until)
triples appear.
3. Re-running yields the same set (idempotency).
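
The three assertions can be shared across the five parsers with one helper. This is a sketch under assumptions: `contract_failures` is a hypothetical name, entities are modelled as `(id, label)` tuples and triples as 5-tuples, and `parse` stands for any parser callable taking a path.

```python
from typing import Callable


def contract_failures(parse: Callable, path: str,
                      expected_labels: set, expected_triples: set) -> list[str]:
    """Run `parse` twice on `path` and report which contract checks fail.

    Assumes `parse` returns (entities, triples), where entities are
    (id, label) tuples and triples are
    (subject, relation, object, valid_from, valid_until) tuples.
    """
    failures = []
    entities, triples = parse(path)

    # 1. The expected node labels appear.
    labels = {label for _id, label in entities}
    if not expected_labels <= labels:
        failures.append(f"missing labels: {sorted(expected_labels - labels)}")

    # 2. The expected triples appear.
    if not expected_triples <= set(triples):
        failures.append(f"missing triples: {sorted(expected_triples - set(triples), key=str)}")

    # 3. A second run yields the same sets (idempotency).
    entities_again, triples_again = parse(path)
    if set(entities_again) != set(entities) or set(triples_again) != set(triples):
        failures.append("second run produced a different result")

    return failures
```

Each parser test then reduces to `assert contract_failures(...) == []` with its own fixture file and expectations.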
### 1.5 Schema validation + idempotent re-ingest
A single `validate_family_tree(data, source_path) -> None` raises
`FamilyTreeSchemaError(message, line=N)` with the offending line.
Tests assert:
- `NO: false` (the Norway problem) is rejected.
- `parents: null` on a non-root is rejected.
- A duplicate `(lineage, member_id)` on re-ingest is silently merged,
not duplicated (AC 1.7).
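
A sketch of the validator covering the first two checks, assuming the loader records each mapping's 1-based line number under a hypothetical `__line__` key (plain `safe_load` output does not carry line numbers, so something has to supply them). The exact rules and the error message format are placeholders.

```python
class FamilyTreeSchemaError(ValueError):
    """Schema violation in a family-tree file, with the offending line if known."""

    def __init__(self, message, line=None):
        suffix = f" (line {line})" if line is not None else ""
        super().__init__(message + suffix)
        self.line = line


def _require_string_keys(mapping: dict, source_path: str) -> None:
    # YAML 1.1 resolves unquoted NO/yes/on/off to booleans, so a key that is
    # not a string signals the Norway problem (or a similar coercion).
    for key in mapping:
        if not isinstance(key, str):
            raise FamilyTreeSchemaError(
                f"{source_path}: non-string key {key!r}", line=mapping.get("__line__"))


def validate_family_tree(data: dict, source_path: str) -> None:
    """Raise FamilyTreeSchemaError on the first violation; return None if valid."""
    _require_string_keys(data, source_path)
    root_id = data.get("founding_ancestor")
    for member in data.get("members", []):
        _require_string_keys(member, source_path)
        # Only the founding ancestor may omit parents (assumed reading of the rule).
        if member.get("id") != root_id and member.get("parents") is None:
            raise FamilyTreeSchemaError(
                f"{source_path}: member {member.get('id')!r} has null parents",
                line=member.get("__line__"))
```

Doing the check on the loaded data keeps the parser on `safe_load` (D5) while still rejecting coerced keys.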
### 1.6 time_model.py ≥30 cases
Parametrize the existing 13 self-tests, plus add:
- month/day precision (`3rd_age.year_345.month_3.day_17`)
- `:Now` config node resolution (the slice 0 code path already
takes `current_time=`; promote it to module-level config)
- half-open window edge cases
- both bounds null with `at=None` → True
- a malformed time string raises `ValueError`
Target: 30+ cases, all passing.
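
To make the listed cases concrete, here is a sketch of a parser and window predicate. It is not the slice 0 implementation: the string grammar is inferred from the examples in this plan, the window is assumed to be half-open as `[valid_from, valid_until)`, a missing month or day is assumed to mean the start of the period, and `:Now` resolution is left out.

```python
import re
from typing import Optional

# Grammar inferred from examples such as "3rd_age.year_345.month_3.day_17".
_TIME_RE = re.compile(
    r"(\d+)(?:st|nd|rd|th)_age\.year_(\d+)(?:\.month_(\d+)(?:\.day_(\d+))?)?"
)


def parse_time(text: str) -> tuple[int, int, int, int]:
    """Parse a time string into a sortable (age, year, month, day) tuple."""
    match = _TIME_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"malformed time string: {text!r}")
    age, year, month, day = match.groups()
    # Assumption: a missing month or day means the start of the period.
    return (int(age), int(year), int(month or 1), int(day or 1))


def time_in_window(at: Optional[str],
                   valid_from: Optional[str],
                   valid_until: Optional[str]) -> bool:
    """Return True if `at` falls inside [valid_from, valid_until)."""
    if valid_from is None and valid_until is None:
        return True  # unbounded edge: always true, even with at=None
    if at is None:
        # Assumption: the real code would resolve `:Now` here instead.
        raise ValueError("a bounded window needs a query time")
    moment = parse_time(at)
    if valid_from is not None and moment < parse_time(valid_from):
        return False
    if valid_until is not None and moment >= parse_time(valid_until):
        return False
    return True
```

Each bullet above then becomes one row in a `pytest.mark.parametrize` table of `(at, valid_from, valid_until, expected)`.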
### 1.7 seed/ YAMLs + demo
Three example YAMLs in `lore_engine_poc/seed/yaml/`:
- `ashveil_bloodline.yaml` — lineage (the AC 1.15 cross-lineage case)
- `house_raventhorne.yaml` — faction (AC 1.14 multi-membership)
- `battle_of_black_spire.yaml` — timeline event (AC 1.11 demo query)
`scripts/02_demo.py` gains three queries that *only* return true
inside the window (the negative case proves the time filter is real,
not bypassed).
## Spot-check protocol
After every sub-slice lands, a single `pytest -q` runs in
`lore-engine-poc/`. Sub-agent commits get reviewed before merging
into `main`:
- `git diff main..wt/<branch>` — small, scoped to one AC family
- `pytest -q` from the sub-agent's branch tip
- If anything looks out-of-scope, surface it instead of merging.
## Out of scope (deferred)
- LLM extraction (slice 3) — separate track, separate slice.
- Consistency engine (slice 2) — needs both 1 and 3, lands later.
- Neo4j UDF port — slice 0's pure-Python port stays as the test
oracle; Neo4j Java port is a polish item per ADR 0008.