Per docs/plan/exec/07-harness.md sub-slice 7.2:
- lore_engine_poc/prompts/system_prompt.md — the
canonical system prompt. 5 question types with
canonical tool sequences, the citation rule
("cite every claim"), the time-window rule
(default at_time, explicit time in answer), the
contradiction rule (surface, don't resolve), the
6 failure modes the LLM must avoid. v1.2-aware:
mentions the slice 5T TypeTemplate tools and the
slice 6 Setting/Plane setting= filter.
- lore_engine_poc/prompts/registry.json — the
version registry. Pins the system prompt to v1.2
with model_target=minimax-m3:cloud. Old runs stay
comparable when the prompt iterates (D3).
- lore_engine_poc/prompts/loader.py — the loader.
list_registered_prompts() and load_current_system_prompt()
are the canonical entry points; the harness
runner uses them to fetch the prompt + stamp
results with the version.
- tests/harness/test_system_prompt.py — 9 tests:
registry well-formed, system_prompt registered,
path resolves, loader returns (text, version),
prompt has 5 question types, citation rule
present, time-window rule present, mentions
template tools, mentions setting filter.
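A minimal sketch of the loader contract the tests pin (registry resolves, loader returns (text, version)). The registry schema is not shown in this commit, so the `{"prompts": {"system_prompt": {"version", "path", ...}}}` shape is an assumption, and the explicit `prompts_dir` argument is a hypothetical addition for testability; the real entry points take no arguments.

```python
import json
from pathlib import Path


def list_registered_prompts(prompts_dir: Path) -> dict[str, dict]:
    """Return registry entries keyed by prompt name (hypothetical registry shape)."""
    registry = json.loads((prompts_dir / "registry.json").read_text(encoding="utf-8"))
    return registry["prompts"]


def load_current_system_prompt(prompts_dir: Path) -> tuple[str, str]:
    """Return (prompt text, pinned version) so a harness run can be stamped."""
    entry = list_registered_prompts(prompts_dir)["system_prompt"]
    path = prompts_dir / entry["path"]  # path is resolved relative to the registry
    if not path.is_file():
        raise FileNotFoundError(f"system_prompt path does not resolve: {path}")
    return path.read_text(encoding="utf-8"), entry["version"]
```

Returning the version alongside the text is what lets old runs stay comparable: each result carries the prompt version it was produced with.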
Track A only (no API key). Track B uses the loader
when executing the harness.
Suite: 767 → 776 (+9).
Co-Authored-By: Claude <noreply@anthropic.com>
Per docs/plan/exec/07-harness.md sub-slice 7.1:
- tests/harness/questions.yaml — the human-friendly
YAML source. 50 questions across the 5 design-doc
types (10 each): identity, time_fact, world_state,
causal, narrative. Each question pins id, type,
query, expected_tools, expected_answer_shape, and
expected_citations. Targets the Mardonari codex
(the slice 0 fixture) so the harness can run
end-to-end against the real graph.
- tests/harness/questions.json — the compiled JSON
(committed so the runner reads it without rebuilding).
- scripts/harness/build_questions.py — the strict
compiler. Validates the YAML schema, counts questions
per type, enforces uniqueness, writes the JSON.
Validation errors fail loudly with field paths.
- tests/harness/test_questions.py — 6 tests pinning the
contract: schema, 50 total, 10 per type, expected_tools
non-empty, ids unique, version set.
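The compiler's checks (required fields, known type, non-empty expected_tools, unique ids, 10 per type) can be sketched over already-parsed YAML as below. `QuestionSchemaError`, `validate_questions`, and the `questions[i].field` error format are hypothetical names chosen to illustrate "fail loudly with field paths"; they are not the script's actual API.

```python
from collections import Counter

QUESTION_TYPES = ("identity", "time_fact", "world_state", "causal", "narrative")
REQUIRED_FIELDS = ("id", "type", "query", "expected_tools",
                   "expected_answer_shape", "expected_citations")


class QuestionSchemaError(ValueError):
    """Raised with a field path such as 'questions[3].expected_tools'."""


def validate_questions(questions: list[dict], per_type: int = 10) -> Counter:
    seen_ids: set[str] = set()
    for i, q in enumerate(questions):
        where = f"questions[{i}]"
        for field in REQUIRED_FIELDS:
            if field not in q:
                raise QuestionSchemaError(f"{where}.{field}: missing")
        if q["type"] not in QUESTION_TYPES:
            raise QuestionSchemaError(f"{where}.type: unknown type {q['type']!r}")
        if not q["expected_tools"]:
            raise QuestionSchemaError(f"{where}.expected_tools: must be non-empty")
        if q["id"] in seen_ids:
            raise QuestionSchemaError(f"{where}.id: duplicate id {q['id']!r}")
        seen_ids.add(q["id"])
    # Per-type counts are checked last, once every entry is individually valid.
    counts = Counter(q["type"] for q in questions)
    for qtype in QUESTION_TYPES:
        if counts[qtype] != per_type:
            raise QuestionSchemaError(
                f"questions: expected {per_type} {qtype!r} questions, got {counts[qtype]}")
    return counts
```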
Track A only (no API key needed). Track B (executing
against the live LLM) is gated on $OLLAMA_API_KEY.
Suite: 761 → 767 (+6).
Co-Authored-By: Claude <noreply@anthropic.com>
Two changes:
1. apply_plane_migration now counts only *newly
materialised* planes (existence-check before add),
so the PlaneMigrationSummary.planes_added field
reflects the graph delta on re-runs. Previously it
counted plans processed, which made "idempotency"
invisible in the summary shape.
2. New tests/test_planes/test_killer_demo.py — the
slice 6 end-to-end regression net. Wires the 6.4
backfill + 6.6 migration + 6.5 setting filter +
6.2 graph layer in one graph. Pins:
- cross-setting facts are filtered out under
setting="mardonari"
- within-setting facts survive the filter
- entities_present + events_during honour
the setting filter
- the full slice 6 pipeline is idempotent at
both the plane layer and the LAYER_OF edge
layer
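The counting change in item 1 can be illustrated with a toy migration over a plain dict standing in for the graph. This is a sketch, not the library function: the real `apply_plane_migration` works against the graph backend, and the `planes_seen` field is hypothetical, added only to show what the old count measured.

```python
from dataclasses import dataclass


@dataclass
class PlaneMigrationSummary:
    planes_added: int = 0  # graph delta: only planes that did not exist yet
    planes_seen: int = 0   # hypothetical: plans processed (the old meaning)


def apply_plane_migration(planes: dict[str, dict], plans: list[dict]) -> PlaneMigrationSummary:
    """Materialise each planned plane, counting only newly added ones."""
    summary = PlaneMigrationSummary()
    for plan in plans:
        summary.planes_seen += 1
        if plan["id"] in planes:  # existence check before add
            continue
        planes[plan["id"]] = plan
        summary.planes_added += 1
    return summary
```

On a re-run the graph is unchanged and `planes_added` drops to 0, so idempotency becomes visible in the summary itself.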
Suite: 756 → 761 (+5).
Co-Authored-By: Claude <noreply@anthropic.com>
Adds the slice 6.6 migration as a library helper
(lore_engine_poc.migration: scan_codex_for_planes /
apply_plane_migration) plus a thin CLI wrapper at
scripts/05_migrate_planes.py.
Discriminator: frontmatter signals — plane: true,
tags contains plane, OR type: plane — promote an
entry to :Plane. Default Material Plane convention
remains mardonari.material (added by the 6.4
backfill). Idempotent: re-running on the same codex
produces the same graph state.
LAYER_OF edges are created only between co-referenced
Planes (markdown body [[X]] → Plane node X). Direction
follows docs/17-planes.md: (:Plane A)-[:LAYER_OF]->
(:Plane B) where A is the layer and B is the parent.
In the voldramir fixture, Voldramir(demiplane) LAYER_OF
Underdark(plane).
The script supports --dry-run (print planned changes,
exit 0) and --codex / --setting overrides.
Suite: 747 → 756 tests (+9). No regressions.
Co-Authored-By: Claude <noreply@anthropic.com>
Per AC 6.5, 6.6 — adds a keyword-only 'setting' parameter to:
- lookup (filter on matched name's setting membership)
- entity_context (entity-level filter; empty shape if excluded)
- was_true_at (both subject and object must be in setting)
- true_during (subject must be in setting)
- entities_present (located entity must be in setting)
- events_during (event subject must be in setting)
The filter resolves via graph.setting_entities(setting_id) — O(1)
reverse lookup from slice 6.2's Protocol methods. An unknown
setting returns empty results (defensive). Omitting 'setting'
preserves slice 4 / 9 behaviour (back-compat fence).
MCP tool schemas updated for all 6 entries to expose 'setting'
as an optional [string, null] parameter; opt_string_params
toggled so 'null' is coerced to None by the dispatcher.
The cross-setting fact test (Roland ENCOUNTERED The Wanderer)
is the canonical LLM-target: with setting='mardonari', Roland's
home setting, the answer is was_true=False because The Wanderer
is in the_wild_dream.
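The filter semantics above (None disables it, an unknown setting yields nothing, was_true_at needs both endpoints in the setting) can be sketched as follows. The time window is deliberately omitted, and `passes_setting_filter` / `was_true` are hypothetical helpers; the dict of sets stands in for `graph.setting_entities(setting_id)`.

```python
from typing import Optional


def passes_setting_filter(setting_entities: dict[str, set[str]],
                          setting: Optional[str], *names: str) -> bool:
    """True when every name belongs to the setting; None disables the filter."""
    if setting is None:
        return True  # back-compat: omitting setting keeps the unfiltered behaviour
    members = setting_entities.get(setting)
    if members is None:
        return False  # unknown setting: return nothing rather than guess
    return all(name in members for name in names)


def was_true(edges: set[tuple[str, str, str]], setting_entities: dict[str, set[str]],
             subject: str, relation: str, obj: str, *, setting: Optional[str] = None) -> bool:
    """Toy was_true_at without the time window: both endpoints must pass the filter."""
    if not passes_setting_filter(setting_entities, setting, subject, obj):
        return False
    return (subject, relation, obj) in edges
```

With `setting="mardonari"`, the Roland / The Wanderer edge exists but fails the filter because only Roland is a member, which is the cross-setting case the test pins.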
+8 tests (739 → 747). All green. No regressions.
Adds lore_engine_poc.migration.migrate_setting_id_to_exists_in()
— idempotent helper that materialises a :Setting node, a default
Material Plane, and per-entity EXISTS_IN facts. Used by:
- 01_ingest.py on every ingest (safe; idempotent)
- scripts/05_migrate_planes.py (slice 6.6 region↔plane pass)
Per docs/17-planes.md: the Material Plane's id is
{setting_id}.material and its kind is 'material'. The EXISTS_IN
fact is the timeless type-assertion (per docs/17-planes.md +
ADR 0009); time-bounded membership is the slice 6.5 reified
:Relation work.
API:
migrate_setting_id_to_exists_in(
graph, setting_id, *,
entity_ids=None, # explicit; no all_names() walk
current_era='unspecified',
schema_version='1.2',
kind='campaign',
) -> BackfillSummary
Returns a BackfillSummary so the caller can verify what changed
(test surface: 7 tests covering idempotency, default-plane
materialisation, partial overlap, multi-setting isolation, and
custom metadata).
+7 tests (732 → 739). All green. No regressions.
Pins the contract for REFLECTS, LAYER_OF, ADJACENT_TO,
ACCESSIBLE_VIA: each is a typed, timeless Edge (Layer 1), NOT a
reified :Relation node (Layer 2). Per docs/17-planes.md, plane
relations are structural — they don't carry per-entity time
bounds; reified :Relation is reserved for time-bounded facts
(slice 6.5).
Tests cover round-trip through InMemoryGraph's subject / object
/ id indexes, timelessness (valid_from/valid_until both None),
and the non-reified property (no :Relation node created on add).
The 4 edge names are pinned in EDGE_TYPES per slice 6.1.
+6 tests (726 → 732). All green. No regressions.
Extends the GraphBackend Protocol with 8 new methods (add_setting,
find_setting, add_plane, find_plane, planes_in_setting,
add_exists_in, entity_planes, setting_entities). InMemoryGraph
implements them with O(1) reverse lookups (planes_by_setting,
entities_by_setting, settings_by_entity). Neo4jGraph gains
NotImplementedError stubs so isinstance(neo4j, GraphBackend) keeps
passing until the slice 6 follow-up mirrors the Cypher.
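Why the NotImplementedError stubs are enough: an `isinstance` check against a `@runtime_checkable` Protocol only verifies that the members exist, not their signatures or behaviour. The sketch below uses a hypothetical two-method slice of the Protocol (`SettingBackend`) and toy classes to show this.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SettingBackend(Protocol):
    """Hypothetical slice of GraphBackend: two of the new setting methods."""

    def add_setting(self, setting_id: str) -> None: ...
    def setting_entities(self, setting_id: str) -> set[str]: ...


class InMemory:
    def __init__(self) -> None:
        self.entities_by_setting: dict[str, set[str]] = {}

    def add_setting(self, setting_id: str) -> None:
        self.entities_by_setting.setdefault(setting_id, set())

    def setting_entities(self, setting_id: str) -> set[str]:
        return self.entities_by_setting.get(setting_id, set())


class Neo4jStub:
    # Stubs keep the method names present, which is all isinstance() checks.
    def add_setting(self, setting_id: str) -> None:
        raise NotImplementedError("slice 6 follow-up mirrors the Cypher")

    def setting_entities(self, setting_id: str) -> set[str]:
        raise NotImplementedError("slice 6 follow-up mirrors the Cypher")
```

The flip side is that the structural check cannot catch a stub that would fail at call time, so behavioural parity still needs its own tests.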
EXISTS_IN is the timeless type-assertion per docs/17-planes.md;
time-bounded membership is the slice 6.5 reified :Relation work.
+8 tests (718 → 726). All green. No regressions.
Promotes Setting and Plane from the write-tools allowlist to
first-class Layer-1 NODE_LABELS. Adds 5 plane-relation edge types
to EDGE_TYPES (EXISTS_IN, REFLECTS, LAYER_OF, ADJACENT_TO,
ACCESSIBLE_VIA).
Per docs/17-planes.md: EXISTS_IN is the timeless type-assertion
that an entity belongs to a Setting; time-bounded planar
membership is carried by a separate reified :Relation (slice 6.5).
+6 tests (712 → 718). All green. No regressions.
Templates module (package init) and the runtime registry:
* TemplateRegistry
- .reload() rescans templates_dir and rebuilds the
{template_id: TemplateSpec} dict atomically
- .query(template_id, qid, args, graph) dispatches to the
graph backend via query_cypher()
- Duplicate template ids in the same dir raise
TemplateRegistryError with both source paths
- Every loaded query is re-validated through the Cypher
allowlist, so a bad body never reaches a backend
- Coerces string-typed optional params the same way the
core tools do ("null" / "None" / "" -> None)
* dynamic_tools.build_dynamic_tools(registry)
- One MCP ToolEntry per declared query: name = query.id,
description = query.description, inputSchema derived
from parameters (required = non-optional names).
- One list_template_tools discovery entry that returns
{templates: [{id, queries: [...]}, ...]} and accepts
an optional template_id filter.
- Same _make_adapter shape as the core tools so the
existing MCPServer dispatcher serves them with no
changes.
* 15 dedicated tests in test_template_registry.py cover:
empty dir, single template, query dispatch, duplicate-id
rejection, reload invalidates cache, dynamic tool schema
shape, list_template_tools (with and without filter),
merge into TOOL_REGISTRY visible in tools/list, end-to-end
tool call (with and without rows), missing required param
yields [], unknown template id / query id raise, and the
defining test: drop a new template file, reload(), see a
new tool appear with no code change.
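The schema derivation (required = non-optional names) and the string-to-None coercion can be sketched like this. `Param`, `Query`, `input_schema`, and `coerce_args` are hypothetical stand-ins for TemplateSpec and the registry internals; rendering optional params as `[type, "null"]` mirrors how the core tools expose optional strings.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Param:
    name: str
    type: str  # JSON Schema type, e.g. "string"
    optional: bool = False


@dataclass
class Query:
    id: str
    description: str
    parameters: list[Param] = field(default_factory=list)


NULLISH = {"null", "None", ""}


def input_schema(query: Query) -> dict[str, Any]:
    """Derive an MCP inputSchema: required = the non-optional parameter names."""
    props = {p.name: {"type": [p.type, "null"] if p.optional else p.type}
             for p in query.parameters}
    return {
        "type": "object",
        "properties": props,
        "required": [p.name for p in query.parameters if not p.optional],
    }


def coerce_args(query: Query, args: dict[str, Any]) -> dict[str, Any]:
    """Map string sentinels to None, but only for optional parameters."""
    optional = {p.name for p in query.parameters if p.optional}
    return {k: (None if k in optional and isinstance(v, str) and v in NULLISH else v)
            for k, v in args.items()}
```

Restricting coercion to optional params means a required value that happens to be the literal string "null" is passed through untouched.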
Suite: 685 -> 700 (+15).
Co-Authored-By: Claude <noreply@anthropic.com>
The safety boundary for every template query. Every Cypher
body passes through validate() before it ever reaches the
in-memory matcher or the Neo4j driver.
What the allowlist does:
- Accepts read-only constructs: MATCH, OPTIONAL MATCH, WHERE,
RETURN, ORDER BY, SKIP, LIMIT.
- Accepts allowlisted functions in RETURN: the aggregates
count, min, max, sum, avg and the scalar functions coalesce,
toLower, toUpper, length, id.
- Rejects mutations and control flow: CREATE, MERGE, SET,
DELETE, DETACH, REMOVE, CALL, UNION, WITH, FOREACH, LOAD,
USING -- with the offending keyword and a 1-based line
number.
- Rejects variable-length path patterns ('*', '*1..3').
- Enforces parameter consistency between the body and the
template's parameters: section (typo guard both ways).
- The body is never concatenated with parameter values;
the in-memory matcher / Neo4j driver binds $name via
the parameter API, so a value like "O'Brien); MERGE ..."
is bound as a string, never parsed as Cypher.
The allowlist is the safety boundary. Everything downstream
trusts the body once validate() returns.
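A simplified, regex-based sketch of the rejection rules (forbidden keyword with 1-based line number, variable-length paths, parameter consistency both ways). It is deliberately conservative and is not the real validator: a plain keyword scan would also reject keywords inside string literals and the STARTS WITH / ENDS WITH string operators, which a tokenizer-based check would need to handle. `CypherAllowlistError` is a hypothetical name.

```python
import re

FORBIDDEN = ("CREATE", "MERGE", "SET", "DELETE", "DETACH", "REMOVE", "CALL",
             "UNION", "WITH", "FOREACH", "LOAD", "USING")
_FORBIDDEN_RE = re.compile(r"\b(" + "|".join(FORBIDDEN) + r")\b", re.IGNORECASE)
# A '*' inside a relationship bracket, e.g. -[:KNOWS*1..3]->, is a variable-length path.
_VARLEN_RE = re.compile(r"\[[^\]]*\*[^\]]*\]")
_PARAM_RE = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


class CypherAllowlistError(ValueError):
    pass


def validate(body: str, declared_params: set[str]) -> None:
    for lineno, line in enumerate(body.splitlines(), start=1):
        m = _FORBIDDEN_RE.search(line)
        if m:
            raise CypherAllowlistError(f"line {lineno}: forbidden keyword {m.group(1).upper()}")
        if _VARLEN_RE.search(line):
            raise CypherAllowlistError(f"line {lineno}: variable-length path pattern")
    used = set(_PARAM_RE.findall(body))
    # Typo guard both ways: undeclared $refs and declared-but-unused parameters.
    if used - declared_params:
        raise CypherAllowlistError(f"undeclared parameters: {sorted(used - declared_params)}")
    if declared_params - used:
        raise CypherAllowlistError(f"unused parameters: {sorted(declared_params - used)}")
```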
Suite: 665 -> 685 (+20).
Co-Authored-By: Claude <noreply@anthropic.com>
Lock the TemplateSpec validation surface with 18 dedicated tests:
- valid template populates every field (round-trip)
- missing template.id / unknown field type rejected
- enum without values / empty values rejected
- duplicate relation names / duplicate query ids rejected
- unknown relation.to_type rejected (not in NODE_LABELS)
- query without cypher body rejected
- version regex accepts X.Y / X.Y.Z, rejects v1 / "abc" / "1.x"
- idempotent re-parse yields the same spec
- tools[] with write: true is parsed and flagged (deferred)
All assertions land in the existing YamlSchemaError path with
line/field context (delegated to parsers/_yaml.py).
Suite: 647 -> 665 (+18).
Co-Authored-By: Claude <noreply@anthropic.com>
Three docker-gated tests for the full Neo4j compose stack:
* test_compose_neo4j_profile_healthy: docker compose
--profile neo4j up -d brings neo4j + lore-engine-ingest
+ lore-engine-mcp-neo4j to a healthy state within 60s.
* test_compose_neo4j_was_true_at_round_trip: was_true_at
through the Neo4j-backed MCP server returns the same
answer as the pickle-backed server for a known fact
(Roland Raventhorne / House Raventhorne / 3rd_age.year_345
→ was_true: true).
* test_compose_neo4j_down_cleans_volumes: docker compose
--profile neo4j down -v removes the neo4j_data volume.
docker-compose.yml changes:
* New neo4j:5 service with NEO4J_AUTH=none, loopback
HTTP + Bolt ports (17474/17687 by default to avoid
conflict with a developer's manual neo4j on the standard
7474/7687 ports), 1GiB mem_limit, pids_limit, healthcheck
via wget on the HTTP root.
* New lore-engine-ingest service (profile neo4j) that
runs scripts/01_ingest.py --skip-cognee --write-neo4j
after Neo4j is healthy. One-shot; no restart policy.
* The pickle-backed lore-engine-mcp service moved onto
the pickle profile (so it doesn't conflict on the
same host port when the neo4j profile is active).
* New lore-engine-mcp-neo4j service (profile neo4j)
that depends on both neo4j (service_healthy) and
lore-engine-ingest (service_completed_successfully).
Same hardening as the pickle service: cap_drop ALL,
no-new-privileges, mem_limit 512m, read_only rootfs,
tmpfs /tmp.
* Named volume neo4j_data for the Neo4j store.
Profile split (pickle | neo4j) keeps the two stacks from
colliding on the same host port when both are activated.
Run with docker compose --profile pickle up -d for the
default or --profile neo4j up -d for the production
graph substrate.
Slice 11.4 test update:
* tests/test_mcp/test_dockerfile.py test_docker_compose_up_and_round_trip
now uses --profile pickle so the pickle service
activates only.
Pre-prod hardening noted in compose yml: NEO4J_AUTH=none
is loopback-only; switch to a username/password and update
LORE_NEO4J_URI before exposing beyond loopback. Tracked in
docs/plan/05-slice-neo4j-backend.md.
Suite: 629 -> 632 passed (+3 compose-neo4j tests, all 559
baseline + 50 Neo4j + consistency + ingest + backend-switch
+ compose-neo4j tests preserved). The plan's 632 final-test
target is reached.
Both MCP entry scripts (05_mcp_server.py for stdio and
06_mcp_http_server.py for Streamable HTTP) now select their
graph backend at startup through a shared loader
(scripts/mcp_server_entry.load_graph):
* LORE_GRAPH_BACKEND=pickle (default) — load the
.graph.pkl built by 01_ingest.py.
* LORE_GRAPH_BACKEND=neo4j — connect to Neo4j at
$LORE_NEO4J_URI (default bolt://127.0.0.1:7687) and
load the mirrored graph.
* Anything else — clear error and exit 4.
Exit codes:
* 0: graph loaded. For the supported backends,
load_graph() returns normally; sys.exit() is called
only on the error paths below.
* 1: pickle path missing.
* 2: neo4j_graph not importable.
* 3: neo4j unreachable.
* 4: unknown backend value.
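The dispatch part of the loader can be sketched as below. The `loaders` mapping and the `env` parameter are hypothetical injection points for illustration (the real `load_graph` reads the environment and builds the graph itself); each loader is assumed to handle its own exit codes 1-3, leaving exit 4 to the dispatcher.

```python
import os
import sys
from typing import Callable, Mapping

EXIT_UNKNOWN_BACKEND = 4


def load_graph(loaders: Mapping[str, Callable[[], object]],
               env: Mapping[str, str] = os.environ) -> object:
    """Select a backend from LORE_GRAPH_BACKEND (default 'pickle') and load it."""
    backend = env.get("LORE_GRAPH_BACKEND", "pickle")
    loader = loaders.get(backend)
    if loader is None:
        print(f"unknown LORE_GRAPH_BACKEND={backend!r}; expected one of {sorted(loaders)}",
              file=sys.stderr)
        sys.exit(EXIT_UNKNOWN_BACKEND)
    return loader()  # loaders exit with 1 / 2 / 3 on their own failure paths
```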
Neo4jGraph.__init__ now eagerly calls verify_connectivity()
so the loader fails loudly at startup rather than on the
first query — the driver pool opens sockets lazily otherwise,
and the first session.run would be too late for the
entry scripts to log a clear error.
Refactors:
* scripts/05_mcp_server.py: removed inline _load_graph(),
now imports from scripts.mcp_server_entry.
* scripts/06_mcp_http_server.py: same.
* lore_engine_poc/neo4j_graph.py: Neo4jGraph.__init__
eagerly verifies connectivity.
Tests:
* tests/test_mcp/test_backend_switch.py — 5 docker-gated
tests (pickle default, neo4j up, neo4j down exits 3,
garbage backend exits 4, trivial registry works with
both backends).
Suite: 624 -> 629 passed (+5 backend-switch tests, all 559
baseline + 38 Neo4j + consistency + ingest + backend-switch
tests preserved).
After the in-memory graph + pickle are written, the new --write-neo4j flag
mirrors the full graph into the Neo4j 5 container at
$LORE_NEO4J_URI (default bolt://127.0.0.1:7687). The flag
is opt-in (default off) so the existing test suite's
invocations of 01_ingest.py without Docker still work.
The mirror logic:
* Pre-pass for LoreSource nodes (full metadata via
add_lore_source so SOURCED_FROM links find them with
name, source_type, reliability, source_confidence).
* Pre-pass for bare names (entities registered without
any edge participation — keeps :Entity count in sync
with in-memory all_names()).
* Then the edges, add()-ed one by one.
Failure semantics:
* Neo4j unreachable at startup → log + exit 3.
* neo4j_graph not importable → log + exit 2.
* Pickle is always written before the mirror attempt, so
a flaky Neo4j container never loses the in-memory state.
Consistency runner stability:
_detect_contradictions Pattern 2 (same object, different
subjects) now sorts the two claims alphabetically so
claim_a / claim_b are stable across runs. The
graph.all_names() set iteration order is otherwise
non-deterministic across Python processes and across
the in-memory / Neo4j backends, and the original
dict-iteration insertion order broke when slice 5.4
migrated to all_names().
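Iteration order of a set of strings depends on string hashing, which Python randomises per process unless PYTHONHASHSEED is fixed, so the pair's order can differ between runs. Sorting the pair removes that dependency. A minimal sketch, assuming the claims are strings (`ordered_claims` is a hypothetical helper):

```python
def ordered_claims(claim_x: str, claim_y: str) -> tuple[str, str]:
    """Return (claim_a, claim_b) in lexicographic order, whatever order they arrived in."""
    claim_a, claim_b = sorted((claim_x, claim_y))
    return claim_a, claim_b
```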
Tests:
* tests/test_scripts/test_ingest_neo4j.py — 5 docker-gated
tests (exits zero, entity count, relation count,
default-off untouched, fails loud on unreachable URI).
* tests/test_consistency/test_runner_categories.py — one
test updated to assert claim_a/claim_b as a set rather
than a specific order (matches the runner's new
lexicographic-sort contract).
Suite: 619 -> 624 passed (+5 ingest-neo4j tests, all 559
baseline + 32 Neo4j + consistency + ingest tests preserved).
Eight docker-gated tests covering the write + full-codex
round-trip against the real .graph.pkl fixture:
* Build a Neo4jGraph from the real codex (smoke test).
* Entity count matches in-memory (corrected for the
LoreSource-as-subject edge case).
* Relation count matches in-memory.
* LoreSource count matches in-memory.
* was_true_at Roland / House Raventhorne / 3rd_age.year_345
— both backends agree.
* was_true_at Aldric / Maric sibling query.
* was_true_at Voldramir / Mardonus PART_OF query.
* Add a new edge via the Neo4j backend and verify
was_true_at against Neo4j sees it (write/read
round-trip in the same process).
The mirror helper (_mirror_in_memory_to_neo4j) pre-passes
LoreSource nodes (full metadata) and bare registered names
(entities that don't participate in any edge) so the Neo4j
backend is observationally equal to the in-memory graph
for the full codex.
Suite: 611 -> 619 passed (+8 full-codex tests, all 559
baseline + 27 Neo4j tests preserved).
Implements the read + write surface of Neo4jGraph against the
reified :Relation shape (ADR 0009). The read tools (slice 4) and
the consistency runner / ontology rules (slice 2) are migrated
to use only GraphBackend Protocol methods, so the same Python
code works against both InMemoryGraph and Neo4jGraph.
Reads (Neo4jGraph):
* edges_for_subject(name, relation=None) -> list[Edge]
* edges_for_object(name) -> list[Edge]
* find_edge_by_id(edge_id) -> Edge | None
* by_name, all_names, all_entity_types, entities_of_type,
lore_source (slice 5.3)
* Round-trip: each Edge field is stored as a :Relation node
property and rehydrated on read; Cypher ORDER BY edge_id
so list order matches the in-memory insertion order
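The property round-trip can be sketched with a reduced, hypothetical Edge (the real one carries more fields, such as sources). One Neo4j detail worth handling: a property cannot hold null, and setting it to null removes it, so a None field comes back as a missing key. Rehydration therefore falls back to dataclass defaults and ignores unknown keys.

```python
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class Edge:
    subject: str
    relation: str
    object: str
    valid_from: Optional[str] = None
    valid_until: Optional[str] = None
    edge_id: str = ""


def to_props(edge: Edge) -> dict:
    """Flatten an Edge into the property map stored on its :Relation node."""
    return asdict(edge)


def from_props(props: dict) -> Edge:
    """Rehydrate an Edge: missing keys use defaults, unknown keys are ignored."""
    known = {f.name for f in fields(Edge)}
    return Edge(**{k: v for k, v in props.items() if k in known})
```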
Writes (Neo4jGraph):
* add(edge): MERGE subject + object :Entity nodes, upsert
:Relation (id-keyed), link :FROM/:TO, link :SOURCED_FROM
to each :LoreSource in the edge's sources list
* replace_edge(old_id, new_edge): in-place property update
for same (subject, relation, object); drop+re-add for
different endpoints (preserves edge_id for retcon audit)
* remove_entity(name): DETACH DELETE the :Entity + alias
cleanup; returns the number of edges that were attached
* remove_entity_of_type(name, type_): REMOVE n:Label
* rename_entity(old, new): rename + register old as alias
* resolve_alias, register_name, register_alias, add_lore_source,
add_entity_of_type (slice 5.3)
Migrations (read tools + consistency + ontology):
* tools.py: was_true_at uses graph.edges_for_subject(...)
* read_tools.py: 22 sites of graph.edges_by_subject.get /
.items / .values / graph.edges_by_object.get /
graph.entities_by_type.items / graph.lore_sources.get /
graph.names migrated to the Protocol methods
* consistency_runner.py: 4 sites (all_edges flatten,
anachronism detector, orphan detector)
* ontology_rules.py: 13 sites (10 ontology rules + helper)
* write_tools.py: 3 sites (label membership check, era walk)
CI fence (test_graph_backend_writes.py):
test_no_direct_dict_access_outside_graph_backend now greps for
the broader pattern (bracket, .get, .items, .values, .keys on
graph.edges_by_*, entities_by_type, lore_sources, aliases; and
bare graph.names). Fails the build on regression.
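A sketch of the broader fence pattern and a file scan that skips graph_backend.py. The regex and `fence_violations` are an illustration under the assumption that the fence is a line-oriented grep; the actual test's pattern may differ.

```python
import re
from pathlib import Path

# Indexing or dict-method access on the private indexes, plus bare graph.names.
FENCE_RE = re.compile(
    r"graph\.(?:edges_by_\w+|entities_by_type|lore_sources|aliases)"
    r"(?:\[|\.(?:get|items|values|keys)\b)"
    r"|graph\.names\b"
)


def fence_violations(root: Path, allowed: str = "graph_backend.py") -> list[str]:
    """Return 'path:line: code' for every direct-access hit outside the allowed file."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        if path.name == allowed:
            continue
        for lineno, line in enumerate(path.read_text(encoding="utf-8").splitlines(), 1):
            if FENCE_RE.search(line):
                hits.append(f"{path}:{lineno}: {line.strip()}")
    return hits
```

`graph.all_names()` and `graph.edges_for_subject(...)` do not match, so the Protocol methods stay legal everywhere.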
Parity tests (test_neo4j_read_tools_parity.py): 15 docker-gated
tests, one per read tool, asserting the in-memory and Neo4j
backends produce matching answers for a known fixture.
Suite: 596 -> 611 passed (+15 parity tests, 559 baseline preserved)
- Migrate add_lore_source direct mutation in write_tools.py:189
to graph.add_lore_source(source)
- Migrate update_entity type relabel to use graph.all_entity_types() +
graph.remove_entity_of_type() + graph.add_entity_of_type() (loop
over an explicit list since dict shape is no longer public)
- Migrate retcon (36 lines of inlined index surgery) to single
graph.replace_edge(edge_id, new_edge) call
- Migrate mark_verified (9 lines of inlined index surgery) to
graph.replace_edge(edge_id, new_edge)
- Add all_entity_types() method to InMemoryGraph (returns keys
of entities_by_type; was reachable as a private attr before)
- Add 9 tests in test_graph_backend_writes.py: add_lore_source,
remove_entity_of_type, register_name, entities_of_type,
retcon+mark_verified chokepoint contracts, edges_for_subject
with None, replace_edge preserves list position, CI fence
- CI fence: greps lore_engine_poc/ for graph.edges_by_*[,
graph.entities_by_type[, graph.lore_sources[, graph.aliases[
outside graph_backend.py; fails the build on regression
Suite: 575 -> 584 passed (+9 new tests, 559 baseline preserved)
- Lift Graph dataclass from tools.py into graph_backend.py as
InMemoryGraph (the slice-0/4/10 body, byte-identical).
- New GraphBackend Protocol (PEP 544 + @runtime_checkable) with
14 method points (7 read, 7 write). Mirrors the LLMProvider
pattern in lore_engine_poc/llm.py:47-48.
- tools.Graph is now a back-compat alias (Graph = InMemoryGraph).
Zero test churn across the 559 existing tests.
- New replace_edge(old_id, new_edge) chokepoint. Lifts the
inlined index surgery that lived in write_tools.py retcon +
mark_verified. Same-endpoint swap is in-place; subject/
relation/object change drops + re-adds.
- New helpers: edges_for_subject, edges_for_object, entities_of_type,
lore_source, all_names, add_lore_source, remove_entity_of_type,
register_alias, register_name.
- 16 contract tests in tests/test_tools/test_graph_backend.py.
- Suite: 559 -> 575. No regressions.
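The two replace_edge paths (in-place swap for the same subject/relation/object, drop + re-add otherwise, edge_id preserved either way) can be sketched over a hypothetical `MiniGraph` with three indexes. This is a toy, not the InMemoryGraph body. Note that only the in-place path keeps list position; drop + re-add appends the edge at the end of its new buckets.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Edge:
    subject: str
    relation: str
    object: str
    edge_id: str
    note: str = ""


@dataclass
class MiniGraph:
    edges_by_subject: dict[str, list[Edge]] = field(default_factory=dict)
    edges_by_object: dict[str, list[Edge]] = field(default_factory=dict)
    edges_by_id: dict[str, Edge] = field(default_factory=dict)

    def add(self, edge: Edge) -> None:
        self.edges_by_subject.setdefault(edge.subject, []).append(edge)
        self.edges_by_object.setdefault(edge.object, []).append(edge)
        self.edges_by_id[edge.edge_id] = edge

    def replace_edge(self, old_id: str, new_edge: Edge) -> None:
        old = self.edges_by_id[old_id]
        new_edge = replace(new_edge, edge_id=old_id)  # keep the id for retcon audit
        same = (old.subject, old.relation, old.object) == \
               (new_edge.subject, new_edge.relation, new_edge.object)
        if same:
            # Same endpoints: swap in place so list positions are preserved.
            for bucket in (self.edges_by_subject[old.subject],
                           self.edges_by_object[old.object]):
                bucket[bucket.index(old)] = new_edge
            self.edges_by_id[old_id] = new_edge
        else:
            # Different endpoints: drop from every index, then re-add.
            self.edges_by_subject[old.subject].remove(old)
            self.edges_by_object[old.object].remove(old)
            del self.edges_by_id[old_id]
            self.add(new_edge)
```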
Co-Authored-By: Claude <noreply@anthropic.com>
- Lift _TRIVIAL_REGISTRY (the _Tool dataclass + _echo + _failing
test double) out of test_server.py and test_mcp_http_module.py
into a shared tests/test_mcp/_trivial_registry.py module. The
verbatim duplication was a drift hazard flagged in the slice-11
review (one copy drifts away from the other and tests silently
test different things).
- test_server.py now imports the shared registry.
- New subprocess test: test_subprocess_malformed_json_returns_400
— mirrors the in-process test 5 path over a real socket.
- New subprocess test: test_subprocess_tool_body_exception_returns_is_error
— mirrors the in-process test 9 path over a real socket. Uses
add_entity with no 'name' (real registry's version of the
'failing' tool) since the trivial registry isn't on the wire.
- Tighten _wait_ready regex: anchor on 'Uvicorn running on
http://host:NNNNN' with 4-5 digit port (was matching any
':<digits>' substring — fragile if a future log line contains
an unrelated port).
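A sketch of the tightened readiness match: anchor on the Uvicorn banner and accept only a 4-5 digit port followed by a word boundary. The exact regex in `_wait_ready` may differ; `READY_RE` and `parse_ready_port` are hypothetical names.

```python
import re
from typing import Optional

# 'Uvicorn running on http://<host>:<4-5 digit port>' and nothing looser.
READY_RE = re.compile(r"Uvicorn running on http://[^\s:/]+:(\d{4,5})\b")


def parse_ready_port(line: str) -> Optional[int]:
    """Return the bound port from Uvicorn's startup line, or None for any other line."""
    m = READY_RE.search(line)
    return int(m.group(1)) if m else None
```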
Co-Authored-By: Claude <noreply@anthropic.com>
- New non-root user 'lore' (uid 10001) created early so --chown
works on subsequent COPYs. The final USER directive means
everything reachable from the MCP wire (pickle load, 36 tools
including 12 mutators) runs as UID 10001, not root.
- --chown=lore:lore on all COPY lines.
- Removed the redundant .graph.pkl COPY. The file is already
bundled via the lore_engine_poc/ directory copy, and the explicit
line was hiding that; the README's 'override at runtime'
instruction is restored.
- New test: test_docker_runs_as_non_root — execs 'id -u' inside
the container and asserts the uid is non-zero.
Co-Authored-By: Claude <noreply@anthropic.com>
- MAX_BODY_BYTES = 1 MiB; reject with HTTP 413 + -32600 envelope
before the JSON parser allocates a Python object. Closes the
OOM-by-giant-body DoS vector.
- Drop dead try/except ImportError fallback for ERR_* constants —
always import from mcp_server (same package).
- stream() typing: AsyncIterator[bytes] (was Iterable[bytes]).
- build_app(graph: Graph, tool_registry: list) parameter types.
- Drop unused CONTENT_JSON constant.
- New test: test_post_oversized_body_rejected (HTTP 413).
- New test: test_post_unknown_method_returns_32601 — symmetric
with the stdio server.py coverage of the same path.
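A framework-free sketch of the size guard and its JSON-RPC error envelope (-32600 is the JSON-RPC 2.0 "Invalid Request" code). Checking a declared Content-Length first lets the server refuse before buffering the body; the function name and the optional header argument are hypothetical, not the module's actual API.

```python
from typing import Optional

MAX_BODY_BYTES = 1024 * 1024  # 1 MiB
INVALID_REQUEST = -32600      # JSON-RPC 2.0 "Invalid Request"


def oversized_response(body: bytes,
                       content_length: Optional[str] = None) -> Optional[tuple[int, dict]]:
    """Return (413, envelope) for an oversized request, else None; runs before json.loads."""
    declared = int(content_length) if content_length and content_length.isdigit() else 0
    if declared > MAX_BODY_BYTES or len(body) > MAX_BODY_BYTES:
        return 413, {
            "jsonrpc": "2.0",
            "id": None,  # the request was never parsed, so its id is unknown
            "error": {"code": INVALID_REQUEST,
                      "message": f"request body exceeds {MAX_BODY_BYTES} bytes"},
        }
    return None
```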
Co-Authored-By: Claude <noreply@anthropic.com>
* Dockerfile: python:3.12-slim, layer-cached requirements first,
COPY lore_engine_poc/ and scripts/, bake .graph.pkl, EXPOSE 8765,
HEALTHCHECK via stdlib urllib against POST /mcp initialize.
CMD runs scripts/06_mcp_http_server.py --host 0.0.0.0.
* .dockerignore: exclude __pycache__, tests/, .git/, data/raw/, etc.
* docker-compose.yml: one service lore-engine-mcp, port 8765:8765
(overridable via $LORE_HTTP_PORT), bind mount for graph
override (commented), healthcheck. Image tag overridable via
$TAG.
* tests/test_mcp/test_dockerfile.py: 4 tests gated on docker
availability. Build, run + round-trip, compose up + round-trip,
healthcheck reaches 'healthy'. All 4 pass on this host.
550 -> 554 green.
* scripts/06_mcp_http_server.py: uvicorn entry. Mirrors
05_mcp_server.py's _load_graph() shape. CLI flags --host,
--port, --log-level. Env overrides LORE_GRAPH_PATH,
LORE_HTTP_HOST, LORE_HTTP_PORT. Single-process only; multi-worker
is intentionally not exposed (graph is in-memory per-process and
write tools do not persist).
* tests/test_mcp/test_scripts_06.py: 7 subprocess tests booting
on LORE_HTTP_PORT=0 (OS-assigned) and parsing the bound port
from the 'Uvicorn running on' line. Tests 1-5: initialize,
tools/list (36), was_true_at, SSE, notification 202. Test 6:
SIGTERM exits 0 or -SIGTERM (no traceback). Test 7: missing
graph exits non-zero with the expected error message.
543 -> 550 green.
* Add _wants_sse() and SSE branch in mcp_endpoint:
text/event-stream in Accept -> StreamingResponse with one
'event: message\ndata: <json>\n\n' frame. Default JSON path
unchanged. Empty body now rejected with 400 + -32600
(previously coerced to {}).
* 5 new in-process tests (10-14): Accept routing, SSE body shape,
GET 405, empty body 400. 538 -> 543 green.
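The Accept routing and single-frame SSE body can be sketched as below (`wants_sse` and `sse_frame` are hypothetical stand-ins for `_wants_sse()` and the StreamingResponse body). `json.dumps` without `indent` never emits a raw newline, which matters because a newline inside `data:` would split the SSE event.

```python
import json


def wants_sse(accept_header: str) -> bool:
    """True when the client lists text/event-stream among its Accept media types."""
    media_types = [part.split(";")[0].strip().lower() for part in accept_header.split(",")]
    return "text/event-stream" in media_types


def sse_frame(payload: dict) -> bytes:
    """One 'message' event; the JSON stays on a single data: line."""
    return f"event: message\ndata: {json.dumps(payload)}\n\n".encode("utf-8")
```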
- mcp_tools.TOOL_REGISTRY goes 24 → 36 entries (12 new write tools)
- Exposes: add_entity, add_relation, add_lore_source (slice 4.7 trio
that had been callable from scripts/02_demo.py only), plus
set_alias, update_entity, delete_entity (10.1), retcon,
mark_verified, merge_entities (10.2), define_calendar,
define_era, define_date (10.3)
- Hand-written JSON Schema per tool; trailing-underscore wire
fields (name_, object_) match the Python kwarg convention
used by the underlying functions
- test_tool_registry.py: EXPECTED_TOOLS / EXPECTED_FN grown to 36
entries; the schema-vs-signature drift detector (already in
place) validates the trailing-underscore convention
- test_protocol.py: tools/list count 24 → 36
- test_slice10_dispatch.py: 12 new dispatch tests, one per
new tool; retcon / mark_verified verify envelope shape only
because edge_id doesn't survive a subprocess restart (in-memory
graph) — actual mutation behaviour is covered in
test_write_tools_slice10b.py
- Suite 529/529 green (was 517; +12)
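The schema-vs-signature drift check can be sketched with `inspect.signature`. It assumes each tool function takes the graph as a first parameter named `graph` that the adapter binds, so it is excluded from the comparison; `schema_drift` is a hypothetical name, not the test's helper.

```python
import inspect
from typing import Callable


def schema_drift(fn: Callable, schema: dict) -> set[str]:
    """Names that appear in only one of the schema properties or fn's keyword params."""
    kinds = (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    params = {name for name, p in inspect.signature(fn).parameters.items()
              if name != "graph" and p.kind in kinds}
    return params ^ set(schema.get("properties", {}))
```

A schema that spells `object` where the function expects `object_` shows up as both names in the symmetric difference, which is exactly the trailing-underscore drift the detector guards against.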
Co-Authored-By: Claude <noreply@anthropic.com>
- ALLOWED_LABELS gains 'Date' (Era, Calendar were already there)
- write_tools.define_calendar: name + optional days_per_year / months;
rejects empty/duplicate name and non-positive days/months
- write_tools.define_era: name + calendar + start [+ end]; validates
time bounds; stamps PART_OF Calendar edge and, when applicable,
PRECEDED edge to the most recent prior era in the same calendar
(linear ordering; world-builder can override with retcon)
- write_tools.define_date: calendar + year [+ month + day + era];
canonical time atom is '{era}.year_{Y}.month_{M}.day_{D}' (era
prefix optional); stamps INSTANCE_OF Calendar + DURING Era;
idempotent — calling twice with the same args returns the same
canonical and does not duplicate the date node
- 24 new tests in tests/test_tools/test_write_tools_slice10c.py
- Suite 517/517 green (was 493; +24)
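A sketch of the canonical time atom and the idempotent registration. Two assumptions not stated above: omitted month/day segments are simply dropped, and a day without a month is rejected. `canonical_date` and `register_date` are hypothetical helpers over a plain dict, not the write_tools API.

```python
from typing import Optional


def canonical_date(year: int, month: Optional[int] = None, day: Optional[int] = None,
                   era: Optional[str] = None) -> str:
    """Build '{era}.year_{Y}.month_{M}.day_{D}', omitting absent segments."""
    if day is not None and month is None:
        raise ValueError("day requires month")
    parts = [era] if era else []
    parts.append(f"year_{year}")
    if month is not None:
        parts.append(f"month_{month}")
    if day is not None:
        parts.append(f"day_{day}")
    return ".".join(parts)


def register_date(dates: dict[str, dict], calendar: str, year: int,
                  month: Optional[int] = None, day: Optional[int] = None,
                  era: Optional[str] = None) -> str:
    """Idempotent: a repeat call returns the same canonical and adds no node."""
    key = canonical_date(year, month, day, era)
    if key not in dates:
        dates[key] = {"calendar": calendar, "year": year, "month": month,
                      "day": day, "era": era}
    return key
```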
Co-Authored-By: Claude <noreply@anthropic.com>
- Edge.edge_id: stable per-edge identity (8 hex chars, default factory)
- Graph.edges_by_id: dict[str, Edge] reverse index, populated by add()
- Graph.find_edge_by_id(id): O(1) lookup
- Graph.rename_entity: also registers old name as alias of new canonical
(merge_entities depends on this)
- Graph.remove_entity: keeps edges_by_id consistent with subject/object
indexes
- add_relation: returns the actual edge.edge_id (was fabricating a separate
uuid), so retcon / mark_verified can target it directly
- Edge.retcon_at / retcon_note: audit metadata stamped by retcon
- Edge.verified_by / verified_at / verified_note: stamped by mark_verified
- write_tools.retcon: amend edge bounds/relation/object; in-place mutation
via dataclasses.replace; validates time bounds; refuses inverted bounds
- write_tools.mark_verified: appends (1.0, 1.0, 'human_verified') source
tuple so aggregate confidence floors to 1.0
- write_tools.merge_entities: folds from_name into to_name, refuses if the
two have different labels, preserves from_name as an alias
- 25 new tests in tests/test_tools/test_write_tools_slice10b.py
- Suite 493/493 green (was 468; +25)
Co-Authored-By: Claude <noreply@anthropic.com>
- Graph.aliases: dict[str, set[str]] field; Graph.by_name follows aliases
- Graph.remove_entity(name) -> int: cascades through edges_by_subject/object,
type index, and aliases; returns edges removed
- Graph.rename_entity(old, new) -> int: re-points edges via dataclasses.replace,
preserves old name as alias of new canonical
- write_tools.set_alias: register alt name; rejects empty / duplicate
- write_tools.update_entity: label/rename with edge cascade; name_ kwarg to
avoid colliding with positional name (MCP layer maps user-facing name JSON
field to name_)
- write_tools.delete_entity: removes entity + all touching edges + aliases
- 20 new tests in tests/test_tools/test_write_tools_slice10.py
- Suite 468/468 green (was 448; +20)
Co-Authored-By: Claude <noreply@anthropic.com>
The 12 read_tools in lore_engine_poc.read_tools (entity_context,
true_during, entities_present, timeline, list_lineage, list_offspring,
ancestors_of, descendants_of, location_hierarchy, event_chain,
events_during, lore_about) were already implemented + unit-tested
in tests/test_tools/ but had not been exposed over the MCP wire.
This slice is pure registration: hand-written JSON Schema + adapter
binding for each tool, no changes to the underlying functions.
- mcp_tools.py: TOOL_REGISTRY goes from 12 → 24 entries. Docstring
updated to reflect the new total.
- test_tool_registry.py: EXPECTED_TOOLS / EXPECTED_FN grown to 24
entries; new tools' signatures cross-checked against the schema
by the existing schema-vs-signature test (caught zero drift).
- test_protocol.py: tools/list test updated to 24 tools; the
"multiple requests on one connection" test likewise.
- test_slice9_dispatch.py: 13 new subprocess tests, one per new
tool (entity_context has 2: happy path + unknown entity). Each
test boots scripts/05_mcp_server.py and verifies the response
shape against the real seed codex.
Live smoke: mcp_server.tools/list returns 24 tools, and tools/call
returns correct data for list_offspring, ancestors_of, etc.
448/448 tests pass (was 435 pre-slice; +13 from new dispatch tests).
- 01_ingest.py: LORE_INGEST_LLM=1 enables LLM extraction after the
deterministic path; build_graph is now called AFTER LLM triples
merge in (the 3.4 ordering fix).
- LORE_INGEST_FAKE_LLM=1 + LORE_INGEST_FAKE_LLM_SCRIPT=path selects
FakeProvider for offline/CI runs.
- Missing OLLAMA_API_KEY degrades gracefully: stderr warning, rc=0,
deterministic graph still built (no crash, no LLM triples).
- scripts/06_llm_smoke.py: one-shot manual smoke for the real
Ollama Cloud provider; loads one NPC, runs extractor, prints
triples. Skips (rc=0, helpful message) when OLLAMA_API_KEY unset.
- FakeProvider gains dict-style {match_any, response} / {match_any,
raise} entries so tests can skip exact-prompt matching when the
body is large.
- tests/test_extraction/test_ingest_wiring.py: 8 subprocess tests
covering default-off, enabled, idempotency (x2), adds-fact,
provider-failure tolerance, bad-JSON tolerance, and missing-key
fallback.
- tests/fixtures/llm_empty_script.json: [] (used by the enabled-
path test where no triples are expected).
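The opt-in flags and the 3.4 ordering fix amount to a small provider-selection step followed by a merge-then-build step. The sketch below assumes placeholder provider classes and hypothetical select_provider/ingest helpers, so the real 01_ingest.py wiring may differ. It also assumes LORE_INGEST_FAKE_LLM is only consulted when LORE_INGEST_LLM=1.

```python
import os
import sys
from collections.abc import Mapping
from typing import Any, Callable, Iterable, Optional


class FakeProvider:
    """Placeholder for the scripted offline provider."""
    def __init__(self, script_path: Optional[str]) -> None:
        self.script_path = script_path


class OllamaCloudProvider:
    """Placeholder for the real Ollama Cloud provider."""
    def __init__(self, api_key: str) -> None:
        self.api_key = api_key


def select_provider(env: Mapping[str, str] = os.environ) -> Optional[object]:
    """Return an LLM provider, or None to run deterministic-only ingest."""
    if env.get("LORE_INGEST_LLM") != "1":
        return None  # default-off: deterministic path only
    if env.get("LORE_INGEST_FAKE_LLM") == "1":
        return FakeProvider(env.get("LORE_INGEST_FAKE_LLM_SCRIPT"))
    api_key = env.get("OLLAMA_API_KEY")
    if not api_key:
        # Degrade gracefully: warn on stderr, keep rc=0, skip LLM triples.
        print("warning: OLLAMA_API_KEY unset; skipping LLM extraction", file=sys.stderr)
        return None
    return OllamaCloudProvider(api_key)


def ingest(
    deterministic_triples: Iterable[tuple],
    extract_llm_triples: Callable[[object], Iterable[tuple]],
    provider: Optional[object],
    build_graph: Callable[[list], Any],
) -> Any:
    """Merge LLM triples into the deterministic set first, then build the graph."""
    triples = list(deterministic_triples)
    if provider is not None:
        triples.extend(extract_llm_triples(provider))
    return build_graph(triples)
```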
435/435 tests pass (was 382 pre-slice; +53). End-to-end ingest with
--skip-cognee runs cleanly on default-off path.
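The dict-style FakeProvider entries let a scripted response match on a substring instead of the full prompt body. This is a sketch with several assumptions: match_any is a list of substrings, the first matching entry wins, a "raise" entry simulates a provider failure, and an unmatched prompt yields "[]" (no triples), which is how an empty script like llm_empty_script.json could behave. The complete() method name and ProviderError type are hypothetical.

```python
from typing import Any


class ProviderError(RuntimeError):
    """Hypothetical error type for scripted provider failures."""


class FakeProvider:
    def __init__(self, script: list[dict[str, Any]]) -> None:
        self.script = script

    def complete(self, prompt: str) -> str:
        # First entry whose match_any substrings hit the prompt wins.
        for entry in self.script:
            if any(needle in prompt for needle in entry.get("match_any", [])):
                if "raise" in entry:
                    raise ProviderError(entry["raise"])
                return entry["response"]
        # Assumed: unmatched prompts produce an empty JSON list of triples.
        return "[]"
```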
- scripts/04_consistency.py: standalone on-demand run of the consistency
engine over the seed codex; prints summary + per-category detail
- scripts/02_demo.py: append consistency section using the singleton
consistency_tools path so 'latest_run()' agrees with the run summary
- tests/test_consistency/test_consistency_script.py: 2 tests (end-to-end
run + --codex flag)
- tests/test_consistency/test_demo.py: 2 tests (end-to-end run + --query
flag exercises the consistency section)
- 249/249 tests pass
- New lore_engine_poc/consistency_config.py: ConsistencyConfig dataclass
with disable_rules[], severity (default 'warn' per AC 2.8),
confidence_threshold (per-rule floor), acknowledged set (AC 2.9).
is_disabled(rule_id), is_acknowledged(id), acknowledge(id) helpers.
- ConsistencyRunner.run() now accepts an optional config parameter;
applies severity override, skips disabled rules, suppresses below
threshold, suppresses acknowledged violations.
- Anachronism dataclass now carries source_confidences (parallel
to sources) so confidence_threshold can suppress low-confidence
findings. Default = 1.0 when not set.
- get_anachronisms() got an include_flagged param (default False);
flagged violations are hidden by default.
- 9/9 new tests; full suite 245/245 (was 236).
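A minimal sketch of the config and the filtering it drives. The field and helper names follow the bullets above. Everything else is an assumption: the Violation shape, the apply_config helper, confidence_threshold as a rule_id → floor dict, and scoring a violation by its lowest source confidence. Disabled rules are modeled here as a post-filter, whereas the runner skips them before they run.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Violation:
    id: str
    rule_id: str
    severity: str = "warn"
    source_confidences: tuple[float, ...] = ()


@dataclass
class ConsistencyConfig:
    disable_rules: list[str] = field(default_factory=list)
    severity: str = "warn"  # default per AC 2.8
    confidence_threshold: dict[str, float] = field(default_factory=dict)  # per-rule floor
    acknowledged: set[str] = field(default_factory=set)  # AC 2.9

    def is_disabled(self, rule_id: str) -> bool:
        return rule_id in self.disable_rules

    def is_acknowledged(self, violation_id: str) -> bool:
        return violation_id in self.acknowledged

    def acknowledge(self, violation_id: str) -> None:
        self.acknowledged.add(violation_id)


def apply_config(found: list[Violation], cfg: ConsistencyConfig) -> list[Violation]:
    kept = []
    for v in found:
        if cfg.is_disabled(v.rule_id) or cfg.is_acknowledged(v.id):
            continue
        # No recorded confidences counts as 1.0, matching the Anachronism default.
        confidence = min(v.source_confidences, default=1.0)
        if confidence < cfg.confidence_threshold.get(v.rule_id, 0.0):
            continue
        kept.append(replace(v, severity=cfg.severity))  # severity override
    return kept
```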
Co-Authored-By: Claude <noreply@anthropic.com>
- New lore_engine_poc/consistency_tools.py: 10 tools from
docs/05-mcp-tools.md. Each is a thin function over a singleton
ConsistencyRunner plus its last-violations list.
Tool                     Function
-----------------------  ---------------------------------------------
run_consistency_check    Force-run the engine; returns ConsistencyRun
latest_run               Most recent run summary (or None)
get_contradictions       Filter by subject/severity/limit
get_anachronisms         Filter by entity/limit
get_orphans              Filter by reason/limit
get_ontology_violations  Filter by rule_id/severity/limit
flag_for_review          Set flagged=True on a violation (acknowledge)
explain_violation        Return rule + edges + sources for a violation
add_ontology_rule        Register a new rule (ValueError on dup id)
list_ontology_rules      All registered rules (≥10 starter rules)
- Tools share a module-level singleton ConsistencyRunner. Tests
reset via the private _reset_runner hook.
- 19/19 new tests; full suite 236/236 (was 217).
Co-Authored-By: Claude <noreply@anthropic.com>
- New lore_engine_poc/ontology_rules.py: 10 starter rules from
docs/05-mcp-tools.md#starter-rules, each as a pure-Python callable
that takes a Graph and returns a list of OntologyViolation nodes.
Rule ids: no-overlapping-rulers, no-overlapping-spouses,
no-anachronism-participation, no-anachronism-rule, no-orphan-events,
no-orphan-locations, lineage-continuity, magic-system-coherence,
deity-worship-coherence, item-lineage. Severity: 'error' for the two
no-overlapping-* rules, 'warn' for the rest per AC 2.8.
- ConsistencyRunner.run() now invokes every registered rule (Category C)
in addition to A/B/D. rules_run=10 in the ConsistencyRun summary.
- Improved _edge_window_overlap: standard half-open interval-overlap test
[a_from, a_until) ∩ [b_from, b_until) ≠ ∅, i.e. a_from < b_until and
b_from < a_until (so a_from == b_until is NOT overlap).
- 13/13 new tests; full suite 217/217 (was 204).
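The half-open test reduces to two strict comparisons: [a_from, a_until) and [b_from, b_until) overlap exactly when a_from < b_until and b_from < a_until. Windows that merely touch therefore do not overlap. In the sketch below, time values are integers and None means an unbounded endpoint. The Edge and OntologyViolation shapes, the "RULES" predicate name, and passing a list of edges rather than a Graph are simplifications for illustration, not the real ontology_rules.py API.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass(frozen=True)
class Edge:
    subject: str
    predicate: str
    obj: str
    valid_from: Optional[int] = None   # None = unbounded past (assumption)
    valid_until: Optional[int] = None  # None = still true (assumption)


def edge_window_overlap(a: Edge, b: Edge) -> bool:
    """Half-open [from, until) overlap: touching endpoints do NOT overlap."""
    neg_inf, pos_inf = float("-inf"), float("inf")
    a_from = neg_inf if a.valid_from is None else a.valid_from
    a_until = pos_inf if a.valid_until is None else a.valid_until
    b_from = neg_inf if b.valid_from is None else b.valid_from
    b_until = pos_inf if b.valid_until is None else b.valid_until
    return a_from < b_until and b_from < a_until


@dataclass(frozen=True)
class OntologyViolation:
    rule_id: str
    severity: str
    edges: tuple[Edge, ...]


def no_overlapping_rulers(edges: list[Edge]) -> list[OntologyViolation]:
    """Flag two different rulers of the same realm with overlapping windows."""
    rules = [e for e in edges if e.predicate == "RULES"]
    found = []
    for a, b in combinations(rules, 2):
        if a.obj == b.obj and a.subject != b.subject and edge_window_overlap(a, b):
            found.append(OntologyViolation("no-overlapping-rulers", "error", (a, b)))
    return found
```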
Co-Authored-By: Claude <noreply@anthropic.com>
- New lore_engine_poc/consistency.py: 4 violation dataclasses (Contradiction,
Anachronism, Orphan, OntologyViolation) + ConsistencyRun summary node.
All severity=warn by default per AC 2.8; flagged=False for acknowledge
mechanism per AC 2.9; 4 distinct classes per AC 2.1.
- New lore_engine_poc/consistency_runner.py: ConsistencyRunner walks an
in-memory Graph and emits:
* Category A (Contradiction) — two patterns: same-subject-different-object
(e.g. Aldric in two Factions at once) and same-object-different-subject
(e.g. two family trees give Aldric different fathers). Time-window
overlap required.
* Category B (Anachronism) — Person participates in event outside
inferred lifespan. Lifespan inferred from MEMBER_OF(Lineage) edges.
* Category D (Orphan) — entity with no outgoing edges and not referenced
as object anywhere. Uses pre-baked reason vocabulary from
docs/04-consistency.md §Category D.
* ConsistencyRun summary node with id, started_at, finished_at,
duration_ms, rules_run, *_found counts. latest_run() returns the
most recent summary.
- 21/21 new tests pass; full suite 204/204 (was 183; no slice 0/1 regressions).
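The two Category A patterns and the Category D orphan check can be sketched over simple edge tuples. This sketch makes several assumptions. The SINGLE_OBJECT and SINGLE_SUBJECT sets of single-valued predicates, including the IN_FACTION predicate name, are hypothetical. Windows use a half-open overlap with infinite defaults. The orphan check returns bare ids without the reason vocabulary.

```python
from itertools import combinations
from typing import NamedTuple


class Edge(NamedTuple):
    subject: str
    predicate: str
    obj: str
    start: float = float("-inf")
    end: float = float("inf")


# Hypothetical single-valued predicates:
# one object per subject at a time (one Faction) ...
SINGLE_OBJECT = {"IN_FACTION"}
# ... or one subject per object (one father).
SINGLE_SUBJECT = {"FATHER_OF"}


def overlaps(a: Edge, b: Edge) -> bool:
    """Half-open [start, end) windows; touching endpoints do not overlap."""
    return a.start < b.end and b.start < a.end


def contradictions(edges: list[Edge]) -> list[tuple[Edge, Edge]]:
    found = []
    for a, b in combinations(edges, 2):
        if a.predicate != b.predicate or not overlaps(a, b):
            continue  # a time-window overlap is required
        same_subject = (a.predicate in SINGLE_OBJECT
                        and a.subject == b.subject and a.obj != b.obj)
        same_object = (a.predicate in SINGLE_SUBJECT
                       and a.obj == b.obj and a.subject != b.subject)
        if same_subject or same_object:
            found.append((a, b))
    return found


def orphans(entity_ids: set[str], edges: list[Edge]) -> set[str]:
    """Entities that are never a subject (no outgoing edge) and never an object."""
    touched = {e.subject for e in edges} | {e.obj for e in edges}
    return entity_ids - touched
```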
Co-Authored-By: Claude <noreply@anthropic.com>