Per docs/plan/exec/07-harness.md sub-slice 7.2:
- lore_engine_poc/prompts/system_prompt.md — the
canonical system prompt. 5 question types with
canonical tool sequences, the citation rule
("cite every claim"), the time-window rule
(default at_time, explicit time in answer), the
contradiction rule (surface, don't resolve), the
6 failure modes the LLM must avoid. v1.2-aware:
mentions the slice 5T TypeTemplate tools and the
slice 6 Setting/Plane setting= filter.
- lore_engine_poc/prompts/registry.json — the
version registry. Pins the system prompt to v1.2
with model_target=minimax-m3:cloud. Old runs stay
comparable when the prompt iterates (D3).
- lore_engine_poc/prompts/loader.py — the loader.
list_registered_prompts() and load_current_system_prompt()
are the canonical entry points; the harness
runner uses them to fetch the prompt + stamp
results with the version.
- tests/harness/test_system_prompt.py — 9 tests:
registry well-formed, system_prompt registered,
path resolves, loader returns (text, version),
prompt has 5 question types, citation rule
present, time-window rule present, mentions
template tools, mentions setting filter.
Track A only (no API key). Track B uses the loader
when executing the harness.
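A minimal sketch of a version-pinned prompt loader, for illustration only. The registry layout (a name mapped to "version" and "path" keys) and the load_current_prompt / demo helpers are assumptions, not the actual contents of registry.json or loader.py.

```python
import json
import tempfile
from pathlib import Path


def load_current_prompt(prompts_dir: Path, name: str) -> tuple[str, str]:
    """Return (text, version) for the registered prompt `name`.

    Assumed (hypothetical) registry layout:
    {name: {"version": "...", "path": "..."}}.
    """
    registry = json.loads((prompts_dir / "registry.json").read_text(encoding="utf-8"))
    entry = registry[name]
    text = (prompts_dir / entry["path"]).read_text(encoding="utf-8")
    return text, entry["version"]


def demo() -> tuple[str, str]:
    # Build a throwaway prompts directory so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "system_prompt.md").write_text("Cite every claim.", encoding="utf-8")
        (root / "registry.json").write_text(
            json.dumps({"system_prompt": {"version": "1.2", "path": "system_prompt.md"}}),
            encoding="utf-8",
        )
        return load_current_prompt(root, "system_prompt")
```

Returning the version alongside the text is what lets a runner stamp each result, so runs made before and after a prompt change stay distinguishable.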
Suite: 767 → 776 (+9).
Co-Authored-By: Claude <noreply@anthropic.com>
Per docs/plan/exec/07-harness.md sub-slice 7.1:
- tests/harness/questions.yaml — the human-friendly
YAML source. 50 questions across the 5 design-doc
types (10 each): identity, time_fact, world_state,
causal, narrative. Each question pins id, type,
query, expected_tools, expected_answer_shape, and
expected_citations. Targets the Mardonari codex
(the slice 0 fixture) so the harness can run
end-to-end against the real graph.
- tests/harness/questions.json — the compiled JSON
(committed so the runner reads it without rebuilding).
- scripts/harness/build_questions.py — the strict
compiler. Validates the YAML schema, counts questions
per type, enforces uniqueness, writes the JSON.
Validation errors fail loudly with field paths.
- tests/harness/test_questions.py — 6 tests pinning the
contract: schema, 50 total, 10 per type, expected_tools
non-empty, ids unique, version set.
Track A only (no API key needed). Track B (executing
against the live LLM) is gated on $OLLAMA_API_KEY.
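The contract checks above could be sketched roughly as follows. This is not the real build_questions.py: it skips YAML parsing, and validate_questions plus the error-string format are hypothetical. Only the field names and the five type names come from the description above.

```python
from collections import Counter

QUESTION_TYPES = ("identity", "time_fact", "world_state", "causal", "narrative")
REQUIRED_FIELDS = ("id", "type", "query", "expected_tools",
                   "expected_answer_shape", "expected_citations")


def validate_questions(questions: list[dict], per_type: int = 10) -> list[str]:
    """Return error strings with field paths; an empty list means valid."""
    errors: list[str] = []
    seen_ids: set[str] = set()
    for i, q in enumerate(questions):
        for field_name in REQUIRED_FIELDS:
            if field_name not in q:
                errors.append(f"questions[{i}].{field_name}: missing")
        if q.get("type") not in QUESTION_TYPES:
            errors.append(f"questions[{i}].type: unknown type {q.get('type')!r}")
        if "expected_tools" in q and not q["expected_tools"]:
            errors.append(f"questions[{i}].expected_tools: must be non-empty")
        qid = q.get("id")
        if qid is not None:
            if qid in seen_ids:
                errors.append(f"questions[{i}].id: duplicate id {qid!r}")
            seen_ids.add(qid)
    # Per-type quota: every type must appear exactly `per_type` times.
    counts = Counter(q.get("type") for q in questions)
    for qtype in QUESTION_TYPES:
        if counts[qtype] != per_type:
            errors.append(f"type {qtype}: expected {per_type}, got {counts[qtype]}")
    return errors
```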
Suite: 761 → 767 (+6).
Co-Authored-By: Claude <noreply@anthropic.com>
Two changes:
1. apply_plane_migration now counts only *newly
materialised* planes (existence-check before add),
so the PlaneMigrationSummary.planes_added field
reflects the graph delta on re-runs. Previously it
counted plans processed, which made "idempotency"
invisible in the summary shape.
2. New tests/test_planes/test_killer_demo.py — the
slice 6 end-to-end regression net. Wires the 6.4
backfill + 6.6 migration + 6.5 setting filter +
6.2 graph layer in one graph. Pins:
- cross-setting facts are filtered out under
setting="mardonari"
- within-setting facts survive the filter
- entities_present + events_during honour
the setting filter
- the full slice 6 pipeline is idempotent at
both the plane layer and the LAYER_OF edge
layer
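A small sketch of the existence-check-before-add counting described in change 1. TinyGraph and the planes_seen field are hypothetical stand-ins; only planes_added and the function name come from the text above, and the real signature may differ.

```python
from dataclasses import dataclass, field


@dataclass
class PlaneMigrationSummary:
    planes_added: int = 0   # graph delta: planes newly materialised
    planes_seen: int = 0    # hypothetical: candidates processed


@dataclass
class TinyGraph:
    """Hypothetical stand-in for the graph backend; stores plane ids only."""
    planes: set[str] = field(default_factory=set)


def apply_plane_migration(graph: TinyGraph, plane_ids: list[str]) -> PlaneMigrationSummary:
    summary = PlaneMigrationSummary()
    for plane_id in plane_ids:
        summary.planes_seen += 1
        if plane_id in graph.planes:
            continue  # already materialised: no delta, so do not count it
        graph.planes.add(plane_id)
        summary.planes_added += 1
    return summary
```

With this shape a second run over the same input reports planes_added == 0, which makes idempotency visible in the summary.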
Suite: 756 → 761 (+5).
Co-Authored-By: Claude <noreply@anthropic.com>
Adds the slice 6.6 migration as a library helper
(lore_engine_poc.migration: scan_codex_for_planes /
apply_plane_migration) plus a thin CLI wrapper at
scripts/05_migrate_planes.py.
Discriminator: frontmatter signals — plane: true,
tags contains plane, OR type: plane — promote an
entry to :Plane. Default Material Plane convention
remains mardonari.material (added by the 6.4
backfill). Idempotent: re-running on the same codex
produces the same graph state.
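The discriminator could look roughly like this. is_plane is a hypothetical name, and accepting a bare string for tags is an assumption about how frontmatter might be written.

```python
def is_plane(frontmatter: dict) -> bool:
    """True when any of the three frontmatter signals marks the entry as a plane."""
    tags = frontmatter.get("tags") or []
    if isinstance(tags, str):
        # Assumption: a single tag may be written as a bare string.
        tags = [tags]
    return (
        frontmatter.get("plane") is True
        or "plane" in tags
        or frontmatter.get("type") == "plane"
    )
```

Tag matching is exact, so a tag such as "planeswalker" does not promote an entry.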
LAYER_OF edges are created only between co-referenced
Planes (markdown body [[X]] → Plane node X). Direction
follows docs/17-planes.md: (:Plane A)-[:LAYER_OF]->
(:Plane B) where A is the layer and B is the parent.
In the voldramir fixture, Voldramir(demiplane) LAYER_OF
Underdark(plane).
The script supports --dry-run (print planned changes,
exit 0) and --codex / --setting overrides.
Suite: 747 → 756 tests (+9). No regressions.
Co-Authored-By: Claude <noreply@anthropic.com>
Per AC 6.5, 6.6 — adds a keyword-only 'setting' parameter to:
- lookup (filter on matched name's setting membership)
- entity_context (entity-level filter; empty shape if excluded)
- was_true_at (both subject and object must be in setting)
- true_during (subject must be in setting)
- entities_present (located entity must be in setting)
- events_during (event subject must be in setting)
The filter resolves via graph.setting_entities(setting_id) — O(1)
reverse lookup from slice 6.2's Protocol methods. An unknown
setting returns empty results (defensive). Omitting 'setting'
preserves slice 4 / 9 behaviour (back-compat fence).
MCP tool schemas updated for all 6 entries to expose 'setting'
as an optional [string, null] parameter; opt_string_params
toggled so 'null' is coerced to None by the dispatcher.
The cross-setting fact test (Roland ENCOUNTERED The Wanderer)
is the canonical LLM-target: with setting='mardonari', Roland's
home setting, the answer is was_true=False because The Wanderer
is in the_wild_dream.
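A toy sketch of the filter semantics for was_true_at, using the cross-setting example above. The time argument is omitted for brevity, and SETTING_ENTITIES, FACTS, and was_true are hypothetical stand-ins for the real graph and tool.

```python
from typing import Optional

# Hypothetical fixture data: setting membership and one fact.
SETTING_ENTITIES = {
    "mardonari": {"Roland"},
    "the_wild_dream": {"The Wanderer"},
}
FACTS = {("Roland", "ENCOUNTERED", "The Wanderer")}


def was_true(subject: str, relation: str, obj: str, *,
             setting: Optional[str] = None) -> bool:
    if (subject, relation, obj) not in FACTS:
        return False
    if setting is None:
        return True  # back-compat: omitting the filter changes nothing
    # An unknown setting resolves to an empty membership set.
    members = SETTING_ENTITIES.get(setting, set())
    # Both subject and object must belong to the setting.
    return subject in members and obj in members
```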
+8 tests (739 → 747). All green. No regressions.
Adds lore_engine_poc.migration.migrate_setting_id_to_exists_in()
— idempotent helper that materialises a :Setting node, a default
Material Plane, and per-entity EXISTS_IN facts. Used by:
- 01_ingest.py on every ingest (safe; idempotent)
- scripts/05_migrate_planes.py (slice 6.6 region↔plane pass)
Per docs/17-planes.md: the Material Plane's id is
{setting_id}.material and its kind is 'material'. The EXISTS_IN
fact is the timeless type-assertion (per docs/17-planes.md +
ADR 0009); time-bounded membership is the slice 6.5 reified
:Relation work.
API:
migrate_setting_id_to_exists_in(
graph, setting_id, *,
entity_ids=None, # explicit; no all_names() walk
current_era='unspecified',
schema_version='1.2',
kind='campaign',
) -> BackfillSummary
Returns a BackfillSummary so the caller can verify what changed
(test surface: 7 tests covering idempotency, default-plane
materialisation, partial overlap, multi-setting isolation, and
custom metadata).
+7 tests (732 → 739). All green. No regressions.
Pins the contract for REFLECTS, LAYER_OF, ADJACENT_TO,
ACCESSIBLE_VIA: each is a typed, timeless Edge (Layer 1), NOT a
reified :Relation node (Layer 2). Per docs/17-planes.md, plane
relations are structural — they don't carry per-entity time
bounds; reified :Relation is reserved for time-bounded facts
(slice 6.5).
Tests cover round-trip through InMemoryGraph's subject / object
/ id indexes, timelessness (valid_from/valid_until both None),
and the non-reified property (no :Relation node created on add).
The 4 edge names are pinned in EDGE_TYPES per slice 6.1.
+6 tests (726 → 732). All green. No regressions.
Extends the GraphBackend Protocol with 8 new methods (add_setting,
find_setting, add_plane, find_plane, planes_in_setting,
add_exists_in, entity_planes, setting_entities). InMemoryGraph
implements them with O(1) reverse lookups (planes_by_setting,
entities_by_setting, settings_by_entity). Neo4jGraph gains
NotImplementedError stubs so isinstance(neo4j, GraphBackend) keeps
passing until the slice 6 follow-up mirrors the Cypher.
EXISTS_IN is the timeless type-assertion per docs/17-planes.md;
time-bounded membership is the slice 6.5 reified :Relation work.
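A sketch of why NotImplementedError stubs keep the isinstance check passing: a @runtime_checkable Protocol only checks that the methods exist, not what they do. SettingBackend is a hypothetical two-method slice of the Protocol, and InMemory / Neo4jStub are illustrative classes, not the real ones.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SettingBackend(Protocol):
    """Hypothetical two-method slice of the Protocol, for illustration."""

    def add_exists_in(self, entity: str, setting_id: str) -> None: ...
    def setting_entities(self, setting_id: str) -> set[str]: ...


class InMemory:
    def __init__(self) -> None:
        self.entities_by_setting: dict[str, set[str]] = {}
        self.settings_by_entity: dict[str, set[str]] = {}

    def add_exists_in(self, entity: str, setting_id: str) -> None:
        # Maintain both directions so either lookup is a single dict access.
        self.entities_by_setting.setdefault(setting_id, set()).add(entity)
        self.settings_by_entity.setdefault(entity, set()).add(setting_id)

    def setting_entities(self, setting_id: str) -> set[str]:
        return set(self.entities_by_setting.get(setting_id, ()))


class Neo4jStub:
    def add_exists_in(self, entity: str, setting_id: str) -> None:
        raise NotImplementedError

    def setting_entities(self, setting_id: str) -> set[str]:
        raise NotImplementedError
```

The trade-off is that the structural check says nothing about behaviour, so a stub passes isinstance but still fails at call time.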
+8 tests (718 → 726). All green. No regressions.
Promotes Setting and Plane from the write-tools allowlist to
first-class Layer-1 NODE_LABELS. Adds 5 plane-relation edge types
to EDGE_TYPES (EXISTS_IN, REFLECTS, LAYER_OF, ADJACENT_TO,
ACCESSIBLE_VIA).
Per docs/17-planes.md: EXISTS_IN is the timeless type-assertion
that an entity belongs to a Setting; time-bounded planar
membership is carried by a separate reified :Relation (slice 6.5).
+6 tests (712 → 718). All green. No regressions.
Templates module (package init) and the runtime registry:
* TemplateRegistry
- .reload() rescans templates_dir and rebuilds the
{template_id: TemplateSpec} dict atomically
- .query(template_id, qid, args, graph) dispatches to the
graph backend via query_cypher()
- Duplicate template ids in the same dir raise
TemplateRegistryError with both source paths
- Every loaded query is re-validated through the Cypher
allowlist, so a bad body never reaches a backend
- Coerces string-typed optional params the same way the
core tools do ("null" / "None" / "" -> None)
* dynamic_tools.build_dynamic_tools(registry)
- One MCP ToolEntry per declared query: name = query.id,
description = query.description, inputSchema derived
from parameters (required = non-optional names).
- One list_template_tools discovery entry that returns
{templates: [{id, queries: [...]}, ...]} and accepts
an optional template_id filter.
- Same _make_adapter shape as the core tools so the
existing MCPServer dispatcher serves them with no
changes.
* 15 dedicated tests in test_template_registry.py cover:
empty dir, single template, query dispatch, duplicate-id
rejection, reload invalidates cache, dynamic tool schema
shape, list_template_tools (with and without filter),
merge into TOOL_REGISTRY visible in tools/list, end-to-end
tool call (with and without rows), missing required param
yields [], unknown template id / query id raise, and the
defining test: drop a new template file, reload(), see a
new tool appear with no code change.
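A rough sketch of two pieces described above: the optional-string coercion and the inputSchema derivation. The parameter spec shape ({"name", "type", "optional"}) and both function names are assumptions; the real TemplateSpec fields may differ.

```python
from typing import Any, Optional


def coerce_optional(value: Any) -> Optional[Any]:
    """Map the string spellings of 'no value' to None; pass others through."""
    if isinstance(value, str) and value in ("null", "None", ""):
        return None
    return value


def build_input_schema(parameters: list[dict]) -> dict:
    """Derive a JSON Schema object from hypothetical parameter specs.

    Each spec is assumed to look like
    {"name": ..., "type": ..., "optional": bool}.
    """
    properties: dict[str, dict] = {}
    required: list[str] = []
    for p in parameters:
        if p.get("optional"):
            # Optional params also accept null on the wire.
            properties[p["name"]] = {"type": [p["type"], "null"]}
        else:
            properties[p["name"]] = {"type": p["type"]}
            required.append(p["name"])  # required = non-optional names
    return {"type": "object", "properties": properties, "required": required}
```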
Suite: 685 -> 700 (+15).
Co-Authored-By: Claude <noreply@anthropic.com>
The safety boundary for every template query. Every Cypher
body passes through validate() before it ever reaches the
in-memory matcher or the Neo4j driver.
What the allowlist does:
- Accepts read-only constructs: MATCH, OPTIONAL MATCH, WHERE,
RETURN, ORDER BY, SKIP, LIMIT.
- Accepts allowlisted aggregate and scalar functions in RETURN
(count, coalesce, min, max, sum, avg, toLower, toUpper,
length, id).
- Rejects mutations and control flow: CREATE, MERGE, SET,
DELETE, DETACH, REMOVE, CALL, UNION, WITH, FOREACH, LOAD,
USING -- with the offending keyword and a 1-based line
number.
- Rejects variable-length path patterns ('*', '*1..3').
- Enforces parameter consistency between the body and the
template's parameters: section (typo guard both ways).
- The body is never concatenated with parameter values;
the in-memory matcher / Neo4j driver binds $name via
the parameter API, so a value like "O'Brien); MERGE ... "
is a string, not Cypher.
The allowlist is the safety boundary. Everything downstream
trusts the body once validate() returns.
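A deliberately naive sketch of the rejection rules, not the real validate(). It scans raw text line by line, so unlike a tokenizing validator it would also flag keywords inside string literals or the STARTS WITH operator. CypherRejected and the message format are hypothetical.

```python
import re

FORBIDDEN = ("CREATE", "MERGE", "SET", "DELETE", "DETACH", "REMOVE", "CALL",
             "UNION", "WITH", "FOREACH", "LOAD", "USING")
_KEYWORD_RE = re.compile(r"\b(" + "|".join(FORBIDDEN) + r")\b", re.IGNORECASE)
_VAR_LENGTH_RE = re.compile(r"\[[^\]]*\*[^\]]*\]")  # '*' inside [...] of a pattern
_PARAM_RE = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


class CypherRejected(ValueError):
    """Hypothetical error type for a rejected template body."""


def validate(body: str, declared_params: set[str]) -> None:
    for lineno, line in enumerate(body.splitlines(), start=1):  # 1-based lines
        match = _KEYWORD_RE.search(line)
        if match:
            raise CypherRejected(
                f"line {lineno}: forbidden keyword {match.group(1).upper()}")
        if _VAR_LENGTH_RE.search(line):
            raise CypherRejected(f"line {lineno}: variable-length path pattern")
    # Typo guard both ways: every $name used must be declared, and vice versa.
    used = set(_PARAM_RE.findall(body))
    if used != declared_params:
        raise CypherRejected(
            f"parameter mismatch: undeclared={sorted(used - declared_params)}, "
            f"unused={sorted(declared_params - used)}")
```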
Suite: 665 -> 685 (+20).
Co-Authored-By: Claude <noreply@anthropic.com>
Lock the TemplateSpec validation surface with 18 dedicated tests:
- valid template populates every field (round-trip)
- missing template.id / unknown field type rejected
- enum without values / empty values rejected
- duplicate relation names / duplicate query ids rejected
- unknown relation.to_type rejected (not in NODE_LABELS)
- query without cypher body rejected
- version regex accepts X.Y / X.Y.Z, rejects v1 / "abc" / "1.x"
- idempotent re-parse yields the same spec
- tools[] with write: true is parsed and flagged (deferred)
All assertions land in the existing YamlSchemaError path with
line/field context (delegated to parsers/_yaml.py).
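One pattern that satisfies the version cases listed above, as a sketch. The actual regex in the parser is not shown here and may differ; VERSION_RE and is_valid_version are hypothetical names.

```python
import re

# X.Y or X.Y.Z, digits only; no "v" prefix and no wildcards.
VERSION_RE = re.compile(r"[0-9]+\.[0-9]+(?:\.[0-9]+)?")


def is_valid_version(value: str) -> bool:
    # fullmatch anchors both ends, so "1.2-beta" is rejected too.
    return VERSION_RE.fullmatch(value) is not None
```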
Suite: 647 -> 665 (+18).
Co-Authored-By: Claude <noreply@anthropic.com>
The docker-compose Neo4j test re-ran 01_ingest.py (via the
lore-engine-ingest service), which rebuilt the .graph.pkl
fixture. The new bytes are identical to the previous build
except for the pickle metadata header timestamp.
No semantic change.
Updates the POC README with a 'Storage backends' section
documenting the GraphBackend Protocol, the two implementations
(InMemoryGraph and Neo4jGraph), and the LORE_GRAPH_BACKEND
env-var selection in the MCP entry scripts. Adds the slice 5
plan doc (docs/plan/05-slice-neo4j-backend.md in the design
repo) and ADR 0011 capturing the Protocol + dual-write
model decisions.
Three docker-gated tests for the full Neo4j compose stack:
* test_compose_neo4j_profile_healthy: docker compose
--profile neo4j up -d brings neo4j + lore-engine-ingest
+ lore-engine-mcp-neo4j to a healthy state within 60s.
* test_compose_neo4j_was_true_at_round_trip: was_true_at
through the Neo4j-backed MCP server returns the same
answer as the pickle-backed server for a known fact
(Roland Raventhorne / House Raventhorne / 3rd_age.year_345
→ was_true: true).
* test_compose_neo4j_down_cleans_volumes: docker compose
--profile neo4j down -v removes the neo4j_data volume.
docker-compose.yml changes:
* New neo4j:5 service with NEO4J_AUTH=none, loopback
HTTP + Bolt ports (17474/17687 by default to avoid
conflict with a developer's manual neo4j on the standard
7474/7687 ports), 1GiB mem_limit, pids_limit, healthcheck
via wget on the HTTP root.
* New lore-engine-ingest service (profile neo4j) that
runs scripts/01_ingest.py --skip-cognee --write-neo4j
after Neo4j is healthy. One-shot; no restart policy.
* The pickle-backed lore-engine-mcp service moved onto
the pickle profile (so it doesn't conflict on the
same host port when the neo4j profile is active).
* New lore-engine-mcp-neo4j service (profile neo4j)
that depends on both neo4j (service_healthy) and
lore-engine-ingest (service_completed_successfully).
Same hardening as the pickle service: cap_drop ALL,
no-new-privileges, mem_limit 512m, read_only rootfs,
tmpfs /tmp.
* Named volume neo4j_data for the Neo4j store.
Profile split (pickle | neo4j) keeps the two stacks from
colliding on the same host port when both are activated.
Run with docker compose --profile pickle up -d for the
default or --profile neo4j up -d for the production
graph substrate.
Slice 11.4 test update:
* tests/test_mcp/test_dockerfile.py test_docker_compose_up_and_round_trip
now uses --profile pickle so only the pickle service
activates.
Pre-prod hardening noted in compose yml: NEO4J_AUTH=none
is loopback-only; switch to a username/password and update
LORE_NEO4J_URI before exposing beyond loopback. Tracked in
docs/plan/05-slice-neo4j-backend.md.
Suite: 629 -> 632 passed (+3 compose-neo4j tests, all 559
baseline + 50 Neo4j + consistency + ingest + backend-switch
+ compose-neo4j tests preserved). The plan's 632 final-test
target is reached.
Both MCP entry scripts (05_mcp_server.py for stdio and
06_mcp_http_server.py for Streamable HTTP) now select their
graph backend at startup through a shared loader
(scripts/mcp_server_entry.load_graph):
* LORE_GRAPH_BACKEND=pickle (default) — load the
.graph.pkl built by 01_ingest.py.
* LORE_GRAPH_BACKEND=neo4j — connect to Neo4j at
$LORE_NEO4J_URI (default bolt://127.0.0.1:7687) and
load the mirrored graph.
* Anything else — clear error and exit 4.
Exit codes:
* 0: graph loaded. For the supported backends load_graph()
returns normally and never calls sys.exit() on success;
only the failure paths below exit.
* 1: pickle path missing.
* 2: neo4j_graph not importable.
* 3: neo4j unreachable.
* 4: unknown backend value.
Neo4jGraph.__init__ now eagerly calls verify_connectivity()
so the loader fails loudly at startup rather than on the
first query — the driver pool opens sockets lazily otherwise,
and the first session.run would be too late for the
entry scripts to log a clear error.
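A sketch of the backend-name resolution step only. resolve_backend is a hypothetical helper, not the real load_graph; it models the default, the Neo4j URI fallback, and exit code 4, and leaves the pickle and connectivity failure paths (exit 1 to 3) out.

```python
import sys
from typing import Mapping, Optional

EXIT_UNKNOWN_BACKEND = 4
DEFAULT_NEO4J_URI = "bolt://127.0.0.1:7687"


def resolve_backend(env: Mapping[str, str]) -> tuple[str, Optional[str]]:
    """Return (backend, neo4j_uri_or_None); exit 4 on an unknown backend."""
    backend = env.get("LORE_GRAPH_BACKEND", "pickle")
    if backend == "pickle":
        return "pickle", None
    if backend == "neo4j":
        return "neo4j", env.get("LORE_NEO4J_URI", DEFAULT_NEO4J_URI)
    print(f"unknown LORE_GRAPH_BACKEND={backend!r}; expected 'pickle' or 'neo4j'",
          file=sys.stderr)
    raise SystemExit(EXIT_UNKNOWN_BACKEND)
```

Passing the environment in as a mapping (os.environ in production, a plain dict in tests) keeps the selection logic testable without touching process state.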
Refactors:
* scripts/05_mcp_server.py: removed inline _load_graph(),
now imports from scripts.mcp_server_entry.
* scripts/06_mcp_http_server.py: same.
* lore_engine_poc/neo4j_graph.py: Neo4jGraph.__init__
eagerly verifies connectivity.
Tests:
* tests/test_mcp/test_backend_switch.py — 5 docker-gated
tests (pickle default, neo4j up, neo4j down exits 3,
garbage backend exits 4, trivial registry works with
both backends).
Suite: 624 -> 629 passed (+5 backend-switch tests, all 559
baseline + 38 Neo4j + consistency + ingest + backend-switch
tests preserved).
After the in-memory graph + pickle are written, the new flag
mirrors the full graph into the Neo4j 5 container at
$LORE_NEO4J_URI (default bolt://127.0.0.1:7687). The flag
is opt-in (default off) so the existing test suite's
invocations of 01_ingest.py without Docker still work.
The mirror logic:
* Pre-pass for LoreSource nodes (full metadata via
add_lore_source so SOURCED_FROM links find them with
name, source_type, reliability, source_confidence).
* Pre-pass for bare names (entities registered without
any edge participation — keeps :Entity count in sync
with in-memory all_names()).
* Then the edges, add()-ed one by one.
Failure semantics:
* Neo4j unreachable at startup → log + exit 3.
* neo4j_graph not importable → log + exit 2.
* Pickle is always written before the mirror attempt, so
a flaky Neo4j container never loses the in-memory state.
Consistency runner stability:
_detect_contradictions Pattern 2 (same object, different
subjects) now sorts the two claims alphabetically so
claim_a / claim_b are stable across runs. The
graph.all_names() set iteration order is otherwise
non-deterministic across Python processes and across
the in-memory / Neo4j backends, and the original
dict-iteration insertion order broke when slice 5.4
migrated to all_names().
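The stabilisation amounts to ordering the pair before assigning claim_a / claim_b. A minimal sketch, with stable_claim_pair as a hypothetical helper name:

```python
def stable_claim_pair(subject_x: str, subject_y: str) -> tuple[str, str]:
    """Order two conflicting subjects so the pair never depends on iteration order."""
    claim_a, claim_b = sorted((subject_x, subject_y))
    return claim_a, claim_b
```

Whichever subject a set iteration happens to yield first, the resulting pair is the same.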
Tests:
* tests/test_scripts/test_ingest_neo4j.py — 5 docker-gated
tests (exits zero, entity count, relation count,
default-off untouched, fails loud on unreachable URI).
* tests/test_consistency/test_runner_categories.py — one
test updated to assert claim_a/claim_b as a set rather
than a specific order (matches the runner's new
lexicographic-sort contract).
Suite: 619 -> 624 passed (+5 ingest-neo4j tests, all 559
baseline + 32 Neo4j + consistency + ingest tests preserved).
Eight docker-gated tests covering the write + full-codex
round-trip against the real .graph.pkl fixture:
* Build a Neo4jGraph from the real codex (smoke test).
* Entity count matches in-memory (corrected for the
LoreSource-as-subject edge case).
* Relation count matches in-memory.
* LoreSource count matches in-memory.
* was_true_at Roland / House Raventhorne / 3rd_age.year_345
— both backends agree.
* was_true_at Aldric / Maric sibling query.
* was_true_at Voldramir / Mardonus PART_OF query.
* Add a new edge via the Neo4j backend and verify
was_true_at against Neo4j sees it (write/read
round-trip in the same process).
The mirror helper (_mirror_in_memory_to_neo4j) pre-passes
LoreSource nodes (full metadata) and bare registered names
(entities that don't participate in any edge) so the Neo4j
backend is observationally equal to the in-memory graph
for the full codex.
Suite: 611 -> 619 passed (+8 full-codex tests, all 559
baseline + 27 Neo4j tests preserved).
Implements the read + write surface of Neo4jGraph against the
reified :Relation shape (ADR 0009). The read tools (slice 4) and
the consistency runner / ontology rules (slice 2) are migrated
to use only GraphBackend Protocol methods, so the same Python
code works against both InMemoryGraph and Neo4jGraph.
Reads (Neo4jGraph):
* edges_for_subject(name, relation=None) -> list[Edge]
* edges_for_object(name) -> list[Edge]
* find_edge_by_id(edge_id) -> Edge | None
* by_name, all_names, all_entity_types, entities_of_type,
lore_source (slice 5.3)
* Round-trip: each Edge field is stored as a :Relation node
property and rehydrated on read; Cypher ORDER BY edge_id
so list order matches the in-memory insertion order
Writes (Neo4jGraph):
* add(edge): MERGE subject + object :Entity nodes, upsert
:Relation (id-keyed), link :FROM/:TO, link :SOURCED_FROM
to each :LoreSource in the edge's sources list
* replace_edge(old_id, new_edge): in-place property update
for same (subject, relation, object); drop+re-add for
different endpoints (preserves edge_id for retcon audit)
* remove_entity(name): DETACH DELETE the :Entity + alias
cleanup; returns the number of edges that were attached
* remove_entity_of_type(name, type_): REMOVE n:Label
* rename_entity(old, new): rename + register old as alias
* resolve_alias, register_name, register_alias, add_lore_source,
add_entity_of_type (slice 5.3)
Migrations (read tools + consistency + ontology):
* tools.py: was_true_at uses graph.edges_for_subject(...)
* read_tools.py: 22 sites of graph.edges_by_subject.get /
.items / .values / graph.edges_by_object.get / graph.entities_by_type
.items / graph.lore_sources.get / graph.names migrated to the
Protocol methods
* consistency_runner.py: 4 sites (all_edges flatten,
anachronism detector, orphan detector)
* ontology_rules.py: 13 sites (10 ontology rules + helper)
* write_tools.py: 3 sites (label membership check, era walk)
CI fence (test_graph_backend_writes.py):
test_no_direct_dict_access_outside_graph_backend now greps for
the broader pattern (bracket, .get, .items, .values, .keys on
graph.edges_by_*, entities_by_type, lore_sources, aliases; and
bare graph.names). Fails the build on regression.
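A sketch of what such a grep-style fence might match. The pattern is an approximation written for illustration, not the regex used in test_graph_backend_writes.py, and find_direct_access is a hypothetical helper.

```python
import re

DIRECT_ACCESS_RE = re.compile(
    r"\bgraph\.(?:"
    # Internal dicts reached by bracket or by a dict method.
    r"(?:edges_by_\w+|entities_by_type|lore_sources|aliases)"
    r"(?:\[|\.(?:get|items|values|keys)\b)"
    # Bare attribute access to the names set.
    r"|names\b"
    r")"
)


def find_direct_access(source: str) -> list[int]:
    """Return 1-based line numbers that touch the graph's internal dicts directly."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if DIRECT_ACCESS_RE.search(line)
    ]
```

Protocol calls such as graph.edges_for_subject(...) or graph.all_names() do not match, so migrated code passes the fence.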
Parity tests (test_neo4j_read_tools_parity.py): 15 docker-gated
tests, one per read tool, asserting the in-memory and Neo4j
backends produce matching answers for a known fixture.
Suite: 596 -> 611 passed (+15 parity tests, 559 baseline preserved)
- Migrate add_lore_source direct mutation in write_tools.py:189
to graph.add_lore_source(source)
- Migrate update_entity type relabel to use graph.all_entity_types() +
graph.remove_entity_of_type() + graph.add_entity_of_type() (loop
over an explicit list since dict shape is no longer public)
- Migrate retcon (36 lines of inlined index surgery) to single
graph.replace_edge(edge_id, new_edge) call
- Migrate mark_verified (9 lines of inlined index surgery) to
graph.replace_edge(edge_id, new_edge)
- Add all_entity_types() method to InMemoryGraph (returns keys
of entities_by_type; was reachable as a private attr before)
- Add 9 tests in test_graph_backend_writes.py: add_lore_source,
remove_entity_of_type, register_name, entities_of_type,
retcon+mark_verified chokepoint contracts, edges_for_subject
with None, replace_edge preserves list position, CI fence
- CI fence: greps lore_engine_poc/ for graph.edges_by_*[
graph.entities_by_type[, graph.lore_sources[, graph.aliases[
outside graph_backend.py; fails the build on regression
Suite: 575 -> 584 passed (+9 new tests, 559 baseline preserved)
- Lift Graph dataclass from tools.py into graph_backend.py as
InMemoryGraph (the slice-0/4/10 body, byte-identical).
- New GraphBackend Protocol (PEP 544 + @runtime_checkable) with
14 method points (7 read, 7 write). Mirrors the LLMProvider
pattern in lore_engine_poc/llm.py:47-48.
- tools.Graph is now a back-compat alias (Graph = InMemoryGraph).
Zero test churn across the 559 existing tests.
- New replace_edge(old_id, new_edge) chokepoint. Lifts the
inlined index surgery that lived in write_tools.py retcon +
mark_verified. Same-endpoint swap is in-place; subject/
relation/object change drops + re-adds.
- New helpers: edges_for_subject, edges_for_object, entities_of_type,
lore_source, all_names, add_lore_source, remove_entity_of_type,
register_alias, register_name.
- 16 contract tests in tests/test_tools/test_graph_backend.py.
- Suite: 559 -> 575. No regressions.
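A reduced sketch of the replace_edge chokepoint semantics. MiniGraph keeps only a subject index and an id index (the object index is omitted), and this Edge carries far fewer fields than the real one; both are hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Edge:
    edge_id: str
    subject: str
    relation: str
    object: str
    valid_until: Optional[str] = None


@dataclass
class MiniGraph:
    edges_by_subject: dict[str, list[Edge]] = field(default_factory=dict)
    edges_by_id: dict[str, Edge] = field(default_factory=dict)

    def add(self, edge: Edge) -> None:
        self.edges_by_subject.setdefault(edge.subject, []).append(edge)
        self.edges_by_id[edge.edge_id] = edge

    def replace_edge(self, old_id: str, new_edge: Edge) -> None:
        old = self.edges_by_id.pop(old_id)
        bucket = self.edges_by_subject[old.subject]
        same_endpoints = (old.subject, old.relation, old.object) == (
            new_edge.subject, new_edge.relation, new_edge.object)
        if same_endpoints:
            # In-place swap keeps the edge's position in the subject list.
            bucket[bucket.index(old)] = new_edge
            self.edges_by_id[new_edge.edge_id] = new_edge
        else:
            # Endpoints changed: drop from the old bucket, then re-add.
            bucket.remove(old)
            self.add(new_edge)
```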
Co-Authored-By: Claude <noreply@anthropic.com>
Note the new pre-step: 01_ingest.py --skip-cognee must run on the
host before 'docker build' so the .graph.pkl bundled via the
source-tree COPY is present in the image. (Slice 11.4)
Co-Authored-By: Claude <noreply@anthropic.com>
- Lift _TRIVIAL_REGISTRY (the _Tool dataclass + _echo + _failing
test double) out of test_server.py and test_mcp_http_module.py
into a shared tests/test_mcp/_trivial_registry.py module. The
verbatim duplication was a drift hazard flagged in the slice-11
review (one copy drifts away from the other and tests silently
test different things).
- test_server.py now imports the shared registry.
- New subprocess test: test_subprocess_malformed_json_returns_400
— mirrors the in-process test 5 path over a real socket.
- New subprocess test: test_subprocess_tool_body_exception_returns_is_error
— mirrors the in-process test 9 path over a real socket. Uses
add_entity with no 'name' (real registry's version of the
'failing' tool) since the trivial registry isn't on the wire.
- Tighten _wait_ready regex: anchor on 'Uvicorn running on
http://host:NNNNN' with 4-5 digit port (was matching any
':<digits>' substring — fragile if a future log line contains
an unrelated port).
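A sketch of an anchored readiness pattern along the lines of the last bullet. READY_RE and parse_bound_port are hypothetical names, and the host part as written does not handle bracketed IPv6 addresses.

```python
import re
from typing import Optional

# Anchor on the full phrase and require a 4-5 digit port.
READY_RE = re.compile(r"Uvicorn running on http://[^\s:/]+:(\d{4,5})\b")


def parse_bound_port(log_line: str) -> Optional[int]:
    match = READY_RE.search(log_line)
    return int(match.group(1)) if match else None
```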
Co-Authored-By: Claude <noreply@anthropic.com>
- Port mapping 127.0.0.1:${LORE_HTTP_PORT:-8765}:8765 — by default
only loopback can reach the server. Prevents accidental LAN
exposure during dev.
- cap_drop: [ALL], security_opt: no-new-privileges:true — drops
all Linux capabilities; no escalation.
- read_only: true with tmpfs /tmp — rootfs is read-only at
runtime.
- mem_limit: 512m, pids_limit: 256 — bound the worst case.
- Drop hardcoded container_name (parallel CI runs collide).
Co-Authored-By: Claude <noreply@anthropic.com>
- New non-root user 'lore' (uid 10001) created early so --chown
works on subsequent COPYs. The final USER directive means
everything reachable from the MCP wire (pickle load, 36 tools
including 12 mutators) runs as UID 10001, not root.
- --chown=lore:lore on all COPY lines.
- Removed the redundant .graph.pkl COPY (the file is bundled via
the lore_engine_poc/ directory copy, but the explicit line was
hiding that — restore the 'override at runtime' instruction in
the README).
- New test: test_docker_runs_as_non_root — execs 'id -u' inside
the container and asserts the uid is non-zero.
Co-Authored-By: Claude <noreply@anthropic.com>
- MAX_BODY_BYTES = 1 MiB; reject with HTTP 413 + -32600 envelope
before the JSON parser allocates a Python object. Closes the
OOM-by-giant-body DoS vector.
- Drop dead try/except ImportError fallback for ERR_* constants —
always import from mcp_server (same package).
- stream() typing: AsyncIterator[bytes] (was Iterable[bytes]).
- build_app(graph: Graph, tool_registry: list) parameter types.
- Drop unused CONTENT_JSON constant.
- New test: test_post_oversized_body_rejected (HTTP 413).
- New test: test_post_unknown_method_returns_32601 — symmetric
with the stdio server.py coverage of the same path.
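A sketch of the size gate as a pure function. reject_if_oversized, the error message text, and treating exactly 1 MiB as allowed are assumptions. A real handler would ideally check the size before buffering the whole body, which this sketch does not model.

```python
from typing import Optional

MAX_BODY_BYTES = 1024 * 1024  # 1 MiB
ERR_INVALID_REQUEST = -32600


def reject_if_oversized(raw_body: bytes) -> Optional[tuple[int, dict]]:
    """Return (http_status, json_rpc_envelope) when the body is too large, else None."""
    if len(raw_body) <= MAX_BODY_BYTES:
        return None
    # Reject before any JSON parsing happens.
    envelope = {
        "jsonrpc": "2.0",
        "id": None,
        "error": {"code": ERR_INVALID_REQUEST, "message": "request body too large"},
    }
    return 413, envelope
```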
Co-Authored-By: Claude <noreply@anthropic.com>
* Dockerfile: python:3.12-slim, layer-cached requirements first,
COPY lore_engine_poc/scripts, bake .graph.pkl, EXPOSE 8765,
HEALTHCHECK via stdlib urllib against POST /mcp initialize.
CMD runs scripts/06_mcp_http_server.py --host 0.0.0.0.
* .dockerignore: exclude __pycache__, tests/, .git/, data/raw/, etc.
* docker-compose.yml: one service lore-engine-mcp, port 8765:8765
(overridable via $LORE_HTTP_PORT), bind mount for graph
override (commented), healthcheck. Image tag overridable via
$TAG.
* tests/test_mcp/test_dockerfile.py: 4 tests gated on docker
availability. Build, run + round-trip, compose up + round-trip,
healthcheck reaches 'healthy'. All 4 pass on this host.
550 -> 554 green.
* scripts/06_mcp_http_server.py: uvicorn entry. Mirrors
05_mcp_server.py's _load_graph() shape. CLI flags --host,
--port, --log-level. Env overrides LORE_GRAPH_PATH,
LORE_HTTP_HOST, LORE_HTTP_PORT. Single-process only; multi-worker
is intentionally not exposed (graph is in-memory per-process and
write tools do not persist).
* tests/test_mcp/test_scripts_06.py: 7 subprocess tests booting
on LORE_HTTP_PORT=0 (OS-assigned) and parsing the bound port
from the 'Uvicorn running on' line. Tests 1-5: initialize,
tools/list (36), was_true_at, SSE, notification 202. Test 6:
SIGTERM exits 0 or -SIGTERM (no traceback). Test 7: missing
graph exits non-zero with the expected error message.
543 -> 550 green.
* Add _wants_sse() and SSE branch in mcp_endpoint:
text/event-stream in Accept -> StreamingResponse with one
'event: message\ndata: <json>\n\n' frame. Default JSON path
unchanged. Empty body now rejected with 400 + -32600
(previously coerced to {}).
* 5 new in-process tests (10-14): Accept routing, SSE body shape,
GET 405, empty body 400. 538 -> 543 green.
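A sketch of the Accept routing and the single-frame body. wants_sse and sse_frame are illustrative names (the real helper is _wants_sse), and the header parsing here ignores q-values.

```python
import json


def wants_sse(accept_header: str) -> bool:
    """True when text/event-stream is one of the listed media types."""
    media_types = [part.split(";")[0].strip().lower()
                   for part in accept_header.split(",")]
    return "text/event-stream" in media_types


def sse_frame(payload: dict) -> bytes:
    # json.dumps without indent emits no newlines, so one data: line suffices.
    return f"event: message\ndata: {json.dumps(payload)}\n\n".encode("utf-8")
```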
- mcp_tools.TOOL_REGISTRY goes 24 → 36 entries (12 new write tools)
- Exposes: add_entity, add_relation, add_lore_source (slice 4.7 trio
that had been callable from scripts/02_demo.py only), plus
set_alias, update_entity, delete_entity (10.1), retcon,
mark_verified, merge_entities (10.2), define_calendar,
define_era, define_date (10.3)
- Hand-written JSON Schema per tool; trailing-underscore wire
fields (name_, object_) match the Python kwarg convention
used by the underlying functions
- test_tool_registry.py: EXPECTED_TOOLS / EXPECTED_FN grown to 36
entries; the schema-vs-signature drift detector (already in
place) validates the trailing-underscore convention
- test_protocol.py: tools/list count 24 → 36
- test_slice10_dispatch.py: 12 new dispatch tests, one per
new tool; retcon / mark_verified verify envelope shape only
because edge_id doesn't survive a subprocess restart (in-memory
graph) — actual mutation behaviour is covered in
test_write_tools_slice10b.py
- Suite 529/529 green (was 517; +12)
Co-Authored-By: Claude <noreply@anthropic.com>
- ALLOWED_LABELS gains 'Date' (Era, Calendar were already there)
- write_tools.define_calendar: name + optional days_per_year / months;
rejects empty/duplicate name and non-positive days/months
- write_tools.define_era: name + calendar + start [+ end]; validates
time bounds; stamps PART_OF Calendar edge and, when applicable,
PRECEDED edge to the most recent prior era in the same calendar
(linear ordering; world-builder can override with retcon)
- write_tools.define_date: calendar + year [+ month + day + era];
canonical time atom is '{era}.year_{Y}.month_{M}.day_{D}' (era
prefix optional); stamps INSTANCE_OF Calendar + DURING Era;
idempotent — calling twice with the same args returns the same
canonical and does not duplicate the date node
- 24 new tests in tests/test_tools/test_write_tools_slice10c.py
- Suite 517/517 green (was 493; +24)
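A sketch of the canonical time atom format from the define_date bullet. canonical_date is a hypothetical helper, and rejecting a day without a month is an assumption, not documented behaviour.

```python
from typing import Optional


def canonical_date(year: int, month: Optional[int] = None,
                   day: Optional[int] = None, era: Optional[str] = None) -> str:
    """Build '{era}.year_{Y}.month_{M}.day_{D}'; the era prefix is optional."""
    if day is not None and month is None:
        # Assumption: a day without a month is not a valid atom.
        raise ValueError("day requires month")
    parts = [era] if era else []
    parts.append(f"year_{year}")
    if month is not None:
        parts.append(f"month_{month}")
    if day is not None:
        parts.append(f"day_{day}")
    return ".".join(parts)
```

Because the string is a pure function of its arguments, calling it twice with the same input yields the same canonical atom, which is what makes an idempotent define_date possible.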
Co-Authored-By: Claude <noreply@anthropic.com>
- Edge.edge_id: stable per-edge identity (8 hex chars, default factory)
- Graph.edges_by_id: dict[str, Edge] reverse index, populated by add()
- Graph.find_edge_by_id(id): O(1) lookup
- Graph.rename_entity: also registers old name as alias of new canonical
(merge_entities depends on this)
- Graph.remove_entity: keeps edges_by_id consistent with subject/object
indexes
- add_relation: returns the actual edge.edge_id (was fabricating a separate
uuid), so retcon / mark_verified can target it directly
- Edge.retcon_at / retcon_note: audit metadata stamped by retcon
- Edge.verified_by / verified_at / verified_note: stamped by mark_verified
- write_tools.retcon: amend edge bounds/relation/object; builds the amended
edge via dataclasses.replace and swaps it into the indexes in place;
validates time bounds; refuses inverted bounds
- write_tools.mark_verified: appends (1.0, 1.0, 'human_verified') source
tuple so aggregate confidence floors to 1.0
- write_tools.merge_entities: folds from_name into to_name, refuses if the
two have different labels, preserves from_name as an alias
- 25 new tests in tests/test_tools/test_write_tools_slice10b.py
- Suite 493/493 green (was 468; +25)
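A sketch of the edge identity scheme: an 8-hex-char id from a default factory, preserved when an edge is amended through dataclasses.replace. This Edge is a reduced, hypothetical version of the real dataclass, and deriving the id from uuid4 is an assumption.

```python
import uuid
from dataclasses import dataclass, field, replace
from typing import Optional


def _new_edge_id() -> str:
    # Assumption: 8 hex chars taken from a random UUID.
    return uuid.uuid4().hex[:8]


@dataclass(frozen=True)
class Edge:
    subject: str
    relation: str
    object: str
    valid_until: Optional[str] = None
    edge_id: str = field(default_factory=_new_edge_id)


def retcon_bounds(edge: Edge, valid_until: str) -> Edge:
    """Return an amended copy; replace() carries edge_id over unchanged."""
    return replace(edge, valid_until=valid_until)
```

Since replace() copies every field that is not overridden, the amended edge keeps its id, so a later mark_verified can still target it.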
Co-Authored-By: Claude <noreply@anthropic.com>
- Graph.aliases: dict[str, set[str]] field; Graph.by_name follows aliases
- Graph.remove_entity(name) -> int: cascades through edges_by_subject/object,
type index, and aliases; returns edges removed
- Graph.rename_entity(old, new) -> int: re-points edges via dataclasses.replace,
preserves old name as alias of new canonical
- write_tools.set_alias: register alt name; rejects empty / duplicate
- write_tools.update_entity: label/rename with edge cascade; name_ kwarg to
avoid colliding with positional name (MCP layer maps user-facing name JSON
field to name_)
- write_tools.delete_entity: removes entity + all touching edges + aliases
- 20 new tests in tests/test_tools/test_write_tools_slice10.py
- Suite 468/468 green (was 448; +20)
Co-Authored-By: Claude <noreply@anthropic.com>
The 12 read_tools in lore_engine_poc.read_tools (entity_context,
true_during, entities_present, timeline, list_lineage, list_offspring,
ancestors_of, descendants_of, location_hierarchy, event_chain,
events_during, lore_about) were already implemented + unit-tested
in tests/test_tools/ but had not been exposed over the MCP wire.
This slice is pure registration: hand-written JSON Schema + adapter
binding for each tool, no changes to the underlying functions.
- mcp_tools.py: TOOL_REGISTRY goes from 12 → 24 entries. Docstring
updated to reflect the new total.
- test_tool_registry.py: EXPECTED_TOOLS / EXPECTED_FN grown to 24
entries; new tools' signatures cross-checked against the schema
by the existing schema-vs-signature test (caught zero drift).
- test_protocol.py: tools/list test updated to 24 tools; the
"multiple requests on one connection" test likewise.
- test_slice9_dispatch.py: 13 new subprocess tests, one per new
tool (entity_context has 2: happy path + unknown entity). Each
test boots scripts/05_mcp_server.py and verifies the response
shape against the real seed codex.
Live smoke: mcp_server.tools/list returns 24 tools, and tools/call
returns correct data for list_offspring, ancestors_of, etc.
448/448 tests pass (was 435 pre-slice; +13 from new dispatch tests).
- 01_ingest.py: LORE_INGEST_LLM=1 enables LLM extraction after the
deterministic path; build_graph is now called AFTER LLM triples
merge in (the 3.4 ordering fix).
- LORE_INGEST_FAKE_LLM=1 + LORE_INGEST_FAKE_LLM_SCRIPT=path selects
FakeProvider for offline/CI runs.
- Missing OLLAMA_API_KEY degrades gracefully: stderr warning, rc=0,
deterministic graph still built (no crash, no LLM triples).
- scripts/06_llm_smoke.py: one-shot manual smoke for the real
Ollama Cloud provider; loads one NPC, runs extractor, prints
triples. Skips (rc=0, helpful message) when OLLAMA_API_KEY unset.
- FakeProvider gains dict-style {match_any, response} / {match_any,
raise} entries so tests can skip exact-prompt matching when the
body is large.
- tests/test_extraction/test_ingest_wiring.py: 8 subprocess tests
covering default-off, enabled, idempotency (x2), adds-fact,
provider-failure tolerance, bad-JSON tolerance, and missing-key
fallback.
- tests/fixtures/llm_empty_script.json: [] (used by the enabled-
path test where no triples are expected).
435/435 tests pass (was 382 pre-slice; +53). End-to-end ingest with
--skip-cognee runs cleanly on default-off path.
- scripts/04_consistency.py: standalone on-demand run of the consistency
engine over the seed codex; prints summary + per-category detail
- scripts/02_demo.py: append consistency section using the singleton
consistency_tools path so 'latest_run()' agrees with the run summary
- tests/test_consistency/test_consistency_script.py: 2 tests (end-to-end
run + --codex flag)
- tests/test_consistency/test_demo.py: 2 tests (end-to-end run + --query
flag exercises the consistency section)
- 249/249 tests pass
- New lore_engine_poc/consistency_config.py: ConsistencyConfig dataclass
with disable_rules[], severity (default 'warn' per AC 2.8),
confidence_threshold (per-rule floor), acknowledged set (AC 2.9).
is_disabled(rule_id), is_acknowledged(id), acknowledge(id) helpers.
- ConsistencyRunner.run() now accepts an optional config parameter;
applies severity override, skips disabled rules, suppresses below
threshold, suppresses acknowledged violations.
- Anachronism dataclass now carries source_confidences (parallel
to sources) so confidence_threshold can suppress low-confidence
findings. Default = 1.0 when not set.
- get_anachronisms() got an include_flagged param (default False);
flagged violations are hidden by default.
- 9/9 new tests; full suite 245/245 (was 236).
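A reduced sketch of the config shape and how a runner might consult it. The single float threshold (the text describes a per-rule floor), its 0.0 default, and the should_report helper are simplifying assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ConsistencyConfig:
    disable_rules: list[str] = field(default_factory=list)
    severity: str = "warn"
    confidence_threshold: float = 0.0  # assumption: one global floor
    acknowledged: set[str] = field(default_factory=set)

    def is_disabled(self, rule_id: str) -> bool:
        return rule_id in self.disable_rules

    def is_acknowledged(self, violation_id: str) -> bool:
        return violation_id in self.acknowledged

    def acknowledge(self, violation_id: str) -> None:
        self.acknowledged.add(violation_id)


def should_report(config: ConsistencyConfig, rule_id: str,
                  violation_id: str, confidence: float = 1.0) -> bool:
    """Hypothetical filter: skip disabled, acknowledged, or low-confidence findings."""
    if config.is_disabled(rule_id) or config.is_acknowledged(violation_id):
        return False
    return confidence >= config.confidence_threshold
```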
Co-Authored-By: Claude <noreply@anthropic.com>