Lore Engine Dev eb179c913c slice 2.4: config (disable_rules, threshold, acknowledge, severity) (9/9 tests; AC 2.8, 2.9, 2.10)
- New lore_engine_poc/consistency_config.py: ConsistencyConfig dataclass
  with disable_rules[], severity (default 'warn' per AC 2.8),
  confidence_threshold (per-rule floor), acknowledged set (AC 2.9).
  is_disabled(rule_id), is_acknowledged(id), acknowledge(id) helpers.
- ConsistencyRunner.run() now accepts an optional config parameter;
  applies severity override, skips disabled rules, suppresses below
  threshold, suppresses acknowledged violations.
- Anachronism dataclass now carries source_confidences (parallel
  to sources) so confidence_threshold can suppress low-confidence
  findings. Default = 1.0 when not set.
- get_anachronisms() got an include_flagged param (default False);
  flagged violations are hidden by default.
- 9/9 new tests; full suite 245/245 (was 236).

Co-Authored-By: Claude <noreply@anthropic.com>
2026-06-18 02:43:26 -04:00
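
The config handling this commit describes can be sketched roughly as follows. This is a minimal, self-contained approximation, not the contents of `consistency_config.py` or `ConsistencyRunner.run()`. The field and helper names (`disable_rules`, `severity`, `confidence_threshold`, `acknowledged`, `is_disabled`, `is_acknowledged`, `acknowledge`) come from the commit message. Everything else is an assumption: `Finding`, `apply_config`, `visible`, the rule ids, modelling `confidence_threshold` as a mapping from rule id to a float floor, comparing against the weakest source confidence, and marking acknowledged findings as `flagged` instead of dropping them.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace


@dataclass
class ConsistencyConfig:
    """Sketch of the slice 2.4 knobs; field/helper names from the commit message."""

    disable_rules: list[str] = field(default_factory=list)
    severity: str = "warn"  # AC 2.8 default
    # Assumption: per-rule floor keyed by rule id; unlisted rules use 0.0.
    confidence_threshold: dict[str, float] = field(default_factory=dict)
    acknowledged: set[str] = field(default_factory=set)  # AC 2.9

    def is_disabled(self, rule_id: str) -> bool:
        return rule_id in self.disable_rules

    def is_acknowledged(self, violation_id: str) -> bool:
        return violation_id in self.acknowledged

    def acknowledge(self, violation_id: str) -> None:
        self.acknowledged.add(violation_id)


@dataclass
class Finding:
    """Hypothetical stand-in for a violation, reduced to the fields the filter reads."""

    id: str
    rule_id: str
    severity: str = "warn"
    flagged: bool = False
    source_confidences: tuple[float, ...] = ()


def apply_config(findings: list[Finding], config: ConsistencyConfig) -> list[Finding]:
    """Hypothetical runner step: skip disabled rules, suppress low confidence,
    override severity, and flag acknowledged violations."""
    kept: list[Finding] = []
    for f in findings:
        if config.is_disabled(f.rule_id):
            continue
        # No confidences recorded -> treat as 1.0, per the commit message.
        # Assumption: the weakest source decides whether the finding survives.
        confidence = min(f.source_confidences, default=1.0)
        if confidence < config.confidence_threshold.get(f.rule_id, 0.0):
            continue
        kept.append(replace(
            f,
            severity=config.severity,
            flagged=f.flagged or config.is_acknowledged(f.id),
        ))
    return kept


def visible(findings: list[Finding], include_flagged: bool = False) -> list[Finding]:
    """Hypothetical mirror of get_anachronisms(include_flagged=False)."""
    return [f for f in findings if include_flagged or not f.flagged]


# Example with hypothetical rule ids.
config = ConsistencyConfig(
    disable_rules=["ontology.no_overlapping_rulers"],
    severity="error",
    confidence_threshold={"anachronism": 0.5},
)
config.acknowledge("a-2")
findings = [
    Finding("a-1", "anachronism", source_confidences=(0.9, 0.8)),
    Finding("a-2", "anachronism"),  # no confidences -> 1.0
    Finding("a-3", "anachronism", source_confidences=(0.9, 0.3)),  # below floor
    Finding("v-1", "ontology.no_overlapping_rulers"),  # rule disabled
]
surfaced = apply_config(findings, config)
print([f.id for f in visible(surfaced)])  # ['a-1']
print([f.id for f in visible(surfaced, include_flagged=True)])  # ['a-1', 'a-2']
```

Routing acknowledgement through `flagged` keeps acknowledged findings recoverable with `include_flagged=True`, which lines up with `get_anachronisms()` hiding flagged violations by default; if the real runner drops acknowledged violations outright, `visible()` becomes unnecessary.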


"""Lore Engine POC — consistency engine (slice 2).

The consistency engine is the part of the Lore Engine that catches
the LLM before it lies. It runs a set of rules over the graph and
surfaces violations as first-class nodes. Cognee ships **no**
contradiction machinery of its own: :class:`Contradiction`,
:class:`Anachronism`, :class:`Orphan`, and
:class:`OntologyViolation` are all Lore Engine types, built from
scratch in slice 2.

This module is the in-memory POC (slices 0 and 1 already
established an in-memory code path; the Cognee materialization is
a later sub-slice). The four violation dataclasses are the
contract; :class:`ConsistencyRunner` lives in
``consistency_runner.py`` to keep this module's import surface
small.

The shape of each violation follows ``docs/04-consistency.md``:
every violation has ``id``, ``severity`` (default ``"warn"`` per
AC 2.8), ``flagged`` (default ``False``; the world-builder's
acknowledge toggle, AC 2.9), ``detected_at`` (set by the runner),
and category-specific fields. The four categories map to four
distinct classes, so an ``isinstance`` check against one concrete
class never matches another.
"""

from __future__ import annotations
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _utc_now_iso() -> str:
    """ISO 8601 UTC timestamp with 'Z' suffix; matches the slice
    2 spec's ``detected_at`` / ``started_at`` / ``finished_at`` format.
    """
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def _new_violation_id(prefix: str) -> str:
    """Fallback id for a violation: ``<prefix>-<8 random hex chars>``.

    The runner usually overrides this with a deterministic hash of
    the violation's content, so a re-run on the same graph yields
    the same id (idempotent). The dataclasses fall back to this
    fresh, random id for callers that don't care about determinism.
    """
    return f"{prefix}-{uuid.uuid4().hex[:8]}"


# ---------------------------------------------------------------------------
# Violation dataclasses — AC 2.1, 2.8, 2.9
# ---------------------------------------------------------------------------
@dataclass
class _ViolationBase:
    """Common fields for every violation type.

    Subclassing keeps the four classes small without committing to
    a heavy abstraction. ``_ViolationBase`` is private: callers
    depend on the four concrete classes, not on the base.
    """

    id: str
    severity: str = "warn"
    flagged: bool = False
    detected_at: str = field(default_factory=_utc_now_iso)


@dataclass
class Contradiction(_ViolationBase):
    """Two sources disagree on the same fact (Category A).

    The most common case: two ``LoreSource`` files give different
    parents to the same Person, or two markdown codex files claim
    the same spell is in two different MagicSystems. The runner
    detects this by finding edges with the same ``(subject,
    relation)`` whose time windows overlap and whose ``object``
    differs.
    """

    subject: str = ""
    predicate: str = ""
    claim_a: str = ""
    claim_b: str = ""
    # The source documents that produced the two claims. The runner
    # populates this from the edges' ``sources`` lists; it is stored
    # as a tuple so the value is immutable.
    sources: tuple[str, ...] = field(default_factory=tuple)

    def __post_init__(self) -> None:
        # If the runner didn't pass an id, mint a fresh one.
        if not self.id:
            self.id = _new_violation_id("c")


@dataclass
class Anachronism(_ViolationBase):
    """A person/faction participates in an event outside their
    lifespan (Category B).

    The runner flags ``EXISTED_BEFORE`` (event before entity's
    birth), ``EXISTED_AFTER`` (event after entity's death), and
    ``EXISTED_DURING_MISMATCH`` (entity's participation window
    doesn't contain the event time). ``expected`` and ``actual``
    carry the time strings the LLM needs to surface to the
    world-builder.

    ``source_confidences`` carries the per-source confidence
    scores (parallel to ``sources``) so the
    ``confidence_threshold`` config knob (slice 2.4) can suppress
    low-confidence violations.
    """

    entity_name: str = ""
    event_name: str = ""
    claim: str = "EXISTED_AFTER"  # one of: EXISTED_BEFORE, EXISTED_AFTER, EXISTED_DURING_MISMATCH
    expected: Optional[str] = None
    actual: Optional[str] = None
    sources: tuple[str, ...] = field(default_factory=tuple)
    source_confidences: tuple[float, ...] = field(default_factory=tuple)

    def __post_init__(self) -> None:
        if not self.id:
            self.id = _new_violation_id("a")


@dataclass
class Orphan(_ViolationBase):
    """An entity has no structural relationships (Category D).

    Per ``docs/04-consistency.md`` the six orphan checks are:

    - ``Person`` with no ``PARENT_OF``/``DESCENDED_FROM`` and no
      ``birth`` property → "Person of unknown lineage."
    - ``Faction`` with no ``FOUNDED`` connection → "Faction of
      unknown origin."
    - ``Location`` with no ``PART_OF`` to a ``Region`` → "Unmapped
      location."
    - ``Event`` with no ``OCCURRED_AT``/``OCCURRED_DURING`` →
      "Event with no location/era."
    - ``Item`` with no ``POSSESSES`` connection → "Unowned artifact."
    - ``Spell`` with no ``PART_OF_SYSTEM`` → "Spell with no magic
      system."

    The ``reason`` field is human-readable; the runner builds the
    string from a vocabulary of pre-baked reasons so a UI can
    group/filter on them.
    """

    entity_name: str = ""
    entity_type: str = ""
    reason: str = ""

    def __post_init__(self) -> None:
        if not self.id:
            self.id = _new_violation_id("o")


@dataclass
class OntologyViolation(_ViolationBase):
    """An instance breaks a declarative ontology rule (Category C).

    Ontology rules are user-extensible; the engine ships 10 starter
    rules (see :mod:`lore_engine_poc.ontology_rules`). Each
    violation references the rule that produced it via ``rule_id``
    so ``explain_violation`` can return the rule's cypher / rule
    spec.
    """

    rule_id: str = ""
    subject: str = ""
    predicate: str = ""
    # The list of offending entity names / values. For a "no
    # overlapping rulers" rule this is the two rulers; for a
    # "spells need a magic system" rule it's the spell name. The
    # type is list (not tuple) because the rule's natural
    # representation is heterogeneous (names, ids, values).
    claim: list = field(default_factory=list)
    sources: tuple[str, ...] = field(default_factory=tuple)

    def __post_init__(self) -> None:
        if not self.id:
            self.id = _new_violation_id("v")


# ---------------------------------------------------------------------------
# ConsistencyRun — the per-run summary node
# ---------------------------------------------------------------------------
@dataclass
class ConsistencyRun:
    """One consistency run's summary.

    The runner writes one of these per invocation. Slice 2.4+ will
    surface ``latest_run`` as an MCP tool, and the nightly cron
    writes a fresh one each morning. ``id`` is the run's identity;
    ``started_at``/``finished_at`` are ISO 8601 UTC; counts are the
    totals for the run.
    """

    id: str
    started_at: str
    finished_at: str
    duration_ms: int
    rules_run: int
    violations_found: int
    anachronisms_found: int
    orphans_found: int


__all__ = [
    "Anachronism",
    "Contradiction",
    "ConsistencyRun",
    "OntologyViolation",
    "Orphan",
]
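
The docstrings in this module describe two runner-side behaviours that live in `consistency_runner.py`, which is not shown here: `Contradiction` detection over edges with the same `(subject, relation)`, overlapping time windows, and a differing `object`; and deterministic violation ids derived from a hash of the violation's content (see `_new_violation_id`). The sketch below approximates both under stated assumptions. `Edge`, integer in-world years as time bounds, SHA-256 truncated to 8 hex characters, and returning plain dicts instead of `Contradiction` objects are illustrative choices, not the runner's actual code.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass(frozen=True)
class Edge:
    """Hypothetical graph edge; the real graph model is not part of this file."""

    subject: str
    relation: str
    object: str
    source: str
    start: Optional[int] = None  # assumption: in-world years; None = open-ended
    end: Optional[int] = None


def windows_overlap(a: Edge, b: Edge) -> bool:
    """Closed-interval overlap; a missing bound counts as unbounded."""
    neg, pos = float("-inf"), float("inf")
    lo = max(neg if a.start is None else a.start, neg if b.start is None else b.start)
    hi = min(pos if a.end is None else a.end, pos if b.end is None else b.end)
    return lo <= hi


def content_id(prefix: str, *parts: str) -> str:
    """Deterministic id: identical content always hashes to the same id."""
    # The \x1f separator keeps ("ab", "c") and ("a", "bc") from colliding.
    digest = hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()
    return f"{prefix}-{digest[:8]}"


def find_contradictions(edges: list[Edge]) -> list[dict]:
    """Same (subject, relation), overlapping windows, different object."""
    groups: dict[tuple[str, str], list[Edge]] = defaultdict(list)
    for edge in edges:
        groups[(edge.subject, edge.relation)].append(edge)

    found: dict[str, dict] = {}
    for (subject, relation), group in groups.items():
        for a, b in combinations(group, 2):
            if a.object == b.object or not windows_overlap(a, b):
                continue
            # Sort the two claims so the id does not depend on edge order.
            claim_a, claim_b = sorted((a.object, b.object))
            vid = content_id("c", subject, relation, claim_a, claim_b)
            entry = found.setdefault(vid, {
                "id": vid, "subject": subject, "predicate": relation,
                "claim_a": claim_a, "claim_b": claim_b, "sources": set(),
            })
            # Several edges may back the same pair of claims; merge their sources.
            entry["sources"].update((a.source, b.source))
    # Freeze sources into sorted tuples, like Contradiction.sources.
    return [{**v, "sources": tuple(sorted(v["sources"]))} for v in found.values()]


edges = [
    Edge("Spell of Ash", "PART_OF_SYSTEM", "Runecraft", "codex/runecraft.md"),
    Edge("Spell of Ash", "PART_OF_SYSTEM", "Bloodsong", "codex/bloodsong.md"),
    # Border moved: the windows do not overlap, so this is history, not a contradiction.
    Edge("Greywatch", "PART_OF", "North March", "atlas/old.md", 100, 200),
    Edge("Greywatch", "PART_OF", "Kingdom of Ald", "atlas/new.md", 201, 400),
]
for c in find_contradictions(edges):
    print(c["subject"], c["claim_a"], c["claim_b"], c["sources"])
# Spell of Ash Bloodsong Runecraft ('codex/bloodsong.md', 'codex/runecraft.md')
```

Grouping by `(subject, relation)` first keeps the pairwise comparison inside each group rather than across the whole edge list, and sorting the claims before hashing is what makes the id independent of the order in which the edges were loaded.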