docs(adr): 0001 — aggregate confidence is the floor, not the mean

When a fact has multiple agreeing sources, aggregate_confidence =
min(extraction_confidence * source_confidence) across the per-source
products. The floor is the only formula that prevents a weak source
from being silently masked by a strong one — disagreement surfaces
as low confidence, which the slice-2 consistency engine will hook
into.

The dimensions themselves (extraction_confidence, source_confidence)
are tracked separately, never collapsed, so callers can ask 'is this
uncertain because we extracted it badly, or because the source is
unreliable?' — the two have different remedies.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-06-17 12:48:19 -04:00
parent 798f12825a
commit 22f25ca043

@@ -0,0 +1,19 @@
# Aggregate confidence is the floor of (extraction × source) across all sources

**Status:** accepted.

When a fact has multiple sources that agree, the engine reports
`aggregate_confidence = min(extraction_confidence * source_confidence)`
across the per-source products. We chose the **floor** (worst-case)
over the **ceiling** (best-case) or the **mean** because, of the
three, only the floor prevents a weak source from being silently
masked by a strong one. The floor surfaces disagreement as low
confidence, which is the behaviour the consistency engine in
slice 2 will hook into.
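The formula above can be sketched as follows. This is a minimal illustration, not the engine's actual code: the `SourceEvidence` type, the `product` property, and the sample values are assumptions made for the example; only `aggregate_confidence`, `extraction_confidence`, and `source_confidence` are names taken from this ADR.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class SourceEvidence:
    # Hypothetical record type; only the two field names come from the ADR.
    extraction_confidence: float
    source_confidence: float

    @property
    def product(self) -> float:
        # Per-source product that the aggregate is taken over.
        return self.extraction_confidence * self.source_confidence


def aggregate_confidence(evidence: Sequence[SourceEvidence]) -> float:
    """Return the floor (minimum) of the per-source products."""
    if not evidence:
        raise ValueError("a fact needs at least one source")
    return min(e.product for e in evidence)


# Illustrative values only: one strong source and one weak source.
strong = SourceEvidence(extraction_confidence=1.0, source_confidence=0.9)
weak = SourceEvidence(extraction_confidence=0.5, source_confidence=0.5)
products = [strong.product, weak.product]

floor = aggregate_confidence([strong, weak])  # 0.25, the weak source stays visible
ceiling = max(products)                       # 0.9, the weak source disappears
average = mean(products)                      # about 0.575, the weak source is diluted
```

With these sample values, only the floor reports a number that reflects the weak source directly; the ceiling ignores it and the mean blends it with the strong one.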

The dimensions themselves (`extraction_confidence` and
`source_confidence`) are tracked separately, never collapsed, so
callers can ask "is this uncertain because we extracted it
badly, or because the source is unreliable?" The two have
different remedies: a better LLM extraction prompt versus better
source curation.
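One way a caller might use the separately tracked dimensions is to ask which one limits the source that sets the floor. The sketch below is an assumption about how such a query could look: `SourceEvidence`, `source_id`, `limiting_dimension`, and the sample sources are hypothetical and do not come from the engine.

```python
from typing import NamedTuple, Sequence, Tuple


class SourceEvidence(NamedTuple):
    source_id: str  # hypothetical identifier, not named in the ADR
    extraction_confidence: float
    source_confidence: float


def limiting_dimension(evidence: Sequence[SourceEvidence]) -> Tuple[str, str]:
    """Return (source_id, dimension) for the source that sets the floor."""
    if not evidence:
        raise ValueError("a fact needs at least one source")
    # The source with the smallest product is the one the floor reports.
    worst = min(
        evidence,
        key=lambda e: e.extraction_confidence * e.source_confidence,
    )
    # Ties between the two dimensions are reported as source_confidence.
    if worst.extraction_confidence < worst.source_confidence:
        return worst.source_id, "extraction_confidence"
    return worst.source_id, "source_confidence"


# Illustrative values only.
evidence = [
    SourceEvidence("wiki", extraction_confidence=0.95, source_confidence=0.4),
    SourceEvidence("journal", extraction_confidence=0.9, source_confidence=0.9),
]
result = limiting_dimension(evidence)  # ("wiki", "source_confidence")
```

Here the answer points at source curation rather than the extraction prompt. A single collapsed score of 0.38 could not make that distinction.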