obsidian-foundry-sync/docs/epics.md
2026-06-23 03:53:35 +00:00


stepsCompleted:
  • step-01-validate-prerequisites
  • step-02-design-epics
  • step-03-create-stories
  • step-04-final-validation
inputDocuments:
  • docs/prds/prd-foundry-obsidian-sync-2026-06-22/prd.md
  • docs/prds/prd-foundry-obsidian-sync-2026-06-22/.decision-log.md
  • docs/prds/prd-foundry-obsidian-sync-2026-06-22/review-engineering.md
  • docs/prds/prd-foundry-obsidian-sync-2026-06-22/review-launchable.md

foundry-obsidian-sync - Epic Breakdown

Overview

This document provides the complete epic and story breakdown for foundry-obsidian-sync, decomposing the requirements from the Live Relay Sync — Auto-Sync & Bidirectional Hardening PRD (final) into implementable stories. Scope = full live-sync surface: (A) safe O→F auto-sync, (B) F→O auto direction, (C) operational hardening, plus launchable-grade security, error contracts, and data integrity.

No separate Architecture or UX Design documents exist — the engineering review folded technical decisions (Foundry-side hash input, relay constraints, code citations) into the PRD, and dashboard UX is specified inline in the FRs (F3 conflict row, F4 status, F6 onboarding, F7 security).

Requirements Inventory

Functional Requirements

F1 — Obsidian→Foundry auto-sync (safe)

  • FR-1.1: Watch the refined vault dir for .md saves (recursive fs.watch + per-subdir fallback); skip .obsidian, dotfiles, and the reserved status-note path.
  • FR-1.2: On save, skip notes with no foundry.cc_uuid (unlinked) or no foundry.contentHash baseline (unseeded).
  • FR-1.3: Skip if current Obsidian body hash equals the foundry.contentHash baseline (no-op save / post-push baseline write).
  • FR-1.4: Before pushing, compute the Foundry-side hash and compare to foundry.ccHash (new field). Reuse the /get that pushNote already performs — no extra round-trip. Hash input = canonicalize(htmlToMarkdown(flags["campaign-codex"].data)) + name + folder_path (same contentHash pipeline so sides are comparable). Extend baselineFoundryBlock/baselineNote to rewrite ccHash:. Hash-stability unit test across push→/get round-trip required before ship.
  • FR-1.5: Route Obsidian-changed + Foundry-unchanged → push O→F via pushNote; re-baseline both sides on success.
  • FR-1.6: If Foundry side unreadable (/get 404/timeout/session down), skip push + surface error row — never fall back to Obsidian-side-only check.
  • FR-1.7: After successful push, re-baseline both foundry.contentHash and foundry.ccHash to new values (real vault + .bak, apply mode only).
  • FR-1.8: Auto-sync always applies live to Foundry (dry-run not honored).
  • FR-1.9: Auto-sync requires apply mode; blocked in dev mode with an explanatory banner.
  • FR-1.10: TOCTOU guard — after updateEntry succeeds, re-/get and verify the Foundry-side hash matches what was written; if it diverged (concurrent edit), surface a conflict row instead of baselining.
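
The F1 checks above can be read as an ordered gate. The following is a minimal sketch of that ordering, assuming the hashes have already been computed elsewhere; `FoundryBlockSketch`, `SaveDecision`, and `decideOnSave` are hypothetical names for illustration and are not taken from the codebase.

```typescript
// Hypothetical shape of the foundry: frontmatter block (illustration only).
interface FoundryBlockSketch {
  cc_uuid?: string;     // link to the Foundry entry (FR-1.2)
  contentHash?: string; // Obsidian-side baseline (FR-1.2 / FR-1.3)
  ccHash?: string;      // Foundry-side baseline (FR-1.4)
}

type SaveDecision =
  | "skip-unlinked"
  | "skip-unseeded"
  | "skip-noop"
  | "error-foundry-unreadable"
  | "conflict"
  | "push";

// foundryHash is null when the /get failed (404, timeout, session down).
// A real caller would only perform the /get once the cheap local checks pass.
function decideOnSave(
  block: FoundryBlockSketch,
  obsidianBodyHash: string,
  foundryHash: string | null,
): SaveDecision {
  if (!block.cc_uuid) return "skip-unlinked";                     // FR-1.2
  if (!block.contentHash) return "skip-unseeded";                 // FR-1.2
  if (obsidianBodyHash === block.contentHash) return "skip-noop"; // FR-1.3
  if (foundryHash === null) return "error-foundry-unreadable";    // FR-1.6
  // Assumption: a missing ccHash baseline is treated as divergence (conservative).
  if (foundryHash !== block.ccHash) return "conflict";            // FR-1.4
  return "push";                                                  // FR-1.5
}
```

The unreadable case is checked before the divergence comparison so that a failed read can never be mistaken for "Foundry unchanged", which is the fail-safe posture FR-1.6 asks for.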

F2 — Foundry→Obsidian auto-sync (two layers; relay has no push channel, /search minified = {uuid,id,name,img,documentType}, no folder/content)

  • FR-2.1: Shallow poll (default ON) — poll /search minified on a configurable cadence; build {uuid → name/img} snapshot; detect renames (name change on known uuid), new, missing. No folder/content detection here.
  • FR-2.2: Deep poll (default ON, minutes cadence) — per linked note /get + compute Foundry-side hash (FR-1.4 input); compare to ccHash to detect content changes + folder moves. Concurrency-capped (mapPool). Supersedes ADR-005 for rename/new/missing + content/move.
  • FR-2.3: For Foundry-changed linked notes where Obsidian is unchanged: /get, convert to refined markdown, write into vault, re-baseline both sides (apply mode only).
  • FR-2.4: Never clobber an Obsidian-side change; vault-newer or both-diverged route to F3 conflict handling.
  • FR-2.5: New (cc-only) Foundry entries surface in a separate "live new entries" list (not the LevelDB ccOnly pool); one-click "Import as new refined note" with plain-language explanation; no auto-import.
  • FR-2.6: Manual "catch up now" trigger forces an immediate deep sweep; cadences configurable with jitter.
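
The shallow poll in FR-2.1 amounts to diffing the previous uuid-to-name snapshot against the latest minified /search result. A rough sketch, assuming only `uuid` and `name` matter for rename detection (`img` is left out for brevity); `MinifiedEntrySketch`, `ShallowDiff`, and `diffSnapshots` are illustrative names, not existing code.

```typescript
// Subset of the minified /search row used here (illustration only).
interface MinifiedEntrySketch {
  uuid: string;
  name: string;
}

interface ShallowDiff {
  renamed: string[]; // known uuid whose name changed
  added: string[];   // uuid not in the previous snapshot
  missing: string[]; // uuid in the previous snapshot but absent now
}

function diffSnapshots(
  prev: Map<string, string>, // uuid -> name from the previous tick
  current: MinifiedEntrySketch[],
): ShallowDiff {
  const diff: ShallowDiff = { renamed: [], added: [], missing: [] };
  const seen = new Set<string>();
  for (const entry of current) {
    seen.add(entry.uuid);
    const prevName = prev.get(entry.uuid);
    if (prevName === undefined) diff.added.push(entry.uuid);
    else if (prevName !== entry.name) diff.renamed.push(entry.uuid);
  }
  prev.forEach((_name, uuid) => {
    if (!seen.has(uuid)) diff.missing.push(uuid);
  });
  return diff;
}
```

Content edits and folder moves are invisible to this diff by construction, which is why FR-2.2 adds the deep poll.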

F3 — Divergence detection & conflict routing

  • FR-3.1: Every tick computes both-side hashes and routes per the 2×2 (parity / O-changed / F-changed / both-changed). A per-uuid lock shared by watcher + poll paths ensures one direction acts at a time. FR-1.4's /get evaluated after debounce drains.
  • FR-3.2: both-changed → no auto-overwrite; conflict row with side-by-side diff + one-line plain-language summary.
  • FR-3.3: Conflict actions: "Push vault → Foundry", "Pull Foundry → vault", "Accept both as-is (keep divergence)".
  • FR-3.4: Conflict state persists until resolved, across ticks AND server restarts (sync-state.json).
  • FR-3.5: Foundry-side renames and folder moves surface as changes/conflicts, not silently absorbed.
  • FR-3.6: Conflict diff format = side-by-side + plain-language summary line.
  • FR-3.7: "Accept both as-is (keep divergence)" re-baselines both hashes to current values without transferring content; confirmation dialog states what happens to each side.
  • FR-3.8: Each conflict action states, before commit, what it will do to each side + baselines (one-line preview); no irreversible action without confirm.
  • FR-3.9: A resolved conflict produces an activity-panel entry stating which side won and that the other side's edits were not transferred.
  • FR-3.10: Conflict-row ordering neutral: vault left, Foundry right, no pre-highlighted action.
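
One way to keep the per-action preview (FR-3.8) honest is to derive it from a single table of effects. The sketch below is an assumption-laden illustration: the type names, the summary wording, and the choice to re-baseline both hashes after push and pull are not specified by the PRD; only the "Accept both as-is" semantics come directly from FR-3.7.

```typescript
type ConflictAction =
  | "push-vault-to-foundry"
  | "pull-foundry-to-vault"
  | "accept-both-as-is";

type SideEffect = "unchanged" | "overwritten";

interface ActionPreview {
  vault: SideEffect;
  foundry: SideEffect;
  contentTransferred: boolean;
  summary: string; // one-line preview shown before commit (wording is hypothetical)
}

const PREVIEWS: Record<ConflictAction, ActionPreview> = {
  "push-vault-to-foundry": {
    vault: "unchanged",
    foundry: "overwritten",
    contentTransferred: true,
    summary: "Foundry entry is replaced by the vault note; both baselines are reset.",
  },
  "pull-foundry-to-vault": {
    vault: "overwritten",
    foundry: "unchanged",
    contentTransferred: true,
    summary: "Vault note is replaced by the Foundry entry; both baselines are reset.",
  },
  "accept-both-as-is": {
    vault: "unchanged",
    foundry: "unchanged",
    contentTransferred: false,
    summary: "Neither side changes; both baselines are reset to the current values.",
  },
};

function previewAction(action: ConflictAction): ActionPreview {
  return PREVIEWS[action];
}
```

Because the dialog text and the executed behavior would read from the same table, the preview cannot drift from what the action does.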

F4 — Sync status & parity

  • FR-4.1: Dashboard persistent sync-status header: ON/OFF, mode (apply only for auto-sync), watched dir.
  • FR-4.2: Dashboard parity indicator: in-parity / O-pending / F-pending / conflict / unsynced-linked counts + last-sync timestamp.
  • FR-4.3: Status note at reserved dot-path ${VAULT}/.sync-status.md (covered by dotfile skip) + foundry.sync_status: true sentinel; both O→F watcher and F→O poll check path AND sentinel and skip on either; lost sentinel → surfaced as user error, not synced.
  • FR-4.4: When sync OFF, dashboard shows loud "SYNC PAUSED" state.
  • FR-4.5: Dashboard parity + vault status note reflect one underlying state (sync-state.json).
  • FR-4.6: Status-note exclusion airtight by both path and sentinel (rename-safe).
  • FR-4.7: Status state (parity counts, conflict state, last-sync) in persisted sync-state.json surviving restart.
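
The path-and-sentinel exclusion in FR-4.3 and FR-4.6 can be sketched as a small classifier. This assumes paths are vault-relative and that the default reserved path is in use; `classifyStatusNote` and its return labels are hypothetical.

```typescript
// Assumed default reserved path, relative to the vault root (configurable per NFR-7).
const STATUS_NOTE_REL_PATH = ".sync-status.md";

type StatusNoteCheck =
  | "skip"               // status note: never synced
  | "skip-lost-sentinel" // reserved path without sentinel: surface as user error
  | "candidate";         // ordinary note: continue with the normal checks

function classifyStatusNote(
  relPath: string,
  block: { sync_status?: boolean },
): StatusNoteCheck {
  const pathMatches = relPath === STATUS_NOTE_REL_PATH;
  const hasSentinel = block.sync_status === true;
  if (hasSentinel) return "skip"; // rename-safe: the sentinel alone is enough
  if (pathMatches) return "skip-lost-sentinel";
  return "candidate";
}
```

Both the O→F watcher and the F→O poll would call the same check, so a renamed status note is still excluded by its sentinel and a status note with stripped frontmatter is still excluded by its path.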

F5 — Operational hardening

  • FR-5.1: Recursive-watch fallback verified on host kernel (re-scan on subdir create/rename).
  • FR-5.2: Debounce 700ms + max concurrency 3 (defaults, configurable), validated against ~50-note burst.
  • FR-5.3: Retry split: transient (408/504, 5xx, session-temp-unavailable) → bounded backoff; persistent (404 invalid clientId, 401 bad key, 404 no connected clients) → no retry, surface immediately.
  • FR-5.4: Activity panel: last 200 events, scrollable, time/note/status/message per op.
  • FR-5.5: Inflight dedup + per-uuid shared lock verified under burst — no dropped events, no duplicate pushes, no cross-direction oscillation.
  • FR-5.6: Before any auto/manual push, /get + cache prior Foundry entry to foundry-backups/<uuid>/<iso>.json (last N retained). Reuses FR-1.4's /get.
  • FR-5.7: "Revert last push" dashboard action restores most recent cached Foundry state for a note via /update.
  • FR-5.8: All ops append to persistent rotated log logs/sync-<date>.log (survives restart).
  • FR-5.9: "Copy diagnostics" dashboard action — redacted bundle of log tail + config (secrets redacted) + parity + relay/clientId status.
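
The retry split in FR-5.3 can be illustrated with a status classifier plus a capped exponential backoff. This is a sketch under stated assumptions: the base delay, cap, and attempt limit are invented defaults, unlisted 4xx codes are treated as persistent, and the non-HTTP "session-temp-unavailable" case is not modeled.

```typescript
type ErrorClass = "transient" | "persistent";

function classifyStatus(status: number): ErrorClass {
  if (status === 408) return "transient";                // request timeout
  if (status >= 500 && status <= 599) return "transient"; // 5xx, including 504
  // 401 (bad key) and 404 (invalid clientId / no connected clients) land here.
  // Assumption: any other 4xx is also treated as persistent.
  return "persistent";
}

// Capped exponential backoff; attempt is 0-based. Defaults are hypothetical.
function backoffMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Retry only transient errors, and only while attempts remain.
function shouldRetry(status: number, attempt: number, maxAttempts = 4): boolean {
  return classifyStatus(status) === "transient" && attempt + 1 < maxAttempts;
}
```

A persistent error returns `false` on the first attempt, which matches the "surface immediately" requirement; jitter could be layered on top of `backoffMs` if relay load warrants it.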

F6 — Onboarding & config (given operator prerequisites)

  • FR-6.1: Empty RELAY_CLIENT_ID → clear "no clientId configured" state + guidance (not silent 404).
  • FR-6.2: No connected Foundry client → "Foundry not connected" + disable auto-sync + re-check on cadence.
  • FR-6.3: List connected relay clients from the UI (relay /search no-clientId → client list on >1, "No connected clients" on 0); exactly-one client → "auto-resolved, no pick needed".

F7 — Security & access control

  • FR-7.1: Dashboard authenticates by default (token/password via env or first-run prompt); unauthenticated → 401.
  • FR-7.2: Default bind 127.0.0.1; 0.0.0.0 requires opt-in AND an auth token set (refuse to start otherwise).
  • FR-7.3: Secrets (RELAY_API_KEY, RELAY_PASSWORD) never rendered to the browser; masked presence only.
  • FR-7.4: POST mutation endpoints require CSRF token or same-origin check.
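
The startup rule in FR-7.2 is small enough to express as a pure check that runs before the server listens. A minimal sketch, assuming a config object with hypothetical field names (`host`, `allowPublicBind`, `authToken`); the real env/config keys are not given in this document.

```typescript
interface BindConfigSketch {
  host?: string;             // requested bind address
  allowPublicBind?: boolean; // explicit opt-in for 0.0.0.0
  authToken?: string;        // dashboard auth token
}

type BindResult =
  | { ok: true; host: string }
  | { ok: false; reason: string };

function resolveBind(cfg: BindConfigSketch): BindResult {
  const host = cfg.host ?? "127.0.0.1"; // localhost by default
  if (host !== "0.0.0.0") return { ok: true, host };
  if (!cfg.allowPublicBind) {
    return { ok: false, reason: "0.0.0.0 requires explicit opt-in" };
  }
  if (!cfg.authToken || cfg.authToken.trim() === "") {
    return { ok: false, reason: "0.0.0.0 requires an auth token" };
  }
  return { ok: true, host };
}
```

On `ok: false` the process would log the reason and exit rather than fall back to another address, which is the "refuse to start" behavior the requirement describes.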

NonFunctional Requirements

  • NFR-1: No-clobber safety — no auto-sync op may overwrite a side changed since last sync; both-diverged → conflict; TOCTOU closed by FR-1.10. Current O→F code violates this — fix before/within delivery A.
  • NFR-2: Fail-safe — relay can't read Foundry side → skip + surface, never blind push.
  • NFR-3: Performance — debounce 700ms + concurrency 3 handle ~50-note burst; operating envelope validated against ≥N notes ≥M JournalEntries (above author's size); cadences shallow=seconds-tens, deep=minutes; relay-load ceiling documented.
  • NFR-4: Reliability — transient errors retried with backoff; persistent surfaced within one tick (no retry).
  • NFR-5: Observability — every op visible in activity panel + status note; no silent skips/overwrites; status-note writes never produce a sync op; history persistent across restart.
  • NFR-6: Onboardability (honest) — given operator prerequisites, non-author DM reaches connected live sync via dashboard with no shell beyond those prereqs.
  • NFR-7: Configurability — poll cadences, debounce, concurrency, status-note path, backup retention, auth token env/config-driven with safe defaults.
  • NFR-8: Backward compatibility — manual buttons, seed/sync/rePull/import rows, dev/apply modes, CLI full index all keep working; auto-sync newly gated to apply mode.
  • NFR-9: Security — no unauthenticated mutation path; no secret egress; default bind localhost; TLS recommended beyond localhost.
  • NFR-10: Data integrity — Foundry-side overwrites always preceded by local backup (FR-5.6); dashboard restore path (FR-5.7); complements, not replaces, Foundry world backups.
  • NFR-11: Upgrades — foundry: block carries schema_version; hash/identity changes ship with idempotent migration run at startup before auto-sync engages, with dashboard banner.

Additional Requirements

From the PRD's engineering + launchable reviews (technical decisions folded into the PRD):

  • Foundry-side hash input is fixed (FR-1.4): canonicalize(htmlToMarkdown(flags["campaign-codex"].data)) + name + folder_path, reusing the contentHash pipeline. Requires a linkedom-based HTML→markdown inverse of obsidianToFoundryJsonLive. Hash-stability unit test across push→/get round-trip is a hard gate.
  • foundry.ccHash is a new frontmatter field; baselineFoundryBlock (src/server.ts:289) and baselineNote (src/server.ts:307) must be extended to rewrite it; readFoundryBlock consumers read it.
  • foundry.schema_version is a new frontmatter field (NFR-11); startup migration pass re-hashes/re-baselines old notes before auto-sync engages.
  • Per-uuid shared lock (FR-3.1) replaces the per-relPath inflight set for cross-direction safety; the watcher and poll paths share it.
  • sync-state.json (FR-4.7) is the single persisted source for parity counts, conflict state, last-sync.
  • foundry-backups/<uuid>/<iso>.json (FR-5.6) local cache + retention.
  • logs/sync-<date>.log (FR-5.8) persistent rotated log.
  • Operator prerequisites (PRD §2): 5 infrastructure gates (relay container, API key, headless session, rest-api module wiring, deps+dashboard start) are wired by the operator outside the dashboard; the dashboard detects/guides but does not perform them.
  • Behavior changes from current code: default bind 0.0.0.0 → 127.0.0.1 + auth required (FR-7.2); auto-sync no longer runs in dev mode (FR-1.9); AutoSyncController.process() gets the divergence guard (FR-1.4/1.5/1.6); it is NOT safe to ship the uncommitted controller as-is.
  • Live end-to-end verification (SM-2) is gated on the operator bringing up the headless Foundry session + a valid RELAY_CLIENT_ID. All other story work can proceed offline.
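
For the foundry-backups/<uuid>/<iso>.json cache listed above, path construction and "last N retained" pruning might look like the sketch below. It assumes ISO-8601 UTC timestamps as file names, which sort chronologically as plain strings; `backupPath` and `backupsToPrune` are hypothetical helpers, and the retention count is a caller-supplied setting.

```typescript
// Build the backup path for one pre-push snapshot.
// Note: ISO timestamps contain ":" which some filesystems reject; a real
// implementation may need to substitute that character.
function backupPath(uuid: string, when: Date): string {
  return `foundry-backups/${uuid}/${when.toISOString()}.json`;
}

// Given the file names in one uuid's backup directory, return the ones to
// delete so that only the newest keepLast remain.
function backupsToPrune(fileNames: string[], keepLast: number): string[] {
  const sorted = [...fileNames].sort(); // oldest first (ISO strings sort by time)
  const excess = Math.max(0, sorted.length - keepLast);
  return sorted.slice(0, excess);
}
```

"Revert last push" (FR-5.7) would then read the lexicographically greatest file in the directory and send it back through /update.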

UX Design Requirements

No separate UX Design document. Dashboard UX is specified inline in the FRs:

  • UX-DR1 (F3): Conflict row — side-by-side diff + one-line plain-language summary (FR-3.2/3.6); three labeled actions with per-action one-line previews before commit (FR-3.3/3.8); confirmation dialog for "Accept both as-is (keep divergence)" stating what happens to each side (FR-3.7); resolved-conflict activity entry (FR-3.9); neutral ordering vault-left/Foundry-right, no pre-highlighted action (FR-3.10).
  • UX-DR2 (F4): Sync-status header (ON/OFF/mode/watched dir) + parity indicator (counts + last-sync) (FR-4.1/4.2); loud "SYNC PAUSED" state when off (FR-4.4); status note content shape (on/off, last sync, parity, recent events) (FR-4.3).
  • UX-DR3 (F6): "No clientId configured" / "Foundry not connected" / client-list picker states with guidance, no shell (FR-6.1/6.2/6.3); single-client = "auto-resolved, no pick needed".
  • UX-DR4 (F5): Activity panel (last 200, scrollable) (FR-5.4); "Revert last push" action (FR-5.7); "Copy diagnostics" action (FR-5.9).
  • UX-DR5 (F7): First-run auth prompt (token/password) (FR-7.1); masked secret presence (FR-7.3).
  • UX-DR6 (F2): "live new entries" list with one-click "Import as new refined note" + plain explanation (FR-2.5); "catch up now" trigger (FR-2.6).

FR Coverage Map

Foundation / shared primitives (E0):

  • FR-1.4 (shared ccHash component + hash input contract): E0 + E1a (stability test) + E1b (divergence guard consuming it)
  • FR-3.1 (per-uuid shared lock, bidirectional): E0 (built + frozen) — consumed by E1b (O→F) and E2 (F→O)
  • FR-5.5 (inflight dedup + per-uuid lock): E0 (lock) + E1b/E2 (verified under burst on each path)

E1a — hash spike (go/no-go gate):

  • FR-1.4 (hash-stability unit test across push→/get round-trip): E1a

E1b — controller hardening (Slice 0):

  • FR-1.1: E1b — vault watcher (recursive + fallback, skip rules incl. status-note path)
  • FR-1.2: E1b — skip unlinked/unseeded
  • FR-1.3: E1b — body-hash parity skip
  • FR-1.4: E1b — Foundry-side ccHash divergence guard reusing pushNote's /get (no extra round-trip)
  • FR-1.5: E1b — 2×2 push routing (O-changed + F-unchanged → push)
  • FR-1.6: E1b — fail-safe (F-unreadable → skip + surface)
  • FR-1.7: E1b — dual re-baseline (contentHash + ccHash)
  • FR-1.8: E1b — auto-sync always live to Foundry
  • FR-1.9: E1b — auto-sync gated to apply mode
  • FR-1.10: E1b — TOCTOU post-push re-verify (rides E0 lock)
  • FR-5.1: E1b — recursive-watch fallback verification (the watcher lives here)
  • FR-5.2: E1b — debounce/concurrency defaults (validated against burst)
  • FR-5.3: E1b — transient/persistent retry policy + O→F application (E2 applies it to poll)
  • FR-5.6: E1b — pre-push Foundry backup cache (foundry-backups/<uuid>/<iso>.json)
  • FR-5.7: E1b — "Revert last push" action
  • FR-5.8: E1b — persistent rotated log logs/sync-<date>.log (pulled forward from E5; records what revert/push did)

E7 — security (Slice 0, parallel with E1b):

  • FR-7.1: E7 — auth by default
  • FR-7.2: E7 — localhost default + auth-gated 0.0.0.0
  • FR-7.3: E7 — no secret egress (masked presence)
  • FR-7.4: E7 — CSRF/same-origin on mutations

E2 — F→O (Slice 1):

  • FR-2.1: E2 — shallow poll (renames/new/missing)
  • FR-2.2: E2 — deep poll (content/moves, minutes cadence, mapPool-capped)
  • FR-2.3: E2 — F→O pull + dual re-baseline (vault write, dev-mirror-safe)
  • FR-2.4: E2 — never clobber O-side change (routes to E3)
  • FR-2.5: E2 — live-new-entries list + one-click import
  • FR-2.6: E2 — "catch up now" trigger + jittered cadence
  • FR-5.3 (poll application): E2

E4 — status & parity (Slice 1):

  • FR-4.1: E4 — sync-status header
  • FR-4.2: E4 — parity indicator
  • FR-4.3: E4 — status note (dot-path + sentinel); watcher exclusion wired in E1b
  • FR-4.4: E4 — SYNC PAUSED state
  • FR-4.5: E4 — single status source
  • FR-4.6: E4 — airtight status-note exclusion
  • FR-4.7: E4 — persisted sync-state.json
  • FR-5.4: E4 — activity panel (last 200) (observability pulled forward from E5)
  • PREP/RUN-THE-MATCH mode flag: E4 (owns sync-state → natural owner)

E3 — conflict resolution UX (Slice 2):

  • FR-3.1 (routing): E3 (2×2 routing; lock from E0)
  • FR-3.2: E3 — conflict row (side-by-side + summary)
  • FR-3.3: E3 — three resolution actions
  • FR-3.4: E3 — conflict persists across ticks + restarts (via E4's sync-state.json)
  • FR-3.5: E3 — renames/moves surface as conflicts
  • FR-3.6: E3 — side-by-side diff format
  • FR-3.7: E3 — "Accept both as-is" semantics + confirmation
  • FR-3.8: E3 — per-action preview before commit
  • FR-3.9: E3 — resolved-conflict activity entry
  • FR-3.10: E3 — neutral conflict ordering

E5 (remnant) — diagnostics polish (Tail):

  • FR-5.9: E5 — "Copy diagnostics" action
  • (retry-split, persistent log, activity panel migrated into E1b/E2/E4)

E6 — onboarding & config (Tail):

  • FR-6.1: E6 — empty clientId detection + guidance
  • FR-6.2: E6 — no-connected-client detection + re-check
  • FR-6.3: E6 — client-list picker (single-client auto-resolve)
  • README "how to run" stub touched every slice (E6 owns the final pass)

NFR ownership (primary): NFR-1 → E1b + E3 · NFR-2 → E1b · NFR-3 → E1b + E2 · NFR-4 → E1b (retry policy) + E2 (poll retry) · NFR-5 → E1b (log) + E4 (status/activity) · NFR-6 → E6 · NFR-7 → cross-cutting · NFR-8 → cross-cutting · NFR-9 → E7 · NFR-10 → E1b · NFR-11 → E1b (flagsSchemaVersion migration) + E4 (syncStateSchemaVersion migration).

Epic List (locked — E0 + 3 slices + tail)

Structure decision (2026-06-22): Approved via Party Mode scrutiny (Architect, Dev, Analyst, PM). The flat 7-epic plan was restructured to extract shared primitives (E0), split the fat root (E1a spike-gate + E1b hardening), pull observability forward (persistent log → E1b; activity panel → E4), make E7 a hard launch gate (localhost-default; auth only when bind changes), and reframe sequencing around launchable slices. Two disambiguations: flagsSchemaVersion (E1b) vs syncStateSchemaVersion (E4); PREP/RUN-THE-MATCH mode flag owned by E4. O→F auto-sync ships OFF-by-default / opt-in per session until E3's conflict resolution lands (the divergence guard reduces but does not eliminate within-window clobber risk). Feature-flagged landing on server.ts/dashboard.html throughout.

Epic E0: Shared primitives (foundation — gates everything)

Frozen-contract primitives consumed by E1b and E2 so neither has to wait on the other's design. Per-uuid bidirectional lock (replaces the per-relPath inflight set; one design, both directions plumbed from day one — the lock cares about uuid+resource, not direction). ccHash as a shared component (hash input contract: canonicalize(htmlToMarkdown(flags["campaign-codex"].data)) + name + folder_path, reusing the contentHash pipeline) — owned here so E1b's divergence guard and E2's deep-pull compare share one implementation. Schema-version naming fixed up front: flagsSchemaVersion (Foundry-side flags) vs syncStateSchemaVersion (local sync-state) — distinct names, distinct owners (E1b / E4). FRs covered: FR-1.4 (component + contract), FR-3.1 (lock), FR-5.5 (lock) NFRs: NFR-1 (lock), NFR-3 (lock under burst) Dependencies: none — landed and frozen first.
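
The per-uuid lock described above can be sketched as a small in-memory class. Method names follow Story E0.1 (acquire, release, withLock, isHeld, heldOps), but the bodies are an illustrative assumption, not the shipped module: only the "skip" conflict policy is shown, the "queue" policy is omitted, and `withLock` returning `undefined` on a skipped call is an invented convention.

```typescript
type LockOp = "push" | "pull" | "baseline";

type AcquireResult =
  | { acquired: true }
  | { acquired: false; heldOp: LockOp };

class SyncLock {
  // One entry per uuid, regardless of direction.
  private held = new Map<string, LockOp>();

  acquire(uuid: string, op: LockOp): AcquireResult {
    const current = this.held.get(uuid);
    if (current !== undefined) return { acquired: false, heldOp: current };
    this.held.set(uuid, op);
    return { acquired: true };
  }

  // Releasing an un-held uuid (or a mismatched op) is a no-op, never a throw.
  release(uuid: string, op: LockOp): void {
    if (this.held.get(uuid) === op) this.held.delete(uuid);
  }

  isHeld(uuid: string): boolean {
    return this.held.has(uuid);
  }

  heldOps(): Array<{ uuid: string; op: LockOp }> {
    const out: Array<{ uuid: string; op: LockOp }> = [];
    this.held.forEach((op, uuid) => out.push({ uuid, op }));
    return out;
  }

  // "skip" policy only: resolves to undefined when the uuid is already held.
  async withLock<T>(uuid: string, op: LockOp, fn: () => Promise<T>): Promise<T | undefined> {
    if (!this.acquire(uuid, op).acquired) return undefined;
    try {
      return await fn();
    } finally {
      this.release(uuid, op); // runs on success and on throw
    }
  }
}
```

Because `acquire` never waits, a nested acquire of the same uuid inside a `withLock` callback simply reports `acquired: false`, which gives the non-reentrant, deadlock-safe behavior the story asks for.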

Epic E1a: Hash spike (go/no-go gate)

The HTML→markdown inverse of obsidianToFoundryJsonLive (linkedom) + a round-trip hash-stability unit test (push a note → /get → re-hash → compare). If the hash can't be made stable across the round-trip, E1b's entire design changes (fallback: canonicalize Foundry HTML, hash the HTML, never hash markdown). One sitting; the spike is a gate, not a story in sequence: E1b is not designed until this passes. Timeboxed to 2–3 days (a spike with no box is a research project that blocks all 37 stories), and the pass criterion is a binary, reproducible test file (a vitest suite that goes green or red, not a debatable judgment call): O→F→O and F→O→F round-trips across the eight fixture categories, anchored against canonicalize (src/normalize.ts:19). If the spike goes NO-GO, the fork is the E1b-alt stub epic (canonicalize Foundry HTML directly, hash the HTML, never hash markdown), an explicit, cataloged fork, not a void; see E1b-alt below. FRs covered: FR-1.4 (stability test) NFRs: NFR-1 (hash comparability) Dependencies: E0 (hash input contract).

Epic E1b-alt: NO-GO fallback (stub — unstaffed unless E1a fails)

Activated ONLY if E1a's hash-stability spike returns NO-GO (the htmlToMarkdown round-trip cannot be made stable). Replaces E1b's markdown-hash divergence guard with a direct-HTML design: canonicalize Foundry's flags["campaign-codex"].data HTML and hash the canonicalized HTML (never hash markdown), so the O-side baseline becomes the canonicalized HTML of the last-pushed body rather than a markdown round-trip. E0.2's frozen contract changes: ccHash = contentHash(canonicalizeHtml(flags["campaign-codex"].data) + "\n" + name + "\n" + folder) — the HtmlToMarkdown seam is dropped, a CanonicalizeHtml seam is added, and E2's F→O pull writes the Foundry HTML verbatim into the note (no markdown inversion) or stores it under a foundry.ccHtml field. This stub is cataloged so the NO-GO fork is explicit in the structure; it re-baselines E1b (8 stories) and every downstream consumer of the hash contract (E2, E3, E4 parity). Not staffed while E1a is GO. FRs covered: FR-1.4 (alt contract) — supersedes E1b/E2 hash consumers. NFRs: NFR-1 (alt hash comparability). Dependencies: E0 (lock; ccHash contract re-opened), E1a (NO-GO verdict).

Epic E1b: Safe O→F controller hardening (Slice 0)

When the DM saves a linked, seeded note during prep, it pushes to live Foundry instantly and never overwrites a Foundry-side edit. Builds the divergence guard on E0's ccHash + lock, reusing pushNote's existing /get (no extra round-trip). First merged story = the AutoSyncController no-clobber fix (don't build safety on the unsafe substrate). Then TOCTOU post-push re-verify, pre-push Foundry backup + "Revert last push", apply-mode gating, flags migration (flagsSchemaVersion), and the persistent log (records what revert/push did; pulled forward from E5). Auto-sync ships OFF-by-default / opt-in per session until E3. The uncommitted AutoSyncController becomes safe to ship. Live end-to-end verification (SM-2) is gated on the operator's headless session. Both baselines (contentHash + ccHash) persist in the note's existing foundry: frontmatter block (extended baselineFoundryBlock), NOT in E4's sync-state.json, so E1b ships full 2×2 no-clobber in Slice 0 without cross-depending on E4 (Slice 1); E4's sync-state.json is a separate aggregate (parity/activity/mode/conflicts). FRs covered: FR-1.1–1.10, FR-5.1, FR-5.2, FR-5.3 (policy + O→F), FR-5.6, FR-5.7, FR-5.8 NFRs: NFR-1, NFR-2, NFR-3, NFR-4 (policy), NFR-5 (log), NFR-10, NFR-11 (flags) Dependencies: E0 (lock + ccHash), E1a (hash stability; gate).

Epic E7: Security & access control (Slice 0, parallel with E1b — hard launch gate)

The dashboard is safe to expose. Bind 127.0.0.1 by default (no auth needed on-box); require an auth token only when the operator changes the bind to 0.0.0.0 (refuse to start otherwise). Never render secrets to the browser (masked presence only). Guard POST mutation endpoints with CSRF/same-origin. First story = the auth contract (middleware signature, how endpoints declare their auth requirement), locked before E2/E3/E4 add endpoints. Scoped to auth + bind only: no poll-loop edits (stays parallel-safe with E2). Behavior change from today's 0.0.0.0-no-auth default. FRs covered: FR-7.1–7.4 NFRs: NFR-9 Dependencies: none; parallel with E1b; auth contract locked first.

Epic E2: Foundry→Obsidian auto-sync (Slice 1 — "safe but silent")

When the DM edits in Foundry during prep, changes flow into the vault without ever clobbering an Obsidian edit: renames/new/missing via shallow poll (default ON), content/moves via deep poll (default ON, minutes cadence, mapPool-capped, reusing E0's ccHash + lock). New Foundry entries surface as one-click import candidates in a separate "live new entries" list (not the LevelDB pool). A "catch up now" trigger forces an immediate deep sweep. Retry logic lives here (inline or lose state), applying E1b's retry policy to the poll path. Supersedes ADR-005 for rename/new/missing + content/move. Marked "safe but silent": not user-visible until E2+E4+E3 all land. FRs covered: FR-2.1–2.6, FR-3.1 (lock reuse), FR-5.3 (poll application), FR-5.5 (poll under burst) NFRs: NFR-3 (poll cadence/load), NFR-4 (poll retry) Dependencies: E0 (lock + ccHash), E1b (retry policy, fail-safe posture).

Epic E4: Sync status & parity (Slice 1)

The DM always knows whether sync is on or off, whether the vault and Foundry agree, and when it last synced, both in a persistent dashboard header + parity indicator and via a maintained Sync Status.md note in the vault. State lives in a persisted sync-state.json (syncStateSchemaVersion) that survives restart; it is the single status source, also used by E3 for conflict persistence. A loud "SYNC PAUSED" state replaces today's silent off. Owns the PREP/RUN-THE-MATCH mode flag (it gates whether AutoSyncController runs at all). Activity panel (last 200, scrollable) lands here: observability pulled forward from E5 so F→O isn't running dark. FRs covered: FR-4.1–4.7, FR-5.4, PREP/RUN mode flag NFRs: NFR-5 (status + activity) Dependencies: E1b (parity state, log); benefits from E2.

Epic E3: Conflict resolution UX (Slice 2 — "you can resolve conflicts")

When both sides changed since last sync, the DM gets a clear side-by-side conflict with a plain-language summary and three explicit, safe resolution actions, including an unambiguous "Accept both as-is (keep divergence)" with a confirmation stating what happens to each side. No data-loss footguns; conflict state persists across ticks and restarts (via E4's sync-state.json). Neutral ordering (vault left, Foundry right, no pre-highlighted action). Once E3 lands, O→F auto-sync can flip to ON-by-default. FRs covered: FR-3.1 (routing), FR-3.2–3.10 NFRs: NFR-1 (conflict path) Dependencies: E1b (two-baseline routing model + lock) + E4 (sync-state.json persistence).

Epic E5: Diagnostics polish (Tail)

The remaining hardening that didn't migrate into E1b/E2/E4: the one-click redacted "Copy diagnostics" bundle (log tail + config with secrets redacted + parity + relay/clientId status). Retry-split, persistent log, and activity panel have already landed in their natural homes. FRs covered: FR-5.9 NFRs: NFR-5 (diagnostics) Dependencies: E1b (log), E4 (parity/activity) — can land incrementally.

Epic E6: Onboarding & config guidance (Tail)

A non-author DM sees exactly what's missing (an empty RELAY_CLIENT_ID, no connected Foundry client) and is guided to fix it from the dashboard with no shell command. When exactly one client is connected, the dashboard says "auto-resolved, no pick needed" instead of showing an empty list. A minimal "how to run this" README stub is touched every slice; E6 owns the final onboarding pass (not a retroactive doc dump). FRs covered: FR-6.1–6.3 NFRs: NFR-6 Dependencies: none (independent; small); README stub touched per slice.

Delivery order (slice-ordered, dependency-respecting)

E0 (foundation, frozen) → E1a (go/no-go gate, 2–3 day timebox; NO-GO forks to E1b-alt) → [E1b ∥ E7] = Slice 0 ("safe to push; auth scaffolding DARK, not yet safe-to-expose": E1b delivers safe push, E7 ships its auth contract flagged off; the public-exposure flip is the later launch gate) → operator-prereq live-verification gate (SM-2) → [E2 + E4] = Slice 1 ("E4 shows incoming F→O changes; E2 flagged-off plumbing": Slice 1's visible value is E4's F-pending badge + incoming-changes list + activity panel; E2 runs dark until E3) → E3 = Slice 2 ("resolve conflicts"; flips O→F auto-sync to ON-by-default, with an upgrade-migration pause for pre-E3.5 installs) → E5-remnant → E6 (Tail). E7's auth contract locks before E2 starts; E0 + E1a freeze before E1b/E2 design. Feature-flagged landing on server.ts/dashboard.html across all epics.

Parallel-slice execution rule (Slice 0): E1b ∥ E7 both edit src/server.ts and src/dashboard.html. Feature flags prevent RUNTIME behavior collisions but NOT git edit collisions. Slice 0 MUST be executed worktree-per-agent (each epic gets its own git worktree) with an explicit merge step ordering E7's route-table/middleware edits before E1b's push-path edits (E7's contract locks first). No two parallel-slice agents edit the same file in the same worktree. This rule applies wherever epics run in parallel (Slice 0 today; any future parallel slice).

Epic Stories

Stories drafted in parallel by one sub-agent per epic, grounded in the real codebase (file:line cited throughout). Numbering: {Epic}.{Story}. Each story is sized for a single dev agent in one session and depends only on prior stories / prior-epic contracts (never future ones).

Epic E0: Shared primitives

Story E0.1: Per-UUID bidirectional sync lock replacing the per-relPath inflight set

As a sync engineer, I want a single per-UUID lock that gates both Obsidian→Foundry and Foundry→Obsidian operations on the same entry, So that while one direction is in flight for a uuid the other direction queues or skips, eliminating cross-direction clobber and TOCTOU races without two separate guard systems.

Acceptance Criteria:
Given the existing AutoSyncController in src/server.ts whose inflight = new Set<string>() at src/server.ts:463 is keyed by vault-relative path and only consulted by the O→F process() path at src/server.ts:582-617 (the F→O direction has no guard today)
When a dev agent introduces a new SyncLock primitive (new module, e.g. src/synclock.ts) keyed by Foundry uuid (the foundry.cc_uuid value, normalized to the full JournalEntry.<id> form used by relay.getEntry at src/relay/client.ts:69) plus a resource tag ("push" | "pull" | "baseline"), replacing the Set<string> at src/server.ts:463
Then SyncLock.acquire(uuid, op) returns acquired: true when no entry for that uuid is held, and acquired: false (with the held op) when the uuid is already locked; the lock is per-uuid, NOT per-direction, so an in-flight O→F push blocks an F→O pull on the same uuid and vice versa
And the lock exposes release(uuid, op), withLock<T>(uuid, op, fn) (acquire → await fn → release in a finally), isHeld(uuid), and heldOps() for diagnostics; releasing an un-held uuid is a no-op (not a throw) so error-path cleanup can't crash the watcher
And AutoSyncController.process at src/server.ts:582 is refactored to call withLock(uuid, "push", async () => { ... }) wrapping the body between this.inflight.add(relPath) (line 583) and this.inflight.delete(relPath) (line 615); the inflight.has(rel) debounce short-circuit at src/server.ts:568 is preserved as a fast pre-check but the authoritative gate becomes SyncLock.isHeld(uuid)
And the change is feature-flagged behind a SYNC_LOCK_ENABLED env flag (default true for new code paths, but with a fallback that keeps the old Set<string> behavior when false) so landing on src/server.ts is reversible; with the flag off, behavior is byte-identical to today
And when a second op for the same uuid is attempted while the first is held, the second op's behavior is configurable via a LockConflictPolicy enum: "skip" (log + return, current auto-sync semantics for redundant saves) or "queue" (await the holder, then acquire; bounded by a max-wait-ms to avoid unbounded stalls); auto-sync uses "skip", manual dashboard buttons use "queue" with a 5s max wait
And the lock is reentrant-NO: a second acquire of the same uuid from inside a held withLock callback returns acquired: false (deadlock-safe), and a unit test asserts this
And unit tests cover: (1) cross-direction exclusion (O→F withLock running, F→O acquire returns false), (2) burst of N concurrent withLock(uuid, "push", ...) calls execute strictly one at a time (FR-5.5 / NFR-3), (3) release-on-throw (fn rejects → lock is released, next acquire succeeds), (4) different uuids proceed concurrently (no global lock), (5) "queue" policy with holder rejecting still releases its slot
And the lock does NOT depend on E1b's divergence guard, retry policy, or persistent log (those consume this primitive); the lock surface is frozen on landing so E1b and E2 can code against it without re-coordination
And relPath→uuid resolution happens OUTSIDE the lock: the watcher/debounce keys on relPath (src/server.ts:564-571), but the uuid is resolved by reading the note's foundry.cc_uuid frontmatter (readFoundryBlock, src/frontmatter.ts:9) BEFORE SyncLock.acquire is called; resolution is a non-mutating read, so it needs no lock and there is no two-phase lock-upgrade or deadlock-ordering rule; the uuid lock is the ONLY lock
And a fallback key is defined for unresolved files: if a save's note has no foundry.cc_uuid (unlinked) or malformed frontmatter, SyncLock.acquire is called with the key "relPath:" + relPath (a distinct, namespaced pseudo-uuid) so un-keyed files STILL get per-file mutual exclusion; this is NOT a regression vs the old per-relPath inflight Set; the F→O direction (which receives a uuid from the relay) never hits this fallback, so the bidirectional claim holds for all linked notes and the fallback only covers the unlinked tail
And SYNC_LOCK_ENABLED=false means SyncLock is not consulted AT ALL and the existing inflight = new Set<string>() at src/server.ts:463 remains the SOLE guard (byte-identical to today, including its dual role as the self-write guard); SYNC_LOCK_ENABLED=true (default) means inflight is removed and SyncLock is the sole guard; the two are mutually exclusive, never both active, so there is no doubled collision surface on src/server.ts
And the lock is a CONCURRENCY guard, NOT a self-write/re-entrancy guard: the baseline write fired by a successful push re-triggers the watcher 
(see E1b.2's self-write suppression AC), and that suppression is a separate mechanism from this lock; E0.1 owns only the cross-op exclusion contract
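
A minimal sketch of the lock surface described in E0.1, covering only the "skip" policy (the "queue" policy and the feature flag are left out). The `heldOp` field name, the `WithLockResult` return shape, and the rule that `release` only frees a slot held by the same op are assumptions made for illustration; the story itself does not fix them.

```typescript
// Sketch of a per-uuid, non-reentrant lock with "skip" semantics only.
type LockOp = "push" | "pull" | "baseline";
type AcquireResult = { acquired: true } | { acquired: false; heldOp: LockOp };
// Assumed return shape for withLock; the story does not specify one.
type WithLockResult<T> = { ran: true; value: T } | { ran: false; heldOp: LockOp };

class SyncLock {
  // One slot per uuid, regardless of direction, so push and pull exclude each other.
  private held = new Map<string, LockOp>();

  acquire(uuid: string, op: LockOp): AcquireResult {
    const current = this.held.get(uuid);
    if (current !== undefined) return { acquired: false, heldOp: current };
    this.held.set(uuid, op);
    return { acquired: true };
  }

  // Releasing an un-held uuid is a no-op, so error-path cleanup cannot throw.
  // Assumption: a release with a non-matching op is also ignored.
  release(uuid: string, op: LockOp): void {
    if (this.held.get(uuid) === op) this.held.delete(uuid);
  }

  isHeld(uuid: string): boolean {
    return this.held.has(uuid);
  }

  heldOps(): Array<{ uuid: string; op: LockOp }> {
    return Array.from(this.held, ([uuid, op]) => ({ uuid, op }));
  }

  // acquire, await fn, release in finally. A nested acquire of the same uuid
  // inside fn returns acquired: false because the slot is still occupied.
  async withLock<T>(uuid: string, op: LockOp, fn: () => Promise<T>): Promise<WithLockResult<T>> {
    const res = this.acquire(uuid, op);
    if (!res.acquired) return { ran: false, heldOp: res.heldOp };
    try {
      return { ran: true, value: await fn() };
    } finally {
      this.release(uuid, op);
    }
  }
}
```

Keeping the state in a single `Map` keyed by uuid is what makes the lock per-uuid rather than per-direction: different uuids never contend, and there is no global lock to order against.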

Story E0.2: ccHash compute wrapper with a frozen input contract and a tested E1a seam

As a sync engineer, I want a single ccHash function that, given a relay /get response, derives a hash comparable to the Obsidian-side contentHash baseline, So that E1b's divergence guard and E2's F→O deep-pull compare can detect "Foundry's stored content actually changed" without each re-deriving the hash input contract and without an extra /get round-trip.

Acceptance Criteria: Given Foundry stores the curated body as HTML in flags["campaign-codex"].data (consumed by obsidianToFoundryJsonLive in src/toFoundry.js via buildPushPayload at src/push.ts:25-28) and pushNote already fetches the full live entry via relay.getEntry(id) at src/push.ts:142 (the response is the full doc incl. folder, per src/relay/client.ts:69 and docs/relay-api.md GET /get), and the Obsidian-side contentHash at src/normalize.ts:24 is sha256(canonicalize(text)) where canonicalize (src/normalize.ts:19) applies canonicalizeWikilinks then canonicalizeWhitespace When a dev agent introduces ccHash(liveEntry: JournalEntry, inverse: HtmlToMarkdown): string (new module, e.g. src/cchash.ts) with the FROZEN input contract documented in a docblock: hash = contentHash(canonicalize( inverse(flags["campaign-codex"].data) ) + "\n" + name + "\n" + folder_path) where name is liveEntry.name and folder_path is liveEntry.folder ?? "" (the relay /get returns folder as a string id/path; absent folder → empty string, matching the Obsidian-side absence) Then the wrapper extracts flags["campaign-codex"].data (string HTML), name (string), and folder (string | undefined) from the /get response, runs data through the injected inverse (the E1a-provided htmlToMarkdown), concatenates with "\n" + name + "\n" + folder_path, and reuses contentHash from src/normalize.ts:24 so the canonicalize + SHA-256 pipeline is shared (NOT re-implemented) And the HtmlToMarkdown seam is typed as (html: string) => string and is an explicit parameter (NOT a module-level import) so E0.2 ships with a tested stub inverse (e.g. 
tag-stripping regex) and E1a swaps in the real linkedom-based htmlToMarkdown without touching ccHash — the seam is the contract boundary, frozen on landing And a unit test asserts the stub inverse + ccHash is deterministic: same /get payload → same hash across runs; and that a one-character change to flags["campaign-codex"].data yields a different hash (sensitivity); and that name or folder changing alone yields a different hash (per the contract — these are part of the hash input, unlike Obsidian-side contentHash which excludes name/folder/frontmatter per src/normalize.ts:24 docblock) And a ccHashFromGet(relay: RelayClient, uuid: string, inverse: HtmlToMarkdown): Promise<{ hash: string; entry: JournalEntry }> helper is provided that wraps the existing relay.getEntry(uuid) call at src/relay/client.ts:69 and returns BOTH the hash AND the live entry — so callers like pushNote (which already does getEntry at src/push.ts:142) can derive ccHash from the SAME response without a second round-trip (FR-1.4 ground rule: no extra /get) And when flags["campaign-codex"] is absent or flags["campaign-codex"].data is not a string, ccHash throws a typed CcHashError("missing campaign-codex data") rather than silently hashing undefined — so E1b's divergence guard can distinguish "no Foundry-side content yet" (treat as fresh/seed) from "content changed" And when the relay /get returns 404 "Invalid client ID" or "No connected Foundry clients found" (per docs/relay-api.md Global mechanics), ccHashFromGet surfaces the relay error unchanged (it does NOT wrap it as a CcHashError) — Foundry-connectivity failures are the relay client's domain, not the hash's And the hash input contract is documented in a CC_HASH_CONTRACT constant (the canonical string template) and a unit test pins that constant's exact bytes, so any future drift to the contract is a deliberate, reviewable change — this is the frozen contract E1b and E2 code against And the wrapper does NOT depend on E1a's real inverse (stub 
is fine for tests), does NOT depend on E1b's flagsSchemaVersion migration, and does NOT wire itself into AutoSyncController.process or baselineFoundryBlock — that wiring is E1b's job; E0.2 only delivers the frozen primitive + tests And name and folder_path in the contract are ALWAYS sourced from the JournalEntry (liveEntry.name, liveEntry.folder), NEVER from the Obsidian filename or vault-relative folder — so the hash is DIRECTION-INVARIANT: ccHash is a Foundry-side-identity hash by construction, and the O-side baseline (foundry.ccHash) stores the ccHash computed from the Foundry entry captured at the last push/pull. This makes the O-side baseline and the E2 F-side live ccHash comparable BY CONSTRUCTION — there is no Obsidian-filename-vs-Foundry-name sanitization mismatch to reconcile, because the vault filename never enters the hash. (A vault rename changes the filename but NOT foundry.ccHash until a push updates the live entry's name — correct: a rename is a name-field update routed through pushNote's updatedName path, not a content divergence. See E3.5.) And a unit test asserts direction-invariance directly: given the same flags["campaign-codex"].data + name + folder, ccHash is identical regardless of which epic calls it (E1b's push path and E2's pull path both compute the same value for the same Foundry entry); and a test asserts that renaming the vault file (without changing the live entry) leaves foundry.ccHash unchanged — pinning the contract so E2 cannot accidentally re-define name/folder as vault-side values and re-open E0.2
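
A rough sketch of the E0.2 wrapper under stated assumptions: the `JournalEntry` interface here is a hypothetical minimal shape, and `canonicalize` / `contentHash` are simplified stand-ins for the real functions in src/normalize.ts, whose behavior is not reproduced here. Only the concatenation order, the injected inverse, and the typed error come from the story.

```typescript
import { createHash } from "node:crypto";

// Hypothetical minimal shape of the relay /get response.
interface JournalEntry {
  name: string;
  folder?: string;
  flags?: { "campaign-codex"?: { data?: unknown } };
}

// The seam: an explicit parameter, not a module-level import.
type HtmlToMarkdown = (html: string) => string;

class CcHashError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "CcHashError";
  }
}

// Stand-ins for src/normalize.ts (assumed behavior, NOT the project's real code).
const canonicalize = (text: string): string =>
  text.replace(/[ \t]+$/gm, "").replace(/\n{3,}/g, "\n\n").trim();
const contentHash = (text: string): string =>
  createHash("sha256").update(canonicalize(text), "utf8").digest("hex");

function ccHash(liveEntry: JournalEntry, inverse: HtmlToMarkdown): string {
  const data = liveEntry.flags?.["campaign-codex"]?.data;
  // Refuse to hash undefined: callers must be able to tell "no content" from "changed".
  if (typeof data !== "string") throw new CcHashError("missing campaign-codex data");
  // name and folder always come from the Foundry entry, never from the vault path.
  const folderPath = liveEntry.folder ?? "";
  return contentHash(canonicalize(inverse(data)) + "\n" + liveEntry.name + "\n" + folderPath);
}

// Tag-stripping stub inverse, as the story suggests for tests before E1a lands.
const stubInverse: HtmlToMarkdown = (html) => html.replace(/<[^>]*>/g, "");
```

Because the inverse is passed in, swapping the stub for the E1a implementation changes no line of `ccHash` itself, which is the point of freezing the seam.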

Story E0.3: Schema-version naming constants — flagsSchemaVersion vs syncStateSchemaVersion

As a sync engineer, I want the two distinct schema-version concepts in this system to have distinct, named, owned constants defined up front, So that E1b (Foundry-flags migration) and E4 (local sync-state migration) never collide on a single schemaVersion field, and downstream readers can branch on which schema they're validating without ambiguity.

Acceptance Criteria: Given the system has two unrelated schema-version concepts: (1) Foundry-side flags shape, stored in flags["campaign-codex"] on the live entry returned by relay.getEntry at src/relay/client.ts:69 (the data/type/image sub-flag shape that obsidianToFoundryJsonLive produces and buildPushPayload at src/push.ts:25-28 diffs) — owned by E1b; and (2) local sync-state shape, the on-disk record of last-synced hashes/timestamps per uuid — owned by E4 When a dev agent introduces a src/schema-version.ts module exporting two distinct string constants: FLAGS_SCHEMA_VERSION = "flags-campaign-codex/v1" and SYNC_STATE_SCHEMA_VERSION = "sync-state/v1", plus a SchemaVersion branded type alias per constant to prevent cross-assignment Then each constant is documented with its owner epic (E1b for FLAGS_SCHEMA_VERSION, E4 for SYNC_STATE_SCHEMA_VERSION), its storage location (Foundry flags["campaign-codex"].schemaVersion for the former; the local sync-state file's top-level schemaVersion field for the latter), and its migration policy (bump + migrate-on-read in the owning epic; the other epic must NOT touch it) And a unit test asserts the two constants are unequal strings, have distinct prefixes (flags- vs sync-), and that the branded types FlagsSchemaVersion and SyncStateSchemaVersion are not assignable to each other (compile-time guard via a nominal brand) And the module exports a parseSchemaVersion(raw: string): { kind: "flags" | "sync-state"; version: string } | null helper that branches on the prefix and returns null for unknown prefixes — so a reader handed an arbitrary schemaVersion field can determine which schema it belongs to without guessing And the module does NOT define migration logic, does NOT define the on-disk sync-state shape (E4's job), and does NOT touch baselineFoundryBlock at src/server.ts:289 or baselineNote at src/server.ts:307 — E1b extends those to write flags["campaign-codex"].schemaVersion; E0.3 only fixes the naming/ownership contract And 
the constants are referenced nowhere yet (no call-site changes) — this story is the frozen naming reservation; E1b and E4 import from this module when they wire their migrations, so the names cannot drift between those epics And the module lands behind no feature flag (it's constants + types + a pure parser, zero runtime behavior change) and adds zero imports to src/server.ts or src/dashboard.html
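
The naming reservation in E0.3 could look roughly like the following. The two constant values and the `parseSchemaVersion` signature come from the story; the brand field name `__brand` and the choice to return the text after the last `/` as `version` are assumptions.

```typescript
// Nominal brands: both are strings at runtime but not assignable to each other.
type FlagsSchemaVersion = string & { readonly __brand: "flags" };
type SyncStateSchemaVersion = string & { readonly __brand: "sync-state" };

// Owned by E1b; stored at flags["campaign-codex"].schemaVersion.
const FLAGS_SCHEMA_VERSION = "flags-campaign-codex/v1" as FlagsSchemaVersion;
// Owned by E4; stored in the local sync-state file's top-level schemaVersion field.
const SYNC_STATE_SCHEMA_VERSION = "sync-state/v1" as SyncStateSchemaVersion;

type ParsedSchemaVersion = { kind: "flags" | "sync-state"; version: string };

// Branches on the prefix; unknown prefixes and strings without a version yield null.
// Assumption: "version" is whatever follows the last "/".
function parseSchemaVersion(raw: string): ParsedSchemaVersion | null {
  const slash = raw.lastIndexOf("/");
  if (slash <= 0 || slash === raw.length - 1) return null;
  const version = raw.slice(slash + 1);
  if (raw.startsWith("flags-")) return { kind: "flags", version };
  if (raw.startsWith("sync-")) return { kind: "sync-state", version };
  return null;
}

// Compile-time guard (left as a comment so the sketch has no unused locals):
//   const wrong: SyncStateSchemaVersion = FLAGS_SCHEMA_VERSION; // type error
```

A reader handed an arbitrary `schemaVersion` field would call `parseSchemaVersion` first and branch on `kind`, rather than comparing against either constant directly.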

Epic E1a: Hash spike (go/no-go gate)

Story E1a.1: Build htmlToMarkdown inverse + round-trip hash-stability test suite (GO/NO-GO gate)

As a sync engineer, I want a linkedom-based htmlToMarkdown inverse of obsidianToFoundryJsonLive plus a round-trip hash-stability test (push → /get → re-hash → compare), So that I can prove whether contentHash is comparable across the Obsidian and Foundry sides and unblock E1b's divergence-guard design.

Acceptance Criteria: Given the forward transform obsidianToFoundryJsonLive (used in buildPushPayload, src/push.ts:25) converts a refined Obsidian body into HTML stored at flags["campaign-codex"].data, and linkedom is already a project dependency When the dev implements htmlToMarkdown(html: string): string in a new module (e.g. src/fromFoundry.ts) using linkedom to parse the HTML and emit refined markdown mirroring the forward transform's input shape Then htmlToMarkdown is a pure, synchronous, no-network function exported from the new module and importable by E0's ccHash seam And it handles, at minimum: plain text paragraphs, headings (<h1> through <h6>), unordered/ordered/nested lists, tables (<table>), <img> (round-tripping the ![](servedPath) form produced by processBodyImages, src/push.ts:102), wikilinks rendered as <a> or kept as [[target|display]], inline formatting (<strong>/<em>/<code>), and nested block HTML

Given whitespace and HTML-entity drift (e.g. &amp; vs &, &nbsp;, <br> vs \n, trailing spaces, blank-line runs) are the prime suspects for hash instability When htmlToMarkdown output is fed through canonicalize (src/normalize.ts:19, which runs canonicalizeWhitespace then canonicalizeWikilinks) Then the canonical form is deterministic regardless of incidental HTML whitespace/entity differences the forward transform or relay serialization may introduce And canonicalizeWikilinks (src/normalize.ts:4) normalizes [[ a | b ]] → [[a|b]] so wikilink round-trip survives spacing drift

Given pushNote fetches the live entry via deps.relay.getEntry(id) at src/push.ts:142, and buildPushPayload (src/push.ts:18) builds the diff with flags["campaign-codex"] at src/push.ts:27 When the test suite exercises the round-trip: take a refined note body → obsidianToFoundryJsonLive (or buildPushPayload) to get flags["campaign-codex"].data HTML → simulate the relay round-trip by calling relay.getEntry(id) (or, for offline unit tests, by constructing a JournalEntry whose flags["campaign-codex"].data equals that HTML and passing it back through) → run htmlToMarkdown on the returned data → apply contentHash (src/normalize.ts:24) to the result Then the test compares that hash to contentHash(canonicalize(originalObsidianBody)) and asserts equality for each fixture case: plain text, wikilinks, headings, lists, tables, images, nested HTML, and an entity/whitespace-drift case And a failure on any fixture does NOT abort the suite; each case reports pass/fail independently so the GO/NO-GO decision has full evidence

Given this epic is a GATE, not a story in sequence When the full fixture suite passes (every case: round-trip hash === original body hash) Then the outcome is GO: the spike records the passing result, and E1b is cleared to proceed with the markdown-hash divergence-guard design as specified And when any fixture fails after reasonable remediation attempts (e.g. adjusting htmlToMarkdown to compensate for a known forward-transform asymmetry), the outcome is NO-GO: the spike writes a short findings note documenting which case(s) failed and why, and E1b redesigns to the documented fallback (canonicalize Foundry HTML directly, hash the HTML, never hash markdown) — E1b is not designed until this gate resolves

Given E0 owns the ccHash compute wrapper and the frozen input contract canonicalize(htmlToMarkdown(flags["campaign-codex"].data)) + "\n" + name + "\n" + folder_path When htmlToMarkdown is stable (GO outcome) Then htmlToMarkdown is wired into E0's ccHash seam exactly as that contract specifies: E0 imports htmlToMarkdown from the new module and calls it on flags["campaign-codex"].data before canonicalize And the spike does NOT redefine E0's contract or recompute the wrapper; it only delivers the htmlToMarkdown function and proves it stable, and E0 integrates it

Given shared files (src/normalize.ts, src/push.ts) are touched by multiple epics When the spike lands its changes Then htmlToMarkdown and its tests land behind a feature flag (e.g. CC_HASH_SPIKE env / config gate) so the forward push path (pushNote, src/push.ts:112) behavior is unchanged when the flag is off And no edits to contentHash or canonicalize signatures in src/normalize.ts are made — the spike consumes those functions as-is; if remediation requires a canonicalization tweak, it is flagged as a contract change for E0/E1b review, not silently introduced

Given error and edge conditions matter for the gate's credibility When htmlToMarkdown receives empty HTML, null/undefined, or HTML without a flags["campaign-codex"].data field Then it returns "" (empty body) rather than throwing, so contentHash("") is well-defined and the divergence guard can treat "no content" as a stable baseline And when the relay /get round-trip fails (network error, 404 {"error":"No connected Foundry clients"}, 408/504 timeout per docs/relay-api.md), the test suite marks that fixture as ERROR (distinct from FAIL) so a relay outage is not misread as a hash-instability NO-GO
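
One way the gate's reporting could be structured is sketched below: every fixture runs independently, and a relay outage is recorded as ERROR rather than FAIL. The `Fixture` shape, the `isRelayFailure` predicate, and the "INCONCLUSIVE" verdict for a run with outages but no hash failures are assumptions for illustration; the story only requires that ERROR is kept distinct from FAIL.

```typescript
type FixtureStatus = "PASS" | "FAIL" | "ERROR";
type Verdict = "GO" | "NO-GO" | "INCONCLUSIVE";

interface FixtureResult { name: string; status: FixtureStatus; detail?: string }
// Hypothetical fixture shape: run() yields the original-body hash and the round-trip hash.
interface Fixture { name: string; run: () => Promise<{ expected: string; actual: string }> }

async function runFixtures(
  fixtures: Fixture[],
  isRelayFailure: (err: unknown) => boolean,
): Promise<{ results: FixtureResult[]; verdict: Verdict }> {
  const results: FixtureResult[] = [];
  for (const fixture of fixtures) {
    try {
      const { expected, actual } = await fixture.run();
      results.push(expected === actual
        ? { name: fixture.name, status: "PASS" }
        : { name: fixture.name, status: "FAIL", detail: `expected ${expected}, got ${actual}` });
    } catch (err) {
      // A thrown error never aborts the suite; it is classified and recorded.
      const detail = err instanceof Error ? err.message : String(err);
      results.push({ name: fixture.name, status: isRelayFailure(err) ? "ERROR" : "FAIL", detail });
    }
  }
  const anyFail = results.some((r) => r.status === "FAIL");
  const anyError = results.some((r) => r.status === "ERROR");
  // Assumption: outages without any hash mismatch leave the gate undecided.
  const verdict: Verdict = anyFail ? "NO-GO" : anyError ? "INCONCLUSIVE" : "GO";
  return { results, verdict };
}
```

Treating a non-relay exception as FAIL is a conservative choice in this sketch: an inverse that throws on a fixture is evidence against hash stability, whereas a timeout is not.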

Given the spike must be completable by a single dev agent in one session When the dev finishes Then the deliverables are: the htmlToMarkdown module, a test file covering the eight fixture categories above, a one-line GO/NO-GO verdict recorded in the test output (or a short findings note on NO-GO), and the feature-flagged seam wiring for E0 And no dashboard UI, no CLI surface, and no production push-path behavior change is included — those belong to later epics

Epic E1b: Safe O→F controller hardening (Slice 0)

Story E1b.1: AutoSyncController no-clobber fix — Foundry-side check via reused /get + ccHash compare

As a DM editing a linked, seeded note during prep, I want auto-sync to refuse to push when Foundry's copy has drifted from what my note last baselined, So that a Foundry-side edit is never silently overwritten by my save.

Acceptance Criteria: Given AutoSyncController.process at src/server.ts:582-617 currently pushes on an Obsidian-body-diff only (bodyHash vs fb.contentHash at src/server.ts:594) with NO Foundry-side check, and pushNote already calls relay.getEntry(id) at src/push.ts:142 When a debounced save reaches process and the body hash differs from foundry.contentHash (so the existing gate passes) Then before constructing or sending the PUT, the controller calls E0's ccHash compute wrapper on the live entry returned by the existing relay.getEntry(ccUuid) call — reusing the SAME /get round-trip pushNote already makes, with NO additional relay call And it compares that live ccHash against the note's foundry.ccHash baseline (the dual-baseline field introduced by this story's extension of baselineFoundryBlock at src/server.ts:289 — schema field name ccHash, with flagsSchemaVersion owned by E1b.8); if the live ccHash differs from the baseline, the push is ABORTED (fail-safe), the event is logged with status skipped and message Foundry-side edit detected — skipped (use Sync / Re-pull to reconcile), and NO PUT is sent And if E0's per-uuid bidirectional lock cannot be acquired (the uuid is already locked in the F→O direction by a running pull), the push is queued (not dropped) per E0's lock contract — never bypassed And if relay.getEntry returns 404 Invalid client ID or No connected Foundry clients found (src/relay/client.ts:60-64 / docs/relay-api.md), the event is logged as error with that message, no PUT is sent, and the note's baselines are left untouched (no clobber, no false baseline) And if foundry.ccHash is absent on the note (legacy seeded note pre-dating this story), the controller treats it as "" and PROCEEDS with the push — but writes the post-push ccHash baseline via E1b.2 (so subsequent saves become guarded); this is the one-time migration path, gated by flagsSchemaVersion startup migration in E1b.8 And the fix is feature-flagged behind AUTOSYNC_FOUNDRY_GUARD (env, default 
true) on src/server.ts and src/dashboard.html; with the flag off, behavior regresses to the current body-only gate (documented as unsafe, dashboard banner per E1b.5) And this story MUST land and merge before any other E1b story — it is the substrate the rest of the epic builds on; CI must include a regression test simulating a Foundry-side edit between baseline and save and asserting no PUT is sent And the dual baseline (foundry.contentHash + foundry.ccHash) is persisted in the note's EXISTING foundry: frontmatter block via the extended baselineFoundryBlock (src/server.ts:289) — NOT in E4's sync-state.json. This is the crux of why E1b ships full 2×2 no-clobber in Slice 0 WITHOUT cross-depending on E4 (Slice 1): the per-note baselines live in frontmatter (E1b's own turf, already owned by baselineFoundryBlock), while E4's sync-state.json is a separate AGGREGATE (parity counts + activity + mode + conflict rows) that E4 owns in Slice 1. E1b.1's no-clobber reads foundry.ccHash from frontmatter; it never reads or writes sync-state.json. A unit test asserts E1b.1's guard works with sync-state.json ABSENT (proving no E4 dependency)
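
The gate ordering in E1b.1 can be read as a pure decision over three hashes. The sketch below is one such reading, assuming a hypothetical `FoundryBaseline` shape for the frontmatter block and short reason codes in place of the full log messages; locking, the relay call, and the feature flag are deliberately outside it.

```typescript
// Hypothetical view of the note's foundry: frontmatter block.
interface FoundryBaseline { cc_uuid?: string; contentHash?: string; ccHash?: string }

type GuardDecision =
  | { action: "skip"; reason: "unlinked" | "unseeded" | "no-op save" | "foundry-drift" }
  | { action: "push"; seedCcHash: boolean };

// bodyHash: hash of the current Obsidian body.
// liveCcHash: ccHash computed from the live entry of the reused /get.
function decidePush(fb: FoundryBaseline, bodyHash: string, liveCcHash: string): GuardDecision {
  if (!fb.cc_uuid) return { action: "skip", reason: "unlinked" };
  if (!fb.contentHash) return { action: "skip", reason: "unseeded" };
  if (bodyHash === fb.contentHash) return { action: "skip", reason: "no-op save" };
  // Legacy seeded note without a ccHash baseline: proceed once, then seed the baseline.
  if (fb.ccHash === undefined || fb.ccHash === "") return { action: "push", seedCcHash: true };
  // Fail-safe: any Foundry-side drift aborts the push before a PUT is built.
  if (liveCcHash !== fb.ccHash) return { action: "skip", reason: "foundry-drift" };
  return { action: "push", seedCcHash: false };
}
```

Since the function reads only the frontmatter baselines and the live hash, it also illustrates why the guard has no dependency on E4's sync-state.json.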

Story E1b.2: 2×2 routing + dual re-baseline (contentHash + ccHash) on push success

As the DM, I want a successful push to re-baseline BOTH my note's body hash AND the Foundry-side cc hash, So that the next save is correctly gated against both sides and no false "drift" is reported.

Acceptance Criteria: Given baselineFoundryBlock at src/server.ts:289 and baselineNote at src/server.ts:307 currently rewrite only contentHash and syncedAt When a push succeeds in AutoSyncController.process (the guard from E1b.1 passed and pushNote returned an outcome) Then the controller re-baselines BOTH foundry.contentHash (to the canonicalized Obsidian body hash, src/normalize.ts:24) AND foundry.ccHash (to E0's ccHash computed over the live entry returned by the reused /get from E1b.1, captured BEFORE the PUT) AND foundry.syncedAt (ISO timestamp), via an extended baselineFoundryBlock that takes the new ccHash as a third parameter And the 2×2 routing model classifies the post-push state as one of {obsidian-unchanged/foundry-unchanged, obsidian-newer/foundry-unchanged, obsidian-unchanged/foundry-newer, both-changed-divergent} and the success-baseline path here is the "obsidian-newer → foundry-unchanged → push → both baselined to parity" route; the other three routes are handled by E1b.3 (divergence) and E1b.5 (skip rules) And the baseline write goes through writeWithBackup at src/server.ts:116 (apply mode: real vault with .bak; dev mode: --out mirror via targetPath at src/server.ts:73), consistent with the existing baseline semantics And the controller acquires E0's per-uuid lock for the O→F direction for the full read-compare-push-baseline window (lock released in finally), so a concurrent F→O pull on the same uuid cannot split the baseline And the event is logged with status pushed and message including → ${ccUuid} and · baselined (content+cc), matching the existing log shape at src/server.ts:611 And if the baseline write itself fails (disk full, permission), the push is NOT rolled back (Foundry is already updated) but the event is logged as error with baselined FAILED: <msg> — next save may re-push or mis-detect drift; the in-memory ccHash baseline is still updated so the next save uses the correct value And the baseline write includes a SELF-WRITE SUPPRESSION 
mechanism separate from E0's lock: the controller records { relPath, mtime: stat.mtimeMs, baselineHash } in an in-memory recentlyBaselined map with a 2s TTL (configurable via AUTOSYNC_BASELINE_SUPPRESS_MS), and the watcher's onChange (src/server.ts:557) checks this map BEFORE queueing the debounce — if the change event's relPath matches a recent baseline entry AND its mtime equals the recorded mtime (the controller's own write, not a user edit), the event is dropped with a skipped: self-write (baseline) log line and NO debounce timer is armed. This prevents the push→baseline→watcher→re-push infinite loop that per-uuid locking alone cannot stop (the lock is released before the debounce fires). A user edit arriving within the TTL window has a DIFFERENT mtime and is NOT suppressed. A unit/integration test asserts: (1) a successful push produces exactly ONE process invocation (no re-push), (2) a user edit within 100ms after the baseline write IS processed (different mtime), (3) the TTL entry expires and a later same-mtime event (impossible in practice but forced by the test) is processed after expiry
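
The self-write suppression map might be sketched as follows. The class name `RecentlyBaselined` and the injectable clock are illustrative assumptions (the clock exists only so expiry can be tested without real timers); the 2s default TTL and the relPath-plus-mtime match rule follow the story.

```typescript
interface BaselineRecord { mtimeMs: number; baselineHash: string; expiresAt: number }

class RecentlyBaselined {
  private entries = new Map<string, BaselineRecord>();
  private ttlMs: number;
  private now: () => number;

  // ttlMs would come from AUTOSYNC_BASELINE_SUPPRESS_MS; now is injectable for tests.
  constructor(ttlMs = 2000, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Called right after the controller writes a baseline to disk.
  record(relPath: string, mtimeMs: number, baselineHash: string): void {
    this.entries.set(relPath, { mtimeMs, baselineHash, expiresAt: this.now() + this.ttlMs });
  }

  // Called by the watcher BEFORE arming a debounce timer.
  // True only for an unexpired entry whose mtime equals the recorded one.
  isSelfWrite(relPath: string, mtimeMs: number): boolean {
    const rec = this.entries.get(relPath);
    if (!rec) return false;
    if (this.now() >= rec.expiresAt) {
      this.entries.delete(relPath);
      return false;
    }
    return rec.mtimeMs === mtimeMs;
  }
}
```

A user edit inside the TTL window produces a different mtime, so it falls through to the normal debounce path even though an entry for that path still exists.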

Story E1b.3: TOCTOU post-push re-verify → conflict row instead of baseline on divergence

As the DM, I want auto-sync to re-check Foundry immediately after my push lands and, if someone edited Foundry in that window, surface a conflict instead of silently baselining over it, So that a true race-condition edit is caught and not papered over.

Acceptance Criteria: Given E1b.1 captures the live entry pre-PUT and E1b.2 baselines against that captured entry, but Foundry can be edited by another client in the window between the GET and the PUT response When pushNote returns success and the controller is about to baseline (E1b.2) Then the controller issues a SECOND relay.getEntry(ccUuid) (a dedicated re-verify call — this is the ONE additional /get the epic permits, justified by the TOCTOU window), recomputes E0's ccHash on the returned entry, and compares it to the ccHash computed from the pre-PUT captured entry And if the two ccHashes match (no TOCTOU edit), the controller proceeds to baseline per E1b.2 and logs pushed And if they DIFFER, the controller does NOT baseline — instead it records a conflict row in the in-memory conflict list (shape: { uuid, name, obsidianHash, foundryPreHash, foundryPostHash, time, relPath }), logs the event with status skipped and message TOCTOU conflict — Foundry edited during push, NOT baselined; use Sync / Re-pull, and leaves foundry.contentHash/foundry.ccHash UNCHANGED so the next save re-enters the guard and surfaces the divergence cleanly And the conflict row is exposed via a new GET /api/autosync/conflicts endpoint (auth declared through E7's middleware contract) returning { conflicts: [...] }; the dashboard renders a count badge next to the Auto-sync button (src/dashboard.html:92) when non-zero And if the re-verify /get fails with 404/timeout (src/relay/client.ts:60-64), the controller treats it as a transient error (per E1b.7), does NOT baseline, logs error with TOCTOU re-verify failed: <msg> — not baselined, and the push remains live in Foundry (no rollback) — the DM is told to reconcile manually And the conflict list is bounded (last 50, newest first) and cleared on a successful subsequent push of the same uuid
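
A small sketch of the post-push decision and the bounded conflict list from E1b.3. The row shape is taken from the story; the `ConflictList` class, its method names, and the `reverifyOutcome` helper are hypothetical, and the relay calls and endpoint wiring are omitted.

```typescript
interface ConflictRow {
  uuid: string;
  name: string;
  obsidianHash: string;
  foundryPreHash: string;
  foundryPostHash: string;
  time: string;
  relPath: string;
}

// Compare the ccHash captured before the PUT with the one from the re-verify /get.
function reverifyOutcome(preHash: string, postHash: string): "baseline" | "conflict" {
  return preHash === postHash ? "baseline" : "conflict";
}

class ConflictList {
  private rows: ConflictRow[] = [];
  private max: number;

  constructor(max = 50) {
    this.max = max;
  }

  // Newest first; anything beyond the cap falls off the end.
  add(row: ConflictRow): void {
    this.rows.unshift(row);
    if (this.rows.length > this.max) this.rows.length = this.max;
  }

  // Called after a successful subsequent push of the same uuid.
  clearFor(uuid: string): void {
    this.rows = this.rows.filter((r) => r.uuid !== uuid);
  }

  list(): ConflictRow[] {
    return this.rows.slice();
  }
}
```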

Story E1b.4: Pre-push Foundry backup cache + retention + "Revert last push" action

As the DM who just pushed a wrong edit, I want a one-click "Revert last push" that restores the Foundry entry to its pre-push state, So that I can undo a mistake without hunting through backup files or running a shell command.

Acceptance Criteria: Given pushNote already writes a live-entry backup to <outDir>/bak/<noteName>.<stamp>.json at src/push.ts:176-181, and the memory rule "UI-only, no scripts" forbids handing the user a shell command When a push succeeds in AutoSyncController.process Then the controller records a per-uuid "last push" record in memory: { uuid, name, backupPath, time, relPath } pointing at the bak/<name>.<stamp>.json written by src/push.ts:180, and exposes it via GET /api/autosync/last-push?uuid=... (auth via E7) And backups land in the documented foundry-backups/<uuid>/<iso>.json path (NOT the flat bak/<name>.<stamp>.json used by manual push — auto-sync uses the per-uuid layout so retention is per-entry), created with mkdir -p for the uuid subdir And retention keeps the last N backups per uuid (N configurable via AUTOSYNC_BACKUP_RETAIN, default 10); older files in foundry-backups/<uuid>/ beyond N are deleted on the next push of that uuid, with the deletion logged to the persistent log (E1b.8) And the dashboard (src/dashboard.html) renders a "Revert last push" button in the auto-sync panel (src/dashboard.html:94) for the most recent pushed note, wired to a new POST /api/autosync/revert { uuid } endpoint that: reads the last backup JSON, calls relay.updateEntry(uuid, fullBackupDoc) with the FULL document (not a diff — this is the one place a full PUT is correct, to restore _id/pages/ownership/flags exactly), and on success re-baselines the note's foundry.contentHash/foundry.ccHash to match the restored Foundry state via the E1b.2 baseline path And revert acquires E0's per-uuid lock in the F→O direction (it's a Foundry→vault reconciliation), queues if the O→F direction holds it, and refuses to run if no last-push record exists for the uuid (400 { error: "no last push for <uuid>" }) And revert is feature-flagged behind the same AUTOSYNC_FOUNDRY_GUARD flag as E1b.1 (a Foundry-side guard is meaningless without the backups to revert to); with the flag off, the button is 
hidden And if the backup file is missing from disk (manual cleanup), revert returns 409 { error: "backup file missing: <path>" } and logs error; the DM is told the backup was removed
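
The per-uuid retention rule can be expressed as a pure selection over the file names in one `foundry-backups/<uuid>/` directory. This sketch assumes the `<iso>` part of each name is a fixed-width UTC timestamp, so that lexicographic order equals chronological order; the function name is hypothetical and the actual deletion and logging are left to the caller.

```typescript
// Returns the backup file names that fall outside the newest `retain` entries.
// Assumption: names look like "<fixed-width ISO timestamp>.json", so a plain
// string sort puts the oldest backup first.
function backupsToDelete(fileNames: string[], retain = 10): string[] {
  const keep = Math.max(0, Math.floor(retain));
  const sorted = fileNames.filter((n) => n.endsWith(".json")).sort();
  const excess = sorted.length - keep;
  return excess > 0 ? sorted.slice(0, excess) : [];
}
```

For example, with twelve backups and the default of 10, the two oldest names are returned and the ten newest are left alone; non-JSON files in the directory are never selected.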

Story E1b.5: Apply-mode gating + auto-sync OFF-by-default/opt-in-per-session + dev-mode banner + watcher skip rules

As the DM running the server in dev mode to preview changes, I want auto-sync to refuse to go live unless the server is in apply mode, and to stay OFF by default until I opt in per session, So that I never accidentally push dev-mode mirror edits to live Foundry.

Acceptance Criteria: Given resolveRefined at src/server.ts:85 returns the dev mirror in dev mode (the --out/refined copy, not the real vault), and the epic constraint that O→F auto-sync is OFF-by-default/opt-in-per-session until E3 ships When the DM toggles auto-sync ON via the dashboard button (src/dashboard.html:92) calling POST /api/autosync { enabled: true } (src/server.ts:686-696) Then if state.cfg.mode !== "apply", the controller rejects the toggle with 400 { error: "auto-sync requires --apply mode (server is in dev mode — enable would push mirror edits to live Foundry)" }, leaves enabled=false, and the dashboard button stays in the OFF state with a toast explaining why And auto-sync starts OFF every server start (the AutoSyncController constructor at src/server.ts:457 sets enabled = false — this story ENFORCES that no startup path flips it on, and the toggle is per-session: a server restart always returns to OFF, no persisted-on state until E3) And the dashboard (src/dashboard.html) shows a dev-mode banner whenever GET /api/status (src/server.ts:657) returns mode: "dev": a yellow strip reading "Dev mode — auto-sync disabled; pushes would target the --out mirror, not live Foundry" rendered above the auto-sync panel at src/dashboard.html:94 And the watcher skip rules in onChange at src/server.ts:557-572 are extended: alongside the existing .obsidian and dotfile skips (src/server.ts:561), it skips (a) status-note paths matching the FR-4.3 status-note path exclusion (wiring lives here in the watcher — the path predicate is sourced from a constant STATUS_NOTE_PATHS this story defines, e.g. 
["_meta/", "wiki/", ".raw/"]), (b) notes whose foundry.cc_uuid is absent (unlinked — already skipped downstream at src/server.ts:591 but cheap to skip at the watcher to avoid queueing), and (c) notes whose foundry.contentHash is absent (unseeded — same) And the skip is logged with status skipped and a specific reason (status-note path, unlinked, unseeded, or dev mode — auto-sync off) so the DM can see WHY a save didn't push And when E3 ships, this story's OFF-by-default toggle is flipped to ON-by-default by removing the per-session reset — but until then, the dashboard banner reads "Auto-sync is opt-in per session" near the toggle And apply-mode gating is NOT feature-flagged (it's a hard safety floor); the dev-mode banner and the per-session reset ARE feature-flagged behind AUTOSYNC_FOUNDRY_GUARD so the flag-off path keeps the current dev behavior
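
The extended watcher skip rules could be collected into one predicate that returns the log reason, or null when the save should be queued. The `STATUS_NOTE_PATHS` values are the story's own example; treating them as prefixes of the vault-relative path, the `FoundryBlock` shape, and the exact reason strings for the pre-existing skips are assumptions.

```typescript
const STATUS_NOTE_PATHS = ["_meta/", "wiki/", ".raw/"];

type SkipReason = "not markdown" | "status-note path" | ".obsidian" | "dotfile" | "unlinked" | "unseeded";

// Hypothetical view of the parsed foundry: frontmatter block (null if none).
interface FoundryBlock { cc_uuid?: string; contentHash?: string }

function watcherSkipReason(relPath: string, fb: FoundryBlock | null): SkipReason | null {
  const rel = relPath.split("\\").join("/"); // normalize Windows separators
  if (!rel.endsWith(".md")) return "not markdown";
  // Checked before the dotfile rule so ".raw/" reports the more specific reason.
  if (STATUS_NOTE_PATHS.some((prefix) => rel.startsWith(prefix))) return "status-note path";
  const segments = rel.split("/");
  if (segments.indexOf(".obsidian") >= 0) return ".obsidian";
  if (segments.some((s) => s.startsWith("."))) return "dotfile";
  if (!fb || !fb.cc_uuid) return "unlinked";
  if (!fb.contentHash) return "unseeded";
  return null;
}
```

Returning the reason instead of a boolean lets the caller log exactly why a save did not push, which is what the acceptance criteria ask for.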

Story E1b.6: Recursive-watch fallback + debounce 700ms + concurrency 3 validated against ~50-note burst

As the DM bulk-pasting a batch of ~50 notes into my vault, I want auto-sync to handle the burst without dropping pushes or racing duplicates, So that every changed note lands in Foundry exactly once.

Acceptance Criteria:

Given AutoSyncController already has recursive-watch fallback (src/server.ts:506-514), rescanSubs (src/server.ts:520-544), debounceMs = 700 (src/server.ts:467), and concurrency = 3 (src/server.ts:466), with per-relPath inflight at src/server.ts:463
When ~50 notes are saved in a short burst (e.g. a bulk paste or Obsidian's "Paste as markdown" of a large doc)
Then each relPath is debounced independently for 700ms (the timer map at src/server.ts:564-571), so rapid successive saves of the same file collapse to one push; the burst is drained through the queue/drain loop at src/server.ts:574-580 with at most 3 concurrent process calls
And E0's per-uuid lock (replacing the per-relPath inflight at src/server.ts:463, per the epic contract) is acquired per uuid, so if two relPaths resolve to the same ccUuid (e.g. a rename within the vault), the second queues behind the first and does NOT race a duplicate PUT
And a validation harness (this story's deliverable) simulates a 50-note burst by writing 50 .md files into a temp refined dir and asserting: (a) exactly 50 process invocations complete, (b) no uuid is pushed twice, (c) no Error: lock busy is thrown; instead, queued pushes complete after the holder releases, (d) total wall time is bounded (< 30s with concurrency 3 against a mock relay with 100ms latency)
And the recursive-watch fallback (non-recursive platform) is exercised by the harness by forcing recursive = false (src/server.ts:512) and asserting rescanSubs picks up a subdir created mid-burst (new folder writes are watched within one debounce window)
And if a push throws (relay 500/timeout), the concurrency slot is released in finally (src/server.ts:578) and the queue drains the rest: one failure does not stall the burst, and the failed note is retried per E1b.7's policy
And concurrency and debounce are configurable via AUTOSYNC_CONCURRENCY (default 3) and AUTOSYNC_DEBOUNCE_MS (default 700) env vars, validated to sane bounds (concurrency 1-8, debounce 100-5000 ms)
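
The env-var bounds in the last criterion could be enforced with something like the sketch below. The helper names (boundedInt, readAutoSyncTuning) are hypothetical, and falling back to the default on an out-of-range value is an assumption: the story only says the values are "validated to sane bounds", so rejecting at startup would be an equally valid reading.

```typescript
// Hypothetical helpers; only the env var names, defaults, and bounds come from the story.
interface AutoSyncTuning {
  concurrency: number;
  debounceMs: number;
}

type Env = Record<string, string | undefined>;

// Parse an integer setting. Assumed policy: a missing, non-integer, or
// out-of-range value falls back to the default instead of aborting startup.
function boundedInt(raw: string | undefined, def: number, min: number, max: number): number {
  if (raw === undefined || raw.trim() === "") return def;
  const n = Number(raw);
  if (!Number.isInteger(n) || n < min || n > max) return def;
  return n;
}

// The env object is passed in (rather than read from a global) so the
// function stays pure and easy to unit-test.
function readAutoSyncTuning(env: Env): AutoSyncTuning {
  return {
    concurrency: boundedInt(env["AUTOSYNC_CONCURRENCY"], 3, 1, 8),
    debounceMs: boundedInt(env["AUTOSYNC_DEBOUNCE_MS"], 700, 100, 5000),
  };
}
```

Passing the environment as a plain object keeps the validation harness free of global state; the server would call it once at startup with its real environment.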

Story E1b.7: Transient/persistent retry policy split (O→F application)

As the DM, I want transient relay blips to retry automatically and persistent failures to surface immediately without burning Foundry, So that a flaky network doesn't lose my push and a real config error doesn't spam retries.

Acceptance Criteria:

Given the relay returns 408/504 Request timed out (docs/relay-api.md) and 404 Invalid client ID / No connected Foundry clients found (src/relay/client.ts:60-64) for different failure classes
When a push in AutoSyncController.process throws
Then the controller classifies the error: TRANSIENT = relay 408, 504, 500, network fetch failure (ECONNRESET, ETIMEDOUT, ENOTFOUND); PERSISTENT = 400, 401, 403, 404 Invalid client ID, 404 No connected Foundry clients found, 404 relay /get returned no data (src/relay/client.ts:71), and any non-relay error (e.g. "no foundry.cc_uuid" from src/push.ts:123)
And TRANSIENT errors retry with exponential backoff: 3 attempts at 500ms, 1500ms, 4500ms (jittered ±20%), re-acquiring E0's per-uuid lock per attempt; if all 3 fail, the event is logged as error with transient retry exhausted: <msg> and the note is NOT baselined (so the next save retries from scratch)
And PERSISTENT errors do NOT retry: they are logged immediately as error with persistent: <msg>; the note is not baselined and the DM is told via the dashboard event log to fix the underlying issue (auth, clientId, missing entry)
And during retry, E0's per-uuid lock is released between attempts (so a concurrent F→O pull isn't blocked for the full 4.5s backoff window) and re-acquired for each attempt; the note stays in the controller's "retrying" set so a concurrent save of the same uuid queues, not duplicates
And a retry that succeeds on attempt 2 or 3 proceeds through the full E1b.2 baseline + E1b.3 TOCTOU re-verify path; retries are not a shortcut around the guard
And the retry policy is feature-flagged behind AUTOSYNC_FOUNDRY_GUARD; flag-off behavior is the current "log error, no retry" at src/server.ts:612-614
And the policy is unit-tested against a mock relay that returns 504 twice then 200 (asserting success on attempt 3, one baseline write, one TOCTOU re-verify) and against a mock returning 401 (asserting zero retries, immediate error log)
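
The classification and backoff schedule above could be sketched as two pure functions. The PushFailure shape and both function names are assumptions for illustration (the real error objects thrown by the relay client may carry the status differently), and the jitter is modelled as a uniform factor in [0.8, 1.2).

```typescript
type FailureClass = "transient" | "persistent";

// Hypothetical error shape: status is the relay HTTP status when there was one,
// code is a Node-style network error code such as "ECONNRESET".
interface PushFailure {
  status?: number;
  code?: string;
}

const TRANSIENT_STATUS = new Set<number>([408, 500, 504]);
const TRANSIENT_NET_CODES = new Set<string>(["ECONNRESET", "ETIMEDOUT", "ENOTFOUND"]);

function classifyFailure(f: PushFailure): FailureClass {
  if (f.status !== undefined && TRANSIENT_STATUS.has(f.status)) return "transient";
  if (f.code !== undefined && TRANSIENT_NET_CODES.has(f.code)) return "transient";
  // 400/401/403/404 and any non-relay error fall through to persistent.
  return "persistent";
}

// Backoff delays of 500ms, 1500ms, 4500ms, each jittered by +/-20%.
// rand must return a value in [0, 1); injecting it keeps the schedule testable.
function backoffDelays(rand: () => number = Math.random): number[] {
  const base = [500, 1500, 4500];
  return base.map((ms) => Math.round(ms * (0.8 + 0.4 * rand())));
}
```

A retry loop built on these would sleep for each delay with the per-uuid lock released, then re-acquire it before the next attempt, as the criteria require.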

Story E1b.8: Persistent rotated log logs/sync-<date>.log + flagsSchemaVersion startup migration

As the DM troubleshooting a sync issue, I want a persistent, rotated log file of every auto-sync decision and a startup migration that flags my old seeded notes with the current flags schema version, So that I can audit what happened and the controller can reason about which notes have the dual baseline.

Acceptance Criteria:

Given AutoSyncController.log at src/server.ts:488-494 currently only keeps an in-memory events array (max 100, src/server.ts:469), and the epic needs NFR-5 (persistent log) and NFR-10/11 (flags schema versioning)
When the server starts (src/server.ts:620 startServer)
Then it ensures logs/ exists and opens a logs/sync-<YYYY-MM-DD>.log file (appended to, created if absent); every AutoSyncController.log call writes a structured line { time, level, name, status, message, uuid?, relPath? } (JSON-lines) to this file IN ADDITION to the in-memory events array; the in-memory array stays for the dashboard's fast poll, and the file is the durable record
And the log is ROTATED daily: when the current date changes (a setter on the controller checks new Date().toISOString().slice(0,10) against the open file's date), the old file is closed and a new logs/sync-<new-date>.log is opened; rotation is also triggered if the current file exceeds 10MB (size-based rotation, renamed to logs/sync-<date>.<n>.log)
And retention keeps the last 14 days of logs (configurable via AUTOSYNC_LOG_RETAIN_DAYS, default 14); older logs/sync-*.log files are deleted at startup
And flagsSchemaVersion (E0 provides the constant; this story owns its MIGRATION) is written into every note's foundry.flagsSchemaVersion field by the E1b.2 baseline path; at server startup, BEFORE auto-sync is allowed to start, a migration pass walks the matched index (state.index.matched), and for any note whose foundry.flagsSchemaVersion is absent OR less than the current constant, it adds the field (value = current constant) via writeWithBackup at src/server.ts:116, WITHOUT touching contentHash or ccHash (so it doesn't falsely trigger a push)
And the migration is IDEMPOTENT: running it twice produces no changes the second time (the field is already current); it's logged to logs/sync-<date>.log as { status: "skipped", name, message: "flagsSchemaVersion migrated <old|absent> -> <current>" }
And the dashboard (src/dashboard.html) shows a startup banner when the migration ran and changed ≥1 note: "Migrated N notes to flagsSchemaVersion " (rendered once on first /api/autosync poll after start), dismissible
And if a note is mid-edit when the migration reads it (file mtime changes between read and write), the migration SKIPS that note (logs skipped: in-flight edit) rather than racing the DM's save; it will be caught on the next server restart or the next push's baseline path
And the persistent log + migration are feature-flagged behind AUTOSYNC_FOUNDRY_GUARD; flag-off keeps the current in-memory-only behavior and skips migration (documented as leaving old notes un-guarded)
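
The rotation and retention rules above reduce to a few date comparisons, sketched here as pure functions. All function names are hypothetical. Two details are assumptions: "10MB" is read as 10 MiB, and "last 14 days" is read as keeping any file dated on or after today minus 14 days.

```typescript
// Day stamp in the form the story describes: toISOString().slice(0, 10).
function dayStamp(d: Date): string {
  return d.toISOString().slice(0, 10);
}

function logFileName(d: Date): string {
  return `logs/sync-${dayStamp(d)}.log`;
}

const MAX_LOG_BYTES = 10 * 1024 * 1024; // "10MB", read here as 10 MiB (assumption)

type Rotation = "none" | "date" | "size";

// Decide whether the open log file must be rotated before the next write.
function rotationNeeded(openDay: string, now: Date, currentBytes: number): Rotation {
  if (dayStamp(now) !== openDay) return "date";
  if (currentBytes > MAX_LOG_BYTES) return "size";
  return "none";
}

// Return the file names (not paths) whose embedded date is older than the
// retention window. Matches sync-<date>.log and sync-<date>.<n>.log.
function expiredLogs(fileNames: string[], today: Date, retainDays = 14): string[] {
  const midnightUtc = Date.UTC(today.getUTCFullYear(), today.getUTCMonth(), today.getUTCDate());
  const cutoff = midnightUtc - retainDays * 86400000;
  return fileNames.filter((name) => {
    const m = /^sync-(\d{4}-\d{2}-\d{2})(?:\.\d+)?\.log$/.exec(name);
    if (m === null) return false; // never touch files that are not sync logs
    return Date.parse(`${m[1]}T00:00:00Z`) < cutoff;
  });
}
```

Because the day stamp comes from toISOString(), rotation happens at UTC midnight; a deployment that wants local-midnight rotation would need a different stamp.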

Epic E7: Security & access control (Slice 0)

Story E7.1: Auth contract — middleware signature + endpoint declaration table

As a developer on E2/E3/E4, I want a single auth middleware contract and a route declaration mechanism in src/server.ts, So that every endpoint (mine and future epics') declares its auth requirement once instead of reimplementing auth checks.

Acceptance Criteria:

Given the single createServer handler at src/server.ts:629 currently dispatches by inline if (req.method === ... && url.pathname === ...) chains with no auth layer
When I add an AuthContext type and a requireAuth: boolean / requireCSRF: boolean annotation per route in a single exported route table (e.g. const ROUTES: Record<string, { method; requireAuth; requireCSRF; handler }>)
Then every existing endpoint is migrated into that table (/, /api/index, /api/status, /api/file, /api/entries, /api/autosync GET → requireAuth:false, requireCSRF:false; /api/action, /api/push, /api/push-all, /api/link, /api/refresh, /api/autosync POST → requireAuth:<gated>, requireCSRF:true) and the dispatch loop consults the table before calling the handler
And the middleware is a single function authenticate(req, res, route): boolean | Promise<boolean> that returns false after sending a 401 (missing/invalid token) and true to proceed; it reads the token from Authorization: Bearer <token> OR a cookie set by the first-run prompt, compared in constant time against the configured auth token
And when requireAuth is false for a route (localhost-default mode), authenticate is a no-op pass-through so E2/E3/E4 read endpoints stay unguarded on-box
And the contract is documented in a top-of-file comment block naming the exact shape E2/E3/E4 must use to register new routes (ROUTES["/api/<x>"] = { method, requireAuth, requireCSRF, handler }); no new auth code in neighbor epics
And a feature flag ENABLE_AUTH_MIDDLEWARE (env, default false) gates adoption: when false, the table still exists but authenticate short-circuits to true, preserving today's behavior so this ships without flipping E2's work
And error conditions are covered: malformed Authorization header → 401 {error:"bad auth header"}; token mismatch → 401 {error:"unauthorized"}; the existing send() helper at src/server.ts:60 is reused so x-content-type-options: nosniff and cache-control: no-store still apply to error responses
And no poll-loop / AutoSyncController code is touched (parallel-safe with E2)
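
A minimal sketch of the table-plus-middleware contract follows, reduced to a pure decision function so it can be tested without an HTTP server. Several things here are assumptions rather than the project's code: the table is keyed by "METHOD path" (because /api/autosync appears with both GET and POST, which a path-only key cannot hold), the handler field is omitted, requireAuth:<gated> is modelled as a plain boolean, and the comparison loop stands in for Node's crypto.timingSafeEqual, which is the usual production choice.

```typescript
interface RouteSpec {
  method: "GET" | "POST";
  requireAuth: boolean;
  requireCSRF: boolean;
}

// Hypothetical excerpt of the route table, keyed by "METHOD path".
const ROUTES: Record<string, RouteSpec> = {
  "GET /api/status": { method: "GET", requireAuth: false, requireCSRF: false },
  "GET /api/autosync": { method: "GET", requireAuth: false, requireCSRF: false },
  "POST /api/autosync": { method: "POST", requireAuth: true, requireCSRF: true },
  "POST /api/push": { method: "POST", requireAuth: true, requireCSRF: true },
};

function lookupRoute(method: string, path: string): RouteSpec | undefined {
  return ROUTES[`${method} ${path}`];
}

interface AuthResult {
  ok: boolean;
  status?: 401;
  error?: string;
}

// Compare without returning early on the first differing character.
// Out-of-range charCodeAt yields NaN, which the bitwise operators treat as 0.
function constantTimeEqual(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

function authenticate(
  route: RouteSpec,
  authorization: string | undefined,
  cookieToken: string | undefined,
  configuredToken: string,
  middlewareEnabled: boolean,
): AuthResult {
  // Flag off or open route: pass through, preserving today's behavior.
  if (!middlewareEnabled || !route.requireAuth) return { ok: true };
  let presented = cookieToken;
  if (authorization !== undefined) {
    const m = /^Bearer (\S+)$/.exec(authorization);
    if (m === null) return { ok: false, status: 401, error: "bad auth header" };
    presented = m[1];
  }
  if (presented === undefined || !constantTimeEqual(presented, configuredToken)) {
    return { ok: false, status: 401, error: "unauthorized" };
  }
  return { ok: true };
}
```

In the server, the dispatch loop would call lookupRoute first, then turn a failed AuthResult into a response through the existing send() helper so the security headers still apply.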

Story E7.2: Localhost-default bind + refuse 0.0.0.0 without token + first-run auth prompt

As an operator exposing the dashboard, I want the server to bind 127.0.0.1 by default and refuse to start on 0.0.0.0 unless an auth token is set, with a first-run auth prompt in the UI, So that the dashboard is safe-by-default and public exposure is an explicit, auth-gated opt-in (FR-7.1, FR-7.2, UX-DR5).

Acceptance Criteria:

Given src/server.ts:37 documents host defaulting to 0.0.0.0 and src/server.ts:703 calls server.listen(cfg.port, cfg.host, resolve) with that default, and .env.example currently documents no dashboard auth token
When I flip the ServerConfig.host default to 127.0.0.1 (the safe-by-default change) and add a new env DASHBOARD_AUTH_TOKEN documented in .env.example alongside RELAY_API_KEY / RELAY_PASSWORD
Then startServer (src/server.ts:620) refuses to start with a clear exit error if cfg.host === "0.0.0.0" AND DASHBOARD_AUTH_TOKEN is unset/empty: throw "refusing to bind 0.0.0.0 without DASHBOARD_AUTH_TOKEN — set the token or bind 127.0.0.1" before server.listen
And when bound to 127.0.0.1 with no token, all routes remain open (on-box, no auth); the behavior change from today is ONLY the default bind address and the public-exposure gate; existing on-box operators who passed --host 0.0.0.0 must now also set the token
And when bound to 0.0.0.0 WITH a token, every route flagged requireAuth:true in the table from E7.1 enforces it; routes flagged requireAuth:false (e.g. the login prompt itself) stay open
And src/dashboard.html gains a first-run auth prompt: on load, fetch('/api/auth/status') returns {authRequired: bool, bound: '127.0.0.1'|'0.0.0.0'}; if authRequired is true and no valid session cookie is present, the dashboard renders a login card (password/token input) that POSTs /api/auth/login and stores the resulting cookie/token for subsequent fetch calls via an Authorization header injected into a shared request wrapper
And the shared fetch wrapper in dashboard.html (replacing the bare fetch('/api/...') calls at lines 145, 149, 258, 296, 311, 323, 330, 354, 365, 398, 429, 443) attaches the token and, on 401, redirects back to the login prompt
And edge cases: DASHBOARD_AUTH_TOKEN set but bind is 127.0.0.1 → token is accepted but auth not enforced on routes (localhost is trusted); empty/whitespace-only token is treated as unset; login failure returns 401 {error:"invalid credentials"} without leaking which of token-vs-password was wrong
And feature-flagged via ENABLE_AUTH_MIDDLEWARE so flipping the default bind can land independently of E7.1's enforcement; when the flag is off, the bind still defaults to 127.0.0.1 but the 0.0.0.0 refuse-to-start check is skipped (back-compat escape hatch)
And a SELF-LOCKOUT guard: startServer (src/server.ts:620) refuses to start if ENABLE_AUTH_MIDDLEWARE === "true" AND DASHBOARD_AUTH_TOKEN is unset/empty: throw "ENABLE_AUTH_MIDDLEWARE=on requires DASHBOARD_AUTH_TOKEN — set the token or disable the flag" before server.listen. This prevents an operator who flips the flag on to demo "safe to expose" from bricking the dashboard with no token set (the only recovery would be live .env editing, an embarrassing footgun). This guard is independent of bind address (it fires even on 127.0.0.1) because the flag itself promises enforcement, and enforcement without a token is a contradiction
And AutoSyncController is not touched
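
The two refuse-to-start checks can be combined into one guard called before server.listen, as in the sketch below. The BindConfig shape and function names are hypothetical. The ordering is an assumption: the story does not say which check runs first, and since the 0.0.0.0 check is skipped when the flag is off, this sketch evaluates it first so that its message stays reachable.

```typescript
// Hypothetical config shape; the real ServerConfig may differ.
interface BindConfig {
  host: string;
  dashboardAuthToken?: string;
  enableAuthMiddleware: boolean; // ENABLE_AUTH_MIDDLEWARE === "true"
}

// Empty and whitespace-only tokens count as unset, per the edge cases.
function tokenIsSet(token: string | undefined): boolean {
  return token !== undefined && token.trim() !== "";
}

// Throws before listen() when the configuration would be unsafe or self-locking.
function assertSafeToStart(cfg: BindConfig): void {
  if (!cfg.enableAuthMiddleware) return; // flag off: both checks are skipped
  const hasToken = tokenIsSet(cfg.dashboardAuthToken);
  if (cfg.host === "0.0.0.0" && !hasToken) {
    // Message text as given in the story.
    throw new Error("refusing to bind 0.0.0.0 without DASHBOARD_AUTH_TOKEN — set the token or bind 127.0.0.1");
  }
  if (!hasToken) {
    // Self-lockout guard: fires regardless of bind address.
    throw new Error("ENABLE_AUTH_MIDDLEWARE=on requires DASHBOARD_AUTH_TOKEN — set the token or disable the flag");
  }
}

// Auth is enforced only on a public bind; localhost stays trusted even with a token.
function authEnforced(cfg: BindConfig): boolean {
  return cfg.enableAuthMiddleware && cfg.host === "0.0.0.0" && tokenIsSet(cfg.dashboardAuthToken);
}
```

Keeping the guard as a single function gives the startup path one call site and makes each combination of flag, host, and token directly testable.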

Story E7.3: No-secret-egress — masked presence endpoint + audit

As an operator viewing the dashboard, I want the browser to never receive RELAY_API_KEY, RELAY_PASSWORD, or DASHBOARD_AUTH_TOKEN, only a masked presence indicator, So that a page-view or XSS cannot exfiltrate secrets (FR-7.3, UX-DR5 masked presence).

Acceptance Criteria:

Given src/server.ts today exposes /api/status (src/server.ts:657) returning {mode, refinedDir, ccDir, outDir} and never serializes secrets, and .env.example defines RELAY_API_KEY (line 20), RELAY_PASSWORD (line 30), and the new DASHBOARD_AUTH_TOKEN
When I add /api/auth/status (the endpoint E7.2's prompt polls) returning ONLY {authRequired: bool, bound: '127.0.0.1'|'0.0.0.0', relayConfigured: bool, foundryConfigured: bool}: booleans only, no secret values, no secret lengths
Then relayConfigured is !!state.cfg.relayCfg and foundryConfigured is !!state.cfg.foundryCfg (presence, not material)
And I audit every send(res, 200, ...) payload in src/server.ts (the send helper at line 60 is the only egress point) and confirm that none of RELAY_API_KEY, RELAY_PASSWORD, DASHBOARD_AUTH_TOKEN, relayCfg.apiKey, relayCfg.password, or foundryCfg.* secret fields appear in any response body, including error messages at lines 226, 259, 273, 381, 428, 699
And error strings from relayClient(state) (src/server.ts:231: "relay not configured (start the server with RELAY_API_KEY / --relay-api-key to enable push)") are scrubbed to not echo the key value; the message names the env var, never its content
And src/dashboard.html renders a masked presence chip ("Relay: configured ✓ / not configured ✗") driven by /api/auth/status, never by reading env values directly
And edge cases: a GET /api/status request from an unauthenticated client in 0.0.0.0 mode is rejected by the requireAuth flag from E7.1 (it leaks dir paths); /api/auth/status is the only unauthenticated endpoint in public mode
And a regression guard is added: a test or runtime assertion that JSON.stringify of any response payload does not contain the configured token/password/api-key substrings
And AutoSyncController is not touched
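
The regression guard in the second-to-last criterion might look like the sketch below: serialize the payload and search it for each configured secret. The SecretBag shape and both function names are hypothetical. The guard checks the raw value and its JSON-escaped form, since a secret containing quotes or backslashes would otherwise slip past a plain substring test.

```typescript
// Hypothetical holder for the three configured secrets named in the story.
interface SecretBag {
  relayApiKey?: string;
  relayPassword?: string;
  dashboardAuthToken?: string;
}

// Return the names of any secrets whose value appears in the serialized payload.
function findLeakedSecrets(payload: unknown, secrets: SecretBag): string[] {
  const body = JSON.stringify(payload) ?? "";
  const leaked: string[] = [];
  for (const [name, value] of Object.entries(secrets)) {
    if (typeof value !== "string" || value.length === 0) continue;
    const escaped = JSON.stringify(value).slice(1, -1); // value as it appears inside JSON
    if (body.includes(value) || body.includes(escaped)) leaked.push(name);
  }
  return leaked;
}

// Presence-only payload for /api/auth/status: booleans, never secret material.
function authStatusPayload(
  authRequired: boolean,
  bound: "127.0.0.1" | "0.0.0.0",
  relayCfg: object | undefined,
  foundryCfg: object | undefined,
) {
  return {
    authRequired,
    bound,
    relayConfigured: relayCfg !== undefined,
    foundryConfigured: foundryCfg !== undefined,
  };
}
```

Wired into the send helper behind a development flag, a non-empty result would fail the request loudly instead of shipping the body.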

Story E7.4: CSRF / same-origin guard on all POST mutation endpoints

As an operator running the dashboard on a shared host, I want every POST mutation endpoint to verify same-origin and a CSRF token, So that a cross-site form or fetch from another origin cannot trigger push/revert/link/refresh/action/autosync mutations (FR-7.4, E1b guard).

Acceptance Criteria:

Given src/dashboard.html today POSTs to /api/action (line 296), /api/push (311), /api/refresh (323), /api/autosync (354), /api/push-all (365), /api/link (429) with only content-type: application/json and no origin/CSRF check on the server (src/server.ts:665-696 dispatches POSTs straight to handlers)
When I add a requireCSRF enforcement in the E7.1 middleware that runs for every route flagged requireCSRF:true (all six POST endpoints above, plus any E1b mutation endpoints added later via the same table)
Then the middleware rejects any request whose Origin (or Referer fallback) host does not match the server's bound host: 403 {error:"cross-origin forbidden"}; for 127.0.0.1 bind this permits same-machine browser and blocks foreign origins; for 0.0.0.0 bind it permits the matching Host/Origin pair and blocks others
And a CSRF token is issued by /api/auth/status (or a dedicated /api/auth/csrf GET) as a random per-session value stored in an HttpOnly, SameSite=Strict, Secure-when-https cookie plus a mirrored non-HttpOnly value the JS reads; every POST from dashboard.html includes it via an X-CSRF-Token header added by the shared fetch wrapper from E7.2
And the middleware compares the X-CSRF-Token header to the cookie value in constant time; mismatch or absence → 403 {error:"missing or invalid csrf token"}; OPTIONS preflight and GET/HEAD are exempt
And the existing content-type: application/json requests from the dashboard are NOT a CSRF defense on their own (cross-site fetch with application/json triggers a preflight but simple-content-type forms do not); the token check is the primary guard, same-origin is defense-in-depth
And error conditions: Origin header missing AND Referer missing → 403 {error:"origin required"} (curl/scripts must send an explicit Origin: http://127.0.0.1:<port>); valid token but cross-origin → still 403 (both checks must pass); expired session cookie → 401 from E7.1's auth layer before CSRF is evaluated
And feature-flagged via ENABLE_AUTH_MIDDLEWARE so it lands with E7.1; when the flag is off, CSRF is a no-op pass-through (today's behavior) so E1b's mutation endpoints can ship before the flag flips
And AutoSyncController is not touched
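
The origin check and token comparison can be sketched as one pure function that returns a rejection or null. The CsrfInput shape and function names are hypothetical, and "matches the server's bound host" is interpreted here as "the Origin (or Referer) host equals the request's Host header", which is an assumption. Port 8787 in the usage note is only an example value.

```typescript
interface CsrfInput {
  method: string;
  hostHeader: string;    // value of the request's Host header
  origin?: string;       // Origin header, if present
  referer?: string;      // Referer header, used only when Origin is absent
  headerToken?: string;  // X-CSRF-Token header
  cookieToken?: string;  // value read from the CSRF cookie
}

interface Rejection {
  status: 403;
  error: string;
}

// Extract host:port from a URL string; null when it does not parse.
function hostOf(url: string): string | null {
  try {
    return new URL(url).host;
  } catch {
    return null;
  }
}

// Compare without returning early on the first differing character.
function constantTimeEqual(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// Both the same-origin check and the token check must pass.
function checkCsrf(input: CsrfInput): Rejection | null {
  const method = input.method.toUpperCase();
  if (method === "GET" || method === "HEAD" || method === "OPTIONS") return null; // exempt
  const source = input.origin ?? input.referer;
  if (source === undefined) return { status: 403, error: "origin required" };
  if (hostOf(source) !== input.hostHeader) return { status: 403, error: "cross-origin forbidden" };
  if (
    input.headerToken === undefined ||
    input.cookieToken === undefined ||
    !constantTimeEqual(input.headerToken, input.cookieToken)
  ) {
    return { status: 403, error: "missing or invalid csrf token" };
  }
  return null;
}
```

For example, a POST with Host 127.0.0.1:8787, Origin http://127.0.0.1:8787, and matching header and cookie tokens passes, while the same request with a foreign Origin is rejected even though the token is valid.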

Epic E2: Foundry→Obsidian auto-sync (Slice 1 — "safe but silent")

Epic-level note: E2 is marked safe but silent — F→O sync is not user-visible until E2 + E4 (activity panel / sync-state) + E3 (conflict UX) all land. No story in this epic claims a standalone user-facing release. Feature-flagged end-to-end.

Story E2.1: Shallow poll loop — /search snapshot, detect renames/new/missing, cadence+jitter

As a DM doing prep in Foundry, I want a background poll that periodically snapshots the relay /search (minified) list and detects renamed, newly-created, and missing linked notes at a cadence of seconds to tens of seconds, So that I see Foundry-side structural changes without waiting for a manual re-pull and without hammering the relay.

Acceptance Criteria:

Given the server is started with relayCfg set and the E2 feature flag enabled in src/server.ts (gated alongside the existing AutoSyncController at src/server.ts:457, behind a new foundryPoll flag defaulting to on but silent)
When AutoSyncController.start() succeeds (or a new FoundryPollController is enabled via POST /api/foundry-poll { enabled: true })
Then a shallow-poll timer fires at a default cadence of 10s with ±20% jitter (configurable via env), calling relay.searchJournalEntries() (src/relay/client.ts:97, ?filter=documentType:JournalEntry&minified=true)
And the result set {uuid,id,name,img,documentType} (NO folder, NO content, NO hash; docs/relay-api.md:54-59) is diffed against the previous snapshot keyed by uuid: a name change vs the stored snapshot → rename candidate; a uuid absent from the prior snapshot → new entry (surfaced to the live-new-entries list, E2.5); a uuid present in the linked index but absent from /search → missing candidate.

Given a shallow poll round is already in flight
When the next timer tick fires
Then the new tick is skipped (no concurrent /search round on the same controller), and a skipped:overlap counter is incremented for the activity panel (E4)

Given the shallow poll detects a rename for a uuid that maps to a linked refined note
When the rename is recorded
Then the note is NOT rewritten by this story (deep poll E2.2 / pull E2.3 own writes); the rename is enqueued as a deep-poll candidate for the next deep round and recorded in sync-state.json (E4) under fPending

Given relay.searchJournalEntries() throws a transient error (network, 408/504 timeout per docs/relay-api.md:17, or 5xx)
When the poll round fails
Then E1b's retry policy is applied inline: transient backoff (e.g. cadence × 2, capped) for the next tick; the failure is appended to E1b's persistent log; the controller stays enabled
And on a persistent error (404 "No connected Foundry clients found" per docs/relay-api.md:13, or 400 multi-client) the controller surfaces the error immediately to the activity panel and halts the timer (do NOT retry a no-clients condition)

Given the E2 feature flag is off
When the server starts
Then no shallow-poll timer is created and GET /api/foundry-poll returns { enabled: false } without touching the relay
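
The snapshot diff in the first scenario can be sketched as a pure function over the previous uuid-to-name map, the current /search hits, and the set of linked uuids. All type and function names are hypothetical. The sketch follows the criteria literally, so on the very first round (empty previous snapshot) every hit is reported as new; filtering those against the linked index and the LevelDB pool is left to E2.5.

```typescript
interface SearchHit {
  uuid: string;
  name: string;
}

interface Rename {
  uuid: string;
  from: string;
  to: string;
}

interface ShallowDiff {
  renamed: Rename[];
  created: SearchHit[];
  missing: string[];
}

// prev maps uuid -> name from the previous /search snapshot.
function diffSnapshot(prev: Map<string, string>, current: SearchHit[], linkedUuids: Set<string>): ShallowDiff {
  const renamed: Rename[] = [];
  const created: SearchHit[] = [];
  const seen = new Set<string>();
  for (const hit of current) {
    seen.add(hit.uuid);
    const before = prev.get(hit.uuid);
    if (before === undefined) created.push(hit);
    else if (before !== hit.name) renamed.push({ uuid: hit.uuid, from: before, to: hit.name });
  }
  // A linked uuid that /search no longer returns is a missing candidate.
  const missing = Array.from(linkedUuids).filter((uuid) => !seen.has(uuid));
  return { renamed, created, missing };
}

// Next tick delay: cadence jittered by +/-20%. rand must return a value in [0, 1).
function nextDelayMs(cadenceMs: number, rand: () => number = Math.random): number {
  return Math.round(cadenceMs * (0.8 + 0.4 * rand()));
}
```

Scheduling each tick with a fresh nextDelayMs value (rather than a fixed interval) also makes the overlap guard simple: the next tick is only scheduled once the current round has settled or been skipped.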

Story E2.2: Deep poll loop — per-linked-note /get + ccHash compare, minutes cadence, mapPool cap, load ceiling documented

As a DM editing existing entries' content or moving them between folders in Foundry, I want a slower deep poll that fetches each linked note's full /get document and compares a derived ccHash to the stored foundry.ccHash baseline, So that content edits and folder moves are detected without the ~800-calls/min cadence ADR-005 rejected.

Acceptance Criteria:

Given the E2 feature flag is on and at least one refined note is linked (has foundry.cc_uuid per the check at src/server.ts:591)
When the deep-poll timer fires at a default cadence of 5 minutes (configurable, with ±20% jitter)
Then the controller builds the candidate list as the intersection of (a) uuids in the latest shallow-poll /search snapshot and (b) uuids present in the in-memory linked index (state.index.matched with entry, src/server.ts:153)
And the list is processed with the existing mapPool helper (src/server.ts:317) at concurrency 4 (matching handlePushAll at src/server.ts:361), each iteration calling relay.getEntry(uuid) (src/relay/client.ts:69, ?uuid= + clientId)

Given a /get response returns the full doc including folder (docs/relay-api.md:31)
When the deep-poll worker computes the ccHash from the entry's campaign-codex data (E0's ccHash compute) and compares it to the stored foundry.ccHash baseline in the note's foundry block (readFoundryBlock at src/server.ts:591 area)
Then a ccHash mismatch OR a folder change vs the note's folder_path marks the note F-changed and enqueues it for E2.3 pull routing; a match leaves it untouched

Given ADR-005's load ceiling concern (≈800 /get-calls/min rejected as heavy/fragile)
When the deep poll runs against a vault with N linked notes
Then at most 4 /get calls are in flight at any moment regardless of N, because mapPool bounds concurrency; and because the next round doesn't start until the prior finishes, a round averages at most N / round_seconds calls/s over the cadence period (for example, N = 240 over a 300 s round gives 0.8 calls/s, ~48/min); this ceiling is documented in a code comment next to the cadence constant and in the GET /api/foundry-poll status payload (loadCeilingCallsPerMin)

Given a /get returns 404 "No connected Foundry clients found" (docs/relay-api.md:13)
When the worker receives that status
Then it is treated as persistent (NOT retried), the deep-poll round aborts, the controller surfaces the error to the activity panel, and the timer halts until the user re-enables it

Given a /get returns a transient error (timeout 408/504, 5xx, network)
When E1b's retry policy is applied inline to that single note's fetch
Then the worker retries per policy, and on final transient failure records the note as fPendingRetry in sync-state.json (E4) without aborting the rest of the round

Given the deep-poll round is still running when the next timer tick fires
When the tick fires
Then it is skipped (no overlap), mirroring E2.1's overlap guard
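
The candidate selection and the F-changed test from the first two scenarios could be sketched as below. The LinkedNote and FetchedEntry shapes are hypothetical, the fetched ccHash is taken as an input (E0 owns its computation), and the sketch assumes the /get folder value and the note's folder_path are directly comparable strings, which the source does not confirm.

```typescript
// Hypothetical view of a linked note's foundry block.
interface LinkedNote {
  uuid: string;
  ccHash: string;      // stored foundry.ccHash baseline
  folderPath: string;  // stored folder_path
}

// Hypothetical view of a /get result after E0's ccHash compute has run.
interface FetchedEntry {
  uuid: string;
  ccHash: string;
  folder: string;
}

// Candidates = linked notes whose uuid also appears in the latest /search snapshot.
function deepPollCandidates(snapshotUuids: Set<string>, linked: LinkedNote[]): LinkedNote[] {
  return linked.filter((note) => snapshotUuids.has(note.uuid));
}

// F-changed when the content hash or the folder differs from the stored baseline.
function isFChanged(note: LinkedNote, fetched: FetchedEntry): boolean {
  return fetched.ccHash !== note.ccHash || fetched.folder !== note.folderPath;
}
```

Linked notes that are absent from the snapshot are deliberately not fetched here; they are the shallow poll's missing candidates and are routed separately.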

Story E2.3: F→O pull for F-changed+O-unchanged — /get→markdown→vault write (dev-mirror-safe) + dual re-baseline

As a DM who edited a Foundry entry during prep while the matching Obsidian note sat untouched, I want the detected F-changed note pulled into the vault as markdown with the foundry block re-baselined on both sides, So that my vault reflects the Foundry edit and the next sync round sees both sides as in-sync.

Acceptance Criteria:

Given a note is marked F-changed by E2.2 AND the Obsidian side is unchanged (the note's body contentHash from contentHash(body) at src/normalize.ts:24 equals the stored foundry.contentHash baseline, i.e. the O-side has not diverged)
When the pull worker runs
Then it acquires E0's per-uuid bidirectional lock for that uuid (so a concurrent O-save push and an F-pull don't double-write), calls relay.getEntry(uuid), converts the entry to markdown (reusing the existing seedBlockContent/buildBlock path at src/server.ts:417), and writes the result via writeWithBackup (src/server.ts:116)

Given the server is in dev mode (state.cfg.mode === "dev")
When the write target is resolved
Then it lands under the mirror via targetPath(state, "refined", relPath) (src/server.ts:73-79), NOT the real vault, preserving dev mode as a safe preview per resolveRefined (src/server.ts:85)
And in apply mode it lands in the real vault with a .bak-<stamp> backup (writeWithBackup at src/server.ts:117-122)

Given F→O writes are apply-mode gated by E1b
When the server is in dev mode
Then the write still lands in the mirror (dev mode is the preview channel); in apply mode it lands in the real vault; in neither case does a dry-run mode exist for this path (the fail-safe posture is "don't write unless apply/dev mirror")

Given the pull succeeds
When the dual re-baseline helper from E1b runs
Then both sides are baselined: the note's foundry.contentHash is rewritten to the new body hash (reusing baselineFoundryBlock at src/server.ts:289), and the stored foundry.ccHash baseline (E0) is updated to the newly-fetched ccHash, so the next deep-poll round sees F-side as unchanged
And the note is removed from fPending in sync-state.json (E4)

Given the pull fails transiently
When E1b's retry policy is applied
Then the note stays in fPending with a retry counter; on final failure it is surfaced to the activity panel and the persistent log is appended

Given the pull fails persistently (relay 404 No connected Foundry clients found, or the entry uuid returns 404)
When the failure is classified
Then it is surfaced immediately (no retry), the note is marked fPendingError in sync-state.json, and the controller continues with other notes

Story E2.4: Never-clobber routing — vault-newer / both-diverged → pending conflict row + skip (E3 renders)

As a DM who edited the same note in both Foundry and Obsidian (or only in Obsidian) since the last sync, I want the F→O pull path to detect that the Obsidian side has diverged and SKIP the write, surfacing a pending conflict row instead, So that my Obsidian edit is never silently clobbered by a Foundry pull.

Acceptance Criteria:

Given a note is marked F-changed by E2.2 AND the Obsidian side has diverged (the note's body contentHash differs from the stored foundry.contentHash baseline, meaning O was edited since last sync)
When the pull worker evaluates the routing decision
Then it classifies the state as both-diverged (F-changed AND O-changed) and SKIPS the vault write entirely: no writeWithBackup, no re-baseline

Given a note is NOT marked F-changed by E2.2 but the shallow poll detected the uuid is missing from /search while the vault note still exists
When the routing decision runs
Then it classifies the state as vault-newer (F-side missing, O-side present) and SKIPS any F→O write, surfacing a pending conflict row

Given a note is routed to both-diverged or vault-newer
When the skip happens
Then a pending conflict row is recorded in sync-state.json (E4) with { uuid, name, state: "both-diverged" | "vault-newer", detectedAt, lastFHash, lastOHash }, the note is removed from fPending, and the row is exposed via GET /api/foundry-poll under a pendingConflicts array for E3 to render later
And this story does NOT render any resolution UI, auto-pick a winner, or auto-write; E3 owns resolution

Given the per-uuid bidirectional lock (E0) is currently held by an O→F push for the same uuid
When the F→O pull worker tries to acquire it
Then the pull worker skips this round for that uuid (does NOT block), leaves the note in fPending, and retries on the next deep-poll round, so an active O-save and an F-pull never race on the same uuid

Given the E2 feature flag is off
When these routing paths would otherwise fire
Then none of this logic runs; the pending-conflicts array stays empty
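
The never-clobber routing and the non-blocking lock attempt can be sketched together. The NoteState shape, the Route labels "pull" and "none", and the UuidLocks class are hypothetical; E0's real lock also has to queue O→F pushes, which this try-only sketch does not model.

```typescript
type Route = "pull" | "both-diverged" | "vault-newer" | "none";

interface NoteState {
  fChanged: boolean;          // marked by the deep poll (E2.2)
  oChanged: boolean;          // body hash differs from foundry.contentHash baseline
  missingFromSearch: boolean; // shallow poll no longer sees the uuid
  vaultNoteExists: boolean;
}

// Only F-changed + O-unchanged is ever written; every diverged state is skipped.
function routeNote(s: NoteState): Route {
  if (s.fChanged) return s.oChanged ? "both-diverged" : "pull";
  if (s.missingFromSearch && s.vaultNoteExists) return "vault-newer";
  return "none";
}

// Minimal try-lock: returns a release function, or null when the uuid is held.
class UuidLocks {
  private held = new Set<string>();

  tryAcquire(uuid: string): (() => void) | null {
    if (this.held.has(uuid)) return null; // caller skips this round, never blocks
    this.held.add(uuid);
    return () => {
      this.held.delete(uuid);
    };
  }
}
```

A pull worker would call tryAcquire first and, on null, leave the note in fPending for the next deep-poll round; only after acquiring the lock does it evaluate routeNote and act on a "pull" result.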

Story E2.5: Live-new-entries list + one-click import (separate from LevelDB pool, no auto-import)

As a DM who created a brand-new journal entry in Foundry during prep, I want that entry to appear in a separate "live new entries" list in the dashboard with a one-click "Import as new refined note" action, So that I can opt into importing each new entry deliberately rather than having the sync silently dump un-curated notes into my vault.

Acceptance Criteria:

Given the shallow poll (E2.1) detected a uuid in /search that is absent from the linked index AND absent from the LevelDB snapshot pool (state.index.ccOnly at src/server.ts:154, built from the journal snapshot)
When the new entry is classified
Then it is added to a SEPARATE liveNewEntries list on the FoundryPollController (NOT merged into ccOnly: ccOnly is the snapshot pool, live entries are a distinct runtime list) with { uuid, name, detectedAt }

Given the dashboard has the E2 panel rendered (feature-flagged in src/dashboard.html, hidden when the flag is off)
When GET /api/foundry-poll is polled
Then the response includes liveNewEntries: [{ uuid, name, detectedAt }] and the dashboard renders a distinct "Live new entries (from Foundry)" section, separate from the existing "cc-only (import candidates)" table at src/dashboard.html:118-119

Given a live new entry is displayed
When the user clicks its "Import as new refined note" button
Then the dashboard calls a new POST /api/foundry-poll/import { uuid } which fetches the full entry via relay.getEntry(uuid), builds a new refined note via the existing importRow path (src/server.ts:184-187, landing under refined/imported/<cc_folder>/), and removes the entry from liveNewEntries
And the import is NEVER automatic: entries sit in liveNewEntries indefinitely until the user clicks; no story here claims auto-import

Given a live new entry's name collides with an existing refined note name
When the import is attempted
Then it is skipped with a surfaced reason ("name already exists in vault — rename in Foundry or link manually"), reusing the result.skipped pattern at src/server.ts:160

Given the user does nothing with a live new entry
When subsequent shallow polls run
Then the entry stays in liveNewEntries (dedup by uuid) and is not re-added; if it later appears in the LevelDB snapshot (after a manual refresh --full-index), it is removed from liveNewEntries to avoid double-listing

Given the E2 feature flag is off
When the dashboard loads
Then the "Live new entries" section is not rendered and POST /api/foundry-poll/import returns 404
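
The list maintenance rules above (add once, dedup by uuid, drop when the entry shows up in the snapshot pool) can be sketched as one pure update step run after each shallow poll. The function name and argument shapes are hypothetical; dropping an entry once it becomes linked is an assumption that goes slightly beyond the criteria, which only mention the LevelDB snapshot case.

```typescript
interface LiveEntry {
  uuid: string;
  name: string;
  detectedAt: string; // ISO timestamp of first detection
}

interface Hit {
  uuid: string;
  name: string;
}

// Returns the new liveNewEntries list; the input list is not mutated.
function updateLiveNewEntries(
  current: LiveEntry[],
  hits: Hit[],
  linkedUuids: Set<string>,
  ccOnlyUuids: Set<string>,
  now: string,
): LiveEntry[] {
  const tracked = (uuid: string) => linkedUuids.has(uuid) || ccOnlyUuids.has(uuid);
  // Keep existing entries unless they are now covered by the index or snapshot pool.
  const next = current.filter((entry) => !tracked(entry.uuid));
  const known = new Set<string>(next.map((entry) => entry.uuid));
  for (const hit of hits) {
    if (tracked(hit.uuid) || known.has(hit.uuid)) continue; // dedup by uuid
    next.push({ uuid: hit.uuid, name: hit.name, detectedAt: now });
    known.add(hit.uuid);
  }
  return next;
}
```

Nothing in this step imports anything: entries only leave the list through the explicit import action or by becoming tracked elsewhere.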

Story E2.6: Catch-up-now trigger + retry policy applied to poll

As a DM about to start a session who wants the vault to reflect everything I just changed in Foundry, I want a "catch up now" button that forces an immediate shallow+deep sweep out of cadence, So that I don't have to wait for the next jittered tick before trusting the vault is current.

Acceptance Criteria:

Given the FoundryPollController is enabled (feature flag on)
When the user clicks "Catch up now" in the dashboard (new button in the E2 panel in src/dashboard.html, feature-flagged)
Then the dashboard calls POST /api/foundry-poll/catchup, which cancels the pending shallow + deep timers, runs an immediate shallow poll round, and on its completion immediately triggers a deep poll round (without waiting for the deep cadence)

Given a catch-up shallow round is in flight when the user clicks again
When the second click arrives
Then it is ignored (debounced) and the activity panel shows "catch-up already running"

Given the catch-up deep round runs
When it processes the candidate list
Then it reuses the same mapPool concurrency cap (E2.2), the same per-uuid lock acquisition (E0), and the same routing (E2.3/E2.4) as a scheduled deep round; no parallel code path

Given any poll round (shallow, deep, or catch-up) fails transiently
When E1b's retry policy is applied inline
Then the round retries per policy (transient backoff), the attempt is appended to E1b's persistent log, and on final transient failure the round is surfaced to the activity panel with a retryExhausted marker without halting the controller

Given a catch-up round completes
When it finishes
Then the activity panel (E4) shows the round summary { shallow: {new, renamed, missing}, deep: {pulled, skipped, conflicts}, durationMs }, sync-state.json's lastPoll is updated, and the regular cadence timers resume from now (next shallow tick = cadence ± jitter, not immediately)

Given the E2 feature flag is off
When POST /api/foundry-poll/catchup is called
Then it returns 404 and no relay calls are made

Epic E4: Sync status & parity (Slice 1)

Story E4.1: Persistent sync-state.json store with syncStateSchemaVersion and restart survival

As a DM, I want the dashboard's sync status (on/off, mode, parity, last-sync, recent activity) to live in a persisted sync-state.json that survives server restarts, So that I never lose the current sync picture when the server reboots and so that E3 (conflict persistence) has a single stable status source to write into.

Acceptance Criteria:

- Given the server starts via startServer(cfg) at src/server.ts:620, which today holds all status in the in-memory AutoSyncController (events array capped at 100 at src/server.ts:468-469, no persistence)
  - When the server boots
  - Then it loads <outDir>/sync-state.json (or creates it with defaults if absent) before the first request is served
  - And the loaded object's syncStateSchemaVersion is checked against a constant SYNC_STATE_SCHEMA_VERSION = 1 owned by this story (distinct from E1b's flagsSchemaVersion); on mismatch the file is backed up to sync-state.json.bak-<stamp> (reusing backupStamp() from src/server.ts:23) and a fresh default-state object is written, with an error event appended to the new file's activity array
- Given a populated sync-state.json
  - When any state mutation occurs (auto-sync on/off toggle, mode flip, parity refresh, activity event appended, status-note write)
  - Then the full state object is re-serialized to <outDir>/sync-state.json atomically (write to sync-state.json.tmp then rename) so a crash mid-write never leaves a truncated file
  - And the file's shape is exactly: { syncStateSchemaVersion, mode, autoSyncOn, lastSyncAt, parity: { status, oPending, fPending, conflict, unsyncedLinked, lastPollAt }, watchedDir, activity: [{time,kind,name,status,message}], updatedAt, conflict: null }. The conflict: null field is reserved for E3 and MUST NOT be populated by E4 (E3 owns its content); the field is present so E3's writer does not change the shape
- Given the server has been running with auto-sync ON and 17 activity events recorded
  - When the server process is killed and restarted
  - Then autoSyncOn is restored to true, the 17 events are still present in activity, and lastSyncAt is preserved
  - And AutoSyncController.enabled (src/server.ts:458) is reconciled to match autoSyncOn on boot: if true, start() (src/server.ts:502) is called, and if it throws because relayCfg is unset (src/server.ts:503), autoSyncOn is flipped to false in the file and an error event "auto-sync could not resume — relay not configured" is appended
- Given the entire sync-status + parity + PAUSED + status-note + activity-panel feature set
  - When any of E4.2-E4.6 ships
  - Then it is gated behind a feature flag features.syncStatus (read once at boot from cfg / env OFS_SYNC_STATUS=1, default off) stored in the State interface (src/server.ts:53-58); when the flag is off, none of the new endpoints, header, banner, note writer, or activity-panel changes are registered, and the existing /api/autosync (src/server.ts:683-696) and autoSyncPanel (src/dashboard.html:94) behave exactly as today
  - And the flag is a single boolean read site (not sprinkled across other epics), and this story owns its definition
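The load, reset, and atomic-write rules above can be sketched as follows. This is a minimal illustration, not the project's implementation: `defaultState`, `writeStateAtomic`, and `loadState` are hypothetical names, the default values other than mode "PREP" are assumptions, and the reset message wording is invented for the example. The sketch also does not handle a file that fails to parse as JSON.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

const SYNC_STATE_SCHEMA_VERSION = 1;

interface ActivityEvent { time: string; kind: string; name: string; status: string; message: string }

interface SyncState {
  syncStateSchemaVersion: number;
  mode: "PREP" | "RUN-THE-MATCH";
  autoSyncOn: boolean;
  lastSyncAt: string | null;
  parity: {
    status: string; oPending: number; fPending: number;
    conflict: number; unsyncedLinked: number; lastPollAt: string | null;
  };
  watchedDir: string;
  activity: ActivityEvent[];
  updatedAt: string;
  conflict: null; // reserved for E3, never populated by E4
}

// Assumed defaults; only mode "PREP" on a fresh install is stated by the stories.
function defaultState(watchedDir: string, now: string): SyncState {
  return {
    syncStateSchemaVersion: SYNC_STATE_SCHEMA_VERSION,
    mode: "PREP",
    autoSyncOn: false,
    lastSyncAt: null,
    parity: { status: "in-parity", oPending: 0, fPending: 0, conflict: 0, unsyncedLinked: 0, lastPollAt: null },
    watchedDir,
    activity: [],
    updatedAt: now,
    conflict: null,
  };
}

// Write the whole object to a sibling .tmp file, then rename it over the target,
// so a crash mid-write leaves the previous file in place rather than a truncated one.
function writeStateAtomic(outDir: string, state: SyncState): void {
  const target = path.join(outDir, "sync-state.json");
  const tmp = `${target}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(state, null, 2), "utf8");
  fs.renameSync(tmp, target);
}

// Load the state; create defaults if absent; back up and reset on a schema-version mismatch.
function loadState(outDir: string, watchedDir: string, stamp: string): SyncState {
  const target = path.join(outDir, "sync-state.json");
  const now = new Date().toISOString();
  if (!fs.existsSync(target)) {
    const fresh = defaultState(watchedDir, now);
    writeStateAtomic(outDir, fresh);
    return fresh;
  }
  const loaded = JSON.parse(fs.readFileSync(target, "utf8")) as SyncState;
  if (loaded.syncStateSchemaVersion === SYNC_STATE_SCHEMA_VERSION) return loaded;
  fs.copyFileSync(target, `${target}.bak-${stamp}`);
  const reset = defaultState(watchedDir, now);
  // Hypothetical message text; the story only requires that an error event is appended.
  reset.activity.push({ time: now, kind: "error", name: "", status: "error", message: "sync-state schema mismatch; state reset" });
  writeStateAtomic(outDir, reset);
  return reset;
}
```

The rename step relies on the temporary file living in the same directory as the target, which is what the story's sync-state.json.tmp naming implies.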

Story E4.2: PREP / RUN-THE-MATCH mode flag gating AutoSyncController + dashboard toggle

As a DM, I want a single PREP / RUN-THE-MATCH mode flag that gates whether auto-sync can run at all, So that during prep (curation, seed, link) I never accidentally push half-curated notes to live Foundry, and during the match auto-sync is unblocked.

Acceptance Criteria:

- Given the mode field in sync-state.json (owned by E4.1), whose value is "PREP" or "RUN-THE-MATCH" (default "PREP" on a fresh install)
  - When the user clicks a new "Mode: PREP ⇄ RUN-THE-MATCH" toggle in the dashboard header (next to modeTag at src/dashboard.html:82)
  - Then a POST /api/sync-state/mode endpoint flips mode in the persisted sync-state.json and returns the new state
  - And the toggle is the ONLY place in the codebase that writes mode; no other epic sets it ad hoc
- Given mode === "PREP"
  - When AutoSyncController.setEnabled(true) is called (src/server.ts:496), whether from POST /api/autosync (src/server.ts:686-696) or from the boot reconciliation in E4.1
  - Then start() (src/server.ts:502) refuses to attach the watcher and throws Error("auto-sync is blocked in PREP mode — switch to RUN-THE-MATCH first")
  - And autoSyncOn in sync-state.json is forced to false and an error event "auto-sync blocked in PREP mode" is appended to activity
  - And the dashboard's existing autoSyncBtn (src/dashboard.html:92) is rendered disabled with title="Switch to RUN-THE-MATCH mode first" and the existing toggleAutosync() (src/dashboard.html:351) short-circuits without calling the endpoint
- Given mode === "RUN-THE-MATCH" and relayCfg is set
  - When the user clicks autoSyncBtn
  - Then start() succeeds exactly as today (src/server.ts:502-517), autoSyncOn becomes true in sync-state.json, and the watcher attaches
  - And switching mode from "RUN-THE-MATCH" back to "PREP" while auto-sync is ON calls stop() (src/server.ts:546) first, then persists autoSyncOn=false, so the watcher is torn down before the mode flip returns
- Given the server restarts with mode === "PREP" and autoSyncOn === true in the file (corrupt/inconsistent state)
  - When E4.1's boot reconciliation runs
  - Then autoSyncOn is corrected to false and a "PREP mode auto-sync disabled on boot" event is appended, because start() would have thrown anyway
  - And the mode flag is read from sync-state.json only (never from env or CLI) so it survives restarts by construction
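The gating rules above can be illustrated with a small stand-in for the controller. `ModeGatedController` and `GateState` are hypothetical names used only for this sketch; the real AutoSyncController attaches a file watcher and persists through sync-state.json, both of which are reduced to plain fields here.

```typescript
type Mode = "PREP" | "RUN-THE-MATCH";

interface GateState { mode: Mode; autoSyncOn: boolean; activity: string[] }

class ModeGatedController {
  watching = false; // stands in for "watcher attached"
  state: GateState;
  relayConfigured: boolean;

  constructor(state: GateState, relayConfigured: boolean) {
    this.state = state;
    this.relayConfigured = relayConfigured;
  }

  // Refuses to attach the watcher in PREP mode and records why.
  start(): void {
    if (this.state.mode === "PREP") {
      this.state.autoSyncOn = false;
      this.state.activity.push("auto-sync blocked in PREP mode");
      throw new Error("auto-sync is blocked in PREP mode — switch to RUN-THE-MATCH first");
    }
    if (!this.relayConfigured) throw new Error("relay not configured");
    this.watching = true;
    this.state.autoSyncOn = true;
  }

  stop(): void {
    this.watching = false;
  }

  // Flipping back to PREP tears the watcher down before the new mode is recorded.
  setMode(next: Mode): GateState {
    if (next === "PREP") {
      if (this.watching) this.stop();
      this.state.autoSyncOn = false;
    }
    this.state.mode = next;
    return this.state;
  }
}
```

Putting the PREP check inside start() means every caller (the button, the boot reconciliation) hits the same guard, which is the property the criteria ask for.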

Story E4.3: Dashboard sync-status header + parity indicator reading one source (sync-state.json)

As a DM, I want a persistent header that shows whether sync is ON/OFF, the current mode, the watched directory, and a parity indicator (in-parity / O-pending / F-pending / conflict / unsynced-linked) with last-sync time, So that at a glance I know the state of agreement between my vault and Foundry without clicking anything.

Acceptance Criteria:

- Given sync-state.json (E4.1) is the single source of truth, and a new GET /api/sync-state endpoint that returns its full contents
  - When the dashboard loads (init() at src/dashboard.html:144) and on every 2s poll (the existing autoPoll cadence at src/dashboard.html:348)
  - Then a new #syncHeader block is rendered in the existing <header> (src/dashboard.html:79-93) showing: sync ON/OFF (from autoSyncOn), mode (PREP / RUN-THE-MATCH from mode), and watchedDir (from state.cfg.refinedDir, already exposed at src/server.ts:479)
  - And a #parityIndicator badge is rendered next to the counts showing one of: in-parity (green, oPending=0 && fPending=0 && conflict=0 && unsyncedLinked=0), O-pending (blue, oPending>0), F-pending (purple, fPending>0), conflict (red, conflict>0), or unsynced-linked (yellow, unsyncedLinked>0 && oPending=0 && fPending=0), with the precedence order conflict > O-pending > F-pending > unsynced-linked > in-parity
- Given the parity counts in sync-state.json
  - When the dashboard renders #parityIndicator
  - Then oPending and unsyncedLinked are derived from the current INDEX.byRecommendation (sync-cc → oPending, seed → unsynced-linked, conflict → conflict; see REC map at src/dashboard.html:132-140) on each /api/index fetch, and written into sync-state.json by the server's parity-refresh step
  - And fPending and lastPollAt come from E2's poll timestamps + F-pending counts (E2 owns the poll; E4 only reads them into sync-state.json); if E2 has not yet populated them, fPending=0 and lastPollAt=null are shown and the badge falls back to O-pending/in-parity as appropriate
  - And the F-pending badge is the SLICE 1 DEMO HOOK: when fPending > 0, the badge is clickable and expands an "Incoming F→O changes" list panel (rendered in the existing <header> area, src/dashboard.html:79-93) showing each pending entry as { name, uuid, change: "edited" | "renamed" | "moved" | "new", detectedAt } sourced from sync-state.json.fPending (populated by E2's shallow+deep poll). This is the visible signal that "Foundry flows back" in Slice 1. E2's pulls are observable here EVEN THOUGH E3 (conflict resolution) has not shipped yet: entries with change: "edited" and O-side-unchanged auto-pull silently (E2.3) and appear in the activity panel (E4.6) as pulled; entries with change flags implying divergence (both-diverged, vault-newer) appear in this list with a "needs resolution — coming in a later release" tag and are NOT auto-pulled (E2.4 skip). So Slice 1's shippable, demoable value is: "E4's header shows incoming Foundry changes and auto-pulled ones land visibly in the activity panel; divergent ones are held pending, not clobbered." A unit test asserts the list renders from sync-state.json.fPending with E3 absent (proving Slice 1 has a demo without E3)
  - And the "Incoming F→O changes" list is feature-flagged with features.syncStatus (E4.1) AND gated on E2's foundryPoll flag; when either is off, the badge shows fPending=0 and the list panel does not render (E2 plumbing is dark). This is the concrete enforcement of "E2 is flagged-off plumbing until E3 ships": E2 can run in Slice 1, but only E4's UI makes it visible, and only the non-divergent subset auto-applies
  - And lastSyncAt is read from E1b's parity events (push/pull/baseline); E4 appends each such event to activity and updates lastSyncAt to the newest push/pull event time; if none exists yet, lastSyncAt renders as "never"
- Given both the dashboard and the vault note (E4.5) must reflect the same state
  - When any parity field or autoSyncOn changes
  - Then sync-state.json is updated first (E4.1 atomic write), and BOTH the dashboard poll AND the next status-note write read from that same file; there is no second in-memory copy of parity that can drift
  - And the GET /api/sync-state endpoint is registered behind the features.syncStatus flag (E4.1); when the flag is off it returns 404 and the dashboard falls back to the existing GET /api/autosync + GET /api/status (src/server.ts:657, 683) rendering exactly as today
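The badge precedence and the count derivation described above can be sketched as two pure functions. The names `parityStatus` and `countsFromIndex` are hypothetical, and the shape of byRecommendation (a map from recommendation name to a count) is an assumption made for the example.

```typescript
type ParityStatus = "conflict" | "O-pending" | "F-pending" | "unsynced-linked" | "in-parity";

interface ParityCounts { oPending: number; fPending: number; conflict: number; unsyncedLinked: number }

// Badge colors as listed in the acceptance criteria.
const PARITY_COLOR: Record<ParityStatus, string> = {
  "conflict": "red",
  "O-pending": "blue",
  "F-pending": "purple",
  "unsynced-linked": "yellow",
  "in-parity": "green",
};

// Precedence: conflict > O-pending > F-pending > unsynced-linked > in-parity.
function parityStatus(c: ParityCounts): ParityStatus {
  if (c.conflict > 0) return "conflict";
  if (c.oPending > 0) return "O-pending";
  if (c.fPending > 0) return "F-pending";
  if (c.unsyncedLinked > 0) return "unsynced-linked";
  return "in-parity";
}

// Assumed shape: byRecommendation maps a recommendation name to how many notes carry it.
// fPending comes from E2's poll and is passed in; 0 models "E2 has not populated it yet".
function countsFromIndex(byRecommendation: Partial<Record<string, number>>, fPending: number): ParityCounts {
  return {
    oPending: byRecommendation["sync-cc"] ?? 0,
    unsyncedLinked: byRecommendation["seed"] ?? 0,
    conflict: byRecommendation["conflict"] ?? 0,
    fPending,
  };
}
```

Checking the conditions in precedence order makes the per-badge guards (such as "unsyncedLinked>0 && oPending=0 && fPending=0") fall out of the control flow instead of being repeated.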

Story E4.4: Loud SYNC PAUSED state when auto-sync is off in RUN mode

As a DM, I want a loud, persistent "SYNC PAUSED" banner when auto-sync is off during a match (not a silent button label), So that I immediately notice that my vault edits are not being pushed to live Foundry instead of silently going stale.

Acceptance Criteria:

- Given mode === "RUN-THE-MATCH" and autoSyncOn === false in sync-state.json
  - When the dashboard renders the sync-status header (E4.3)
  - Then a persistent #syncPausedBanner is shown directly under the <header> (above the existing autoSyncPanel at src/dashboard.html:94) with the --bad color (src/dashboard.html:8), text "SYNC PAUSED — auto-sync is off, vault edits are NOT being pushed to Foundry", and a "Resume auto-sync" button that calls the existing toggleAutosync() (src/dashboard.html:351)
  - And the banner is NOT a toast (it does not auto-dismiss like toast() at src/dashboard.html:284-290); it remains on screen until autoSyncOn becomes true
- Given mode === "RUN-THE-MATCH" and autoSyncOn === true
  - When the dashboard renders
  - Then no #syncPausedBanner is shown and the existing autoSyncBtn (src/dashboard.html:92) reads "Auto-sync: on" with the primary class (src/dashboard.html:334-335) exactly as today
  - And the auto-sync activity panel (autoSyncPanel src/dashboard.html:94) remains visible while ON
- Given mode === "PREP" (auto-sync unavailable by design, per E4.2)
  - When the dashboard renders
  - Then the #syncPausedBanner is NOT shown (PREP is not a "paused" state, it is a "not available" state); instead the header shows a neutral "PREP MODE — auto-sync disabled" tag and the autoSyncBtn is disabled per E4.2
  - And switching from RUN-THE-MATCH (ON) to PREP calls stop() (src/server.ts:546) and the banner does NOT appear because the mode tag supersedes it
- Given the server is in RUN mode, auto-sync is OFF, and the user reloads the page
  - When init() (src/dashboard.html:144) fetches /api/sync-state
  - Then the banner reappears immediately from the persisted autoSyncOn=false; the paused state survives page reloads because it is read from sync-state.json, not from a transient UI variable
  - And the banner is gated behind features.syncStatus (E4.1); with the flag off, the existing silent "Auto-sync: off" button label (src/dashboard.html:334) is shown and no banner is rendered
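The four scenarios above reduce to a small decision over three inputs. A minimal sketch, assuming a hypothetical `headerNotice` helper that the dashboard would call with values read from sync-state.json:

```typescript
type Mode = "PREP" | "RUN-THE-MATCH";

type HeaderNotice =
  | { kind: "paused-banner"; text: string }
  | { kind: "prep-tag"; text: string }
  | { kind: "none" };

// Decide what the header shows. flagOn stands for features.syncStatus.
function headerNotice(mode: Mode, autoSyncOn: boolean, flagOn: boolean): HeaderNotice {
  if (!flagOn) return { kind: "none" }; // existing silent button label only
  if (mode === "PREP") return { kind: "prep-tag", text: "PREP MODE — auto-sync disabled" };
  if (!autoSyncOn) {
    return {
      kind: "paused-banner",
      text: "SYNC PAUSED — auto-sync is off, vault edits are NOT being pushed to Foundry",
    };
  }
  return { kind: "none" };
}
```

Because the function takes only persisted values, a page reload re-derives the same notice, which matches the requirement that the paused state is not held in a transient UI variable.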

Story E4.5: Vault .sync-status.md note writer with foundry.sync_status sentinel and airtight path+sentinel exclusion (rename-safe) + lost-sentinel error

As a DM, I want a maintained .sync-status.md note in my vault that mirrors the dashboard status (on/off, last sync, parity, recent events), carrying a foundry.sync_status: true sentinel, So that I can see sync status from inside Obsidian and so that the watcher never pushes the status note itself to Foundry even if it is renamed or edited.

Acceptance Criteria:

- Given features.syncStatus is on and sync-state.json exists (E4.1)
  - When any field in sync-state.json changes (auto-sync toggle, mode flip, parity refresh, activity event)
  - Then a writer renders a Markdown note to <state.cfg.refinedDir>/.sync-status.md with YAML frontmatter containing foundry:\n sync_status: "true" (the sentinel, read via readFoundryBlock at src/frontmatter.ts:9) plus a body with: on/off, mode, lastSyncAt, parity status + counts, and the last 10 activity events (per UX-DR2)
  - And the writer uses writeWithBackup (src/server.ts:116) so in apply mode the previous status note is backed up; the write itself is the ONLY writer for this file
- Given the watcher's onChange at src/server.ts:557-561, which today filters .md (line 560) and skips paths containing .obsidian (line 561)
  - When a save event arrives for .sync-status.md (the writer's own write, or a user edit in Obsidian)
  - Then onChange skips it by PATH (rel === ".sync-status.md" or rel.split("/").pop()?.startsWith(".")) and returns before queueing the debounce timer (src/server.ts:564-571)
  - And E1b's sentinel check (which this story owns the sentinel value for) runs in process() at src/server.ts:582-617: after readFoundryBlock(fm) at src/server.ts:591, if fb?.sync_status === "true" the note is skipped with a "skipped — sync status note (sentinel)" log line, before the cc_uuid/contentHash checks at src/server.ts:592-593
  - And BOTH the path skip AND the sentinel skip must be present, skipping on EITHER; this is rename-safe because if the file is renamed to a non-dot path the sentinel still blocks, and if the sentinel is stripped but the path is unchanged the path still blocks
- Given a user (or a sync tool) renames .sync-status.md to Sync Status.md (no leading dot, sentinel intact)
  - When the watcher processes it
  - Then the sentinel check in process() (src/server.ts:591) catches it and logs "skipped — sync status note (sentinel)"; no push to Foundry occurs
  - And the writer re-creates .sync-status.md at the canonical path on the next state change, so the managed note is self-healing
- Given the user edits .sync-status.md in Obsidian and removes the foundry: block (lost sentinel), but the file remains at the dot path
  - When the watcher's onChange fires
  - Then the PATH skip (dotfile) prevents process() from ever running on it, so the note is NOT synced to Foundry ("user error not synced", FR-4.3)
  - And on the next state-change write, the writer overwrites the file with the correct sentinel + content, and an error event "status note lost its sentinel — re-written" is appended to activity in sync-state.json
  - And if the writer itself fails to write (e.g., the file is locked), an error event "status note write failed: " is appended and the dashboard header shows a warning badge; the failure never throws out of the state-mutation path
- Given the airtight exclusion must also cover the poll path, not just the watcher
  - When any poll/index rebuild (indexAll via /api/index at src/server.ts:648-656) encounters .sync-status.md
  - Then the file is excluded by the same path check before it can enter state.index.matched or state.index.refinedOnly, so it never appears as a row in the dashboard and never becomes a push candidate via pushAll (src/server.ts:330-383)
  - And the exclusion is gated behind features.syncStatus (E4.1); with the flag off, the writer does not run and the existing onChange (src/server.ts:557-561) behaves exactly as today (no dotfile skip is added when the flag is off)
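The "skip on EITHER" rule above can be sketched as two independent predicates. The function names and the `FoundryBlock` shape are hypothetical; in the stories the path check lives in onChange and the sentinel check in process(), so they would not normally sit side by side like this.

```typescript
// Hypothetical shape of the parsed foundry: frontmatter block.
interface FoundryBlock { sync_status?: string; cc_uuid?: string; contentHash?: string }

// Path skip: the reserved status-note path, or any dotfile basename.
function skipByPath(rel: string): boolean {
  const base = rel.split("/").pop() ?? "";
  return rel === ".sync-status.md" || base.startsWith(".");
}

// Sentinel skip: survives a rename because it travels with the file content.
function skipBySentinel(fb: FoundryBlock | null): boolean {
  return fb?.sync_status === "true";
}

// Either check alone is enough to keep the note away from Foundry.
function shouldSkipStatusNote(rel: string, fb: FoundryBlock | null): boolean {
  return skipByPath(rel) || skipBySentinel(fb);
}
```

For example, a note renamed to "Sync Status.md" with the sentinel intact is still skipped, and ".sync-status.md" with its foundry: block removed is still skipped by path.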

Story E4.6: Activity panel (last 200, scrollable) fed from sync-state.json + E1b parity events

As a DM, I want a scrollable activity panel showing the last 200 sync events (pushes, pulls, baselines, skips, errors, mode flips, status-note writes), So that I can trace what sync has been doing without grepping log files.

Acceptance Criteria:

- Given the existing autoSyncPanel (src/dashboard.html:94-97) which today binds to AutoSyncController.events capped at 100 (src/server.ts:468-469) via refreshAutosync() (src/dashboard.html:329-350)
  - When features.syncStatus is on (E4.1)
  - Then refreshAutosync() is replaced by refreshActivity() which fetches GET /api/sync-state and renders state.activity (the array in sync-state.json) into the existing #autoSyncLog <pre> (src/dashboard.html:96)
  - And the panel shows the last 200 entries (not 100); sync-state.json's activity array is trimmed to 200 on each append (analogous to the trim at src/server.ts:490 but with maxActivity = 200), newest first
  - And the panel remains scrollable via the existing max-height:180px;overflow:auto on #autoSyncLog (src/dashboard.html:96)
- Given activity events come from multiple producers
  - When an event occurs
  - Then it is appended to sync-state.json.activity with {time, kind, name, status, message} where kind is one of {push, pull, baseline, skip, error, mode, status-note}; AutoSyncController.log() (src/server.ts:488-494) is redirected to append to activity (with kind inferred from status: "pushed"→push, "skipped"→skip, "error"→error) instead of the in-memory events array
  - And E1b's parity events (push/pull/baseline) are read into activity by the same appender. E4 owns the appender shape, E1b owns the log file; the activity panel reads only from sync-state.json, never directly from logs/sync-<date>.log
  - And the pushed/skipped/errors counters (src/server.ts:470-472) are derived from activity on read rather than maintained as separate fields, so the panel and the header counts (E4.3) cannot drift
- Given the dashboard is open and auto-sync is ON
  - When the 2s poll (autoPoll at src/dashboard.html:348) fires
  - Then refreshActivity() re-fetches /api/sync-state and re-renders the last 200 entries, preserving the user's scroll position within #autoSyncLog (scrollTop is saved and restored across the re-render)
  - And when auto-sync is OFF the poll stops (matching the existing clearInterval(autoPoll) at src/dashboard.html:349) but the last 200 events remain visible in the panel; the panel stays open and scrollable regardless of ON/OFF state
- Given the server restarts
  - When E4.1 reloads sync-state.json
  - Then the last 200 events are immediately available to refreshActivity() on the first dashboard load; no "no activity yet" state (src/dashboard.html:347) is shown if activity.length > 0, even before any new event fires
  - And with features.syncStatus off, refreshAutosync() (src/dashboard.html:329-350) runs unchanged, binding to AutoSyncController.events capped at 100 as today
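The appender, the status-to-kind inference, and the derived counters above can be sketched as pure functions. `appendActivity`, `kindFromStatus`, and `deriveCounters` are hypothetical names; persistence to sync-state.json is left out.

```typescript
type ActivityKind = "push" | "pull" | "baseline" | "skip" | "error" | "mode" | "status-note";

interface ActivityEvent { time: string; kind: ActivityKind; name: string; status: string; message: string }

const MAX_ACTIVITY = 200;

// Infer kind from the legacy AutoSyncController.log() status values.
function kindFromStatus(status: string): ActivityKind | undefined {
  if (status === "pushed") return "push";
  if (status === "skipped") return "skip";
  if (status === "error") return "error";
  return undefined;
}

// Prepend the event (newest first) and trim to the last 200.
function appendActivity(activity: readonly ActivityEvent[], ev: ActivityEvent): ActivityEvent[] {
  return [ev, ...activity].slice(0, MAX_ACTIVITY);
}

// Counters are computed on read, so they cannot drift from the panel contents.
function deriveCounters(activity: readonly ActivityEvent[]): { pushed: number; skipped: number; errors: number } {
  const count = (k: ActivityKind): number => activity.filter((e) => e.kind === k).length;
  return { pushed: count("push"), skipped: count("skip"), errors: count("error") };
}
```

One consequence of deriving the counters this way is that they only cover the retained window of 200 events, not the lifetime of the server.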

Epic E3: Conflict resolution UX (Slice 2)

Story E3.1: Conflict row with side-by-side diff, plain-language summary, and neutral ordering

As a DM editing notes in both Obsidian and Foundry, I want a conflict row that shows the vault and Foundry versions side by side with a plain-language summary of what each side changed, So that I can understand the divergence without reading two full documents.

Acceptance Criteria:

- Given E1b's two-baseline routing marks a matched note recommendation: "conflict" (both refinedChanged and ccChanged true, src/dashboard.html:137 REC.conflict), and the user clicks the row's review button (src/dashboard.html:229-230)
  - When the detail panel renders for a conflict row (select(name) calls GET /api/file?name=, handled by fileDetail at src/server.ts:198-213, which already returns refined, cc, entry)
  - Then the panel shows a dedicated conflict panel titled "Both sides changed since last sync" with two columns rendered side by side: vault (left) and Foundry (right), in that fixed order (neutral: vault left, Foundry right, no pre-highlighted action)
  - And above the columns a one-line plain-language summary is computed from the two diffs (e.g. "Vault edited body; Foundry renamed entry + 1 page added") derived from comparing f.refined vs f.cc bodies and the entry.name vs the note's stored name; the summary names what each side did, not which side wins
  - And the side-by-side diff reuses the existing diff(a,b) helper (src/dashboard.html:274-282) but is upgraded to a true ordered line diff (the current set-based helper drops ordering and mis-reports moved lines as del+add); the upgraded diff is scoped to a new conflictDiff(a,b) function and the legacy diff() is left untouched for the seed/sync/re-pull preview panels
  - And the conflict panel is feature-flagged behind an E3_CONFLICT_UX flag read from STATUS (/api/status src/server.ts:657-658); when the flag is off, conflict rows fall back to the current review-only behavior (src/dashboard.html:229-230) and no conflict panel renders
  - And the conflict row in the list table keeps the existing conflict badge (src/dashboard.html:137, badge: 'bad'); no new row type is introduced, only the detail panel changes
  - And if either f.refined or f.cc is null (one side missing on disk), the panel shows "vault file missing" / "Foundry export missing" in the affected column instead of crashing, and no resolution actions are offered (edge case: a file deleted between the index tick and the click)
  - And the panel is read-only at this stage, with no working action buttons yet (those land in E3.2); the three action buttons are rendered disabled with title="coming next" so the layout matches E3.2 and the dev agent can wire them in one pass.
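One way to get the ordered line diff that conflictDiff(a,b) calls for is a longest-common-subsequence walk. The sketch below is an assumption about how it could be built, not the dashboard's actual helper; the `DiffOp` shape is hypothetical.

```typescript
type DiffOp = { op: "same" | "del" | "add"; line: string };

// Ordered line diff: "del" lines exist only in a, "add" lines only in b,
// and the output preserves the relative order of both inputs.
function conflictDiff(a: string, b: string): DiffOp[] {
  const x = a.split("\n");
  const y = b.split("\n");
  const n = x.length;
  const m = y.length;

  // lcs[i][j] = length of the longest common subsequence of x[i..] and y[j..].
  const lcs: number[][] = Array.from({ length: n + 1 }, () => new Array<number>(m + 1).fill(0));
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      lcs[i][j] = x[i] === y[j] ? lcs[i + 1][j + 1] + 1 : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }

  // Walk both inputs from the top, following the table toward the longest match.
  const out: DiffOp[] = [];
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (x[i] === y[j]) {
      out.push({ op: "same", line: x[i] });
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      out.push({ op: "del", line: x[i] });
      i++;
    } else {
      out.push({ op: "add", line: y[j] });
      j++;
    }
  }
  while (i < n) out.push({ op: "del", line: x[i++] });
  while (j < m) out.push({ op: "add", line: y[j++] });
  return out;
}
```

The table costs O(n·m) time and memory, which is fine for note-sized bodies but would need a different algorithm for very large files.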

Story E3.2: Three resolution actions with per-action one-line preview before commit

As a DM resolving a conflict, I want three explicit actions — "Push vault → Foundry", "Pull Foundry → vault", "Accept both as-is (keep divergence)" — each showing a one-line preview of what it will do to each side before I commit, So that I never destroy content by accident.

Acceptance Criteria:

- Given the conflict panel from E3.1 is open for a conflict row, with the three action buttons enabled (replacing E3.1's disabled placeholders)
  - When the user hovers/focuses any of the three action buttons (before clicking)
  - Then a one-line preview appears under that button stating, in plain language, what the action does to each side and which baselines it rewrites. For example, for "Push vault → Foundry": "Writes vault body → live Foundry entry (relay /update); re-baselines foundry.contentHash + syncedAt on the vault side (baselineFoundryBlock src/server.ts:289). Foundry side content is overwritten."
  - And the "Pull Foundry → vault" preview states: "Writes Foundry body → vault note (E2's F→O pull); re-baselines foundry.contentHash on the vault side. Vault side body is overwritten (curation tags/type/aliases preserved per rePullRow)."
  - And the "Accept both as-is (keep divergence)" preview states: "No content is transferred. Re-baselines foundry.contentHash (vault) and cc_sync_hash (cc) to current values — both sides keep their own text and the divergence is acknowledged." (per the hard constraint: this action must NOT transfer content)
  - And when the user clicks any action, a confirm step runs before any write: the same one-line preview is shown in a confirm popover (modal-bg/modal styles already exist src/dashboard.html:66-73) with "Confirm" / "Cancel"; no write happens on Cancel
  - And on Confirm for "Push vault → Foundry", the dashboard POSTs to a new /api/conflict/resolve endpoint (src/server.ts route table near src/server.ts:665-696) with { name, action: "push-vault" }, which invokes pushNote (src/push.ts:112) with dryRun: false and then baselineNote (src/server.ts:307) on the vault side, reusing the exact path handlePushAll uses (src/server.ts:365-372)
  - And on Confirm for "Pull Foundry → vault", the same endpoint with { action: "pull-foundry" } invokes E2's F→O pull (consumed, not rebuilt here) and then baselineNote on the vault side so foundry.contentHash matches the new body
  - And on Confirm for "Accept both as-is", the endpoint with { action: "keep-divergence" } runs only the dual re-baseline: it calls baselineFoundryBlock (src/server.ts:289) to rewrite foundry.contentHash+syncedAt to the current vault body hash, and the cc-side equivalent to rewrite cc_sync_hash to the current cc body hash; pushNote is NOT called and no relay /update is sent (FR-3.7)
  - And all three paths are gated by the per-uuid lock from E0 (acquired before the write, released after) so a concurrent auto-sync tick cannot push the same note mid-resolution; if the lock is held, the endpoint returns 409 { error: "locked by another operation" } and the dashboard shows that in the toast
  - And the endpoint is feature-flagged behind E3_CONFLICT_UX; when off, the route returns 501 and the dashboard buttons stay disabled
  - And if the relay is unconfigured (state.cfg.relayCfg falsy, src/server.ts:230-233), "Push vault → Foundry" is disabled with title="relay not configured"; "Pull Foundry → vault" and "Accept both as-is" remain available (the latter never touches the relay)
  - And if the live entry fetch fails (relay timeout 408/504 per docs/relay-api.md), the endpoint returns the relay's error message and the dashboard toast shows it; no partial write is left behind (the lock is released in a finally).
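The dispatch, lock, and error rules above can be sketched with injected dependencies. This is a simplified, synchronous illustration: the real handler would be asynchronous, the `ResolveDeps` callbacks stand in for pushNote, E2's pull, and the dual re-baseline, and the 502 status used for a failed action is an assumption (the story only says the relay's error message is returned).

```typescript
type ResolveAction = "push-vault" | "pull-foundry" | "keep-divergence";

// Hypothetical stand-ins for pushNote + baselineNote, E2's pull, and the dual re-baseline.
interface ResolveDeps {
  pushVault(name: string): void;
  pullFoundry(name: string): void;
  rebaselineBoth(name: string): void;
}

interface ResolveResult { status: number; body: { ok?: true; error?: string } }

// Per-uuid lock table; a Set is enough for the sketch.
const heldLocks = new Set<string>();

function resolveConflict(
  uuid: string, name: string, action: ResolveAction, deps: ResolveDeps, flagOn: boolean,
): ResolveResult {
  if (!flagOn) return { status: 501, body: { error: "conflict resolution is disabled" } };
  if (heldLocks.has(uuid)) return { status: 409, body: { error: "locked by another operation" } };
  heldLocks.add(uuid);
  try {
    if (action === "push-vault") deps.pushVault(name);
    else if (action === "pull-foundry") deps.pullFoundry(name);
    else deps.rebaselineBoth(name); // keep-divergence never pushes or pulls
    return { status: 200, body: { ok: true } };
  } catch (e) {
    // Assumed status code; the message is passed through for the dashboard toast.
    return { status: 502, body: { error: e instanceof Error ? e.message : String(e) } };
  } finally {
    heldLocks.delete(uuid); // released on success and on failure
  }
}
```

Releasing in finally is what guarantees that a relay failure does not leave the note permanently locked.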

Story E3.3: "Accept both as-is (keep divergence)" confirmation stating per-side outcome

As a DM who has edited both sides intentionally, I want the "Accept both as-is" action to require an explicit confirmation that spells out exactly what happens to each side, So that I don't mistake it for a merge and lose the fact that the two sides now disagree.

Acceptance Criteria:

- Given the user clicks "Accept both as-is (keep divergence)" in the conflict panel
  - When the confirm popover opens
  - Then the popover body states, in two labeled lines: "Vault side: content kept as-is; foundry.contentHash re-baselined to current body hash (src/server.ts:289 baselineFoundryBlock). No write to Foundry." and "Foundry side: content kept as-is; cc_sync_hash re-baselined to current cc body hash. No pull from Foundry.", explicitly naming both baselines and explicitly stating no content transfer in either direction
  - And the popover requires a second affirmative gesture beyond the single Confirm button: a checkbox "I understand the two sides will stay different" that must be checked before the Confirm button becomes enabled (Confirm is disabled until the checkbox is checked, using the button:disabled style at src/dashboard.html:20)
  - And the Confirm button is styled danger (src/dashboard.html:19) to signal irreversibility of the decision (the divergence is now baked into the baselines; a future tick will see parity, not conflict)
  - And on Confirm, the /api/conflict/resolve { action: "keep-divergence" } handler runs only baselineFoundryBlock on the vault file and the cc-side cc_sync_hash re-baseline; it does NOT call pushNote (src/push.ts:112), does NOT call the relay /update (docs/relay-api.md), and does NOT call E2's pull. This is verified by the dev agent with a grep confirming that the handler's code path contains no pushNote/relayClient/updateEntry call
  - And after the re-baseline, the next /api/index tick (src/server.ts:648-656, rebuilt on every request) reclassifies the note as in-sync (both hashes now match their recorded baselines) and the row leaves the conflict bucket; the dashboard re-fetches the index after the action resolves (same pattern as act() at src/dashboard.html:301)
  - And the popover's Cancel button returns to the conflict panel with no state changed (no baselines written, no lock acquired)
  - And the confirmation text and checkbox are feature-flagged with E3_CONFLICT_UX; flag off → the button is absent (not merely unconfirmed)
  - And edge case: if the vault file was deleted between the panel opening and Confirm (re-read inside the handler returns ENOENT), the handler returns 409 { error: "vault file vanished — re-scan and retry" } and no cc-side baseline is written either (both baselines must succeed atomically or neither does).
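The "both baselines or neither" edge case above can be approximated by reading and hashing both sides before any write. A minimal sketch with injected, hypothetical dependencies (`KeepDeps`); a null read models ENOENT, and the single `writeBaselines` callback stands in for the two real baseline writes. The error text for a missing Foundry export is an assumption.

```typescript
interface Baselines { contentHash: string; ccSyncHash: string }

// Hypothetical dependencies: file reads, the hash function, and one commit step.
interface KeepDeps {
  readVaultBody(name: string): string | null; // null models ENOENT
  readCcBody(name: string): string | null;
  hash(body: string): string;
  writeBaselines(name: string, b: Baselines): void;
}

interface KeepResult { status: number; error?: string; baselines?: Baselines }

// Re-baseline both sides to their current bodies without transferring content.
function keepDivergence(name: string, deps: KeepDeps): KeepResult {
  // Re-read inside the handler: the file may have vanished since the panel opened.
  const vault = deps.readVaultBody(name);
  if (vault === null) return { status: 409, error: "vault file vanished — re-scan and retry" };
  const cc = deps.readCcBody(name);
  if (cc === null) return { status: 409, error: "Foundry export missing" }; // assumed wording
  // Both hashes are computed before anything is written, so a failed read writes nothing.
  const baselines: Baselines = { contentHash: deps.hash(vault), ccSyncHash: deps.hash(cc) };
  deps.writeBaselines(name, baselines);
  return { status: 200, baselines };
}
```

Note that this ordering only prevents a half-written result when a read fails; making two separate file writes truly atomic would need additional handling that the sketch leaves out.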

Story E3.4: Conflict persistence across ticks and restarts + resolved-conflict activity entry

As a DM who steps away from the dashboard mid-conflict, I want the conflict state to survive auto-sync ticks and a server restart, and to see a record in the activity panel once I resolve it, So that I don't lose track of unresolved conflicts and can audit what I decided.

Acceptance Criteria:

- Given E1b's routing marks a note as conflict on a tick, and E4's sync-state.json is the persistence layer (consumed here, not rebuilt)
  - When the /api/index tick runs (src/server.ts:648-656, rebuilt per request) and finds a conflict recommendation
  - Then the conflict is written into sync-state.json under a conflicts: { [name]: { refinedHash, ccHash, since: <iso>, status: "open" } } key (shape defined by E4; this story only writes/reads it) before the response is sent
  - And on the next tick, if the same note is still conflict, the existing entry's since is preserved (not overwritten with a new timestamp) and status stays "open"; the conflict is not "forgotten" between ticks
  - And on server restart (startServer src/server.ts:620), the in-memory conflict set is rehydrated from sync-state.json before the first /api/index response, so a conflict that was open before the restart is still shown as open (with its original since timestamp) after the restart
  - And when any of the three resolution actions from E3.2/E3.3 succeeds, the /api/conflict/resolve handler flips the sync-state.json entry to { status: "resolved", resolvedAt: <iso>, action: <"push-vault"|"pull-foundry"|"keep-divergence"> } and appends a resolved-conflict entry to E4's activity panel (consumed via E4's activity API; this story emits the event, E4 renders it)
  - And the resolved-conflict activity entry includes: note name, which action was taken, the previous and new foundry.contentHash, and the timestamp, which is enough to audit the decision later
  - And a conflict that is status: "open" in sync-state.json but whose next tick recomputes as in-sync (e.g. because the user manually synced outside the dashboard) is auto-closed with status: "auto-closed", reason: "recomputed as in-sync" so the state file doesn't accumulate stale open entries
  - And the persistence read/write is wrapped so that a corrupt or missing sync-state.json does not crash the server: on read error the server logs and treats the conflict set as empty (graceful degradation); on write error the resolution still succeeds in-memory and the dashboard toast notes "state file write failed — conflict resolution will not persist across restart"
  - And the feature flag E3_CONFLICT_UX gates the persistence writes too; flag off → no sync-state.json writes from this path (E1b/E4 keep working independently)
  - And edge case: two conflicts resolved in quick succession must not clobber each other's sync-state.json entry; writes are serialized (a simple async mutex in the handler, same pattern as the per-uuid lock) so the second write reads the state only after the first write has completed.
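The per-tick bookkeeping above (preserve since, auto-close stale entries, mark resolved) can be sketched as pure functions over the conflicts map. `reconcileTick` and `markResolved` are hypothetical names, and refreshing the stored hashes on a still-open entry is an assumption; the story only requires that since and status are preserved.

```typescript
type ResolveAction = "push-vault" | "pull-foundry" | "keep-divergence";

interface ConflictEntry {
  refinedHash: string;
  ccHash: string;
  since: string;
  status: "open" | "resolved" | "auto-closed";
  resolvedAt?: string;
  action?: ResolveAction;
  reason?: string;
}

type ConflictMap = Record<string, ConflictEntry>;
type Hashes = { refinedHash: string; ccHash: string };

// current = notes the index classified as conflict on this tick.
function reconcileTick(prev: ConflictMap, current: Record<string, Hashes>, now: string): ConflictMap {
  const next: ConflictMap = { ...prev };
  for (const [name, hashes] of Object.entries(current)) {
    const old = prev[name];
    next[name] = old !== undefined && old.status === "open"
      ? { ...old, ...hashes } // still open: keep the original since (hash refresh is assumed)
      : { ...hashes, since: now, status: "open" };
  }
  for (const [name, entry] of Object.entries(prev)) {
    if (entry.status === "open" && !(name in current)) {
      next[name] = { ...entry, status: "auto-closed", reason: "recomputed as in-sync" };
    }
  }
  return next;
}

// Called by the resolve handler once an action has succeeded.
function markResolved(map: ConflictMap, name: string, action: ResolveAction, now: string): ConflictMap {
  const entry = map[name];
  if (entry === undefined || entry.status !== "open") return map;
  const updated: ConflictEntry = { ...entry, status: "resolved", resolvedAt: now, action };
  return { ...map, [name]: updated };
}
```

Because both functions return a new map rather than mutating the old one, the caller can persist the result in one write, which fits the serialized-writes requirement.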

Story E3.5: Renames/moves surface as conflicts + flip O→F auto-sync to ON-by-default

As a DM running the sync tool continuously, I want renamed or moved notes to surface as conflicts (not silent overwrites), and I want O→F auto-sync to flip ON by default once conflict resolution exists, So that the auto-sync that now runs by default can't silently destroy a Foundry-side change the vault rename would mask.

Acceptance Criteria:

- Given a matched note is renamed in the vault (file basename changes) or moved (relative path changes) while the Foundry side also changed since last sync
  - When E1b's routing runs on the next tick
  - Then the note is classified as conflict (not sync-cc) when both (a) the vault path/basename differs from the last-recorded path AND (b) ccChanged is true. A rename/move alone on a clean Foundry side stays sync-cc (the rename flows through pushNote's updatedName path src/push.ts:185); a rename/move on a Foundry-changed side is a conflict because the auto-sync push would overwrite the Foundry-side change
  - And the conflict panel from E3.1 shows an extra one-line banner when the rename/move condition is detected: "Note renamed/moved in vault (was X, now Y) AND Foundry side also changed — resolving 'Push vault → Foundry' will also update the live entry's name." so the DM knows the rename is part of the resolution
  - And the Push vault → Foundry action for a renamed note passes the new name through pushNote (src/push.ts:112, which already handles updatedName via deps.noteName and the relay /update name field per docs/relay-api.md PUT /update) so the live entry is renamed in the same operation
  - And when E3 lands (the E3_CONFLICT_UX flag is set to true in STATUS), AutoSyncController flips to ON-by-default: the enabled field (src/server.ts:458, currently enabled = false) initializes to true when the flag is on AND state.cfg.relayCfg is set (auto-sync needs the relay, src/server.ts:503); the controller's start() is called from startServer (src/server.ts:620-704) after the index builds, so a restart re-enables auto-sync without user action
  - And the dashboard reflects the new default: refreshAutosync() (src/dashboard.html:329-350) shows Auto-sync: on on first load when the flag is on, and the auto-sync panel (src/dashboard.html:94-97) is visible by default
  - And a user who explicitly turned auto-sync OFF in sync-state.json (E4 persists the toggle) is respected: the ON-by-default only applies when no explicit user preference exists; once the user toggles it off, it stays off across restarts (the setEnabled(false) path src/server.ts:496-500 persists the choice via E4)
  - And UPGRADE MIGRATION: on first start after E3.5 lands, if sync-state.json has NO autoSyncOn field (an upgraded install from pre-E3.5 that never persisted a preference), the controller treats the missing field as OFF for one release cycle; it does NOT infer consent-to-ON from absence. A startup banner reads "Auto-sync default-on is paused for this upgraded install — click to enable ON-by-default" with a one-click POST /api/sync-state { autoSyncDefaultOn: true } that records the opt-in; only AFTER that opt-in (or on a fresh install where E3_CONFLICT_UX is on from first boot) does the ON-by-default take effect. This makes the E3.5 flip a non-breaking default change for upgraders (no surprise silent vault mutation) while still ON-by-default for fresh installs, avoiding the breaking-change-disguised-as-a-default-change footgun that PM flagged. Fresh installs (no prior sync-state.json) skip the banner and start ON-by-default directly
  - And the ON-by-default flip is gated by E3_CONFLICT_UX; flag off → enabled stays false at construction (src/server.ts:458) and the startup auto-start does not run, preserving current behavior
  - And edge case: if the relay is unconfigured (relayCfg falsy) at startup, the auto-start is skipped with a log line "auto-sync default-on skipped — relay not configured" and enabled stays false; no crash (the start() guard at src/server.ts:503 already throws, but the startup path catches and logs rather than propagating)
  - And edge case: a rename where the new basename collides with an existing different note's name is detected by E1b's resolver (src/server.ts:20 MapNameResolver) and surfaced as review (not conflict) so the DM disambiguates before any resolution; this story does not auto-resolve name collisions.
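The ON-by-default and upgrade-migration rules above form a small decision table. The sketch below is one reading of those rules, with hypothetical names (`decideAutoStart`, `AutoStartInput`); it does not model E4.2's PREP-mode gating, which would still apply when start() is actually called.

```typescript
interface AutoStartInput {
  conflictUxFlag: boolean;     // E3_CONFLICT_UX
  relayConfigured: boolean;    // state.cfg.relayCfg is set
  stateFileExisted: boolean;   // false means a fresh install (no prior sync-state.json)
  autoSyncOn?: boolean;        // persisted preference; undefined means never persisted
  autoSyncDefaultOn?: boolean; // opt-in recorded from the upgrade banner
}

interface AutoStartDecision {
  enabled: boolean;
  showUpgradeBanner: boolean;
  reason: "explicit" | "flag-off" | "upgrade-hold" | "no-relay" | "default-on";
}

function decideAutoStart(i: AutoStartInput): AutoStartDecision {
  // An explicit, persisted user choice always wins over the default.
  if (i.autoSyncOn !== undefined) {
    return { enabled: i.autoSyncOn && i.relayConfigured, showUpgradeBanner: false, reason: "explicit" };
  }
  if (!i.conflictUxFlag) return { enabled: false, showUpgradeBanner: false, reason: "flag-off" };
  // Upgraded install with no recorded preference: stay off until the user opts in.
  if (i.stateFileExisted && i.autoSyncDefaultOn !== true) {
    return { enabled: false, showUpgradeBanner: true, reason: "upgrade-hold" };
  }
  // Logged by the caller as "auto-sync default-on skipped — relay not configured".
  if (!i.relayConfigured) return { enabled: false, showUpgradeBanner: false, reason: "no-relay" };
  return { enabled: true, showUpgradeBanner: false, reason: "default-on" };
}
```

Returning a reason alongside the boolean lets the startup path choose between the banner, the log line, and a silent start without re-deriving the conditions.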

Epic E5: Diagnostics polish (Tail)

Story E5.1: "Copy diagnostics" dashboard action — redacted bundle to clipboard

As a DM operating the sync dashboard, I want a one-click "Copy diagnostics" button that puts a redacted bundle (log tail + config with secrets redacted + parity + relay/clientId status) on my clipboard, So that I can paste it into a bug report or support chat without ever leaking RELAY_API_KEY / RELAY_PASSWORD or any other secret.

Acceptance Criteria:

- Given the server is running with relayCfg set (or unset) and E1b's persistent log logs/sync-<date>.log exists on disk
- When the user clicks a new "Copy diagnostics" button in the dashboard header (added next to "Re-scan" at src/dashboard.html:83)
- Then the dashboard POSTs/GETs a new feature-flagged /api/diagnostics endpoint on src/server.ts (registered in the router at src/server.ts:629 alongside the other /api/* routes) and copies the returned bundle to the clipboard via navigator.clipboard.writeText
- And the bundle is a single string field containing: (a) the last ~200 lines of logs/sync-<date>.log (today's file; if missing, the most recent logs/sync-*.log by mtime; if none, the literal string "(no log file)"), (b) a redacted config snapshot, (c) parity counts + conflict state + last-sync pulled from E4's sync-state.json (read best-effort; on missing file, "(no sync-state.json)"), (d) relay/clientId status block: relayConfigured: bool, clientIdConfigured: bool (derived from state.cfg.relayCfg?.clientId truthiness at src/relay/client.ts:36), mode, host, port from /api/status (src/server.ts:657-658)
- And the redacted config snapshot NEVER includes RELAY_API_KEY or RELAY_PASSWORD in plaintext: these keys appear only as "<redacted>" or are omitted entirely; any other env var whose name matches /KEY|PASSWORD|SECRET|TOKEN/i is likewise redacted by the same server-side redactor (defensive: E7 owns no-egress, this respects it)
- And the bundle includes a header line "# diagnostics bundle — generated <iso8601> — secrets redacted" and is formatted as fenced markdown so it pastes cleanly into a chat
- And on the dashboard side, a toast (src/dashboard.html:284-290 toast() pattern) confirms "diagnostics copied to clipboard (secrets redacted)" on success, or surfaces the error message on failure (e.g. clipboard permission denied → "clipboard blocked — copy manually from the detail panel", and the bundle is also rendered in a <pre> in the detail panel #detail as a fallback)
- Given relayCfg is unset (server started without RELAY_API_KEY)
- When the user clicks "Copy diagnostics"
- Then the bundle still copies successfully with relayConfigured: false and the relay/clientId status block reporting "(relay not configured)"; the button is always enabled, never silently disabled
- Given the redactor is asked to serialize a config object that contains a nested apiKey or password field under an unexpected key name
- When the bundle is assembled
- Then the redactor walks the object recursively and redacts any value whose key path matches /apiKey|password|secret|token/i (case-insensitive), so a future config reshape can't accidentally leak a secret
- And a unit-style assertion in the endpoint's redactor function rejects the response (throws, caught by the outer try/catch at src/server.ts:698-700) if any non-redacted secret-looking string longer than 8 chars remains in the serialized bundle: fail-closed rather than risk a leak
- Given the diagnostics endpoint is hit but reading the log file throws (permissions / missing dir)
- When the response is built
- Then that section is set to "(log read failed: <error.message>)" and the rest of the bundle is still returned; one missing piece does not blank the whole bundle
- And the endpoint is feature-flagged (gated behind a DIAGNOSTICS_ENABLED env flag or a cfg.diagnostics field on ServerConfig at src/server.ts:30), returning 503 {"error":"diagnostics disabled"} when off, so the feature ships dark and can be flipped on per environment
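The recursive redactor and its fail-closed check could look roughly like the sketch below. All names (`redactSecrets`, `collectSecrets`, `buildRedactedBundle`) are hypothetical. Two assumptions are made: the two patterns from the criteria are merged into a single case-insensitive `/key|password|secret|token/i` (which also covers `apiKey`), and "secret-looking string" is interpreted as any known secret value from the config that is longer than 8 characters.

```typescript
// Assumption: one merged pattern covering both the env-name and nested-key rules.
const SECRET_KEY_PATTERN = /key|password|secret|token/i;
const REDACTED = "<redacted>";

// Walk the value recursively; anything at or below a secret-looking key is replaced.
function redactSecrets(value: unknown, underSecretKey = false): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => redactSecrets(item, underSecretKey));
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
      out[key] = redactSecrets(child, underSecretKey || SECRET_KEY_PATTERN.test(key));
    }
    return out;
  }
  return underSecretKey ? REDACTED : value;
}

// Collect the raw secret strings (longer than 8 chars) so the bundle can be checked.
function collectSecrets(value: unknown, underSecretKey = false, found: string[] = []): string[] {
  if (Array.isArray(value)) {
    for (const item of value) collectSecrets(item, underSecretKey, found);
  } else if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
      collectSecrets(child, underSecretKey || SECRET_KEY_PATTERN.test(key), found);
    }
  } else if (underSecretKey && typeof value === "string" && value.length > 8) {
    found.push(value);
  }
  return found;
}

// Assemble the bundle text, then fail closed if any known secret survived
// anywhere in it (for example inside the log tail).
function buildRedactedBundle(config: unknown, logTail: string): string {
  const redactedConfig = JSON.stringify(redactSecrets(config), null, 2);
  const bundle = ["config:", redactedConfig, "log tail:", logTail].join("\n");
  for (const secret of collectSecrets(config)) {
    if (bundle.includes(secret)) {
      throw new Error("diagnostics bundle rejected: unredacted secret detected");
    }
  }
  return bundle;
}
```

The check scans the assembled text rather than the config object alone, so a secret that leaked into a log line is caught as well; the endpoint's outer try/catch would turn the throw into an error response instead of a partial bundle.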

Epic E6: Onboarding & config guidance (Tail)

Story E6.1: Empty RELAY_CLIENT_ID detection + "no clientId configured" guidance state

As a non-author DM opening the dashboard for the first time, I want the dashboard to clearly tell me when RELAY_CLIENT_ID is empty and guide me to fix it from the UI (never a shell command), So that I don't stare at a silent 404 "Invalid client ID" error with no idea what's wrong.

Acceptance Criteria:

- Given the server is running with relayCfg set but relayCfg.clientId is empty/undefined (the field read at src/relay/client.ts:36 is falsy)
- When the dashboard loads (init() at src/dashboard.html:144) and fetches a new feature-flagged /api/relay-status endpoint on src/server.ts (registered in the router at src/server.ts:629)
- Then the endpoint returns { clientIdConfigured: false, relayConfigured: true, connectedClients: null, status: "no-client-id" } WITHOUT making any relay call (no relay round-trip is needed to know clientId is empty; it's a local config read)
- And the dashboard renders a dismissible guidance banner above the counts bar (src/dashboard.html:81 #counts) with class badge warn styling (src/dashboard.html:35) reading exactly: "No clientId configured — set RELAY_CLIENT_ID in your .env (see the 'How to run' section below), or pick a client from the relay client list once Foundry is connected."
- And the banner includes an inline button "Open client picker" that, when clicked, triggers the same client-list flow built in E6.3 (it may be a no-op stub if E6.3 hasn't landed, but the button MUST NOT hand the user a shell command; if E6.3 is not yet implemented, the button is disabled with tooltip "client picker coming soon" rather than routing to a shell)
- And the "Auto-sync" button (src/dashboard.html:92 #autoSyncBtn) is forced disabled with a tooltip "auto-sync requires a clientId — see the banner above" when status === "no-client-id", and toggleAutosync() (src/dashboard.html:351) short-circuits with a toast "configure RELAY_CLIENT_ID first" instead of POSTing /api/autosync
- Given relayCfg itself is unset (no RELAY_API_KEY)
- When /api/relay-status is called
- Then it returns { relayConfigured: false, clientIdConfigured: false, status: "no-relay" } and the banner reads "Relay not configured — set RELAY_API_KEY and RELAY_PASSWORD in your .env (see 'How to run' below)." with NO client picker button (there's nothing to pick until the relay is up)
- And the "Auto-sync" button is disabled with tooltip "relay not configured"
- Given clientId IS configured (non-empty string)
- When /api/relay-status is called
- Then it returns { clientIdConfigured: true, status: "ok" } and the dashboard renders no banner (the empty-clientId state is cleared on next poll)
- And the dashboard polls /api/relay-status on the same cadence as refreshAutosync (src/dashboard.html:348 setInterval(refreshAutosync, 2000)) while autosync is on, and once on window focus (src/dashboard.html:451) so a user who fixes their .env and restarts sees the banner clear without a manual refresh
- And the README "How to run" stub (the ## Quick start (Docker Compose) block at README.md:21) is touched in this slice to add a one-line note under step 1: "If you see 'No clientId configured' in the dashboard, RELAY_CLIENT_ID is empty — see the client picker in the dashboard header." (touching the stub every slice is the E6 contract, not a retroactive doc dump)
- And the endpoint is feature-flagged the same way as E5.1 (gated on ServerConfig at src/server.ts:30), returning 503 when off so it ships dark
- And edge case: if /api/relay-status itself 503s (feature flag off) or 500s, the dashboard silently skips the banner (no error toast for a not-yet-enabled endpoint); the onboarding flow degrades gracefully when the flag is off
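The three local states in this story ("no-relay", "no-client-id", "ok") depend only on configuration, so they can be derived without touching the relay. A minimal sketch follows; `RelayCfgSketch`, `localRelayStatus`, and `autoSyncAllowed` are hypothetical names, and the `relayConfigured: true` field on the "ok" response is an assumption added for symmetry with the other two shapes.

```typescript
// Hypothetical config shape; only clientId is named in the story text.
interface RelayCfgSketch {
  apiKey: string;
  clientId?: string;
}

type LocalRelayStatus =
  | { relayConfigured: false; clientIdConfigured: false; status: "no-relay" }
  | { relayConfigured: true; clientIdConfigured: false; connectedClients: null; status: "no-client-id" }
  | { relayConfigured: true; clientIdConfigured: true; status: "ok" };

// Pure config read: no relay round-trip is made in any branch.
function localRelayStatus(relayCfg: RelayCfgSketch | undefined): LocalRelayStatus {
  if (!relayCfg) {
    return { relayConfigured: false, clientIdConfigured: false, status: "no-relay" };
  }
  if (!relayCfg.clientId) {
    // Covers both undefined and the empty string, matching the falsy check.
    return {
      relayConfigured: true,
      clientIdConfigured: false,
      connectedClients: null,
      status: "no-client-id",
    };
  }
  return { relayConfigured: true, clientIdConfigured: true, status: "ok" };
}

// In this story, the Auto-sync button is only usable in the "ok" state.
function autoSyncAllowed(status: LocalRelayStatus["status"]): boolean {
  return status === "ok";
}
```

A dashboard poll would call the endpoint, then use `autoSyncAllowed` to decide whether the Auto-sync button is enabled and which banner, if any, to render.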

Story E6.2: No-connected-client detection + auto-sync disable + cadenced re-check

As a DM whose Foundry instance isn't connected to the relay yet (Foundry closed, rest-api module not pointed at the relay, or session not started), I want the dashboard to say "Foundry not connected", disable auto-sync, and keep re-checking on a cadence so it recovers the moment Foundry comes online, So that I don't have to manually retry and don't silently lose auto-sync pushes to a 404.

Acceptance Criteria:

- Given relayCfg is set and clientId is configured (E6.1's "no-client-id" state is clear), but no Foundry client is connected to the relay key
- When the dashboard calls the feature-flagged /api/relay-status endpoint (from E6.1)
- Then the endpoint makes a single lightweight probe to the relay: it calls RelayClient.searchJournalEntries (src/relay/client.ts:97) with limit=1 (the smallest possible probe) wrapped in try/catch, and on a 404 with body containing "No connected Foundry clients found" (per docs/relay-api.md:13) returns { status: "no-connected-client", connectedClients: 0, clientIdConfigured: true }
- And the dashboard renders a guidance banner (same slot as E6.1, src/dashboard.html:81) reading exactly: "Foundry not connected — open Foundry, install/enable the rest-api module, point it at ws(s)://<host>:3010, and start the relay session. Re-checking every 10s." with class badge bad (src/dashboard.html:36)
- And the "Auto-sync" button (src/dashboard.html:92) is disabled with tooltip "Foundry not connected — auto-sync paused" whenever status === "no-connected-client", and if auto-sync was already ON when the state transitions to no-connected-client, the dashboard POSTs /api/autosync {enabled:false} (src/server.ts:686) to turn it off server-side (so the watcher at src/server.ts:502-517 stops) and toasts "auto-sync paused — Foundry disconnected"
- And the dashboard sets a 10-second setInterval re-check that re-calls /api/relay-status while in the no-connected-client state (separate timer from the 2s autosync poll at src/dashboard.html:348; cleared when state transitions to ok or multi-client)
- Given the probe returns 404 with "Invalid client ID" (the retryable error per the shared code facts, distinct from "No connected Foundry clients found")
- When /api/relay-status builds its response
- Then it returns { status: "invalid-client-id", clientIdConfigured: true } and the dashboard banner reads "clientId is invalid or stale — re-fetch the client list and pick a new one." with an "Open client picker" button (E6.3's flow; stub-safe per E6.1's rule). This is the retryable path, NOT the persistent "no connected client" path, and they MUST NOT be conflated
- Given the probe times out (408/504 per docs/relay-api.md:17)
- When /api/relay-status builds its response
- Then it returns { status: "relay-unreachable", error: "<message>" } and the banner reads "Relay unreachable () — check the relay container and your network." with auto-sync disabled (same disable path as no-connected-client) and the 10s re-check armed
- And the 10s re-check is suspended while the document is hidden (document.hidden) to avoid background polling, and resumes on visibilitychange; no shell command and no manual retry button are required (the re-check IS the retry)
- And the README "How to run" stub (README.md:21) is touched in this slice to add under step 4: "If the dashboard says 'Foundry not connected', confirm Foundry is running, the rest-api module is enabled, and node scripts/start-relay-session.js has been run." (per-slice touch, not a doc dump)
- And edge case: if the probe returns a 400 listing clients (>1 connected, no clientId), the endpoint returns { status: "multi-client", connectedClients: <n>, clients: [...] } and does NOT render the no-connected-client banner; that's E6.3's picker state, not this story's
- And the endpoint is feature-flagged identically to E6.1 (gated on ServerConfig at src/server.ts:30); when the flag is off, the dashboard never enters the no-connected-client state (no probe, no banner, no auto-sync disable); the onboarding flow is opt-in
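The probe outcomes in this story map onto distinct statuses, and the two 404 cases must stay separate. The sketch below shows one way to express that mapping. It is illustrative only: `ProbeResult` is a hypothetical normalization of whatever RelayClient.searchJournalEntries throws or returns, and treating any other unexpected HTTP status as "relay-unreachable" is an assumption the story does not state.

```typescript
interface RelayClientInfo {
  clientId: string;
  name: string;
}

// Hypothetical normalized probe outcome; the real client's error shape may differ.
type ProbeResult =
  | { kind: "http"; httpStatus: number; body: string; clients?: RelayClientInfo[] }
  | { kind: "timeout"; message: string };

type ProbeStatus =
  | { status: "ok"; clientIdConfigured: true }
  | { status: "no-connected-client"; connectedClients: 0; clientIdConfigured: true }
  | { status: "invalid-client-id"; clientIdConfigured: true }
  | { status: "relay-unreachable"; error: string }
  | { status: "multi-client"; connectedClients: number; clients: RelayClientInfo[] };

function classifyProbe(probe: ProbeResult): ProbeStatus {
  if (probe.kind === "timeout") {
    return { status: "relay-unreachable", error: probe.message };
  }
  const { httpStatus, body } = probe;
  if (httpStatus === 408 || httpStatus === 504) {
    return { status: "relay-unreachable", error: `HTTP ${httpStatus}` };
  }
  // The two 404 bodies are different states and must not be conflated.
  if (httpStatus === 404 && body.includes("No connected Foundry clients found")) {
    return { status: "no-connected-client", connectedClients: 0, clientIdConfigured: true };
  }
  if (httpStatus === 404 && body.includes("Invalid client ID")) {
    return { status: "invalid-client-id", clientIdConfigured: true };
  }
  if (httpStatus === 400 && probe.clients !== undefined && probe.clients.length > 1) {
    return { status: "multi-client", connectedClients: probe.clients.length, clients: probe.clients };
  }
  if (httpStatus >= 200 && httpStatus < 300) {
    return { status: "ok", clientIdConfigured: true };
  }
  // Assumption: anything else is reported as unreachable rather than guessed at.
  return { status: "relay-unreachable", error: `unexpected HTTP ${httpStatus}` };
}

// States that disable auto-sync and arm the 10-second re-check.
function needsRecheck(status: ProbeStatus["status"]): boolean {
  return status === "no-connected-client" || status === "relay-unreachable";
}
```

On the dashboard, `needsRecheck` would decide whether the separate 10-second timer is running; the timer itself (and its `document.hidden` suspension) is browser-side code and is left out of this sketch.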

Story E6.3: Connected-client list picker with single-client "auto-resolved, no pick needed" message

As a DM with multiple Foundry instances (or a dev + prod world) connected to the same relay key, I want the dashboard to list the connected clients and let me pick one from the UI, and to tell me when no pick is needed because exactly one is connected (the relay auto-resolves), So that I never see an empty list when there's clearly one client, and never have to shell out to find the clientId.

Acceptance Criteria:

- Given relayCfg is set, clientId is empty (E6.1's state), and the probe from E6.2 returns a 400 with a client list (the relay's ">1 connected client" behavior per docs/relay-api.md:13)
- When the dashboard calls /api/relay-status (E6.1's endpoint, extended) and gets { status: "multi-client", connectedClients: <n>, clients: [{clientId, name, ...}] }
- Then the dashboard renders a client-picker modal (reusing the .modal/.modal-bg/.entry styles at src/dashboard.html:66-73 already used by the link picker at src/dashboard.html:402-413) listing each connected client with its name and clientId
- And clicking a client calls a new feature-flagged POST /api/relay-client-select { clientId } on src/server.ts (router at src/server.ts:629) that writes the chosen clientId back to state.cfg.relayCfg.clientId in-memory AND persists it to .env (or a sync-state.json field) so it survives a restart; this is UI-only, with no shell command given to the user
- And on success, the dashboard toasts "clientId set to ", closes the modal, re-fetches /api/relay-status, and clears E6.1's "no clientId configured" banner
- Given exactly one Foundry client is connected to the relay key and clientId is empty
- When /api/relay-status makes its probe (E6.2's searchJournalEntries({limit:1}))
- Then the relay auto-resolves the single client and returns a 200 (per docs/relay-api.md:11-13 and the shared code facts: "with exactly 1, it auto-resolves and returns NO list")
- And the endpoint detects this single-client auto-resolve path and returns { status: "auto-resolved", clientIdConfigured: false, autoResolvedClientId: null } (the relay doesn't tell us the resolved id, just that it succeeded) and the dashboard banner reads "Relay auto-resolved to the single connected Foundry client — no pick needed." with class badge ok (src/dashboard.html:34) and NO client picker button is shown (there's nothing to pick)
- And the "Auto-sync" button is enabled in this state (the relay will auto-resolve on each request), with a tooltip noting "relay auto-resolves the single connected client"
- Given zero clients are connected (E6.2's no-connected-client state) and the user opens the picker via the E6.1 banner button
- When the picker tries to fetch the client list
- Then the picker shows "Foundry not connected — no clients to list. Re-checking…" (reusing the E6.2 re-check cadence) instead of an empty list, and the picker's "Open client picker" button in E6.1's banner is disabled with tooltip "Foundry not connected" in this state
- And edge case: if the user picks a client and then that client disconnects before the next probe, the next /api/relay-status poll returns invalid-client-id (E6.2's retryable state) and the dashboard re-opens the picker with a toast "that client disconnected — pick a new one"
- And the README "How to run" stub (README.md:21) is touched in this slice to add under step 3: "If multiple Foundry worlds are connected, pick one from the dashboard client picker. If only one is connected, the dashboard says 'auto-resolved, no pick needed'." (per-slice touch, not a doc dump)
- And the POST /api/relay-client-select endpoint is feature-flagged identically to E6.1/E6.2 (gated on ServerConfig at src/server.ts:30); when off, the picker modal shows "client selection disabled in this build" instead of posting, and the user is never given a shell command as a fallback
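The select endpoint's behavior (flag gate, in-memory update, persistence) might be structured as below. This is a sketch under several assumptions: `handleRelayClientSelect`, `SelectDeps`, and the `persistClientId` hook are hypothetical, the 400 responses for a missing relay config or malformed payload are not specified by the story, and persisting before mutating memory is a design choice made here so a failed write leaves the running config unchanged.

```typescript
// Hypothetical shapes for illustration only.
interface RelayCfgState {
  apiKey: string;
  clientId?: string;
}

interface SelectDeps {
  onboardingEnabled: boolean; // the E6 onboarding flag on ServerConfig
  relayCfg: RelayCfgState | undefined; // stands in for state.cfg.relayCfg
  persistClientId: (clientId: string) => void; // writes to .env or sync-state.json
}

interface HandlerResult {
  httpStatus: number;
  body: Record<string, unknown>;
}

function handleRelayClientSelect(deps: SelectDeps, payload: unknown): HandlerResult {
  // Flag off: the endpoint ships dark and answers 503.
  if (!deps.onboardingEnabled) {
    return { httpStatus: 503, body: { error: "client selection disabled" } };
  }
  // Assumption: selecting a client without a relay config is a client error.
  if (!deps.relayCfg) {
    return { httpStatus: 400, body: { error: "relay not configured" } };
  }
  const clientId =
    typeof payload === "object" && payload !== null
      ? (payload as { clientId?: unknown }).clientId
      : undefined;
  if (typeof clientId !== "string" || clientId.length === 0) {
    return { httpStatus: 400, body: { error: "clientId must be a non-empty string" } };
  }
  // Persist first so a failed write does not leave memory and disk out of step.
  deps.persistClientId(clientId);
  deps.relayCfg.clientId = clientId;
  return { httpStatus: 200, body: { clientId } };
}
```

Injecting `persistClientId` keeps the handler free of file-system code, so the choice between .env and sync-state.json can be made elsewhere without changing the request logic.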

Feature Flag Inventory

Single source of truth for every feature flag introduced across the epics. Each flag: key · default · read site · off-behavior · introducing story · default-flipping story. Flags are read once at boot (env → ServerConfig / cfg.features) unless noted "per-request". When a flag is OFF, the code path is present but a no-op pass-through (byte-identical to today's behavior) — never "code absent" — so landing is reversible.

| Flag | Default | Read site | Off-behavior | Introduces | Flips default |
| --- | --- | --- | --- | --- | --- |
| SYNC_LOCK_ENABLED | true | ServerConfig, boot | SyncLock not consulted; old per-relPath inflight Set (src/server.ts:463) is the sole guard, byte-identical to today | E0.1 | n/a (stays on) |
| CC_HASH_SPIKE | false | ServerConfig, boot | htmlToMarkdown + its tests not wired into push path; forward push behavior unchanged | E1a.1 | n/a (gate artifact; on only for the spike) |
| AUTOSYNC_FOUNDRY_GUARD | true | env, boot | Auto-sync regresses to body-only gate (no Foundry-side ccHash check, no TOCTOU re-verify, no revert-last-push); dashboard banner marks unsafe | E1b.1 | n/a (stays on after Slice 0) |
| AUTOSYNC_BACKUP_RETAIN | 10 (count, not bool) | env, boot | N/A: numeric config for E1b.4 retention | E1b.4 | |
| AUTOSYNC_CONCURRENCY | 3 (numeric) | env, boot | N/A: numeric, validated 1–8 | E1b.6 | |
| AUTOSYNC_DEBOUNCE_MS | 700 (numeric) | env, boot | N/A: numeric, validated 100–5000 | E1b.6 | |
| AUTOSYNC_BASELINE_SUPPRESS_MS | 2000 (numeric) | env, boot | N/A: TTL for E1b.2 self-write suppression | E1b.2 | |
| AUTOSYNC_LOG_RETAIN_DAYS | 14 (numeric) | env, boot | N/A: retention for E1b.8 rotated log | E1b.8 | |
| ENABLE_AUTH_MIDDLEWARE | false | env, boot | Auth/CSRF no-op pass-through; route table still exists but authenticate short-circuits to true (today's behavior). Bind still defaults 127.0.0.1; refuse-0.0.0.0-without-token check skipped. Self-lockout guard still fires (flag=on + no token → refuse boot) | E7.1 | Flips to true at the launch gate (public exposure) |
| DIAGNOSTICS_ENABLED | false | ServerConfig, boot | /api/diagnostics returns 503 {"error":"diagnostics disabled"} | E5.1 | n/a (per-environment flip) |
| features.syncStatus / OFS_SYNC_STATUS | off / 0 | cfg.features, boot | None of E4's endpoints/header/banner/note-writer/activity-panel register; /api/sync-state 404; dashboard falls back to existing /api/autosync + /api/status rendering exactly as today | E4.1 | n/a (per-environment flip; on once E4 lands) |
| foundryPoll | off (within Slice 1) | cfg.features, boot + POST /api/foundry-poll | No shallow/deep poll timers; /api/foundry-poll* 404; fPending stays 0; E2 is dark plumbing | E2.1 | Flips ON when E3 ships (conflict UX makes F→O safe to surface) |
| E3_CONFLICT_UX | false | STATUS (/api/status), boot | Conflict rows fall back to current review-only behavior; /api/conflict/resolve 501; conflict persistence writes skipped; O→F auto-sync stays OFF-by-default (enabled=false at construction) | E3.1 | Flips to true at E3 ship → flips O→F auto-sync ON-by-default (E3.5, with upgrade-migration pause) |
| E6 onboarding flag (on ServerConfig) | false | ServerConfig, boot | /api/relay-status + /api/relay-client-select 503; dashboard skips onboarding banners; degrades gracefully (no error toast) | E6.1 | n/a (per-environment flip) |
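Reading these flags once at boot, with defaults and range validation for the numeric ones, could be sketched as follows. The helper names and the `BootFlags` shape are hypothetical, only a subset of the flags is shown, and several behaviors are assumptions: boolean flags accept "true" or "1", an out-of-range number aborts boot with an error, and the validated ranges are taken to be 1 to 8 for AUTOSYNC_CONCURRENCY and 100 to 5000 ms for AUTOSYNC_DEBOUNCE_MS.

```typescript
// The environment is passed in so the sketch does not depend on Node typings.
type Env = Record<string, string | undefined>;

// Assumption: "true" and "1" both mean on; anything else set explicitly means off.
function readBoolFlag(env: Env, key: string, fallback: boolean): boolean {
  const raw = env[key];
  if (raw === undefined || raw === "") return fallback;
  return raw === "true" || raw === "1";
}

// Assumption: an invalid numeric flag is a boot error, not a silent fallback.
function readIntFlag(env: Env, key: string, fallback: number, min: number, max: number): number {
  const raw = env[key];
  if (raw === undefined || raw === "") return fallback;
  const parsed = Number(raw);
  if (!Number.isInteger(parsed) || parsed < min || parsed > max) {
    throw new Error(`${key} must be an integer in [${min}, ${max}], got "${raw}"`);
  }
  return parsed;
}

interface BootFlags {
  syncLockEnabled: boolean;
  autosyncFoundryGuard: boolean;
  diagnosticsEnabled: boolean;
  enableAuthMiddleware: boolean;
  autosyncConcurrency: number;
  autosyncDebounceMs: number;
}

// Read once at boot and freeze, so later code cannot flip a flag mid-run.
function readBootFlags(env: Env): Readonly<BootFlags> {
  return Object.freeze({
    syncLockEnabled: readBoolFlag(env, "SYNC_LOCK_ENABLED", true),
    autosyncFoundryGuard: readBoolFlag(env, "AUTOSYNC_FOUNDRY_GUARD", true),
    diagnosticsEnabled: readBoolFlag(env, "DIAGNOSTICS_ENABLED", false),
    enableAuthMiddleware: readBoolFlag(env, "ENABLE_AUTH_MIDDLEWARE", false),
    autosyncConcurrency: readIntFlag(env, "AUTOSYNC_CONCURRENCY", 3, 1, 8),
    autosyncDebounceMs: readIntFlag(env, "AUTOSYNC_DEBOUNCE_MS", 700, 100, 5000),
  });
}
```

A server entry point would call `readBootFlags(process.env)` once and hand the frozen result to ServerConfig; route handlers then branch on the flag and fall through to today's behavior when it is off.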

Flag-off failure-path rule: when a flag is OFF and its code path encounters an error (e.g. features.syncStatus off and sync-state.json can't be written), the path skips silently and logs to logs/sync-<date>.log (E1b.8); it never throws out of the state-mutation path. This is consistent across all flags.

AC cross-check: every FR-6 sub-requirement is covered. FR-6.1 (empty clientId → "no clientId configured" + guidance, not silent 404) is covered by E6.1; FR-6.2 (no connected client → "Foundry not connected" + auto-sync disable + cadenced re-check) by E6.2; FR-6.3 (list connected relay clients from the UI; exactly-one → "auto-resolved, no pick needed") by E6.3. No FR-6 requirement is left to a later epic.