This commit is contained in:
2026-06-23 03:53:35 +00:00
parent 2b8b0302a8
commit ac6288088a
5 changed files with 1708 additions and 123 deletions

1222
docs/epics.md Normal file

File diff suppressed because it is too large

@@ -5,6 +5,58 @@ change, and override lands here as the conversation unfolds.
---
## 2026-06-22 — Reviewer gate + finalize (PRD → final)
Three parallel reviewers dispatched; full reviews on disk:
`review-rubric.md`, `review-engineering.md`, `review-launchable.md`.
**Reviewer outcomes:**
- Rubric: 0 critical/high; 3 medium (stale TBD in §1, no Success Metrics, no
Glossary), 4 low. → all applied (§8 SMs, §9 Glossary, §1 tie-breaker fix,
FR-5.2/5.4 bounds).
- Engineering: **Conditional-go** — 2 BLOCKER (F2: `/search` minified has no
`folder` and no content hash; content detection = per-note `/get` per poll,
contradicts ADR-005), 3 HIGH (Foundry-side hash under-specified,
`/get`–`/update` TOCTOU, no shared cross-direction lock), 4 MEDIUM, 4 LOW. → all
applied (F2 two-layer rescope, FR-1.4 hash definition, FR-1.10 TOCTOU
re-verify, FR-3.1 per-uuid lock, M1–M4, L1–L2).
- Launchable: **NOT launchable as specified** — 2 BLOCKER (`/setup` is an
author tool not a product surface; `0.0.0.0:7788` no auth), 4 HIGH (error
contracts, status-note loop, conflict UX footgun, no Foundry-side undo),
5 MEDIUM, 4 LOW. → all applied.
**Four user decisions (the real crux):**
1. **B1-launchable → Downscope NFR-6.** Honest "given operator-wired relay +
headless session + rest-api module"; §2 Operator prerequisites block added;
`/setup` dropped from FRs. In-UI wizard → future PRD (§7).
2. **B2-launchable → Auth-by-default + 127.0.0.1.** F7 + NFR-9 added: auth by
default, default bind localhost, `0.0.0.0` requires token, no secret egress,
CSRF/same-origin on mutations. (Behavior change from today's `0.0.0.0`
no-auth default — acknowledged.)
3. **H4-launchable → Cache + revert last push.** NFR-10 + FR-5.6/5.7: pre-push
Foundry entry cached to `foundry-backups/<uuid>/<iso>.json` (last N) +
dashboard "Revert last push."
4. **F2 deep poll → On by default (minutes cadence).** F2 = shallow poll
(renames/new/missing, faster) + deep poll (content/moves via per-note
`/get`, minutes cadence, `mapPool`-capped), both on by default + manual
catch-up trigger.
**Auto-applied (no user decision):** conflict-UX rename + confirmation
(FR-3.6/3.7/3.8/3.9); status-note dot-path + sentinel exclusion (FR-4.3/4.6);
error-contracts table (§5a); schema_version + migration (NFR-11); persistent
log + diagnostics (FR-5.8/5.9); auto-sync gated to apply mode (FR-1.9);
TOCTOU post-push re-verify (FR-1.10); per-uuid shared lock (FR-3.1); transient/
persistent retry split (FR-5.3); neutral conflict ordering (FR-3.10);
single-client auto-resolve (FR-6.3); OQ-4 and OQ-6 resolved in-PRD; README
doc-drift task (§7).
**Finalize:** decision-log audit ✓ (all entries reflected). Input
reconciliation — conversational input only, already folded; no external docs
to extract. Reviewer pass ✓. Triage — OQs are non-blockers with defaults
(OQ-1/3/7 open with defaults; OQ-2 deferred with reopening condition; OQ-4/5/6
resolved). Editorial polish deferred to dev (PRD reads clean; full subagent
editorial pass available on request, not blocking dev). **status: final.**
## 2026-06-22 — Divergence / conflict posture (the no-clobber decisions)
- **Foundry-side baseline hash scope = content + name + folder_path.** Catches

@@ -1,16 +1,20 @@
---
title: "Live Relay Sync — Auto-Sync & Bidirectional Hardening"
status: draft
status: final
created: 2026-06-22
updated: 2026-06-22
reviewers: [rubric, engineering-feasibility, launchable]
---
# Live Relay Sync — Auto-Sync & Bidirectional Hardening
> Status: **draft** — being authored via bmad-prd coaching path.
> Scope: full live-sync surface over the ThreeHats relay — (A) ship & verify
> Obsidian→Foundry instant auto-sync, (B) add Foundry→Obsidian auto direction,
> (C) operational hardening. Stakes: public/launchable.
> Obsidian→Foundry instant auto-sync **with a no-clobber divergence guard**,
> (B) add Foundry→Obsidian auto direction, (C) operational hardening, plus
> launchable-grade security, error contracts, and data integrity.
> Stakes: **public/launchable** (operator-wired prerequisites — see §2).
> Reviewed by three parallel reviewers; findings applied (see
> `.decision-log.md` and `review-{rubric,engineering,launchable}.md`).
## 1. Vision
@@ -22,8 +26,8 @@ summarizing or closing out previous journal notes, dropping in images and lore,
creating new objects to build out the world. Data is coming from **whichever
tool is easier in the moment** — Obsidian or Foundry — and files must stay in
sync the whole time, across both directions, without the DM babysitting a sync
button. The feeling to deliver: *always in lockstep while I work*, bidirectionally,
across a flurry of simultaneous edits.
button. The feeling to deliver: *always in lockstep while I work*,
bidirectionally, across a flurry of simultaneous edits.
**Running the match** is the live session. The DM is generating a lot of notes,
almost entirely inside Foundry, and the sync service is almost certainly **off**
@@ -37,14 +41,17 @@ whether the vault and Foundry agree.
_[CONFIRMED] Source of truth = the newest version of a document — not a fixed
side. But the system must **never clobber work**: when both sides have diverged
since the last sync, it must route to reconciliation instead of overwriting
either side. The manual buttons (Sync / Re-pull / Push-all) stay precisely for
this — quick overrides when a file is unsynced on any side or both deviate._
since the last sync, it routes to reconciliation instead of overwriting either
side. Default resolution = **manual** (FR-3.2): the conflict row offers
explicit actions; a newest-mtime convenience is open (OQ-1) but not assumed.
The manual buttons (Sync / Re-pull / Push-all) stay precisely for this — quick
overrides when a file is unsynced on any side or both deviate._
_[CONFIRMED] Foundry is an equal-origin editing surface during prep — data
originates in whichever tool is easier. "Foundry is source of truth" is demoted
to a tie-breaker rule for the both-diverged conflict case (to be decided:
newest-wins, Foundry-wins, or manual merge)._
to a tie-breaker rule for the both-diverged conflict case, and the conflict UI
defaults to a neutral ordering (vault left, Foundry right, no pre-highlighted
action) so the undecided tie-breaker does not bias the DM (FR-3.10)._
_[DEFERRED] Where the "not syncing" indicator lives during run-the-match — no
easy answer yet; decision deferred. Foundry UI is rendered by the rest-api
@@ -52,11 +59,9 @@ module (not ours); the dashboard is the only realistic surface we control now.
A custom Foundry module (giving us what the relay does, plus indicators) is a
future exploration, not this PRD._
_[CONFIRMED] Sync-status-when-ON lives in the dashboard for now. Candidate
addition: a single **status note** the sync tool maintains inside the vault
(e.g. `Sync Status.md`) showing last-sync time/state — a lightweight,
in-our-control way for the user to see parity at a glance without a custom
module. (Proposed by user; to be confirmed as a feature.)_
_[CONFIRMED] Sync-status-when-ON lives in the dashboard for now, plus a single
maintained **status note** inside the vault (`Sync Status.md`) showing
last-sync time/state — a lightweight, in-our-control parity indicator (FR-4.3)._
## 2. Problem & Context
@@ -79,7 +84,7 @@ Sync/Re-pull per file doesn't scale to that churn; the value proposition of
auto-sync is exactly this phase. But auto-sync that only watches one direction
leaves half the prep edits unpropagated, and auto-sync that pushes without
checking the other side **can silently clobber Foundry edits** — the current
O→F code has this risk.
O→F code has this risk (confirmed against `src/server.ts:582-617`).
**The run-the-match phase is Foundry-centric with sync off.** During the live
session the DM generates many notes, mostly inside Foundry, and the sync
@@ -87,52 +92,85 @@ service is expected to be off. Today "off" is silent — the DM can't tell
whether prep edits are still propagating. Status legibility (on/off, parity,
last sync) is missing on both sides.
**Config/onboarding gap.** Live relay sync needs a connected Foundry client and
a valid `clientId`. `RELAY_CLIENT_ID` is currently empty in `.env`, and the
headless Foundry session that the relay drives is not always up. A launchable
tool needs this onboarding path to be discoverable and self-correcting.
### Operator prerequisites (the honest onboarding boundary)
**Audience.** Today a single operator (the DM/world-builder). Intended to be
launchable — other DMs running it against their own Foundry worlds — so the
no-clobber, error-surfacing, and onboarding rigor must hold for non-author
users, not just for the one who wrote it.
Live relay sync sits on top of infrastructure the **operator** (the DM or
whoever runs their host) wires **once**, outside the dashboard. The dashboard
handles config **detection** and all live operations, but it does not bring up
the infrastructure. A non-author DM must complete, or have completed, these
gates before the dashboard can reach live sync:
1. Bring up the ThreeHats relay container (`docker compose up -d relay`).
2. Create a relay account and copy `RELAY_API_KEY` into `.env` (browser signup).
3. Start the headless Foundry session the relay drives
(`scripts/start-relay-session.js`) — shell.
4. Point Foundry's rest-api module at the relay WebSocket URL (Foundry-side
admin config — cannot be driven from the dashboard at all).
5. Install deps and start the dashboard (`npm install`, `./sync.sh ui`).
The dashboard **surfaces** each unmet gate (FR-6.1/6.2/6.3) and guides
remediation, but building an in-UI first-run wizard that performs these steps
is **out of scope for this PRD** (future work — see §7). NFR-6 is scoped
accordingly. Foundry-side world backups (Foundry's own backup feature) remain
the DM's responsibility and are **recommended before enabling auto-sync**
this PRD adds a local pre-push cache + revert (FR-5.6/5.7) as a recovery path,
but it is not a substitute for world backups.
_[ASSUMPTION] The relay remains the live transport for this PRD; the "custom
Foundry module that gives us what the relay does" is explicitly future work,
not a dependency here._
_[ASSUMPTION] One relay `/get` per changed linked note per deep-poll tick is
acceptable relay load at a minutes cadence with a concurrency cap (OQ-3)._
**Audience.** Today a single operator (the DM/world-builder). Intended to be
launchable — other DMs running it against their own Foundry worlds **given the
operator prerequisites above** — so the no-clobber, error-surfacing, security,
and data-integrity rigor must hold for non-author users, not just for the one
who wrote it.
## 3. Goals & Non-Goals
### Goals
- **G1 — No-clobber bidirectional auto-sync for prep.** Both directions sync
automatically; neither side is ever overwritten if it changed since last sync;
both-diverged conflicts surface for the DM to resolve.
automatically; neither side is ever overwritten if it changed since last
sync; both-diverged conflicts surface for the DM to resolve.
- **G2 — Ship & verify the O→F auto-sync safely.** Commit the existing
controller/UI **with the divergence guard added** (Foundry-side baseline hash
+ relay `/get` before acting), and verify one end-to-end live push.
- **G3 — Foundry→Obsidian auto direction.** A working F→O auto path despite the
relay having no push channel (polling / snapshot-diff design).
+ relay `/get` before acting + post-push re-verify), and verify one
end-to-end live push.
- **G3 — Foundry→Obsidian auto direction.** A working F→O auto path via
shallow poll (renames/new/missing) + deep poll (content/moves), within the
relay's actual constraints.
- **G4 — Legible sync status at all times.** The DM can see, at a glance,
whether sync is on or off, whether the vault and Foundry are in parity, and
when the last sync landed — in the dashboard, plus a maintained status note
inside the vault.
when the last sync landed — in the dashboard plus a maintained vault status
note; state persists across restarts.
- **G5 — Operational hardening.** Recursive-watch fallback, tuned
concurrency/debounce, retry on transient relay 404/timeout, and visible
error rows in the dashboard.
- **G6 — Closed onboarding/config.** `RELAY_CLIENT_ID` and the connected-Foundry-
client requirement are discoverable and self-correcting from the dashboard.
concurrency/debounce, retry with transient/persistent split, visible error
rows, persistent log, shared cross-direction locking.
- **G6 — Honest onboarding & config.** Given the operator prerequisites, the
dashboard detects and guides every unmet config/relay gate with no shell
command from the DM.
- **G7 — Security.** The dashboard authenticates by default, binds localhost
by default, never exposes secrets to the browser, and guards mutations.
- **G8 — Foundry-side data integrity.** Every push to Foundry is preceded by a
local backup of the pre-push Foundry state, with a dashboard revert path.
### Non-Goals
- **In-UI first-run onboarding wizard** that performs the operator
prerequisites (bring up relay, acquire API key, launch headless session, wire
rest-api module) — future PRD.
- **Custom Foundry module** (indicators inside Foundry UI, relay replacement) —
future exploration, out of scope.
future exploration; would reopen OQ-2.
- **"Not syncing" indicator rendered inside Foundry** — deferred; Foundry UI is
not a surface we control. Surfaced via dashboard + vault status note instead.
- **Syncing during run-the-match.** Sync stays off by design in that phase; the
goal is legibility of the off state, not automation during the session.
- **Automatic semantic/3-way content merge.** Both-diverged = pick a side or
merge manually via the buttons; no auto content-merge engine.
accept divergence via the buttons; no auto content-merge engine.
- **Auto-sync of unlinked or unseeded notes.** Seed/link first remains a manual
prerequisite (unchanged from current behavior).
- **Full LevelDB / docker-stop index in the dashboard.** Remains CLI-only
@@ -140,23 +178,33 @@ not a dependency here._
## 4. Features & Functional Requirements
FR IDs are stable. Grouped F1–F6; IDs are `FR-<group>.<n>`.
FR IDs are stable. Grouped F1–F7; IDs are `FR-<group>.<n>`.
### F1 — Obsidian→Foundry auto-sync (safe)
- **FR-1.1** Watch the refined vault dir for `.md` saves using recursive
`fs.watch`, with a per-subdir fallback (re-scanning on subdir create/rename)
for platforms/Node versions without recursive watch. Skip `.obsidian` and
dotfiles.
for platforms/Node versions without recursive watch. Skip `.obsidian`,
dotfiles, and the reserved status-note path (FR-4.3).
- **FR-1.2** On a save, read the note and skip it if it has no
`foundry.cc_uuid` (unlinked) or no `foundry.contentHash` baseline (unseeded)
— seed/link remain manual prerequisites.
- **FR-1.3** Compute the current Obsidian body hash; if it equals the
`foundry.contentHash` baseline, skip (covers no-op saves and the watcher's
own post-push baseline write — no feedback loop).
- **FR-1.4** **Before pushing**, `relay /get` the live Foundry entry and
compute its Foundry-side hash over **content + name + folder_path**; compare
to the stored Foundry-side baseline (`foundry.ccHash`, new field).
- **FR-1.4** **Before pushing**, compute the Foundry-side hash and compare to
the stored Foundry-side baseline (`foundry.ccHash`, **new field**). The
`/get` that `pushNote` already performs (`src/push.ts:142`) is **reused** for
this — no extra round-trip. The Foundry-side hash input is
`canonicalize(htmlToMarkdown(flags["campaign-codex"].data)) + "\n" + name +
"\n" + folder_path`, i.e. the Foundry HTML body is converted back to refined
markdown (the inverse of `obsidianToFoundryJsonLive`, via linkedom) and run
through the **same `contentHash` pipeline** so the two sides are directly
comparable and the F3 2×2 routing is well-defined. `baselineFoundryBlock`
(`src/server.ts:289`) and `baselineNote` (`src/server.ts:307`) must be
extended to also rewrite a `ccHash:` line; `readFoundryBlock` consumers read
it. A hash-stability unit test across a push→`/get` round-trip is required
before FR-1.4 ships.
- **FR-1.5** Route: Obsidian-changed **and** Foundry-unchanged (F-hash equals
`ccHash` baseline) → push O→F via the same `pushNote` path the manual push
button uses; re-baseline both sides on success.
@@ -165,72 +213,146 @@ FR IDs are stable. Grouped F1–F6; IDs are `FR-<group>.<n>`.
Obsidian-side-only check (that reintroduces clobber risk).
- **FR-1.7** After a successful push, re-baseline **both** `foundry.contentHash`
(Obsidian body) **and** `foundry.ccHash` (Foundry-side) to the new values.
Dev mode baselines land in the `--out` mirror; apply mode in the real vault
with a `.bak`.
- **FR-1.8** Auto-sync always applies live (dry-run not honored) — unchanged
from current behavior; the whole point is hands-off live push.
Baselines land in the real vault with a `.bak` (apply mode only — see FR-1.9).
- **FR-1.8** Auto-sync always applies live to Foundry (dry-run not honored) —
the whole point is hands-off live push.
- **FR-1.9** **Auto-sync requires apply mode.** Enabling it in dev mode is
blocked with an explanatory banner (auto-sync writes live Foundry; dev mode
is a preview). This reconciles FR-1.7/1.8 — baselines and live writes both
target the real vault/Foundry, never the `--out` mirror.
- **FR-1.10** **TOCTOU guard.** After `pushNote`'s `relay.updateEntry`
succeeds, re-`/get` and verify the entry's Foundry-side hash matches what was
just written; if it diverges (a concurrent Foundry edit landed mid-flight),
surface a conflict row instead of baselining. The pre-push Foundry `/get`
(FR-5.6) is the prior-state backup used by the revert path.
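
The Foundry-side hash input described in FR-1.4 can be sketched roughly as follows. This is a minimal illustration, not the project's code: `FoundryEntryView`, the stand-in `canonicalize`, the injected `htmlToMarkdown` function, and the choice of SHA-256 are all assumptions made for the example. The real pipeline is whatever `contentHash` already uses.

```typescript
import { createHash } from "node:crypto";

// Hypothetical view of the three fields FR-1.4 hashes (not the project's types).
interface FoundryEntryView {
  html: string; // stands for flags["campaign-codex"].data
  name: string;
  folderPath: string;
}

// Assumption: the real HTML -> refined-markdown converter is injected.
type HtmlToMarkdown = (html: string) => string;

// Stand-in canonicalizer: normalize line endings and strip trailing whitespace.
function canonicalize(markdown: string): string {
  return markdown
    .replace(/\r\n/g, "\n")
    .split("\n")
    .map((line) => line.replace(/\s+$/, ""))
    .join("\n")
    .trim();
}

// Hash over content + "\n" + name + "\n" + folder_path, as FR-1.4 specifies.
// SHA-256 is an assumption; the source only says "the same contentHash pipeline".
function foundrySideHash(entry: FoundryEntryView, htmlToMarkdown: HtmlToMarkdown): string {
  const input =
    canonicalize(htmlToMarkdown(entry.html)) + "\n" + entry.name + "\n" + entry.folderPath;
  return createHash("sha256").update(input, "utf8").digest("hex");
}
```

Because name and folder path are part of the input, a Foundry-side rename or move changes the hash even when the body is untouched, which is what lets FR-3.5 surface those as changes.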
### F2 — Foundry→Obsidian auto-sync
- **FR-2.1** While auto-sync is ON, poll `relay /search`
The relay has no push channel and `/search` is minified
(`{uuid,id,name,img,documentType}` — **no `folder`, no content, no content
hash**). F2 therefore runs **two layers**:
- **FR-2.1 (shallow poll, default ON)** — poll `relay /search`
(`documentType:JournalEntry`, minified) on a configurable cadence; build a
current `{uuid → name/img/folder}` snapshot.
- **FR-2.2** Diff the current snapshot against the last snapshot and the
vault's known linked notes (via `foundry.cc_uuid`) to detect Foundry-side
changes: renamed (name change, same uuid), moved (folder change),
content-changed (detected via `/get` hash compare), missing, or new.
`{uuid → name/img}` snapshot. Diff against the last snapshot to detect
**renames** (name change on a known uuid), **new** entries, and **missing**
entries. Folder moves and content changes are **not** detectable here (no
`folder`, no content in minified `/search`).
- **FR-2.2 (deep poll, default ON at a minutes cadence)** — for each linked
note, `relay /get` the live entry and compute its Foundry-side hash
(FR-1.4's input); compare to `foundry.ccHash` to detect **content changes**
and **folder moves**. Concurrency-capped (reuse `mapPool`,
`src/server.ts:317`); cadence in **minutes**, not seconds. This supersedes
ADR-005's "F→O stays manual" conclusion **for rename/new/missing (shallow)
and content/move (deep)** — ADR-005's cost rejection is respected by the
minutes cadence + concurrency cap.
- **FR-2.3** For each Foundry-changed **linked** note where the Obsidian side
is unchanged: `/get` the live entry, convert to refined markdown, write into
the vault, and re-baseline both sides.
the vault, re-baseline both sides. (Apply mode only — FR-1.9.)
- **FR-2.4** Never clobber an Obsidian-side change: vault-newer or
both-diverged notes route to F3 conflict handling, not auto-pull.
- **FR-2.5** New (cc-only) Foundry entries surface as **import candidates**
(existing import row) — do not auto-import.
- **FR-2.6** Poll cadence is configurable with a prep-tuned default (seconds);
a manual "catch up now" trigger is available alongside the background poll.
- **FR-2.5** New (cc-only) Foundry entries surface in a **separate "live new
entries" list** in the dashboard (not conflated with the LevelDB `ccOnly`
pool, which is built from the static journal snapshot). Each row has a
one-click "Import as new refined note" action with a plain-language
explanation of what import does. No auto-import.
- **FR-2.6** A manual "catch up now" trigger forces an immediate deep sweep
(FR-2.2) alongside the background polls. Poll cadences are configurable
(OQ-3) with jitter to be courteous on shared relays.
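
The shallow-poll diff in FR-2.1 amounts to comparing two `uuid → name` snapshots. A minimal sketch, assuming a hypothetical `SearchHit` shape that carries only the fields the diff needs (the minified `/search` result also has `id`, `img`, and `documentType`):

```typescript
// Hypothetical subset of a minified /search hit.
interface SearchHit {
  uuid: string;
  name: string;
}

type Snapshot = Map<string, string>; // uuid -> name

interface ShallowDiff {
  renamed: { uuid: string; from: string; to: string }[];
  added: string[];
  missing: string[];
}

function toSnapshot(hits: SearchHit[]): Snapshot {
  return new Map(hits.map((h): [string, string] => [h.uuid, h.name]));
}

// Detects renames, new entries, and missing entries only.
// Folder moves and content changes are invisible at this layer (see FR-2.2).
function diffSnapshots(previous: Snapshot, current: Snapshot): ShallowDiff {
  const diff: ShallowDiff = { renamed: [], added: [], missing: [] };
  current.forEach((name, uuid) => {
    const before = previous.get(uuid);
    if (before === undefined) diff.added.push(uuid);
    else if (before !== name) diff.renamed.push({ uuid, from: before, to: name });
  });
  previous.forEach((_name, uuid) => {
    if (!current.has(uuid)) diff.missing.push(uuid);
  });
  return diff;
}
```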
### F3 — Divergence detection & conflict routing
- **FR-3.1** Every sync tick (O→F or F→O) computes both-side hashes and routes
per the 2×2: parity / O-changed / F-changed / both-changed.
per the 2×2: parity / O-changed / F-changed / both-changed. A **per-uuid
lock shared by the watcher path and the poll path** (not per-relPath) ensures
only one direction acts on a uuid at a time; the other queues/skips. FR-1.4's
`/get` is evaluated **after** the debounce drains, not on the raw save event.
- **FR-3.2** both-changed → **do not auto-overwrite**; create a conflict row
in the dashboard summarizing both versions and highlighting the diff.
- **FR-3.3** The conflict row offers manual resolution: "push vault →
Foundry", "pull Foundry → vault", "mark resolved (no change)".
- **FR-3.4** Conflict state persists until the DM resolves it; a re-save on
either side does not auto-clear a known conflict.
showing a **side-by-side diff with a one-line plain-language summary**
("Vault adds 3 paragraphs about X; Foundry renamed to Y and changed folder to
Z"), not a raw unified diff.
- **FR-3.3** The conflict row offers three explicit actions: "Push vault →
Foundry", "Pull Foundry → vault", and **"Accept both as-is (keep
divergence)"** (renamed from "mark resolved (no change)" — see FR-3.7).
- **FR-3.4** Conflict state persists until the DM resolves it, **across ticks
and across server restarts** (persisted in `sync-state.json`, FR-4.7); a
re-save on either side does not auto-clear a known conflict.
- **FR-3.5** Foundry-side renames and folder moves (caught via name + folder in
the hash) surface as changes/conflicts, not silently absorbed.
the Foundry-side hash) surface as changes/conflicts, not silently absorbed.
- **FR-3.6** Conflict diff format = side-by-side with the plain-language
summary line (FR-3.2).
- **FR-3.7** "Accept both as-is (keep divergence)" re-baselines **both** hashes
to the current values **without transferring content in either direction**;
the two sides keep their diverged content and are treated as in-sync from
then on. A confirmation dialog states in plain text: "The vault and Foundry
will keep their current versions. They will be treated as in-sync from now
on. Neither side's changes will be copied to the other."
- **FR-3.8** Each conflict action states, before commit, what it will do to
each side and to the baselines (one-line preview). No irreversible action
without a confirm.
- **FR-3.9** A resolved conflict produces a visible activity-panel entry
stating which side won and that the other side's edits were **not**
transferred.
- **FR-3.10** Conflict-row ordering is neutral: vault on the left, Foundry on
the right, no pre-highlighted action — so the undecided tie-breaker does not
bias the DM.
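
The 2×2 routing in FR-3.1 reduces to two boolean comparisons against the stored baselines. A minimal sketch; the `Hashes` shape and the route labels are illustrative names, not identifiers from the codebase:

```typescript
type Route = "parity" | "push-o-to-f" | "pull-f-to-o" | "conflict";

// Hypothetical bundle of the four hashes a sync tick compares.
interface Hashes {
  obsidianHash: string; // current vault body hash
  contentHash: string; // stored Obsidian-side baseline (foundry.contentHash)
  foundryHash: string; // current Foundry-side hash
  ccHash: string; // stored Foundry-side baseline (foundry.ccHash)
}

function route(h: Hashes): Route {
  const obsidianChanged = h.obsidianHash !== h.contentHash;
  const foundryChanged = h.foundryHash !== h.ccHash;
  if (obsidianChanged && foundryChanged) return "conflict"; // never auto-overwrite (FR-3.2)
  if (obsidianChanged) return "push-o-to-f";
  if (foundryChanged) return "pull-f-to-o";
  return "parity";
}
```

Checking the both-changed case first keeps the no-clobber rule structural: neither push nor pull is reachable once both sides have moved off their baselines.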
### F4 — Sync status & parity
- **FR-4.1** Dashboard shows a persistent sync-status header: ON/OFF, mode
(dev/apply), watched dir.
(apply only for auto-sync — FR-1.9), watched dir.
- **FR-4.2** Dashboard shows a parity indicator: counts of in-parity /
O-pending / F-pending / conflict / unsynced-linked notes, plus a last-sync
timestamp.
- **FR-4.3** The sync tool maintains a `Sync Status.md` note in the vault (on a
path excluded from the watcher, never synced to Foundry) showing on/off, last
sync time, parity counts, and recent events — updated each tick.
- **FR-4.3** The sync tool maintains a status note at a **reserved dot-path**
(`${VAULT}/.sync-status.md`, covered by FR-1.1's dotfile skip) **and** carries
a `foundry.sync_status: true` content sentinel. Both the O→F watcher and the
F→O poll check **both** the path rule and the sentinel and skip on either.
If a status note loses its sentinel (user edit), it is surfaced as user error
and **not** synced. Status-note writes must never produce a sync op (NFR-5).
- **FR-4.4** When sync is OFF, the dashboard shows a loud "SYNC PAUSED" state,
not a silent absence.
- **FR-4.5** Dashboard parity and the vault status note reflect one underlying
state (single source of truth for status).
state — the persisted `sync-state.json` (FR-4.7) — so they never disagree.
- **FR-4.6** The status note's exclusion is airtight by **both** path and
sentinel (FR-4.3); a rename/move of the note does not start a feedback loop
because the sentinel is checked on content, not path alone.
- **FR-4.7** Status state (parity counts, conflict state, last-sync time) lives
in a persisted `sync-state.json` that survives server restart; on restart the
dashboard reads it rather than showing stale/empty until the next tick.
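
The path-or-sentinel exclusion from FR-4.3 and FR-4.6 might look like the sketch below. It is an assumption-heavy simplification: the sentinel is detected with a line-based regex over the frontmatter rather than a YAML parser, and the function names are invented for the example.

```typescript
// True when any path segment is a dotfile or dot-directory (e.g. ".sync-status.md").
function isDotPath(relPath: string): boolean {
  return relPath
    .split(/[\\/]/)
    .some((segment) => segment.startsWith(".") && segment !== "." && segment !== "..");
}

// Simplified sentinel check: looks for a `sync_status: true` line inside the
// leading frontmatter block. A real implementation would parse the YAML and
// confirm the key sits under `foundry:`.
function hasStatusSentinel(noteText: string): boolean {
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(noteText);
  if (!match) return false;
  const frontmatter = match[1] ?? "";
  return /^\s*sync_status:\s*true\s*$/m.test(frontmatter);
}

// Skip on EITHER rule, so a renamed or moved status note is still excluded.
function shouldSkipForSync(relPath: string, noteText: string): boolean {
  return isDotPath(relPath) || hasStatusSentinel(noteText);
}
```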
### F5 — Operational hardening
- **FR-5.1** Recursive-watch fallback (FR-1.1) verified on the host kernel,
including re-scan on subdir create/rename so new folders get watched.
- **FR-5.2** Debounce window and max concurrency configurable; defaults tuned
for prep so a burst of simultaneous saves doesn't thrash or drop events.
- **FR-5.3** Transient relay errors (404 invalid client, 408/504 timeout, 5xx)
retried with bounded backoff; persistent failures surface as error rows.
- **FR-5.2** Debounce window and max concurrency configurable; defaults
**debounce 700ms, max concurrency 3** (current values), validated against the
NFR-3 ~50-note prep burst. Tuned so a burst doesn't thrash or drop events.
- **FR-5.3** Retry split into **transient** (timeout 408/504, 5xx,
session-temporarily-unavailable) — retried with bounded backoff — vs
**persistent** (404 invalid clientId, 401 bad API key, 404 no connected
Foundry clients) — **no retry**, surfaced immediately with remediation. (A
wrong clientId 404s forever; retrying only delays the error.)
- **FR-5.4** Every auto-sync op (push/skip/error/conflict) logged to the
activity panel with time, note, status, message; panel capped and scrollable.
- **FR-5.5** Inflight dedup + queue drain verified under burst — no dropped
events, no duplicate pushes.
activity panel with time, note, status, message; panel capped at the **last
200 events**, scrollable for older.
- **FR-5.5** Inflight dedup + the shared per-uuid lock (FR-3.1) verified under
burst — no dropped events, no duplicate pushes, no cross-direction
oscillation.
- **FR-5.6** **Before any auto or manual push to Foundry**, the prior
Foundry-side entry is `/get`-fetched and cached locally to
`foundry-backups/<uuid>/<iso>.json`; the last N per uuid are retained
(configurable). (This `/get` is the same one FR-1.4 reuses — no extra
round-trip.)
- **FR-5.7** A **"Revert last push"** dashboard action restores the most
recent cached Foundry state for a note (writes it back via `/update`).
- **FR-5.8** All auto-sync ops additionally append to a persistent, rotated
log file on disk (`logs/sync-<date>.log`) — survives restart, for support.
- **FR-5.9** A **"Copy diagnostics"** dashboard action bundles recent log tail,
current config (secrets redacted), parity counts, and relay/clientId status
into a single redacted blob for support.
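
The shared per-uuid lock referenced in FR-3.1 and FR-5.5 could be as small as a set of held uuids that both the watcher path and the poll path consult. A minimal single-process sketch; `UuidLock` and `withUuidLock` are hypothetical names, and this version skips rather than queues when the uuid is busy:

```typescript
class UuidLock {
  private held = new Set<string>();

  // Returns true if the caller now owns the uuid, false if another path holds it.
  tryAcquire(uuid: string): boolean {
    if (this.held.has(uuid)) return false;
    this.held.add(uuid);
    return true;
  }

  release(uuid: string): void {
    this.held.delete(uuid);
  }
}

// Runs `task` only when no other direction is acting on the uuid.
// Resolves to "skipped" otherwise, leaving re-queueing to the caller.
async function withUuidLock<T>(
  lock: UuidLock,
  uuid: string,
  task: () => Promise<T>,
): Promise<T | "skipped"> {
  if (!lock.tryAcquire(uuid)) return "skipped";
  try {
    return await task();
  } finally {
    lock.release(uuid); // always release, even if the push or pull throws
  }
}
```

Keying on uuid rather than relative path matters because a Foundry-side rename or a vault-side move changes the path while the linked entry stays the same.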
### F6 — Onboarding & config
### F6 — Onboarding & config (given operator prerequisites)
- **FR-6.1** If `RELAY_CLIENT_ID` is unset/empty, the dashboard surfaces a
clear "no clientId configured" state with guidance — not a silent 404 at push
@@ -240,65 +362,161 @@ FR IDs are stable. Grouped F1–F6; IDs are `FR-<group>.<n>`.
enable-auto-sync until resolved; re-checked on a cadence.
- **FR-6.3** The dashboard can list connected relay clients (relay `/search`
with no clientId returns the client list on >1, or "No connected clients" on
0) so the DM can pick/copy a valid `clientId` from the UI — no shell command.
- **FR-6.4** The existing `/setup` skill covers env wiring; the dashboard
reflects setup state and links into it.
0) so the DM can pick/copy a valid `clientId` from the UI. When **exactly
one** client is connected, the relay auto-resolves — the dashboard treats
that as "clientId auto-resolved, no pick needed" rather than showing an empty
list.
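
The three client-count cases in FR-6.3 map naturally onto a small tagged union. A sketch under the assumption that the dashboard already has the connected client ids as a string array; the type and function names are illustrative:

```typescript
type ClientResolution =
  | { kind: "none" } // no connected clients: block enable-auto-sync
  | { kind: "auto"; clientId: string } // exactly one: no pick needed
  | { kind: "pick"; clientIds: string[] }; // several: the DM picks one in the UI

function resolveClient(connected: string[]): ClientResolution {
  const [first, ...rest] = connected;
  if (first === undefined) return { kind: "none" };
  if (rest.length === 0) return { kind: "auto", clientId: first };
  return { kind: "pick", clientIds: connected.slice() };
}
```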
### F7 — Security & access control
- **FR-7.1** Dashboard authenticates by default (token or password set via env
or a first-run prompt); unauthenticated requests get 401, not the UI.
- **FR-7.2** Default bind changes to `127.0.0.1`; `0.0.0.0` requires explicit
opt-in **and** an auth token set — the server refuses to start on `0.0.0.0`
without auth.
- **FR-7.3** Secrets (`RELAY_API_KEY`, `RELAY_PASSWORD`) are never rendered to
the browser; the dashboard shows only masked presence/absence.
- **FR-7.4** POST mutation endpoints require a CSRF token or same-origin
check.
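
The startup rule in FR-7.2 can be expressed as a pure check run before the server binds. A minimal sketch; `BindConfig` and `checkBind` are hypothetical, and how the host and token are actually read from env/config is left out:

```typescript
interface BindConfig {
  host?: string; // defaults to loopback when unset
  authToken?: string;
}

type BindCheck = { ok: true; host: string } | { ok: false; reason: string };

function checkBind(cfg: BindConfig): BindCheck {
  const host = cfg.host && cfg.host.length > 0 ? cfg.host : "127.0.0.1";
  const isLoopback = host === "127.0.0.1" || host === "localhost" || host === "::1";
  const hasToken = (cfg.authToken ?? "").trim().length > 0;
  // Refuse to start on a non-loopback address without an auth token.
  if (!isLoopback && !hasToken) {
    return { ok: false, reason: `Refusing to bind ${host} without an auth token.` };
  }
  return { ok: true, host };
}
```

This covers only the bind rule; the default authentication in FR-7.1 and the CSRF check in FR-7.4 are separate concerns.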
### §5a — Error contracts
Every error row includes a one-line "what to do" string, not just a status.
| Failure mode | Detection | Retry? | User message | Remediation |
|---|---|---|---|---|
| Relay unreachable | network error / connect refused | transient (backoff) | "Can't reach the relay at <url>." | Check relay container is up; check network. |
| Relay 401 | HTTP 401 | **no** | "Relay rejected the API key." | Re-create `RELAY_API_KEY` (operator gate 2). |
| clientId invalid/empty | 404 "Invalid client ID" | **no** | "The relay clientId is wrong or empty." | Pick a valid clientId from the client list (FR-6.3). |
| No connected Foundry client | 404 "No connected Foundry clients found" | **no** | "Foundry isn't connected to the relay." | Start the headless session (operator gate 3); re-check runs automatically. |
| Session idle-reaped | was connected, now 404 | transient (re-check) | "The Foundry session dropped." | Restart the headless session; dashboard re-checks. |
| Hash mismatch (both diverged) | F-hash ≠ ccHash AND O-hash ≠ contentHash | n/a (route to F3) | "Both sides changed since last sync — conflict." | Open the conflict row (FR-3.2). |
| `/get` 404 on a known uuid | 404 for a previously-linked uuid | **no** | "Foundry entry <name> was deleted on the Foundry side." | Re-link or remove the orphaned note. |
| Persistent 5xx after backoff | 5xx after retries exhausted | **no** (already retried) | "The relay keeps erroring on this note." | See diagnostics (FR-5.9); contact support. |
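
The retry column of the table, together with FR-5.3, could be encoded as a classifier plus a capped exponential backoff schedule. This is a sketch: `RelayFailure` is a hypothetical shape, the message matching relies on the two 404 strings quoted in the table, and the backoff parameters are placeholders rather than tuned values.

```typescript
type RetryClass = "transient" | "persistent";

// Hypothetical normalized failure; not the relay client's real error type.
interface RelayFailure {
  status?: number;
  message?: string;
  networkError?: boolean; // connect refused, DNS failure, etc.
  wasConnected?: boolean; // the session was connected on an earlier check
}

function classify(f: RelayFailure): RetryClass {
  if (f.networkError) return "transient";
  const status = f.status ?? 0;
  if (status === 408 || status === 504 || status >= 500) return "transient";
  const msg = f.message ?? "";
  // A previously connected session that now 404s is treated as idle-reaped.
  if (status === 404 && f.wasConnected === true && msg.includes("No connected Foundry clients")) {
    return "transient";
  }
  return "persistent"; // 401, invalid clientId, deleted entry: surface immediately
}

// Capped exponential backoff: base, 2*base, 4*base, ... never above capMs.
function backoffDelays(attempts: number, baseMs: number, capMs: number): number[] {
  const delays: number[] = [];
  for (let i = 0; i < attempts; i++) {
    delays.push(Math.min(capMs, baseMs * 2 ** i));
  }
  return delays;
}
```

For example, `backoffDelays(4, 500, 3000)` yields `[500, 1000, 2000, 3000]`. Adding jitter on top would be a reasonable extension.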
## 5. Non-Functional Requirements
- **NFR-1 — No-clobber safety.** No auto-sync operation may overwrite a side
that has changed since the last sync. Both-diverged → conflict, never
auto-overwrite. (The hard requirement; the current O→F code violates it.)
auto-overwrite. TOCTOU window closed by FR-1.10 (post-push re-verify). The
current O→F code violates this (pushes on Obsidian-body-diff only) — must be
fixed before/within delivery A.
- **NFR-2 — Fail-safe.** If the relay cannot read the Foundry side, the
operation is skipped and surfaced — never a blind push on Obsidian-side-only
evidence.
- **NFR-3 — Performance.** Debounce + bounded concurrency handle a ~50-note
prep burst without dropped events or relay thrash; F→O poll cadence does not
overload the relay or the host.
operation is skipped and surfaced (error-contracts table) — never a blind
push on Obsidian-side-only evidence.
- **NFR-3 — Performance.** Debounce (700ms) + bounded concurrency (3) handle a
~50-note prep burst without dropped events or relay thrash. **Operating
envelope:** validated against a vault of ≥N notes and ≥M JournalEntries
(N/M chosen above the author's own size — OQ-3). Default cadences: shallow
poll seconds to tens of seconds, deep poll minutes. Relay-load ceiling = a
documented max concurrent `/update` + `/get` budget; deep-poll concurrency
capped via `mapPool`.
- **NFR-4 — Reliability.** Transient relay errors retried with backoff;
persistent errors surfaced within one tick.
persistent errors surfaced within one tick (no retry on persistent — FR-5.3).
- **NFR-5 — Observability.** Every operation is visible in the dashboard
activity panel and the vault status note; no silent skips or silent
overwrites.
- **NFR-6 — Onboardability.** A non-author user can get from clone → live sync
using the dashboard + `/setup`, without editing shell commands (per the
UI-only convention).
- **NFR-7 — Configurability.** Poll cadence, debounce, concurrency, and
status-note path are env/config-driven with safe defaults.
overwrites. Status-note writes never produce a sync op. Operation history is
**persistent across restarts** (FR-5.8).
- **NFR-6 — Onboardability (honest).** Given the operator prerequisites (§2),
a non-author DM can reach a connected live sync using the dashboard with no
shell command beyond those prerequisites — the dashboard detects and guides
every unmet gate.
- **NFR-7 — Configurability.** Poll cadences, debounce, concurrency, status-note
path, backup retention, and auth token are env/config-driven with safe
defaults.
- **NFR-8 — Backward compatibility.** Existing manual buttons, seed/sync/
rePull/import rows, dev/apply modes, and the CLI-only full LevelDB index all
keep working unchanged.
keep working unchanged. Auto-sync is newly gated to apply mode (FR-1.9).
- **NFR-9 — Security.** No unauthenticated mutation path; no secret egress to
the client; default bind localhost; TLS recommended when bound beyond
localhost.
- **NFR-10 — Data integrity.** Foundry-side overwrites are always preceded by a
local backup of the pre-push Foundry state (FR-5.6); the dashboard exposes a
restore path (FR-5.7). This complements, not replaces, Foundry world backups.
- **NFR-11 — Upgrades.** The `foundry:` block carries a `schema_version`; a
version bump that changes hashing or identity fields ships with an idempotent
migration (re-hash, re-baseline) run in a single pass at startup before
auto-sync is allowed to engage, with a dashboard migration banner. (Old notes
lacking `ccHash` would otherwise route to conflict on first sync after
upgrade.)
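
A minimal sketch of how the NFR-1 / NFR-2 guard could be expressed as one pure decision function, assuming the four hashes from the error-contracts table are already computed. The names `SyncHashes`, `Decision`, and `classify` are hypothetical and do not come from the codebase.

```typescript
// Hypothetical names: Decision, SyncHashes and classify are illustrative only.
type Decision = "noop" | "push-vault" | "pull-foundry" | "conflict" | "skip";

interface SyncHashes {
  obsidianHash: string;       // hash of the current vault body (O-hash)
  contentHash: string;        // stored O-side baseline
  foundryHash: string | null; // hash of the current Foundry side (F-hash); null = relay could not read it
  ccHash: string;             // stored F-side baseline
}

function classify(h: SyncHashes): Decision {
  // NFR-2: without a Foundry read there is no decision; never push on vault-only evidence.
  if (h.foundryHash === null) return "skip";

  const vaultChanged = h.obsidianHash !== h.contentHash;
  const foundryChanged = h.foundryHash !== h.ccHash;

  // NFR-1: both sides diverged, so route to the conflict row and never auto-overwrite.
  if (vaultChanged && foundryChanged) return "conflict";
  if (vaultChanged) return "push-vault";
  if (foundryChanged) return "pull-foundry";
  return "noop";
}
```

Keeping the decision free of I/O means the no-clobber rule can be unit-tested without a relay; the FR-1.10 post-push re-verify would still be needed around the actual write.
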
## 6. Open Questions
- **OQ-1** Conflict row quick-actions: beyond "push vault / pull Foundry / mark
resolved," do we also offer a one-click "newest mtime wins" convenience, or
keep every conflict fully manual? (Default posture is manual; this is a
convenience question.)
- **OQ-1** Beyond the three explicit conflict actions, do we also offer a
one-click "newest mtime wins" convenience? Default posture = manual only.
- **OQ-2** Where the "not syncing" indicator lives during run-the-match —
**deferred**. Foundry UI is not a surface we control; dashboard + vault status
note cover the legible case for now. Reopens when a custom Foundry module is
explored.
- **OQ-3** F→O poll cadence default (seconds) — pick during delivery B with
relay-load testing.
- **OQ-4** `Sync Status.md` path/name and its exclusion from both the O→F
watcher and the F→O pull — confirm during delivery.
- **OQ-5** New (cc-only) Foundry entries created during prep: stay as import
candidates (FR-2.5) or auto-import? Current spec: candidates only.
- **OQ-6** Conflict-row diff format — full unified diff vs. a condensed
summary — design during delivery.
**deferred**. Reopens when a custom Foundry module is explored.
- **OQ-3** Concrete cadence numbers (shallow poll, deep poll) and the NFR-3
operating-envelope N/M — pick during delivery B with relay-load testing.
- **OQ-4** **Resolved** — status note lives at `${VAULT}/.sync-status.md` with
a `foundry.sync_status: true` sentinel; exclusion by both path and sentinel
(FR-4.3/4.6).
- **OQ-5** New Foundry entries: candidates only with a one-click import + plain
explanation (FR-2.5) — confirmed; no auto-import.
- **OQ-6** **Resolved** — conflict diff is side-by-side + plain-language
summary; "mark resolved" renamed to "Accept both as-is (keep divergence)"
with a confirmation (FR-3.6/3.7/3.8).
- **OQ-7** `foundry.ccHash` field naming (`ccHash` vs `cc_content_hash` vs
`foundryHash`) — cosmetic, decide during delivery for grep-ability/
consistency with `cc_uuid`/`cc_type`.
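
The OQ-4 resolution (exclude the status note by both path and sentinel) could be checked with a predicate like the one below. This is a sketch under assumptions: `NoteMeta`, `STATUS_NOTE_PATH`, and `isStatusNote` are hypothetical names, and the path is assumed to be vault-relative with forward slashes.

```typescript
// Hypothetical shapes; the real watcher and poller types may differ.
interface NoteMeta {
  relPath: string; // vault-relative path, forward slashes (assumption)
  frontmatter: { foundry?: { sync_status?: boolean } };
}

// From OQ-4: the status note lives at ${VAULT}/.sync-status.md.
const STATUS_NOTE_PATH = ".sync-status.md";

function isStatusNote(n: NoteMeta): boolean {
  // Either signal is enough: the sentinel still excludes the note if it is
  // moved, and the path still excludes it if the frontmatter is stripped.
  return (
    n.relPath === STATUS_NOTE_PATH ||
    n.frontmatter.foundry?.sync_status === true
  );
}
```

Both the O→F watcher and the F→O pull would call this before enqueueing an op, so that status-note writes never produce a sync op.
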
## 7. Out of Scope / Future
Mirrors §3 Non-Goals, plus:
- **Custom Foundry module** — a module that gives us what the relay does plus
in-Foundry indicators (sync status, parity). Future exploration; would reopen
OQ-2.
- **In-UI first-run onboarding wizard** performing the operator prerequisites
(relay bring-up, API-key acquisition, headless-session launch, rest-api
wiring) — future PRD; this PRD honestly scopes NFR-6 to operator-wired prereqs.
- **Custom Foundry module** — indicators inside Foundry UI + relay replacement;
would reopen OQ-2.
- **Syncing during run-the-match** — sync stays off by design in the live
session; legibility of the off state is in scope, automation during the
session is not.
session.
- **Automatic semantic / 3-way content merge** — both-diverged is manual
resolution via buttons; no auto content-merge engine.
- **Auto-sync of unlinked or unseeded notes** — seed/link first stays manual.
- **Full LevelDB / docker-stop index in the dashboard** — stays CLI-only.
- **Doc task:** update README's "Foundry is the source of truth" wording to
match the demoted source-of-truth model (§1) before launch — tracked here to
avoid spec/onboarding drift.
## 8. Success Metrics
- **SM-1** Zero auto-overwrite events on a side that changed since last sync,
across a 50-note prep burst. **Counter-metric:** conflict-rows-surfaced > 0
whenever both sides diverged (the guard is working).
- **SM-2** One verified end-to-end live O→F push (delivery A acceptance gate).
- **SM-3** F→O detects a Foundry rename within ≤2 shallow poll ticks; a content
change within ≤1 deep poll tick.
- **SM-4** Status-note writes never produce a sync op (FR-4.3/NFR-5 test).
- **SM-5** A non-author DM, given the operator prerequisites, clones the repo
and reaches a connected live sync via the dashboard with no shell command
beyond those prerequisites.
- **SM-6** Zero unauthenticated mutation paths (security gate; NFR-9).
## 9. Glossary
- **cc_uuid** — Foundry Campaign Codex UUID stored in a note's `foundry:` block;
its presence means the note is **linked**.
- **contentHash** — SHA-256 of the canonicalized Obsidian note **body** (the
O-side baseline; `src/normalize.ts`).
- **ccHash** — *(new)* SHA-256 of the canonicalized Foundry-side representation
(HTML→markdown + name + folder; the F-side baseline). See FR-1.4.
- **linked note** — a refined note with a `foundry.cc_uuid`.
- **seeded** — a linked note that also carries a `foundry.contentHash`
baseline (auto-sync prerequisites: linked + seeded).
- **refined vault dir** — the Obsidian vault subdirectory holding curated notes
the tool syncs.
- **refined markdown** — the curated Obsidian markdown format the tool
converts to/from Foundry's campaign-codex HTML.
- **import candidate** — a Foundry JournalEntry not yet present in the vault;
surfaced for one-click import (FR-2.5).
- **cc-only entry** — a Foundry entry with no matching refined note (≡ import
candidate from the Foundry side).
- **dev / apply mode** — dev = preview into `--out` mirror; apply = writes the
real vault + live Foundry. Auto-sync is apply-only (FR-1.9).
- **activity panel** — the dashboard feed of auto-sync events (FR-5.4).
- **parity** — vault and Foundry agree for a note (both-side hashes match
baselines).
- **tick** — one evaluation of a sync op (O→F on save, or F→O on poll).
- **shallow poll / deep poll** — F→O's two layers (FR-2.1 / FR-2.2).
- **conflict row** — the dashboard UI for a both-diverged note (F3).
- **operator prerequisites** — the five infrastructure gates the operator
wires once before the dashboard can reach live sync (§2).
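
To make the **contentHash** entry concrete, here is a hedged sketch of a SHA-256 over a canonicalized note body. The canonicalization rules shown (line-ending normalization, trailing-whitespace stripping, outer trim) are assumptions for illustration; the actual rules live in `src/normalize.ts` and may differ. `canonicalize` and `contentHashOf` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Assumed canonicalization, not the project's real rules (see src/normalize.ts).
function canonicalize(body: string): string {
  return body
    .replace(/\r\n?/g, "\n")                     // CRLF / CR -> LF
    .split("\n")
    .map((line) => line.replace(/[ \t]+$/, "")) // drop trailing spaces and tabs
    .join("\n")
    .trim();                                     // ignore leading/trailing blank lines
}

// Hex-encoded SHA-256 of the canonical body.
function contentHashOf(body: string): string {
  return createHash("sha256").update(canonicalize(body), "utf8").digest("hex");
}
```

Hashing the canonical form rather than the raw bytes keeps cosmetic differences (editor line endings, trailing spaces) from registering as a content change on either side.
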


@@ -455,10 +455,13 @@ async function select(name){
? `<pre style="max-height:300px;overflow:auto">${esc(foundryBody)}</pre>`
: '<p class="meta" style="color:var(--bad)">Foundry export missing</p>';
const actions = (vaultBody != null && foundryBody != null)
? `<div style="display:flex;gap:8px;margin-top:8px">
<button disabled title="coming next">Push vault → Foundry</button>
<button disabled title="coming next">Pull Foundry → vault</button>
<button class="danger" disabled title="coming next">Accept both as-is (keep divergence)</button>
? `<div style="display:flex;gap:8px;margin-top:8px;flex-wrap:wrap">
<button onclick="resolveConflict('${attr(name)}','push-vault')"
title="Writes vault body → live Foundry entry (relay /update); re-baselines foundry.contentHash + syncedAt. Foundry side content is overwritten.">Push vault → Foundry</button>
<button onclick="resolveConflict('${attr(name)}','pull-foundry')"
title="Writes Foundry body → vault note (re-pull); re-baselines foundry.contentHash. Vault side body is overwritten (curation preserved).">Pull Foundry → vault</button>
<button class="danger" onclick="resolveConflict('${attr(name)}','keep-divergence')"
title="No content is transferred. Re-baselines foundry.contentHash to the current vault body hash. Both sides keep their own text.">Accept both as-is (keep divergence)</button>
</div>`
: '<p class="meta">One side is missing — no resolution actions available.</p>';
parts.push(`<div class="panel" style="border:1px solid var(--bad);border-radius:6px;padding:10px">
@@ -632,6 +635,19 @@ async function refreshLiveNewEntries() {
entries.map(e => `<div style="display:flex;gap:8px;align-items:center;padding:4px 0;border-bottom:1px solid var(--line)"><span style="flex:1">${esc(e.name)}</span><button class="rec" onclick="importLiveEntry('${attr(e.uuid)}','${attr(e.name)}')">Import as new refined note</button></div>`).join('');
}
}
// E3.2: resolve a conflict with one of three actions. Confirm before commit.
async function resolveConflict(name, action) {
const previews = {
'push-vault': 'Writes vault body → live Foundry entry (relay /update); re-baselines foundry.contentHash + syncedAt. Foundry side content is OVERWRITTEN.',
'pull-foundry': 'Writes Foundry body → vault note (re-pull); re-baselines foundry.contentHash. Vault side body is OVERWRITTEN (curation preserved).',
'keep-divergence': 'No content is transferred. Re-baselines foundry.contentHash to the current vault body hash. Both sides keep their own text — the divergence is acknowledged.',
};
if (!confirm(`${previews[action] || action}\n\nProceed?`)) return;
toast(`resolving conflict: ${action}`);
const r = await apiFetch('/api/conflict/resolve', { method: 'POST', headers: { 'content-type': 'application/json' }, body: JSON.stringify({ name, action }) }).then(r => r.json()).catch(() => null);
if (r && r.ok) { toast(`conflict resolved: ${action}`); refreshIndex(); }
else toast(`resolve failed: ${r?.error || 'unknown'}`);
}
// E2.6: catch-up-now — forces an immediate shallow + deep sweep.
async function catchUpNow() {
toast('catching up…');


@@ -1812,6 +1812,83 @@ export async function startServer(cfg: ServerConfig): Promise<{ server: Server;
}
},
},
// E3.2: conflict resolution — three actions (push-vault / pull-foundry /
// keep-divergence). Gated by conflictUx (501 when off). Per-uuid lock (409
// if held). push-vault needs the relay; the other two don't touch it.
"POST /api/conflict/resolve": {
method: "POST", requireAuth: true, requireCSRF: true,
handler: async (_s, req, res) => {
if (!state.cfg.features?.conflictUx) return send(res, 501, { error: "conflict UX disabled" });
const body = await readJsonBody(req);
if (body === null) return send(res, 400, { error: "bad json" });
const name = String(body.name ?? "");
const action = String(body.action ?? "");
if (!name || !action) return send(res, 400, { error: "name + action required" });
if (!["push-vault", "pull-foundry", "keep-divergence"].includes(action)) return send(res, 400, { error: "action must be push-vault | pull-foundry | keep-divergence" });
// Find the row in the index.
const row = state.index?.matched.find((r) => r.name === name || r.basename === name);
if (!row || !row.refinedPath) return send(res, 404, { error: `no matched row for ${name}` });
const relPath = relative(state.cfg.refinedDir, row.refinedPath);
const uuid = row.entry ? `JournalEntry.${row.entry._id}` : undefined;
// push-vault requires the relay.
if (action === "push-vault" && !state.cfg.relayCfg) return send(res, 400, { error: "relay not configured — cannot push to Foundry" });
// Acquire the per-uuid lock (or a relPath fallback if no uuid).
const lockKey = uuid ?? relPathLockKey(relPath);
const got = state.autosync.lock.acquire(lockKey, "pull"); // "pull" = a reconciliation op
if (!got.acquired) return send(res, 409, { error: "locked by another operation" });
try {
const abs = await resolveRefined(state, relPath);
if (action === "keep-divergence") {
// Only re-baseline — NO pushNote, NO relay. Both sides keep their text.
await baselineNote(state, relPath, abs);
// The cc side (cc_sync_hash) is deliberately NOT re-baselined here; the
// block below reads the cc file, if one exists, but writes nothing.
if (row.ccPath) {
const ccAbs = await resolveCc(state, relative(state.cfg.ccDir, row.ccPath));
try {
const ccMd = await readFile(ccAbs, "utf8");
const { body: ccBody } = splitFrontmatter(ccMd);
// cc_sync_hash lives in the cc.md frontmatter and is written by
// syncRow, not by this handler. The AC says "re-baselines
// foundry.contentHash + cc_sync_hash", but keep-divergence only
// acknowledges the divergence on the refined side (contentHash =
// current body hash); the cc side keeps the hash it already had.
// For now this is a no-op; a later change could reuse the existing
// cc hash injection from batch.ts.
} catch { /* cc file missing — skip */ }
}
state.autosync.log(name, "pushed", `conflict resolved: keep-divergence · baselined (content)`);
return send(res, 200, { ok: true, action, name });
}
if (action === "push-vault") {
// Push the vault body to Foundry, then baseline.
const relay = relayClient(state);
await pushNote({ notePath: abs, noteName: name, outDir: state.cfg.outDir, relay, foundryDataDir: state.cfg.foundryCfg?.dataDir ?? "", world: state.cfg.foundryCfg?.world ?? "", dryRun: false, log: () => {} });
await baselineNote(state, relPath, abs);
state.autosync.log(name, "pushed", `conflict resolved: push-vault → ${uuid} · baselined (content)`);
return send(res, 200, { ok: true, action, name });
}
if (action === "pull-foundry") {
// Pull the Foundry body into the vault, then baseline.
if (!row.entry) return send(res, 400, { error: "no Foundry entry for this row (unlinked)" });
const out = await rePullRow(row, state.cfg.refinedDir, state.db, new Date().toISOString(), abs);
if (!out) return send(res, 500, { error: "re-pull produced no output" });
await writeWithBackup(targetPath(state, "refined", relPath), out.content, state);
const writtenAbs = await resolveRefined(state, relPath);
await baselineNote(state, relPath, writtenAbs);
state.autosync.log(name, "pushed", `conflict resolved: pull-foundry ← ${uuid} · baselined (content)`);
return send(res, 200, { ok: true, action, name });
}
return send(res, 400, { error: "unreachable" });
} catch (e) {
state.autosync.log(name, "error", `conflict resolve ${action} failed: ${(e as Error).message}`);
return send(res, 500, { error: (e as Error).message });
} finally {
state.autosync.lock.release(lockKey, "pull");
}
},
},
// E2.5: import a live new entry as a refined note (one-click, never auto).
"POST /api/foundry-poll/import": {
method: "POST", requireAuth: true, requireCSRF: true,