feat(specs): velvet-auction exercises group tools; FU-9 playtest gate; docs drift fix

FU-12: velvet-auction.yaml now uses the group-encounter tools:
- minPlayers: 3 (lobby-gated party heist, matches PRD UJ-1)
- passiveReveals: Insight/15 (notices Karr's tell; Feature B)
- group_stealth skillChecks entry (group Stealth, successRule: majority,
  durationSeconds: 60) + skill_check_group_emit and character_status added
  to the tools list.
- specsToolsConsistency: emptied the NOT_YET_REFERENCED allowlist
  (skill_check_group_emit + character_status are now referenced); all 8
  registered tools are reachable from specs.

Validated: specLoader + specsToolsConsistency + full unit suite (527) pass.

FU-9: docs/release-playtest-checklist.md: the 7-step manual pre-release
multi-player playtest checklist is now checked into the repo as a release gate
(it was previously buried in the arch doc). Includes pass criteria (no orphaned
thread / lost roll / raw-JSON leak) + the NFR-3/NFR-4 latency checklist.

docs/project-overview.md drift fix: pino -> src/lib/logger.ts (custom
plaintext, ADR-002); primary LLM -> minimax-m3 via LiteLLM (LITELLM_MODEL);
test-file count 22 -> 58; lib/ description; relabel dynamic goal registration
as delivered.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-06-22 16:02:00 +00:00
parent c549aaa49f
commit 37a1a3d421
4 changed files with 110 additions and 9 deletions
--- a/docs/project-overview.md
+++ b/docs/project-overview.md
@@ -4,7 +4,7 @@

 ## What it is

-A Discord bot that runs structured D&D encounters. Each Discord thread is an encounter session. The bot loads a YAML spec, narrates the scene via an LLM (Gemma 4 IT e2b through LiteLLM with Ollama fallback), voices NPCs with stable personas, runs skill checks via Discord embeds, and persists NPC memory + encounter history into a graph database through GraphMCP (JSON-RPC over HTTP). Optional Foundry VTT integration pulls live character stats and awards XP via an external relay.
+A Discord bot that runs structured D&D encounters. Each Discord thread is an encounter session. The bot loads a YAML spec, narrates the scene via an LLM (minimax-m3 through LiteLLM, with Ollama as fallback), voices NPCs with stable personas, runs skill checks via Discord embeds, and persists NPC memory + encounter history into a graph database through GraphMCP (JSON-RPC over HTTP). Optional Foundry VTT integration pulls live character stats and awards XP via an external relay.

 ## Who it serves

@@ -16,14 +16,14 @@ Discord community members playing D&D 5e in the Land of Mardonar. The DM runs `/
 |---|---|
 | Runtime | Node.js 22 (ESM, TypeScript 5.8 strict) |
 | Discord | discord.js v14 |
-| LLM (primary) | LiteLLM proxy (env: `LITELLM_BASE_URL`) |
+| LLM (primary) | LiteLLM proxy — minimax-m3 (env: `LITELLM_BASE_URL`, `LITELLM_MODEL`) |
 | LLM (fallback) | Ollama (env: `OLLAMA_BASE_URL`) — `gemma4-it:e2b`, 128k context |
 | Session cache | Redis (ioredis), 12h TTL |
 | Graph DB | Neo4j (via GraphMCP JSON-RPC, not direct) |
 | Lore / NPC memory | GraphMCP HTTP JSON-RPC server |
 | Foundry VTT | External relay (optional, requires API key) |
 | Validation | Zod (env + encounter spec) |
-| Logging | pino + pino-pretty |
+| Logging | `src/lib/logger.ts` (custom plaintext — pino removed) |
 | Testing | Vitest 3 (unit + integration) |
 | Build | tsc → multi-stage Node 22 alpine Dockerfile |

@@ -62,12 +62,12 @@ src/
 ├── spec/          # YAML encounter loader + Zod schema
 ├── persona/       # persona.yaml loader
 ├── config.ts      # Zod env validation
-├── lib/           # logger
+├── lib/           # logger (custom plaintext), historyTrim, skillCheckMessages
 ├── scripts/       # deploy-commands (slash command registration)
 └── types/         # shared interfaces + CONTEXT_BUDGET
 ```

-Plus `specs/` (8 encounter YAML files), `tests/` (22 test files), `data/` (runtime tally + summaries), and `Docs/` (pre-existing project documentation, partially out of date).
+Plus `specs/` (8 encounter YAML files), `tests/` (58 test files), `data/` (runtime tally + summaries), and `Docs/` (pre-existing project documentation, partially out of date).

 ## Documentation

@@ -82,7 +82,7 @@ Plus `specs/` (8 encounter YAML files), `tests/` (22 test files), `data/` (runti
 ## Key features in the current codebase

 - **Per-encounter tool filtering.** Each spec declares which tool plugins are active.
-- **Dynamic goal registration** (the active PRD feature) — `tools/goalRegister.ts` lets the LLM add new goals mid-encounter.
+- **Dynamic goal registration** (delivered) — `tools/goalRegister.ts` lets the LLM add new goals mid-encounter.
 - **Three-pattern tool parser** — handles fenced `tool_call`, bare `tool_call` header, and fuzzy bare JSON, so even smaller models can drive tools.
 - **Self-spinning VTT relay** — when the relay is down, the bot handshakes via RSA-OAEP and launches a headless Foundry session on demand.
 - **Burst cap with drop notices** — if too many messages arrive before the last LLM response, the bot drops the excess and posts a tone-aware notice.
--- a/docs/release-playtest-checklist.md
+++ b/docs/release-playtest-checklist.md
@@ -0,0 +1,76 @@
+# Multi-Player Playtest Checklist (Release Gate)
+
+Manual pre-release checklist for **group-encounter** features (Features A–E + FR-43).
+Required before any release that touches the lobby, group checks, passive reveals,
+timed checks, or story-status surfaces.
+
+> **Why this exists.** Group checks have **unit + integration coverage only — no
+> live E2E tier**. The one-token constraint makes true multi-player live E2E
+> impossible without a synthetic-Interaction forge (integration, not live) or a
+> second bot token (violates the constraint). The deterministic core is fully
+> unit/integration covered; the Discord fan-out surface is shared with existing
+> single-player live ACs. **This manual checklist is the safety net for the
+> residual risk** (real Discord fan-out, gateway event ordering, ephemeral-in-thread
+> quirks, burst rate-limiting). Source: `_bmad-output/arch/arch-mardonar-encounter-engine-2026-06-20/architecture.md §8` (closes Murat #5).
+
+---
+
+## Pass criteria
+A pass = **no orphaned thread, no lost roll, no raw-JSON leak to players.**
+Any of those = fail the release and fix before shipping.
+
+## The 7 steps
+Run these against a real Discord guild with ≥3 test players and a group spec
+(e.g. `velvet-auction`, `minPlayers: 3`).
+
+1. **Lobby → start.** N players join the lobby; `Start` enables at `minPlayers`;
+   starter presses Start. Opening narrative posts; passive reveals fire for
+   qualifying players. Confirm the auto `[SESSION] entered` announcement is
+   **suppressed** for the group encounter.
+2. **Group check, all roll.** LLM emits a group skill check. Every targeted
+   player clicks **Roll**; each gets an ephemeral with their d20+mod vs DC; the
+   central scoreboard fills live and finalizes with a group SUCCESS/FAILURE +
+   `[SKILL CHECK RESULT]` system message. Confirm no double rolls, no lost rolls.
+3. **Timed group check.** A group check with `durationSeconds`. Watch the
+   countdown (10s increments), the hourglass GIF in the final stretch, and the
+   "final sands" text cue. Let one player roll early and one let it expire →
+   expiry finalizes correctly (unrolled = failure) without hanging the thread.
+4. **Latecomer joins a running encounter.** A non-joined player tries to post →
+   their message is auto-deleted (FR-28/29). They join via the persistent
+   **Join** button on the lobby embed **and** via `/encounter join`; their
+   messages are then accepted. Confirm they are **not** retro-added to an
+   in-flight group check's target set.
+5. **Non-joined message deleted + guided.** A non-joined member posts during the
+   lobby phase and during a running group encounter; the bot deletes it and
+   guides them to Join. Confirm **no false-positive deletions** of joined
+   players' messages, and that missing `Manage Messages` degrades safely (logs +
+   skips deletion, does not crash — NFR-7).
+6. **No-show.** A targeted player doesn't roll. Untimed: the no-show grace period
+   (~60s) passes → they count as a failure, check finalizes. Timed: timer
+   expires → timeout finalize. Either way the thread does not hang.
+7. **Bot restart mid-group-check.** With a group check in flight, restart the
+   bot. The boot sweep rehydrates `groupcheck:{threadId}` (FR-44) and the
+   `encounter:{threadId}:active` flag; in-flight checks rehydrate for remaining
+   players to finish, and any check whose deadline passed finalizes as a
+   timeout. Confirm no orphaned thread and no lost roll state.
+
+---
+
+## Latency checklist (NFR-3 / NFR-4)
+While playtesting, record observed p95 from the bot's perspective (non-LLM
+overhead, i.e. excluding the LLM generation wait):
+- **Single-roll narration path:** p95 ≤ **8s**.
+- **Group-resolution path:** p95 ≤ **15s**.
+
+Record the observed numbers in the release notes. A miss = a perf follow-up, not
+an automatic fail, but investigate before shipping if either is exceeded by >50%.
+
+---
+
+## Sign-off
+- [ ] All 7 steps run; pass criteria met (no orphaned thread / lost roll / raw-JSON leak).
+- [ ] Latency p95 recorded (single ≤8s, group ≤15s).
+- [ ] Tester: ______  Date: ______  Release/commit: ______
+
+> File issues for any failure. Do not ship a group-encounter release without a
+> completed checklist.
--- a/specs/velvet-auction.yaml
+++ b/specs/velvet-auction.yaml
@@ -2,6 +2,9 @@ encounterId: "mardonar-velvet-auction-006"
 title: "The Velvet Quill Auction"
 tone: "mysterious"

+# Group encounter (Feature D) — requires a party; the lobby gates until 3 join.
+minPlayers: 3
+
 setting:
  location: "Upper District — private lounge in the Velvet Quill parlor"
  mood: >
@@ -101,6 +104,24 @@ skillChecks:
  negotiate_vesper_note: >
    Offering Madame Vesper secrets or trade arrangements of greater value than Karr's gold.

+  group_stealth_dc: 15
+  group_stealth_skill: "Stealth"
+  group_stealth_note: >
+    A coordinated group Stealth check when the party moves on the artifact together
+    (e.g. during a staged distraction). Emit as a GROUP check via skill_check_group_emit
+    targeting all joined players, with successRule: majority and durationSeconds: 60.
+    Failure means the guards or the abjuration wards notice the coordinated movement.
+
+# Passive skill reveals (Feature B) — bot-applied at encounter start, group-visible,
+# attributed to the qualifying player. threshold is a passive DC (integer); revealText
+# is outcome prose only — no dice results (the engine owns rolls).
+passiveReveals:
+  - skill: "Insight"
+    threshold: 15
+    revealText: >
+      Notices a faint tremor in Karr's grip as he raises his bid paddle — his swagger
+      is a veneer, and something about this lot has him badly rattled.
+
 randomizable:
  - key: broker_name
    source: vocabulary
@@ -121,11 +142,13 @@ randomizable:

 tools:
  - skill_check_emit
+  - skill_check_group_emit
  - encounter_resolve
  - context_recall
  - goal_register
  - foundry_lookup
  - foundry_reward
+  - character_status

 dmNotes: >
  This is a social heist encounter. Direct combat is highly discouraged by the presence of abjuration wards
--- a/tests/unit/specsToolsConsistency.test.ts
+++ b/tests/unit/specsToolsConsistency.test.ts
@@ -78,9 +78,11 @@ describe('specs/*.yaml tool references', () => {

  it('every registered tool is referenced by at least one spec (sanity: the registry is reachable from the default active set)', () => {
    // Tools registered ahead of their spec are allowlisted here — remove the
-    // entry once a spec references the tool. skill_check_group_emit lands a
-    // group spec with the lobby (Story 9).
-    const NOT_YET_REFERENCED = new Set(['skill_check_group_emit', 'character_status']);
+    // entry once a spec references the tool. As of 2026-06-22, velvet-auction
+    // references skill_check_group_emit (group Stealth) and character_status,
+    // so the allowlist is empty. Re-add a name here only when a new tool is
+    // registered ahead of its landing spec.
+    const NOT_YET_REFERENCED = new Set<string>([]);
    const referenced = new Set<string>();
    for (const { raw } of specFiles) {
      if (Array.isArray(raw.tools)) {
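
Reviewer note: the hunk above shows only the top of the reachability test. As a minimal sketch of the check it appears to perform, assuming each spec exposes an optional `tools` array as in the diff, the logic could look like the following. The helper name `unreferencedTools`, the `RawSpec` type, and the sample data are hypothetical and are not taken from the repository; only `NOT_YET_REFERENCED` and the tool names come from the commit.

```typescript
// Hypothetical sketch of the "every registered tool is referenced" check.
// RawSpec and unreferencedTools are illustrative names, not repo code.
type RawSpec = { tools?: unknown };

function unreferencedTools(
  registered: readonly string[],
  specs: readonly RawSpec[],
  allowlist: ReadonlySet<string>,
): string[] {
  // Collect every tool name mentioned by any spec's tools list.
  const referenced = new Set<string>();
  for (const raw of specs) {
    if (Array.isArray(raw.tools)) {
      for (const name of raw.tools) {
        if (typeof name === 'string') referenced.add(name);
      }
    }
  }
  // A registered tool is a problem only if no spec references it
  // and it is not explicitly allowlisted as "registered ahead of its spec".
  return registered.filter(
    (name) => !referenced.has(name) && !allowlist.has(name),
  );
}

// Sample data (assumed): after this commit the allowlist is empty.
const NOT_YET_REFERENCED = new Set<string>();
const registered = ['skill_check_emit', 'skill_check_group_emit', 'character_status'];
const specs: RawSpec[] = [
  { tools: ['skill_check_emit', 'skill_check_group_emit', 'character_status'] },
  { tools: ['skill_check_emit'] },
];

const missing = unreferencedTools(registered, specs, NOT_YET_REFERENCED);
console.log(missing);
```

With an empty allowlist, any tool registered without a referencing spec shows up in the returned list, which matches the intent stated in the updated test comment: re-add a name to the allowlist only when a tool is registered ahead of its landing spec.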