FU-12: velvet-auction.yaml now uses the group-encounter tools:
- minPlayers: 3 (lobby-gated party heist, matches PRD UJ-1)
- passiveReveals: Insight/15 (notices Karr's tell, Feature B)
- group_stealth skillChecks entry (group Stealth, successRule: majority, durationSeconds: 60) + skill_check_group_emit and character_status added to the tools list.
- specsToolsConsistency: emptied the NOT_YET_REFERENCED allowlist (skill_check_group_emit + character_status are now referenced); all 8 registered tools are reachable from specs.

Validated: specLoader + specsToolsConsistency + full unit suite (527) pass.

FU-9: docs/release-playtest-checklist.md: the 7-step manual pre-release multi-player playtest checklist checked into the repo as a release gate (was buried only in the arch doc). Includes pass criteria (no orphaned thread / lost roll / raw-JSON leak) + the NFR-3/NFR-4 latency checklist.

docs/project-overview.md drift fix: pino -> src/lib/logger.ts (custom plaintext, ADR-002); primary LLM -> minimax-m3 via LiteLLM (LITELLM_MODEL); test count 22 -> 58; lib/ description; relabel dynamic goal registration as delivered.

Co-Authored-By: Claude <noreply@anthropic.com>
Multi-Player Playtest Checklist (Release Gate)
Manual pre-release checklist for group-encounter features (Features A–E + FR-43). Required before any release that touches the lobby, group checks, passive reveals, timed checks, or story-status surfaces.
Why this exists. Group checks have unit + integration coverage only — no live E2E tier. The one-token constraint makes true multi-player live E2E impossible without a synthetic-Interaction forge (integration, not live) or a second bot token (violates the constraint). The deterministic core is fully unit/integration covered; the Discord fan-out surface is shared with existing single-player live ACs. This manual checklist is the safety net for the residual risk (real Discord fan-out, gateway event ordering, ephemeral-in-thread quirks, burst rate-limiting). Source:
_bmad-output/arch/arch-mardonar-encounter-engine-2026-06-20/architecture.md §8 (closes Murat #5).
Pass criteria
A pass = no orphaned thread, no lost roll, no raw-JSON leak to players. Any of those = fail the release and fix before shipping.
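The raw-JSON criterion can be spot-checked against an exported thread transcript. The following is a minimal sketch of one possible heuristic, not part of the bot: `containsRawJson` is a hypothetical helper that flags any message whose outermost bracketed span parses as a JSON object or array. It will miss malformed or truncated payloads, so it supplements reading the thread rather than replacing it.

```typescript
// Hypothetical helper (not in the repo): heuristic detector for a raw-JSON
// leak in a player-visible message.
function containsRawJson(message: string): boolean {
  // Find the first opening brace or bracket.
  const start = message.search(/[{\[]/);
  if (start === -1) return false;

  // Pair it with the last matching closer in the message.
  const close = message[start] === "{" ? "}" : "]";
  const end = message.lastIndexOf(close);
  if (end <= start) return false;

  try {
    const parsed: unknown = JSON.parse(message.slice(start, end + 1));
    // Only objects and arrays count as a leaked payload.
    return typeof parsed === "object" && parsed !== null;
  } catch {
    // Narrative brackets such as "[SKILL CHECK RESULT]" do not parse.
    return false;
  }
}
```

Bracketed system tags and ordinary prose do not parse as JSON, so they are not flagged; a leaked tool payload embedded in narration is.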
The 7 steps
Run these against a real Discord guild with ≥3 test players and a group spec
(e.g. velvet-auction, minPlayers: 3).
- Lobby → start. N players join the lobby; Start enables at minPlayers; starter presses Start. Opening narrative posts; passive reveals fire for qualifying players. Confirm the auto [SESSION] entered announcement is suppressed for the group encounter.
- Group check, all roll. LLM emits a group skill check. Every targeted player clicks Roll; each gets an ephemeral with their d20+mod vs DC; the central scoreboard fills live and finalizes with a group SUCCESS/FAILURE + [SKILL CHECK RESULT] system message. Confirm no double rolls, no lost rolls.
- Timed group check. A group check with durationSeconds. Watch the countdown (10s increments), the hourglass GIF in the final stretch, and the "final sands" text cue. Let one player roll early and one let it expire → expiry finalizes correctly (unrolled = failure) without hanging the thread.
- Latecomer joins a running encounter. A non-joined player tries to post → their message is auto-deleted (FR-28/29). They join via the persistent Join button on the lobby embed and via /encounter join; their messages are then accepted. Confirm they are not retro-added to an in-flight group check's target set.
- Non-joined message deleted + guided. A non-joined member posts during the lobby phase and during a running group encounter; the bot deletes it and guides them to Join. Confirm no false-positive deletions of joined players' messages, and that missing Manage Messages degrades safely (logs + skips deletion, does not crash; NFR-7).
- No-show. A targeted player doesn't roll. Untimed: the no-show grace period (~60s) passes → they count as a failure, check finalizes. Timed: timer expires → timeout finalize. Either way the thread does not hang.
- Bot restart mid-group-check. With a group check in flight, restart the bot. The boot sweep rehydrates groupcheck:{threadId} (FR-44) and the encounter:{threadId}:active flag; in-flight checks rehydrate for remaining players to finish, and any check whose deadline passed finalizes as a timeout. Confirm no orphaned thread and no lost roll state.
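To make the expected outcomes in the timed, no-show, and restart steps easier to reason about while testing, here is a minimal sketch of the finalization rules as the steps describe them. Every name (`GroupCheck`, `finalize`, `onBoot`) is hypothetical and not taken from the codebase. It also makes three assumptions: "majority" means strictly more than half of the targets, a roll passes when it meets or beats the DC, and a single `deadlineMs` stands in for both the timer expiry and the end of the no-show grace period.

```typescript
// Hypothetical shapes for illustration only; the real persisted state may differ.
type Outcome = "SUCCESS" | "FAILURE";

interface GroupCheck {
  dc: number;
  targets: string[];          // player IDs fixed when the check is emitted
  rolls: Map<string, number>; // playerId -> d20 + modifier total
  deadlineMs: number;         // timer expiry, or end of the no-show grace period
}

type BootAction =
  | { kind: "resume"; waitingOn: string[] }
  | { kind: "finalize"; outcome: Outcome };

// Finalize a check: a target with no recorded roll counts as a failure.
function finalize(check: GroupCheck): Outcome {
  const passes = check.targets.filter((id) => {
    const total = check.rolls.get(id);
    return total !== undefined && total >= check.dc; // assumption: meet-or-beat
  }).length;
  // Assumption: majority = strictly more than half of the targets.
  return passes * 2 > check.targets.length ? "SUCCESS" : "FAILURE";
}

// Decide what a boot sweep should do with one rehydrated check.
function onBoot(check: GroupCheck, nowMs: number): BootAction {
  const waitingOn = check.targets.filter((id) => !check.rolls.has(id));
  const expired = nowMs >= check.deadlineMs;
  if (expired || waitingOn.length === 0) {
    return { kind: "finalize", outcome: finalize(check) };
  }
  return { kind: "resume", waitingOn };
}
```

Under these assumptions, a three-player check where two players pass and the third never rolls still finalizes as a group SUCCESS at the deadline, while one pass and two no-shows finalizes as FAILURE. Because latecomers are never added to `targets`, they cannot change either result.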
Latency checklist (NFR-3 / NFR-4)
While playtesting, record observed p95 from the bot's perspective (non-LLM overhead, i.e. excluding the LLM generation wait):
- Single-roll narration path: p95 ≤ 8s.
- Group-resolution path: p95 ≤ 15s.
Record the observed numbers in the release notes. A miss = a perf follow-up, not an automatic fail, but investigate before shipping if either is exceeded by >50%.
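To turn recorded timings into the numbers this section asks for, the sketch below computes a nearest-rank p95 and applies the two thresholds above. The function names and the verdict labels are hypothetical, and nearest-rank is one common percentile definition chosen here for simplicity; the source does not say which definition the NFRs intend.

```typescript
// Nearest-rank p95 over latency samples in milliseconds.
function p95(samplesMs: number[]): number {
  if (samplesMs.length === 0) throw new Error("no samples recorded");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  // 1-based rank; integer arithmetic avoids floating-point rounding surprises.
  const rank = Math.ceil((95 * sorted.length) / 100);
  return sorted[rank - 1];
}

// Hypothetical verdict labels mirroring the rule above.
type Verdict = "within-budget" | "follow-up" | "investigate-before-shipping";

function classify(observedMs: number, budgetMs: number): Verdict {
  if (observedMs <= budgetMs) return "within-budget";
  // A miss is a perf follow-up unless it exceeds the budget by more than 50%.
  return observedMs > budgetMs * 1.5 ? "investigate-before-shipping" : "follow-up";
}

// Budgets from this checklist: single-roll narration 8s, group resolution 15s.
const SINGLE_ROLL_BUDGET_MS = 8_000;
const GROUP_RESOLUTION_BUDGET_MS = 15_000;
```

For example, an observed single-roll p95 of 9,000 ms is a follow-up, while anything above 12,000 ms (8s plus 50%) warrants investigation before shipping. With only a handful of playtest samples, the p95 is close to the maximum, so it helps to note the sample count alongside the figure in the release notes.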
Sign-off
- All 7 steps run; pass criteria met (no orphaned thread / lost roll / raw-JSON leak).
- Latency p95 recorded (single ≤8s, group ≤15s).
- Tester: ______ Date: ______ Release/commit: ______
File issues for any failure. Do not ship a group-encounter release without a completed checklist.