zalbot/docs/release-playtest-checklist.md
Kaysser Kayyali 37a1a3d421 feat(specs): velvet-auction exercises group tools; FU-9 playtest gate; docs drift fix
FU-12 — velvet-auction.yaml now uses the group-encounter tools:
- minPlayers: 3 (lobby-gated party heist, matches PRD UJ-1)
- passiveReveals: Insight/15 (notices Karr's tell — Feature B)
- group_stealth skillChecks entry (group Stealth, successRule: majority,
  durationSeconds: 60) + skill_check_group_emit and character_status added
  to the tools list.
- specsToolsConsistency: emptied the NOT_YET_REFERENCED allowlist
  (skill_check_group_emit + character_status are now referenced); all 8
  registered tools are reachable from specs. Validated: specLoader +
  specsToolsConsistency + full unit suite (527) pass.

FU-9 — docs/release-playtest-checklist.md: the 7-step manual pre-release
  multi-player playtest checklist checked into the repo as a release gate
  (was buried only in the arch doc). Includes pass criteria (no orphaned
  thread / lost roll / raw-JSON leak) + the NFR-3/NFR-4 latency checklist.

docs/project-overview.md drift fix: pino -> src/lib/logger.ts (custom
  plaintext, ADR-002); primary LLM -> minimax-m3 via LiteLLM
  (LITELLM_MODEL); test count 22 -> 58; lib/ description; relabel dynamic
  goal registration as delivered.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-06-22 16:02:00 +00:00


Multi-Player Playtest Checklist (Release Gate)

Manual pre-release checklist for group-encounter features (Features A–E + FR-43). Required before any release that touches the lobby, group checks, passive reveals, timed checks, or story-status surfaces.

Why this exists. Group checks have unit + integration coverage only — no live E2E tier. The one-token constraint makes true multi-player live E2E impossible without a synthetic-Interaction forge (integration, not live) or a second bot token (violates the constraint). The deterministic core is fully unit/integration covered; the Discord fan-out surface is shared with existing single-player live ACs. This manual checklist is the safety net for the residual risk (real Discord fan-out, gateway event ordering, ephemeral-in-thread quirks, burst rate-limiting). Source: _bmad-output/arch/arch-mardonar-encounter-engine-2026-06-20/architecture.md §8 (closes Murat #5).


Pass criteria

A pass = no orphaned thread, no lost roll, no raw-JSON leak to players. Any of those = fail the release and fix before shipping.

The 7 steps

Run these against a real Discord guild with ≥3 test players and a group spec (e.g. velvet-auction, minPlayers: 3).

  1. Lobby → start. N players join the lobby; Start enables at minPlayers; starter presses Start. Opening narrative posts; passive reveals fire for qualifying players. Confirm the auto [SESSION] entered announcement is suppressed for the group encounter.
  2. Group check, all roll. LLM emits a group skill check. Every targeted player clicks Roll; each gets an ephemeral with their d20+mod vs DC; the central scoreboard fills live and finalizes with a group SUCCESS/FAILURE + [SKILL CHECK RESULT] system message. Confirm no double rolls, no lost rolls.
  3. Timed group check. Run a group check with durationSeconds set. Watch the countdown (10s increments), the hourglass GIF in the final stretch, and the "final sands" text cue. Have one player roll early and another let the timer expire; expiry should finalize correctly (unrolled = failure) without hanging the thread.
  4. Latecomer joins a running encounter. A non-joined player tries to post → their message is auto-deleted (FR-28/29). Test both join routes: the persistent Join button on the lobby embed and /encounter join. After joining, their messages are accepted. Confirm they are not retro-added to an in-flight group check's target set.
  5. Non-joined message deleted + guided. A non-joined member posts during the lobby phase and during a running group encounter; the bot deletes it and guides them to Join. Confirm no false-positive deletions of joined players' messages, and that missing Manage Messages degrades safely (logs + skips deletion, does not crash — NFR-7).
  6. No-show. A targeted player doesn't roll. Untimed: the no-show grace period (~60s) passes → they count as a failure, check finalizes. Timed: timer expires → timeout finalize. Either way the thread does not hang.
  7. Bot restart mid-group-check. With a group check in flight, restart the bot. The boot sweep rehydrates groupcheck:{threadId} (FR-44) and the encounter:{threadId}:active flag; in-flight checks rehydrate for remaining players to finish, and any check whose deadline passed finalizes as a timeout. Confirm no orphaned thread and no lost roll state.
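
The outcomes that steps 2, 3, 4, 6 and 7 check by hand can be stated as a small sketch. This is not zalbot's code: the type and function names are hypothetical. It also assumes that "majority" means strictly more than half of the targets, and that a roll succeeds when its total meets or exceeds the DC.

```typescript
// Hypothetical shapes; the real zalbot types are not shown in this document.
interface Roll {
  playerId: string;
  total: number; // d20 + modifier
}

interface GroupCheck {
  dc: number;
  targetIds: string[]; // fixed when the check is emitted (latecomers are not added)
  rolls: Roll[];
  deadlineMs: number | null; // null = untimed check
}

type Outcome = "SUCCESS" | "FAILURE";

// Majority rule (assumed strict: more than half of the targets must succeed).
// A target with no recorded roll counts as a failure (steps 3 and 6).
function resolveGroupCheck(check: GroupCheck): Outcome {
  const firstRoll = new Map<string, number>();
  for (const roll of check.rolls) {
    // Keep only the first roll per player so a double click cannot count twice.
    if (!firstRoll.has(roll.playerId)) firstRoll.set(roll.playerId, roll.total);
  }
  let successes = 0;
  // Iterating over targetIds ignores rolls from players outside the target set.
  for (const id of check.targetIds) {
    const total = firstRoll.get(id);
    if (total !== undefined && total >= check.dc) successes += 1;
  }
  return successes * 2 > check.targetIds.length ? "SUCCESS" : "FAILURE";
}

type SweepAction = "rehydrate" | "finalize-timeout";

// Boot-sweep decision (step 7): a check whose deadline has passed is finalized
// as a timeout; anything else is rehydrated so remaining players can finish.
// The untimed no-show grace period is not modeled here.
function sweepAction(check: GroupCheck, nowMs: number): SweepAction {
  if (check.deadlineMs !== null && nowMs >= check.deadlineMs) {
    return "finalize-timeout";
  }
  return "rehydrate";
}
```

During the playtest, a finalized scoreboard that disagrees with this rule for the rolls you observed (for example, a group SUCCESS with only one of three players meeting the DC) is worth filing, even if the thread did not hang.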

Latency checklist (NFR-3 / NFR-4)

While playtesting, record observed p95 from the bot's perspective (non-LLM overhead, i.e. excluding the LLM generation wait):

  • Single-roll narration path: p95 ≤ 8s.
  • Group-resolution path: p95 ≤ 15s.

Record the observed numbers in the release notes. A miss = a perf follow-up, not an automatic fail, but investigate before shipping if either is exceeded by >50%.
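
For recording the numbers, a minimal sketch of the arithmetic follows. It assumes latency samples are collected by hand in milliseconds and uses the nearest-rank method for p95. The checklist does not prescribe a percentile method, and the function names are hypothetical.

```typescript
// Nearest-rank p95 over latency samples in milliseconds (assumed method).
function p95(samplesMs: number[]): number {
  if (samplesMs.length === 0) throw new Error("no samples recorded");
  const sorted = [...samplesMs].sort((a, b) => a - b);
  // 1-based nearest rank, computed with integers to avoid float rounding.
  const rank = Math.ceil((95 * sorted.length) / 100);
  return sorted[rank - 1];
}

type LatencyVerdict = "ok" | "follow-up" | "investigate";

// Within budget: ok. Over budget: perf follow-up.
// Over budget by more than 50%: investigate before shipping.
function judgeLatency(p95Ms: number, budgetMs: number): LatencyVerdict {
  if (p95Ms <= budgetMs) return "ok";
  return p95Ms > budgetMs * 1.5 ? "investigate" : "follow-up";
}

// Budgets from the checklist above, in milliseconds.
const SINGLE_ROLL_BUDGET_MS = 8_000;
const GROUP_RESOLUTION_BUDGET_MS = 15_000;
```

With the 8s budget, the "investigate" threshold is anything above 12s; with the 15s budget, anything above 22.5s.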


Sign-off

  • All 7 steps run; pass criteria met (no orphaned thread / lost roll / raw-JSON leak).
  • Latency p95 recorded (single ≤8s, group ≤15s).
  • Tester: ______ Date: ______ Release/commit: ______

File issues for any failure. Do not ship a group-encounter release without a completed checklist.