feat(specs): velvet-auction exercises group tools; FU-9 playtest gate; docs drift fix

FU-12 — velvet-auction.yaml now uses the group-encounter tools:
- minPlayers: 3 (lobby-gated party heist, matches PRD UJ-1)
- passiveReveals: Insight/15 (notices Karr's tell — Feature B)
- group_stealth skillChecks entry (group Stealth, successRule: majority,
  durationSeconds: 60) + skill_check_group_emit and character_status added
  to the tools list.
- specsToolsConsistency: emptied the NOT_YET_REFERENCED allowlist
  (skill_check_group_emit + character_status are now referenced); all 8
  registered tools are reachable from specs. Validated: specLoader +
  specsToolsConsistency + full unit suite (527) pass.

FU-9 — docs/release-playtest-checklist.md: the 7-step manual pre-release
  multi-player playtest checklist checked into the repo as a release gate
  (was buried only in the arch doc). Includes pass criteria (no orphaned
  thread / lost roll / raw-JSON leak) + the NFR-3/NFR-4 latency checklist.

docs/project-overview.md drift fix: pino -> src/lib/logger.ts (custom
  plaintext, ADR-002); primary LLM -> minimax-m3 via LiteLLM
  (LITELLM_MODEL); test file count 22 -> 58; expanded lib/ description;
  relabel dynamic goal registration as delivered.

Co-Authored-By: Claude <noreply@anthropic.com>
Kaysser Kayyali
2026-06-22 16:02:00 +00:00
parent c549aaa49f
commit 37a1a3d421
4 changed files with 110 additions and 9 deletions


@@ -4,7 +4,7 @@
## What it is
-A Discord bot that runs structured D&D encounters. Each Discord thread is an encounter session. The bot loads a YAML spec, narrates the scene via an LLM (Gemma 4 IT e2b through LiteLLM with Ollama fallback), voices NPCs with stable personas, runs skill checks via Discord embeds, and persists NPC memory + encounter history into a graph database through GraphMCP (JSON-RPC over HTTP). Optional Foundry VTT integration pulls live character stats and awards XP via an external relay.
+A Discord bot that runs structured D&D encounters. Each Discord thread is an encounter session. The bot loads a YAML spec, narrates the scene via an LLM (minimax-m3 through LiteLLM, with Ollama as fallback), voices NPCs with stable personas, runs skill checks via Discord embeds, and persists NPC memory + encounter history into a graph database through GraphMCP (JSON-RPC over HTTP). Optional Foundry VTT integration pulls live character stats and awards XP via an external relay.
## Who it serves
@@ -16,14 +16,14 @@ Discord community members playing D&D 5e in the Land of Mardonar. The DM runs `/
|---|---|
| Runtime | Node.js 22 (ESM, TypeScript 5.8 strict) |
| Discord | discord.js v14 |
-| LLM (primary) | LiteLLM proxy (env: `LITELLM_BASE_URL`) |
+| LLM (primary) | LiteLLM proxy — minimax-m3 (env: `LITELLM_BASE_URL`, `LITELLM_MODEL`) |
| LLM (fallback) | Ollama (env: `OLLAMA_BASE_URL`) — `gemma4-it:e2b`, 128k context |
| Session cache | Redis (ioredis), 12h TTL |
| Graph DB | Neo4j (via GraphMCP JSON-RPC, not direct) |
| Lore / NPC memory | GraphMCP HTTP JSON-RPC server |
| Foundry VTT | External relay (optional, requires API key) |
| Validation | Zod (env + encounter spec) |
-| Logging | pino + pino-pretty |
+| Logging | `src/lib/logger.ts` (custom plaintext — pino removed) |
| Testing | Vitest 3 (unit + integration) |
| Build | tsc → multi-stage Node 22 alpine Dockerfile |
@@ -62,12 +62,12 @@ src/
├── spec/ # YAML encounter loader + Zod schema
├── persona/ # persona.yaml loader
├── config.ts # Zod env validation
-├── lib/ # logger
+├── lib/ # logger (custom plaintext), historyTrim, skillCheckMessages
├── scripts/ # deploy-commands (slash command registration)
└── types/ # shared interfaces + CONTEXT_BUDGET
```
-Plus `specs/` (8 encounter YAML files), `tests/` (22 test files), `data/` (runtime tally + summaries), and `Docs/` (pre-existing project documentation, partially out of date).
+Plus `specs/` (8 encounter YAML files), `tests/` (58 test files), `data/` (runtime tally + summaries), and `Docs/` (pre-existing project documentation, partially out of date).
## Documentation
@@ -82,7 +82,7 @@ Plus `specs/` (8 encounter YAML files), `tests/` (22 test files), `data/` (runti
## Key features in the current codebase
- **Per-encounter tool filtering.** Each spec declares which tool plugins are active.
-- **Dynamic goal registration** (the active PRD feature) — `tools/goalRegister.ts` lets the LLM add new goals mid-encounter.
+- **Dynamic goal registration** (delivered) — `tools/goalRegister.ts` lets the LLM add new goals mid-encounter.
- **Three-pattern tool parser** — handles fenced `tool_call`, bare `tool_call` header, and fuzzy bare JSON, so even smaller models can drive tools.
- **Self-spinning VTT relay** — when the relay is down, the bot handshakes via RSA-OAEP and launches a headless Foundry session on demand.
- **Burst cap with drop notices** — if too many messages arrive before the last LLM response, the bot drops the excess and posts a tone-aware notice.


@@ -0,0 +1,76 @@
# Multi-Player Playtest Checklist (Release Gate)
Manual pre-release checklist for **group-encounter** features (Features A-E + FR-43).
Required before any release that touches the lobby, group checks, passive reveals,
timed checks, or story-status surfaces.
> **Why this exists.** Group checks have **unit + integration coverage only — no
> live E2E tier**. The one-token constraint makes true multi-player live E2E
> impossible without a synthetic-Interaction forge (integration, not live) or a
> second bot token (violates the constraint). The deterministic core is fully
> unit/integration covered; the Discord fan-out surface is shared with existing
> single-player live ACs. **This manual checklist is the safety net for the
> residual risk** (real Discord fan-out, gateway event ordering, ephemeral-in-thread
> quirks, burst rate-limiting). Source: `_bmad-output/arch/arch-mardonar-encounter-engine-2026-06-20/architecture.md §8` (closes Murat #5).
---
## Pass criteria
A pass = **no orphaned thread, no lost roll, no raw-JSON leak to players.**
Any of those = fail the release and fix before shipping.
## The 7 steps
Run these against a real Discord guild with ≥3 test players and a group spec
(e.g. `velvet-auction`, `minPlayers: 3`).
1. **Lobby → start.** N players join the lobby; `Start` enables at `minPlayers`;
starter presses Start. Opening narrative posts; passive reveals fire for
qualifying players. Confirm the auto `[SESSION] entered` announcement is
**suppressed** for the group encounter.
2. **Group check, all roll.** LLM emits a group skill check. Every targeted
player clicks **Roll**; each gets an ephemeral with their d20+mod vs DC; the
central scoreboard fills live and finalizes with a group SUCCESS/FAILURE +
`[SKILL CHECK RESULT]` system message. Confirm no double rolls, no lost rolls.
3. **Timed group check.** Run a group check with `durationSeconds` set. Watch the
countdown (10s increments), the hourglass GIF in the final stretch, and the
"final sands" text cue. Have one player roll early and another let the timer
expire → expiry finalizes correctly (unrolled = failure) without hanging the thread.
4. **Latecomer joins a running encounter.** A non-joined player tries to post →
their message is auto-deleted (FR-28/29). They join via the persistent
**Join** button on the lobby embed **and** via `/encounter join`; their
messages are then accepted. Confirm they are **not** retro-added to an
in-flight group check's target set.
5. **Non-joined message deleted + guided.** A non-joined member posts during the
lobby phase and during a running group encounter; the bot deletes it and
guides them to Join. Confirm **no false-positive deletions** of joined
players' messages, and that missing `Manage Messages` degrades safely (logs +
skips deletion, does not crash — NFR-7).
6. **No-show.** A targeted player doesn't roll. Untimed: the no-show grace period
(~60s) passes → they count as a failure, check finalizes. Timed: timer
expires → timeout finalize. Either way the thread does not hang.
7. **Bot restart mid-group-check.** With a group check in flight, restart the
bot. The boot sweep rehydrates `groupcheck:{threadId}` (FR-44) and the
`encounter:{threadId}:active` flag; in-flight checks rehydrate for remaining
players to finish, and any check whose deadline passed finalizes as a
timeout. Confirm no orphaned thread and no lost roll state.
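
The finalization behavior exercised in steps 2, 3, 4, and 6 can be sketched roughly as follows. This is a minimal illustration, not the bot's actual code: `GroupCheckState`, `RollOutcome`, and `resolveGroupCheck` are hypothetical names, and the sketch assumes `successRule: majority` means strictly more than half of the targeted players succeed (the real tie handling may differ). Players who never roll count as failures, and rolls from anyone outside the original target set are ignored, which is how a latecomer stays out of an in-flight check.

```typescript
// Hypothetical types for illustration only; not taken from the bot's source.
type RollOutcome = "success" | "failure";

interface GroupCheckState {
  targets: string[];                // player IDs fixed when the check was emitted
  rolls: Map<string, RollOutcome>;  // outcomes recorded so far, keyed by player ID
}

interface GroupCheckResult {
  passed: boolean;
  successes: number;
  failures: number; // includes targeted players who never rolled
}

// Finalize a group check, whether triggered by the last roll, a timer expiry,
// or the no-show grace period. Only players in `targets` are counted.
function resolveGroupCheck(state: GroupCheckState): GroupCheckResult {
  let successes = 0;
  for (const playerId of state.targets) {
    if (state.rolls.get(playerId) === "success") {
      successes++;
    }
  }
  // Anyone targeted who failed or did not roll is a failure.
  const failures = state.targets.length - successes;
  // Assumption: "majority" = strictly more than half of the targets.
  const passed = successes * 2 > state.targets.length;
  return { passed, successes, failures };
}
```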
---
## Latency checklist (NFR-3 / NFR-4)
While playtesting, record observed p95 from the bot's perspective (non-LLM
overhead, i.e. excluding the LLM generation wait):
- **Single-roll narration path:** p95 ≤ **8s**.
- **Group-resolution path:** p95 ≤ **15s**.
Record the observed numbers in the release notes. A miss = a perf follow-up, not
an automatic fail, but investigate before shipping if either is exceeded by >50%.
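
One way to turn the recorded timings into the numbers above is sketched below. It is only an illustration under stated assumptions: the helper names (`p95`, `classifyLatency`) and the millisecond constants are hypothetical, the percentile uses the nearest-rank method, and the samples are assumed to already exclude the LLM generation wait.

```typescript
// Budgets from the checklist, expressed in milliseconds (constant names are illustrative).
const SINGLE_ROLL_BUDGET_MS = 8_000;
const GROUP_RESOLUTION_BUDGET_MS = 15_000;

// Nearest-rank 95th percentile: the ceil(0.95 * n)-th smallest sample.
function p95(samplesMs: number[]): number {
  if (samplesMs.length === 0) {
    throw new Error("p95 needs at least one sample");
  }
  const sorted = [...samplesMs].sort((a, b) => a - b);
  // Integer arithmetic avoids floating-point drift in 0.95 * n.
  const rank = Math.ceil((95 * sorted.length) / 100);
  return sorted[rank - 1];
}

type LatencyVerdict = "ok" | "follow-up" | "investigate";

// Within budget is "ok"; a miss is a perf follow-up; a miss of more than 50%
// over budget should be investigated before shipping.
function classifyLatency(observedMs: number, budgetMs: number): LatencyVerdict {
  if (observedMs <= budgetMs) return "ok";
  return observedMs > budgetMs * 1.5 ? "investigate" : "follow-up";
}
```

For example, `classifyLatency(p95(singleRollSamples), SINGLE_ROLL_BUDGET_MS)` would give the verdict for the single-roll path, where `singleRollSamples` is whatever list of timings the tester collected.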
---
## Sign-off
- [ ] All 7 steps run; pass criteria met (no orphaned thread / lost roll / raw-JSON leak).
- [ ] Latency p95 recorded (single ≤8s, group ≤15s).
- [ ] Tester: ______ Date: ______ Release/commit: ______
> File issues for any failure. Do not ship a group-encounter release without a
> completed checklist.


@@ -2,6 +2,9 @@ encounterId: "mardonar-velvet-auction-006"
title: "The Velvet Quill Auction"
tone: "mysterious"
# Group encounter (Feature D) — requires a party; the lobby gates until 3 join.
minPlayers: 3
setting:
location: "Upper District — private lounge in the Velvet Quill parlor"
mood: >
@@ -101,6 +104,24 @@ skillChecks:
negotiate_vesper_note: >
Offering Madame Vesper secrets or trade arrangements of greater value than Karr's gold.
group_stealth_dc: 15
group_stealth_skill: "Stealth"
group_stealth_note: >
A coordinated group Stealth check when the party moves on the artifact together
(e.g. during a staged distraction). Emit as a GROUP check via skill_check_group_emit
targeting all joined players, with successRule: majority and durationSeconds: 60.
Failure means the guards or the abjuration wards notice the coordinated movement.
# Passive skill reveals (Feature B) — bot-applied at encounter start, group-visible,
# attributed to the qualifying player. threshold is a passive DC (integer); revealText
# is outcome prose only — no dice results (the engine owns rolls).
passiveReveals:
- skill: "Insight"
threshold: 15
revealText: >
Notices a faint tremor in Karr's grip as he raises his bid paddle — his swagger
is a veneer, and something about this lot has him badly rattled.
randomizable:
- key: broker_name
source: vocabulary
@@ -121,11 +142,13 @@ randomizable:
tools:
- skill_check_emit
- skill_check_group_emit
- encounter_resolve
- context_recall
- goal_register
- foundry_lookup
- foundry_reward
- character_status
dmNotes: >
This is a social heist encounter. Direct combat is highly discouraged by the presence of abjuration wards


@@ -78,9 +78,11 @@ describe('specs/*.yaml tool references', () => {
it('every registered tool is referenced by at least one spec (sanity: the registry is reachable from the default active set)', () => {
// Tools registered ahead of their spec are allowlisted here — remove the
-// entry once a spec references the tool. skill_check_group_emit lands a
-// group spec with the lobby (Story 9).
-const NOT_YET_REFERENCED = new Set(['skill_check_group_emit', 'character_status']);
+// entry once a spec references the tool. As of 2026-06-22, velvet-auction
+// references skill_check_group_emit (group Stealth) and character_status,
+// so the allowlist is empty. Re-add a name here only when a new tool is
+// registered ahead of its landing spec.
+const NOT_YET_REFERENCED = new Set<string>([]);
const referenced = new Set<string>();
for (const { raw } of specFiles) {
if (Array.isArray(raw.tools)) {