feat: remove ai generator, and update core vision. move specs to another repo, to iterate on them via a tool
Some checks failed
tests / Unit tests (Node 22) (push) Failing after 28s
25
Dockerfile
@@ -12,6 +12,27 @@ COPY src ./src

 RUN npm run build

+# ── Spec corpus (CAP-17) ─────────────────────────────────────────
+# Opt-in build-time pull from Gitea, so the production spec corpus ships
+# decoupled from core code and updates on its own cadence. When SPECS_GIT_URL
+# is set, clone the corpus from Gitea (optionally at SPECS_GIT_REF); when unset,
+# fall back to the in-repo example specs (local/dev + CI default). Either way
+# the runtime image's ./specs is the single spec dir the bot reads via
+# config.SPECS_DIR.
+FROM alpine:3 AS specs
+ARG SPECS_GIT_URL=
+ARG SPECS_GIT_REF=
+RUN apk add --no-cache git
+WORKDIR /spec-source
+COPY specs ./bundled
+RUN if [ -n "$SPECS_GIT_URL" ]; then \
+      git clone "$SPECS_GIT_URL" pulled \
+      && if [ -n "$SPECS_GIT_REF" ]; then git -C pulled checkout "$SPECS_GIT_REF"; fi \
+      && rm -rf bundled && mv pulled final; \
+    else \
+      mv bundled final; \
+    fi
+
 # ── Runtime image ──────────────────────────────────────────────
 FROM node:22-alpine

@@ -25,8 +46,8 @@ COPY package*.json ./
 RUN npm ci --omit=dev --ignore-scripts

 COPY --from=builder /app/dist ./dist
-COPY specs ./specs
+COPY --from=specs /spec-source/final ./specs
 COPY lore ./lore
 COPY persona.yaml ./persona.yaml

-CMD ["node", "dist/bot/index.js"]
+CMD ["node", "dist/bot/index.js"]
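The branch in the `specs` stage of the Dockerfile hunk above can be read as a small decision function. The sketch below is illustrative only: `planSpecStage` and `SpecStageArgs` are hypothetical names that do not appear in the commit, and the returned strings simply mirror the shell commands in the hunk.

```typescript
// Hypothetical model of the Dockerfile's `specs` stage; not code from the repo.
interface SpecStageArgs {
  SPECS_GIT_URL?: string; // build arg; empty or unset means "use bundled specs"
  SPECS_GIT_REF?: string; // optional ref to check out after cloning
}

// Returns the shell commands the stage would run, in order.
function planSpecStage(args: SpecStageArgs): string[] {
  const url = args.SPECS_GIT_URL ?? "";
  const ref = args.SPECS_GIT_REF ?? "";
  if (url === "") {
    // No URL: fall back to the in-repo example specs copied to ./bundled.
    return ["mv bundled final"];
  }
  const steps = [`git clone "${url}" pulled`];
  if (ref !== "") {
    // The ref is only checked out when one was supplied.
    steps.push(`git -C pulled checkout "${ref}"`);
  }
  steps.push("rm -rf bundled", "mv pulled final");
  return steps;
}
```

Either branch ends with a `final` directory, which is what `COPY --from=specs /spec-source/final ./specs` in the runtime stage relies on.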
27
_bmad-output/specs/spec-encounter-builder/.decision-log.md
Normal file
@@ -0,0 +1,27 @@
# Decision Log — spec-encounter-builder

## 2026-06-20 — create (express-mode distillation)

**Input:** "i want to make a simple, encounter builder tool. Something that i can use, which outputs the spec in a format that adheres to the llm contract. I'd like that to be another project." Sparse but directional; distilled express-mode (no field-by-field elicitation). No prior SPEC.md → create.

**Slug:** `encounter-builder` (from purpose; user did not specify). Landed under `_bmad-output/specs/` alongside the engine spec.

**Relationship to the engine:** this is a **separate project**. The builder targets the engine's spec contract, referenced as **adopted companions** from `spec-mardonar-encounter-engine/`: `authoring-guide.md` (the contract surface), `encounter-spec-fields.md` (field catalog), `voice-rules.md` (the LLM-contract pitfalls the builder enforces), plus `docs/architecture.md`. The builder does **not** re-derive the contract — single-sourced from the engine.

**Capabilities:** 6 (CAP-1..CAP-6). **Companions:** 4 adopted, 0 spec-authored (the contract lives in the engine spec; the builder's own overflow is open questions, not catalogs).

### Pass 1 — Coherence (Spec Law 1–6, 8)

- **Rule 1 (intent+success):** all 6 capabilities carry both, each success tied to a concrete engine-side check (Zod validate, `/encounter start`, complete run). PASS.
- **Rule 2 (WHAT not HOW):** capabilities name outcomes (emit valid YAML, validate, enforce pitfalls), not implementation; form factor deliberately left as an open question. PASS.
- **Rule 3 (constraints bend):** single-sourced contract rules out re-implementation; plain loader-shaped YAML rules out proprietary format; separate repo rules out runtime coupling; no-AI rules out a generator; simple/local rules out a deployed service. Each rules something out. PASS.
- **Rule 4 (non-goals explicit):** 5 stated (no LLM generation, no Foundry, no runtime runner, no cloud service, no versioning/migration). PASS.
- **Rule 5 (success signal testable):** Zod validate + `/encounter start` + complete run + zero pitfalls + drops into Gitea unmodified. PASS.
- **Rule 6 (IDs stable/unique):** CAP-1..CAP-6, this spec's own ID space, no reuse/renumber. PASS.
- **Rule 8 (lean prose):** no decoration. PASS.

### Pass 2 — Preservation

User claim-by-claim: "simple" → Why + simple/local constraint + non-goal (no cloud service); "encounter builder tool" → title + CAP-1..CAP-6; "i can use" → affected party = the author (the user); "outputs the spec in a format that adheres to the llm contract" → CAP-1/CAP-3/CAP-4 + constraints (shares contract, enforces pitfalls) + adopted `voice-rules.md`; "another project" → separate-project constraint + non-goals (no runtime runner, no Foundry). All load-bearing claims landed in SPEC.md or an adopted companion. No wrapper ceremony to drop.

**Verdict:** coherent and preservation-complete. 4 open questions remain (form factor, contract-sharing mechanism, output destination, validation depth) — all block implementation and need a human decision, but none block the spec itself. The engine contract the builder targets is captured in `spec-mardonar-encounter-engine/` (CAP-18 authoring guide, CAP-17 Gitea pipeline) so the builder spec is downstream-ready once those questions are answered.
77
_bmad-output/specs/spec-encounter-builder/SPEC.md
Normal file
@@ -0,0 +1,77 @@
---
id: SPEC-encounter-builder
companions:
  - ../../../docs/architecture.md # adopted — engine system design
  - ../spec-mardonar-encounter-engine/authoring-guide.md # adopted — the contract this tool targets
  - ../spec-mardonar-encounter-engine/encounter-spec-fields.md # adopted — field catalog this tool targets
  - ../spec-mardonar-encounter-engine/voice-rules.md # adopted — the LLM-contract pitfalls this tool enforces
sources: []
---

> **Canonical contract.** This SPEC and the files in `companions:` are the complete, preservation-validated contract for what to build, test, and validate. The encounter-builder is a **separate project** from the Mardonar Encounter Engine; it targets the engine's spec contract (referenced as adopted companions from `spec-mardonar-encounter-engine/`) rather than re-deriving it.

# Mardonar Encounter Builder

## Why

A **pain to solve**: hand-authoring encounter YAMLs is error-prone. The engine's `EncounterSpecSchema` is strict (Zod-validated), and the LLM contract has sharp pitfalls — dice results in prose teach the LLM to fabricate rolls, system tags/fenced JSON get stripped or suppressed, personas written as stat blocks waste context the LLM can't use (Foundry owns stats), and stable `id` fields can't be renamed once live. The DM wants a simple tool that outputs contract-adherent specs so authoring is fast and correct the first time. The affected party is the DM/author (one person, the user). The backdrop that makes it matter now: the engine's spec contract is stable — `authoring-guide.md` and `encounter-spec-fields.md` exist as the canonical reference (CAP-18) — and specs are moving to a pipeline-pulled Gitea corpus (CAP-17), so a builder that emits valid YAML feeds that pipeline cleanly instead of handing the author a text editor and a Zod error report.

## Capabilities

- id: CAP-1
  intent: An author can fill in encounter fields through the tool and receive a YAML spec that passes the engine's `EncounterSpecSchema`.
  success: A spec produced by the tool loads via `npm run build` (Zod validates) and `/encounter start` without a validation error.

- id: CAP-2
  intent: An author can define NPCs, goals, skill checks, tone, randomizable fields, and the party-size envelope through guided input rather than hand-writing YAML.
  success: The tool's inputs cover every common field in `encounter-spec-fields.md` (setting, npcs/personas, goals, skillChecks, tone, randomizable, minPlayers/maxPlayers, campaignId, tools) and emit them in the loader's expected shape.

- id: CAP-3
  intent: The tool can validate a spec against the engine's shared contract so an invalid spec never leaves the tool.
  success: A spec missing a required field or breaking an LLM-contract rule is blocked before output, with a message naming the field and the rule.

- id: CAP-4
  intent: The tool can enforce the LLM-contract pitfalls so an author is guided away from spec shapes the engine's filter/parser would reject.
  success: A dice result in `openingNarrative`/`persona`, a system tag, fenced `tool_call` syntax, or a stat-block persona triggers a validation warning explaining why and how to fix it (per `voice-rules.md`).

- id: CAP-5
  intent: An author can load an existing spec (the reference example) as a starting template and edit it.
  success: The tool ingests the repo's annotated reference spec and pre-fills its fields for editing; saving emits a new spec with a new `encounterId`.

- id: CAP-6
  intent: The tool can output specs ready for the Gitea spec corpus so the pipeline picks them up without manual reshaping.
  success: Output is a plain YAML file named `<encounterId>.yaml` with no frontmatter or wrapper the loader doesn't expect, written to a chosen directory.

## Constraints

- **Shares the engine's spec contract as the source of truth.** The builder consumes the engine's `EncounterSpecSchema` (and the `authoring-guide`/`encounter-spec-fields` rules), not a re-implementation — so the two cannot drift. The sharing mechanism (published package, generated JSON Schema, git submodule, or copy-with-contract-test) is an open question, but the contract is single-sourced.
- **Outputs plain loader-shaped YAML.** No proprietary format, no frontmatter, no wrapper — exactly what `src/spec/loader.ts` expects, ready for the Gitea corpus.
- **Separate project/repo from the engine.** Depends on the engine's contract artifact only, not its runtime (Discord, Redis, GraphMCP, LLM). The engine does not depend on the builder.
- **No AI authoring of lore.** The builder is a structured-input tool, not an LLM generator. This is the line the engine drew (retired `/encounter generate`); the builder upholds it — it helps a human author, it does not author.
- **Simple and local.** The user asked for "simple." It is a local tool the author runs, not a deployed multi-user service. Form factor is an open question.

## Non-goals

- **Not an LLM spec generator.** AI never authors encounter lore; the builder is a form/wizard for a human author.
- **Not a Foundry/VTT integration.** Character sheets and journals are Foundry's domain; the builder deals in encounter specs only.
- **Not a runtime encounter runner.** The builder produces specs; the engine runs them. No Discord, no LLM, no session state.
- **Not a multi-user or cloud service.** One author, local tool.
- **Not a spec versioning/migration tool.** v1 authors new specs; migrating/renaming live `id` fields across an existing corpus is out of scope.

## Success signal

An author with no YAML expertise fills the builder's inputs for a 2-NPC, 2-primary-goal, randomizable, `minPlayers`-gated encounter and receives a YAML that passes `npm run build` Zod validation, loads via `/encounter start`, and runs a complete encounter in the engine — with zero LLM-contract pitfalls flagged — and the same spec drops into the Gitea corpus unmodified.

## Assumptions

- The builder is a separate project/repo, distinct from `mardonar-npcs`; it depends only on the engine's contract artifact (the EncounterSpec schema + the authoring guide rules), not the engine's runtime.
- "Simple" means a local tool the user runs themselves; the exact form factor (CLI wizard, local TUI/GUI, or local web form) is an open question.
- Slug `encounter-builder` was chosen from the tool's purpose; the user did not specify one.
- The builder targets the engine contract as it stands at 2026-06-20 (CAP-1..CAP-18 in `spec-mardonar-encounter-engine`); new engine capabilities that add spec fields (e.g. Foundry journal links, campaign fields) flow to the builder through the shared contract, not by re-deriving.

## Open Questions

- **Form factor:** interactive CLI wizard, local TUI/GUI (e.g. Ink/Textual), or a local web form? "Simple" leans CLI wizard, but the author should pick — it changes the whole stack.
- **Contract-sharing mechanism:** how does the builder import the engine's `EncounterSpecSchema` without depending on the engine's runtime? Published npm package from the engine, a generated JSON Schema file committed to the builder, a git submodule, or a copy guarded by a contract-consistency test? The last is what the engine's own `specsToolsConsistency` test pattern suggests.
- **Output destination:** does the builder write to a local directory the author commits, or push directly to the Gitea spec repo via git? Affects whether the builder needs git credentials.
- **Validation depth:** does the builder validate at runtime with the engine's Zod schema (requires the schema artifact at runtime) or a generated JSON Schema is enough? Zod gives better error messages; JSON Schema is lighter to ship.
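CAP-4 in the SPEC.md above asks the builder to warn on four spec shapes: dice results in prose, system tags, fenced `tool_call` syntax, and stat-block personas. A minimal sketch of how such a check might look follows. Every name here (`PitfallWarning`, `PITFALL_RULES`, `lintSpecText`) is hypothetical, and the regular expressions are rough guesses for illustration; the authoritative rules live in `voice-rules.md`, which this page does not show.

```typescript
// Hypothetical pitfall checks; the patterns are illustrative guesses,
// not the rules defined in voice-rules.md.
interface PitfallWarning {
  field: string; // name of the spec field that tripped the rule
  rule: string; // short label of the rule that fired
}

const PITFALL_RULES: { rule: string; pattern: RegExp }[] = [
  { rule: "no dice results in prose", pattern: /\b(rolls?|rolled)\s+(a|an)?\s*\d+\b/i },
  { rule: "no system tags", pattern: /<\/?system\b[^>]*>/i },
  { rule: "no fenced tool_call syntax", pattern: /```|tool_call/ },
  { rule: "personas are voice, not stat blocks", pattern: /\b(AC|HP|STR|DEX|CON|INT|WIS|CHA)\s*:?\s*\d+/ },
];

// Checks each free-text field against every rule and collects the hits.
function lintSpecText(fields: Record<string, string>): PitfallWarning[] {
  const warnings: PitfallWarning[] = [];
  for (const field of Object.keys(fields)) {
    for (const { rule, pattern } of PITFALL_RULES) {
      if (pattern.test(fields[field])) {
        warnings.push({ field, rule });
      }
    }
  }
  return warnings;
}
```

Returning warnings as data, rather than throwing, would let a builder decide separately whether a hit blocks output (CAP-3) or only guides the author (CAP-4); that split is an assumption of this sketch, not something the spec fixes.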
@@ -0,0 +1,92 @@
# Decision Log — spec-mardonar-encounter-engine

## 2026-06-20 — create (express-mode distillation)

**Input:** "I want to create updated bmad docs on the vision of this project." Sparse user message; rich project source — `docs/project-overview.md`, `docs/index.md`, `README.md`, `prd.md` (active PRD: Dynamic Goal Registration), `Docs/epics.md`, plus session-context memory. Distilled directly (no field-by-field elicitation). No prior SPEC.md existed at the target folder → create.

**Slug:** `mardonar-encounter-engine` (from project name; user did not specify). Landed under `_bmad-output/specs/` (customize.toml default) to avoid collision with the top-level `specs/` dir that holds encounter YAMLs.

**Companions produced (spec-authored):** `stack.md`, `slash-commands.md`, `encounter-spec-fields.md`, `voice-rules.md`.
**Companions adopted (referenced, not edited):** `docs/architecture.md` (system design source of truth), `prd.md` (active PRD), `Docs/ux-designs/ux-mardonar-2026-05-30/EXPERIENCE.md` + `DESIGN.md` (UX artifacts).
**Sources (fully absorbed, audit only):** `docs/project-overview.md`, `docs/index.md`, `README.md`, `Docs/epics.md`.
**Capabilities:** 11 (CAP-1..CAP-11).

### Pass 1 — Coherence (Spec Law 1–6, 8)

- **Rule 1 (intent+success):** all 11 capabilities carry both. PASS.
- **Rule 2 (WHAT not HOW):** borderline — CAP-5 and CAP-9 name product mechanics (`goal_register`, `encounter_resolve`, `filterLLMResponse`). Treated as product-vocabulary, not implementation prescription (the HOW lives in `encounter-spec-fields.md`/`voice-rules.md`). Accepted.
- **Rule 3 (constraints bend design):** each constraint rules something out (jargon, LLM-rolls, raw-send, in-process DB, naive trailing-nudge, unbounded context, blind Foundry mutation, concurrent live runs). PASS.
- **Rule 4 (non-goals explicit):** 5 stated. PASS.
- **Rule 5 (success signal testable):** ties to AC1–AC10 live suite + specific observable behaviors. PASS.
- **Rule 6 (IDs stable/unique):** CAP-1..CAP-11, no reuse/renumber. PASS.
- **Rule 8 (lean prose):** dense but load-bearing; no decoration. PASS.

### Pass 2 — Preservation

Walked source claim-by-claim. project-overview (what/who/stack/features/drift) → kernel + `stack.md` + `slash-commands.md`. README (commands, session flow, context budget, NPC memory) → kernel + `slash-commands.md` + `encounter-spec-fields.md` + `stack.md`. index (conventions) → Constraints + `voice-rules.md`. epics (FR1–14, NFR1–6, UX-DR1–5) → CAP-9/CAP-10/CAP-11 + Constraints + `voice-rules.md`. prd (dynamic goals, randomizable specs) → CAP-5, CAP-6, `encounter-spec-fields.md` dynamic-goal contract. architecture → adopted companion. UX artifacts → adopted companions.

**Wrapper-only content dropped (on the record):** the "Generated 2026-06-19 from a deep scan" headers, the historical-Go-design drift-log dates, the navigational doc index links, and the README quick-start/install steps — all process/onboarding metadata, not vision contract.

**Verdict:** coherent and preservation-complete. Four open questions remain (`/encounter generate` as a CAP, campaign-level state, Foundry-as-core-pillar, HTTP health endpoint) — none block downstream; all are flagged in SPEC.md `## Open Questions` for a human decision.

## 2026-06-20 — update (user answered all four open questions)

The user resolved all four open questions with directional answers; this is an in-place update (same slug/folder, capability IDs preserved per Spec Law rule 6). The Open Questions section is removed (all resolved).

**Resolution 1 — `/encounter generate` removed.** User: it's a cheat that mixes AI content with canonical lore. → New constraint "Encounter lore is human-authored" (AI never writes canonical spec/lore); non-goal sharpened to "Not AI-authored encounter content" with the command retired; `slash-commands.md` marks it retired and flags the handler (`src/bot/commands/encounter.ts`) + AC7 integration tests for retirement. **Downstream action:** the code/test retirement is implementation work, not spec work — recorded here as the next step but not performed within this spec skill.

**Resolution 2 — campaign-level state + party-size gating in scope.** User: campaign continuity matters; encounters should have minimum player counts (solo trips vs group-only). → New CAP-12 (campaign continuity) and CAP-13 (party-size gating); new companion `campaign-state.md` (campaign record design + `minPlayers`/`maxPlayers` envelope + open design questions); `encounter-spec-fields.md` gains `minPlayers`, `maxPlayers`, `campaignId`.

**Resolution 3 — Foundry VTT becomes a pillar.** User: the relay gives real connection to the game; deeper integration — post-encounter summaries to Foundry journals, NPC lore journals. → New CAP-14 (summaries → Foundry journals) and CAP-15 (NPC lore journals); new companion `foundry-integration.md` (relay op table + deeper-integration roadmap + open design questions); constraint rewritten from "optional/gated" to "first-class pillar, mutating writes still consent-gated"; `stack.md` Foundry row updated.

**Resolution 4 — reliability is a baseline, not a pillar.** User: better in-world responses when something is wrong, but no Sentry / 24-7 monitoring. → New CAP-16 (graceful in-world degradation); new constraint "Reliability is a baseline, not a pillar" (rules out Sentry/observability infra); new non-goal "Not a 24/7-monitored production system." The prior HTTP health endpoint question folds into this — it's an ops gap, not a product pillar; not built.

**Success signal** expanded to name campaign carry-over, Foundry journals, and in-world degradation. **Assumptions** gained an implementation-status note: CAP-12..CAP-15 are greenfield, CAP-16 partially in place, CAP-1..CAP-11 substantially built.

### Pass 1 — Coherence (Spec Law 1–6, 8)

- **Rule 1:** CAP-12..CAP-16 each carry intent + success. PASS.
- **Rule 2 (WHAT not HOW):** CAP-14/CAP-15 name the relay writes they enable (product mechanic), not implementation; design lives in `foundry-integration.md`. Accepted.
- **Rule 3:** new constraints bend design — "human-authored lore" rules out AI spec generation; "Foundry pillar" rules out optional/skippable designs; "reliability baseline" rules out observability infra. PASS.
- **Rule 4:** non-goals still ≥1, now sharpened. PASS.
- **Rule 5:** success signal references campaign carry-over, Foundry journals, in-world degradation, the live suite. PASS.
- **Rule 6:** CAP-12..CAP-16 appended, no reuse/renumber of CAP-1..CAP-11. PASS.
- **Rule 8:** lean. PASS.

### Pass 2 — Preservation

Each user answer landed in SPEC.md or a companion: (1) generate-removal → constraint + non-goal + `slash-commands.md`; (2) campaign/gating → CAP-12/CAP-13 + `campaign-state.md` + `encounter-spec-fields.md`; (3) Foundry pillar → constraint + CAP-14/CAP-15 + `foundry-integration.md` + `stack.md`; (4) reliability → CAP-16 + constraint + non-goal. Open Questions section emptied and omitted per template.

**New open design questions surfaced (not blocking — recorded in companions):** shared-inventory ownership (Foundry vs Redis, `campaign-state.md`); explicit `/join` vs soft signal for the min-players gate (`campaign-state.md`); whether the relay already exposes a journal-write RPC or needs relay work (`foundry-integration.md`); NPC-to-Foundry-actor linking shape (`foundry-integration.md`); whether append-only journal writes need the consent gate (`foundry-integration.md`).

**Verdict:** coherent and preservation-complete. 16 capabilities, 6 spec-authored companions, 4 adopted companions. One downstream implementation action is recorded (retire `/encounter generate` code + AC7 tests) but not executed here.

## 2026-06-20 — update (spec-pipeline detangling + builder tool as a separate project)

The user added a final staple: specs become a pipeline-owned artifact pulled from Gitea at Docker build, decoupled from core code, with clean example files + an authoring guide as this repo's contract surface; and a simple encounter-builder tool is spun off as a separate project. In-place update (IDs preserved); new capabilities appended as CAP-17/CAP-18.

**Detangling + Gitea pull pipeline.** → New CAP-17 (build-time spec pull from Gitea, decoupled from core code); new constraint "Specs are decoupled from core code" (rules out specs-as-code-dependencies and in-repo production editing); `stack.md` gains a "Spec corpus / Gitea" row + the architecture diagram now shows Gitea → build → image; `encounter-spec-fields.md` notes where specs live (production corpus in Gitea, repo ships loader/schema + examples + guide). The current `src/spec/loader.ts` reads `./specs/` in-repo — the decoupling refactor is recorded as greenfield.

**Authoring guide + clean examples.** → New CAP-18 (a reader following `authoring-guide.md` produces a spec that validates, loads, and runs a complete encounter; ≥1 fully-annotated example ships). New companion `authoring-guide.md` authored in full: contract model, minimal anatomy, what the LLM reads vs what the bot enforces, validation, a reference-example plan (upgrade `specs/market-thief.yaml` into the annotated reference; historical specs become examples or move to Gitea), and the LLM-contract pitfalls (no dice results, no system tags, personas-are-voice, opening-is-pinned, ids-are-stable).

**Encounter-builder tool = separate project.** → New non-goal "Not an encounter-builder tool" (the builder is a separate project; this repo defines the spec contract it targets, not the authoring UI — to be specced separately). New non-goal "Not runtime spec authoring" (specs authored externally + pulled at build; the engine never mints/mutates specs at runtime). This pairs with the earlier generate-removal: authoring moves out of the engine entirely — hand-authored YAML in Gitea or the separate builder tool, both targeting the contract in `authoring-guide.md`.

**Why + success signal + assumptions** updated to name the pipeline detangling, the Gitea-pulled spec corpus, and the authoring-guide success criterion. Assumptions note CAP-17/CAP-18 are greenfield (loader refactor + example curation + guide are new deliverables).

### Pass 1 — Coherence (Spec Law 1–6, 8)

- **Rule 1:** CAP-17/CAP-18 each carry intent + success. PASS.
- **Rule 2 (WHAT not HOW):** CAP-17 names the pipeline outcome (pull + decoupled cadence), not the fetch mechanism; CAP-18 names the authoring outcome, not the guide's contents. Accepted.
- **Rule 3:** "Specs decoupled from core code" rules out bundled-source/in-repo-editing designs — bends. PASS.
- **Rule 4:** non-goals gained two (builder-tool-not-here, runtime-authoring-not-here). PASS.
- **Rule 5:** success signal now names Gitea-pulled corpus + authoring-guide criterion. PASS.
- **Rule 6:** CAP-17/CAP-18 appended; CAP-1..CAP-16 untouched. PASS.
- **Rule 8:** lean. PASS.

### Pass 2 — Preservation

Every user claim landed: spec-detangling → CAP-17 + constraint + `stack.md` + `encounter-spec-fields.md`; pipeline/Gitea → same; clean examples + authoring docs → CAP-18 + `authoring-guide.md`; builder-as-separate-project → two non-goals + a "to be specced separately" pointer. The authoring guide itself preserves the full LLM-contract knowledge (what the LLM reads, what the bot enforces, pitfalls) so the separate builder project can target it without re-deriving.

**Downstream implementation actions recorded (not executed here):** (a) retire `/encounter generate` code + AC7 tests [from prior update]; (b) refactor `src/spec/loader.ts` to read the build-time-pulled spec dir + relocate the production corpus to Gitea; (c) curate the clean reference example spec (upgrade `market-thief.yaml`) and decide which historical specs stay as examples vs move to Gitea. The encounter-builder tool is a separate project to be specced in its own `/bmad-spec` invocation.

**Verdict:** coherent and preservation-complete. 18 capabilities, 7 spec-authored companions, 4 adopted companions. Three downstream implementation actions are recorded; the builder tool is flagged for a separate spec.
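Downstream action (b) in the decision log above, refactoring `src/spec/loader.ts` to read the build-time-pulled spec directory, could take roughly the following shape. This is a sketch under stated assumptions, not the contents of the real loader: `resolveSpecsDir`, `listSpecFiles`, and the `ReadDir` type are hypothetical names, the `./specs` default is inferred from the Dockerfile comment about `config.SPECS_DIR`, and directory reading is injected so the example stays free of runtime dependencies.

```typescript
// Hypothetical sketch; not the contents of src/spec/loader.ts.
type ReadDir = (dir: string) => string[];

// Resolve the spec directory: an explicit setting wins, otherwise fall back
// to ./specs (assumed default, matching the runtime image layout).
function resolveSpecsDir(configured: string | undefined): string {
  return configured && configured.trim() !== "" ? configured : "./specs";
}

// List spec files in a stable order, ignoring anything that is not YAML.
function listSpecFiles(dir: string, readDir: ReadDir): string[] {
  return readDir(dir)
    .filter((name) => /\.ya?ml$/i.test(name))
    .sort()
    .map((name) => `${dir}/${name}`);
}
```

Because the loader would only see a directory, it would not need to know whether the corpus came from the Gitea clone or from the bundled examples, which is the decoupling CAP-17 describes.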
140
_bmad-output/specs/spec-mardonar-encounter-engine/SPEC.md
Normal file
@@ -0,0 +1,140 @@
---
id: SPEC-mardonar-encounter-engine
companions:
  - stack.md # tech stack reference
  - slash-commands.md # the Discord command surface catalog
  - encounter-spec-fields.md # YAML encounter spec field catalog
  - voice-rules.md # in-world voice + response-filter contract
  - campaign-state.md # campaign/party continuity + party-size gating design
  - foundry-integration.md # Foundry as a core pillar + deeper integration roadmap
  - ../../../docs/spec-authoring-guide.md # adopted — how to author a spec the LLM can read (canonical repo doc)
  - ../../../docs/specs-pipeline.md # adopted — decoupled specs pulled from Gitea at build (CAP-17)
  - ../../../docs/architecture.md # adopted — system design source of truth
  - ../../../prd.md # adopted — active PRD (Dynamic Goal Registration)
  - ../../../Docs/ux-designs/ux-mardonar-2026-05-30/EXPERIENCE.md # adopted — UX experience
  - ../../../Docs/ux-designs/ux-mardonar-2026-05-30/DESIGN.md # adopted — UX design
sources:
  - ../../../docs/project-overview.md # absorbed into kernel + companions
  - ../../../docs/index.md # absorbed into kernel + companions
  - ../../../README.md # absorbed into kernel + companions
  - ../../../Docs/epics.md # requirements absorbed into capabilities/constraints
---

> **Canonical contract.** This SPEC and the files in `companions:` are the complete, preservation-validated contract for what to build, test, and validate. Source documents listed in frontmatter are for traceability only — consult them only if you need narrative rationale or prose color this contract intentionally omits.

# Mardonar Encounter Engine

## Why

A **vision to realize** (with a **pain to solve** underneath): the Land of Mardonar Discord community plays D&D 5e, and the table wants encounters that feel alive and carry continuity across a campaign rather than reading as scripted one-shots. Today's static encounter specs force the LLM into awkward resolutions when player actions diverge from a predefined goal list, and an LLM left to narrate freely will fabricate dice results, leak internal tool-call JSON, and break character with utility jargon. This engine makes each Discord thread a living encounter: an LLM narrates in-world, voices NPCs grounded in persistent graph memory, drives skill checks through bot-controlled dice (never its own), and adapts the goal set on the fly when players go off-rails — all behind a hard filter that keeps internal format off the player's screen. The campaign carries from one encounter to the next through shared party state, and Foundry VTT is the live-game connection where character sheets, encounter summaries, and NPC lore journals actually live. Encounter specs are a pipeline-owned artifact — pulled from Gitea at Docker build, decoupled from core code — so a DM can curate the spec corpus independently of the bot release cycle, with a separate builder tool and an authoring guide as the contract surface. The affected parties are the DM (who runs `/encounter start`) and the registered players who post free-text actions in the thread. The backdrop that makes it matter now: the Dynamic Goal Registration PRD is active, the live AC1–AC10 integration suite is green, and the response-filter safety net just closed the raw-JSON leak seen in the 2026-06-20 game session.

## Capabilities

- id: CAP-1
  intent: A DM can launch a structured encounter from a YAML spec so a Discord thread becomes the encounter session with an opening scene.
  success: `/encounter start <spec>` creates a thread, posts the opening narrative, and a `session:<threadId>` key exists in Redis with the spec loaded and tone cached.

- id: CAP-2
  intent: A registered player can post free-text character actions and receive in-world narrative advances from the LLM turn by turn.
  success: A player action message yields an assistant narrative posted to the thread that contains no utility terms, no raw tool/JSON block, and no fabricated dice result.

- id: CAP-3
  intent: The LLM can request a skill check via a tool call so the bot posts a dice embed, collects the player's natural-language roll, and feeds the result back without the LLM inventing a number.
  success: A `skill_check_emit` tool call produces a Discord embed with a DC; the player's `"I rolled a N Skill"` reply is captured; a `[SKILL CHECK RESULT]` context message follows; the LLM never states a pre-roll number.

- id: CAP-4
  intent: A recurring NPC can remember prior interactions across encounters so its persona stays stable and references past dealings with the party.
  success: A named NPC with a `memoryKey` has graph-memory facts injected into its system prompt at session start and new facts committed at resolution; a follow-up encounter narrates a specific prior-encounter event drawn from graph memory.

- id: CAP-5
  intent: The LLM can register a new goal mid-encounter when player actions diverge from the spec's goals, and resolve the encounter against any registered goal.
  success: An off-spec creative action yields a `goal_register` tool call; the new goal appears in subsequent system prompts under `<hidden_goals>`; `encounter_resolve` with a custom `outcomeId` posts a closing embed naming that goal.

- id: CAP-6
  intent: A DM can author a spec with randomizable fields so re-running the same spec yields varied substance and complications.
  success: A spec declaring randomizable fields produces different concrete values across two runs of the same spec file.

- id: CAP-7
  intent: The system can log encounter events and commit NPC memory writes to the graph DB so lore accrues across sessions.
  success: GraphMCP receives encounter event logs and `foundry_reward` memory writes; the AC1 GraphMCP contract suite passes live.

- id: CAP-8
  intent: A player with a linked Foundry actor can have live character stats surfaced and XP/items awarded through the VTT relay.
  success: `/character register foundry` links an actor; `/actions` and `/character view` render live Foundry stats; `/xp award` and admin give mutate the Foundry actor — all gated behind explicit live consent (never run blind against production).

- id: CAP-9
  intent: The system can suppress any LLM response that leaks internal format so players only ever see in-world narration.
  success: Every player-facing LLM output passes `filterLLMResponse`; a response containing a `tool_call` token, code fence, bare tool-call JSON, system tag, or fabricated roll number is suppressed before `thread.send` and triggers a fenced-retry correction; the AC5 long-encounter and AC9 thread-command suites stay green with no raw JSON reaching the thread.

- id: CAP-10
  intent: A player can register a D&D name (and optionally a custom profile with pronouns) so only registered players can act in encounters.
  success: An unregistered player posting in an encounter thread receives an in-world ephemeral nudge to register; a registered player's action is accepted and their pronouns appear in narration.

- id: CAP-11
  intent: The system can cap concurrent player messages per burst and notify dropped players in-world so the LLM is never flooded and players get tone-aware feedback.
|
||||
success: A third-in-burst message is dropped (not queued) and the player receives a tone-keyed ephemeral notice; the cap resets after each LLM response completes; reactions transition 👀 → ⏳ → ✅ without going stale.
|
||||
|
||||
- id: CAP-12
|
||||
intent: A party's state — membership, shared inventory, and encounter history — can persist across encounters so the campaign carries continuity from one thread to the next.
|
||||
success: After encounter A resolves, a subsequent encounter B reads the party's membership and prior-encounter outcomes at start and writes new outcomes at resolution; a later encounter can reference a prior encounter's result as established fact.
|
||||
|
||||
- id: CAP-13
|
||||
intent: A DM can author an encounter with a minimum (and optional maximum) party size so solo trips and group encounters route correctly and a group encounter won't start until enough players are ready.
|
||||
success: A spec with `minPlayers: 3` refuses `/encounter start` until ≥3 registered players have joined; a solo-eligible spec (`minPlayers: 1`) starts for a single player.
|
||||
|
||||
- id: CAP-14
|
||||
intent: The system can post a closing encounter summary to linked characters' Foundry journals so the game's record of play lives in Foundry alongside the sheet.
|
||||
success: On encounter resolution, each Foundry-linked player's actor receives a journal entry carrying the encounter's outcome summary; the AC10 Foundry suite covers the write.
|
||||
|
||||
- id: CAP-15
|
||||
intent: A recurring NPC can carry a Foundry lore journal so NPC backstory and encounter history accrue in the game world, not just the graph.
|
||||
success: A named NPC with a Foundry actor link has journal entries appended as it appears across encounters; a follow-up encounter reads the NPC's Foundry journal and references a prior-encounter event from it.
|
||||
|
||||
- id: CAP-16
|
||||
intent: The system can degrade in-world when an LLM, tool, or relay call fails so players see an in-world pause or stumble rather than a stack trace, raw error, or silence.
|
||||
success: An LLM error, tool-dispatch failure, or relay outage produces an in-world player-facing string (never utility terms) and the turn still completes (history grows by ≥1 via the `[NO RESPONSE]` fallback); no 24/7 monitoring infrastructure is built to support this.
|
||||
|
||||
- id: CAP-17
|
||||
intent: A pipeline can pull the full encounter spec corpus from Gitea at Docker build time so specs ship decoupled from core code and update on their own cadence.
|
||||
success: A Docker build fetches the spec corpus from a configured Gitea source into the image; `/encounters list` reflects only the pulled specs; changing a spec in Gitea and rebuilding updates the available encounters without a code change to the bot.
|
||||
|
||||
- id: CAP-18
|
||||
intent: A spec author can follow documented examples and a contract guide to produce a spec the engine's LLM can read and drive.
|
||||
success: A reader following `authoring-guide.md` produces a spec that passes Zod validation, loads via `/encounter start`, and runs a complete encounter; at least one fully-annotated example spec ships with the repo as the reference.
|
||||
|
||||
## Constraints
|
||||
|
||||
- **In-world voice.** Every player-facing string uses in-world language. The words "bot", "system", "queue", "session", "ephemeral", "rate limit", and "error" never appear in player-visible output. Tone-keyed drop notices and confirmations are pre-generated, not LLM-run.
|
||||
- **The bot controls dice, not the LLM.** The LLM must never state or imply a specific dice result; outcome narration waits for the `[SKILL CHECK RESULT]` system message. This rules out any "the LLM just rolls for the player" design.
|
||||
- **Tool calls are the LLM's only state-effecting channel.** Raw `tool_call` blocks, code fences, bare tool JSON, and internal system tags must never reach players. `responseFilter` is the last-line defense; any new narrator path posting LLM output to a thread must route through `filterLLMResponse`, never `thread.send(raw)` directly.
|
||||
- **Encounter lore is human-authored.** AI never writes into the canonical spec/lore corpus; on-the-spot AI generation (`/encounter generate`) is removed to keep AI output out of canonical lore. This rules out any design that lets the LLM mint encounter specs at runtime.
|
||||
- **Specs are decoupled from core code.** The engine consumes the spec corpus as a build-time-pulled artifact from Gitea; the production spec corpus is not bundled source in this repo and updates on its own release cadence. The repo ships only the loader/schema contract, clean example specs, and the authoring guide. This rules out specs-as-code-dependencies and in-repo production spec editing.
|
||||
- **Single-process TypeScript ESM monolith** on Node 22, strict. All external surfaces — Redis, GraphMCP JSON-RPC, Ollama/LiteLLM, Foundry relay, Gitea spec source — are network/build-time clients; there is no in-process database or VTT.
|
||||
- **LLM via LiteLLM primary, Ollama fallback.** The final API message must be role `user`/`assistant`; a trailing `system` message returns empty (coerced by `toApiMessages`). This rules out appending a trailing system nudge and expecting a reply without the coercion.
|
||||
- **128k context budget.** Pinned opening narrative (and goal block) are never trimmed; oldest non-pinned turn pairs drop first when the sliding window overflows.
|
||||
- **Foundry VTT is a first-class pillar, not optional.** Character sheets, encounter summaries, and NPC lore journals live in Foundry; the relay is the source of truth for live stats and the bot both reads and writes (XP, items, summaries, journals). Mutating writes stay consent-gated — never run blind against production.
|
||||
- **Reliability is a baseline, not a pillar.** Error paths give players in-world graceful responses; the project does not build Sentry, 24/7 monitoring, or alerting infrastructure. This rules out designs that trade player-facing UX for observability tooling.
|
||||
- **Live Discord integration runs are serialized.** One token drives one gateway session; concurrent `connectLiveBots()` calls disconnect each other, so live E2E suites run one at a time.
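The context-budget constraint above (pinned content is never trimmed; oldest non-pinned turn pairs drop first) can be sketched roughly as follows. This is an illustration under stated assumptions, not the engine's code: the `Turn` shape, the `pinned` flag, the `trimToBudget` name, and the four-characters-per-token estimate are all hypothetical.

```typescript
// Hypothetical history entry; the real session shape may differ.
interface Turn {
  content: string;
  pinned?: boolean; // e.g. the opening narrative or the goal block
}

// Assumed rough cost model: about 4 characters per token.
const estimateTokens = (turn: Turn): number => Math.ceil(turn.content.length / 4);

function trimToBudget(history: readonly Turn[], budgetTokens: number): Turn[] {
  const kept = [...history]; // work on a copy; the caller's array is untouched
  const total = (): number => kept.reduce((sum, t) => sum + estimateTokens(t), 0);

  while (total() > budgetTokens) {
    const oldest = kept.findIndex((t) => !t.pinned);
    if (oldest === -1) break; // only pinned content remains; never trim it
    kept.splice(oldest, 1); // drop the oldest non-pinned turn...
    const partner = kept[oldest];
    if (partner && !partner.pinned) {
      kept.splice(oldest, 1); // ...and the turn after it, so a pair leaves together
    }
  }
  return kept;
}
```

A real implementation would use the model's tokenizer rather than a character estimate; the point here is only the drop order.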
|
||||
|
||||
## Non-goals
|
||||
|
||||
- **Not a general Discord bot framework.** Only encounter sessions in the Land of Mardonar; no generic moderation, music, or utility commands.
|
||||
- **Not AI-authored encounter content.** Specs are human-authored YAML; on-the-spot AI generation (`/encounter generate`) is retired to keep AI output out of canonical lore. Authoring stays spec-driven, not a visual editor.
|
||||
- **Not an encounter-builder tool.** The builder (a simple tool that outputs contract-adherent specs) is a separate project; this repo defines the spec contract it targets, not the authoring UI. To be specced separately.
|
||||
- **Not runtime spec authoring.** Specs are authored externally (the builder tool or hand-written YAML in Gitea) and pulled at Docker build; the engine never mints or mutates specs at runtime.
|
||||
- **Not a VTT or map renderer.** Foundry remains the source of truth for character sheets and journals; the bot reads live stats and writes XP/items/summaries/journals, it does not replace Foundry's UI or render battlemaps.
|
||||
- **Not a multi-tenant or horizontally-scaled deployment.** One Node process per deployment; no orchestration, sharding, or per-guild isolation is in scope.
|
||||
- **Not a public HTTP API or web UI.** Discord threads are the only player surface.
|
||||
- **Not a 24/7-monitored production system.** Reliability is a baseline; we invest in graceful in-world error responses to players, not Sentry, alerting, or observability infrastructure.
|
||||
|
||||
## Success signal
|
||||
|
||||
A DM runs `/encounter start` on a randomizable spec pulled from Gitea at build with no spec file in the code repo; over a 20–30 turn encounter a recurring NPC references the party's prior encounter from graph memory and a Foundry lore journal, a player's off-rails action triggers a dynamic `goal_register` that the LLM resolves to a custom outcome, skill checks flow through bot-controlled embeds, the party's state carries from this encounter into the next, the closing summary is posted to linked characters' Foundry journals, and any mid-encounter failure degrades in-world rather than leaking a stack trace — all with zero leaked internal format reaching players, demonstrated green across the live integration suite; and a first-time author following `authoring-guide.md` produces a spec that runs a complete encounter.
|
||||
|
||||
## Assumptions
|
||||
|
||||
- The "vision" specced here is the whole encounter-engine product, distilled from `docs/`, `README.md`, `prd.md`, and `Docs/epics.md` plus the user's 2026-06-20 direction answers, rather than a fresh brain-dump.
|
||||
- The active PRD (Dynamic Goal Registration + randomizable specs) is part of the current vision, kept as an adopted companion (`prd.md`).
|
||||
- Slug `mardonar-encounter-engine` was chosen from the project name; the user did not specify one.
|
||||
- Output lands under `_bmad-output/specs/` (the customize.toml default) rather than the existing top-level `specs/` directory, which holds encounter YAMLs.
|
||||
- CAP-12 (campaign continuity), CAP-13 (party-size gating), CAP-14 (Foundry summaries), CAP-15 (NPC lore journals), CAP-16 (graceful degradation), CAP-17 (Gitea build-time pull), and CAP-18 (authoring guide + examples) state the **vision direction**; the existing codebase substantially implements CAP-1..CAP-11, CAP-16 is partially in place (the always-grow `[NO RESPONSE]` fallback + in-world error strings), and CAP-12..CAP-15 + CAP-17 + CAP-18 are greenfield. The current `src/spec/loader.ts` reads `./specs/` in-repo; decoupling to a build-time Gitea pull + relocating the production corpus out of the repo is a refactor, and curating clean example specs + writing the authoring guide are new deliverables.
|
||||
@@ -0,0 +1,32 @@
|
||||
# Campaign State
|
||||
|
||||
Design companion for CAP-12 (campaign continuity) and CAP-13 (party-size gating). This is vision-direction design; the existing codebase is per-encounter only, so both are greenfield. Reference companion to SPEC.md.
|
||||
|
||||
## Campaign continuity (CAP-12)
|
||||
|
||||
A campaign is a party + its accrued state across encounters. Today each encounter is a standalone session keyed `session:<threadId>`; campaign continuity adds a campaign record that survives across threads.
|
||||
|
||||
| Entity | Lives in | Shape | Notes |
|---|---|---|---|
| Campaign | Redis (new key shape, e.g. `campaign:<campaignId>`) | `{ id, name, party: string[], sharedInventory?, encounterHistory: [{encounterId, outcomeId, summary, at}] }` | created by the DM; referenced by each encounter spec |
| Party membership | campaign record | list of registered player IDs | a player joins a campaign, not just an encounter |
| Encounter outcome | campaign.encounterHistory | appended at resolution | the resolution summary + outcomeId feed both Foundry journals (CAP-14) and the next encounter's context |
|
||||
|
||||
**Open design question:** does a campaign own a shared inventory (the party's pool of loot/items) separate from each character's Foundry sheet, or does "shared" live in Foundry party-actor containers? Foundry is the pillar, so lean toward Foundry-owned where possible.
|
||||
|
||||
**Read/write points:**
|
||||
- Encounter start: read the campaign record, inject party membership + relevant encounter-history outcomes into the system prompt (so the LLM treats prior results as established fact).
|
||||
- Encounter resolution: append the outcome to `campaign.encounterHistory`; the same summary is posted to each linked character's Foundry journal (CAP-14).
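The two read/write points above can be sketched as pure functions over the campaign record. This is a minimal sketch, assuming the record shape from the table; the type and function names (`Campaign`, `EncounterOutcome`, `campaignKey`, `appendOutcome`, `historyForPrompt`), the `sharedInventory` element type, and the prompt line format are hypothetical, and no Redis call is shown.

```typescript
interface EncounterOutcome {
  encounterId: string;
  outcomeId: string;
  summary: string;
  at: string; // timestamp; ISO-8601 is an assumption
}

interface Campaign {
  id: string;
  name: string;
  party: string[]; // registered player IDs
  sharedInventory?: string[]; // element type assumed; ownership is still an open question
  encounterHistory: EncounterOutcome[];
}

// Key shape taken from the table above.
const campaignKey = (campaignId: string): string => `campaign:${campaignId}`;

// Resolution-time write: return a new record with the outcome appended.
function appendOutcome(campaign: Campaign, outcome: EncounterOutcome): Campaign {
  return { ...campaign, encounterHistory: [...campaign.encounterHistory, outcome] };
}

// Start-time read: render the most recent outcomes as lines for the system prompt.
function historyForPrompt(campaign: Campaign, limit = 3): string[] {
  const recent = limit > 0 ? campaign.encounterHistory.slice(-limit) : [];
  return recent.map((o) => `${o.encounterId}: ${o.summary} (outcome: ${o.outcomeId})`);
}
```

Returning a new record instead of mutating in place keeps the write easy to test; how the record is serialized to Redis is left open here.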
|
||||
|
||||
## Party-size gating (CAP-13)
|
||||
|
||||
Encounters declare their party-size envelope so solo trips and group encounters route correctly.
|
||||
|
||||
| Spec field | Type | Default | Behavior |
|---|---|---|---|
| `minPlayers` | integer | 1 | `/encounter start` refuses until ≥ `minPlayers` registered players have joined the thread |
| `maxPlayers` | integer (optional) | unset | if set, joining is refused beyond `maxPlayers` |
|
||||
|
||||
**Routing intent:** small solo-eligible entries (`minPlayers: 1`) for side trips and scouting; larger entries (`minPlayers: 2+`, e.g. a heist or boss) that won't start until a group is ready. The DM authors the envelope per spec; the bot enforces it at start and join.
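The start and join checks described above reduce to two small predicates. This is a sketch only: `PartyEnvelope`, `canStart`, and `canJoin` are hypothetical names, and how "joined" is counted is still the open question in this section.

```typescript
interface PartyEnvelope {
  minPlayers?: number; // defaults to 1 when omitted
  maxPlayers?: number; // unset means no ceiling
}

// Start gate: enough registered players have joined.
function canStart(spec: PartyEnvelope, joinedCount: number): boolean {
  return joinedCount >= (spec.minPlayers ?? 1);
}

// Join gate: one more player still fits under the ceiling, if there is one.
function canJoin(spec: PartyEnvelope, joinedCount: number): boolean {
  return spec.maxPlayers === undefined || joinedCount < spec.maxPlayers;
}
```

For example, `canStart({ minPlayers: 3 }, 2)` is `false`, while a solo-eligible spec with no `minPlayers` starts for a single player.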
|
||||
|
||||
**Open design question:** is "joined" a soft signal (players posted in the thread) or an explicit `/join` command? An explicit join is cleaner for the ≥N gate; a soft signal reuses existing message presence. Lean explicit.
|
||||
@@ -0,0 +1,43 @@
|
||||
# Encounter Spec Fields
|
||||
|
||||
The YAML encounter-spec catalog. The runtime type is `EncounterSpec = z.infer<typeof EncounterSpecSchema>` from `src/spec/loader.ts` (the static type and the runtime validator cannot drift). **Where specs live:** the production corpus is pulled from Gitea at Docker build time (CAP-17), decoupled from core code; the repo ships only the loader/schema contract, clean example specs, and `authoring-guide.md`. Today `src/spec/loader.ts` reads `./specs/` in-repo — decoupling to the build-time Gitea pull is a refactor. Reference companion to SPEC.md.
|
||||
|
||||
| Field | Type | Purpose |
|---|---|---|
| `encounterId` | string (coerced to `gen-<id>` when generated) | unique ID → session key |
| `title` | string | display name in Discord embeds |
| `tone` | string (free-text, optional) | narration flavor; drives drop-notice selection; cached on session state |
| `setting` | `{ location, mood, ambientNpcs }` | scene framing |
| `openingNarrative` | string | posted at session start; pinned (never trimmed) |
| `nameKey` | string (optional) | persona vocabulary key |
| `npcs` | 1–3 personas | each: `id`, `name`, `role`, `persona`, optional `memoryKey` |
| `goals.primary` | list | main target endings the LLM steers toward |
| `goals.secondary` | list | valid but non-primary outcomes |
| `skillChecks` | map of named DCs | e.g. `chase_dc: 13` |
| `sportsmanshipRules` | list | "do not allow" rules |
| `randomizable` | map (optional) | fields that vary per run (substance, complication) |
| `tools` | list (optional) | which tool plugins are active for this encounter |
| `minPlayers` | integer (default 1) | party-size floor — `/encounter start` refuses until ≥N registered players have joined (CAP-13) |
| `maxPlayers` | integer (optional) | party-size ceiling — joining refused beyond this (CAP-13) |
| `campaignId` | string (optional) | links the encounter to a campaign record whose party + encounter history carry continuity (CAP-12) |
|
||||
|
||||
## Persona fields
|
||||
|
||||
| Field | Purpose |
|---|---|
| `id` | stable NPC identifier |
| `name` | display name |
| `role` | role in the scene (e.g. `merchant`, `guard`) |
| `persona` | voice/behavior directive |
| `memoryKey` | if set, NPC has persistent graph memory (CAP-4) |
|
||||
|
||||
## Dynamic goal contract (CAP-5)
|
||||
|
||||
`goal_register` (`src/harness/tools/goalRegister.ts`) appends a runtime goal to `spec.goals.primary`/`secondary`:
|
||||
- `id` — kebab-case, auto-prefixed `dynamic_` if not already, validated `^[a-z0-9-_]+$`, no conflict with existing goals.
|
||||
- `label` — trigger-condition description; shown in the closing embed as the Outcome.
|
||||
- `isPrimary` — primary driver vs secondary fallback.
|
||||
- `reason` — logged for debugging.
|
||||
- **Caps:** at most 2 dynamic goals per session; goal registration blocked after 20 messages so the encounter winds down.
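The id rule and the two caps listed above could be checked along these lines. This is a hedged sketch, not the code in `goalRegister.ts`: `GoalGateState`, `checkGoalRegistration`, and the reason strings are hypothetical, and "blocked after 20 messages" is read here as "blocked once the message count exceeds 20".

```typescript
const GOAL_ID_RE = /^[a-z0-9_-]+$/; // same character set as the rule above
const MAX_DYNAMIC_GOALS = 2;
const MESSAGE_CUTOFF = 20;

// Hypothetical snapshot of the session fields the gate needs.
interface GoalGateState {
  existingGoalIds: string[];
  dynamicGoalCount: number;
  messageCount: number;
}

type GoalGateResult = { ok: true; id: string } | { ok: false; reason: string };

function checkGoalRegistration(rawId: string, state: GoalGateState): GoalGateResult {
  // Auto-prefix unless the id already carries the prefix.
  const id = rawId.startsWith("dynamic_") ? rawId : `dynamic_${rawId}`;
  if (!GOAL_ID_RE.test(id)) return { ok: false, reason: "invalid_id" };
  if (state.existingGoalIds.includes(id)) return { ok: false, reason: "id_conflict" };
  if (state.dynamicGoalCount >= MAX_DYNAMIC_GOALS) return { ok: false, reason: "goal_cap_reached" };
  if (state.messageCount > MESSAGE_CUTOFF) return { ok: false, reason: "too_late" };
  return { ok: true, id };
}
```

Checking the id before the caps means a malformed id is reported as such even late in the encounter; the real tool may order these differently.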
|
||||
|
||||
Resolution: `encounter_resolve` with a custom `outcomeId` finds the goal in `spec.goals` and renders its `label` as the Outcome field. `resolution.ts` already handles unregistered outcomeIds gracefully.
|
||||
@@ -0,0 +1,37 @@
|
||||
# Foundry Integration
|
||||
|
||||
Foundry VTT is a first-class pillar, not an optional integration (per the 2026-06-20 direction). The relay is the live-game connection where character sheets, encounter summaries, and NPC lore journals actually live. The read side (live stats, inventory, spells) and the award side (XP, items) exist today; the journal side is vision-direction (greenfield). Reference companion to SPEC.md.
|
||||
|
||||
## What the relay is
|
||||
|
||||
External HTTP relay (`VTT_RELAY_URL`, `x-api-key` auth). Reads and mutating ops go through `src/vtt/foundryClient.ts`. Mutating ops (`giveItem`, `modifyExperience`, and the new journal writes) are consent-gated — never run blind against production.
|
||||
|
||||
| Op | Kind | Status |
|---|---|---|
| `searchActors`, `filterPlayerActors` | read | live |
| `getActorDetails`, `getActorInventory`, `getActorSpells` | read | live (drives `/actions`, `/character view`) |
| `giveItem` | mutating | live (admin give, gated) |
| `modifyExperience` | mutating | live (`/xp award`, gated) |
| post-encounter summary → actor journal | mutating | **greenfield (CAP-14)** |
| NPC lore journal append/read | mutating + read | **greenfield (CAP-15)** |
|
||||
|
||||
## Deeper integration roadmap
|
||||
|
||||
### CAP-14 — Post-encounter summaries to Foundry journals
|
||||
On `encounter_resolve`, the closing outcome summary is written as a journal entry to each Foundry-linked player's actor. The summary already exists (the resolution embed text + `outcomeSummary`); the new work is a relay write that creates a journal entry on the actor. This makes Foundry the durable record of what a character has been through, readable in-game.
|
||||
|
||||
- Trigger: encounter resolution (after the resolution embed posts).
|
||||
- Payload: encounter title, outcome label/summary, date, the party present.
|
||||
- Fan-out: one journal entry per linked player actor.
|
||||
|
||||
### CAP-15 — NPC lore journals in Foundry
|
||||
Recurring NPCs (those with a Foundry actor link, in addition to the graph `memoryKey`) carry a Foundry lore journal. As an NPC appears across encounters, journal entries are appended; at session start the journal is read and injected so the NPC's voice references its own recorded history. This puts NPC backstory in the game world where players can discover it, not only in the graph.
|
||||
|
||||
- Trigger: NPC appears in an encounter (session start) → read journal; NPC departs/resolves → append journal entry.
|
||||
- Relationship to graph memory (CAP-4): graph is the fast semantic-recall store; the Foundry journal is the durable, player-discoverable narrative record. Both are sources; graph is injected at prompt time, journal is both read (prompt) and written (resolution).
|
||||
|
||||
## Open design questions
|
||||
|
||||
- Does the relay already expose a journal-write RPC, or does this need a relay-side addition? Determines whether CAP-14/CAP-15 are bot-only or require relay work.
|
||||
- NPC-to-Foundry-actor linking: is it spec-authored (a `foundryActorUuid` on the NPC persona) or resolved at runtime by name? Lean spec-authored for stability.
|
||||
- Should journal writes be consent-gated like XP/items, or are they considered non-destructive (append-only) and safe to run without the `RUN_FOUNDRY_LIVE` gate? Append-only is low-risk, but consistency with the existing gate is simpler.
|
||||
@@ -0,0 +1,57 @@
|
||||
# Slash Commands
|
||||
|
||||
The Discord command surface — the only player/DM interaction surface (no HTTP UI). Reference companion to SPEC.md.
|
||||
|
||||
## Player commands
|
||||
|
||||
| Command | Description |
|---|---|
| `/dndname set <name>` | Register or update your D&D character name |
| `/dndname show` | Show your current registered name |
| `/dndname clear` | Remove your registration |
|
||||
|
||||
Players must register before participating. An unregistered player posting in an encounter thread gets an in-world ephemeral nudge to register.
|
||||
|
||||
## DM / Admin commands
|
||||
|
||||
| Command | Description |
|---|---|
| `/encounter start <spec-name>` | Load a spec and open a new encounter thread |
| `/encounter status` | Show session phase, player list, event count |
| `/encounter end` | Force-resolve the encounter (admin override) |
| `/encounters list` | List available encounter specs |
| `/encounters info <spec>` | Show a spec's metadata |
|
||||
|
||||
`<spec-name>` maps to a file in `./specs/` (e.g. `/encounter start market-thief` → `./specs/market-thief.yaml`).
|
||||
|
||||
> **Retired:** `/encounter generate` (on-the-spot AI spec generation) is removed from the system — AI must never author canonical encounter lore. The command, its handler in `src/bot/commands/encounter.ts`, and the AC7 integration tests are slated for retirement (see `.decision-log.md`).
|
||||
|
||||
## Character commands
|
||||
|
||||
| Command | Description |
|---|---|
| `/character register foundry` | Link your Foundry actor (modal → searchActors) |
| `/character register custom` | Register a custom profile (name, class, race, pronouns, backstory) |
| `/character show` | Show your profile |
| `/character view [user]` | View a character's live Foundry stats (requires link) |
| `/character clear` | Remove your registration |
| `/character admin list` | List all registered characters in the guild |
| `/character admin remove <user>` | Remove another player's registration |
| `/character admin give` | Give an item to a Foundry actor (mutating) |
|
||||
|
||||
## In-encounter commands
|
||||
|
||||
| Command | Description |
|---|---|
| `/action <verb>` | Submit a character action with inventory/spell context |
| `/roll <dice>` | Submit a roll; forces a `skill_check_emit` tool call via a mandatory nudge |
| `/turn pass` | Pass the turn |
| `/turn list` | List turn order |
| `/xp award [amount]` | Award XP to encounter players (Foundry-linked only) |
| `/xp show` | Show XP state |
|
||||
|
||||
## Command-to-data-surface reach
|
||||
|
||||
- **GraphMCP** (JSON-RPC): `encounter.ts` (`/encounter start` → `logEncounter`), `encounters.ts` (list/info). These two reach GraphMCP directly. (`/encounter generate` was the third; retired.)
|
||||
- **Redis**: registries (`dndname`, `character`, `actions`, `view`) + session state (`encounter`, `turn`, `roll`, `xp`) + campaign state (CAP-12, greenfield).
|
||||
- **Foundry VTT relay**: `character` (register foundry, view, admin give), `actions` (inventory/spells), `xp award` (modifyExperience).
|
||||
40
_bmad-output/specs/spec-mardonar-encounter-engine/stack.md
Normal file
@@ -0,0 +1,40 @@
|
||||
# Stack
|
||||
|
||||
Tech stack for the Mardonar Encounter Engine. Reference companion to SPEC.md; the kernel cites it rather than inlining the table.
|
||||
|
||||
| Layer | Technology | Notes |
|---|---|---|
| Runtime | Node.js 22 (ESM, TypeScript 5.8 strict) | single-process monolith |
| Discord | discord.js v14 | slash commands, threads, reactions, modals |
| LLM (primary) | LiteLLM proxy (`LITELLM_BASE_URL`) | OpenAI-compatible |
| LLM (fallback) | Ollama (`OLLAMA_BASE_URL`) — `gemma4-it:e2b`, 128k context | used when LiteLLM is unreachable |
| Session cache | Redis (ioredis), 12h TTL | session, player/character registries |
| Graph DB | Neo4j via GraphMCP JSON-RPC (`/mcp` over HTTP) | NPC memory, lore, encounter events |
| Foundry VTT | External relay (`VTT_RELAY_URL`, `x-api-key`) | **core pillar** — character sheets, encounter summaries, NPC lore journals live here; reads + mutating writes (XP, items, summaries, journals), all consent-gated |
| Spec corpus | Gitea repo, pulled at Docker build (`SPECS_GIT_URL`/ref) | pipeline-owned artifact, decoupled from core code (CAP-17); repo ships only the loader/schema + examples + authoring guide |
| Validation | Zod | env vars + encounter spec |
| Logging | custom plaintext logger (`src/lib/logger.ts`) | pino was retired 2026-06-19 |
| Testing | Vitest 3 (unit + integration) | gates: `RUN_FULL_E2E`, `RUN_GRAPHMCP_LIVE`, `RUN_FOUNDRY_LIVE` |
| Build | `tsc` → multi-stage Node 22 alpine Dockerfile | `rootDir: src` |
|
||||
|
||||
## Architecture shape
|
||||
|
||||
Layered backend with a plugin-style tool registry. Discord I/O (`src/bot/`) drives an LLM harness (`src/harness/`) that talks to three external surfaces: Redis (session state), GraphMCP (JSON-RPC: NPC memory, lore, event log), and the Foundry VTT relay (live stats, XP grants). Tools self-register via `registerTool()` at module load; each encounter spec declares which tool plugins are active.
|
||||
|
||||
```
Gitea spec corpus ──(pulled at Docker build)──▶ image /specs
                                                      │
Discord ──▶ src/bot/ (commands, embeds, handlers)     │
                │                                     │
                ▼                                     │
        src/harness/ (promptBuilder, contextAssembler, ◀── src/spec/loader.ts reads /specs
                      llmClient, toolParser, toolDispatcher,
                      toolRegistry, tools/* plugin registry)
                │
   ┌────────────┼────────────┐
   ▼            ▼            ▼
 Redis      GraphMCP     VTT relay
 (session   (JSON-RPC:   (Foundry
  state,     NPC memory,  live stats,
  campaign)  lore, log)   XP, journals)
```
|
||||
@@ -0,0 +1,38 @@
|
||||
# Voice & Response-Filter Rules
|
||||
|
||||
Editorial voice rules + the response-filter contract. These are the hard rules that bend every narrator decision. Reference companion to SPEC.md.
|
||||
|
||||
## In-world voice
|
||||
|
||||
All player-facing bot strings use in-world language. The following words **never** appear in player-visible output: "bot", "system", "queue", "session", "ephemeral", "rate limit", "error", "user". Tone-keyed drop notices and confirmations are pre-generated constants, not generated by the LLM at runtime. Internal system messages (`[TOOL]`, `[SKILL CHECK RESULT]`, `[FILTER CORRECTION]`, `[NO RESPONSE]`) are role `system` and are appended to history only; they are never sent to the thread.
|
||||
|
||||
Tone-keyed drop-notice strings (CAP-11), selected from the encounter's `tone`:
|
||||
|
||||
| tone | drop notice |
|---|---|
| grim | "The chaos swallowed your words…" |
| comedic | "Everyone was talking at once…" |
| mysterious | "Something muffled your voice…" |
| tense | "No time — the moment moved on without you…" |
| baseline (unrecognised) | "The echoes could not carry all voices at once…" |
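Selecting a notice from the table above is a keyed lookup with a baseline fallback. This is a minimal sketch: the strings are copied from the table, but `dropNoticeFor` is a hypothetical name and the matching rule (trimmed, case-insensitive, exact key) is an assumption, since `tone` is free-text and the engine may match it differently.

```typescript
// Strings copied from the table above; "\u2014" is the dash in the "tense" notice.
const DROP_NOTICES = new Map<string, string>([
  ["grim", "The chaos swallowed your words…"],
  ["comedic", "Everyone was talking at once…"],
  ["mysterious", "Something muffled your voice…"],
  ["tense", "No time \u2014 the moment moved on without you…"],
]);

const BASELINE_NOTICE = "The echoes could not carry all voices at once…";

// Assumed rule: normalize the tone, then fall back to the baseline when unrecognised.
function dropNoticeFor(tone?: string): string {
  const key = tone?.trim().toLowerCase() ?? "";
  return DROP_NOTICES.get(key) ?? BASELINE_NOTICE;
}
```

A `Map` is used rather than a plain object so that a tone such as `constructor` cannot accidentally resolve to an inherited property.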
|
||||
|
||||
## The bot controls dice
|
||||
|
||||
The LLM must never state or imply a specific dice result. Outcome narration waits for the `[SKILL CHECK RESULT]` system message. The `ROLL_CLAIM_RE` filter rejects "you rolled a 15", "the die shows", "rolls a 7", etc. — unless `skipRollClaim` is set, which happens only when a `[SKILL CHECK RESULT]` is in the recent context window (the LLM is narrating a known outcome, not fabricating a pre-roll).
|
||||
|
||||
## Response-filter contract (CAP-9)
|
||||
|
||||
`filterLLMResponse` (`src/bot/handlers/responseFilter.ts`) is the last-line defense before `thread.send(response.narrative)`. It rejects, in order:
|
||||
|
||||
| reason | detector | correction text (system, not player-facing) |
|---|---|---|
| `empty_response` | narrative trim empty | "Your previous response was empty. Continue the scene." |
| `echoed_system_tag` | `[TOOL…]`, `[SKILL CHECK…]`, `[SESSION…]`, `[/roll…]`, `[SYSTEM…]` | "Do NOT echo internal system tags…" |
| `leaked_tool_call` | `tool_call` token, code fence, bare `{"tool":…,"args":…}` JSON (tool-first or args-first) | "Your last response leaked a raw tool-call block… output ONLY a fenced ```tool_call block…" |
| `fabricated_roll_result` | roll-number claim (unless `skipRollClaim`) | "Do NOT state or imply a specific dice result…" |
|
||||
|
||||
On rejection, `runLLMTurn` appends a `[FILTER CORRECTION]` system message and schedules one retry. A second consecutive rejection hits the `alreadyRetried` guard (last message starts with `[FILTER CORRECTION]`) and falls through to the `[NO RESPONSE]` fallback — never to players. The `leaked_tool_call` guard closed the 2026-06-20 game-session leak where `tool_call\n{ "tool": "skill_check_emit", "args": {...} }` was posted to the thread.
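The ordered rejection described in this section can be illustrated with a simplified stand-in. This is not the real `filterLLMResponse`: the function name `sketchFilter` and all three regular expressions are hypothetical approximations of the detectors in the table, written only to show the check order and the `skipRollClaim` escape hatch.

```typescript
type RejectReason =
  | "empty_response"
  | "echoed_system_tag"
  | "leaked_tool_call"
  | "fabricated_roll_result";

type FilterResult = { ok: true } | { ok: false; reason: RejectReason };

// Approximate detectors (assumptions, not the engine's actual patterns).
const SYSTEM_TAG_RE = /\[(?:TOOL|SKILL CHECK|SESSION|\/roll|SYSTEM)/;
const TOOL_JSON_RE = /\{\s*"(?:tool|args)"\s*:/; // tool-first or args-first
const ROLL_CLAIM_RE = /\b(?:you rolled an? \d+|rolls an? \d+|the die shows)\b/i;
const FENCE = "`".repeat(3); // a markdown code fence

function sketchFilter(narrative: string, opts: { skipRollClaim?: boolean } = {}): FilterResult {
  if (narrative.trim() === "") return { ok: false, reason: "empty_response" };
  if (SYSTEM_TAG_RE.test(narrative)) return { ok: false, reason: "echoed_system_tag" };
  if (narrative.includes("tool_call") || narrative.includes(FENCE) || TOOL_JSON_RE.test(narrative)) {
    return { ok: false, reason: "leaked_tool_call" };
  }
  if (!opts.skipRollClaim && ROLL_CLAIM_RE.test(narrative)) {
    return { ok: false, reason: "fabricated_roll_result" };
  }
  return { ok: true };
}
```

Because the checks run in table order, a response that trips several detectors is reported under the first one, which determines which correction text the retry receives.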
|
||||
|
||||
## LLM message shape
|
||||
|
||||
`toApiMessages()` (`src/harness/llmMessages.ts`) maps history to the API payload and coerces a trailing `system` message to role `user`, because OpenAI/Ollama-compatible endpoints return an empty reply when the final message is role `system`. Only the outgoing payload changes: stored history keeps role `system` so role-based guards (e.g. the `[FILTER CORRECTION]` retry-loop check) still match.
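The coercion can be pictured as a copy-on-map over history. This is a sketch under assumptions: `ChatMessage` and `toApiMessagesSketch` are hypothetical names, and the real `toApiMessages()` may carry extra fields or rules not shown here.

```typescript
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// Build the outgoing payload; only a final system message changes role.
function toApiMessagesSketch(history: readonly ChatMessage[]): ChatMessage[] {
  const lastIndex = history.length - 1;
  return history.map((message, index): ChatMessage =>
    index === lastIndex && message.role === "system"
      ? { ...message, role: "user" } // coerced copy; the stored entry is untouched
      : { ...message },
  );
}
```

Mapping to fresh objects keeps stored history unchanged, so a guard that looks for a trailing `system` entry in history still sees it after the payload is built.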
|
||||
@@ -1,7 +1,7 @@
|
||||
{
|
||||
"market-thief": {
|
||||
"runs": 9,
|
||||
"lastRun": "2026-06-19T23:21:11.305Z"
|
||||
"runs": 21,
|
||||
"lastRun": "2026-06-20T01:03:00.706Z"
|
||||
},
|
||||
"mawfang-pursuit": {
|
||||
"runs": 2,
|
||||
@@ -62,5 +62,21 @@
|
||||
"velvet-auction": {
|
||||
"runs": 1,
|
||||
"lastRun": "2026-06-19T23:42:21.918Z"
|
||||
},
|
||||
"gen-mardonar-silt-standoff-001": {
|
||||
"runs": 1,
|
||||
"lastRun": "2026-06-20T00:42:53.002Z"
|
||||
},
|
||||
"gen-mardonar-stormscar-watch-001": {
|
||||
"runs": 1,
|
||||
"lastRun": "2026-06-20T01:28:12.564Z"
|
||||
},
|
||||
"gen-mardonar-breakout-001": {
|
||||
"runs": 1,
|
||||
"lastRun": "2026-06-20T01:29:55.789Z"
|
||||
},
|
||||
"gen-mardonar-voldramir-descent-001": {
|
||||
"runs": 1,
|
||||
"lastRun": "2026-06-20T03:21:56.659Z"
|
||||
}
|
||||
}
|
||||
103
docs/spec-authoring-guide.md
Normal file
@@ -0,0 +1,103 @@
|
||||
# Encounter Spec Authoring Guide
|
||||
|
||||
How to author a YAML encounter spec the engine's LLM can read and drive. This is the canonical contract surface: hand-authors follow it, and the separate **encounter-builder tool** targets it. Field semantics live in [`data-models.md`](./data-models.md); the runtime validator is `EncounterSpecSchema` in `src/spec/loader.ts`.
|
||||
|
||||
## Where specs live
|
||||
|
||||
The production spec corpus is **decoupled from core code**: it lives in a Gitea repo and is pulled into the image at Docker build time (see [specs-pipeline.md](./specs-pipeline.md)). This repo ships only the loader/schema contract, clean example specs (in `./specs/`), and this guide. At runtime the bot reads `config.SPECS_DIR` (default `./specs`) — the same dir whether it holds the bundled examples or the build-time-pulled corpus.
|
||||
|
||||
## The contract model
|
||||
|
||||
A spec is the LLM's instruction sheet. Every field maps to either something the LLM **reads** in its system prompt (setting, NPCs, goals, tone, tools, memory) or something the **bot enforces** (skill-check DCs, party-size gate, campaign link, randomizable draws). The LLM never sees engine internals — it sees prose directives. Write for a narrator, not a programmer.
|
||||
|
||||
## Minimal anatomy
|
||||
|
||||
A spec must validate against `EncounterSpecSchema` in `src/spec/loader.ts`. The required spine:
|
||||
|
||||
```yaml
encounterId: market-thief          # kebab-case; session key
title: The Market Thief            # shown in Discord embeds
setting:
  location: Lower Mardonar market square
  mood: tense, crowded, late afternoon
  ambientNpcs: hawkers, a city guard patrol, fleeing urchins
openingNarrative: |
  The square bustles... (this is pinned for the whole encounter — never trimmed)
npcs:
  - id: dal
    name: Dal the Quick
    role: pickpocket
    persona: |
      Nervous, fast, talks too much when cornered. Will bargain before fighting.
goals:
  primary:
    - id: thief_caught
      label: Dal is caught and the purse is returned
  secondary:
    - id: thief_escaped
      label: Dal escapes into the crowd
sportsmanshipRules:
  - No PvP between players
  - No auto-hitting named NPCs
skillChecks:
  chase_dc: 13
  perception_dc: 11
```
|
||||
|
||||
## Writing what the LLM reads
|
||||
|
||||
- **`setting`** — three strings. `location` grounds the scene; `mood` is a style directive (the LLM leans into it); `ambientNpcs` gives background life without persona overhead. Keep `mood` to a few adjectives.
|
||||
- **`openingNarrative`** — the scene-setting text posted at session start. It is **pinned** (never trimmed from context), so put everything the LLM must keep in frame here. Write it in-world.
|
||||
- **`npcs`** — 1–3 personas. `persona` is a **voice/behavior directive**, not a stat block: how they speak, what they want, how they react under pressure. One to three sentences beats a paragraph. Give each a stable `id` and a `name`. Add `memoryKey` only for NPCs that should remember across encounters; leave it off for one-off NPCs.
|
||||
- **`goals`** — `primary` is what the LLM steers toward; `secondary` is valid-but-not-main. Each goal needs a stable `id` and a `label` (the `label` becomes the closing embed's Outcome text). 2–3 primary goals is the sweet spot. The LLM may register more mid-encounter via `goal_register`.
|
||||
- **`tone`** (optional) — free-text narration flavor (`grim`, `tense`, `comedic`, `mysterious`). Also selects the in-world drop-notice string when the burst cap drops a message. Unrecognised tones fall back to a baseline notice.
|
||||
|
||||
## Writing what the bot enforces
|
||||
|
||||
- **`skillChecks`** — a map of named DCs (`chase_dc: 13`). The LLM references these by name when it emits `skill_check_emit`. Name them for the action, not the stat (`chase_dc`, not `dex_dc`) — the LLM picks the skill, the spec supplies the target number. Free-text companion keys (`chase_skill`, `chase_note`) ride alongside the DC as LLM-read context; they never become dice.
|
||||
- **`randomizable`** (optional) — declare fields that vary per run so the same spec file yields different substance/complications:
|
||||
|
||||
  ```yaml
  randomizable:
    leak_substance:
      - Caustic Rust-Blight Silt
      - Sleeping Ether-Vapor
      - Wild-Magic Slurry
    leak_complication:
      - mutated_silt_rats
      - corroded_lock
      - greedy_scavenger
  ```
|
||||
|
||||
- **`tools`** (optional) — which tool plugins are active for this encounter. Omitting it activates the default set; list explicitly to narrow it (e.g. a no-combat encounter that doesn't need `skill_check_emit`). Every name must be a registered plugin — the `tests/unit/specsToolsConsistency.test.ts` suite fails the build if a spec references an unknown tool.
|
||||
- **`xpReward`** (optional) — flat XP awarded to every participant when the encounter resolves, regardless of which goal/outcome fired. Omit for encounters that grant no XP.
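
The `tools` rule above can be checked by hand before pushing a spec. The sketch below is an assumption-laden illustration, not the logic in `tests/unit/specsToolsConsistency.test.ts`: the `SpecTools` shape and `findUnknownTools` are hypothetical, and the registered names are copied from the annotated reference spec rather than read from the real tool registry.

```typescript
// Registered plugin names as listed in the annotated reference spec.
// Assumption: the real source of truth is the tool registry, not this list.
const REGISTERED_TOOLS: ReadonlySet<string> = new Set([
  "skill_check_emit",
  "encounter_resolve",
  "context_recall",
  "goal_register",
  "foundry_lookup",
  "foundry_reward",
]);

// Hypothetical minimal view of a parsed spec.
interface SpecTools {
  encounterId: string;
  tools?: string[];
}

/** Return every tool name in the spec that is not a registered plugin. */
function findUnknownTools(
  spec: SpecTools,
  registered: ReadonlySet<string> = REGISTERED_TOOLS,
): string[] {
  // An omitted `tools` list means "use the default set", so nothing can be unknown.
  return (spec.tools ?? []).filter((name) => !registered.has(name));
}
```

An empty result means the spec would pass this particular check; any returned name is a likely typo or a plugin that has not been registered yet.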
|
||||
|
||||
### Planned, not yet enforced
|
||||
|
||||
`minPlayers` / `maxPlayers` (party-size gating) and `campaignId` (campaign continuity) are part of the engine vision (CAP-12, CAP-13) but are **not yet in `EncounterSpecSchema`**. Zod silently strips unknown keys, so writing them today does nothing — the bot ignores them. Do not add them to a spec expecting gating or campaign linkage; wait until the fields land in the schema (the builder tool will surface them when they do).
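
Because stripped keys fail silently, it can help to list them before loading a spec. The sketch below does not use Zod; it only imitates the effect described above. `findIgnoredKeys` is a hypothetical helper, and the key list is an assumption assembled from the fields this guide documents, so it may lag behind `EncounterSpecSchema`.

```typescript
// Top-level keys this guide documents. Assumption: this mirrors the schema today.
const KNOWN_SPEC_KEYS: ReadonlySet<string> = new Set([
  "encounterId",
  "title",
  "tone",
  "setting",
  "openingNarrative",
  "npcs",
  "goals",
  "sportsmanshipRules",
  "skillChecks",
  "randomizable",
  "tools",
  "dmNotes",
  "xpReward",
]);

/** List top-level keys of a parsed spec that the validator would drop. */
function findIgnoredKeys(
  rawSpec: Record<string, unknown>,
  known: ReadonlySet<string> = KNOWN_SPEC_KEYS,
): string[] {
  return Object.keys(rawSpec).filter((key) => !known.has(key));
}

// Usage: a spec that sets the planned fields anyway.
const ignored = findIgnoredKeys({
  encounterId: "market-thief",
  title: "The Market Thief",
  minPlayers: 2,
  campaignId: "harvest-week",
});
console.log(ignored.join(", "));
```

Running a check like this in review makes the "does nothing today" case visible instead of silent.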
|
||||
|
||||
## Validation
|
||||
|
||||
The runtime type is `EncounterSpec = z.infer<typeof EncounterSpecSchema>` (`src/spec/loader.ts`) — the static type and the runtime validator cannot drift. Validate before deploying:
|
||||
|
||||
```bash
|
||||
npm run build # tsc compiles + Zod validates on load
|
||||
```
|
||||
|
||||
A spec that fails Zod validation is rejected at `/encounter start` with an in-world error, never a stack trace.
|
||||
|
||||
## A clean reference example
|
||||
|
||||
`specs/market-thief.yaml` is the fully-annotated reference spec — copy it as your starting point. It exercises every common field (`encounterId`, `title`, `tone`, `setting`, `openingNarrative`, `npcs` with `nameKey`/`memoryKey`, `goals`, `sportsmanshipRules`, `skillChecks` with `_skill`/`_note` companions, `randomizable`, `tools`, `dmNotes`) with inline `#` comments explaining each field's LLM/bot role. The optional `xpReward` is documented above and intentionally omitted from this example (some live tests rely on this spec having no XP default; add it to your own spec when you want flat resolution XP).
|
||||
|
||||
## Pitfalls — the LLM contract
|
||||
|
||||
- **Don't put dice results in the spec.** The bot controls dice. The LLM narrates outcomes only after the `[SKILL CHECK RESULT]` message. A spec that pre-declares "the thief rolls a 15" teaches the LLM to fabricate rolls.
|
||||
- **Don't put system tags or `tool_call` syntax in prose.** `openingNarrative` and `persona` are player-facing prose; any `tool_call`, `[TOOL]`, `[SKILL CHECK]`, or fenced JSON will either be stripped or, if the LLM echoes it, suppressed by the response filter. Keep prose clean in-world text.
|
||||
- **Personas are voice, not stats.** A spec that reads like a character sheet (HP, AC, spell lists) wastes context the LLM can't use — Foundry owns stats. Describe how an NPC behaves and sounds.
|
||||
- **`openingNarrative` is pinned.** It stays in context for the whole encounter, so keep it tight and load-bearing; don't bury the scene's core tension in flavor.
|
||||
- **`id` fields are stable.** Goal and NPC ids are referenced by `goal_register`/`encounter_resolve`/memory across encounters; never rename a live id.
|
||||
|
||||
## Authoring tooling
|
||||
|
||||
Hand-editing YAML is fine; the separate **encounter-builder tool** (its own project) is a guided form that outputs contract-adherent specs — it targets this guide and `EncounterSpecSchema`, never re-implementing them. On-the-spot AI generation (`/encounter generate`) was retired to keep AI output out of canonical lore.
|
||||
63
docs/specs-pipeline.md
Normal file
@@ -0,0 +1,63 @@
|
||||
# Spec Pipeline — Decoupled Specs Pulled at Build Time
|
||||
|
||||
How the encounter spec corpus ships decoupled from core code and is pulled from Gitea at Docker build time. Implements CAP-17 of the engine spec. Companion to [spec-authoring-guide.md](./spec-authoring-guide.md) (how to write a spec) — this doc covers how specs get into the running bot.
|
||||
|
||||
## The model
|
||||
|
||||
The bot reads encounter specs from a single directory, `config.SPECS_DIR` (default `./specs`). That directory is populated one of two ways, chosen at **Docker build time**:
|
||||
|
||||
1. **Bundled examples (default).** No `SPECS_GIT_URL` → the in-repo `./specs` example specs are baked into the image. This is the local/dev/CI path; nothing external is required.
2. **Gitea pull (opt-in).** `SPECS_GIT_URL` build arg set → the Docker build clones the spec corpus from Gitea (optionally at `SPECS_GIT_REF`) and bakes *that* into the image instead of the bundled examples.
|
||||
|
||||
Either way the runtime is identical: the bot reads `config.SPECS_DIR`. There is no runtime fetch — the pull happens once, at build, so a deploy is reproducible from a known spec ref.
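
The build-time choice above is made in shell inside the Dockerfile; the same decision is sketched here in TypeScript purely to make the branching explicit. `SpecSource` and `chooseSpecSource` are hypothetical names, and treating an empty or whitespace-only value as unset is an assumption based on the `-n` tests in the Dockerfile.

```typescript
type SpecSource =
  | { kind: "bundled"; path: string }
  | { kind: "git"; url: string; ref: string | null };

/** Pick where the spec corpus comes from, given build-time variables. */
function chooseSpecSource(env: Record<string, string | undefined>): SpecSource {
  const url = env.SPECS_GIT_URL?.trim();
  if (!url) {
    // No URL: fall back to the in-repo example specs.
    return { kind: "bundled", path: "./specs" };
  }
  const ref = env.SPECS_GIT_REF?.trim();
  // A null ref means "whatever the repo's default branch points at".
  return { kind: "git", url, ref: ref ? ref : null };
}
```

Both branches end in the same place, a populated `./specs` directory, which is why the runtime never needs to know which one was taken.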
|
||||
|
||||
## Building with the Gitea corpus
|
||||
|
||||
```bash
# Bundled examples (default — local/dev/CI)
docker build -t mardonar-bot .

# Pull the production corpus from Gitea at build time
docker build -t mardonar-bot \
  --build-arg SPECS_GIT_URL=https://gitea.example/you/mardonar-specs.git \
  --build-arg SPECS_GIT_REF=main \
  .
```
|
||||
|
||||
`SPECS_GIT_REF` is optional — unset uses the repo's default branch. It accepts a branch, tag, or commit SHA (the build does a full clone then `git checkout`, so SHAs work).
|
||||
|
||||
## Pulling locally (dev)
|
||||
|
||||
`scripts/pull-specs.mjs` mirrors the build-time pull for local development — useful to test against the real corpus without a full Docker build:
|
||||
|
||||
```bash
SPECS_GIT_URL=https://gitea.example/you/mardonar-specs.git \
SPECS_GIT_REF=main \
  node scripts/pull-specs.mjs            # writes to SPECS_DIR (default ./specs)
# or target a separate dir to avoid clobbering the bundled examples:
node scripts/pull-specs.mjs --dir ./specs-pulled
```
|
||||
|
||||
With no `SPECS_GIT_URL` it prints a message and exits 0 — it never silently clobbers `./specs`.
|
||||
|
||||
## What ships in this repo vs. in Gitea
|
||||
|
||||
| Lives in this repo (`mardonar-npcs`) | Lives in the Gitea spec repo |
|---|---|
| `src/spec/loader.ts`: the `EncounterSpecSchema` contract | the production encounter YAML corpus |
| `./specs/`: clean example specs + the annotated reference | (Gitea is the source of truth for what players actually run) |
| `docs/spec-authoring-guide.md`: how to write a spec | |
| `Dockerfile` + `scripts/pull-specs.mjs`: the pull mechanism | |
|
||||
|
||||
The engine never depends on a spec file's presence to compile; it depends only on the schema. Specs are data, not code.
|
||||
|
||||
## Migrating the production corpus to Gitea (your step)
|
||||
|
||||
The mechanism is in place; the one-time migration needs your Gitea infra:
|
||||
|
||||
1. Create a Gitea repo (e.g. `mardonar-specs`) and push the production encounter YAMLs into it.
2. Decide which in-repo `./specs/*.yaml` stay as **examples** (at minimum the annotated reference `market-thief.yaml`) and which move to Gitea only. Leave the examples in `./specs`; remove the rest from the repo once Gitea holds them.
3. Build with `--build-arg SPECS_GIT_URL=...`. Verify `/encounters list` reflects the Gitea corpus.
4. From then on, curating specs is a Gitea commit + image rebuild: no bot code change, no bot release.
|
||||
|
||||
> **Note:** the running bot is unaffected until you build with `SPECS_GIT_URL`. The default build still bakes the in-repo examples, so there is no outage risk while you set up Gitea.
|
||||
57
scripts/pull-specs.mjs
Normal file
@@ -0,0 +1,57 @@
|
||||
#!/usr/bin/env node
// Pull the encounter spec corpus from a Gitea git source into SPECS_DIR.
// Opt-in dev helper mirroring the Docker build-time pull (CAP-17). When
// SPECS_GIT_URL is unset it does nothing — use the bundled ./specs examples,
// or set SPECS_GIT_URL to pull the production corpus.
//
// Usage:
//   SPECS_GIT_URL=https://gitea.example/you/mardonar-specs.git \
//   SPECS_GIT_REF=main \
//   node scripts/pull-specs.mjs [--dir ./specs]
//
// --dir overrides the target directory (defaults to SPECS_DIR env, then ./specs).
import { execFileSync } from 'node:child_process';
import { rmSync, renameSync, cpSync } from 'node:fs';
import { resolve } from 'node:path';

const url = process.env.SPECS_GIT_URL;
const ref = process.env.SPECS_GIT_REF ?? '';

const dirArgIdx = process.argv.indexOf('--dir');
const dir =
  dirArgIdx >= 0 && process.argv[dirArgIdx + 1]
    ? process.argv[dirArgIdx + 1]
    : (process.env.SPECS_DIR ?? './specs');

if (!url) {
  console.error(
    'pull-specs: SPECS_GIT_URL is not set — nothing to pull. ' +
      'Use the bundled ./specs examples, or set SPECS_GIT_URL to pull the production corpus.',
  );
  process.exit(0);
}

const target = resolve(dir);
const tmp = resolve('.specs-pulled-tmp');

console.log(`pull-specs: cloning ${url}${ref ? ` @ ${ref}` : ''} → ${target}`);
rmSync(tmp, { recursive: true, force: true });
try {
  execFileSync('git', ['clone', url, tmp], { stdio: 'inherit' });
  if (ref) execFileSync('git', ['-C', tmp, 'checkout', ref], { stdio: 'inherit' });
  rmSync(target, { recursive: true, force: true });
  // renameSync fails with EXDEV across filesystem mounts (e.g. repo dir vs
  // /tmp on different devices); fall back to a recursive copy + remove.
  try {
    renameSync(tmp, target);
  } catch (err) {
    if (err.code !== 'EXDEV') throw err;
    cpSync(tmp, target, { recursive: true });
    rmSync(tmp, { recursive: true, force: true });
  }
  console.log(`pull-specs: spec corpus updated in ${target}`);
} catch (err) {
  console.error('pull-specs: failed —', err.message);
  rmSync(tmp, { recursive: true, force: true });
  process.exit(1);
}
|
||||
@@ -1,19 +1,62 @@
|
||||
# ─────────────────────────────────────────────────────────────────────────────
|
||||
# THE FULLY-ANNOTATED REFERENCE SPEC
|
||||
# Copy this file as your starting point. Every field below has an inline `#`
|
||||
# note explaining whether the LLM READS it (it appears in the system prompt as
|
||||
# a narration directive) or the BOT ENFORCES it (the engine acts on it directly
|
||||
# and the LLM never sees the mechanism). Write for a narrator, not a programmer.
|
||||
#
|
||||
# Contract model, pitfalls, and the field catalog live in
|
||||
# docs/spec-authoring-guide.md; the runtime validator is EncounterSpecSchema in
|
||||
# src/spec/loader.ts. This file must pass Zod validation and load via
|
||||
# `/encounter start market-thief`.
|
||||
#
|
||||
# YAML note: prose fields use the `>` folded scalar so line breaks become
|
||||
# spaces. Put `#` comments ONLY on their own lines between fields — never
|
||||
# inside a `>` block, or the comment text becomes part of the prose.
|
||||
# ─────────────────────────────────────────────────────────────────────────────
|
||||
|
||||
# encounterId — BOT ENFORCES. The session key (Redis `session:<encounterId>`-
|
||||
# scoped thread id) and the id the LLM echoes back in encounter_resolve. Must be
|
||||
# unique across the corpus. Stable forever once a spec is live (graph memory and
|
||||
# Foundry journals reference it). kebab-case.
|
||||
encounterId: "mardonar-market-thief-001"
|
||||
|
||||
# title — BOT ENFORCES (shown in Discord embeds) / LLM READS (opening scene
|
||||
# header). Short human-readable label. Keep it in-world.
|
||||
title: "The Market Square Thief"
|
||||
|
||||
# tone — LLM READS (narration flavor) + BOT ENFORCES (selects which pre-written
|
||||
# in-world drop notice is shown when the burst cap drops a player's message).
|
||||
# Free text: tense, grim, comedic, mysterious... Unrecognised tones fall back
|
||||
# to a baseline drop notice, so prefer a value the engine knows if you care
|
||||
# about the exact drop wording.
|
||||
tone: "tense"
|
||||
|
||||
# setting — LLM READS. Three strings injected into the system prompt to ground
|
||||
# the scene. The LLM leans into `mood` as a style directive; `ambientNpcs` gives
|
||||
# background life without the overhead of a full persona.
|
||||
setting:
|
||||
# location — LLM READS. Where the scene happens. One line.
|
||||
location: "City of Mardonar — Market Square Food Festival"
|
||||
# mood — LLM READS. A briefing for a DM: sensory detail, pacing, time of day.
|
||||
# A few adjectives/lines beat a paragraph — this eats context every turn.
|
||||
mood: >
|
||||
Midday sun beats down on a lively crowd. The air smells of roasting meat,
|
||||
fresh bread, and spiced cider. Merchants shout their wares. Children weave
|
||||
through legs. Two city guards are visible at the far end of the square,
|
||||
too far to respond quickly.
|
||||
# ambientNpcs — LLM READS. Set dressing the LLM can use as props/witnesses.
|
||||
# Not interactive personas — name one or two so a crowded scene feels alive.
|
||||
ambientNpcs: >
|
||||
A dozen festival-goers milling about. A juggler performing near the fountain.
|
||||
A heavyset merchant arguing with a customer two stalls down. An elderly couple
|
||||
sharing a meat pie on a bench.
|
||||
|
||||
# openingNarrative — LLM READS, and PINNED (never trimmed from context for the
|
||||
# whole encounter). Posted verbatim at session start. Put everything the LLM
|
||||
# must keep in frame here — the core tension, the trigger event, named hooks.
|
||||
# `{{name_key}}` placeholders are filled at start from the randomizable draw
|
||||
# (see randomizable below), so the opening reads as concrete, not templated.
|
||||
openingNarrative: >
|
||||
The food festival fills Market Square with color and noise. Stalls stretch in
|
||||
every direction — honeyed nuts, smoked fish, fresh-pressed cider, towers of
|
||||
@@ -23,12 +66,18 @@ openingNarrative: >
|
||||
going scarlet. "THIEF!" Her voice cuts through the crowd like a blade.
|
||||
The festival-goers nearest her freeze and stare. Would anyone intervene?
|
||||
|
||||
# npcs — LLM READS. 1–5 personas. `persona` is a VOICE/BEHAVIOR directive, not
|
||||
# a stat block (Foundry owns stats — HP/AC/spell lists here waste context the
|
||||
# LLM can't use). How they speak, what they want, how they react under pressure.
|
||||
# Add `memoryKey` ONLY for NPCs that should remember across encounters; leave it
|
||||
# off for one-off NPCs. `nameKey` binds the displayed name to a randomizable draw
|
||||
# so the same spec yields different names per run.
|
||||
npcs:
|
||||
- id: "miriam-vendor-mardonar"
|
||||
name: "Miriam"
|
||||
nameKey: vendor_name
|
||||
role: "Apple stand vendor"
|
||||
persona: >
|
||||
- id: "miriam-vendor-mardonar" # BOT ENFORCES — stable id for graph memory.
|
||||
name: "Miriam" # LLM READS — display name.
|
||||
nameKey: vendor_name # BOT ENFORCES — resolves via randomizable[].
|
||||
role: "Apple stand vendor" # LLM READS — one-line role cue.
|
||||
persona: > # LLM READS — voice/behavior directive.
|
||||
Stout, red-faced Dwarf woman in her sixties. Has run this stall for twenty
|
||||
years and takes every theft as a personal insult. She is loud, indignant,
|
||||
and will berate anyone nearby who does nothing. She is NOT a fighter and
|
||||
@@ -38,7 +87,7 @@ npcs:
|
||||
she will mutter darkly about the state of the city and the uselessness of
|
||||
bystanders. She refers to the apple as "my finest Crimson Bellflower, worth
|
||||
three silvers if it's worth a copper."
|
||||
memoryKey: "miriam-vendor-mardonar"
|
||||
memoryKey: "miriam-vendor-mardonar" # BOT ENFORCES — graph-memory lookup/commit.
|
||||
|
||||
- id: "dal-thief-mardonar"
|
||||
name: "Dal"
|
||||
@@ -55,11 +104,17 @@ npcs:
|
||||
him but has never used it on a person.
|
||||
memoryKey: "dal-thief-mardonar"
|
||||
|
||||
# goals — LLM READS (steered toward primary; secondary are valid-but-not-main).
|
||||
# `hidden` (default true) means players don't see the goal list — the LLM
|
||||
# drives toward them invisibly. Each goal needs a stable `id` (referenced by
|
||||
# encounter_resolve) and a `label` (the label becomes the closing embed's Outcome
|
||||
# text). 2–3 primary is the sweet spot. The LLM may register MORE goals
|
||||
# mid-encounter via the goal_register tool when players go off-rails.
|
||||
goals:
|
||||
hidden: true
|
||||
primary:
|
||||
- id: "catch"
|
||||
label: >
|
||||
- id: "catch" # stable id — encounter_resolve outcomeId.
|
||||
label: > # becomes the closing embed's Outcome text.
|
||||
Players physically catch or restrain Dal — tackle, grab, spell, or block
|
||||
his escape route so he cannot run.
|
||||
- id: "kill"
|
||||
@@ -92,6 +147,9 @@ goals:
|
||||
slowed him. If guards arrive and Dal is caught, players receive no reward
|
||||
but the city notes their cooperation.
|
||||
|
||||
# sportsmanshipRules — LLM READS. Injected as hard guardrails on narration. The
|
||||
# LLM uses these to redirect absurd/game-breaking actions in-character rather
|
||||
# than silently allowing them. Keep them as short in-world-ish directives.
|
||||
sportsmanshipRules:
|
||||
- "No instant kills on a non-threatening, unarmed teenager without prior escalation."
|
||||
- "No controlling another player character's actions or speaking for them."
|
||||
@@ -104,6 +162,13 @@ sportsmanshipRules:
|
||||
"⚠️ That wasn't great sportsmanship. Let's keep it grounded — what would
|
||||
your character realistically attempt here?"
|
||||
|
||||
# skillChecks — BOT ENFORCES. A map of named target numbers the LLM references
|
||||
# BY NAME when it emits the skill_check_emit tool call. The LLM picks the skill
|
||||
# and names the check; the spec supplies the DC. Name checks for the ACTION, not
|
||||
# the stat (chase_dc, not dex_dc). Free-text `_skill`/`_note` companion keys are
|
||||
# LLM-READ context that travel alongside the DC — they never become dice.
|
||||
# NEVER put a dice result here — the bot controls dice, the LLM only narrates
|
||||
# the outcome after the [SKILL CHECK RESULT] message.
|
||||
skillChecks:
|
||||
chase_dc: 13
|
||||
chase_skill: "Athletics or Acrobatics (player's choice)"
|
||||
@@ -134,6 +199,13 @@ skillChecks:
|
||||
If a player offers Dal food, coin, or genuine kindness while he is cornered.
|
||||
Success causes him to surrender and explain his situation.
|
||||
|
||||
# randomizable — BOT ENFORCES. Declare fields that vary per run so the same
|
||||
# spec file yields different substance across runs. `key` binds the draw to a
|
||||
# {{placeholder}} in openingNarrative (or an npc.nameKey). `query` is the prompt
|
||||
# the resolver sends to GraphMCP's vocabulary lookup; `fallback` is used if the
|
||||
# lookup fails or is unavailable. `source: vocabulary` + `category` route the
|
||||
# query to the vocabulary namespace. A draw without a placeholder is still
|
||||
# useful — it seeds background detail the LLM can weave in.
|
||||
randomizable:
|
||||
- key: vendor_name
|
||||
source: vocabulary
|
||||
@@ -158,18 +230,30 @@ randomizable:
|
||||
query: "festivals, food events, or public celebrations in Mardonar city"
|
||||
fallback: "the annual Harvest Week food festival draws vendors from three districts"
|
||||
|
||||
# tools — BOT ENFORCES. Which tool plugins are active for this encounter.
|
||||
# Omitting it activates the default set; list explicitly to narrow it (e.g. a
|
||||
# no-combat encounter that drops skill_check_emit). Every name MUST be a
|
||||
# registered plugin — the specs-tools consistency test fails the build if a
|
||||
# spec references an unknown tool. Registered set: skill_check_emit,
|
||||
# encounter_resolve, context_recall, goal_register, foundry_lookup,
|
||||
# foundry_reward.
|
||||
tools:
|
||||
- skill_check_emit
|
||||
- encounter_resolve
|
||||
- context_recall
|
||||
- goal_register
|
||||
- foundry_lookup
|
||||
- foundry_reward
|
||||
- skill_check_emit # LLM emits to request a bot-controlled dice embed.
|
||||
- encounter_resolve # LLM emits to end the encounter with an outcome id.
|
||||
- context_recall # LLM emits to pull prior NPC/party facts from graph memory.
|
||||
- goal_register # LLM emits to add an off-rails goal mid-encounter.
|
||||
- foundry_lookup # LLM emits to surface a linked player's live Foundry stats.
|
||||
- foundry_reward # LLM emits to award XP/items via the Foundry relay.
|
||||
|
||||
# dmNotes — LLM READS. Author-only guidance injected into the system prompt as
|
||||
# framing for the DM's intent. Use it to set the encounter's stakes, tone
|
||||
# calibration, and the "feel" you want — not rules the LLM must mechanically
|
||||
# follow (those go in sportsmanshipRules/skillChecks). The LLM sees this, so
|
||||
# keep it in-world-ish and avoid utility jargon.
|
||||
dmNotes: >
|
||||
This encounter is intentionally low-stakes — a warm-up scene in a public
|
||||
setting with no combat required. The goal is to establish player character
|
||||
personalities and how they interact with a morally simple situation (hungry
|
||||
kid steals food). There is no "correct" outcome. Lean into the crowd's
|
||||
reactions. If players hesitate, have Miriam single one of them out directly.
|
||||
Dal should feel like a person, not a target.
|
||||
Dal should feel like a person, not a target.
|
||||
@@ -2,10 +2,8 @@ import { SlashCommandBuilder } from '@discordjs/builders';
|
||||
import { EmbedBuilder, AttachmentBuilder } from 'discord.js';
|
||||
import type { ChatInputCommandInteraction, TextChannel } from 'discord.js';
|
||||
import { buildEncounterListEmbed } from '../embeds/encounterDiscovery.js';
|
||||
import { writeFileSync } from 'fs';
|
||||
import { join } from 'path';
|
||||
import { load, dump } from 'js-yaml';
|
||||
import { loadSpec, EncounterSpecSchema, listSpecFiles } from '../../spec/loader.js';
|
||||
import { dump } from 'js-yaml';
|
||||
import { loadSpec, listSpecFiles } from '../../spec/loader.js';
|
||||
import { getAllToolNames } from '../../harness/toolRegistry.js';
|
||||
import type { EncounterSpec } from '../../types/index.js';
|
||||
import { sessionManager } from '../../session/sessionManager.js';
|
||||
@@ -13,9 +11,7 @@ import { playerRegistry } from '../../session/playerRegistry.js';
|
||||
import { config } from '../../config.js';
|
||||
import {
|
||||
queryAsNPC, formatNPCMemory, logEncounter,
|
||||
listEncounters, semanticSearch,
|
||||
} from '../../graphmcp/client.js';
|
||||
import type { EncounterResultItem, SemanticSearchResult } from '../../graphmcp/client.js';
|
||||
import { resolveRandomizables } from '../../graphmcp/loreResolver.js';
|
||||
import { buildOpeningNarrative } from '../../harness/promptBuilder.js';
|
||||
import { callLLM } from '../../harness/llmClient.js';
|
||||
@@ -58,17 +54,6 @@ export const data = new SlashCommandBuilder()
|
||||
.addSubcommand(sub =>
|
||||
sub.setName('list').setDescription('Show all active encounters in this server'),
|
||||
)
|
||||
.addSubcommand(sub =>
|
||||
sub
|
||||
.setName('generate')
|
||||
.setDescription('Use the LLM + knowledge graph to generate a new encounter spec')
|
||||
.addStringOption(o =>
|
||||
o.setName('theme')
|
||||
.setDescription('Optional: steer the theme ("something in the sewers", "political intrigue", etc.)')
|
||||
.setRequired(false)
|
||||
.setMaxLength(200),
|
||||
),
|
||||
)
|
||||
.addSubcommand(sub =>
|
||||
sub.setName('spec').setDescription('Send the YAML spec for the current encounter thread'),
|
||||
);
|
||||
@@ -106,8 +91,6 @@ export async function execute(interaction: ChatInputCommandInteraction): Promise
|
||||
await handleEnd(interaction);
|
||||
} else if (sub === 'list') {
|
||||
await handleList(interaction, guildId);
|
||||
} else if (sub === 'generate') {
|
||||
await handleGenerate(interaction);
|
||||
} else if (sub === 'spec') {
|
||||
await handleSpec(interaction);
|
||||
}
|
||||
@@ -474,208 +457,6 @@ async function handleList(
|
||||
await interaction.reply({ embeds: [embed], ephemeral: true });
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// /encounter generate
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
async function handleGenerate(interaction: ChatInputCommandInteraction): Promise<void> {
|
||||
await interaction.deferReply({ ephemeral: true });
|
||||
|
||||
const theme = interaction.options.getString('theme') ?? '';
|
||||
|
||||
// Gather context in parallel; GraphMCP offline is non-fatal
|
||||
const [recentEncounters, loreResult] = await Promise.all([
|
||||
listEncounters(5).catch((): EncounterResultItem[] => []),
|
||||
semanticSearch(
|
||||
theme
|
||||
? `Mardonar lore related to: ${theme}`
|
||||
        : 'Mardonar factions, dangerous locations, unresolved tensions, notable NPCs',
      6,
    ).catch((): SemanticSearchResult => ({ chunks: [] })),
  ]);

  // Existing slugs so the LLM avoids collisions
  const existingSlugs: string[] = [];
  try {
    existingSlugs.push(...listSpecFiles(config.SPECS_DIR));
  } catch { /* specs dir may not exist yet */ }

  const recentSection = recentEncounters.length > 0
    ? recentEncounters
        .map(e => `- [${e.timestamp?.slice(0, 10) ?? 'unknown'}] "${e.title}" at ${e.location}: ${e.summary}`)
        .join('\n')
    : 'No recent encounters recorded — invent freely.';

  const loreSection = loreResult.chunks.length > 0
    ? loreResult.chunks.slice(0, 4).map(c => `- ${c.content.slice(0, 220)}`).join('\n')
    : 'No lore available — draw from the established Mardonar setting.';

  const themeDirective = theme ? `\nDM-requested theme: "${theme}"\n` : '';

  const systemMsg: ChatMessage = {
    role: 'system',
    content: [
      'You are a D&D encounter designer for the world of Mardonar — a gritty semi-steampunk city of steam-powered industry,',
      'Ratling crime syndicates, Dwarf smiths, Mawfang Tribe hunters on the outskirts, and the permanent Stormscar storm',
      'looming on the horizon. Output ONLY a single ```yaml code block. No preamble, no explanation.',
    ].join(' '),
    timestamp: Date.now(),
  };

  const userMsg: ChatMessage = {
    role: 'user',
    content: [
      themeDirective,
      'RECENT ENCOUNTER HISTORY — build on these hooks, do not repeat them:',
      recentSection,
      '',
      'LORE FROM KNOWLEDGE GRAPH:',
      loreSection,
      '',
      `EXISTING SPEC SLUGS — your encounterId must not conflict: ${existingSlugs.join(', ') || '(none yet)'}`,
      '',
      'OUTPUT a complete YAML spec using this exact structure (all fields required unless marked optional):',
      '```yaml',
      'encounterId: "mardonar-<slug>-<3-digit-number>" # unique kebab-case',
      'title: "Short evocative title"',
      'tone: "grim" # one of: grim | tense | mysterious | comedic',
      '',
      'setting:',
      '  location: "City of Mardonar — specific named place"',
      '  mood: >',
      '    3-5 sentences: time of day, sensory detail, atmosphere, pacing notes.',
      '  ambientNpcs: >',
      '    2-4 sentences: background characters (not interactive). Name at least one.',
      '',
      'openingNarrative: >',
      '  3-5 sentences posted verbatim to Discord. Second-person or omniscient. Ends on a decision hook, not a summary.',
      '',
      'npcs:',
      '  - id: "name-role-mardonar" # globally unique kebab-case',
      '    name: "Display Name"',
      '    role: "One-line role"',
      '    persona: >',
      '      Detailed brief: appearance, motivation, speech patterns, what they will and will NOT do.',
      '    memoryKey: "name-role-mardonar" # omit for throwaway NPCs',
      '',
      'goals:',
      '  hidden: true',
      '  primary:',
      '    - id: "outcome_snake_case"',
      '      label: >',
      '        Precise trigger: name exactly what state the world must be in for this outcome.',
      '    - id: "second_primary"',
      '      label: >',
      '        Second primary goal.',
      '  secondary:',
      '    - id: "fallback"',
      '      label: >',
      '        Fallback ending — players flee, fail, or do nothing.',
      '',
      'sportsmanshipRules:',
      '  - "No <specific constraint>."',
      '  - >',
      '    If a player attempts something absurd or game-breaking, respond in-character to redirect, or break character',
      '    with: "⚠️ That wasn\'t great sportsmanship. Let\'s keep it grounded — what would your character realistically attempt here?"',
      '',
      'skillChecks:',
      '  <name>_dc: 14',
      '  <name>_skill: "Athletics"',
      '  <name>_note: >',
      '    Who gets advantage/disadvantage, what success/failure looks like.',
      '',
      '# optional — details that vary between sessions',
      'randomizable:',
      '  - key: some_name',
      '    query: "Mardonar lore query to find a good value"',
      '    fallback: "Default value if GraphMCP is offline"',
      '',
      'dmNotes: >',
      '  Emotional core, third path if players are creative, future hooks from each outcome.',
      '```',
    ].join('\n'),
    timestamp: Date.now(),
  };

  let rawResponse: string;
  try {
    const result = await callLLM([systemMsg, userMsg]);
    rawResponse = result.narrative;
  } catch (err) {
    await interaction.editReply(`LLM call failed: ${String(err)}`);
    return;
  }

  if (!rawResponse.trim()) {
    await interaction.editReply('LLM returned an empty response. Try `/encounter generate` again.');
    return;
  }

  // Extract YAML from fenced block
  const yamlMatch = /```ya?ml\s*\n([\s\S]+?)```/.exec(rawResponse);
  const yamlContent = yamlMatch ? yamlMatch[1].trim() : rawResponse.trim();

  let parsed: Record<string, unknown>;
  try {
    parsed = load(yamlContent) as Record<string, unknown>;
  } catch (err) {
    await interaction.editReply(
      `LLM produced invalid YAML: ${String(err)}\n\`\`\`\n${rawResponse.slice(0, 600)}\n\`\`\``,
    );
    return;
  }

  // Prefix encounterId with 'gen-' so the bot can detect generated specs at resolution time
  const rawId = (parsed.encounterId as string | undefined) ?? `generated-${Date.now()}`;
  const genId = rawId.startsWith('gen-') ? rawId : `gen-${rawId}`;
  parsed.encounterId = genId;

  const finalYaml = dump(parsed, { lineWidth: 120, quotingType: '"' });

  // Validate — write anyway, but surface warnings
  const validation = EncounterSpecSchema.safeParse(parsed);
  const warnings = validation.success
    ? []
    : validation.error.errors.map(e => `- ${e.path.join('.')}: ${e.message}`);

  const slug = genId.replace(/[^a-z0-9-]/gi, '-').toLowerCase();
  const outPath = join(config.SPECS_DIR, `${slug}.yaml`);

  try {
    writeFileSync(outPath, finalYaml, 'utf-8');
  } catch (err) {
    await interaction.editReply(`Failed to write spec file: ${String(err)}`);
    return;
  }

  const specTitle = (parsed.title as string | undefined) ?? 'Unknown';
  const specTone = (parsed.tone as string | undefined) ?? 'unset';
  const setting = parsed.setting as Record<string, string> | undefined;
  const specLocation = setting?.location ?? 'Unknown';

  const embed = new EmbedBuilder()
    .setTitle(`🎲 Generated: ${specTitle}`)
    .addFields(
      { name: 'File', value: `\`${slug}.yaml\``, inline: true },
      { name: 'Tone', value: specTone, inline: true },
      { name: 'Location', value: specLocation },
    )
    .setColor(0x9b59b6)
    .setFooter({ text: `Run with: /encounter start ${slug}` });

  const attachment = new AttachmentBuilder(Buffer.from(finalYaml, 'utf-8'), { name: `${slug}.yaml` });

  const warningBlock = warnings.length > 0
    ? `\n⚠️ Schema warnings (review before running):\n${warnings.slice(0, 5).join('\n')}`
    : '';

  await interaction.editReply({
    content: `Spec generated and saved to \`specs/${slug}.yaml\`. Review before running.${warningBlock}`,
    embeds: [embed],
    files: [attachment],
  });
}

// ---------------------------------------------------------------------------
// /encounter spec
// ---------------------------------------------------------------------------
@@ -344,6 +344,8 @@ export async function runLLMTurn(
      ? 'Do NOT state or imply a specific dice result. Wait for the [SKILL CHECK RESULT] system message before narrating any outcome.'
      : filter.reason === 'echoed_system_tag'
        ? 'Do NOT echo internal system tags like [TOOL], [SESSION], or [SKILL CHECK] verbatim in your response.'
        : filter.reason === 'leaked_tool_call'
          ? 'Your last response leaked a raw tool-call block (the word "tool_call", a code fence, or bare {"tool":...,"args":...} JSON) as player-facing narration. To emit a tool call, output ONLY a fenced ```tool_call block containing the JSON — never write "tool_call", the JSON, or a code fence as part of the narration. Otherwise narrate the scene in-world with no tool block at all.'
          : 'Your previous response was empty. Continue the scene.';

  const correction: ChatMessage = {
@@ -7,6 +7,22 @@ import { log } from '../../lib/logger.js';
// LLM echoing internal system message tags verbatim
const SYSTEM_TAG_RE = /\[TOOL[^\]]*\]|\[SKILL CHECK[^\]]*\]|\[SESSION[^\]]*\]|\[\/roll[^\]]*\]|\[SYSTEM[^\]]*\]/i;

// LLM leaking the internal tool-call emit format as player-facing narration.
// parseToolCall is supposed to lift these out before they reach the filter; if
// it missed (the model used an unrecognised header, left trailing text after the
// JSON, fenced it as ```json, or emitted bare JSON without a header) the raw
// `tool_call\n{...}` or `{"tool":...,"args":...}` would be posted to players
// verbatim — exactly the leak seen in the 2026-06-20 game session, where Zal-Bot
// posted `tool_call\n{ "tool": "skill_check_emit", "args": {...} }` to the thread.
// The in-world narrator never emits any of these (prose has no `tool_call`
// token, no code fences, no tool-call JSON), so presence is always a leak:
// suppress and ask the model to re-emit as a fenced ```tool_call block.
// - bare `tool_call` token (unfenced header, or inside a ```tool_call fence)
// - bare tool-call JSON, tool-first (the canonical emit shape)
// - bare tool-call JSON, args-first (if the model reorders the keys)
const LEAKED_TOOL_CALL_RE =
  /\btool_call\b|\{\s*"tool"\s*:\s*"[^"]+"\s*,\s*"args"\s*:\s*\{|\{\s*"args"\s*:\s*\{[\s\S]{0,300}?\}\s*,\s*"tool"\s*:\s*"/i;

// LLM claiming a specific dice roll result (the bot controls dice, not the LLM).
// Catches: "you rolled a 15", "the brawler rolled a 12", "rolls a 7", "the die shows", etc.
const ROLL_CLAIM_RE = /\brolled\s+(?:a\s+)?\d+\b|\bthe die shows\b|\bthe dice (?:show|showed)\b|\brolls?\s+(?:a\s+)?\d+\b/i;
@@ -39,6 +55,10 @@ export function filterLLMResponse(
    return { ok: false, reason: 'echoed_system_tag' };
  }

  if (LEAKED_TOOL_CALL_RE.test(narrative)) {
    return { ok: false, reason: 'leaked_tool_call' };
  }

  if (!options?.skipRollClaim && ROLL_CLAIM_RE.test(narrative)) {
    return { ok: false, reason: 'fabricated_roll_result' };
  }
@@ -121,11 +121,11 @@ export async function queryAsNPC(

// Map a raw GraphMCP search chunk to the declared SemanticChunk shape. The live
// backend returns `{ text, score, source, author, timestamp, msgID }`, but the
// client's SemanticChunk type (and its callers — encounter.ts handleGenerate,
// mentionHandler) read `.content`. Without this mapping, `c.content` is
// undefined and `c.content.slice(...)` in /encounter generate throws the same
// "Cannot read properties of undefined (reading 'slice')" class as the
// loreResult.chunks crash. Accept either field name for robustness.
// client's SemanticChunk type (and its callers — mentionHandler, loreResolver)
// read `.content`. Without this mapping, `c.content` is undefined and
// `c.content.slice(...)` throws "Cannot read properties of undefined (reading
// 'slice')" — the same class as the loreResult.chunks crash. Accept either
// field name for robustness.
function toSemanticChunk(raw: unknown): SemanticChunk {
  const r = (raw ?? {}) as { text?: unknown; content?: unknown; score?: unknown; source?: unknown };
  const content =
@@ -1,6 +1,7 @@
import OpenAI from 'openai';
import { config } from '../config.js';
import { parseToolCall } from './toolParser.js';
import { toApiMessages } from './llmMessages.js';
import { log } from '../lib/logger.js';
import type { ChatMessage, LLMResponse } from '../types/index.js';

@@ -23,7 +24,7 @@ export async function callLLM(messages: ChatMessage[]): Promise<LLMResponse> {

  const response = await getClient().chat.completions.create({
    model,
    messages: messages.map(m => ({ role: m.role, content: m.content })),
    messages: toApiMessages(messages),
    temperature: config.OLLAMA_TEMPERATURE,
  });

35
src/harness/llmMessages.ts
Normal file
@@ -0,0 +1,35 @@
import type { ChatMessage } from '../types/index.js';

/**
 * Map chat history to the `{ role, content }` payload for an OpenAI- /
 * Ollama-compatible chat endpoint, coercing a TRAILING `system` message to
 * `user`.
 *
 * Why: OpenAI-compatible endpoints (the LiteLLM proxy in front of
 * ollama-cloud, and some Ollama builds) return an EMPTY completion when the
 * final message has role `system` — they expect the last turn to be a
 * user/assistant message. Several narrator paths append a trailing system
 * directive and then schedule a turn:
 *   - `/roll` and `/turn` inject a system nudge as the last history entry,
 *   - `responseFilter` appends a `[FILTER CORRECTION]` system message and
 *     schedules a retry,
 *   - the roll handler appends a `[SKILL CHECK RESULT]` system message and
 *     schedules the reaction turn.
 * Each went silent — the always-grow `[NO RESPONSE]` fallback fired because the
 * model was asked to respond to a system monologue. Coercing only the LAST
 * `system` message to `user` for the API call lets the model respond, while the
 * stored history keeps role `system` so the role-based guards in
 * `messageRouter` (e.g. the `[FILTER CORRECTION]` retry-loop guard) still match.
 *
 * Confirmed empirically: the same /roll nudge content returns an empty response
 * as a trailing `system` message and a correct `skill_check_emit` tool call as a
 * trailing `user` message.
 */
export function toApiMessages(
  messages: ChatMessage[],
): { role: 'system' | 'user' | 'assistant'; content: string }[] {
  const api = messages.map(m => ({ role: m.role, content: m.content }));
  const last = api[api.length - 1];
  if (last && last.role === 'system') last.role = 'user';
  return api;
}
@@ -1,6 +1,7 @@
import { Ollama } from 'ollama';
import { config } from '../config.js';
import { parseToolCall } from './toolParser.js';
import { toApiMessages } from './llmMessages.js';
import { log } from '../lib/logger.js';
import type { ChatMessage, LLMResponse } from '../types/index.js';

@@ -12,7 +13,7 @@ export async function callLLM(messages: ChatMessage[]): Promise<LLMResponse> {

  const response = await ollama.chat({
    model,
    messages: messages.map(m => ({ role: m.role, content: m.content })),
    messages: toApiMessages(messages),
    stream: false,
    options: { temperature: config.OLLAMA_TEMPERATURE, num_ctx: config.OLLAMA_NUM_CTX },
  });
@@ -79,8 +79,8 @@ export function loadSpec(specName: string): EncounterSpec {

// List every encounter spec in `dir` (defaults to config.SPECS_DIR) as spec
// names with the file extension stripped. The single source for spec discovery
// — used by /encounter random, /encounter generate, and the specs-tools
// consistency test so the discovery rule can't drift between call sites.
// — used by /encounter random and the specs-tools consistency test so the
// discovery rule can't drift between call sites.
export function listSpecFiles(dir: string = config.SPECS_DIR): string[] {
  return readdirSync(dir)
    .filter(f => f.endsWith('.yaml') || f.endsWith('.yml'))
@@ -1,7 +1,7 @@
---
stepsCompleted: ['step-01-preflight-and-context', 'step-02-generation-mode', 'step-03-test-strategy', 'step-04-generate-tests', 'step-05-validate-and-complete']
lastStep: 'step-05-validate-and-complete'
lastSaved: '2026-06-19'
lastSaved: '2026-06-20'
workflowType: 'testarch-atdd'
storyId: 'graphmcp.live.1'
storyKey: 'graphmcp-live-integration-tests'
@@ -13,6 +13,10 @@ generatedTestFiles:
  - 'tests/integration/graphmcp/skill-check.test.ts'
  - 'tests/integration/graphmcp/lore-and-events.test.ts'
  - 'tests/integration/graphmcp/long-encounter.test.ts'
  - 'tests/integration/graphmcp/encounters-command.test.ts'
  - 'tests/integration/graphmcp/registries-command.test.ts'
  - 'tests/integration/graphmcp/encounter-thread-commands.test.ts'
  - 'tests/integration/graphmcp/foundry-command.test.ts'
  - 'tests/integration/graphmcp/support/env.ts'
  - 'tests/integration/graphmcp/support/poll.ts'
  - 'tests/integration/graphmcp/support/factories.ts'
@@ -65,6 +69,11 @@ A live-infrastructure integration test suite that runs a real Mardonar encounter
4. **AC4 — Lore/question answering + event read-after-write.** Given real lore in the graph, when a player @mentions the bot or asks a question that triggers `context_recall`/`semantic_search`, then the answer references real lore retrieved from the graph; when `log_encounter` writes an event, then `list_encounters`/`search_encounters` return that event afterward (read-after-write consistency).

5. **AC5 — Long encounter (20–30 turns) with complex skill usage, varied goal outcomes, and final-output verification.** Given an active run-tagged encounter, when the suite drives 20–30 turns through the real scheduler (`scheduleEncounterLLMTurn` + history polling) with a scripted driver strategy, resolving every `skill_check_emit` via `handleRollInteraction`, then the encounter reaches a valid goal outcome (one of the spec's `goals.primary`/`secondary` ids) within the turn cap; different driver strategies reach DIFFERENT goal outcomes; and the final `encounter_resolve` output is read back from GraphMCP (`list_encounters` matched by run-id in the title → `get_encounter` returns the LLM-written summary, participants, and the resolved `outcomeId` in the title).
6. **AC6 — `/encounters` slash command reads GraphMCP for real.** Given a run-tagged encounter seeded via `log_encounter` and visible via `list_encounters` (read-after-write), when the suite drives the registered `/encounters` handlers with fake interactions — `execute()` (list), `handleEncounterSelect` (select → `get_encounter`), `handleSearchModalSubmit` (search modal → `search_encounters`) — then each handler hits the **live** GraphMCP and renders the seeded encounter: the list select-menu contains its id, the select handler's details embed carries the GraphMCP title + participants, and the search-by-participant menu surfaces it; an empty search query is rejected in-world without calling `search_encounters`. GraphMCP-only (no Discord login, no Redis) → fast and parallel-safe with the Discord-token-gated suites.
7. **AC7 — RETIRED 2026-06-20.** `/encounter generate` (on-the-spot AI spec generation) was removed from the system — AI must never author canonical encounter lore (per `spec-mardonar-encounter-engine` non-goal + constraint). The `handleGenerate` handler, the `generate` subcommand registration, and `tests/integration/graphmcp/encounter-generate.test.ts` were deleted; the spec-authoring path moves to a separate encounter-builder tool (see `spec-encounter-builder`). The GraphMCP client normalization that AC7 surfaced (`semanticSearch`/`listEncounters` wrong-shape handling) remains guarded by AC1 + `tests/unit/graphmcpClient.test.ts`.
8. **AC8 — Registry-backed slash commands round-trip through real Redis.** Given Redis up, when the suite drives `/dndname set|show|clear`, `/character register custom` (modal) + `show` + `clear` + `admin list` + `admin remove`, `/actions` (no Foundry link), and `/character view` (no profile / no link) via the registered handlers with fake interactions, then each round-trips through the real `characterRegistry`/`playerRegistry` (Redis): writes are visible to subsequent reads, embeds render the persisted data, and the pre-Foundry branches reply with their in-world prompts. Redis-only (no Discord login, no Foundry, no LLM) → fast and parallel-safe.
9. **AC9 — Encounter-thread slash commands `/roll`, `/turn`, `/xp award`.** Given a live encounter thread, when the suite drives `/turn` (nudge → real LLM turn → history grows), `/roll` (appends the action + mandatory `skill_check_emit` nudge → the LLM emits the tool → `pendingSkillCheck` set), and `/xp award` (no-players branch, no-amount/no-spec-default branch, and with-player branch where `awardXP` skips an unlinked player), then each command's contract holds against the real scheduler + LLM + Redis. Full stack (Discord + LLM + Redis + GraphMCP for `/encounter start`).
10. **AC10 — Foundry-VTT-dependent slash command paths.** Given a reachable Foundry relay and a sacrificial test PC, when the suite drives `/character register foundry` (modal → `searchActors` → link), `/character view` (with link → `getActorDetails`/`Inventory`/`Spells`), `/actions` (with link → `getActorInventory`/`Spells`), `/character admin give` (modal → `giveItem`, mutating), and `awardXP` with a linked player (`modifyExperience`, mutating), then each hits the real Foundry relay. Gated `RUN_FOUNDRY_LIVE=1` + `E2E_FOUNDRY_CHARACTER_NAME` (the relay is remote production by default; the mutating paths are unsafe to run blind) → skips by default, consistent with the AC3/AC4 scaffold pattern.

---

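
The read-after-write checks in AC4 and AC6 imply polling until a freshly written encounter becomes visible. The following is a minimal sketch of that pattern only. `pollUntil` and `makeLaggyStore` are hypothetical names invented for illustration; the suite's own `waitFor` in `support/poll.ts` may have a different signature, and the in-memory store merely stands in for GraphMCP.

```typescript
// Hypothetical polling helper (not the repo's waitFor): re-read until the
// predicate holds or the timeout elapses.
async function pollUntil<T>(
  read: () => Promise<T>,
  isReady: (value: T) => boolean,
  opts: { timeoutMs: number; intervalMs: number },
): Promise<T> {
  const deadline = Date.now() + opts.timeoutMs;
  for (;;) {
    const value = await read();
    if (isReady(value)) return value;
    if (Date.now() >= deadline) throw new Error('pollUntil: timed out');
    await new Promise<void>(resolve => setTimeout(resolve, opts.intervalMs));
  }
}

// Hypothetical in-memory stand-in for an eventually consistent backend:
// a write stays invisible for the first `lagReads` list calls.
function makeLaggyStore(lagReads: number) {
  const entries: { title: string; readsUntilVisible: number }[] = [];
  return {
    async write(title: string): Promise<void> {
      entries.push({ title, readsUntilVisible: lagReads });
    },
    async list(): Promise<string[]> {
      const visible = entries.filter(e => e.readsUntilVisible <= 0).map(e => e.title);
      for (const e of entries) e.readsUntilVisible -= 1;
      return visible;
    },
  };
}

// Seed a run-tagged title, then poll the list until the tag shows up.
async function demo(): Promise<string[]> {
  const store = makeLaggyStore(2);
  const runTag = 'run-abc123';
  await store.write(`Market Thief [${runTag}]`);
  return pollUntil(
    () => store.list(),
    titles => titles.some(t => t.includes(runTag)),
    { timeoutMs: 2000, intervalMs: 10 },
  );
}
```

Matching on a run tag in the title, rather than on list position, keeps the check stable when other runs write to the same backend concurrently.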
@@ -124,6 +133,11 @@ All scaffolds are real `it()` tests under `describe.skipIf(...)` — skipped wit
| `tests/integration/graphmcp/encounter-lifecycle.test.ts` | AC2 | `RUN_FULL_E2E=1` | 3 (S2.1 start, S2.2 driver turn, S2.3 end) |
| `tests/integration/graphmcp/skill-check.test.ts` | AC3 | `RUN_FULL_E2E=1` | 2 (S3.1 emit, S3.2 resolve) |
| `tests/integration/graphmcp/lore-and-events.test.ts` | AC4 | `RUN_FULL_E2E=1` | 2 (S4.1 mention, S4.2 read-after-write) |
| `tests/integration/graphmcp/long-encounter.test.ts` | AC5 | `RUN_FULL_E2E=1` | 1 (×5 strategies via `E2E_STRATEGY`; 20–30 turns, skill checks, GraphMCP summary read-back) |
| `tests/integration/graphmcp/encounters-command.test.ts` | AC6 | `RUN_FULL_E2E=1` | 4 (C1 list, C2 select→get_encounter, C3 search→search_encounters, C4 empty-query rejected) — GraphMCP-only, no Discord login |
| `tests/integration/graphmcp/registries-command.test.ts` | AC8 | `RUN_FULL_E2E=1` | 6 (/dndname set/show/clear, /character register-custom modal + show + clear + admin list/remove, /actions no-link, /character view no-profile/no-link) — Redis-only, no Discord login |
| `tests/integration/graphmcp/encounter-thread-commands.test.ts` | AC9 | `RUN_FULL_E2E=1` | 5 (/turn nudge, /roll forces skill_check_emit, /xp no-players, /xp no-amount, /xp with-player) — full stack |
| `tests/integration/graphmcp/foundry-command.test.ts` | AC10 | `RUN_FULL_E2E=1` + `RUN_FOUNDRY_LIVE=1` + `E2E_FOUNDRY_CHARACTER_NAME` | 5 (register foundry modal→link, view with link, actions with link, admin give [mutating], awardXP [mutating]) — skips by default |
| `tests/integration/graphmcp/support/env.ts` | — | — | config-env bootstrap (stubs Discord creds if absent; seeds `DISCORD_ALLOWED_CHANNELS` from `E2E_TEST_CHANNEL_ID`) |
| `tests/integration/graphmcp/support/poll.ts` | — | — | `waitFor` / `untilStable` (eventual-consistency + LLM-turn polling) |
| `tests/integration/graphmcp/support/factories.ts` | — | — | `runId`, `buildEncounterLog`, `titleMatchesRun` |
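
The gate column above combines up to three environment variables for the Foundry suite. As a rough illustration of how such a gate could be evaluated, here is a sketch assuming a hypothetical `foundrySuiteEnabled` helper; the real test file may inline the check in `describe.skipIf(...)` and may treat an empty character name differently.

```typescript
// Hypothetical helper, not taken from the repo: decides whether the
// Foundry-dependent suite should run for a given environment.
type Env = Record<string, string | undefined>;

function foundrySuiteEnabled(env: Env): boolean {
  // Both flags must be exactly '1', mirroring `process.env.RUN_FULL_E2E === '1'`.
  const fullE2E = env['RUN_FULL_E2E'] === '1';
  const foundryLive = env['RUN_FOUNDRY_LIVE'] === '1';
  // Assumption: a blank or whitespace-only character name counts as unset.
  const hasCharacter = (env['E2E_FOUNDRY_CHARACTER_NAME'] ?? '').trim().length > 0;
  return fullE2E && foundryLive && hasCharacter;
}

const nothingSet = foundrySuiteEnabled({});
const onlyFullE2E = foundrySuiteEnabled({ RUN_FULL_E2E: '1' });
const missingCharacter = foundrySuiteEnabled({ RUN_FULL_E2E: '1', RUN_FOUNDRY_LIVE: '1' });
const allSet = foundrySuiteEnabled({
  RUN_FULL_E2E: '1',
  RUN_FOUNDRY_LIVE: '1',
  E2E_FOUNDRY_CHARACTER_NAME: 'Sacrificial Test PC',
});
```

Requiring every condition explicitly means a partially configured environment skips the suite, which matters here because two of the gated paths mutate real characters.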
@@ -332,13 +346,18 @@ $ npx vitest run tests/integration
RUN v3.2.6
↓ tests/integration/phase1.test.ts (2 tests | 2 skipped)
↓ tests/integration/graphmcp/contract.test.ts (7 tests | 7 skipped)
↓ tests/integration/graphmcp/encounters-command.test.ts (4 tests | 4 skipped)
↓ tests/integration/graphmcp/encounter-generate.test.ts (2 tests | 2 skipped)
↓ tests/integration/graphmcp/registries-command.test.ts (6 tests | 6 skipped)
↓ tests/integration/graphmcp/encounter-thread-commands.test.ts (5 tests | 5 skipped)
↓ tests/integration/graphmcp/foundry-command.test.ts (5 tests | 5 skipped)
↓ tests/integration/graphmcp/lore-and-events.test.ts (2 tests | 2 skipped)
↓ tests/integration/graphmcp/encounter-lifecycle.test.ts (3 tests | 3 skipped)
↓ tests/integration/graphmcp/skill-check.test.ts (2 tests | 2 skipped)
↓ tests/integration/graphmcp/encounter-lifecycle.test.ts (3 tests | 3 skipped)
↓ tests/integration/graphmcp/long-encounter.test.ts (1 test | 1 skipped)
Test Files 6 skipped (6)
Tests 17 skipped (17)
Duration ~600ms
Test Files 11 skipped (11)
Tests 39 skipped (39)
Duration ~740ms
```
→ exit 0. All scaffolds transpile and skip cleanly (CI-safe; no live infra required to import).

@@ -379,6 +398,22 @@ Provisioned infra: test guild + `DISCORD_TOKEN` (bot under test) + `E2E_DRIVER_T

**AC3 + AC4:** scaffolds transpile + skip cleanly; live execution pending a dedicated run window (AC1/AC2/AC5 already exercise the skill-check tool and GraphMCP read-after-write paths end-to-end).

**AC8 — registry slash commands (6 tests):** all PASS live (616ms) against real Redis only — no Discord login, no Foundry, no LLM. `/dndname` set→show→clear round-trips through `playerRegistry`; `/character register custom` modal persists via `characterRegistry` and `/character show` renders it, `/character clear` removes it; `/character admin list` (empty + populated, embed field names verified via `toJSON()`) and `admin remove` round-trip; `/actions` with no Foundry link and `/character view` no-profile/no-link hit their in-world pre-Foundry prompts. Each write is cross-checked by reading the same registry the command writes through.

**AC9 — encounter-thread commands `/roll` `/turn` `/xp award` (5 tests):** all PASS live (24.5s, full stack). Each command driven in its own freshly-started encounter thread to avoid LLM non-determinism cross-contamination. `/turn` → editReply "Nudging the narrator…" + a real LLM turn grows history; `/roll` → editReply "Action submitted", action + mandatory tool-call nudge appended, and the LLM emits `skill_check_emit` → `pendingSkillCheck` set (player + numeric DC verified); `/xp award` → no-players branch, no-amount/no-spec-default branch (market-thief has no `xpReward`), and with-player branch (`awardXP` skips the unlinked player, command confirms "XP awarded").

**AC10 — Foundry-dependent commands (5 tests):** transpile + skip cleanly by default. Gated `RUN_FOUNDRY_LIVE=1` + `E2E_FOUNDRY_CHARACTER_NAME` because `config.VTT_RELAY_URL` defaults to a remote production relay and two paths mutate real characters (`giveItem`, `modifyExperience`). Covers `/character register foundry` (modal → `searchActors` → link), `/character view` + `/actions` with a link (live actor details/inventory/spells), `/character admin give` (`giveItem`), and `awardXP` (`modifyExperience`). Live run pending a sacrificial Foundry test PC — consistent with the AC3/AC4 scaffold pattern.

**Bug surfaced + fixed during AC9 live validation (significant — silent narrator on trailing system messages):**
- `src/harness/litellmClient.ts` + `src/harness/ollamaClient.ts` — the OpenAI-/Ollama-compatible chat endpoints return an **EMPTY completion when the final message has role `system`** (they expect the last turn to be `user`/`assistant`). Several narrator paths append a trailing `system` directive and then schedule a turn, so the model went silent and the always-grow `[NO RESPONSE]` fallback fired every time: `/roll` (its mandatory `skill_check_emit` nudge), `/turn` (its `[RETRY]` nudge), `responseFilter` (its `[FILTER CORRECTION]` retry), and the roll handler's `[SKILL CHECK RESULT]` reaction turn. `/roll` therefore never produced a skill check, and `/turn`/filter-correction/reaction turns never narrated. Confirmed empirically: the same `/roll` nudge content returns empty as a trailing `system` message and a correct `skill_check_emit` tool call as a trailing `user` message. Fixed by extracting `src/harness/llmMessages.ts` `toApiMessages()` which coerces ONLY the last `system` message to `user` for the API payload — the stored history keeps role `system`, so the role-based guards in `messageRouter` (e.g. the `[FILTER CORRECTION]` retry-loop guard at `lastMsg?.role === 'system'`) still match. After the fix, `/roll` emits `skill_check_emit` on the first attempt (DC 10, embed posted) and `/turn` gets a real response. 404 unit tests still pass. The long-encounter suite (AC5) was passing despite this because its driver turns always end on a `user` message; the silent reaction turns were masked by the history-growth poll.
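
The trailing-`system` coercion described in the bullet above can be exercised in isolation. This is a sketch under stated assumptions: the `ChatMessage` shape is a local stand-in for the project's type, and `toApiMessagesSketch` only mirrors the behaviour described for `toApiMessages()`; it is not a copy of the shipped module.

```typescript
// Local stand-in for the project's ChatMessage type (assumed shape).
type Role = 'system' | 'user' | 'assistant';
interface ChatMessage { role: Role; content: string; timestamp: number }

// Copy each message into a fresh payload object, then coerce ONLY a trailing
// `system` role to `user`. The input array is never mutated.
function toApiMessagesSketch(messages: ChatMessage[]): { role: Role; content: string }[] {
  const api = messages.map(m => ({ role: m.role, content: m.content }));
  const last = api[api.length - 1];
  if (last && last.role === 'system') last.role = 'user';
  return api;
}

// Illustrative history: a leading system prompt, a player turn, and a
// trailing system nudge of the kind `/turn` appends.
const history: ChatMessage[] = [
  { role: 'system', content: 'You are the narrator.', timestamp: 1 },
  { role: 'user', content: 'I try to pick the lock.', timestamp: 2 },
  { role: 'system', content: '[RETRY] Continue the scene.', timestamp: 3 },
];

const payload = toApiMessagesSketch(history);
```

Because `map` builds new objects, the stored history keeps its `system` role, which is what lets role-based guards keep matching on the original messages.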

**AC6 — `/encounters` command (4 tests):** all PASS live (475ms) against real GraphMCP only — no Discord login, no Redis, so parallel-safe with the token-gated suites. Seeds a run-tagged encounter via `log_encounter`, polls `list_encounters` for read-after-write, then drives the registered handlers with fake interactions: `execute()` renders a select menu whose option values contain the seeded id (C1); `handleEncounterSelect` renders a details embed whose `toJSON()` title + participants field carry the GraphMCP `get_encounter` data, cross-checked against a direct `getEncounter` call (C2); `handleSearchModalSubmit` renders a matches menu containing the seeded id when searching by participant `Miriam`, cross-checked against a direct `searchEncounters` call (C3); an empty query is rejected in-world with no menu rendered (C4).

**AC7 — `/encounter generate` (2 tests):** all PASS live (18.70s) against the full stack. G1: `execute()` fans out `list_encounters` + `semantic_search` against live GraphMCP (non-fatal), calls the real LLM (ollama-cloud, 3887 tokens / 14.9s), writes `specs/<gen-slug>.yaml`, and the spec loads via `loadSpec` keeping the `gen-` prefix with ≥1 NPC + ≥1 primary goal (tools, if any, all registered). G2: `/encounter start <gen-slug>` creates a real thread + persisted `SessionState` with a non-empty opening narrative — exercising the downstream `resolveRandomizables` + `queryAsNPC` GraphMCP paths the generated spec feeds. afterAll removes the `gen-` spec + thread + session.

**Bug surfaced + fixed during AC7 live validation:**
- `src/bot/commands/encounter.ts` `handleGenerate` — `parsed.encounterId` was cast `as string | undefined` and then `rawId.startsWith('gen-')` was called on it. YAML parses an unquoted `encounterId: 123` as a **number**, so `rawId.startsWith` would throw `TypeError` mid-flight — after the LLM call but before the file write / reply — silently killing `/encounter generate` with no user-facing error. Fixed: coerce defensively (`typeof rawIdField === 'string' && length > 0 ? rawIdField : generated-${Date.now()}`) so a non-string / empty encounterId falls back to a generated id instead of crashing. This is a plausible source of the user-reported intermittent "still issues with the encounter generate command"; the happy path (string encounterId) was already green end-to-end (generate → write → validate → start) via AC7.
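
The defensive coercion described in the bullet above can be sketched as a small pure function. `coerceEncounterId` and its injectable `now` parameter are hypothetical names used only for illustration; the actual fix may be written inline in `handleGenerate`.

```typescript
// Hypothetical helper illustrating the described fix: accept `unknown`
// (a YAML loader can return numbers, booleans, null, ...) and only trust
// non-empty strings; anything else falls back to a generated id.
function coerceEncounterId(rawIdField: unknown, now: () => number = Date.now): string {
  const rawId =
    typeof rawIdField === 'string' && rawIdField.length > 0
      ? rawIdField
      : `generated-${now()}`;
  // Safe now: rawId is always a string, so startsWith cannot throw.
  return rawId.startsWith('gen-') ? rawId : `gen-${rawId}`;
}

// Fixed clock so the fallback id is deterministic in this example.
const fixedClock = () => 1700000000000;

const fromString = coerceEncounterId('mardonar-dock-heist-001');
const fromNumber = coerceEncounterId(123, fixedClock); // e.g. unquoted `encounterId: 123`
const fromEmpty = coerceEncounterId('', fixedClock);
const alreadyPrefixed = coerceEncounterId('gen-mardonar-dock-heist-001');
```

Typing the input as `unknown` instead of casting to `string | undefined` forces the runtime check, which is the step the original cast skipped.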

---

## Notes
239
tests/integration/graphmcp/encounter-thread-commands.test.ts
Normal file
@@ -0,0 +1,239 @@
// AC9 — encounter-thread slash commands /roll, /turn, /xp award (live, full stack).
//
// These three commands only function inside an active encounter thread and route
// through the real scheduler (scheduleEncounterLLMTurn) + LLM, so they were not
// covered by the registry (AC8) or GraphMCP (AC6/AC7) suites:
//   - /roll → appends the action + a mandatory skill_check_emit nudge, schedules
//     a turn; the LLM must emit skill_check_emit → pendingSkillCheck set.
//   - /turn → appends a [RETRY] nudge and schedules a turn; history grows ≥1.
//   - /xp award → reads session.players + spec.xpReward; awardXP posts to the thread.
//
// Each command is driven in its OWN freshly-started encounter thread so LLM
// non-determinism in one (a skill check, an early resolution) can't contaminate
// the assertions of another. Hybrid slash-command pattern: fake interaction
// backed by the REAL thread object from the live client.
//
// Gate: RUN_FULL_E2E=1. Requires the full live stack (Discord + LLM + Redis +
// GraphMCP for /encounter start). Serialized — one live E2E at a time. Skipped by
// default → CI-safe.

import './support/env.js';
import { describe, it, expect, beforeAll, afterAll } from 'vitest';
import { execute as encounterExecute } from '../../../src/bot/commands/encounter.js';
import { execute as rollExecute } from '../../../src/bot/commands/roll.js';
import { execute as turnExecute } from '../../../src/bot/commands/turn.js';
import { execute as xpExecute } from '../../../src/bot/commands/xp.js';
import { sessionManager } from '../../../src/session/sessionManager.js';
import { scheduleEncounterLLMTurn } from '../../../src/bot/handlers/messageRouter.js';
import { playerRegistry } from '../../../src/session/playerRegistry.js';
import { runId } from './support/factories.js';
import { connectLiveBots, disconnectLiveBots, type LiveBots } from './support/liveBots.js';
import { fakeInteraction, parseThreadIdFromReply } from './support/fakes.js';
import { flushRedisForGuild, disconnectRedis, deleteThread, deleteSession } from './support/cleanup.js';
import { waitFor } from './support/poll.js';
import type { ThreadChannel } from 'discord.js';

const runE2E = process.env.RUN_FULL_E2E === '1';
const SPEC = process.env.E2E_SPEC ?? 'market-thief';

describe.skipIf(!runE2E)('AC9 — /roll, /turn, /xp award inside a live encounter thread', () => {
  let bots: LiveBots;
  const run = runId();
  const driverId = process.env.E2E_DRIVER_USER_ID ?? 'e2e-driver-user';
  const driverName = `E2E Driver ${run}`;
  // /xp allowlist: if DISCORD_ALLOWED_USERS is set, the caller must be in it.
  const allowed = (process.env.DISCORD_ALLOWED_USERS ?? '').split(',').map(s => s.trim()).filter(Boolean);
  const dmId = allowed[0] ?? driverId;
  const threadIds: string[] = [];

  beforeAll(async () => {
    bots = await connectLiveBots();
    await flushRedisForGuild(bots.guild.id);
    // /roll requires a registered player name.
    await playerRegistry.set(bots.guild.id, driverId, driverName);
  }, 120_000);

  afterAll(async () => {
    try {
      for (const id of threadIds) {
        await deleteThread(bots.channel, id);
        await deleteSession(id);
      }
      await playerRegistry.delete(bots.guild.id, driverId).catch(() => null);
    } finally {
      await disconnectRedis();
      await disconnectLiveBots(bots);
    }
  }, 120_000);

  // Start a fresh encounter thread and return the real thread + session baseline.
  async function startEncounter(): Promise<{ threadId: string; thread: ThreadChannel; baseline: number }> {
    const { interaction, lastText } = fakeInteraction({
      subcommand: 'start',
      stringOptions: { spec: SPEC },
      channel: bots.channel,
      guildId: bots.guild.id,
      userId: driverId,
      username: driverName,
    });
    await encounterExecute(interaction);
    const threadId = parseThreadIdFromReply(lastText());
    expect(threadId, 'encounter must start').toBeTruthy();
    threadIds.push(threadId!);
    const thread = await bots.channel.threads.fetch(threadId!);
    const session = await waitFor(() => sessionManager.get(threadId!).then(s => s ?? null), { timeoutMs: 30_000 });
    return { threadId: threadId!, thread, baseline: session!.history.length };
  }

  // ── /turn ─────────────────────────────────────────────────────────────────
  it('R1 /turn nudges the narrator: editReply + history grows via a real LLM turn', async () => {
    const { threadId, thread, baseline } = await startEncounter();

    const { interaction, lastText } = fakeInteraction({
      subcommand: 'turn', // unused by /turn, but harmless
      channel: thread,
      guildId: bots.guild.id,
      userId: driverId,
      username: driverName,
    });
    await turnExecute(interaction, bots.botClient);

    expect(lastText(), '/turn must acknowledge ephemerally').toMatch(/Nudging the narrator/i);

    // The nudge schedules a turn; runLLMTurn always grows history by ≥1.
    const grown = await waitFor(
      async () => {
        const s = await sessionManager.get(threadId);
        return s && s.history.length > baseline ? s : null;
      },
      { timeoutMs: 120_000, intervalMs: 3_000 },
    );
    expect(grown.history.length, 'a narrator turn must be appended').toBeGreaterThan(baseline);
  }, 180_000);

  // ── /roll ─────────────────────────────────────────────────────────────────
  it('R2 /roll submits the action, appends it + the mandatory tool-call nudge, and forces a skill check', async () => {
    const { threadId, thread, baseline } = await startEncounter();

    const action = 'I scan the festival crowd for the hooded thief — perception check.';
    const { interaction, lastText } = fakeInteraction({
      subcommand: 'roll',
      stringOptions: { action },
      channel: thread,
      guildId: bots.guild.id,
      userId: driverId,
      username: driverName,
    });
    await rollExecute(interaction, bots.botClient);

    expect(lastText(), '/roll must acknowledge ephemerally').toMatch(/Action submitted/i);

    // /roll appends the user action + the mandatory skill_check_emit nudge
    // synchronously, before the turn is scheduled.
    const afterSubmit = await sessionManager.get(threadId);
    expect(afterSubmit!.history.length, 'action + nudge must be in history').toBeGreaterThanOrEqual(baseline + 2);

    // The forced nudge tells the LLM to emit skill_check_emit and NOTHING ELSE;
    // poll for the pending check to appear. The model occasionally goes silent on
    // the strict "output NOTHING ELSE" nudge (returns an empty response → the
    // always-grow [NO RESPONSE] fallback). That's a transient LLM-compliance
    // wobble, not a /roll logic fault — so re-schedule the turn (mirroring a
    // player re-roll) up to two more times before declaring failure.
    let checked;
    for (let attempt = 0; attempt < 3 && !checked; attempt++) {
      if (attempt > 0) scheduleEncounterLLMTurn(threadId, thread, bots.botClient, true);
      try {
        checked = await waitFor(
          async () => {
            const s = await sessionManager.get(threadId);
            return s?.pendingSkillCheck ? s : null;
          },
          { timeoutMs: 70_000, intervalMs: 3_000 },
        );
      } catch {
        // No pending check this attempt; loop re-schedules unless this was the last.
      }
    }

    if (!checked) {
      // Diagnostic: dump what the LLM actually produced so a non-compliant
      // response is visible, not just a bare timeout.
      const s = await sessionManager.get(threadId);
      const tail = s?.history.slice(-5).map(m => `[${m.role}] ${typeof m.content === 'string' ? m.content.slice(0, 200) : String(m.content)}`) ?? [];
      throw new Error(
        `[/roll] LLM did not emit skill_check_emit after 3 attempts.\n` +
        `pendingSkillCheck=${JSON.stringify(s?.pendingSkillCheck)}\nphase=${s?.phase}\n` +
        `history tail:\n${tail.join('\n')}`,
      );
    }
    expect(checked.pendingSkillCheck, '/roll must result in a pending skill check').toBeTruthy();
    expect(checked.pendingSkillCheck!.player, 'pending check must name the player').toBeTruthy();
    expect(typeof checked.pendingSkillCheck!.dc, 'pending check must set a DC').toBe('number');
  }, 240_000);

  // ── /xp award — no players branch ─────────────────────────────────────────
  it('R3 /xp award with no players joined replies that there is nothing to award', async () => {
    const { threadId, thread } = await startEncounter();
    // A freshly-started encounter has no players yet.
    const before = await sessionManager.get(threadId);
    expect(Object.keys(before!.players).length, 'sanity: fresh encounter has no players').toBe(0);

    const { interaction, lastText } = fakeInteraction({
      subcommand: 'award',
      channel: thread,
      guildId: bots.guild.id,
      userId: dmId,
      username: 'E2E DM',
    });
    await xpExecute(interaction);

    expect(lastText(), 'must report no players to award').toMatch(/No players joined this encounter/i);
  }, 90_000);

  // ── /xp award — no amount + spec has no default branch ────────────────────
  it('R4 /xp award with no amount and no spec default prompts for an amount', async () => {
    const { threadId, thread } = await startEncounter();
    // The no-players guard fires before the amount check, so populate a player
    // to reach the no-amount branch. market-thief has no xpReward default.
    await sessionManager.update(threadId, {
      players: { [driverId]: { discordId: driverId, dndName: driverName } },
    });
    const { interaction, lastText } = fakeInteraction({
      subcommand: 'award',
      channel: thread,
      guildId: bots.guild.id,
      userId: dmId,
      username: 'E2E DM',
    });
    await xpExecute(interaction);

    expect(lastText(), 'must ask for an amount when none and no spec default').toMatch(/No XP amount specified/i);
  }, 90_000);

  // ── /xp award — with a player (no Foundry link) + amount ──────────────────
  it('R5 /xp award with a joined player awards (skipping the unlinked player) and confirms', async () => {
    const { threadId, thread } = await startEncounter();
    // Populate session.players with the driver (no Foundry link → awardXP skips,
    // posts a "skipped: no Foundry character linked" line, and the command confirms).
    await sessionManager.update(threadId, {
      players: { [driverId]: { discordId: driverId, dndName: driverName } },
    });

    const { interaction, lastText } = fakeInteraction({
      subcommand: 'award',
      stringOptions: {}, // getInteger('amount') is null in the fake → we pass amount via a richer options stub below
      channel: thread,
      guildId: bots.guild.id,
      userId: dmId,
      username: 'E2E DM',
    });
    // fakeInteraction.getInteger always returns null; /xp reads getInteger('amount').
    // Patch the options to return 50 so we exercise the with-amount path.
    (interaction.options as unknown as { getInteger: (n: string) => number | null }).getInteger = (name: string) =>
      name === 'amount' ? 50 : null;

    await xpExecute(interaction);

    expect(lastText(), '/xp must confirm the award').toMatch(/XP awarded/i);
  }, 90_000);
});
224
tests/integration/graphmcp/encounters-command.test.ts
Normal file
@@ -0,0 +1,224 @@
// AC6 — /encounters slash command hits GraphMCP for real data (live).
//
// The /encounters command surface has three GraphMCP read paths, none of which
// were previously exercised end-to-end through the registered command handlers:
//   - execute() → list_encounters → renders a select menu of past encounters
//   - handleEncounterSelect → get_encounter → renders a details embed for the chosen id
//   - handleSearchModalSubmit → search_encounters → renders a select menu of matches
//
// This test seeds a uniquely run-tagged encounter via log_encounter, waits for
// read-after-write consistency, then drives each handler with a FAKE
// StringSelectMenu / ModalSubmit / ChatInputCommand interaction whose captured
// editReply payload we inspect. The fakes carry no real Discord channel — these
// handlers only call GraphMCP + interaction.editReply, so no gateway login is
// required. That makes this suite GraphMCP-only: fast, and parallel-safe with
// the Discord-token-gated suites (AC2/AC5) — it never logs in with DISCORD_TOKEN.
//
// Gate: RUN_FULL_E2E=1. Requires GraphMCP up (GRAPHMCP_URL). Skipped by default → CI-safe.

import './support/env.js';
import { describe, it, expect, beforeAll, afterAll } from 'vitest';
import {
  execute,
  handleEncounterSelect,
  handleSearchModalSubmit,
} from '../../../src/bot/commands/encounters.js';
import {
  logEncounter,
  listEncounters,
  getEncounter,
  searchEncounters,
  type EncounterResultItem,
} from '../../../src/graphmcp/client.js';
import { runId, buildEncounterLog } from './support/factories.js';
import { waitFor } from './support/poll.js';
import type {
  ChatInputCommandInteraction,
  StringSelectMenuInteraction,
  ModalSubmitInteraction,
} from 'discord.js';

const runE2E = process.env.RUN_FULL_E2E === '1';

// ── Fakes ────────────────────────────────────────────────────────────────────
// These handlers read only `interaction.values` / `interaction.fields` and call
// deferReply/editReply, so a minimal capture-only fake suffices. No real channel
// is needed (unlike /encounter start, which creates a real thread).

interface Captured {
  content?: string;
  embeds?: unknown[];
  components?: unknown[];
}

function fakeChatInput(): { interaction: ChatInputCommandInteraction; edits: Captured[] } {
  const edits: Captured[] = [];
  const interaction = {
    async deferReply() { /* no-op */ },
    async editReply(payload: string | Captured) {
      edits.push(typeof payload === 'string' ? { content: payload } : payload);
      return {};
    },
    async reply(payload: string | Captured) {
      edits.push(typeof payload === 'string' ? { content: payload } : payload);
      return {};
    },
  } as unknown as ChatInputCommandInteraction;
  return { interaction, edits };
}

function fakeSelectMenu(values: string[]): { interaction: StringSelectMenuInteraction; edits: Captured[] } {
  const edits: Captured[] = [];
  const interaction = {
    values,
    async deferReply() { /* no-op */ },
    async editReply(payload: string | Captured) {
      edits.push(typeof payload === 'string' ? { content: payload } : payload);
      return {};
    },
    async reply(payload: string | Captured) {
      edits.push(typeof payload === 'string' ? { content: payload } : payload);
      return {};
    },
  } as unknown as StringSelectMenuInteraction;
  return { interaction, edits };
}

function fakeModalSubmit(fields: Record<string, string>): { interaction: ModalSubmitInteraction; edits: Captured[] } {
  const edits: Captured[] = [];
  const interaction = {
    fields: {
      getTextInputValue: (name: string) => fields[name] ?? '',
    },
    async deferReply() { /* no-op */ },
    async editReply(payload: string | Captured) {
      edits.push(typeof payload === 'string' ? { content: payload } : payload);
      return {};
    },
    async reply(payload: string | Captured) {
      edits.push(typeof payload === 'string' ? { content: payload } : payload);
      return {};
    },
  } as unknown as ModalSubmitInteraction;
  return { interaction, edits };
}

// A StringSelectMenu's option values live at
// `components[0].toJSON().components[0].options[].value` (ActionRowBuilder →
// select menu JSON). Reading via toJSON() keeps us off fragile builder internals.
function selectOptionValues(payload: Captured): string[] {
  const row = payload.components?.[0] as { toJSON?: () => { components: Array<{ options?: Array<{ value: string }> }> } } | undefined;
  if (!row?.toJSON) return [];
  const json = row.toJSON();
  const select = json.components?.[0];
  return select?.options?.map(o => o.value) ?? [];
}

describe.skipIf(!runE2E)('AC6 — /encounters command reads GraphMCP for real (live)', () => {
  const run = runId();
  const seed = buildEncounterLog(run, {
    title: 'The Standoff at Silt-Dock',
    participants: 'E2E Driver, Miriam, Dal',
    summary: `Automated suite run ${run}: a thief was cornered at the harbour stalls.`,
    location: 'Mardonar — Silt-Dock',
    type: 'encounter',
  });
  let seeded: EncounterResultItem | null = null;

  beforeAll(async () => {
    // Seed the encounter, then poll list_encounters until read-after-write
    // makes it visible. We locate it by the run id in the title (unique per run).
    await logEncounter(seed);
    seeded = await waitFor(
      async () => {
        const list = await listEncounters(100);
        return list.find(e => typeof e.title === 'string' && e.title.includes(run)) ?? null;
      },
      { timeoutMs: 45_000, intervalMs: 2_000 },
    );
  }, 90_000);

  // GraphMCP exposes no delete tool, so the seeded [E2E] record is left in place
  // (identifiable by the run id) — consistent with the rest of the live suite.
  afterAll(() => { /* no-op: see cleanup.ts GRAPHMCP_CLEANUP_LIMITATION */ });

  // C1 — /encounters list renders a select menu populated from list_encounters
  it('C1 list calls list_encounters and renders a select menu containing the seeded encounter', async () => {
    expect(seeded, 'beforeAll must have seeded a findable encounter').toBeTruthy();

    const { interaction, edits } = fakeChatInput();
    await execute(interaction, undefined as never);

    expect(edits.length, 'execute must editReply with the chronicle browser').toBeGreaterThanOrEqual(1);
    const payload = edits[0];
    // The command only builds the select menu when list is non-empty.
    expect(payload.components, 'a select-menu row must be rendered').toBeTruthy();
    const values = selectOptionValues(payload);
    expect(
      values,
      `the seeded encounter id must appear as a select option (got: ${values.slice(0, 5).join(', ')}…)`,
    ).toContain(seeded!.id);
  }, 60_000);

  // C2 — selecting an encounter calls get_encounter and renders a details embed
  it('C2 handleEncounterSelect calls get_encounter and renders an embed with the GraphMCP details', async () => {
    expect(seeded).toBeTruthy();

    const { interaction, edits } = fakeSelectMenu([seeded!.id]);
    await handleEncounterSelect(interaction);

    expect(edits.length, 'select handler must editReply with a details embed').toBe(1);
    const embed = edits[0].embeds?.[0] as { toJSON?: () => { title?: string; fields?: Array<{ name: string; value: string }> } } | undefined;
    expect(embed, 'an embed must be rendered').toBeTruthy();
    const json = embed!.toJSON();
    expect(json.title, 'embed title must come from get_encounter').toContain(seed.title);
    // Participants are rendered as a field; verify GraphMCP data reached the embed.
    const participantsField = json.fields?.find(f => f.name.toLowerCase().includes('witness') || f.name.toLowerCase().includes('participant'));
    expect(participantsField, 'a participants field must be present').toBeTruthy();
    expect(participantsField!.value, 'participants field must include a seeded participant').toContain('Miriam');

    // Cross-check: the same id fetched directly must match what the handler rendered.
    const direct = await getEncounter(seeded!.id);
    expect(direct.title).toBe(seed.title);
  }, 60_000);

  // C3 — the search modal calls search_encounters and renders matches
  it('C3 handleSearchModalSubmit calls search_encounters and surfaces the seeded encounter by participant', async () => {
    expect(seeded).toBeTruthy();

    const { interaction, edits } = fakeModalSubmit({
      search_query: '',
      search_location: '',
      search_participant: 'Miriam',
    });
    await handleSearchModalSubmit(interaction);

    expect(edits.length, 'search handler must editReply with results').toBe(1);
    // The "no results" branch replies with a plain string content; the
    // success branch renders a select menu. We seeded data, so expect a menu.
    expect(edits[0].components, 'a results select-menu must be rendered').toBeTruthy();
    const values = selectOptionValues(edits[0]);
    expect(
      values,
      `search by participant "Miriam" must include the seeded encounter (got: ${values.slice(0, 5).join(', ')}…)`,
    ).toContain(seeded!.id);

    // Cross-check via the same search call the command makes.
    const direct = await searchEncounters({ participant: 'Miriam', limit: 20 });
    expect(direct.some(e => e.id === seeded!.id), 'direct search_encounters must also return the seeded id').toBe(true);
  }, 60_000);

  // C4 — empty search criteria is rejected in-world (no GraphMCP hit needed)
  it('C4 handleSearchModalSubmit rejects an empty query without calling search_encounters', async () => {
    const { interaction, edits } = fakeModalSubmit({
      search_query: '',
      search_location: '',
      search_participant: '',
    });
    await handleSearchModalSubmit(interaction);

    expect(edits.length).toBe(1);
    expect(edits[0].content, 'must warn that a filter is required').toMatch(/at least one filter/i);
    expect(edits[0].components, 'no select menu when no criteria given').toBeFalsy();
  }, 30_000);
});
194
tests/integration/graphmcp/foundry-command.test.ts
Normal file
@@ -0,0 +1,194 @@
// AC10 — Foundry-VTT-dependent slash command paths (live, gated).
//
// Covers the slash-command paths that reach the Foundry VTT relay — the one
// data surface not exercised by the GraphMCP (AC6/AC7), registry (AC8), or
// encounter-thread (AC9) suites:
//   - /character register foundry (modal) → searchActors + filterPlayerActors → link
//   - /character view (with a link) → getActorDetails + getActorInventory + getActorSpells
//   - /actions (with a link) → getActorInventory + getActorSpells
//   - /character admin give (modal) → searchActors + giveItem (MUTATING)
//   - awardXP with a Foundry-linked player → modifyExperience (MUTATING)
//
// GATED separately from RUN_FULL_E2E: the Foundry relay (config.VTT_RELAY_URL)
// points at a REMOTE PRODUCTION instance by default, and two of these paths
// mutate real characters (giveItem adds an item, modifyExperience awards XP).
// Running them blind against production is unsafe, so this suite only activates
// under RUN_FOUNDRY_LIVE=1 AND with E2E_FOUNDRY_CHARACTER_NAME naming a
// sacrificial test PC in that world. Without those, it skips — CI-safe, and
// parallel-safe with the token-gated suites (no Discord login here: registry is
// Redis, Foundry is the relay).
//
// Gate: RUN_FULL_E2E=1 && RUN_FOUNDRY_LIVE=1 && E2E_FOUNDRY_CHARACTER_NAME.
// Requires Redis + a reachable Foundry relay. Skipped by default.

import './support/env.js';
import { describe, it, expect, beforeAll, afterAll } from 'vitest';
import {
  execute as characterExecute,
  handleFoundryLinkModal,
  handleGiveModal,
} from '../../../src/bot/commands/character.js';
import { execute as actionsExecute } from '../../../src/bot/commands/actions.js';
import { characterRegistry } from '../../../src/session/characterRegistry.js';
import { awardXP } from '../../../src/session/xpAwarder.js';
import type { SessionState } from '../../../src/types/index.js';
import { runId } from './support/factories.js';

const runE2E = process.env.RUN_FULL_E2E === '1';
const runFoundry = process.env.RUN_FOUNDRY_LIVE === '1';
const foundryCharName = process.env.E2E_FOUNDRY_CHARACTER_NAME ?? '';
const active = runE2E && runFoundry && foundryCharName.length > 0;

// ── Fakes (same shape as AC8 — these handlers need no real Discord channel) ──
interface Captured { content?: string; embeds?: unknown[]; }

function fakeChatInput(o: { subcommand?: string; group?: string | null; guildId: string; userId?: string; username?: string }) {
  const replies: Captured[] = [];
  const edits: Captured[] = [];
  const modals: unknown[] = [];
  const interaction = {
    guildId: o.guildId,
    guild: { name: 'E2E Guild' },
    user: { id: o.userId ?? 'e2e-foundry-user', username: o.username ?? 'E2E Foundry', bot: false, send: async () => null },
    options: {
      getSubcommand: () => o.subcommand ?? '',
      getSubcommandGroup: () => o.group ?? null,
      getString: () => null,
      getUser: () => null,
      getInteger: () => null,
      getBoolean: () => null,
    },
    async deferReply() { /* no-op */ },
    async editReply(p: string | Captured) { edits.push(typeof p === 'string' ? { content: p } : p); return {}; },
    async reply(p: string | Captured) { replies.push(typeof p === 'string' ? { content: p } : p); return {}; },
    async followUp() { return {}; },
    async showModal(m: unknown) { modals.push(m); return {}; },
  } as unknown as Parameters<typeof characterExecute>[0];
  return { interaction, replies, edits, modals, lastText: () => (edits.at(-1) ?? replies.at(-1))?.content };
}

function fakeModalSubmit(o: { fields: Record<string, string>; guildId: string; userId?: string; username?: string }) {
  const replies: Captured[] = [];
  const edits: Captured[] = [];
  const interaction = {
    guildId: o.guildId,
    guild: { name: 'E2E Guild' },
    user: { id: o.userId ?? 'e2e-foundry-user', username: o.username ?? 'E2E Foundry', bot: false, send: async () => null },
    fields: { getTextInputValue: (name: string) => o.fields[name] ?? '' },
    async deferReply() { /* no-op */ },
    async editReply(p: string | Captured) { edits.push(typeof p === 'string' ? { content: p } : p); return {}; },
    async reply(p: string | Captured) { replies.push(typeof p === 'string' ? { content: p } : p); return {}; },
  } as unknown as Parameters<typeof handleFoundryLinkModal>[0];
  return { interaction, replies, edits, lastText: () => (edits.at(-1) ?? replies.at(-1))?.content };
}

function embedJson(payload: Captured): { title?: string; description?: string } {
  const e = payload.embeds?.[0] as { toJSON?: () => { title?: string; description?: string } } | undefined;
  return e?.toJSON?.() ?? {};
}

describe.skipIf(!active)(`AC10 — Foundry-dependent slash commands (live, sacrificial PC "${foundryCharName}")`, () => {
  const run = runId();
  const guildId = `e2e-foundry-${run}`;
  const userId = 'e2e-foundry-user';

  beforeAll(async () => {
    const { redis } = await import('../../../src/db/redis.js');
    await redis.del(`characters:${guildId}`).catch(() => null);
  }, 30_000);

  afterAll(async () => {
    const { redis } = await import('../../../src/db/redis.js');
    await redis.del(`characters:${guildId}`).catch(() => null);
    redis.disconnect();
  }, 30_000);

  // F1 — register foundry modal → searchActors → registry link
  it('F1 /character register foundry modal links the actor (searchActors → characterRegistry)', async () => {
    // execute() opens the modal.
    const reg = fakeChatInput({ group: 'register', subcommand: 'foundry', guildId, userId });
    await characterExecute(reg.interaction);
    expect(reg.modals.length, 'register foundry must showModal').toBe(1);

    // Drive the modal-submit handler the bot routes to on submit.
    const modal = fakeModalSubmit({ fields: { foundry_character_name: foundryCharName }, guildId, userId });
    await handleFoundryLinkModal(modal.interaction);
    expect(modal.lastText(), 'must confirm the link').toMatch(/Linked to/i);

    const profile = await characterRegistry.get(guildId, userId);
    expect(profile, 'profile must be persisted').toBeTruthy();
    expect(profile!.source, 'source must be foundry').toBe('foundry');
    expect(profile!.foundryActorUuid, 'a Foundry actor uuid must be stored').toBeTruthy();
  }, 60_000);

  // F2 — /character view with a link fetches live Foundry stats
  it('F2 /character view renders live actor details from Foundry', async () => {
    // Depends on F1 having linked the actor. Re-link if not present (test order
    // is not guaranteed across files, but within this describe it()s run in order).
    let profile = await characterRegistry.get(guildId, userId);
    if (!profile?.foundryActorUuid) {
      const modal = fakeModalSubmit({ fields: { foundry_character_name: foundryCharName }, guildId, userId });
      await handleFoundryLinkModal(modal.interaction);
      profile = await characterRegistry.get(guildId, userId);
    }
    expect(profile?.foundryActorUuid, 'F2 needs a linked actor').toBeTruthy();

    const view = fakeChatInput({ subcommand: 'view', guildId, userId });
    await characterExecute(view.interaction);
    expect(view.edits.length, 'view must editReply').toBeGreaterThanOrEqual(1);
    const json = embedJson(view.edits[0]);
    expect(json.title, 'view embed title must carry the character name').toBeTruthy();
    expect(view.lastText(), 'must not hit the relay-error branch').not.toMatch(/Could not fetch character data/i);
  }, 60_000);

  // F3 — /actions with a link fetches inventory + spells
  it('F3 /actions renders the linked actor\'s weapons/spells from Foundry', async () => {
    const profile = await characterRegistry.get(guildId, userId);
    expect(profile?.foundryActorUuid, 'F3 needs a linked actor').toBeTruthy();

    const act = fakeChatInput({ guildId, userId });
    await actionsExecute(act.interaction);
    expect(act.edits.length, 'actions must editReply with an embed').toBe(1);
    const json = embedJson(act.edits[0]);
    expect(json.title, 'actions embed title must be the character — Actions').toMatch(/— Actions/i);
    expect(act.lastText(), 'must not hit the relay-error branch').not.toMatch(/Could not reach Foundry VTT/i);
  }, 60_000);

  // F4 — admin give modal → giveItem (MUTATING — only runs under the gate)
  it('F4 /character admin give modal gives an item to the sacrificial PC (giveItem)', async () => {
    const give = fakeModalSubmit({ fields: { give_character_name: foundryCharName, give_item_name: 'Potion of Healing' }, guildId, userId });
    await handleGiveModal(give.interaction);
    // A successful give confirms "given to <name>"; a relay/actor failure surfaces
    // a Could-not-reach / no-match message — assert the success path under the
    // gate (the sacrificial PC is expected to exist and be a player actor).
    expect(give.lastText(), 'must confirm the item was given').toMatch(/given to/i);
  }, 60_000);

  // F5 — awardXP with a Foundry-linked player → modifyExperience (MUTATING)
  it('F5 awardXP awards XP to a Foundry-linked player (modifyExperience)', async () => {
    const profile = await characterRegistry.get(guildId, userId);
    expect(profile?.foundryActorUuid, 'F5 needs a linked actor').toBeTruthy();

    const session: SessionState = {
      encounterId: `e2e-xp-${run}`,
      threadId: 'e2e-xp-thread',
      guildId,
      spec: { title: 'E2E XP', encounterId: `e2e-xp-${run}`, tone: 'tense', setting: { location: 'Mardonar' }, npcs: [], goals: { hidden: true, primary: [], secondary: [] }, skillChecks: {}, sportsmanshipRules: [] } as SessionState['spec'],
      players: { [userId]: { discordId: userId, dndName: profile!.dndName } },
      history: [],
      phase: 'active',
      heldMessages: [],
      createdAt: Date.now(),
      updatedAt: Date.now(),
    };
    // awardXP posts a summary to the thread; a fake channel.send captures it.
    const sent: string[] = [];
    const fakeThread = { send: async (c: string) => { sent.push(typeof c === 'string' ? c : String(c)); return {}; } } as never;

    const result = await awardXP(session, 25, fakeThread);

    expect(result.awarded.length, 'the linked player must be awarded, not skipped').toBeGreaterThanOrEqual(1);
    expect(result.awarded[0].dndName, 'awarded name must be the linked character').toBe(profile!.dndName);
    expect(sent.join('\n'), 'the thread summary must announce +25 XP').toMatch(/\+25 XP awarded/i);
  }, 60_000);
});
265
tests/integration/graphmcp/registries-command.test.ts
Normal file
@@ -0,0 +1,265 @@
|
||||
// AC8 — registry-backed slash commands hit real Redis for data (live).
//
// Covers the slash commands whose data path is the player/character registry in
// Redis — not GraphMCP, not Foundry, not the LLM — so they were never exercised
// by the GraphMCP live suite:
// - /dndname set | show | clear → playerRegistry (Redis)
// - /character register custom (modal) → characterRegistry (Redis)
// - /character show | clear → characterRegistry (Redis)
// - /character admin list | remove → characterRegistry (Redis)
// - /actions (no Foundry link) → characterRegistry read, Foundry-not-reached branch
// - /character view (no profile / no link) → characterRegistry read, pre-Foundry branches
//
// These run against REAL Redis only — no Discord gateway login, no Foundry relay,
// no LLM — so they are fast and parallel-safe with the token-gated suites. Each
// command's registered handler is driven with a fake interaction; round-trips are
// asserted by reading the same registry the command writes through.
//
// Gate: RUN_FULL_E2E=1. Requires Redis up (REDIS_URL). Skipped by default → CI-safe.
|
||||
|
||||
import './support/env.js';
import { describe, it, expect, beforeAll, afterAll } from 'vitest';
import { execute as dndnameExecute } from '../../../src/bot/commands/dndname.js';
import { execute as characterExecute, handleCustomRegisterModal } from '../../../src/bot/commands/character.js';
import { execute as actionsExecute } from '../../../src/bot/commands/actions.js';
import { characterRegistry } from '../../../src/session/characterRegistry.js';
import { playerRegistry } from '../../../src/session/playerRegistry.js';
import { runId } from './support/factories.js';

const runE2E = process.env.RUN_FULL_E2E === '1';
|
||||
|
||||
// ── Fakes ────────────────────────────────────────────────────────────────────
// These handlers read only interaction.options.* + interaction.user/guildId and
// call reply/editReply/showModal — no real Discord channel is needed.
|
||||
|
||||
interface Captured { content?: string; embeds?: unknown[]; components?: unknown[]; }
|
||||
|
||||
interface ChatInputOpts {
|
||||
subcommand?: string;
|
||||
group?: string | null;
|
||||
strings?: Record<string, string>;
|
||||
users?: Record<string, { id: string; username: string }>;
|
||||
guildId: string;
|
||||
userId?: string;
|
||||
username?: string;
|
||||
}
|
||||
|
||||
function fakeChatInput(o: ChatInputOpts) {
|
||||
const replies: Captured[] = [];
|
||||
const edits: Captured[] = [];
|
||||
const modals: unknown[] = [];
|
||||
const user = {
|
||||
id: o.userId ?? 'e2e-user',
|
||||
username: o.username ?? 'E2E User',
|
||||
bot: false,
|
||||
send: async () => null,
|
||||
};
|
||||
const interaction = {
|
||||
guildId: o.guildId,
|
||||
guild: { name: 'E2E Guild' },
|
||||
channelId: 'e2e-channel',
|
||||
channel: undefined,
|
||||
user,
|
||||
options: {
|
||||
getSubcommand: () => o.subcommand ?? '',
|
||||
getSubcommandGroup: () => o.group ?? null,
|
||||
getString: (name: string) => o.strings?.[name] ?? null,
|
||||
getUser: (name: string) => o.users?.[name] ?? null,
|
||||
getInteger: () => null,
|
||||
getBoolean: () => null,
|
||||
},
|
||||
async deferReply() { /* no-op */ },
|
||||
async editReply(payload: string | Captured) {
|
||||
edits.push(typeof payload === 'string' ? { content: payload } : payload);
|
||||
return {};
|
||||
},
|
||||
async reply(payload: string | Captured) {
|
||||
replies.push(typeof payload === 'string' ? { content: payload } : payload);
|
||||
return {};
|
||||
},
|
||||
async followUp() { return {}; },
|
||||
async showModal(modal: unknown) { modals.push(modal); return {}; },
|
||||
} as unknown as Parameters<typeof characterExecute>[0];
|
||||
const lastText = () => (edits.at(-1) ?? replies.at(-1))?.content;
|
||||
return { interaction, replies, edits, modals, lastText };
|
||||
}
|
||||
|
||||
function fakeModalSubmit(o: { fields: Record<string, string>; guildId: string; userId?: string; username?: string }) {
|
||||
const replies: Captured[] = [];
|
||||
const edits: Captured[] = [];
|
||||
const interaction = {
|
||||
guildId: o.guildId,
|
||||
guild: { name: 'E2E Guild' },
|
||||
user: { id: o.userId ?? 'e2e-user', username: o.username ?? 'E2E User', bot: false, send: async () => null },
|
||||
fields: { getTextInputValue: (name: string) => o.fields[name] ?? '' },
|
||||
async deferReply() { /* no-op */ },
|
||||
async editReply(payload: string | Captured) {
|
||||
edits.push(typeof payload === 'string' ? { content: payload } : payload);
|
||||
return {};
|
||||
},
|
||||
async reply(payload: string | Captured) {
|
||||
replies.push(typeof payload === 'string' ? { content: payload } : payload);
|
||||
return {};
|
||||
},
|
||||
} as unknown as Parameters<typeof handleCustomRegisterModal>[0];
|
||||
const lastText = () => (edits.at(-1) ?? replies.at(-1))?.content;
|
||||
return { interaction, replies, edits, lastText };
|
||||
}
|
||||
|
||||
// Embed title lives at embeds[0].toJSON().title — read via the stable API.
|
||||
function embedTitle(payload: Captured): string | undefined {
|
||||
const e = payload.embeds?.[0] as { toJSON?: () => { title?: string } } | undefined;
|
||||
return e?.toJSON?.()?.title;
|
||||
}
|
||||
|
||||
// If DISCORD_ALLOWED_USERS is configured, admin commands require an id in that
// list; use the first allowed id so the admin sub-tests do not fail because of
// the allowlist. An empty list means everyone is allowed.
|
||||
const ALLOWED = (process.env.DISCORD_ALLOWED_USERS ?? '').split(',').map(s => s.trim()).filter(Boolean);
|
||||
const adminId = ALLOWED[0] ?? 'e2e-admin';
|
||||
|
||||
describe.skipIf(!runE2E)('AC8 — registry slash commands round-trip through real Redis (live)', () => {
|
||||
const run = runId();
|
||||
const guildId = `e2e-regs-${run}`;
|
||||
const userId = 'e2e-player';
|
||||
const stubClient = {} as never; // /dndname set only touches the client if there are held messages (none here)
|
||||
|
||||
beforeAll(async () => {
|
||||
// Start clean for our synthetic guild.
|
||||
const { redis } = await import('../../../src/db/redis.js');
|
||||
await redis.del(`characters:${guildId}`).catch(() => null);
|
||||
}, 30_000);
|
||||
|
||||
afterAll(async () => {
|
||||
const { redis } = await import('../../../src/db/redis.js');
|
||||
await redis.del(`characters:${guildId}`).catch(() => null);
|
||||
redis.disconnect();
|
||||
}, 30_000);
|
||||
|
||||
// ── /dndname ──────────────────────────────────────────────────────────────
|
||||
it('D1 /dndname set→show→clear round-trips through playerRegistry', async () => {
|
||||
// show before set → no registration
|
||||
const show0 = fakeChatInput({ subcommand: 'show', guildId, userId });
|
||||
await dndnameExecute(show0.interaction, stubClient);
|
||||
expect(show0.lastText(), 'unregistered show must say so').toMatch(/No character registered/i);
|
||||
|
||||
// set
|
||||
const setName = `Rook-${run}`;
|
||||
const setI = fakeChatInput({ subcommand: 'set', strings: { name: setName }, guildId, userId });
|
||||
await dndnameExecute(setI.interaction, stubClient);
|
||||
expect(setI.lastText(), 'set must confirm the name').toContain(setName);
|
||||
|
||||
// show after set → reflects it
|
||||
const show1 = fakeChatInput({ subcommand: 'show', guildId, userId });
|
||||
await dndnameExecute(show1.interaction, stubClient);
|
||||
expect(show1.lastText(), 'show after set must return the name').toContain(setName);
|
||||
|
||||
// Cross-check the registry the command writes through.
|
||||
const profile = await playerRegistry.get(guildId, userId);
|
||||
expect(profile?.dndName, 'playerRegistry must persist the name').toBe(setName);
|
||||
|
||||
// clear
|
||||
const clearI = fakeChatInput({ subcommand: 'clear', guildId, userId });
|
||||
await dndnameExecute(clearI.interaction, stubClient);
|
||||
expect(clearI.lastText(), 'clear must confirm').toMatch(/cleared/i);
|
||||
expect(await playerRegistry.get(guildId, userId), 'registry must be empty after clear').toBeNull();
|
||||
}, 30_000);
|
||||
|
||||
// ── /character register custom (modal) + show + clear ─────────────────────
|
||||
it('D2 /character register custom modal persists, /character show renders, /character clear removes', async () => {
|
||||
// execute() opens the modal; it does not persist anything itself.
|
||||
const reg = fakeChatInput({ group: 'register', subcommand: 'custom', guildId, userId });
|
||||
await characterExecute(reg.interaction);
|
||||
expect(reg.modals.length, 'register custom must showModal').toBe(1);
|
||||
|
||||
// Drive the modal-submit handler the bot routes to on submit.
|
||||
const charName = `Zara-${run}`;
|
||||
const modal = fakeModalSubmit({
|
||||
fields: { char_name: charName, char_pronouns: 'she/her', char_class: 'Rogue', char_race: 'Half-Elf', char_backstory: 'ex-smuggler' },
|
||||
guildId,
|
||||
userId,
|
||||
});
|
||||
await handleCustomRegisterModal(modal.interaction);
|
||||
expect(modal.lastText(), 'modal submit must confirm the saved name').toContain(charName);
|
||||
|
||||
// /character show renders the persisted profile as an embed.
|
||||
const show = fakeChatInput({ subcommand: 'show', guildId, userId });
|
||||
await characterExecute(show.interaction);
|
||||
expect(show.replies.length, 'show must reply with an embed').toBe(1);
|
||||
expect(embedTitle(show.replies[0]), 'show embed title must be the character sheet').toContain(charName);
|
||||
|
||||
// Cross-check the registry.
|
||||
const profile = await characterRegistry.get(guildId, userId);
|
||||
expect(profile?.dndName).toBe(charName);
|
||||
expect(profile?.source).toBe('custom');
|
||||
expect(profile?.characterClass).toBe('Rogue');
|
||||
expect(profile?.foundryActorUuid, 'custom registration must not link Foundry').toBeUndefined();
|
||||
|
||||
// /character clear removes it.
|
||||
const clear = fakeChatInput({ subcommand: 'clear', guildId, userId });
|
||||
await characterExecute(clear.interaction);
|
||||
expect(clear.lastText(), 'clear must confirm').toMatch(/cleared/i);
|
||||
expect(await characterRegistry.get(guildId, userId)).toBeNull();
|
||||
}, 30_000);
|
||||
|
||||
// ── /character show with no profile ───────────────────────────────────────
|
||||
it('D3 /character show with no profile replies with the get-started prompt', async () => {
|
||||
const show = fakeChatInput({ subcommand: 'show', guildId, userId: 'nobody-yet' });
|
||||
await characterExecute(show.interaction);
|
||||
expect(show.lastText(), 'must prompt to register').toMatch(/No character profile found/i);
|
||||
}, 15_000);
|
||||
|
||||
// ── /character admin list + admin remove ──────────────────────────────────
|
||||
it('D4 /character admin list (empty + populated) and admin remove round-trip through characterRegistry', async () => {
|
||||
// Empty list first.
|
||||
const list0 = fakeChatInput({ group: 'admin', subcommand: 'list', guildId, userId: adminId });
|
||||
await characterExecute(list0.interaction);
|
||||
expect(list0.lastText(), 'empty admin list must say so').toMatch(/No characters registered/i);
|
||||
|
||||
// Seed two registrations directly via the registry the admin commands read.
|
||||
await characterRegistry.set(guildId, { discordId: 'p1', dndName: `Aria-${run}`, source: 'custom' });
|
||||
await characterRegistry.set(guildId, { discordId: 'p2', dndName: `Bram-${run}`, source: 'custom' });
|
||||
|
||||
const list1 = fakeChatInput({ group: 'admin', subcommand: 'list', guildId, userId: adminId });
|
||||
await characterExecute(list1.interaction);
|
||||
expect(list1.replies.length, 'populated admin list must reply with an embed').toBe(1);
|
||||
const json = (list1.replies[0].embeds?.[0] as { toJSON: () => { fields?: Array<{ name: string }> } }).toJSON();
|
||||
const names = json.fields?.map(f => f.name) ?? [];
|
||||
expect(names, 'admin list must include both registered characters').toEqual(
|
||||
expect.arrayContaining([`Aria-${run}`, `Bram-${run}`]),
|
||||
);
|
||||
|
||||
// admin remove one user.
|
||||
const remove = fakeChatInput({
|
||||
group: 'admin', subcommand: 'remove', guildId, userId: adminId,
|
||||
users: { user: { id: 'p1', username: 'AriaUser' } },
|
||||
});
|
||||
await characterExecute(remove.interaction);
|
||||
expect(remove.lastText(), 'remove must confirm by username').toContain('AriaUser');
|
||||
expect(await characterRegistry.get(guildId, 'p1'), 'removed user must be gone').toBeNull();
|
||||
expect(await characterRegistry.get(guildId, 'p2'), 'other user must remain').not.toBeNull();
|
||||
}, 30_000);
|
||||
|
||||
// ── /actions with no Foundry link ─────────────────────────────────────────
|
||||
it('D5 /actions with no linked Foundry character replies with the link prompt', async () => {
|
||||
// Register a custom (non-Foundry) character so the profile exists but has no link.
|
||||
await characterRegistry.set(guildId, { discordId: userId, dndName: `NoLink-${run}`, source: 'custom' });
|
||||
|
||||
const act = fakeChatInput({ guildId, userId });
|
||||
await actionsExecute(act.interaction);
|
||||
expect(act.lastText(), 'must prompt to link a Foundry character').toMatch(/No Foundry character linked/i);
|
||||
}, 15_000);
|
||||
|
||||
// ── /character view pre-Foundry branches ──────────────────────────────────
|
||||
it('D6 /character view with no profile and with an unlinked profile hits the pre-Foundry branches', async () => {
|
||||
// No profile at all.
|
||||
const view0 = fakeChatInput({ subcommand: 'view', guildId, userId: 'never-registered' });
|
||||
await characterExecute(view0.interaction);
|
||||
expect(view0.edits.at(-1)?.content, 'no-profile view must prompt to register').toMatch(/No character profile found/i);
|
||||
|
||||
// Profile exists but no Foundry link.
|
||||
const view1 = fakeChatInput({ subcommand: 'view', guildId, userId });
|
||||
await characterExecute(view1.interaction);
|
||||
expect(view1.edits.at(-1)?.content, 'unlinked view must point to register foundry').toMatch(/isn't linked to Foundry VTT/i);
|
||||
}, 15_000);
|
||||
});
|
||||
@@ -140,4 +140,81 @@ describe('filterLLMResponse', () => {
|
||||
expect(filterLLMResponse('The innkeeper nods. [SESSION] Zara entered.').ok).toBe(false);
|
||||
});
|
||||
});
|
||||
|
||||
// Regression: 2026-06-20 game session leaked raw tool_call JSON to players when
// parseToolCall missed a variant the model emitted. The filter is the
// defense-in-depth safety net — any leaked internal-format content is
// suppressed before thread.send posts it to the Discord thread.
|
||||
describe('leaked tool-call block', () => {
|
||||
it('catches the exact unfenced leak from the 2026-06-20 session', () => {
|
||||
const r = filterLLMResponse(
|
||||
'tool_call\n{ "tool": "skill_check_emit", "args": { "player": "Maximus", "prompt": "Melee attack with knife against Angro Harn", "skill": "Ranged Attack", "dc": 12 } }',
|
||||
);
|
||||
expect(r.ok).toBe(false);
|
||||
expect(r.reason).toBe('leaked_tool_call');
|
||||
});
|
||||
it('catches unfenced tool_call header with role in args', () => {
|
||||
expect(
|
||||
filterLLMResponse(
|
||||
'tool_call\n{ "tool": "skill_check_emit", "args": { "player": "Maximus", "role": "player", "prompt": "Melee attack with knife against the sleeping guard", "skill": "Melee Attack", "dc": 8 } }',
|
||||
).ok,
|
||||
).toBe(false);
|
||||
});
|
||||
it('catches a fenced ```tool_call block', () => {
|
||||
expect(
|
||||
filterLLMResponse('```tool_call\n{"tool":"skill_check_emit","args":{"player":"Maximus","prompt":"x","skill":"Perception","dc":10}}\n```').ok,
|
||||
).toBe(false);
|
||||
});
|
||||
it('catches bare tool-call JSON without a tool_call header', () => {
|
||||
const r = filterLLMResponse('{ "tool": "skill_check_emit", "args": { "player": "Maximus", "dc": 12 } }');
|
||||
expect(r.ok).toBe(false);
|
||||
expect(r.reason).toBe('leaked_tool_call');
|
||||
});
|
||||
it('catches bare tool-call JSON with no spaces', () => {
|
||||
expect(filterLLMResponse('{"tool":"skill_check_emit","args":{"player":"Maximus","dc":12}}').ok).toBe(false);
|
||||
});
|
||||
it('catches tool-call JSON fenced as ```json', () => {
|
||||
expect(
|
||||
filterLLMResponse('```json\n{ "tool": "skill_check_emit", "args": { "dc": 12 } }\n```').ok,
|
||||
).toBe(false);
|
||||
});
|
||||
it('catches a tool_call header preceded by a prose preamble', () => {
|
||||
// Even with narrative preamble, the tool_call token marks the rest as a leak.
|
||||
expect(
|
||||
filterLLMResponse(
|
||||
'Maximus draws the knife.\ntool_call\n{ "tool": "skill_check_emit", "args": { "player": "Maximus", "prompt": "stab", "skill": "Melee Attack", "dc": 12 } }',
|
||||
).ok,
|
||||
).toBe(false);
|
||||
});
|
||||
it('catches args-first tool-call JSON', () => {
|
||||
expect(
|
||||
filterLLMResponse('{ "args": { "player": "Maximus", "dc": 12 }, "tool": "skill_check_emit" }').ok,
|
||||
).toBe(false);
|
||||
});
|
||||
it('is case-insensitive on the tool_call token', () => {
|
||||
expect(filterLLMResponse('TOOL_CALL\n{ "tool": "x", "args": {} }').ok).toBe(false);
|
||||
});
|
||||
it('still blocks leaked tool-call even when skipRollClaim is true', () => {
|
||||
const r = filterLLMResponse('tool_call\n{ "tool": "skill_check_emit", "args": { "dc": 12 } }', {
|
||||
skipRollClaim: true,
|
||||
});
|
||||
expect(r.ok).toBe(false);
|
||||
expect(r.reason).toBe('leaked_tool_call');
|
||||
});
|
||||
});
|
||||
|
||||
describe('leaked tool-call — no false positives on clean in-world prose', () => {
|
||||
it('does not flag normal narration', () => {
|
||||
expect(filterLLMResponse('The blade catches the torchlight as Maximus steps forward.').ok).toBe(true);
|
||||
});
|
||||
it('does not flag the word "tool" in prose (no underscore join)', () => {
|
||||
expect(filterLLMResponse('The burglar lays down his tools and waits.').ok).toBe(true);
|
||||
});
|
||||
it('does not flag an unrelated JSON-like brace in narration', () => {
|
||||
expect(filterLLMResponse('A {strange} wind howls through the empty hall.').ok).toBe(true);
|
||||
});
|
||||
it('does not flag dialogue that merely mentions a call', () => {
|
||||
expect(filterLLMResponse('"I will call for the guard," the innkeeper shouts.').ok).toBe(true);
|
||||
});
|
||||
});
|
||||
});
|
||||
|
||||
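The leaked tool-call cases above can be illustrated with a minimal sketch of the kind of check they exercise. This is not the repository's filterLLMResponse, whose implementation is not shown in this diff. The helper name looksLikeLeakedToolCall and all three regular expressions are assumptions made for illustration: one flags the literal tool_call token (case-insensitive, fenced or unfenced, with or without a prose preamble), and the other two flag JSON that carries both a "tool" key and an "args" object in either order.

```typescript
// Hypothetical sketch only: the name and the patterns are assumptions,
// not the code under test in this commit.

// The internal-format token itself. Underscore is a word character, so
// \b keeps plain words such as "tools" or "call" from matching.
const TOOL_CALL_TOKEN = /\btool_call\b/i;

// A JSON "tool" key with a string value, and an "args" key with an object
// value. Requiring both avoids flagging a stray brace in narration.
const TOOL_KEY = /"tool"\s*:\s*"/;
const ARGS_KEY = /"args"\s*:\s*\{/;

function looksLikeLeakedToolCall(text: string): boolean {
  // Any occurrence of the token marks the message as a leak, even when
  // narrative prose precedes it.
  if (TOOL_CALL_TOKEN.test(text)) return true;
  // Bare or ```json-fenced tool-call JSON, tool-first or args-first.
  return TOOL_KEY.test(text) && ARGS_KEY.test(text);
}
```

Checking the two keys independently, rather than with one ordered pattern, is what lets the args-first variant be caught without a second rule. The trade-off is that a message quoting the literal token tool_call in prose would also be suppressed, which matches the intent stated in the preamble test.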