kaykayyali/zalbot

Kaysser Kayyali fbd991a2b0

feat: docs pass, test fixes, advanced review

2026-06-19 16:15:06 +00:00

Mardonar Encounter Engine — Architecture

Single-part backend project. Discord-native, LLM-driven D&D encounter engine. Generated 2026-06-19 from a deep scan of /home/kaykayyali/hosting/mardonar-npcs.

Executive Summary

The Mardonar Encounter Engine is a Discord bot that runs structured D&D encounters. Each Discord thread is an encounter session. An LLM (Gemma 4 IT e2b via LiteLLM with Ollama fallback) narrates the scene, voices NPCs, drives skill checks, and steers the encounter toward hidden outcomes defined in a YAML spec. NPC memory, lore context, and encounter history are persisted in a graph database (Neo4j) accessed through a JSON-RPC MCP server (GraphMCP). Active session state lives in Redis with a TTL. The bot can also reach into Foundry VTT to resolve character stats and award XP via an external relay.

Key constraint: the harness controls everything the LLM sees. The 128k context window is partitioned into hard zones (system / pinned / sliding / safety) and the assembly pipeline is deterministic. Tool calls are extracted from fenced tool_call JSON blocks, not via native function calling — Gemma at e2b quantization isn't reliable for native tools.

1. Technology Stack

Layer	Technology	Version	Notes
Runtime	Node.js	22 (alpine)	ESM modules, NodeNext resolution
Language	TypeScript	5.8	strict mode, declaration + sourcemap output
Discord	discord.js	v14.18	Slash commands + embeds + threads
LLM primary	LiteLLM proxy	(env: `LITELLM_BASE_URL`)	OpenAI-compatible
LLM fallback	Ollama	(env: `OLLAMA_BASE_URL`)	gemma4-it:e2b, 128k context
Session cache	Redis (ioredis)	5.4	TTL = `SESSION_TTL_HOURS` (default 12h)
Graph DB	Neo4j	5	via GraphMCP JSON-RPC, not direct
Lore / NPC memory	GraphMCP HTTP JSON-RPC	(env: `GRAPHMCP_URL`)	6 RPC tools exposed
Foundry VTT	VTT relay HTTPS	(env: `VTT_RELAY_URL`)	Optional, requires API key
Validation	Zod	3.24	env + encounter spec
Logging	custom (src/lib/logger.ts)	—	plaintext stdout; no env-driven level filter
Testing	Vitest	3.1	`tests/unit` + `tests/integration`
Build	tsc → dist/	5.8	multi-stage Dockerfile

Architecture pattern: layered backend with a plugin-style tool registry. Three layers: bot (Discord I/O), harness (LLM orchestration), session + db + graphmcp + vtt (data + integrations).

2. Source Tree

mardonar-bot/
├── src/
│   ├── bot/                          # Discord I/O layer
│   │   ├── index.ts                  # Entry: Client setup, event wiring
│   │   ├── commands/                 # 8 slash command modules
│   │   │   ├── dndname.ts            # /dndname set|show|clear
│   │   │   ├── encounter.ts          # /encounter start|status|end|generate|spec|random|stats|audit
│   │   │   ├── character.ts          # /character register|show|view|admin
│   │   │   ├── roll.ts               # /roll
│   │   │   ├── actions.ts            # /actions
│   │   │   ├── xp.ts                 # /xp award
│   │   │   ├── encounters.ts         # /encounters (list/search from GraphMCP)
│   │   │   └── turn.ts               # /turn
│   │   ├── embeds/                   # Discord embed builders
│   │   │   ├── playerGate.ts
│   │   │   ├── skillCheck.ts         # Suspense + dice + roll buttons
│   │   │   ├── resolution.ts
│   │   │   ├── encounterDiscovery.ts
│   │   │   └── loreAnswer.ts
│   │   ├── handlers/                 # Event handlers / sidecar logic
│   │   │   ├── messageRouter.ts      # Encounter-thread message pipeline (heart of runtime)
│   │   │   ├── mentionHandler.ts     # @Zalram persona replies
│   │   │   ├── rollHandler.ts        # Button / modal submit roll resolution
│   │   │   ├── generationQueue.ts    # Debounce + LLM turn scheduling
│   │   │   ├── queueCap.ts           # Burst cap → drop notice
│   │   │   ├── reactionManager.ts    # 👀 reaction lifecycle (scheduled/processing/complete)
│   │   │   └── responseFilter.ts     # Post-LLM response scrubbing
│   │   └── lib/welcomeDM.ts
│   ├── harness/                      # LLM orchestration
│   │   ├── promptBuilder.ts          # System prompt assembly (XML sections)
│   │   ├── contextAssembler.ts       # Pin/slide history + token budget trim
│   │   ├── llmClient.ts              # LiteLLM primary → Ollama fallback
│   │   ├── litellmClient.ts          # OpenAI-compatible HTTP client
│   │   ├── ollamaClient.ts           # Native ollama npm + direct HTTP
│   │   ├── toolParser.ts             # Extract ```tool_call``` blocks
│   │   ├── toolRegistry.ts           # Plugin registry + active-set filtering
│   │   ├── toolDispatcher.ts         # Per-encounter tool validation + dispatch
│   │   └── tools/                    # 6 tool plugins (see §5)
│   ├── session/                      # Redis-backed state
│   │   ├── playerRegistry.ts         # guildId+discordId → Player
│   │   ├── characterRegistry.ts      # Character profile + pronouns + Foundry UUID
│   │   ├── sessionManager.ts         # threadId → SessionState (pinned/sliding history)
│   │   ├── encounterLog.ts           # Filesystem tally + summary writer
│   │   └── xpAwarder.ts              # XP grant via VTT relay
│   ├── graphmcp/                     # GraphMCP JSON-RPC client
│   │   ├── client.ts                 # 6 RPC calls + NPC memory formatter
│   │   ├── ingest.ts                 # Publish to Redis stream (raw.messages)
│   │   ├── loreResolver.ts           # /encounter generate helper
│   │   └── vocabularyResolver.ts     # spec randomizable: vocabulary source
│   ├── vtt/                          # Foundry VTT integration
│   │   ├── foundryClient.ts          # HTTP client, formatters
│   │   └── relaySession.ts           # RSA-OAEP handshake + headless spin-up
│   ├── db/redis.ts                   # ioredis singleton (lazy connect)
│   ├── spec/loader.ts                # YAML loader + Zod schema
│   ├── persona/loader.ts             # persona.yaml loader for @mention
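│   ├── lib/                          # Shared utilities
│   │   ├── logger.ts                 # custom tag+message logger (plaintext stdout)
│   │   └── historyTrim.ts            # Shared history trimming (sessionManager + contextAssembler)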
│   ├── config.ts                     # Zod env schema + parsed config singleton
│   ├── scripts/deploy-commands.ts    # Slash command registration (REST v10)
│   └── types/index.ts                # Shared interfaces + CONTEXT_BUDGET const
├── specs/                            # 7 encounter YAML files + SPEC_FORMAT.md
│   ├── SPEC_FORMAT.md
│   ├── market-thief.yaml
│   ├── cog-claw-debt.yaml
│   ├── mawfang-pursuit.yaml
│   ├── silt-leak.yaml
│   ├── stormscar-pilgrim.yaml
│   ├── velvet-auction.yaml
│   └── whispering-stone.yaml
├── data/                             # Runtime data (gitignored in practice)
│   ├── tally.json                    # Per-spec run counts
│   └── summaries/                    # One .txt per encounter
├── tests/
│   ├── unit/                         # 33 unit test files
│   └── integration/                  # 1 integration test
├── Docs/                             # Pre-existing project docs
│   ├── mardonar-encounter-engine.md  # ⚠ Out of date — describes Go architecture
│   ├── mardonar-build-plan.md
│   ├── epics.md
│   ├── stories/
│   └── ux-designs/
├── lore/                             # Game-world reference material
├── persona.yaml                      # Zalram Cloudwalker (bot's @mention persona)
├── prd.md                            # Active PRD: Dynamic Goal Registration
├── Dockerfile                        # Multi-stage node:22-alpine
├── docker-compose.dev.yml            # Builds the bot image; expects Redis + GraphMCP on the external `mardonar-internal` network
├── package.json
├── tsconfig.json
└── vitest.config.ts

3. Architecture Pattern

Layered backend with a plugin registry:

┌──────────────────────────────────────────────────────────────────┐
│  Discord (Gateway WebSocket)                                     │
└──────────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌──────────────────────────────────────────────────────────────────┐
│  src/bot/                                                         │
│  ┌────────────────────┐  ┌────────────────┐  ┌──────────────┐  │
│  │ commands/          │  │ handlers/      │  │ embeds/      │  │
│  │ (slash cmd)        │  │ (event loops)  │  │ (UI shape)   │  │
│  └────────────────────┘  └────────────────┘  └──────────────┘  │
│         messageRouter is the runtime heart                       │
└──────────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌──────────────────────────────────────────────────────────────────┐
│  src/harness/                                                     │
│  assembleContext → llmClient (LiteLLM → Ollama)                 │
│       ↓                                                            │
│  parseToolCall → dispatchTool → active tool plugins              │
└──────────────────────────────────────────────────────────────────┘
                            │                       │
                            ▼                       ▼
┌─────────────────────┐  ┌─────────────────┐  ┌──────────────────┐
│ src/session/        │  │ src/db/         │  │ src/graphmcp/    │
│  (Redis state)      │  │  (ioredis)      │  │  (JSON-RPC)      │
└─────────────────────┘  └─────────────────┘  └──────────────────┘
                            │
                            ▼
┌──────────────────────────────────────────────────────────────────┐
│  src/vtt/  →  External Foundry VTT relay                         │
│  src/persona/  →  persona.yaml for @mentions                     │
│  src/spec/  →  specs/*.yaml loaded per encounter                 │
└──────────────────────────────────────────────────────────────────┘

3.1 Message flow (encounter thread)

1. Discord messageCreate → bot/index.ts → handleMessage in handlers/messageRouter.ts
2. Channel guard: the channel must be a thread whose parent is in DISCORD_ALLOWED_CHANNELS
3. Player gate: if the discordId is not in playerRegistry, post an ephemeral gate embed, hold the message in SessionState.heldMessages, and return
4. Roll guard: if pendingSkillCheck is set, increment the attempt counter; the check auto-fails after PENDING_ROLL_LIMIT (5) skipped messages
5. Burst cap: queueCap rejects the message and sends a drop notice if too many messages arrived since the last LLM response
6. Append the user message to history and fire the 👀 reaction (fire-and-forget)
7. Publish to GraphMCP via graphmcp/ingest.ts (Redis stream raw.messages)
8. Debounce (500ms) → generationQueue.scheduleLLMTurn
9. runLLMTurn:
   - assembleContext builds the message list (system + pinned + trimmed sliding)
   - callLLM → LiteLLM with Ollama fallback
   - parseToolCall splits the narrative from the tool_call block
   - filterLLMResponse rejects fabricated rolls and echoed system tags → injects [FILTER CORRECTION] and retries once
   - The narrative is posted to the thread; the assistant message is appended to history
   - If a tool call is present → dispatchTool → plugin handler → system message appended
   - If result.resolved is set → phase = 'resolved', and the thread is archived after ENCOUNTER_ARCHIVE_DELAY_MS
10. reactionManager upgrades the 👀 state to complete and clears the burst counter
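The debounce step in the flow above can be sketched as a per-thread timer map: every new message resets its thread's 500 ms timer, so a burst of messages collapses into a single LLM turn. This is a minimal sketch under assumptions; `DebouncedScheduler`, `schedule`, and `pendingCount` are hypothetical names, not the real `generationQueue` API, which may also serialize overlapping turns.

```typescript
// Hypothetical sketch of per-thread debouncing; not the actual
// generationQueue.scheduleLLMTurn implementation.
type TurnRunner = (threadId: string) => void | Promise<void>;

class DebouncedScheduler {
  private readonly timers = new Map<string, ReturnType<typeof setTimeout>>();
  private readonly run: TurnRunner;
  private readonly delayMs: number;

  constructor(run: TurnRunner, delayMs = 500) {
    this.run = run;
    this.delayMs = delayMs; // 500 ms matches the debounce described above
  }

  /** Reset this thread's timer; the turn fires once the thread goes quiet. */
  schedule(threadId: string): void {
    const existing = this.timers.get(threadId);
    if (existing !== undefined) clearTimeout(existing);
    const timer = setTimeout(() => {
      this.timers.delete(threadId);
      // Run via a promise chain so sync throws and async rejections are both logged.
      Promise.resolve()
        .then(() => this.run(threadId))
        .catch((err) => console.error('LLM turn failed', err));
    }, this.delayMs);
    this.timers.set(threadId, timer);
  }

  /** Number of threads with a turn still waiting to fire. */
  pendingCount(): number {
    return this.timers.size;
  }
}
```

Keying timers by thread ID means one chatty thread never delays turns in another encounter.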

3.2 Tool dispatch

The tool layer uses a plugin registry (harness/toolRegistry.ts) with per-encounter active-set filtering. Each ToolPlugin declares:

{
  name: string;
  description: string;
  args: Record<string, { type: 'string' | 'number' | 'boolean'; description: string }>;
  contextDocs?: (spec: EncounterSpec) => string;
  handler: (args, ctx: ToolContext) => Promise<DispatchResult>;
}
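The registry contract above can be illustrated with a minimal sketch, assuming a module-level `Map` keyed by tool name. `registerTool`, `getActiveTools`, and the simplified `SketchToolPlugin` shape (its `handler` takes only parsed args) are hypothetical and not the actual toolRegistry.ts exports; rejecting duplicate names and reporting unknown spec entries are design assumptions.

```typescript
// Hypothetical, simplified registry; the real toolRegistry.ts API may differ.
interface ArgSpec {
  type: 'string' | 'number' | 'boolean';
  description: string;
}

interface SketchToolPlugin {
  name: string;
  description: string;
  args: Record<string, ArgSpec>;
  handler: (args: Record<string, unknown>) => Promise<string>;
}

const registry = new Map<string, SketchToolPlugin>();

/** Called at module load by each tool file (the side-effect import pattern). */
function registerTool(plugin: SketchToolPlugin): void {
  if (registry.has(plugin.name)) {
    throw new Error(`Tool already registered: ${plugin.name}`);
  }
  registry.set(plugin.name, plugin);
}

/** Resolve a spec's tools list to plugins; unknown names are reported, not silently dropped. */
function getActiveTools(specTools: readonly string[]): { active: SketchToolPlugin[]; unknown: string[] } {
  const active: SketchToolPlugin[] = [];
  const unknown: string[] = [];
  for (const name of specTools) {
    const plugin = registry.get(name);
    if (plugin) active.push(plugin);
    else unknown.push(name);
  }
  return { active, unknown };
}
```

Returning unknown names lets a caller log spec drift at encounter start, the same class of problem that specsToolsConsistency.test.ts guards against in CI.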

A spec's tools: [...] array declares which plugins are active for that encounter. Tools are loaded by side-effect from harness/tools/index.ts:

import './skillCheckEmit.js';
import './encounterResolve.js';
import './contextRecall.js';
import './goalRegister.js';
import './foundryLookup.js';
import './foundryReward.js';

The LLM emits a tool call by appending a fenced tool_call JSON block. Three parser patterns (in order): fenced ```tool_call block, bare tool_call header, then a fuzzy bare-JSON fallback. Unrecognized tools or malformed args are logged and ignored — the narrative is preserved.
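The three-pattern order can be sketched as a loop over regular expressions tried in sequence, where the first match that yields valid JSON wins. This sketch assumes a call shape of `{"tool": "<name>", "args": {...}}`, a bare header consisting of a line reading `tool_call` followed by a JSON object, and a fuzzy fallback that accepts a trailing JSON object mentioning `"tool"`; the real toolParser.ts patterns may differ. The triple-backtick string is built at runtime so it cannot collide with this document's own code fence.

```typescript
// Hypothetical sketch of parseToolCall's three-pattern order.
// Assumed call shape: {"tool": "<name>", "args": {...}}.
interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

interface ParsedReply {
  narrative: string;
  call: ToolCall | null;
}

const TICKS = '`'.repeat(3); // built at runtime to keep this doc's fence intact

const PATTERNS: RegExp[] = [
  // 1. Fenced block: three backticks + "tool_call", JSON body, closing backticks
  new RegExp(`${TICKS}tool_call\\s*\\n([\\s\\S]*?)${TICKS}`),
  // 2. Bare header: a line reading "tool_call" followed by a JSON object
  /^tool_call\s*\n(\{[\s\S]*\})\s*$/m,
  // 3. Fuzzy fallback: a trailing JSON object that mentions "tool"
  /(\{[\s\S]*"tool"[\s\S]*\})\s*$/,
];

function toToolCall(json: string): ToolCall | null {
  try {
    const value: unknown = JSON.parse(json);
    if (typeof value !== 'object' || value === null) return null;
    const { tool, args } = value as { tool?: unknown; args?: unknown };
    if (typeof tool !== 'string') return null;
    const safeArgs = typeof args === 'object' && args !== null ? (args as Record<string, unknown>) : {};
    return { tool, args: safeArgs };
  } catch {
    return null; // malformed JSON: try the next pattern, keep the narrative
  }
}

function parseToolCall(reply: string): ParsedReply {
  for (const pattern of PATTERNS) {
    const match = pattern.exec(reply);
    if (!match) continue;
    const call = toToolCall(match[1] ?? '');
    if (call) {
      // Cut the matched block out; whatever remains is the narrative.
      const narrative = (reply.slice(0, match.index) + reply.slice(match.index + match[0].length)).trim();
      return { narrative, call };
    }
  }
  return { narrative: reply.trim(), call: null };
}
```

A reply with no recognizable block, or with malformed JSON in every candidate, comes back with `call: null` and the full narrative, which matches the "logged and ignored" behavior.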

The system prompt section buildToolManifest(spec) injects only the active set's tool definitions into the prompt contract, so each encounter's LLM only sees tools it can use.
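Rendering only the active set might look like the following sketch. It assumes an XML-style `<tools>` section (promptBuilder.ts assembles the prompt from XML sections, but buildToolManifest's actual tag names and line format are not documented here); `renderToolManifest` and `ManifestTool` are hypothetical names.

```typescript
// Hypothetical sketch: the real buildToolManifest(spec) output format may differ.
interface ManifestArg {
  type: 'string' | 'number' | 'boolean';
  description: string;
}

interface ManifestTool {
  name: string;
  description: string;
  args: Record<string, ManifestArg>;
}

/** Render only the tools a spec activated; inactive plugins never reach the prompt. */
function renderToolManifest(all: readonly ManifestTool[], specTools: readonly string[]): string {
  const active = new Set(specTools);
  const lines: string[] = ['<tools>'];
  for (const tool of all) {
    if (!active.has(tool.name)) continue;
    lines.push(`- ${tool.name}: ${tool.description}`);
    for (const [arg, spec] of Object.entries(tool.args)) {
      lines.push(`    ${arg} (${spec.type}): ${spec.description}`);
    }
  }
  lines.push('</tools>');
  return lines.join('\n');
}
```

Filtering at render time, rather than only at dispatch, also saves system-prompt tokens in the 4,000-token system zone.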

4. Data Architecture

4.1 Redis (transient state)

Key pattern	Value	TTL	Owner
`session:{threadId}`	`JSON.stringify(SessionState)`	`SESSION_TTL_HOURS` (12h)	`sessionManager`
`guild_threads:{guildId}`	Set of thread IDs	inherits	`sessionManager`
`players:{guildId}` (legacy design)	discordId → dndName	—	`playerRegistry` (current impl uses different scheme)
`raw.messages`	Redis stream	—	`graphmcp/ingest.ts`

SessionState (src/types/index.ts) is the central shape:

{
  encounterId, threadId, guildId,
  spec: EncounterSpec,
  players: Record<discordId, Player>,
  history: ChatMessage[],         // mix of pinned + sliding
  phase: 'open' | 'active' | 'resolved',
  heldMessages: HeldMessage[],    // for unregistered players
  outcome?, outcomeSummary?,
  npcMemories?: Record<npcId, string>,
  resolvedContext?: Record<key, string>,
  pendingSkillCheck?: { player, prompt, dc, messageId, modifier?, skill?, advantage?, disadvantage? },
  pendingSkillCheckAttempts?: number,
  createdAt, updatedAt,
}
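The `session:{threadId}` row from the table above can be sketched as a save/load pair. `SessionKV` is a hypothetical minimal interface standing in for the ioredis client, whose `set(key, value, 'EX', seconds)` and `get(key)` calls have this shape; `MiniSession` is a cut-down SessionState, and the 12-hour default mirrors SESSION_TTL_HOURS.

```typescript
// Hypothetical sketch of session:{threadId} persistence. SessionKV is a
// stand-in; an ioredis client's set(key, value, 'EX', seconds) / get(key) fit it.
interface SessionKV {
  set(key: string, value: string, mode: 'EX', seconds: number): Promise<unknown>;
  get(key: string): Promise<string | null>;
}

interface MiniSession {
  threadId: string;
  phase: 'open' | 'active' | 'resolved';
  updatedAt: number;
}

const sessionKey = (threadId: string): string => `session:${threadId}`;

async function saveSession(kv: SessionKV, state: MiniSession, ttlHours = 12): Promise<void> {
  // SET ... EX replaces the TTL on every write, so only idle sessions expire.
  await kv.set(sessionKey(state.threadId), JSON.stringify(state), 'EX', ttlHours * 3600);
}

async function loadSession(kv: SessionKV, threadId: string): Promise<MiniSession | null> {
  const raw = await kv.get(sessionKey(threadId));
  return raw === null ? null : (JSON.parse(raw) as MiniSession);
}
```

In this sketch an encounter expires only after 12 idle hours; whether sessionManager refreshes the TTL on every write in the same way is an assumption.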

4.2 Filesystem (`data/`)

- tally.json: { [specName]: { runs, lastRun } }, incremented at each encounter start.
- summaries/{encounterId}-{ISO timestamp}.txt: one per resolved encounter, written by encounterLog.writeSummary().
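The tally and summary-naming rules above reduce to two small functions. This is a sketch under assumptions: `recordRun` and `summaryFileName` are hypothetical names, and the real encounterLog module's file I/O (reading and writing tally.json) is omitted.

```typescript
// Hypothetical sketch of the tally.json update and summary naming;
// the real encounterLog module may differ.
interface TallyEntry {
  runs: number;
  lastRun: string; // ISO timestamp
}

type Tally = Record<string, TallyEntry>;

/** Pure update: returns a new tally with the spec's run count bumped. */
function recordRun(tally: Tally, specName: string, now: Date): Tally {
  const prev = tally[specName];
  return {
    ...tally,
    [specName]: { runs: (prev?.runs ?? 0) + 1, lastRun: now.toISOString() },
  };
}

/** {encounterId}-{ISO timestamp}.txt, as described above. */
function summaryFileName(encounterId: string, resolvedAt: Date): string {
  return `${encounterId}-${resolvedAt.toISOString()}.txt`;
}
```

Keeping the update pure makes it easy to unit-test; the caller reads tally.json, applies `recordRun`, and writes the result back.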

4.3 GraphMCP / Neo4j (via JSON-RPC)

The bot never queries Neo4j directly. All graph access goes through GRAPHMCP_URL/mcp with JSON-RPC 2.0:

Tool	Args	Returns
`query_as_npc`	`npc_name, question, limit`	NPCQueryResult (chunks + graph_context)
`semantic_search`	`query, limit`	SemanticSearchResult
`log_encounter`	`title, participants, summary, location?, type?`	LogEncounterResult
`list_encounters`	`limit`	EncounterResultItem[]
`search_encounters`	`query?, location?, participant?, limit?`	EncounterResultItem[]
`get_encounter`	`id`	EncounterDetails
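A GraphMCP call can be sketched as a JSON-RPC 2.0 envelope plus a response unwrapper. This assumes GraphMCP follows the Model Context Protocol convention of invoking tools through a `tools/call` method with `name` and `arguments` params; that convention, `buildToolCall`, and `unwrap` are assumptions, not the documented client.ts API. The request body would be POSTed as JSON to GRAPHMCP_URL/mcp.

```typescript
// Hypothetical sketch: assumes GraphMCP accepts MCP-style "tools/call" requests.
interface RpcRequest {
  jsonrpc: '2.0';
  id: number;
  method: 'tools/call';
  params: { name: string; arguments: Record<string, unknown> };
}

interface RpcError {
  code: number;
  message: string;
}

interface RpcResponse<T> {
  jsonrpc: '2.0';
  id: number;
  result?: T;
  error?: RpcError;
}

let nextId = 1;

function buildToolCall(name: string, args: Record<string, unknown>): RpcRequest {
  return { jsonrpc: '2.0', id: nextId++, method: 'tools/call', params: { name, arguments: args } };
}

/** JSON-RPC 2.0 responses carry either `result` or `error`; surface errors as exceptions. */
function unwrap<T>(response: RpcResponse<T>, expectedId: number): T {
  if (response.id !== expectedId) throw new Error(`Mismatched response id ${response.id}`);
  if (response.error) throw new Error(`RPC ${response.error.code}: ${response.error.message}`);
  if (response.result === undefined) throw new Error('RPC response had neither result nor error');
  return response.result;
}
```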

NPC memory is injected into the system prompt via formatNPCMemory() — past encounters witnessed + top-3 lore chunks above GRAPHMCP_SCORE_THRESHOLD.
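The NPC memory rule above (witnessed encounters plus the top three lore chunks scoring above the threshold) can be sketched as a filter, sort, and slice. The `LoreChunk` and `NpcMemoryInput` shapes and the output wording are hypothetical; the real formatNPCMemory() works from NPCQueryResult, whose exact fields are not listed here.

```typescript
// Hypothetical shapes; the real NPCQueryResult fields may differ.
interface LoreChunk {
  text: string;
  score: number;
}

interface NpcMemoryInput {
  npcName: string;
  encountersWitnessed: string[];
  chunks: LoreChunk[];
}

function formatNpcMemorySketch(input: NpcMemoryInput, scoreThreshold: number, maxChunks = 3): string {
  const topChunks = input.chunks
    .filter((c) => c.score > scoreThreshold) // strictly above the threshold
    .sort((a, b) => b.score - a.score) // filter() returned a copy, so the input is untouched
    .slice(0, maxChunks);
  const lines = [`Memory of ${input.npcName}:`];
  for (const title of input.encountersWitnessed) lines.push(`- witnessed: ${title}`);
  for (const chunk of topChunks) lines.push(`- lore: ${chunk.text}`);
  return lines.join('\n');
}
```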

4.4 Context window budget

src/types/index.ts exports a CONTEXT_BUDGET constant used by both contextAssembler and sessionManager:

Zone	Tokens
System prompt (narrator + NPCs + tools + goals)	4,000
Pinned (opening narrative, goal block)	2,000
Sliding history	118,000
Safety buffer	3,500
Total (sum of zones)	127,500 (of the 128,000-token window)

History trimming drops the oldest non-pinned turn pair when over budget, with a hard floor of 6 messages. Token estimates use gpt-tokenizer with a 1.15× buffer to approximate Gemma's tokenizer.
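The trimming rule can be sketched as a loop that drops the two oldest non-pinned messages while the buffered estimate exceeds the budget and at least six messages would remain. The token counter is injected so the sketch runs without gpt-tokenizer (in the project that counter would come from gpt-tokenizer); the `trimHistory` signature, `estimateTokens`, and `HistoryMessage` here are illustrative rather than the actual src/lib/historyTrim.ts API.

```typescript
// Hypothetical sketch of the shared trimming rule; src/lib/historyTrim.ts may differ.
interface HistoryMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
  pinned?: boolean;
}

const TOKEN_BUFFER = 1.15; // pads gpt-tokenizer counts toward Gemma's tokenizer
const MIN_MESSAGES = 6; // hard floor: never trim below this

function estimateTokens(messages: readonly HistoryMessage[], count: (text: string) => number): number {
  return Math.ceil(messages.reduce((sum, m) => sum + count(m.content), 0) * TOKEN_BUFFER);
}

function trimHistory(
  history: readonly HistoryMessage[],
  budget: number,
  count: (text: string) => number,
): HistoryMessage[] {
  const kept = [...history]; // never mutate the caller's array
  while (estimateTokens(kept, count) > budget && kept.length - 2 >= MIN_MESSAGES) {
    // Oldest turn pair = the first two non-pinned messages.
    const first = kept.findIndex((m) => !m.pinned);
    if (first === -1) break;
    const second = kept.findIndex((m, i) => i > first && !m.pinned);
    if (second === -1) break;
    kept.splice(second, 1); // remove the higher index first so `first` stays valid
    kept.splice(first, 1);
  }
  return kept;
}
```

Assuming non-pinned messages alternate user and assistant, dropping them in pairs keeps turns aligned, so the model never sees a reply without the message it answered.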

5. API Surface

The project exposes two API surfaces:

5.1 Discord slash commands (player/admin surface)

Registered via src/scripts/deploy-commands.ts using Discord REST v10.

Command	Subcommands	Purpose
`/dndname`	`set <name>`, `show`, `clear`	Character name registration
`/character`	`register foundry|custom`, `show`, `view`, `clear`, `admin list|remove|give`	Full character profile + Foundry link
`/encounter`	`start <spec>`, `random`, `status`, `stats`, `audit`, `end [notes]`, `list`, `generate <theme>`, `spec`	Encounter session lifecycle
`/encounters`	(Select menu + search modal)	Search the encounter log via GraphMCP
`/roll`	`action`	Manual dice roll
`/actions`	—	In-character action shortcuts
`/turn`	—	Turn management
`/xp`	`award <amount>`	Award XP (relay → VTT)

Plus button + modal interactions: skill-check roll buttons, give item, custom character registration, Foundry link, encounter select menu, search modal.
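For reference, a command such as /dndname is registered as Discord API v10 application-command JSON of the following shape; discord.js's SlashCommandBuilder serializes to the same structure via `toJSON()`. The option type numbers (1 for a subcommand, 3 for a string) come from Discord's application-command spec. The descriptions and the `checkCommand` helper are illustrative, and the helper's name rule is a simplified ASCII subset of Discord's.

```typescript
// Raw Discord application-command JSON (API v10). Option types from Discord's
// spec: 1 = SUB_COMMAND, 3 = STRING. Descriptions here are illustrative.
const SUB_COMMAND = 1;
const STRING = 3;

interface CommandOption {
  type: number;
  name: string;
  description: string;
  required?: boolean;
  options?: CommandOption[];
}

interface SlashCommand {
  name: string;
  description: string;
  options: CommandOption[];
}

const dndname: SlashCommand = {
  name: 'dndname',
  description: 'Manage your character name',
  options: [
    {
      type: SUB_COMMAND,
      name: 'set',
      description: 'Set your character name',
      options: [{ type: STRING, name: 'name', description: 'Character name', required: true }],
    },
    { type: SUB_COMMAND, name: 'show', description: 'Show your character name' },
    { type: SUB_COMMAND, name: 'clear', description: 'Clear your character name' },
  ],
};

/** Simplified check: lowercase ASCII names of 1-32 chars, descriptions of 1-100 chars. */
function checkCommand(cmd: { name: string; description: string; options?: CommandOption[] }): string[] {
  const problems: string[] = [];
  if (!/^[-_a-z0-9]{1,32}$/.test(cmd.name)) problems.push(`bad name: ${cmd.name}`);
  if (cmd.description.length < 1 || cmd.description.length > 100) problems.push(`bad description on ${cmd.name}`);
  for (const opt of cmd.options ?? []) problems.push(...checkCommand(opt));
  return problems;
}
```

Running a check like this before the REST call surfaces naming mistakes locally instead of as a rejected registration.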

5.2 Tool plugins (LLM surface)

Defined in src/harness/tools/ and registered at module load. Each spec filters the active set via its tools: array.

Tool	Purpose	Args
`skill_check_emit`	Posts a dice-roll embed to the thread; blocks player input until resolved	`player, prompt, skill?, dc, advantage?, disadvantage?`
`encounter_resolve`	Marks encounter complete; writes summary; archives thread	(args handled in `tools/encounterResolve.ts`)
`context_recall`	Look up canonical session facts stored in `resolvedContext`
`goal_register`	Add a new goal mid-encounter (the `prd.md` "dynamic goal registration" feature)
`foundry_lookup`	Pull live character data from VTT relay
`foundry_reward`	Award XP/items to a character via VTT

⚠ Note: the Docs/mardonar-encounter-engine.md lists skill_check_resolve, event_log_append, npc_memory_read, npc_memory_write as tools. These have been removed — replaced by the per-encounter event log + GraphMCP log_encounter tool. The current tool set is the one above.
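Per-tool argument validation in the dispatcher can be sketched against the skill_check_emit arguments in the table above. The `optional` flag is an assumption (the ToolPlugin declaration in §3.2 carries only `type` and `description`), and `validateArgs` is a hypothetical helper, not the toolDispatcher.ts API. Returning a list of problems fits the "logged and ignored" behavior: the dispatcher can log them and keep the narrative.

```typescript
// Hypothetical dispatcher-side check; `optional` is an assumed field that the
// ToolPlugin declaration does not include.
type ArgType = 'string' | 'number' | 'boolean';

interface ArgDecl {
  type: ArgType;
  optional?: boolean;
}

const skillCheckEmitArgs: Record<string, ArgDecl> = {
  player: { type: 'string' },
  prompt: { type: 'string' },
  skill: { type: 'string', optional: true },
  dc: { type: 'number' },
  advantage: { type: 'boolean', optional: true },
  disadvantage: { type: 'boolean', optional: true },
};

/** Returns a list of problems; an empty list means the call can be dispatched. */
function validateArgs(decls: Record<string, ArgDecl>, raw: Record<string, unknown>): string[] {
  const problems: string[] = [];
  for (const [name, decl] of Object.entries(decls)) {
    const value = raw[name];
    if (value === undefined) {
      if (!decl.optional) problems.push(`missing ${name}`);
    } else if (typeof value !== decl.type) {
      problems.push(`${name} should be ${decl.type}, got ${typeof value}`);
    }
  }
  for (const name of Object.keys(raw)) {
    // hasOwnProperty avoids treating inherited keys like "toString" as declared
    if (!Object.prototype.hasOwnProperty.call(decls, name)) problems.push(`unexpected ${name}`);
  }
  return problems;
}
```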

6. Deployment Architecture

6.1 Local development

docker compose -f docker-compose.dev.yml up -d   # Builds + runs bot; relies on Redis + GraphMCP already running on the `mardonar-internal` Docker network (see `docs/deployment-guide.md`)
npm install
npm run deploy-commands                          # registers slash commands with Discord
npm run dev                                      # tsx watch mode

6.2 Production (multi-stage Dockerfile)

Dockerfile (Node 22 alpine):

- Builder stage: npm ci --ignore-scripts, copy src + tsconfig.json, npm run build → dist/
- Runtime stage: npm ci --omit=dev --ignore-scripts, copy dist/, specs/, lore/, persona.yaml
- CMD ["node", "dist/bot/index.js"]

docker-compose.dev.yml defines two services, deploy-commands (one-shot) and bot (long-running, with data/ mounted as a volume). Both attach to the external mardonar-internal Docker network, which also hosts Redis and an MCP server from the GraphMCP-Example stack.

Gap: There is no production docker-compose.yml. The .env.example is the source of truth for runtime config.

6.3 Operational

- Session state has a 12h TTL by default, so stale encounters auto-expire.
- The bot connects to Redis at startup in main() (redis.connect()).
- The VTT relay auto-spins up a headless Foundry session on connection failure (RSA-OAEP encrypted handshake).
- Logging: src/lib/logger.ts writes plaintext to stdout. There is no LOG_LEVEL env knob; callers pick the level per call. (Earlier docs claimed pino + structured JSON; that was aspirational, and the unused pino deps have been removed.)
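The RSA-OAEP password handoff mentioned above can be sketched with Node's built-in crypto module: the receiving side publishes a public key, and the sender encrypts the secret so only the private-key holder can recover it. The 2048-bit key, SHA-256 OAEP hash, PEM/base64 encodings, and the `sealPassword`/`openPassword` names are assumptions; relaySession.ts's actual key exchange and message framing are not described in this document.

```typescript
// Hypothetical sketch of an RSA-OAEP password handoff with Node's crypto module.
import { constants, generateKeyPairSync, privateDecrypt, publicEncrypt } from 'node:crypto';

// Receiver side (e.g. the relay): generate a key pair and publish the public key.
const { publicKey, privateKey } = generateKeyPairSync('rsa', {
  modulusLength: 2048,
  publicKeyEncoding: { type: 'spki', format: 'pem' },
  privateKeyEncoding: { type: 'pkcs8', format: 'pem' },
});

// Both sides must agree on padding and hash, or decryption fails.
const OAEP = { padding: constants.RSA_PKCS1_OAEP_PADDING, oaepHash: 'sha256' } as const;

/** Sender side (the bot): encrypt so only the private-key holder can read it. */
function sealPassword(publicKeyPem: string, password: string): string {
  return publicEncrypt({ key: publicKeyPem, ...OAEP }, Buffer.from(password, 'utf8')).toString('base64');
}

function openPassword(privateKeyPem: string, sealed: string): string {
  return privateDecrypt({ key: privateKeyPem, ...OAEP }, Buffer.from(sealed, 'base64')).toString('utf8');
}
```

With a 2048-bit key and SHA-256, OAEP can carry at most 190 bytes of plaintext, which is ample for a password.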

7. Development & Testing

7.1 Local commands

Command	Effect
`npm run dev`	`tsx watch src/bot/index.ts` — auto-reload dev
`npm run build`	`tsc` → `dist/`
`npm run start`	`node dist/bot/index.js`
`npm run deploy-commands`	One-shot slash command registration
`npm run test`	All tests (vitest)
`npm run test:unit`	Unit tests only (no external services)
`npm run test:int`	Integration tests (requires Docker services)

7.2 Test coverage

- 33 unit test files in tests/unit/ (393 tests, 2 skipped)
- 1 integration test (tests/integration/phase1.test.ts)
- tests/fixtures/spec.ts: shared encounter spec fixture

Notable test surfaces: promptBuilder, contextAssembler, historyTrim, toolParser, toolDispatcher, toolRegistry, sessionManager, playerRegistry, characterRegistry, specLoader, rollHandler, rollDetection, responseFilter, queueCap, generationQueue, reactionManager, encounterLog, encounterDiscoveryEmbed, loreAnswerEmbed, skillCheckEmbed, graphmcpClient, foundryClientRetry, foundryClientFormatters, goalRegister, relaySession, litellmClient, ollamaClient, personaLoader, foundryReward, xpAwarder, redisErrorPath, messageRouterRunLLMTurn, specsToolsConsistency (the last is a structural-consistency guard, not a module surface).

8. Design Decisions (Living)

Decision	Why
LiteLLM as primary, Ollama as fallback	OpenAI-compatible proxy gives model flexibility without code changes; Ollama fallback ensures the bot still runs when the proxy is down
Prompt-based tool calls (not native)	Gemma 4 IT at e2b is unreliable with native function calling; fenced JSON block parsing is deterministic
Tool plugin registry with per-spec active set	New tools can be added without touching the dispatch core; specs opt into only the tools they need
Pinned + sliding history	Opening narrative and goal block must survive trimming or the LLM loses its anchor
Goals in system prompt, not as a tool	Goals rarely change mid-encounter; embedding them reduces tool round-trips
Redis for active state, GraphMCP for memory	Redis is fast and ephemeral for live sessions; the graph holds long-term NPC lore
Player name gate via embed, not DMs	Keeps the conversation in-thread; ephemeral embed auto-deletes after 30s
Story generator via `/encounter generate`	Separates creative authoring from real-time inference — generator can use a stronger model later
VTT relay auto-spin-up	Lets the bot operate when the relay has been cold-stopped; uses RSA-OAEP for password handoff
In-world voice rule for player-facing strings	See `feedback-in-world-voice` — no utility/jargon in bot messages

9. Open Issues / Drift

Items the deep scan surfaced that aren't bugs but should be tracked:

- Drift: Docs/mardonar-encounter-engine.md describes a Go bot with an embedded MCP layer; the actual code is TypeScript with an external JSON-RPC GraphMCP server. Treat the doc as historical/aspirational.
- ✅ Resolved 2026-06-19: README.md's "Project Structure" tree referenced src/mcp/ and the old 2-command layout. The README now reflects the actual 8-command structure and src/graphmcp/ (formerly src/mcp/; the bot has no direct Neo4j client), and includes a callout noting that Docs/mardonar-encounter-engine.md is historical.
- ✅ Resolved 2026-06-19: Duplicate trimHistory logic in src/session/sessionManager.ts and src/harness/contextAssembler.ts was extracted to src/lib/historyTrim.ts. tests/unit/historyTrim.test.ts covers the shared module at 100%.
- No production compose file: only docker-compose.dev.yml exists. The Dockerfile is production-ready, but deployment is ad hoc.
- ✅ Resolved 2026-06-19: CI/CD was missing. .gitea/workflows/test.yml now runs tsc --noEmit, npm run test:unit, and npm run test:coverage on push/PR to main (Node 22, cached npm).
- DISCORD_ALLOWED_USERS is empty by default, so anyone in an allowed channel can run /encounter start. Access control is channel-scoped, not user-scoped; admins must set the env var explicitly to restrict it.
- OLLAMA_BASE_URL defaults to localhost: fine for dev, but production needs the LAN IP or proxy URL set.
- ✅ Resolved 2026-06-19: The spec tool list must be kept in sync. tests/unit/specsToolsConsistency.test.ts walks every specs/*.yaml, asserts each entry in tools: [...] is registered in the tool plugin registry, and fails loudly with the file and unknown name if drift appears. It also asserts every registered tool is referenced by at least one spec.
- ✅ Resolved 2026-06-19: Schema mismatch risk. src/types/index.ts now re-exports EncounterSpec (and its sub-shapes) derived from z.infer<typeof EncounterSpecSchema>, so the static type and the runtime validator share one source of truth and drift is structurally impossible. Side effect: loadSpec now also validates xpReward as a number (previously typed but unenforced).
- ✅ Resolved 2026-06-19: Logging drift. The architecture previously claimed pino + pino-pretty + structured JSON; the actual logger is the custom src/lib/logger.ts (plaintext stdout, no env-driven level filter). The unused pino and pino-pretty dependencies were removed from package.json, and §1, §2, and §6.3 now describe reality.
- ✅ Resolved 2026-06-19: README drift. README.md was significantly out of date: it told new contributors to set a no-op LOG_LEVEL=debug, run the non-existent npm run validate-spec, and look at src/mcp/ (renamed to src/graphmcp/) and src/db/neo4j.ts (the bot has no direct Neo4j client). It also linked Docs/mardonar-encounter-engine.md (Go architecture, historical) as the current architecture doc. The dead top-level scripts/deploy-commands.ts, a stale duplicate of src/scripts/deploy-commands.ts that only knew about 2 of 8 commands, was removed. The README now reflects the actual layout, command set, and persistence layer.

Document generated by bmad-document-project initial scan, deep level. Project state recorded in docs/project-scan-report.json.
