


Add unit tests for LLM clients, persona loader, and XP/Foundry rewards

Expands the unit test suite from 320 to 380 tests (+60) and adds a
Gitea Actions CI workflow. Closes all six follow-up recommendations
from the test-architecture validation report.

New tests (tests/unit/):
  - ollamaClient.test.ts          — Ollama SDK wrapper, options passthrough
  - litellmClient.test.ts         — OpenAI SDK wrapper, model fallback
  - personaLoader.test.ts         — Zod validation + cache invalidation
  - foundryReward.test.ts         — Tool plugin: lookup, errors, partial grants
  - xpAwarder.test.ts             — Bulk XP awards + per-player skip reasons
  - redisErrorPath.test.ts        — Singleton error handler does not crash
  - messageRouterRunLLMTurn.test.ts — 18 cases for the runtime heart:
    narrative-only path, tool dispatch, filter correction, retry loop
    guard, missed-skill-check heuristic, typing indicator interval,
    LLM error fallback, archive on resolve.

Coverage (line %):
  - harness/litellmClient.ts      0 → 100
  - harness/ollamaClient.ts       0 → 100
  - harness/tools/foundryReward.ts 0 → 100
  - session/xpAwarder.ts          0 → 100
  - persona/loader.ts             0 → 100
  - db/redis.ts                   0 → 100
  - bot/handlers/messageRouter.ts 0 → 39.86 (runLLMTurn now covered)

Tooling:
  - package.json: + test:coverage, test:watch scripts
  - devDep: @vitest/coverage-v8@^3.1.0
  - tests/README.md: conventions, anti-patterns, template map
  - .gitignore: exclude coverage/
  - .gitea/workflows/test.yml: Node 22, npm cache, tsc --noEmit gate

Documentation (from earlier /bmad-document-project run, now committed):
  - docs/index.md
  - docs/project-overview.md
  - docs/architecture.md
  - docs/deployment-guide.md
  - docs/api-contracts.md
  - docs/data-models.md
  - docs/source-tree-analysis.md
  - docs/component-inventory.md
  - docs/development-guide.md
  - _bmad-output/test-artifacts/automate-validation-report.md

Co-Authored-By: Claude <noreply@anthropic.com>

2026-06-19 05:59:13 +00:00



Mardonar Encounter Engine — Architecture

Single-part backend project. Discord-native, LLM-driven D&D encounter engine. Generated 2026-06-19 from a deep scan of /home/kaykayyali/hosting/mardonar-npcs.

Executive Summary

The Mardonar Encounter Engine is a Discord bot that runs structured D&D encounters. Each Discord thread is an encounter session. An LLM (Gemma 4 IT e2b via LiteLLM with Ollama fallback) narrates the scene, voices NPCs, drives skill checks, and steers the encounter toward hidden outcomes defined in a YAML spec. NPC memory, lore context, and encounter history are persisted in a graph database (Neo4j) accessed through a JSON-RPC MCP server (GraphMCP). Active session state lives in Redis with a TTL. The bot can also reach into Foundry VTT to resolve character stats and award XP via an external relay.

Key constraint: the harness controls everything the LLM sees. The 128k context window is partitioned into hard zones (system / pinned / sliding / safety) and the assembly pipeline is deterministic. Tool calls are extracted from fenced tool_call JSON blocks, not via native function calling — Gemma at e2b quantization isn't reliable for native tools.

1. Technology Stack

Layer	Technology	Version	Notes
Runtime	Node.js	22 (alpine)	ESM modules, NodeNext resolution
Language	TypeScript	5.8	strict mode, declaration + sourcemap output
Discord	discord.js	v14.18	Slash commands + embeds + threads
LLM primary	LiteLLM proxy	(env: `LITELLM_BASE_URL`)	OpenAI-compatible
LLM fallback	Ollama	(env: `OLLAMA_BASE_URL`)	gemma4-it:e2b, 128k context
Session cache	Redis (ioredis)	5.4	TTL = `SESSION_TTL_HOURS` (default 12h)
Graph DB	Neo4j	5	via GraphMCP JSON-RPC, not direct
Lore / NPC memory	GraphMCP HTTP JSON-RPC	(env: `GRAPHMCP_URL`)	6 RPC tools exposed
Foundry VTT	VTT relay HTTPS	(env: `VTT_RELAY_URL`)	Optional, requires API key
Validation	Zod	3.24	env + encounter spec
Logging	pino + pino-pretty	9.6 / 13	structured JSON in prod
Testing	Vitest	3.1	`tests/unit` + `tests/integration`
Build	tsc → dist/	5.8	multi-stage Dockerfile

Architecture pattern: layered backend with a plugin-style tool registry. Three layers: bot (Discord I/O), harness (LLM orchestration), session + db + graphmcp + vtt (data + integrations).

2. Source Tree

mardonar-bot/
├── src/
│   ├── bot/                          # Discord I/O layer
│   │   ├── index.ts                  # Entry: Client setup, event wiring
│   │   ├── commands/                 # 8 slash command modules
│   │   │   ├── dndname.ts            # /dndname set|show|clear
│   │   │   ├── encounter.ts          # /encounter start|status|end|generate|spec|random|stats|audit
│   │   │   ├── character.ts          # /character register|show|view|admin
│   │   │   ├── roll.ts               # /roll
│   │   │   ├── actions.ts            # /actions
│   │   │   ├── xp.ts                 # /xp award
│   │   │   ├── encounters.ts         # /encounters (list/search from GraphMCP)
│   │   │   └── turn.ts               # /turn
│   │   ├── embeds/                   # Discord embed builders
│   │   │   ├── playerGate.ts
│   │   │   ├── skillCheck.ts         # Suspense + dice + roll buttons
│   │   │   ├── resolution.ts
│   │   │   ├── encounterDiscovery.ts
│   │   │   └── loreAnswer.ts
│   │   ├── handlers/                 # Event handlers / sidecar logic
│   │   │   ├── messageRouter.ts      # Encounter-thread message pipeline (heart of runtime)
│   │   │   ├── mentionHandler.ts     # @Zalram persona replies
│   │   │   ├── rollHandler.ts        # Button / modal submit roll resolution
│   │   │   ├── generationQueue.ts    # Debounce + LLM turn scheduling
│   │   │   ├── queueCap.ts           # Burst cap → drop notice
│   │   │   ├── reactionManager.ts    # 👀 reaction lifecycle (scheduled/processing/complete)
│   │   │   └── responseFilter.ts     # Post-LLM response scrubbing
│   │   └── lib/welcomeDM.ts
│   ├── harness/                      # LLM orchestration
│   │   ├── promptBuilder.ts          # System prompt assembly (XML sections)
│   │   ├── contextAssembler.ts       # Pin/slide history + token budget trim
│   │   ├── llmClient.ts              # LiteLLM primary → Ollama fallback
│   │   ├── litellmClient.ts          # OpenAI-compatible HTTP client
│   │   ├── ollamaClient.ts           # Native ollama npm + direct HTTP
│   │   ├── toolParser.ts             # Extract ```tool_call``` blocks
│   │   ├── toolRegistry.ts           # Plugin registry + active-set filtering
│   │   ├── toolDispatcher.ts         # Per-encounter tool validation + dispatch
│   │   └── tools/                    # 6 tool plugins (see §5)
│   ├── session/                      # Redis-backed state
│   │   ├── playerRegistry.ts         # guildId+discordId → Player
│   │   ├── characterRegistry.ts      # Character profile + pronouns + Foundry UUID
│   │   ├── sessionManager.ts         # threadId → SessionState (pinned/sliding history)
│   │   ├── encounterLog.ts           # Filesystem tally + summary writer
│   │   └── xpAwarder.ts              # XP grant via VTT relay
│   ├── graphmcp/                     # GraphMCP JSON-RPC client
│   │   ├── client.ts                 # 6 RPC calls + NPC memory formatter
│   │   ├── ingest.ts                 # Publish to Redis stream (raw.messages)
│   │   ├── loreResolver.ts           # /encounter generate helper
│   │   └── vocabularyResolver.ts     # spec randomizable: vocabulary source
│   ├── vtt/                          # Foundry VTT integration
│   │   ├── foundryClient.ts          # HTTP client, formatters
│   │   └── relaySession.ts           # RSA-OAEP handshake + headless spin-up
│   ├── db/redis.ts                   # ioredis singleton (lazy connect)
│   ├── spec/loader.ts                # YAML loader + Zod schema
│   ├── persona/loader.ts             # persona.yaml loader for @mention
│   ├── lib/logger.ts                 # pino wrapper
│   ├── config.ts                     # Zod env schema + parsed config singleton
│   ├── scripts/deploy-commands.ts    # Slash command registration (REST v10)
│   └── types/index.ts                # Shared interfaces + CONTEXT_BUDGET const
├── specs/                            # SPEC_FORMAT.md + 7 encounter YAML files
│   ├── SPEC_FORMAT.md
│   ├── market-thief.yaml
│   ├── cog-claw-debt.yaml
│   ├── mawfang-pursuit.yaml
│   ├── silt-leak.yaml
│   ├── stormscar-pilgrim.yaml
│   ├── velvet-auction.yaml
│   └── whispering-stone.yaml
├── data/                             # Runtime data (gitignored in practice)
│   ├── tally.json                    # Per-spec run counts
│   └── summaries/                    # One .txt per encounter
├── tests/
│   ├── unit/                         # 21 unit test files
│   └── integration/                  # 1 integration test
├── Docs/                             # Pre-existing project docs
│   ├── mardonar-encounter-engine.md  # ⚠ Out of date — describes Go architecture
│   ├── mardonar-build-plan.md
│   ├── epics.md
│   ├── stories/
│   └── ux-designs/
├── lore/                             # Game-world reference material
├── persona.yaml                      # Zalram Cloudwalker (bot's @mention persona)
├── prd.md                            # Active PRD: Dynamic Goal Registration
├── Dockerfile                        # Multi-stage node:22-alpine
├── docker-compose.dev.yml            # Local Redis + Neo4j
├── package.json
├── tsconfig.json
└── vitest.config.ts

3. Architecture Pattern

Layered backend with a plugin registry:

┌──────────────────────────────────────────────────────────────────┐
│  Discord (Gateway WebSocket)                                     │
└──────────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌──────────────────────────────────────────────────────────────────┐
│  src/bot/                                                        │
│  ┌────────────────────┐  ┌────────────────┐  ┌──────────────┐    │
│  │ commands/          │  │ handlers/      │  │ embeds/      │    │
│  │ (slash cmd)        │  │ (event loops)  │  │ (UI shape)   │    │
│  └────────────────────┘  └────────────────┘  └──────────────┘    │
│         messageRouter is the runtime heart                       │
└──────────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌──────────────────────────────────────────────────────────────────┐
│  src/harness/                                                    │
│  assembleContext → llmClient (LiteLLM → Ollama)                  │
│       ↓                                                          │
│  parseToolCall → dispatchTool → active tool plugins              │
└──────────────────────────────────────────────────────────────────┘
                            │                       │
                            ▼                       ▼
┌─────────────────────┐  ┌─────────────────┐  ┌──────────────────┐
│ src/session/        │  │ src/db/         │  │ src/graphmcp/    │
│  (Redis state)      │  │  (ioredis)      │  │  (JSON-RPC)      │
└─────────────────────┘  └─────────────────┘  └──────────────────┘
                            │
                            ▼
┌──────────────────────────────────────────────────────────────────┐
│  src/vtt/  →  External Foundry VTT relay                         │
│  src/persona/  →  persona.yaml for @mentions                     │
│  src/spec/  →  specs/*.yaml loaded per encounter                 │
└──────────────────────────────────────────────────────────────────┘

3.1 Message flow (encounter thread)

Discord messageCreate → bot/index.ts → handleMessage in handlers/messageRouter.ts
Channel guard: the message must be in a thread whose parent channel is in DISCORD_ALLOWED_CHANNELS
Player gate: if the author's discordId is not in playerRegistry, post an ephemeral gate embed, hold the message in SessionState.heldMessages, and return
Roll guard: if pendingSkillCheck is set, increment the attempt counter; the check auto-fails after PENDING_ROLL_LIMIT (5) skipped messages
Burst cap: queueCap rejects the message and sends a drop notice if too many messages have arrived since the last LLM response
Append the user message to history and add a 👀 reaction (fire-and-forget)
Publish the message to GraphMCP via graphmcp/ingest.ts (Redis stream raw.messages)
Debounce (500ms), then generationQueue.scheduleLLMTurn
runLLMTurn:
- assembleContext builds the message list (system + pinned + trimmed sliding history)
- callLLM calls LiteLLM, falling back to Ollama
- parseToolCall splits the narrative from the tool_call block
- filterLLMResponse rejects fabricated rolls and echoed system tags; on rejection it injects a [FILTER CORRECTION] message and retries once
- The narrative is posted to the thread and the assistant message is appended to history
- If a tool call is present → dispatchTool → plugin handler → system message appended
- If result.resolved is set → phase = 'resolved', and the thread is archived after ENCOUNTER_ARCHIVE_DELAY_MS
reactionManager upgrades the 👀 state to complete and clears the burst counter
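The debounce step in this flow can be sketched as a per-thread timer that restarts on every accepted message, so a burst in one thread collapses into a single LLM turn. This is a minimal sketch under stated assumptions: createTurnDebouncer and the injectable Timers interface are hypothetical names, not the actual generationQueue API, and keying by thread is an assumption; only the 500ms delay comes from the flow above.

```typescript
// Hypothetical per-thread debouncer; not the real generationQueue API.
// Each new message in a thread restarts that thread's timer, so a burst
// of messages collapses into one scheduled LLM turn.

export interface Timers {
  set(fn: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const realTimers: Timers = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

export function createTurnDebouncer(
  runTurn: (threadId: string) => void,
  delayMs = 500,
  timers: Timers = realTimers,
) {
  const pending = new Map<string, unknown>();

  return {
    // Call on every accepted message; restarts the thread's timer.
    schedule(threadId: string): void {
      const existing = pending.get(threadId);
      if (existing !== undefined) timers.clear(existing);
      const handle = timers.set(() => {
        pending.delete(threadId);
        runTurn(threadId);
      }, delayMs);
      pending.set(threadId, handle);
    },
    // Number of threads with a turn still waiting to fire.
    pendingCount(): number {
      return pending.size;
    },
  };
}
```

Injecting the timers keeps the debouncer deterministic under test: a fake Timers object can fire callbacks by hand instead of waiting 500ms.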

3.2 Tool dispatch

The tool layer uses a plugin registry (harness/toolRegistry.ts) with per-encounter active-set filtering. Each ToolPlugin declares:

{
  name: string;
  description: string;
  args: Record<string, { type: 'string' | 'number' | 'boolean'; description: string }>;
  contextDocs?: (spec: EncounterSpec) => string;
  handler: (args, ctx: ToolContext) => Promise<DispatchResult>;
}

A spec's tools: [...] array declares which plugins are active for that encounter. Tools are loaded by side-effect from harness/tools/index.ts:

import './skillCheckEmit.js';
import './encounterResolve.js';
import './contextRecall.js';
import './goalRegister.js';
import './foundryLookup.js';
import './foundryReward.js';
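The side-effect registration pattern above can be sketched as a module-level Map that each plugin file populates on import, plus a filter that keeps only the tools a spec lists. registerTool, activeTools and the simplified types below are hypothetical stand-ins for harness/toolRegistry.ts, not its actual exports; rejecting duplicate names and treating a missing tools list as "no tools" are assumptions.

```typescript
// Hypothetical sketch of a plugin registry with per-spec active-set filtering.
// Types are simplified stand-ins for the project's ToolPlugin / EncounterSpec.

export interface ToolPluginLike {
  name: string;
  description: string;
  handler: (args: Record<string, unknown>) => Promise<string>;
}

export interface SpecLike {
  tools?: string[];
}

const registry = new Map<string, ToolPluginLike>();

// Each plugin module (e.g. './skillCheckEmit.js') would call this at import time.
export function registerTool(plugin: ToolPluginLike): void {
  if (registry.has(plugin.name)) {
    throw new Error(`Tool "${plugin.name}" registered twice`);
  }
  registry.set(plugin.name, plugin);
}

// Returns the plugins a spec opts into; names with no registered plugin are skipped.
export function activeTools(spec: SpecLike): ToolPluginLike[] {
  const active: ToolPluginLike[] = [];
  for (const name of spec.tools ?? []) {
    const plugin = registry.get(name);
    if (plugin) active.push(plugin);
  }
  return active;
}
```

Silently skipping unknown names mirrors the behavior described under Open Issues, where a stale spec entry filters down to no active tool.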

The LLM emits a tool call by appending a fenced tool_call JSON block. toolParser tries three patterns in order: a fenced ```tool_call block, a bare tool_call header, then a fuzzy bare-JSON fallback. Unrecognized tools and malformed args are logged and ignored; the narrative is preserved.
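The three-pattern fallback chain can be sketched as follows. It assumes the JSON payload has the shape {"tool": ..., "args": {...}}; the actual field names and regular expressions in harness/toolParser.ts may differ, and splitToolCall is a hypothetical stand-in for parseToolCall. The fence string is built with repeat() so the snippet cannot terminate its own Markdown fence.

```typescript
// Hypothetical sketch of a three-pattern tool_call parser.
// Assumes the payload looks like {"tool": "...", "args": {...}}.
export interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

export interface ParsedTurn {
  narrative: string;
  toolCall: ToolCall | null;
}

// Three backticks, built at runtime so this snippet can live inside a Markdown fence.
const FENCE = "`".repeat(3);

const PATTERNS: RegExp[] = [
  // 1. Fenced block: FENCE + "tool_call", JSON body, closing FENCE.
  new RegExp(`${FENCE}tool_call\\s*([\\s\\S]*?)${FENCE}`),
  // 2. Bare header: a line reading "tool_call" followed by a trailing JSON object.
  /(?:^|\n)tool_call\s*(\{[\s\S]*\})\s*$/,
  // 3. Fuzzy fallback: a trailing JSON object that starts with a "tool" key.
  /(\{\s*"tool"\s*:[\s\S]*\})\s*$/,
];

function toToolCall(raw: string): ToolCall | null {
  try {
    const obj: unknown = JSON.parse(raw);
    if (typeof obj !== "object" || obj === null) return null;
    const { tool, args } = obj as { tool?: unknown; args?: unknown };
    if (typeof tool !== "string") return null;
    const safeArgs =
      typeof args === "object" && args !== null && !Array.isArray(args)
        ? (args as Record<string, unknown>)
        : {};
    return { tool, args: safeArgs };
  } catch {
    return null; // malformed JSON: ignore it and keep the narrative
  }
}

export function splitToolCall(text: string): ParsedTurn {
  for (const pattern of PATTERNS) {
    const match = pattern.exec(text);
    if (!match) continue;
    const call = toToolCall((match[1] ?? "").trim());
    if (call) {
      // Strip the matched block so raw JSON never reaches players.
      return { narrative: text.replace(match[0], "").trim(), toolCall: call };
    }
  }
  // Nothing parseable: the whole reply is narrative.
  return { narrative: text.trim(), toolCall: null };
}
```

Trying the strict fenced pattern first means the fuzzy fallback only fires when the model drops its fence, which limits false positives on narrative that happens to contain braces.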

The system prompt section buildToolManifest(spec) injects only the active set's tool definitions into the prompt contract, so each encounter's LLM only sees tools it can use.
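A manifest renderer along these lines emits only the active tools into the prompt. This is a hypothetical sketch: the real buildToolManifest(spec) output format is not documented here, so the plain-text layout, the ToolDef shape and the explicit (definitions, specTools) signature of renderToolManifest are assumptions. Only the arg type vocabulary ('string' | 'number' | 'boolean') comes from the ToolPlugin declaration.

```typescript
// Hypothetical sketch of rendering an active-set tool manifest into prompt text.
type ArgType = "string" | "number" | "boolean";

export interface ToolDef {
  name: string;
  description: string;
  args: Record<string, { type: ArgType; description: string }>;
}

export function renderToolManifest(all: ToolDef[], specTools: string[]): string {
  const wanted = new Set(specTools);
  const lines: string[] = [];
  for (const tool of all) {
    if (!wanted.has(tool.name)) continue; // inactive tools never reach the prompt
    lines.push(`- ${tool.name}: ${tool.description}`);
    for (const [arg, def] of Object.entries(tool.args)) {
      lines.push(`    ${arg} (${def.type}): ${def.description}`);
    }
  }
  return lines.join("\n");
}
```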

4. Data Architecture

4.1 Redis (transient state)

Key pattern	Value	TTL	Owner
`session:{threadId}`	`JSON.stringify(SessionState)`	`SESSION_TTL_HOURS` (12h)	`sessionManager`
`guild_threads:{guildId}`	Set of thread IDs	inherits	`sessionManager`
`players:{guildId}` (legacy design)	discordId → dndName	—	`playerRegistry` (current impl uses different scheme)
`raw.messages`	Redis stream	—	`graphmcp/ingest.ts`
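A minimal sketch of the session key scheme, written against the subset of ioredis commands it needs: set(key, value, 'EX', seconds), sadd and expire, all real ioredis commands. saveSession and the SessionStore interface are hypothetical. Giving guild_threads:{guildId} the same TTL is one reading of "inherits" in the table, not confirmed behavior.

```typescript
// Hypothetical sketch of the session:{threadId} / guild_threads:{guildId} scheme.
// SessionStore is the subset of ioredis commands this sketch needs.
export interface SessionStore {
  set(key: string, value: string, mode: "EX", seconds: number): Promise<unknown>;
  sadd(key: string, member: string): Promise<unknown>;
  expire(key: string, seconds: number): Promise<unknown>;
}

export const sessionKey = (threadId: string): string => `session:${threadId}`;
export const guildThreadsKey = (guildId: string): string => `guild_threads:${guildId}`;

// SESSION_TTL_HOURS defaults to 12.
export function ttlSeconds(hours = 12): number {
  return Math.round(hours * 60 * 60);
}

export async function saveSession(
  store: SessionStore,
  state: { threadId: string; guildId: string },
  ttlHours = 12,
): Promise<void> {
  const ttl = ttlSeconds(ttlHours);
  await store.set(sessionKey(state.threadId), JSON.stringify(state), "EX", ttl);
  // Assumption: the guild index expires with the same TTL as its sessions.
  await store.sadd(guildThreadsKey(state.guildId), state.threadId);
  await store.expire(guildThreadsKey(state.guildId), ttl);
}
```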

SessionState (src/types/index.ts) is the central shape:

{
  encounterId, threadId, guildId,
  spec: EncounterSpec,
  players: Record<discordId, Player>,
  history: ChatMessage[],         // mix of pinned + sliding
  phase: 'open' | 'active' | 'resolved',
  heldMessages: HeldMessage[],    // for unregistered players
  outcome?, outcomeSummary?,
  npcMemories?: Record<npcId, string>,
  resolvedContext?: Record<key, string>,
  pendingSkillCheck?: { player, prompt, dc, messageId, modifier?, skill?, advantage?, disadvantage? },
  pendingSkillCheckAttempts?: number,
  createdAt, updatedAt,
}
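The roll guard from the message flow operates on the pendingSkillCheck and pendingSkillCheckAttempts fields above. Below is a minimal sketch. It assumes the counter increments on each non-roll message and that the check auto-fails once the count reaches PENDING_ROLL_LIMIT (5). Whether the real code fails on the fifth or the sixth message is not stated here, and applyRollGuard is a hypothetical name.

```typescript
// Hypothetical sketch of the roll guard; field names follow SessionState.
export const PENDING_ROLL_LIMIT = 5;

export interface RollGuardState {
  pendingSkillCheck?: { player: string; prompt: string; dc: number };
  pendingSkillCheckAttempts?: number;
}

export type RollGuardOutcome = "no_pending_check" | "still_waiting" | "auto_failed";

// Called for each thread message while a skill check may be pending.
// Mutates state in place, as a session object loaded from Redis would be.
export function applyRollGuard(state: RollGuardState): RollGuardOutcome {
  if (!state.pendingSkillCheck) return "no_pending_check";
  const attempts = (state.pendingSkillCheckAttempts ?? 0) + 1;
  if (attempts >= PENDING_ROLL_LIMIT) {
    // Assumption: the fifth skipped message triggers the auto-fail.
    delete state.pendingSkillCheck;
    state.pendingSkillCheckAttempts = 0;
    return "auto_failed";
  }
  state.pendingSkillCheckAttempts = attempts;
  return "still_waiting";
}
```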

4.2 Filesystem (`data/`)

tally.json — { [specName]: { runs, lastRun } }. Incremented at each encounter start.
summaries/{encounterId}-{ISO timestamp}.txt — one per resolved encounter, written by encounterLog.writeSummary().
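The tally update and summary naming can be sketched as pure functions. bumpTally and summaryFileName are hypothetical helpers. The shapes ({ runs, lastRun } and {encounterId}-{ISO timestamp}.txt) come from the text above; storing lastRun as an ISO string is an assumption.

```typescript
// Hypothetical helpers for data/tally.json and data/summaries/.
export type Tally = Record<string, { runs: number; lastRun: string }>;

// Returns a new tally with specName's run count incremented (input is not mutated).
export function bumpTally(tally: Tally, specName: string, now: Date): Tally {
  const prev = tally[specName];
  return {
    ...tally,
    [specName]: { runs: (prev?.runs ?? 0) + 1, lastRun: now.toISOString() },
  };
}

// summaries/{encounterId}-{ISO timestamp}.txt
export function summaryFileName(encounterId: string, resolvedAt: Date): string {
  return `summaries/${encounterId}-${resolvedAt.toISOString()}.txt`;
}
```

Raw ISO timestamps contain colons, which Windows file systems reject; that is harmless inside the Linux container but matters if data/ is ever copied elsewhere.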

4.3 GraphMCP / Neo4j (via JSON-RPC)

The bot never queries Neo4j directly. All graph access goes through GRAPHMCP_URL/mcp with JSON-RPC 2.0:

Tool	Args	Returns
`query_as_npc`	`npc_name, question, limit`	NPCQueryResult (chunks + graph_context)
`semantic_search`	`query, limit`	SemanticSearchResult
`log_encounter`	`title, participants, summary, location?, type?`	LogEncounterResult
`list_encounters`	`limit`	EncounterResultItem[]
`search_encounters`	`query?, location?, participant?, limit?`	EncounterResultItem[]
`get_encounter`	`id`	EncounterDetails
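A minimal JSON-RPC 2.0 call against GRAPHMCP_URL/mcp might look like the sketch below. It assumes GraphMCP follows the MCP convention of a tools/call method with { name, arguments } params and answers with a plain JSON body. A server using MCP's streamable-HTTP transport may also require an initialize handshake and an Accept header that allows text/event-stream. buildToolCall and callGraphTool are hypothetical names, not the graphmcp/client.ts exports.

```typescript
// Hypothetical JSON-RPC 2.0 client sketch for GraphMCP (uses Node 18+ global fetch).
export interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number | null;
  result?: unknown;
  error?: { code: number; message: string };
}

let nextId = 1;

export function buildToolCall(name: string, args: Record<string, unknown>): JsonRpcRequest {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name, arguments: args } };
}

export async function callGraphTool(
  baseUrl: string,
  name: string,
  args: Record<string, unknown>,
): Promise<unknown> {
  const res = await fetch(`${baseUrl}/mcp`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildToolCall(name, args)),
  });
  if (!res.ok) throw new Error(`GraphMCP HTTP ${res.status}`);
  const body = (await res.json()) as JsonRpcResponse;
  // JSON-RPC reports application errors in the body, not via HTTP status.
  if (body.error) throw new Error(`GraphMCP error ${body.error.code}: ${body.error.message}`);
  return body.result;
}
```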

NPC memory is injected into the system prompt via formatNPCMemory() — past encounters witnessed + top-3 lore chunks above GRAPHMCP_SCORE_THRESHOLD.
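The selection rule (past encounters plus the top three lore chunks scoring above GRAPHMCP_SCORE_THRESHOLD) can be sketched as a pure function. The chunk shape, the output layout and the name renderNpcMemory are assumptions; this is not the actual formatNPCMemory body.

```typescript
// Hypothetical sketch of the NPC memory selection rule.
export interface LoreChunk {
  text: string;
  score: number;
}

export function renderNpcMemory(
  npcName: string,
  witnessed: string[],
  chunks: LoreChunk[],
  threshold: number,
  topK = 3,
): string {
  const lore = chunks
    .filter((c) => c.score > threshold) // strictly above the threshold
    .sort((a, b) => b.score - a.score) // best matches first (filter already copied the array)
    .slice(0, topK);
  const lines = [`${npcName} remembers:`];
  for (const title of witnessed) lines.push(`- encounter: ${title}`);
  for (const chunk of lore) lines.push(`- lore: ${chunk.text}`);
  return lines.join("\n");
}
```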

4.4 Context window budget

src/types/index.ts exports a CONTEXT_BUDGET constant used by both contextAssembler and sessionManager:

Zone	Tokens
System prompt (narrator + NPCs + tools + goals)	4,000
Pinned (opening narrative, goal block)	2,000
Sliding history	118,000
Safety buffer	3,500
Total (sum of zones)	127,500 of a 128,000-token window

History trimming drops the oldest non-pinned turn pair when over budget, with a hard floor of 6 messages. Token estimates use gpt-tokenizer with a 1.15× buffer to approximate Gemma's tokenizer.
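The trimming rule can be sketched as follows. The characters-divided-by-four counter is a crude stand-in for gpt-tokenizer; its encode(text).length would give the real count. The 1.15 multiplier and the six-message floor come from the text. Applying the floor to sliding (non-pinned) messages is an assumption, and trimSlidingHistory is a hypothetical illustration, not the duplicated trimHistory in sessionManager / contextAssembler.

```typescript
// Hypothetical sketch of pinned + sliding history trimming.
export interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
  pinned?: boolean;
}

const TOKEN_BUFFER = 1.15; // pads a GPT-style count to approximate Gemma's tokenizer
const MIN_SLIDING = 6; // hard floor; assumed to count sliding (non-pinned) messages

// Crude stand-in counter: roughly 4 characters per token.
const roughCount = (text: string): number => Math.ceil(text.length / 4);

export function estimateTokens(
  messages: ChatMessage[],
  count: (text: string) => number = roughCount,
): number {
  const raw = messages.reduce((sum, m) => sum + count(m.content), 0);
  return Math.ceil(raw * TOKEN_BUFFER);
}

export function trimSlidingHistory(
  messages: ChatMessage[],
  budget: number,
  count: (text: string) => number = roughCount,
): ChatMessage[] {
  const kept = [...messages];
  while (estimateTokens(kept, count) > budget) {
    const sliding = kept.filter((m) => !m.pinned).length;
    if (sliding - 2 < MIN_SLIDING) break; // dropping a pair would breach the floor
    // Drop the two oldest non-pinned messages (one user/assistant turn pair).
    kept.splice(kept.findIndex((m) => !m.pinned), 1);
    kept.splice(kept.findIndex((m) => !m.pinned), 1);
  }
  return kept;
}
```

Pinned messages are never candidates for removal, which is what keeps the opening narrative and goal block anchored however long the encounter runs.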

5. API Surface

This project exposes its functionality through two API surfaces:

5.1 Discord slash commands (player/admin surface)

Registered via src/scripts/deploy-commands.ts using Discord REST v10.

Command	Subcommands	Purpose
`/dndname`	`set <name>`, `show`, `clear`	Character name registration
`/character`	`register foundry|custom`, `show`, `view`, `clear`, `admin list|remove|give`	Full character profile + Foundry link
`/encounter`	`start <spec>`, `random`, `status`, `stats`, `audit`, `end [notes]`, `list`, `generate <theme>`, `spec`	Encounter session lifecycle
`/encounters`	(Select menu + search modal)	Search the encounter log via GraphMCP
`/roll`	`action`	Manual dice roll
`/actions`	—	In-character action shortcuts
`/turn`	—	Turn management
`/xp`	`award <amount>`	Award XP (relay → VTT)

Plus button + modal interactions: skill-check roll buttons, give item, custom character registration, Foundry link, encounter select menu, search modal.
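Registration through Discord REST v10 amounts to sending application-command payloads. As a sketch, /dndname can be written as a raw payload and bulk-overwritten with PUT /applications/{application.id}/guilds/{guild.id}/commands. Option types 1 (SUB_COMMAND) and 3 (STRING) are Discord's documented values. Guild-scoped rather than global registration, the description strings, and the names dndnameCommand / deployGuildCommands are assumptions; the real deploy-commands.ts may well build these with discord.js builders instead.

```typescript
// Hypothetical raw payload for /dndname set|show|clear (Discord API v10 shapes).
const SUB_COMMAND = 1;
const STRING = 3;

export const dndnameCommand = {
  name: "dndname",
  description: "Manage your character name",
  options: [
    {
      type: SUB_COMMAND,
      name: "set",
      description: "Set your character name",
      options: [{ type: STRING, name: "name", description: "Character name", required: true }],
    },
    { type: SUB_COMMAND, name: "show", description: "Show your character name" },
    { type: SUB_COMMAND, name: "clear", description: "Clear your character name" },
  ],
};

// Bulk-overwrites guild commands (assumes guild-scoped registration).
export async function deployGuildCommands(
  token: string,
  applicationId: string,
  guildId: string,
  commands: object[],
): Promise<void> {
  const url = `https://discord.com/api/v10/applications/${applicationId}/guilds/${guildId}/commands`;
  const res = await fetch(url, {
    method: "PUT",
    headers: { Authorization: `Bot ${token}`, "Content-Type": "application/json" },
    body: JSON.stringify(commands),
  });
  if (!res.ok) throw new Error(`Command registration failed: HTTP ${res.status}`);
}
```

A bulk overwrite replaces the whole command set, so a command removed from the array disappears from Discord on the next run.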

5.2 Tool plugins (LLM surface)

Defined in src/harness/tools/ and registered at module load. Each spec filters the active set via its tools: array.

Tool	Purpose	Args
`skill_check_emit`	Posts a dice-roll embed to the thread; blocks player input until resolved	`player, prompt, skill?, dc, advantage?, disadvantage?`
`encounter_resolve`	Marks encounter complete; writes summary; archives thread	(args handled in `tools/encounterResolve.ts`)
`context_recall`	Look up canonical session facts stored in `resolvedContext`
`goal_register`	Add a new goal mid-encounter (the `prd.md` "dynamic goal registration" feature)
`foundry_lookup`	Pull live character data from VTT relay
`foundry_reward`	Award XP/items to a character via VTT
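Before a handler runs, model-produced args have to be checked against the plugin's declared arg types. Below is a minimal sketch for skill_check_emit. It assumes args marked ? in the table are optional and adds an optional flag that the ToolPlugin args declaration does not show. validateArgs is a hypothetical helper, not the toolDispatcher.ts code.

```typescript
// Hypothetical arg validation against a ToolPlugin-style args declaration.
type ArgType = "string" | "number" | "boolean";
type ArgSpec = Record<string, { type: ArgType; optional?: boolean }>;

export const skillCheckEmitArgs: ArgSpec = {
  player: { type: "string" },
  prompt: { type: "string" },
  skill: { type: "string", optional: true },
  dc: { type: "number" },
  advantage: { type: "boolean", optional: true },
  disadvantage: { type: "boolean", optional: true },
};

// Returns a list of problems; an empty list means the call can be dispatched.
export function validateArgs(spec: ArgSpec, args: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const [name, def] of Object.entries(spec)) {
    const value = args[name];
    if (value === undefined) {
      if (!def.optional) errors.push(`missing required arg "${name}"`);
      continue;
    }
    if (typeof value !== def.type) {
      errors.push(`arg "${name}" should be ${def.type}, got ${typeof value}`);
    }
  }
  for (const name of Object.keys(args)) {
    if (!(name in spec)) errors.push(`unknown arg "${name}"`);
  }
  return errors;
}
```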

⚠ Note: the Docs/mardonar-encounter-engine.md lists skill_check_resolve, event_log_append, npc_memory_read, npc_memory_write as tools. These have been removed — replaced by the per-encounter event log + GraphMCP log_encounter tool. The current tool set is the one above.

6. Deployment Architecture

6.1 Local development

docker compose -f docker-compose.dev.yml up -d   # Redis + Neo4j
npm install
npm run deploy-commands                          # registers slash commands with Discord
npm run dev                                      # tsx watch mode

6.2 Production (multi-stage Dockerfile)

Dockerfile (Node 22 alpine):

Builder stage — npm ci --ignore-scripts, copy src + tsconfig.json, npm run build → dist/
Runtime stage — npm ci --omit=dev --ignore-scripts, copy dist/, specs/, lore/, persona.yaml
CMD ["node", "dist/bot/index.js"]

docker-compose.dev.yml defines two services, deploy-commands (one-shot) and bot (long-running, with data/ mounted as a volume), both attached to the external mardonar-internal Docker network, which also hosts Redis and an MCP server from the GraphMCP-Example stack.

Gap: There is no production docker-compose.yml. The .env.example is the source of truth for runtime config.

6.3 Operational

Session state has a 12h TTL by default — stale encounters auto-expire
Bot connects to Redis on main() startup (redis.connect())
VTT relay auto-spins up a headless Foundry session on connection failure (RSA-OAEP encrypted handshake)
LOG_LEVEL=info in prod; pino writes structured JSON
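The RSA-OAEP password handoff used by the VTT relay auto-spin-up can be illustrated with Node's built-in crypto module. This is a sketch under assumptions: it supposes the relay publishes an RSA public key in PEM form and that OAEP uses SHA-256. The real handshake in vtt/relaySession.ts (key format, hash, message framing) is not documented here, and encryptForRelay / decryptOnRelay / demoKeyPair are hypothetical names.

```typescript
// Hypothetical sketch of an RSA-OAEP password handoff using node:crypto.
import { constants, generateKeyPairSync, privateDecrypt, publicEncrypt } from "node:crypto";

// Bot side: encrypt the secret with the relay's public key.
export function encryptForRelay(publicKeyPem: string, password: string): string {
  const ciphertext = publicEncrypt(
    { key: publicKeyPem, padding: constants.RSA_PKCS1_OAEP_PADDING, oaepHash: "sha256" },
    Buffer.from(password, "utf8"),
  );
  return ciphertext.toString("base64");
}

// Relay side (shown only so the round trip can be exercised).
export function decryptOnRelay(privateKeyPem: string, payloadB64: string): string {
  const plaintext = privateDecrypt(
    { key: privateKeyPem, padding: constants.RSA_PKCS1_OAEP_PADDING, oaepHash: "sha256" },
    Buffer.from(payloadB64, "base64"),
  );
  return plaintext.toString("utf8");
}

// Demo key pair; a real relay keeps its private key to itself.
export function demoKeyPair(): { publicKey: string; privateKey: string } {
  return generateKeyPairSync("rsa", {
    modulusLength: 2048,
    publicKeyEncoding: { type: "spki", format: "pem" },
    privateKeyEncoding: { type: "pkcs8", format: "pem" },
  });
}
```

Both sides must agree on the OAEP hash; a mismatch makes privateDecrypt throw rather than return garbage.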

7. Development & Testing

7.1 Local commands

Command	Effect
`npm run dev`	`tsx watch src/bot/index.ts` — auto-reload dev
`npm run build`	`tsc` → `dist/`
`npm run start`	`node dist/bot/index.js`
`npm run deploy-commands`	One-shot slash command registration
`npm run test`	All tests (vitest)
`npm run test:unit`	Unit tests only (no external services)
`npm run test:int`	Integration tests (requires Docker services)

7.2 Test coverage

21 unit test files in tests/unit/
1 integration test (tests/integration/phase1.test.ts)
tests/fixtures/spec.ts — shared encounter spec fixture

Notable test surfaces: promptBuilder, contextAssembler, toolParser, toolDispatcher, sessionManager, playerRegistry, characterRegistry, specLoader, rollHandler, rollDetection, responseFilter, queueCap, generationQueue, reactionManager, encounterLog, encounterDiscoveryEmbed, loreAnswerEmbed, skillCheckEmbed, graphmcpClient, foundryClientRetry, foundryClientFormatters, goalRegister, relaySession.

8. Design Decisions (Living)

Decision	Why
LiteLLM as primary, Ollama as fallback	OpenAI-compatible proxy gives model flexibility without code changes; Ollama fallback ensures the bot still runs when the proxy is down
Prompt-based tool calls (not native)	Gemma 4 IT at e2b is unreliable with native function calling; fenced JSON block parsing is deterministic
Tool plugin registry with per-spec active set	New tools can be added without touching the dispatch core; specs opt into only the tools they need
Pinned + sliding history	Opening narrative and goal block must survive trimming or the LLM loses its anchor
Goals in system prompt, not as a tool	Goals rarely change mid-encounter; embedding them reduces tool round-trips
Redis for active state, GraphMCP for memory	Redis is fast and ephemeral for live sessions; the graph holds long-term NPC lore
Player name gate via embed, not DMs	Keeps the conversation in-thread; ephemeral embed auto-deletes after 30s
Story generator via `/encounter generate`	Separates creative authoring from real-time inference — generator can use a stronger model later
VTT relay auto-spin-up	Lets the bot operate when the relay has been cold-stopped; uses RSA-OAEP for password handoff
In-world voice rule for player-facing strings	See `feedback-in-world-voice` — no utility/jargon in bot messages

9. Open Issues / Drift

Items the deep scan surfaced that aren't bugs but should be tracked:

Drift: Docs/mardonar-encounter-engine.md describes a Go bot with an embedded MCP layer; the actual code is TypeScript with an external JSON-RPC GraphMCP server. Treat the doc as historical/aspirational.
Drift: README.md's "Project Structure" tree references src/mcp/ and the old src/bot/commands/{dndname,encounter}.ts layout. Update README, or trim it to a pointer to the index.
Duplicate trimHistory logic in src/session/sessionManager.ts and src/harness/contextAssembler.ts (identical body). Could be extracted to src/lib/historyTrim.ts.
No production compose file — only docker-compose.dev.yml. The Dockerfile is production-ready but deployment is ad-hoc.
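No CD: .github/workflows/ does not exist. CI now exists as .gitea/workflows/test.yml (Node 22, npm cache, tsc --noEmit gate), added in the same commit as this document, but there is still no deployment pipeline.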
DISCORD_ALLOWED_USERS is empty by default → anyone in allowed channels can run /encounter start. The access control is channel-scoped, not user-scoped; admins need to set the env var explicitly.
OLLAMA_BASE_URL defaults to localhost — fine for dev, but production needs the LAN IP or proxy URL set.
Spec tool list must be kept in sync — specs/*.yaml declare tools: [...], but no test verifies every referenced tool is registered. A stale spec name silently filters to no active tools.
Schema mismatch risk: types/index.ts EncounterSpec and spec/loader.ts Zod schema have diverged slightly — EncounterSpec is missing tone, tools, randomizable, and npcs.nameKey. assembleContext reads spec.tone; loader doesn't validate it. Consider regenerating types/index.ts from the Zod schema via z.infer.
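The spec/tool sync item above could be guarded by a check along these lines, run from a unit test over every specs/*.yaml file. findUnknownTools is a hypothetical helper. Loading the YAML and collecting the registered names from the tool registry are left out so the sketch stays dependency-free.

```typescript
// Hypothetical guard for the "spec tool list must be kept in sync" issue.
export interface SpecToolList {
  file: string;
  tools: string[];
}

// Returns one message per spec tool name that has no registered plugin.
export function findUnknownTools(specs: SpecToolList[], registered: Iterable<string>): string[] {
  const known = new Set(registered);
  const problems: string[] = [];
  for (const spec of specs) {
    for (const tool of spec.tools) {
      if (!known.has(tool)) problems.push(`${spec.file}: unknown tool "${tool}"`);
    }
  }
  return problems;
}
```

In a Vitest test this would reduce to expect(findUnknownTools(specs, registeredNames)).toEqual([]), which also catches specs still naming removed tools such as npc_memory_read.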

Document generated by bmad-document-project initial scan, deep level. Project state recorded in docs/project-scan-report.json.
