Expands the unit test suite from 320 to 380 tests (+60) and adds a
Gitea Actions CI workflow. Closes all six follow-up recommendations
from the test-architecture validation report.
New tests (tests/unit/):
- ollamaClient.test.ts — Ollama SDK wrapper, options passthrough
- litellmClient.test.ts — OpenAI SDK wrapper, model fallback
- personaLoader.test.ts — Zod validation + cache invalidation
- foundryReward.test.ts — Tool plugin: lookup, errors, partial grants
- xpAwarder.test.ts — Bulk XP awards + per-player skip reasons
- redisErrorPath.test.ts — Singleton error handler does not crash
- messageRouterRunLLMTurn.test.ts — 18 cases for the runtime heart:
narrative-only path, tool dispatch, filter correction, retry loop
guard, missed-skill-check heuristic, typing indicator interval,
LLM error fallback, archive on resolve.
Coverage (line %):
- harness/litellmClient.ts 0 → 100
- harness/ollamaClient.ts 0 → 100
- harness/tools/foundryReward.ts 0 → 100
- session/xpAwarder.ts 0 → 100
- persona/loader.ts 0 → 100
- db/redis.ts 0 → 100
- bot/handlers/messageRouter.ts 0 → 39.86 (runLLMTurn now covered)
Tooling:
- package.json: + test:coverage, test:watch scripts
- devDep: @vitest/coverage-v8@^3.1.0
- tests/README.md: conventions, anti-patterns, template map
- .gitignore: exclude coverage/
- .gitea/workflows/test.yml: Node 22, npm cache, tsc --noEmit gate
Documentation (from earlier /bmad-document-project run, now committed):
- docs/index.md
- docs/project-overview.md
- docs/architecture.md
- docs/deployment-guide.md
- docs/api-contracts.md
- docs/data-models.md
- docs/source-tree-analysis.md
- docs/component-inventory.md
- docs/development-guide.md
- _bmad-output/test-artifacts/automate-validation-report.md
Co-Authored-By: Claude <noreply@anthropic.com>
Deployment Guide
Deploying the Mardonar Encounter Engine. Generated 2026-06-19.
Architecture
The bot is a single long-running Node.js process. It connects to:
- Discord over WebSocket (discord.js v14)
- Redis for session and player/character registries
- GraphMCP (HTTP JSON-RPC) for NPC memory, lore search, and encounter log writes
- LiteLLM (preferred) or Ollama for LLM inference
- VTT relay (optional) for Foundry VTT integration
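The "LiteLLM (preferred) or Ollama" choice can be pictured as a single startup decision over the LLM variables listed under Environment. This is a minimal sketch, assuming LiteLLM is chosen whenever LITELLM_BASE_URL is set and Ollama otherwise; selectLlmBackend and LlmBackend are hypothetical names, not code from src/.

```typescript
// Hypothetical sketch: pick the LLM backend from environment variables.
// Assumption: LiteLLM wins whenever LITELLM_BASE_URL is set; Ollama is the fallback.
type Env = Record<string, string | undefined>;

type LlmBackend =
  | { kind: "litellm"; baseUrl: string; apiKey: string | undefined; model: string | undefined }
  | { kind: "ollama"; baseUrl: string; model: string | undefined };

function selectLlmBackend(env: Env): LlmBackend {
  const litellmUrl = env["LITELLM_BASE_URL"]?.trim();
  if (litellmUrl) {
    // Preferred: the LiteLLM proxy, which exposes an OpenAI-compatible API.
    return {
      kind: "litellm",
      baseUrl: litellmUrl,
      apiKey: env["LITELLM_API_KEY"],
      model: env["LITELLM_MODEL"],
    };
  }
  const ollamaUrl = env["OLLAMA_BASE_URL"]?.trim();
  if (ollamaUrl) {
    // Fallback: talk to Ollama directly.
    return { kind: "ollama", baseUrl: ollamaUrl, model: env["OLLAMA_MODEL"] };
  }
  // Neither configured: fail at startup rather than on the first LLM turn.
  throw new Error("No LLM backend configured: set LITELLM_BASE_URL or OLLAMA_BASE_URL");
}
```

With only OLLAMA_BASE_URL set, the sketch returns the ollama variant; adding LITELLM_BASE_URL switches it to litellm. Failing fast when neither is set surfaces the misconfiguration at boot instead of on the first encounter turn.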
The Dockerfile is a multi-stage build on node:22-alpine. There is currently no production docker-compose.yml, only the dev one (docker-compose.dev.yml). Production deploys use the Dockerfile directly with whatever orchestrator is in use.
Build
npm ci --ignore-scripts
npm run build # tsc → dist/
The build is reproducible from a clean node_modules. The Dockerfile's builder stage does exactly this.
Container image
Dockerfile:
- Builder (node:22-alpine): npm ci --ignore-scripts, copy src + tsconfig.json, run npm run build
- Runtime (node:22-alpine): npm ci --omit=dev --ignore-scripts, copy dist/, specs/, lore/, persona.yaml
- CMD: ["node", "dist/bot/index.js"]
To build locally:
docker build -t mardonar-bot:latest .
The data/ directory is not copied into the image — it must be mounted as a volume in production so tally and summaries persist across restarts.
Local dev (Docker Compose)
docker-compose.dev.yml is the only compose file in the repo. It declares the mardonar-internal Docker network as external: true — it expects the GraphMCP-Example stack (Redis + MCP server) to be running first.
docker compose -f docker-compose.dev.yml up -d
docker compose -f docker-compose.dev.yml logs -f bot
Two services:
- deploy-commands: one-shot container that runs node dist/scripts/deploy-commands.js. restart: "no".
- bot: long-running container. restart: unless-stopped. Mounts ./data:/app/data so tally and summaries persist. depends_on: deploy-commands: service_completed_successfully ensures commands are registered before the bot starts serving traffic.
Production deployment
There is no production compose file. Pick one:
Option A: Plain Docker
docker build -t mardonar-bot:latest .
docker run -d \
--name mardonar-bot \
--restart unless-stopped \
--env-file .env \
-v /var/lib/mardonar/data:/app/data \
--network mardonar-internal \
mardonar-bot:latest
Register slash commands once before the bot serves traffic. Under Compose the deploy-commands service does this; with plain Docker, run the same image with a different command:
docker run --rm \
--env-file .env \
--network mardonar-internal \
mardonar-bot:latest \
node dist/scripts/deploy-commands.js
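Option B: systemd (Linux host)
This runs dist/bot/index.js straight from /opt/mardonar, so that directory needs the same files the runtime image ships: dist/, specs/, lore/, persona.yaml, and production dependencies installed with npm ci --omit=dev.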
# /etc/systemd/system/mardonar-bot.service
[Unit]
Description=Mardonar Encounter Engine
Wants=network-online.target
After=network-online.target redis-server.service
[Service]
Type=simple
User=mardonar
WorkingDirectory=/opt/mardonar
EnvironmentFile=/opt/mardonar/.env
ExecStart=/usr/bin/node /opt/mardonar/dist/bot/index.js
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload
sudo systemctl enable --now mardonar-bot
sudo journalctl -u mardonar-bot -f
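The two options also restart differently. Docker's --restart unless-stopped restarts the container after any exit unless an operator stopped it, whereas the unit's Restart=on-failure skips the restart after a clean exit. The simplified model below ignores systemd timeouts, the watchdog, and SuccessExitStatus=, and its function names are hypothetical; it only makes the consequence concrete: under systemd, a bot that ends with exit code 0 after a fatal error stays down.

```typescript
// Simplified model of the two restart policies used in this guide.
// Assumption: ignores systemd timeouts, the watchdog, and SuccessExitStatus=.
type ProcessExit =
  | { kind: "code"; code: number }
  | { kind: "signal"; signal: string };

// systemd treats exit code 0 and these signals as a clean exit by default.
const CLEAN_EXIT_SIGNALS = new Set(["SIGHUP", "SIGINT", "SIGTERM", "SIGPIPE"]);

// Restart=on-failure: restart only after an unclean exit.
function systemdOnFailureRestarts(exit: ProcessExit): boolean {
  if (exit.kind === "code") return exit.code !== 0;
  return !CLEAN_EXIT_SIGNALS.has(exit.signal);
}

// --restart unless-stopped: restart after any exit, clean or not,
// unless an operator explicitly stopped the container.
function dockerUnlessStoppedRestarts(_exit: ProcessExit, stoppedByOperator: boolean): boolean {
  return !stoppedByOperator;
}
```

So with Option B, fatal error paths should exit with a non-zero code (for example process.exit(1)) for Restart=on-failure to bring the bot back.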
Environment
All runtime configuration is via environment variables, validated by Zod (src/config.ts). The full list is in development-guide.md.
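Production essentials (keep comments on their own lines: docker run --env-file treats a # anywhere other than the start of a line as part of the value):
DISCORD_TOKEN=...
DISCORD_CLIENT_ID=...
# Guild-scoped: instant command registration
DISCORD_GUILD_ID=...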
# Network isolation: only respond in specific channels
DISCORD_ALLOWED_CHANNELS=123456789012345678,987654321098765432
# User restriction: only allow specific users to run /encounter
DISCORD_ALLOWED_USERS=111111111111111111
# LiteLLM (preferred)
LITELLM_BASE_URL=http://your-litellm-host:4000
LITELLM_API_KEY=...
LITELLM_MODEL=ollama-cloud
# Ollama fallback
OLLAMA_BASE_URL=http://your-ollama-host:11434
OLLAMA_MODEL=gemma4-it:e2b
# GraphMCP (must be reachable)
GRAPHMCP_URL=http://mcp-server:9000
GRAPHMCP_SCORE_THRESHOLD=0.68
GRAPHMCP_INGEST_STREAM=raw.messages
# Persisted state (or wherever you mount the volume)
DATA_DIR=/app/data
# Logging
LOG_LEVEL=info
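src/config.ts does this validation with Zod. The dependency-free sketch below only illustrates the kinds of checks involved (required values, a bounded number, an enumerated log level) and collects every problem before failing. parseConfig, AppConfig, and each rule, including the 0 to 1 range for GRAPHMCP_SCORE_THRESHOLD and the info default for LOG_LEVEL, are assumptions rather than the project's actual schema.

```typescript
// Dependency-free stand-in for the Zod schema in src/config.ts.
// Assumption: every rule below is illustrative, not the project's real schema.
type Env = Record<string, string | undefined>;

// Pino's built-in level names.
const LOG_LEVELS = ["fatal", "error", "warn", "info", "debug", "trace", "silent"] as const;
type LogLevel = (typeof LOG_LEVELS)[number];

interface AppConfig {
  discordToken: string;
  discordClientId: string;
  graphmcpUrl: string;
  graphmcpScoreThreshold: number | undefined;
  logLevel: LogLevel;
}

// Records an error (instead of throwing) so all problems are reported together.
function required(env: Env, key: string, errors: string[]): string {
  const value = env[key]?.trim();
  if (!value) errors.push(`${key} is required`);
  return value ?? "";
}

function parseConfig(env: Env): AppConfig {
  const errors: string[] = [];
  const discordToken = required(env, "DISCORD_TOKEN", errors);
  const discordClientId = required(env, "DISCORD_CLIENT_ID", errors);
  const graphmcpUrl = required(env, "GRAPHMCP_URL", errors);
  if (graphmcpUrl && !/^https?:\/\//.test(graphmcpUrl)) {
    errors.push("GRAPHMCP_URL must start with http:// or https://");
  }

  // Assumption: the threshold is a similarity score between 0 and 1.
  let graphmcpScoreThreshold: number | undefined = undefined;
  const rawThreshold = env["GRAPHMCP_SCORE_THRESHOLD"]?.trim();
  if (rawThreshold) {
    const n = Number(rawThreshold);
    if (Number.isFinite(n) && n >= 0 && n <= 1) graphmcpScoreThreshold = n;
    else errors.push("GRAPHMCP_SCORE_THRESHOLD must be a number between 0 and 1");
  }

  // Assumption: default to "info" when LOG_LEVEL is unset.
  const rawLevel = env["LOG_LEVEL"]?.trim() || "info";
  const logLevel = LOG_LEVELS.find((level) => level === rawLevel);
  if (logLevel === undefined) errors.push(`LOG_LEVEL must be one of: ${LOG_LEVELS.join(", ")}`);

  if (errors.length > 0 || logLevel === undefined) {
    throw new Error(`Invalid configuration:\n${errors.join("\n")}`);
  }
  return { discordToken, discordClientId, graphmcpUrl, graphmcpScoreThreshold, logLevel };
}
```

Reporting all failures in one error mirrors what a schema validator gives you and saves a restart loop per missing variable.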
⚠ Security note: DISCORD_ALLOWED_CHANNELS is empty by default, which means the bot will respond in no channels. This is secure by default but easy to misconfigure. Set it explicitly.
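Secure by default here means deny unless listed. A minimal sketch, assuming DISCORD_ALLOWED_CHANNELS holds comma-separated channel IDs as in the example above; parseIdList, isChannelAllowed, and emptyAllowlistWarning are hypothetical helpers. Warning at startup when the list is empty is one way to make the misconfiguration visible.

```typescript
// Hypothetical sketch of the DISCORD_ALLOWED_CHANNELS gate (deny unless listed).
function parseIdList(raw: string | undefined): Set<string> {
  // "123,456" -> {"123", "456"}; unset or empty -> empty set.
  return new Set(
    (raw ?? "")
      .split(",")
      .map((id) => id.trim())
      .filter((id) => id.length > 0),
  );
}

function isChannelAllowed(channelId: string, allowed: ReadonlySet<string>): boolean {
  // An empty allowlist matches nothing, so the bot stays silent everywhere.
  return allowed.has(channelId);
}

// Assumption: surfacing the empty case at startup makes the misconfiguration obvious.
function emptyAllowlistWarning(allowed: ReadonlySet<string>): string | undefined {
  return allowed.size === 0
    ? "DISCORD_ALLOWED_CHANNELS is empty: the bot will not respond in any channel"
    : undefined;
}
```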
Persistent state
Two kinds of state to back up:
- data/tally.json: per-spec run counts. Useful for analytics, not load-bearing.
- data/summaries/: one .txt per resolved encounter. Permanent record.
Session state lives in Redis with a 12h TTL. If Redis is wiped, in-flight sessions are lost but the Discord threads themselves remain: the bot simply finds no session for that thread on the next message. There is no data corruption risk. Redis also holds the player/character registries (see Architecture), so a wipe removes those as well.
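A 12h TTL is 12 × 60 × 60 = 43,200 seconds; in Redis terms, a key written with SET key value EX 43200. The in-memory stand-in below mimics only the behaviour described above, where an expired or wiped session reads back as missing. SessionStore, keying by thread ID, and refreshing the TTL on every write are assumptions for illustration.

```typescript
// In-memory stand-in for Redis session keys with a TTL (hypothetical names).
const SESSION_TTL_SECONDS = 12 * 60 * 60; // 12h = 43,200 s

interface TtlEntry<T> {
  value: T;
  expiresAtMs: number;
}

class SessionStore<T> {
  private readonly entries = new Map<string, TtlEntry<T>>();

  // Comparable to Redis `SET <key> <value> EX 43200`: each write restarts the TTL.
  set(threadId: string, value: T, nowMs: number): void {
    this.entries.set(threadId, { value, expiresAtMs: nowMs + SESSION_TTL_SECONDS * 1000 });
  }

  // Expired or wiped keys read back as undefined: "no session for this thread".
  get(threadId: string, nowMs: number): T | undefined {
    const entry = this.entries.get(threadId);
    if (entry === undefined) return undefined;
    if (nowMs >= entry.expiresAtMs) {
      this.entries.delete(threadId);
      return undefined;
    }
    return entry.value;
  }

  // Comparable to FLUSHDB: every session disappears at once.
  flush(): void {
    this.entries.clear();
  }
}
```

Because a missing key is the only failure mode, the bot's recovery path is the same whether the session expired or Redis was flushed.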
Health checks
The bot does not currently expose an HTTP health endpoint. Suggested liveness probe patterns:
- Discord WebSocket liveness: the bot logs [bot] Logged in as <tag> on ready. Scrape stdout for this (it confirms the initial login, not ongoing connectivity).
- Redis: already externally monitored. The bot logs [redis] connection error on failure.
- GraphMCP: the first call after startup will fail loudly if unreachable.
- Custom probe: call /encounter status in a known thread and check the response (the bot only responds in DISCORD_ALLOWED_CHANNELS, so the thread must be in one of them).
A simple docker healthcheck using Discord WebSocket isn't trivially scriptable. If you need an HTTP probe, add a small Express server in a future iteration that responds 200 while the Discord client is ready and Redis is connected.
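If that probe is added, its core is a small decision over two flags, which can be kept separate from the HTTP framework so it is easy to unit test. A minimal sketch; HealthInputs and healthResponse are hypothetical names, and answering 503 when either dependency is down is an assumption (the guide only specifies the 200 case).

```typescript
// Hypothetical core of a future health endpoint: pure, so it is easy to unit test.
interface HealthInputs {
  discordReady: boolean; // e.g. the Discord client's ready state
  redisConnected: boolean; // e.g. tracked from the Redis client's connection events
}

interface HealthResponse {
  status: 200 | 503;
  body: { ok: boolean; discord: "ready" | "down"; redis: "connected" | "down" };
}

function healthResponse(h: HealthInputs): HealthResponse {
  const ok = h.discordReady && h.redisConnected;
  return {
    // 200 only when both dependencies are up; 503 otherwise (assumption).
    status: ok ? 200 : 503,
    body: {
      ok,
      discord: h.discordReady ? "ready" : "down",
      redis: h.redisConnected ? "connected" : "down",
    },
  };
}
```

In an Express handler this reduces to res.status(r.status).json(r.body), where r = healthResponse(...) is computed from the two live flags on each request.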
Logging
The bot uses pino. In dev, pino-pretty formats to a human-readable stream. In prod, pino emits structured JSON to stdout — pipe to your log shipper (Loki, CloudWatch, etc.).
Useful fields to index:
- level, time, msg
- threadId, encounterId (for encounter-specific queries)
- latencyMs (for LLM and tool latency)
- error (for failure analysis)
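In production every log line is a single JSON object. With pino's defaults, level is numeric (30 = info, 40 = warn, 50 = error) and time is epoch milliseconds. As a sketch of what an indexing step sees, the helper below pulls the fields listed above out of one line, for example in an ad hoc script over captured stdout. The field names come from this guide; extractIndexedFields, IndexedFields, and the handling of malformed lines are assumptions.

```typescript
// Hypothetical helper: pull the indexed fields out of one pino JSON log line.
// Pino's default numeric levels.
const PINO_LEVEL_NAMES: Record<number, string> = {
  10: "trace",
  20: "debug",
  30: "info",
  40: "warn",
  50: "error",
  60: "fatal",
};

interface IndexedFields {
  level: string;
  time: number | undefined; // epoch milliseconds with pino's default timestamp
  msg: string | undefined;
  threadId: string | undefined;
  encounterId: string | undefined;
  latencyMs: number | undefined;
  error: unknown;
}

const asString = (v: unknown): string | undefined => (typeof v === "string" ? v : undefined);
const asNumber = (v: unknown): number | undefined => (typeof v === "number" ? v : undefined);

function extractIndexedFields(line: string): IndexedFields | undefined {
  let parsed: unknown;
  try {
    parsed = JSON.parse(line);
  } catch {
    return undefined; // not JSON, e.g. pino-pretty output from a dev run
  }
  if (typeof parsed !== "object" || parsed === null) return undefined;
  const rec = parsed as Record<string, unknown>;

  const numericLevel = asNumber(rec["level"]);
  return {
    // Map 30 -> "info" etc.; keep unknown or custom levels as-is.
    level:
      numericLevel !== undefined
        ? (PINO_LEVEL_NAMES[numericLevel] ?? String(numericLevel))
        : String(rec["level"]),
    time: asNumber(rec["time"]),
    msg: asString(rec["msg"]),
    threadId: asString(rec["threadId"]),
    encounterId: asString(rec["encounterId"]),
    latencyMs: asNumber(rec["latencyMs"]),
    error: rec["error"],
  };
}
```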
Operational runbook
Restart the bot
docker restart mardonar-bot
# or, under systemd: sudo systemctl restart mardonar-bot
Rotate the Discord token
- Generate a new token in the Discord developer portal. The old token is invalidated immediately, so complete the remaining steps without delay
- Update the env var (or secret store)
- Restart the bot
Re-register slash commands
After changing any src/bot/commands/*.ts:
docker run --rm --env-file .env --network mardonar-internal mardonar-bot:latest \
node dist/scripts/deploy-commands.js
Or in dev: npm run deploy-commands
Reset a stuck session
A bot restart clears all in-memory state (including reaction managers and burst counters). Redis session state persists. If a session is genuinely stuck (e.g. a tool dispatched but the response was lost), use /encounter end in-thread to force-resolve.
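Drain Redis (nuclear option)
FLUSHDB deletes every key in the selected Redis database: all sessions, the player/character registries if they live in that database, and any other service's keys stored there.
docker exec -it <redis-container> redis-cli FLUSHDB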
Open deployment gaps
These are real but not blockers:
- No production compose file: only docker-compose.dev.yml exists, so production deploys are ad hoc.
- No CD: there is no .github/workflows/, and the only workflow is the Gitea Actions CI in .gitea/workflows/test.yml (Node 22, tsc --noEmit gate). Image build and deploy are manual.
- No health endpoint: no HTTP probe target.
- No metrics export: pino logs are the only observability surface.
- External network: docker-compose.dev.yml references an external Docker network (mardonar-internal). That is fine for the dev stack it is designed for, but a fresh deployment needs to either join the same network or remove the reference.