lore-engine-poc/docs/SMOKE.md
kaykayyali fab33213de docs(v2): T1 — push v1 + open v2 milestone board
T1 deliverables for the v2 iteration:
- .gitignore — keep __pycache__, .smoke-*.log, and editor noise out
- docs/SMOKE.md — single source of truth for bringing up the v1 stack
  from a fresh clone. Documents the 5-command path, expected output at
  each step, the CI runner, and a troubleshooting table.
- scripts/ci-smoke.sh — CI smoke runner. Brings the stack up, waits for
  all four services healthy, polls /healthz, idempotently seeds (skip
  if data already present), runs test.sh. Exits 0 on pass. Supports
  --keep-up and --skip-build for dev iteration. Shell-only because the
  Gitea instance has no Actions runner wired up yet — the script is
  the same shape a future Actions step will wrap.

Gitea milestones T1-T9 are created on kaykayyali/lore-engine-poc via
'tea milestones create'. See docs/SMOKE.md for the milestone → task
mapping (T7, T8 are deferred per the T9 integration task's body).

Verification:
- ./scripts/ci-smoke.sh --skip-build --keep-up  → SMOKE PASSED
  (12 v1 tools registered, 11/11 test.sh sections green)
- tea milestones list → all 9 milestones present

Co-located with the v1 baseline commit already on wt/t1-gitea-push
(commit 8e8503e adds the T3 consistency-skeleton plugin on top of v1;
that is T3's work, surfaced here only because T1, T2, T3 share a
single working dir). The T1 deliverables are additive: SMOKE.md,
ci-smoke.sh, .gitignore.
2026-06-16 14:23:31 +00:00


SMOKE — bring up the stack from a fresh clone and prove it works

This document is the single source of truth for "does the v1 stack actually work end-to-end on a clean machine." It exists so v2 workers and CI can both hit a known-good bring-up path without rediscovering the incantation.

TL;DR (the 5 commands)

git clone https://git.homelab.local/kaykayyali/lore-engine-poc.git
cd lore-engine-poc
docker compose up -d --build
./scripts/ci-smoke.sh         # waits for health, seeds, runs test.sh, tears down

./scripts/ci-smoke.sh is the authoritative smoke runner. It exits 0 on success and non-zero with a clear error on the first failure. See scripts/ci-smoke.sh for the exact step ordering. This document explains what each step does and what the expected output looks like.

Prerequisites

  • Docker Engine 24+ with the compose plugin (docker compose version ≥ 2.20)
  • python3 available on the host (for seed.py and test.sh)
  • curl for the MCP JSON-RPC calls in test.sh
  • HTTPS access to git.homelab.local, which uses a self-signed cert (see Troubleshooting if git clone rejects it)
  • Ports 7474, 7687, 5432, 9000-9001, 8765 free on the host
  • ~2 GB of free disk for the Docker images (neo4j + postgres + minio + gateway)

Step-by-step (annotated)

1. Clone

git clone https://git.homelab.local/kaykayyali/lore-engine-poc.git
cd lore-engine-poc

Expected output: a lore-engine-poc/ directory with README.md, docker-compose.yml, seed.py, test.sh, gateway/, plugins/, neo4j/, postgres/, docs/, scripts/.

2. Build + start the stack

docker compose up -d --build

Expected output:

[+] Running 6/6
 ✔ Network lore-engine-poc_default      Created
 ✔ Volume lore-engine-poc_neo4j-data    Created
 ✔ Container lore-neo4j                 Started
 ✔ Container lore-postgres              Started
 ✔ Container lore-minio                 Started
 ✔ Container lore-gateway               Started

What happens under the hood:

  • lore-neo4j runs neo4j:5.26-community with APOC enabled. The neo4j/init.cypher file is mounted at /var/lib/neo4j/import/ and loaded by seed.py on step 4 (not by the container itself — the container only exposes the Bolt port 7687).
  • lore-postgres runs postgres:16-alpine. postgres/init.sql defines the trade_log, image_manifest, and audit tables; it's loaded by seed.py on step 4.
  • lore-minio runs minio/minio:latest with bucket auto-create via the MINIO_BROWSER_REDIRECT_URL and MINIO_BUCKET=lore-images env.
  • lore-gateway is built locally from gateway/Dockerfile and runs python server.py on port 8765. It auto-loads every *.py file in plugins/ at startup.
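
The auto-load behaviour described for lore-gateway can be pictured with a minimal sketch. It is not the code in gateway/server.py, which is not reproduced in this document; the function name load_plugins, the plugins.<name> module naming, and the rule of skipping underscore-prefixed files are all assumptions made for illustration.

```python
import importlib.util
from pathlib import Path
from types import ModuleType


def load_plugins(plugin_dir: str) -> dict[str, ModuleType]:
    """Import every *.py file in plugin_dir and return the modules keyed by stem."""
    loaded: dict[str, ModuleType] = {}
    for path in sorted(Path(plugin_dir).glob("*.py")):
        if path.name.startswith("_"):
            continue  # assumption: __init__.py and private helpers are not plugins
        spec = importlib.util.spec_from_file_location(f"plugins.{path.stem}", path)
        if spec is None or spec.loader is None:
            continue  # not importable as a module; skip rather than crash
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the plugin's top-level code
        loaded[path.stem] = module
    return loaded
```

Sorting the paths keeps the load order deterministic, which makes a registered-tool listing stable between runs.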

Health check timing: neo4j takes ~15-25 s to become ready (initial APOC scan), postgres ~3-5 s, minio ~5 s, gateway ~2 s. The scripts/ci-smoke.sh runner waits for all four to be ready before proceeding: it polls docker compose ps for the three services that define a healthcheck and GET /healthz for the gateway, with a 60 s deadline per service. On a slow first build, allow 2-3 min total.
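
The "poll until ready, with a deadline" pattern behind that wait can be sketched as follows. The real runner is a shell script; this Python version only illustrates the control flow, and wait_until along with its parameter names is hypothetical.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], deadline_s: float = 60.0,
               interval_s: float = 1.0) -> bool:
    """Call check() until it returns True or deadline_s elapses.

    Returns True on success and False on timeout, so the caller decides
    how loudly to fail.
    """
    stop_at = time.monotonic() + deadline_s
    while True:
        if check():
            return True
        if time.monotonic() + interval_s > stop_at:
            return False  # the next sleep would overrun the deadline
        time.sleep(interval_s)
```

Using time.monotonic() rather than wall-clock time keeps the deadline correct even if the host clock is adjusted during a slow first build. One deadline per service would mean one wait_until call per service, each with its own check function.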

3. Verify all four services are healthy

docker compose ps

Expected output (key column is STATUS):

NAME            IMAGE                       STATUS
lore-neo4j      neo4j:5.26-community        Up X minutes (healthy)
lore-postgres   postgres:16-alpine          Up X minutes (healthy)
lore-minio      minio/minio:latest          Up X minutes (healthy)
lore-gateway    lore-engine-poc-gateway     Up X minutes

lore-gateway has no healthcheck (it just answers HTTP); the scripts/ci-smoke.sh runner polls GET /healthz on the gateway instead (see gateway/server.py).
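
As a rough illustration of that gateway probe, the sketch below treats the gateway as ready when /healthz answers 200 with a non-empty body. The function name gateway_ready and the 2 s timeout are assumptions; the URL is the one this document uses for the gateway port.

```python
import http.client
import urllib.request

# Assumption: the probe should ignore proxy settings from the environment,
# because the gateway listens on localhost.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def gateway_ready(url: str = "http://localhost:8765/healthz",
                  timeout_s: float = 2.0) -> bool:
    """Return True if url answers HTTP 200 with a non-empty body."""
    try:
        with _opener.open(url, timeout=timeout_s) as resp:
            return resp.status == 200 and bool(resp.read().strip())
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a non-2xx status all mean "not ready yet".
        return False
```

Returning False instead of raising lets the probe be retried in a loop until a deadline expires.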

4. Seed the world

python3 seed.py

Expected output (last 5 lines):

  ✔ Seeded 4 images
  ✔ Seeded 1 lineage group
  ✔ Seeded ~20 time-bounded relations
  ✔ Done in <X.Xs>

✅ seed complete — bash test.sh is ready to run

seed.py is idempotent (uses Cypher MERGE and SQL ON CONFLICT). Re-running it is safe; counts will not double.
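
The SQL half of that idempotency claim can be demonstrated with a small stand-in. This sketch uses Python's built-in sqlite3 in place of Postgres (SQLite 3.24+ accepts the same ON CONFLICT ... DO NOTHING clause); the image_manifest table name comes from this document, but its columns and the sample rows are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory stand-in for the Postgres image_manifest table (columns assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE image_manifest (image_id TEXT PRIMARY KEY, caption TEXT)"
    )
    return conn


def seed_images(conn: sqlite3.Connection, rows: list[tuple[str, str]]) -> int:
    """Insert rows, skipping any image_id that already exists; return the row count."""
    conn.executemany(
        "INSERT INTO image_manifest (image_id, caption) VALUES (?, ?) "
        "ON CONFLICT (image_id) DO NOTHING",
        rows,
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM image_manifest").fetchone()[0]


# Hypothetical sample data, not the contents of the real seed.
SAMPLE_ROWS = [("img_a", "first portrait"), ("img_b", "second portrait")]
```

Calling seed_images(conn, SAMPLE_ROWS) twice on the same connection returns 2 both times, which is the "counts will not double" property. Cypher MERGE gives the graph side the same match-or-create behaviour.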

5. Run the end-to-end test

bash test.sh

Expected output: 11 sections, each printing a JSON response from the gateway's MCP endpoint. The last line is the green check:

✅ all tool types tested

test.sh exits 0 on success. The 11 sections, in order:

  1. entity_context(Aldric Raventhorne) — one-hop summary JSON
  2. was_true_at(House Vyr allied Merchants Guild @ 2nd_age.year_230) → true
  3. was_true_at(Crimson Pact allied House Vyr @ 2nd_age.year_230) → false
  4. state_at(Aldric Raventhorne @ 2nd_age.year_260) — state snapshot JSON
  5. ancestors_of(Aldric Raventhorne, 5 generations) — non-empty ancestors list
  6. lineage_of(Aldric Raventhorne) — lineage summary
  7. log_trade(...) → {"logged": true, "total_price": <computed>}
  8. market_price(pale_ledger) → {"item_id": "pale_ledger", "sample_size": ≥1, ...}
  9. recall_images(entity_id=aldric) → image count: 1, presigned URL resolves to a real PNG (HTTP 200, image/png, 9106 bytes, 512×768 RGB)
  10. search_images_by_caption(q=aldric) — at least 1 match
  11. register_image(...) → {"registered": true, "image_id": "img_test"}
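
Section 9's "real PNG" check can be approximated without an imaging library by reading the file header. The sketch below is one way to do that and is not taken from test.sh; png_info is a hypothetical helper that returns the width, height, and color type from the IHDR chunk (color type 2 is truecolor RGB in the PNG format).

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_info(data: bytes) -> tuple[int, int, int]:
    """Return (width, height, color_type) from the IHDR chunk of a PNG.

    Only the header is inspected; the chunk CRC is not verified.
    """
    # 8-byte signature + 8-byte chunk header + 13-byte IHDR data + 4-byte CRC
    if len(data) < 33 or data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG: bad signature or truncated file")
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("not a PNG: first chunk is not IHDR")
    width, height, _bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return width, height, color_type
```

For the image described in section 9, the bytes fetched from the presigned URL would be expected to yield (512, 768, 2).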

If any section fails, test.sh exits non-zero with the failing JSON response on stderr.
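
For readers who have not seen an MCP call on the wire, the sketch below builds the kind of JSON-RPC 2.0 body a tool invocation uses and unwraps the reply. It assumes the gateway follows the standard MCP tools/call method; the helper names and the argument key in the usage note are hypothetical, and the endpoint path is left out because this document does not state it.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids; any unique value per request works


def tool_call(name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 request body for an MCP tools/call invocation."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def result_of(response_body: str):
    """Return the result member of a JSON-RPC response, or raise on an error member."""
    resp = json.loads(response_body)
    if "error" in resp:
        raise RuntimeError(f"tool call failed: {resp['error']}")
    return resp["result"]
```

A call such as tool_call("entity_context", {"name": "Aldric Raventhorne"}) produces a body that can be POSTed with curl; the "name" key is a guess at the argument schema.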

CI runner — scripts/ci-smoke.sh

The CI runner wraps steps 2-5 in a single script that:

  1. Runs docker compose up -d --build
  2. Polls docker compose ps until neo4j, postgres, and minio report healthy and the gateway container is up (60 s deadline per service, fails loudly on timeout)
  3. Polls curl -sf http://localhost:8765/healthz until the gateway responds (the /healthz endpoint lists registered tools — its 200 OK + non-empty body proves the gateway auto-loaded plugins)
  4. Runs python3 seed.py
  5. Runs bash test.sh
  6. Exits 0 if all five stages passed, non-zero on the first failure

The script deliberately does NOT tear the stack down on failure — that makes post-mortem debugging easier. The caller (CI runner or developer) is responsible for docker compose down -v after inspecting the result.
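
The fail-fast ordering and the "leave the stack up on failure" choice can be sketched as below. This is an illustration of the control flow only, not a port of scripts/ci-smoke.sh; run_stages and SMOKE_STAGES are hypothetical names, and the health-polling stages are left out for brevity.

```python
import subprocess
import sys


def run_stages(stages: list[tuple[str, list[str]]]) -> int:
    """Run each (label, argv) in order; return the first non-zero exit code, else 0."""
    for label, argv in stages:
        print(f"==> {label}", flush=True)
        code = subprocess.run(argv).returncode
        if code != 0:
            print(f"FAILED: {label} (exit {code})", file=sys.stderr)
            return code  # stop here; nothing is torn down, so the stack can be inspected
    return 0


# The commands are the ones this document lists; the labels are illustrative.
SMOKE_STAGES = [
    ("build + start", ["docker", "compose", "up", "-d", "--build"]),
    ("seed", ["python3", "seed.py"]),
    ("end-to-end test", ["bash", "test.sh"]),
]
```

Passing the result to sys.exit(run_stages(SMOKE_STAGES)) would give the caller the same exit-code contract described above: 0 on a full pass, otherwise the code of the first failing stage.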

Why a shell script, not a GitHub Actions YAML?

This repo is hosted on a self-hosted Gitea instance (git.homelab.local) without a Gitea Actions runner wired up yet. A pure-shell script is the smallest possible CI primitive — it runs identically on the developer's laptop, on a CI VM, and in a one-off bash invocation, with no extra moving parts. When Gitea Actions is configured for this repo, the script becomes a single - run: ./scripts/ci-smoke.sh step. See docs/ARCHITECTURE.md (TODO) for the eventual CI topology.

Tear down

docker compose down -v

-v removes the named volumes (lore-engine-poc_neo4j-data, lore-engine-poc_pg-data, lore-engine-poc_minio-data) so the next bring-up starts from a clean slate. Omit -v to keep state across restarts.

Troubleshooting

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| lore-neo4j stuck in (health: starting) | First-boot APOC scan; needs more RAM | Wait 60 s. If still starting, check docker logs lore-neo4j for OOM. The compose heap max is 1g; bump NEO4J_server_memory_heap_max__size if your host has it. |
| seed.py fails on MERGE with constraint violation | The schema was already initialized in a prior run with a conflicting constraint | docker compose down -v and re-run from step 2 |
| test.sh section 9 returns image count: 0 | MinIO bucket not initialized | docker exec lore-minio mc alias set local http://localhost:9000 lorelore <secret> then mc mb -p local/lore-images. Re-run seed.py. |
| test.sh section 2 returns false for the Vyr/Merchants alliance | A prior run seeded a conflicting fact | The seed is idempotent; if you mutated the data, docker compose down -v and reseed. |
| git clone fails with SSL certificate problem | git.homelab.local uses a self-signed cert | git config --global http.sslVerify false (dev only), or add the cert to your system trust store. The repo's HTTPS URL is intentional; the cert path is documented in the user's homelab setup. |

What this smoke proves

After ./scripts/ci-smoke.sh exits 0, you've proven:

  • All four Docker images build from the committed docker-compose.yml and Dockerfiles
  • Neo4j accepts Bolt connections and the neo4j/init.cypher schema applies cleanly
  • Postgres accepts connections and the postgres/init.sql schema applies cleanly
  • MinIO starts and the lore-images bucket is reachable
  • The gateway starts, auto-loads all 4 plugins (world, lineage, trade, images), and serves MCP JSON-RPC on :8765
  • seed.py is idempotent and populates the expected graph + tables + bucket objects
  • Every one of the 11 tool invocations in test.sh returns a sane response

That is the v1 contract. If ./scripts/ci-smoke.sh is green, v2 work (T2 pgvector, T3 consistency skeleton, etc.) can build on top.