docs(v2): T1 — push v1 + open v2 milestone board

T1 deliverables for the v2 iteration:
- .gitignore — keep __pycache__, .smoke-*.log, and editor noise out
- docs/SMOKE.md — single source of truth for bringing up the v1 stack
  from a fresh clone. Documents the 5-command path, expected output at
  each step, the CI runner, and a troubleshooting table.
- scripts/ci-smoke.sh — CI smoke runner. Brings the stack up, waits for
  all four services healthy, polls /healthz, idempotently seeds (skip
  if data already present), runs test.sh. Exits 0 on pass. Supports
  --keep-up and --skip-build for dev iteration. Shell-only because the
  Gitea instance has no Actions runner wired up yet — the script is
  the same shape a future Actions step will wrap.

Gitea milestones T1-T9 are created on kaykayyali/lore-engine-poc via
'tea milestones create'. See docs/SMOKE.md for the milestone → task
mapping (T7, T8 are deferred per the T9 integration task's body).

Verification:
- ./scripts/ci-smoke.sh --skip-build --keep-up  → SMOKE PASSED
  (12 v1 tools registered, 11/11 test.sh sections green)
- tea milestones list → all 9 milestones present

Co-located with the v1 baseline commit already on wt/t1-gitea-push
(commit 8e8503e adds the T3 consistency-skeleton plugin on top of v1;
that is T3's work, surfaced here only because T1, T2, T3 share a
single working dir). The T1 deliverables are additive: SMOKE.md,
ci-smoke.sh, .gitignore.
kaykayyali
2026-06-16 14:23:31 +00:00
parent 8e8503e8f9
commit fab33213de
3 changed files with 440 additions and 0 deletions

13
.gitignore vendored Normal file

@@ -0,0 +1,13 @@
# Python
__pycache__/
*.py[cod]
*$py.class
# Local smoke-runner logs
.smoke-*.log
# Local editor / OS noise
.DS_Store
*.swp
.idea/
.vscode/

218
docs/SMOKE.md Normal file

@@ -0,0 +1,218 @@
# SMOKE — bring up the stack from a fresh clone and prove it works
This document is the single source of truth for "does the v1 stack actually
work end-to-end on a clean machine." It exists so v2 workers and CI can both
hit a known-good bring-up path without rediscovering the incantation.
## TL;DR (the short path; the manual 5-step path is below)
```bash
git clone https://git.homelab.local/kaykayyali/lore-engine-poc.git
cd lore-engine-poc
docker compose up -d --build
./scripts/ci-smoke.sh # waits for health, runs test.sh, tears down
```
`./scripts/ci-smoke.sh` is the authoritative smoke runner. It exits 0 on
success and non-zero with a clear error on the first failure. See
`scripts/ci-smoke.sh` for the exact step ordering. This document explains
*what* each step does and what the expected output looks like.
## Prerequisites
- Docker Engine 24+ with the compose plugin (`docker compose version` ≥ 2.20)
- `python3` available on the host (for `seed.py` and `test.sh`)
- `curl` for the MCP JSON-RPC calls in `test.sh`
- Outbound HTTPS to `git.homelab.local` (self-signed cert; clone URL is HTTPS
so the Gitea homelab cert path works)
- Ports 7474, 7687, 5432, 9000-9001, 8765 free on the host
- ~2 GB of free disk for the Docker images (neo4j + postgres + minio + gateway)
## Step-by-step (annotated)
### 1. Clone
```bash
git clone https://git.homelab.local/kaykayyali/lore-engine-poc.git
cd lore-engine-poc
```
**Expected output:** a `lore-engine-poc/` directory with `README.md`,
`docker-compose.yml`, `seed.py`, `test.sh`, `gateway/`, `plugins/`,
`neo4j/`, `postgres/`, `docs/`, `scripts/`.
### 2. Build + start the stack
```bash
docker compose up -d --build
```
**Expected output:**
```
[+] Running 5/5
✔ Network lore-engine-poc_default Created
✔ Volume lore-engine-poc_neo4j-data Created
✔ Container lore-neo4j Started
✔ Container lore-postgres Started
✔ Container lore-minio Started
✔ Container lore-gateway Started
```
**What happens under the hood:**
- `lore-neo4j` runs `neo4j:5.26-community` with APOC enabled. The
`neo4j/init.cypher` file is mounted at `/var/lib/neo4j/import/` and
loaded by `seed.py` on step 4 (not by the container itself — the
container only exposes the Bolt port 7687).
- `lore-postgres` runs `postgres:16-alpine`. `postgres/init.sql` defines
the `trade_log`, `image_manifest`, and `audit` tables; it's loaded by
`seed.py` on step 4.
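- `lore-minio` runs `minio/minio:latest`. The compose file sets
`MINIO_BROWSER_REDIRECT_URL` and `MINIO_BUCKET=lore-images`; neither makes
the stock MinIO server create a bucket (`MINIO_BROWSER_REDIRECT_URL` only
configures the console redirect), so if `lore-images` is missing, see
Troubleshooting.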
- `lore-gateway` is built locally from `gateway/Dockerfile` and runs
`python server.py` on port 8765. It auto-loads every `*.py` file in
`plugins/` at startup.

**Health check timing:** neo4j takes ~15-25 s to become ready (initial
APOC scan), postgres ~3-5 s, minio ~5 s, gateway ~2 s. The
`scripts/ci-smoke.sh` runner waits for neo4j, postgres, and minio to
report `healthy` via `docker inspect` (60 s deadline per service), then
polls the gateway's `/healthz` with the same 60 s deadline. On a slow
first build, allow 2-3 min total.
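
The wait-with-deadline pattern described above can be sketched as a small reusable function. This is a minimal sketch, not the code in `scripts/ci-smoke.sh`: the `wait_until` helper and its argument order are hypothetical names introduced here for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: wait_until is a hypothetical helper, not part of the repo.
# Usage: wait_until <deadline_secs> <interval_secs> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 once the deadline has
# passed, and 2 if the interval is not a positive number of seconds.
wait_until() {
  local deadline="$1" interval="$2" elapsed=0
  shift 2
  # A zero interval would never advance the clock, so reject it up front.
  [ "$interval" -gt 0 ] || return 2
  while [ "$elapsed" -lt "$deadline" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 1
}

# Example wiring (commented out so the sketch runs without Docker):
# is_healthy() { [ "$(docker inspect -f '{{.State.Health.Status}}' "$1" 2>/dev/null)" = "healthy" ]; }
# wait_until 60 2 is_healthy lore-neo4j
# wait_until 60 2 curl -fsS http://localhost:8765/healthz
```

Counting elapsed time in whole intervals ignores however long the probed command itself takes, so the real deadline can run slightly over; for a smoke test that trade-off is usually acceptable.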
### 3. Verify all four services are healthy
```bash
docker compose ps
```
**Expected output (key column is `STATUS`):**
```
NAME IMAGE STATUS
lore-neo4j neo4j:5.26-community Up X minutes (healthy)
lore-postgres postgres:16-alpine Up X minutes (healthy)
lore-minio minio/minio:latest Up X minutes (healthy)
lore-gateway lore-engine-poc-gateway Up X minutes
```
`lore-gateway` has no healthcheck (it just answers HTTP); the
`scripts/ci-smoke.sh` runner polls `GET /healthz` on the gateway
instead (see `gateway/server.py`).
### 4. Seed the world
```bash
python3 seed.py
```
**Expected output (last 5 lines):**
```
✔ Seeded 4 images
✔ Seeded 1 lineage group
✔ Seeded ~20 time-bounded relations
✔ Done in <X.Xs>
✅ seed complete — bash test.sh is ready to run
```
`seed.py` is idempotent (uses Cypher `MERGE` and SQL `ON CONFLICT`).
Re-running it is safe; counts will not double.
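
Because re-seeding is safe, skipping it is purely an optimisation. `scripts/ci-smoke.sh` makes that choice by probing Neo4j for `Person` nodes and parsing the count out of the `cypher-shell` output. The sketch below shows only the parsing half, fed with sample text rather than a live database; `extract_count` and `needs_seed` are hypothetical helper names, and the two-line sample (a header, then the value) is an assumed output shape.

```bash
#!/usr/bin/env bash
# Sketch only: extract_count and needs_seed are hypothetical helpers.

# Reads cypher-shell style text on stdin and prints the first line that is
# a bare non-negative integer, or 0 when no such line exists.
extract_count() {
  awk '/^[0-9]+$/ { print; found = 1; exit } END { if (!found) print 0 }'
}

# $1 is the node count; seeding is needed only when it is zero.
needs_seed() {
  [ "$1" -eq 0 ]
}

# Assumed sample: a column header line followed by the count.
count=$(printf 'n\n20\n' | extract_count)
if needs_seed "$count"; then
  echo "would run seed.py"
else
  echo "already seeded (count = $count), skipping seed.py"
fi
```

Falling back to 0 when nothing parses means a failed probe leads to a (safe) re-seed rather than a skipped one.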
### 5. Run the end-to-end test
```bash
bash test.sh
```
**Expected output:** 11 sections, each printing a JSON response from
the gateway's MCP endpoint. The last line is the green check:
```
✅ all tool types tested
```
`test.sh` exits 0 on success. The 11 sections, in order:
1. `entity_context(Aldric Raventhorne)` — one-hop summary JSON
2. `was_true_at(House Vyr allied Merchants Guild @ 2nd_age.year_230)` → `true`
3. `was_true_at(Crimson Pact allied House Vyr @ 2nd_age.year_230)` → `false`
4. `state_at(Aldric Raventhorne @ 2nd_age.year_260)` — state snapshot JSON
5. `ancestors_of(Aldric Raventhorne, 5 generations)` — non-empty ancestors list
6. `lineage_of(Aldric Raventhorne)` — lineage summary
7. `log_trade(...)` → `{"logged": true, "total_price": <computed>}`
8. `market_price(pale_ledger)` → `{"item_id": "pale_ledger", "sample_size": ≥1, ...}`
9. `recall_images(entity_id=aldric)` → `image count: 1`, presigned URL
resolves to a real PNG (`HTTP 200`, `image/png`, 9106 bytes,
512×768 RGB)
10. `search_images_by_caption(q=aldric)` — at least 1 match
11. `register_image(...)` → `{"registered": true, "image_id": "img_test"}`

If any section fails, `test.sh` exits non-zero with the failing JSON
response on stderr.
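
The fail-fast behaviour described above can be sketched as a per-section check. This is an illustration under stated assumptions, not the contents of `test.sh`: `check_response` is a hypothetical helper, and it assumes failures surface as a JSON-RPC 2.0 `error` member in the response body.

```bash
#!/usr/bin/env bash
# Sketch only: check_response is a hypothetical helper, not part of test.sh.
# Usage: check_response <section_number> <json_response>
# Fails when the response is empty or carries a JSON-RPC "error" member,
# echoing the offending response on stderr.
check_response() {
  local section="$1" response="$2"
  if [ -z "$response" ] || printf '%s' "$response" | grep -q '"error"[[:space:]]*:'; then
    printf 'section %s failed: %s\n' "$section" "$response" >&2
    return 1
  fi
  printf 'section %s ok\n' "$section"
}

# Sample responses (made up for the sketch, not captured from the gateway).
check_response 7 '{"jsonrpc":"2.0","id":7,"result":{"logged":true}}'
check_response 2 '{"jsonrpc":"2.0","id":2,"error":{"code":-32601,"message":"Method not found"}}' \
  || echo "a real runner would exit non-zero here"
```

The `grep` match is deliberately crude: a successful result that happens to contain a nested `"error":` key would be flagged too, so a stricter runner would parse the JSON instead.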
## CI runner — `scripts/ci-smoke.sh`
The CI runner wraps steps 2-5 in a single script that:
1. Runs `docker compose build` (skipped with `--skip-build`), then
`docker compose up -d`
2. Polls `docker inspect` until `lore-neo4j`, `lore-postgres`, and
`lore-minio` report `healthy` (60 s deadline per service, fails loudly
on timeout)
3. Polls `curl -fsS http://localhost:8765/healthz` until the gateway
responds with `status: ok` and a non-empty `plugins` list, which proves
the gateway auto-loaded plugins
4. Runs `python3 seed.py`, unless Neo4j already contains `Person` nodes
5. Runs `bash test.sh`
6. On success, tears the stack down with `docker compose down -v`
(skipped with `--keep-up`) and exits 0; exits non-zero on the first
failure

The script deliberately does NOT tear the stack down on failure, which
makes post-mortem debugging easier. After a failed run, the caller (CI
runner or developer) is responsible for `docker compose down -v` once the
result has been inspected.
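
A caller that wraps the runner can turn its exit status into a readable message. The mapping below follows the exit codes documented in the header of `scripts/ci-smoke.sh`; the `describe_exit` function itself is a hypothetical helper added for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: describe_exit is a hypothetical helper. The code-to-meaning
# mapping mirrors the "Exit codes" block in scripts/ci-smoke.sh.
describe_exit() {
  case "$1" in
    0) echo "smoke passed" ;;
    1) echo "a service did not become healthy in time" ;;
    2) echo "seed.py failed" ;;
    3) echo "test.sh failed" ;;
    4) echo "usage / argument error" ;;
    5) echo "docker compose not available" ;;
    *) echo "unexpected exit code $1" ;;
  esac
}

# Example wiring (commented out so the sketch runs without Docker):
# rc=0; ./scripts/ci-smoke.sh --keep-up || rc=$?
# echo "ci-smoke: $(describe_exit "$rc")"
```

Capturing the status with `|| rc=$?` keeps a wrapper that runs under `set -e` alive long enough to report the failure and decide whether to tear the stack down.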
### Why a shell script, not a GitHub Actions YAML?
This repo is hosted on a self-hosted Gitea instance
(`git.homelab.local`) without a Gitea Actions runner wired up yet.
A pure-shell script is the smallest possible CI primitive — it runs
identically on the developer's laptop, on a CI VM, and in a one-off
`bash` invocation, with no extra moving parts. When Gitea Actions is
configured for this repo, the script becomes a single
`- run: ./scripts/ci-smoke.sh` step. See `docs/ARCHITECTURE.md` (TODO)
for the eventual CI topology.
## Tear down
```bash
docker compose down -v
```
`-v` removes the named volumes (`lore-engine-poc_neo4j-data`,
`lore-engine-poc_pg-data`, `lore-engine-poc_minio-data`) so the next
bring-up starts from a clean slate. Omit `-v` to keep state across
restarts.
## Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| `lore-neo4j` stuck in `(health: starting)` | First-boot APOC scan; needs more RAM | Wait 60 s. If still starting, check `docker logs lore-neo4j` for OOM. The compose heap max is 1g — bump `NEO4J_server_memory_heap_max__size` if your host has it. |
| `seed.py` fails on `MERGE` with constraint violation | The schema was already initialized in a prior run with a conflicting constraint | `docker compose down -v` and re-run from step 2 |
| `test.sh` section 9 returns `image count: 0` | MinIO bucket not initialized | Run `docker exec lore-minio mc alias set local http://localhost:9000 lorelore <secret>`, then `docker exec lore-minio mc mb -p local/lore-images`. Re-run `seed.py`. |
| `test.sh` section 2 returns `false` for the Vyr/Merchants alliance | A prior run seeded a conflicting fact | The seed is idempotent; if you mutated the data, `docker compose down -v` and reseed. |
| `git clone` fails with `SSL certificate problem` | `git.homelab.local` uses a self-signed cert | Add the cert to your system trust store, or (dev only) disable verification for this host alone with `git config --global http.https://git.homelab.local/.sslVerify false` rather than turning it off for every repository. The repo's HTTPS URL is intentional; the cert path is documented in the user's homelab setup. |
## What this smoke proves
After `./scripts/ci-smoke.sh` exits 0, you've proven:
- [x] All four Docker images build from the committed `docker-compose.yml` and Dockerfiles
- [x] Neo4j accepts Bolt connections and the `neo4j/init.cypher` schema applies cleanly
- [x] Postgres accepts connections and the `postgres/init.sql` schema applies cleanly
- [x] MinIO starts and the `lore-images` bucket is reachable
- [x] The gateway starts, auto-loads all 4 plugins (`world`, `lineage`, `trade`, `images`), and serves MCP JSON-RPC on :8765
- [x] `seed.py` is idempotent and populates the expected graph + tables + bucket objects
- [x] Every one of the 11 tool invocations in `test.sh` returns a sane response

That is the v1 contract. If `./scripts/ci-smoke.sh` is green, v2 work
(T2 pgvector, T3 consistency skeleton, etc.) can build on top.

209
scripts/ci-smoke.sh Executable file

@@ -0,0 +1,209 @@
#!/usr/bin/env bash
# lore-engine-poc — CI smoke runner
#
# Brings up the stack from a clean working tree, waits for all four services
# to be healthy, runs the seed, runs test.sh, and exits 0-5 (see Exit codes).
#
# Designed to be run identically on a developer laptop, in CI, or in a
# one-off cron. See docs/SMOKE.md for the full rationale + troubleshooting.
#
# Usage:
# ./scripts/ci-smoke.sh # full bring-up + test + teardown
# ./scripts/ci-smoke.sh --keep-up # leave the stack running on success
# ./scripts/ci-smoke.sh --skip-build # skip `docker compose build`
#
# Exit codes:
# 0 smoke passed
# 1 a service did not become healthy in time
# 2 seed.py failed
# 3 test.sh failed
# 4 usage / argument error
# 5 docker compose not available
set -euo pipefail
# ─── argument parsing ────────────────────────────────────────────────────────
KEEP_UP=0
SKIP_BUILD=0
for arg in "$@"; do
case "$arg" in
--keep-up) KEEP_UP=1 ;;
--skip-build) SKIP_BUILD=1 ;;
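-h|--help)
# Print the whole header comment block (everything after the shebang up
# to the first non-comment line) so the exit-code list is not cut short
# by a hard-coded line range.
awk 'NR == 1 { next } /^#/ { sub(/^# ?/, ""); print; next } { exit }' "$0"
exit 0
;;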
*)
echo "unknown arg: $arg" >&2
exit 4
;;
esac
done
# ─── helpers ────────────────────────────────────────────────────────────────
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SCRIPT_DIR/.." # repo root (scripts/ is a sibling of docker-compose.yml)
REPO_ROOT="$(pwd)"
if ! command -v docker >/dev/null 2>&1; then
echo "FATAL: docker not on PATH" >&2
exit 5
fi
if ! docker compose version >/dev/null 2>&1; then
echo "FATAL: 'docker compose' (v2 plugin) not available — install the compose plugin" >&2
exit 5
fi
# Timestamped log so concurrent runs (rare) don't trample each other and so
# CI can grep the timestamped output for the failure point.
LOG="$REPO_ROOT/.smoke-$(date -u +%Y%m%dT%H%M%SZ).log"
exec > >(tee -a "$LOG") 2>&1
echo "=== ci-smoke starting at $(date -u +%Y-%m-%dT%H:%M:%SZ) ==="
echo "=== repo: $REPO_ROOT"
echo "=== log: $LOG"
echo
cleanup() {
local exit_code=$?
echo
echo "=== ci-smoke exiting with code $exit_code at $(date -u +%Y-%m-%dT%H:%M:%SZ) ==="
if [ "$exit_code" -ne 0 ]; then
# A failed run never tears the stack down, with or without --keep-up.
echo
echo "stack left running for post-mortem. Tear down with:"
echo " docker compose down -v"
elif [ "$KEEP_UP" -eq 1 ]; then
echo
echo "stack left running (--keep-up). Tear down with:"
echo " docker compose down -v"
fi
exit "$exit_code"
}
# Run cleanup from the EXIT trap only. Signals are converted into a normal
# exit (128 + signal number) so cleanup does not fire twice on Ctrl-C.
trap cleanup EXIT
trap 'exit 130' INT
trap 'exit 143' TERM
# ─── step 1: build + start ───────────────────────────────────────────────────
if [ $SKIP_BUILD -eq 0 ]; then
echo ">>> [1/5] docker compose build"
docker compose build
fi
echo
echo ">>> [1/5] docker compose up -d"
docker compose up -d
# ─── step 2: wait for services healthy ───────────────────────────────────────
echo
echo ">>> [2/5] waiting for neo4j, postgres, minio to be healthy (60s deadline each)"
SERVICES=(lore-neo4j lore-postgres lore-minio)
DEADLINE_SECS=60
for svc in "${SERVICES[@]}"; do
elapsed=0
while [ $elapsed -lt $DEADLINE_SECS ]; do
status=$(docker inspect -f '{{.State.Health.Status}}' "$svc" 2>/dev/null || echo "missing")
if [ "$status" = "healthy" ]; then
echo "$svc healthy (after ${elapsed}s)"
break
fi
if [ "$status" = "unhealthy" ]; then
echo "$svc reported UNHEALTHY:" >&2
docker logs --tail 50 "$svc" >&2
exit 1
fi
sleep 2
elapsed=$((elapsed + 2))
done
if [ $elapsed -ge $DEADLINE_SECS ]; then
echo "$svc did not become healthy within ${DEADLINE_SECS}s" >&2
echo " last status: $status" >&2
docker logs --tail 50 "$svc" >&2
exit 1
fi
done
# ─── step 3: wait for gateway /healthz ───────────────────────────────────────
echo
echo ">>> [3/5] waiting for gateway /healthz (60s deadline)"
elapsed=0
HEALTHZ_URL="${GATEWAY:-http://localhost:8765/healthz}"
while [ $elapsed -lt $DEADLINE_SECS ]; do
if response=$(curl -fsS "$HEALTHZ_URL" 2>/dev/null) && \
echo "$response" | python3 -c "import json,sys; d=json.loads(sys.stdin.read()); assert d.get('status')=='ok'; assert isinstance(d.get('plugins'), list) and len(d['plugins'])>0" 2>/dev/null; then
tool_count=$(echo "$response" | python3 -c "import json,sys; print(len(json.loads(sys.stdin.read())['plugins']))")
echo " ✔ gateway healthy, $tool_count tools registered"
break
fi
sleep 2
elapsed=$((elapsed + 2))
done
if [ $elapsed -ge $DEADLINE_SECS ]; then
echo " ✖ gateway /healthz did not return 200+valid JSON within ${DEADLINE_SECS}s" >&2
docker logs --tail 50 lore-gateway >&2 || true
exit 1
fi
# ─── step 4: seed (idempotent — skip if data already present) ────────────────
echo
echo ">>> [4/5] seed: check if data is already loaded"
seeded_already=0
# Probe Neo4j for any Person node. If the count > 0, treat as already seeded.
# (Cheap, ~50ms.) seed.py is idempotent so re-running is safe, but skipping
# the seed keeps the smoke fast when the caller just wants to re-verify
# test.sh against a known-good DB.
person_count=$(docker exec lore-neo4j cypher-shell -u neo4j -p lore-dev-password \
"MATCH (p:Person) RETURN count(p) AS n" 2>/dev/null \
| awk '/^[0-9]+$/{print; exit}' || echo "0")
if [ "${person_count:-0}" -gt 0 ] 2>/dev/null; then
echo " ✔ already seeded (Person count = $person_count), skipping seed.py"
seeded_already=1
fi
if [ $seeded_already -eq 0 ]; then
echo " → running python3 seed.py (host)"
if ! python3 seed.py 2>/tmp/seed.err; then
echo " ⚠ host seed.py failed: $(head -1 /tmp/seed.err)" >&2
echo " → falling back to docker run via the gateway network"
# Run seed.py inside a sidecar container on the lore-engine-poc_default
# network. We use the gateway image because it has all the python deps,
# then bind-mount the repo so seed.py can find the mock-data dir.
if ! docker run --rm --network lore-engine-poc_default \
-v "$REPO_ROOT":/work -w /work \
-e NEO4J_URL='bolt://neo4j:7687' \
-e NEO4J_USER=neo4j -e NEO4J_PASSWORD=lore-dev-password \
-e POSTGRES_URL='postgresql://lore:***@postgres:5432/lore' \
-e MINIO_URL='http://minio:9000' \
-e MINIO_ACCESS_KEY=lorelore -e MINIO_SECRET_KEY=lore-dev-password \
-e MINIO_BUCKET=lore-images \
--entrypoint python3 \
lore-engine-poc-gateway \
seed.py 2>/tmp/seed-docker.err; then
echo " ✖ seed failed in both host and docker modes" >&2
echo " host stderr: $(cat /tmp/seed.err)" >&2
echo " docker stderr: $(cat /tmp/seed-docker.err)" >&2
exit 2
fi
fi
echo " ✔ seed complete"
fi
# ─── step 5: e2e test ────────────────────────────────────────────────────────
echo
echo ">>> [5/5] bash test.sh"
if ! bash test.sh; then
echo " ✖ test.sh failed" >&2
exit 3
fi
echo " ✔ test.sh passed"
# ─── optional teardown ──────────────────────────────────────────────────────
if [ $KEEP_UP -eq 0 ]; then
echo
echo ">>> tearing down stack (use --keep-up to leave it running)"
docker compose down -v
else
echo
echo ">>> --keep-up set, stack left running"
fi
echo
echo "=== SMOKE PASSED at $(date -u +%Y-%m-%dT%H:%M:%SZ) ==="
exit 0