Refactor: google-image-generation skill with sub-files for imagen and nano-banana

2026-05-24 02:36:40 +00:00
parent 550349be2d
commit 6f9405ceee
4 changed files with 383 additions and 123 deletions
--- a/skills/creative/SKILL.md
+++ b/skills/creative/SKILL.md
@@ -1,123 +0,0 @@
----
-name: nano-banana
-description: "Image refinement, img2img, and text-in-image with Gemini Flash/Pro Image"
-version: 1.0.0
-author: Kay Kayyali + Hermes Agent
-license: MIT
-metadata:
-  hermes:
-    tags: [image-refinement, img2img, nano-banana, gemini]
-    category: creative
----
-
-# Nano Banana (Gemini Flash/Pro Image)
-
-**Powered by:** Google GenAI SDK (`@google/genai`)
-
-**Use for:** Image refinement, img2img, text-in-image, and conversational editing.
-
-**⚠️ NOT for initial text-to-image generation.** Requires an existing image to refine.
-
-## Quick Start
-
-```bash
-# Set API key
-export GOOGLE_API_KEY="your-key-here"
-
-# Refine an existing image
-hermes chat -q "Refine this image to be more vibrant" --attachment /path/to/image.png
-```
-
-## Models
-
-| Model | Best For | Speed |
-|-------|----------|-------|
-| `gemini-3.1-flash-image-preview` | img2img, refinement, text-in-image | ~5-15s |
-| `gemini-3-pro-image-preview` | Professional quality, complex text | ~15-45s |
-| `gemini-2.5-flash-image` | Fastest, high-volume | ~3-10s |
-
-## Supported Parameters
-
-- `--aspect-ratio` — `1:1`, `16:9`, `9:16`, `4:3`, `3:2`, `21:9`, etc.
-- `--image-size` — `512`, `1K`, `2K`, `4K` (resolution control)
-- `--grounding` — `web`, `image`, or `both` (Google Search grounding)
-- `--thinking-level` — `minimal` or `high`
-- `--include-thoughts` — Show model reasoning steps
-
-## Core Workflows
-
-### 1. Image Refinement
-"Keep everything the same but change X"
-
-Examples:
-- "Make the sky sunset orange"
-- "Add more contrast and saturation"
-- "Change the lighting to golden hour"
-- "Remove the background clutter"
-
-### 2. img2img / Style Transfer
-Provide reference image(s) + text prompt:
-- "Apply this art style to my scene"
-- "Make it look like a 1950s poster"
-- "Convert to watercolor painting style"
-
-### 3. Text-in-Image
-Add logos, posters, signs, menus:
-- Keep text under 25 characters
-- Specify font: `bold sans-serif`, `elegant serif`, `handwritten script`
-- Use keywords: `poster`, `logo`, `magazine cover`, `menu`
-- Include size: `small`, `medium`, `large`
-- May need 2-3 iterations
-
-### 4. Conversational Editing
-Iterate naturally:
-1. "Generate a forest scene"
-2. "Make it mistier"
-3. "Add a stone altar in the center"
-4. "Now place a glowing sword on the altar"
-
-### 5. Multi-Reference Composition
-Up to 14 reference images for character consistency, scene composition, or style blending.
-
-### 6. Google Search Grounding
-Real-time data in images:
-- `--grounding web` — Current events, weather, news
-- `--grounding image` — Visual search results
-- `--grounding both` — Combined search
-
-## Aspect Ratios
-
-| Hermes | Google |
-|--------|--------|
-| `landscape` | `16:9` |
-| `square` | `1:1` |
-| `portrait` | `9:16` |
-
-Also supports: `4:3`, `3:4`, `21:9`, `1:4`, `4:1`, `1:8`, `8:1`
-
-## Prompt Engineering
-
-**Be hyper-specific:** "ornate elven plate armor etched with silver leaf patterns" beats "fantasy armor"
-
-**Provide context:** "Create a logo for a high-end minimalist skincare brand" beats "Create a logo"
-
-**Use semantic negatives:** Describe what you WANT, not what you don't: "an empty deserted street" not "no cars"
-
-**Control the camera:** `wide-angle shot`, `macro shot`, `low-angle perspective`
-
-## Limitations
-
-- English prompts only (plus select languages)
-- Max 480 tokens per prompt
-- Person generation: `allow_adult` default, blocks children in EU/UK/CH/MENA
-- All images include SynthID watermark
-- No transparent backgrounds
-- Text rendering may need 2-3 attempts
-
-## Switch to Google Imagen For
-
-- Initial text-to-image generation from scratch
-- When you need precise aspect ratio control from the start
-- High-volume batch generation (faster, simpler)
-
-Use `google-imagen` skill for initial generation workflows.
--- a/skills/creative/google-image-generation/SKILL.md
+++ b/skills/creative/google-image-generation/SKILL.md
@@ -0,0 +1,106 @@
+---
+name: google-image-generation
+description: "Generate and refine images using Google Imagen 4.0 and Nano Banana"
+version: 4.0.0
+author: Kay Kayyali + Hermes Agent
+license: MIT
+metadata:
+  hermes:
+    tags: [image-generation, google-imagen, nano-banana, text-to-image, img2img]
+    category: creative
+---
+
+# Google Image Generation
+
+Two tools for image work. Pick the right one for your task.
+
+## Prerequisites
+
+**1. API Key** — Verify before starting:
+```bash
+grep -q "^GOOGLE_API_KEY=" ~/.hermes/.env && echo "GOOGLE_API_KEY is set" || echo "Missing! Set with: echo 'GOOGLE_API_KEY=your-key' >> ~/.hermes/.env"
+```
+
+**2. Scripts Available** — Both CLI tools should be installed at:
+- `/usr/local/lib/hermes-agent/plugins/image_gen/google-imagen/imagen.ts` — Initial generation
+- `/usr/local/lib/hermes-agent/plugins/image_gen/nano-banana/nano-banana.ts` — Refinement
+
+If either script is missing, the image_gen plugin was not installed correctly.
+
+## Which Tool to Use
+
+### Use **Imagen** for: Initial Generation
+- Text prompt → new image
+- Starting from scratch
+- Need precise aspect ratio control
+- Batch generation (multiple variants)
+
+→ See `imagen.md` for detailed usage
+
+### Use **Nano Banana** for: Refinement
+- "Keep everything but change X"
+- img2img / style transfer
+- Adding text/logos to existing images
+- Iterative conversational editing
+- Multi-reference composition (up to 14 images)
+
+→ See `nano-banana.md` for detailed usage
+
+## Quick Decision Tree
+
+```
+Do you have an existing image to work with?
+├─ NO → Use Imagen (generate from text)
+└─ YES → What do you want to do?
+   ├─ "Change this one thing" → Nano Banana
+   ├─ "Apply this style" → Nano Banana
+   ├─ "Add text/logo" → Nano Banana
+   └─ "Make a completely new one" → Imagen
+```
+
+## Common Workflows
+
+### Workflow 1: Generate → Refine
+1. Generate base image with Imagen at desired aspect ratio
+2. Switch to Nano Banana for refinements, passing the same `--aspect-ratio` as the base image (the Nano Banana CLI defaults to `1:1`)
+3. Iterate conversationally: "make it darker", "add more contrast", etc.
+
+### Workflow 2: Style Transfer
+1. Generate or provide reference image with desired style
+2. Use Nano Banana: "Apply this art style to [my scene]"
+
+### Workflow 3: Text-in-Image
+1. Generate base with Imagen (clean composition)
+2. Use Nano Banana: "Add a sign that says 'X' in the corner"
+3. May need 2-3 iterations for clean text
+
+## Prompt Engineering (Both Tools)
+
+**Describe, don't list.** Narrative paragraphs beat keyword soup.
+
+**Bad:** `pixel art tank snow bleak cold`
+
+**Good:** `A pixel art scene in 16-bit style: a weathered German Panzer IV tank sits motionless on the frozen Russian tundra under a grey sky. Snow drifts against its tracks. Orange glow from a dying campfire. Limited color palette — dark greys, muted blues, pale whites, one point of warm orange. No text.`
+
+## Gritty Dark Style (Iron Requiem)
+
+Kay prefers a **dark, gritty, weathered** aesthetic:
+
+```
+dark gritty pixel art, weathered battle-damaged, rust and scorch marks,
+dark moody lighting with deep shadows, mud-splattered, smoke rising,
+muted desaturated colors, dirty browns and rusted greys and oil blacks,
+grim war-torn atmosphere
+```
+
+## Limitations (Both)
+
+- English prompts only (plus select languages)
+- Max 480 tokens per prompt
+- Person generation: `allow_adult` default
+- All images include SynthID watermark
+- No transparent backgrounds
+
+---
+
+**Next:** Load `imagen.md` or `nano-banana.md` for detailed script usage.
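The Quick Decision Tree in `SKILL.md` above reduces to a two-branch routing rule. Below is a minimal TypeScript sketch of that rule. The `Task`, `Intent`, and `chooseTool` names are hypothetical illustrations and are not part of the Hermes `image_gen` plugin.

```typescript
// Hypothetical encoding of the "Quick Decision Tree" from SKILL.md.
// Not part of the Hermes image_gen plugin.
type Tool = "imagen" | "nano-banana";

type Intent = "change-one-thing" | "apply-style" | "add-text" | "new-image";

type Task =
  | { hasExistingImage: false }
  | { hasExistingImage: true; intent: Intent };

function chooseTool(task: Task): Tool {
  // No existing image: generate from text with Imagen.
  if (!task.hasExistingImage) {
    return "imagen";
  }
  // With an existing image, only "make a completely new one" goes to Imagen;
  // single edits, style transfer, and text/logo work go to Nano Banana.
  return task.intent === "new-image" ? "imagen" : "nano-banana";
}

console.log(chooseTool({ hasExistingImage: false }));                   // imagen
console.log(chooseTool({ hasExistingImage: true, intent: "add-text" })); // nano-banana
```

The discriminated union means an intent can only be stated once an existing image is present. This mirrors the tree, where the intent question is asked only on the YES branch.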
--- a/skills/creative/google-image-generation/imagen.md
+++ b/skills/creative/google-image-generation/imagen.md
@@ -0,0 +1,117 @@
+# Imagen 4.0 — Initial Image Generation
+
+**Use when:** Generating new images from text prompts (no existing image to work with).
+
+## CLI Usage
+
+```bash
+cd /usr/local/lib/hermes-agent/plugins/image_gen/google-imagen
+npx ts-node imagen.ts --prompt "YOUR PROMPT" --aspect-ratio 16:9 --output /path/to/output.png
+```
+
+## Parameters
+
+| Flag | Required | Default | Description |
+|------|----------|---------|-------------|
+| `--prompt` | **Yes** | — | Text description of the image |
+| `--aspect-ratio` | No | `1:1` | `1:1`, `16:9`, `9:16`, `4:3`, `3:2`, `21:9`, `1:4`, `4:1`, `1:8`, `8:1` |
+| `--output` | No | `output.png` | Output file path |
+| `--sample-count` | No | `1` | Generate multiple variants (2-10) |
+| `--negative-prompt` | No | — | What to exclude from the image |
+| `--style-reference` | No | — | Path to reference image for style transfer |
+| `--person-generation` | No | `allow_adult` | `allow_adult` or `dont_allow` |
+
+## Examples
+
+### Basic Generation
+```bash
+npx ts-node imagen.ts \
+  --prompt "A pixel art scene: a weathered tank on frozen tundra, grey sky, orange campfire glow" \
+  --aspect-ratio 16:9 \
+  --output /tmp/tank-scene.png
+```
+
+### Multiple Variants
+```bash
+npx ts-node imagen.ts \
+  --prompt "A medieval castle on a cliff, dramatic lighting" \
+  --sample-count 4 \
+  --output /tmp/castle-variants.png
+```
+
+### With Negative Prompt
+```bash
+npx ts-node imagen.ts \
+  --prompt "A serene forest clearing with sunlight filtering through trees" \
+  --negative-prompt "people, animals, buildings, text, watermarks" \
+  --output /tmp/forest.png
+```
+
+### Style Reference
+```bash
+npx ts-node imagen.ts \
+  --prompt "A futuristic cityscape at dusk" \
+  --style-reference /path/to/style-reference.png \
+  --output /tmp/city.png
+```
+
+## Prompt Engineering
+
+### Structure
+1. **Medium/Style first:** `pixel art, 16-bit style`, `A photo of`, `A watercolor painting of`
+2. **Subject:** What's in the image
+3. **Setting/Context:** Where, when, atmosphere
+4. **Lighting/Color:** Mood, palette, time of day
+5. **Technical modifiers:** `no anti-aliasing`, `limited color palette`, `crisp edges`
+
+### Style Recipes
+
+**Pixel Art:**
+```
+pixel art, 16-bit style, [palette description], crisp clean edges,
+no anti-aliasing, limited color palette, [era] video game aesthetic, sprite art
+```
+
+**Photography:**
+```
+A photo of [subject], [lens type], [lighting], [camera angle], [detail/focus], [mood]
+```
+Modifiers: `85mm portrait lens`, `golden hour`, `soft box lighting`, `macro lens`, `aerial shot`, `bokeh`
+
+**Illustration:**
+```
+A [art style] of [subject] in the style of [artist/movement], [medium]
+```
+Styles: `pencil sketch`, `charcoal drawing`, `watercolor`, `digital art`, `isometric 3D`, `art deco poster`
+
+### Gritty Dark Style (Iron Requiem)
+```
+dark gritty pixel art, weathered battle-damaged, rust and scorch marks,
+dark moody lighting with deep shadows, mud-splattered, smoke rising,
+muted desaturated colors, dirty browns and rusted greys and oil blacks,
+grim war-torn atmosphere
+```
+
+## When to Switch to Nano Banana
+
+After generating with Imagen, switch to Nano Banana (`nano-banana.md`) when you want to:
+- Refine the image ("make it darker", "more contrast")
+- Add or modify specific elements
+- Add text/logos
+- Apply a different style
+- Iterate conversationally
+
+## Troubleshooting
+
+**Error: No candidates in response**
+- Prompt may be too vague or violate content policy
+- Try simplifying or rephrasing
+
+**Error: GOOGLE_API_KEY not set**
+- Run: `echo 'GOOGLE_API_KEY=your-key' >> ~/.hermes/.env`
+- Restart Hermes gateway: `hermes gateway restart`
+
+**Image quality issues**
+- Increase detail in prompt
+- Try different aspect ratio
+- Use `--sample-count 4` and pick the best variant
--- a/skills/creative/google-image-generation/nano-banana.md
+++ b/skills/creative/google-image-generation/nano-banana.md
@@ -0,0 +1,160 @@
+# Nano Banana — Image Refinement & img2img
+
+**Use when:** You have an existing image to refine, modify, or build upon.
+
+**⚠️ NOT for initial generation.** Requires at least one reference image.
+
+## CLI Usage
+
+```bash
+cd /usr/local/lib/hermes-agent/plugins/image_gen/nano-banana
+npx ts-node nano-banana.ts --prompt "YOUR PROMPT" --refine /path/to/image.png --output /path/to/output.png
+```
+
+## Parameters
+
+| Flag | Required | Default | Description |
+|------|----------|---------|-------------|
+| `--prompt` | **Yes** | — | What to change/add/do |
+| `--refine` | **Yes**\* | — | Path to reference image (can use multiple times, max 14) |
+| `--output` | No | `output.png` | Output file path |
+| `--aspect-ratio` | No | `1:1` | `1:1`, `16:9`, `9:16`, `4:3`, `3:2`, `21:9`, etc. |
+| `--image-size` | No | `1K` | Resolution: `512`, `1K`, `2K`, `4K` |
+| `--model` | No | `gemini-3.1-flash-image-preview` | Model to use |
+| `--grounding` | No | — | `web`, `image`, or `both` (Google Search grounding) |
+| `--thinking-level` | No | — | `minimal` or `high` |
+| `--include-thoughts` | No | — | Show model reasoning steps |
+
+\*Technically optional: the script accepts a text-only prompt (as in the grounding example below), but always provide at least one reference image for refinement work.
+
+## Models
+
+| Model | Speed | Best For |
+|-------|-------|----------|
+| `gemini-3.1-flash-image-preview` | ~5-15s | Default — refinement, img2img, text-in-image |
+| `gemini-3-pro-image-preview` | ~15-45s | Professional quality, complex text |
+| `gemini-2.5-flash-image` | ~3-10s | High-volume, fastest |
+
+## Examples
+
+### Basic Refinement
+```bash
+npx ts-node nano-banana.ts \
+  --prompt "Make the sky sunset orange, add more dramatic clouds" \
+  --refine /tmp/base-image.png \
+  --output /tmp/refined.png
+```
+
+### Multiple Reference Images
+```bash
+npx ts-node nano-banana.ts \
+  --prompt "Combine these characters into a group scene, same art style" \
+  --refine /tmp/char1.png \
+  --refine /tmp/char2.png \
+  --refine /tmp/char3.png \
+  --output /tmp/group.png
+```
+
+### Style Transfer
+```bash
+npx ts-node nano-banana.ts \
+  --prompt "Apply this watercolor style to my scene" \
+  --refine /tmp/my-scene.png \
+  --refine /tmp/watercolor-style.png \
+  --output /tmp/styled.png
+```
+
+### Add Text/Logo
+```bash
+npx ts-node nano-banana.ts \
+  --prompt "Add a wooden sign in the bottom corner that says 'TAVERN' in bold rustic letters" \
+  --refine /tmp/building.png \
+  --output /tmp/with-sign.png
+```
+
+### With Grounding (Real-time Data)
+```bash
+npx ts-node nano-banana.ts \
+  --prompt "Create an image showing today's weather in New York" \
+  --grounding web \
+  --output /tmp/weather.png
+```
+
+### High-Quality Output
+```bash
+npx ts-node nano-banana.ts \
+  --prompt "Refine this character portrait with professional quality" \
+  --refine /tmp/draft.png \
+  --model gemini-3-pro-image-preview \
+  --image-size 4K \
+  --output /tmp/final.png
+```
+
+## Core Workflows
+
+### 1. Iterative Refinement
+```
+1. Generate base with Imagen
+2. "Make it darker, more contrast" → Nano Banana
+3. "Add a character in the foreground" → Nano Banana (use previous output as ref)
+4. "Change the lighting to golden hour" → Nano Banana
+```
+
+### 2. Character Consistency
+Provide multiple reference images of the same character from different angles:
+```bash
+npx ts-node nano-banana.ts --refine char-front.png --refine char-side.png --refine char-back.png \
+  --prompt "Generate a new scene with this character from a 3/4 view" --output /tmp/character-3q.png
+```
+
+### 3. Text-in-Image (Best Practices)
+- Keep text under 25 characters
+- Specify font: `bold sans-serif`, `elegant serif`, `handwritten script`
+- Include context: `poster`, `logo`, `magazine cover`, `menu`
+- Specify size: `small`, `medium`, `large`
+- May need 2-3 iterations for clean text
+
+### 4. Google Search Grounding
+- `--grounding web` — Current events, weather, news, real-time data
+- `--grounding image` — Visual search results for composition
+- `--grounding both` — Combined search
+
+## Prompt Engineering for Refinement
+
+### Be Specific About Changes
+- **Vague:** "Make it better"
+- **Specific:** "Increase saturation by 20%, add rim lighting on the subject, deepen the shadows"
+
+### Preserve Context
+- **Bad:** "A tank" (loses all previous detail)
+- **Good:** "Keep the exact same tank and composition, but change the time of day to dusk with orange sky"
+
+### Semantic Negatives
+Describe what you **want**, not what you don't:
+- ✅ "an empty deserted street"
+- ❌ "no cars, no people"
+
+## When to Switch to Imagen
+
+Switch back to Imagen (`imagen.md`) when:
+- Starting a completely new image
+- Need precise aspect ratio from scratch
+- Batch generation (multiple variants of same prompt)
+- Nano Banana keeps losing important details
+
+## Troubleshooting
+
+**Text rendering is garbled**
+- Try shorter text (under 15 chars)
+- Specify font style explicitly
+- Use `gemini-3-pro-image-preview` for complex text
+- May need 2-3 iterations
+
+**Image loses important details**
+- Be more explicit: "Keep [X] exactly the same, only change [Y]"
+- Provide multiple reference images
+- Try `--thinking-level high`
+
+**Error: No candidates in response**
+- Ensure at least one valid `--refine` image is provided
+- Check prompt doesn't violate content policy
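The Parameters table in `nano-banana.md` implies two rules that are easy to get wrong when scripting: `--refine` is repeated once per reference image, and there can be at most 14 of them. The sketch below encodes both rules while assembling the argument list. `RefineOptions` and `buildNanoBananaArgs` are hypothetical names; the flags come from the table. Unlike the CLI, the sketch deliberately rejects text-only calls, following the advice to always provide a reference.

```typescript
// Hypothetical wrapper: builds the argument list for nano-banana.ts from the
// flags documented in nano-banana.md. It does not run the script itself.
type ImageSize = "512" | "1K" | "2K" | "4K";

interface RefineOptions {
  prompt: string;
  refs: string[];       // one --refine flag per reference image (max 14)
  output?: string;
  aspectRatio?: string; // e.g. "16:9"; the CLI's documented default is 1:1
  imageSize?: ImageSize;
  model?: string;
}

const MAX_REFS = 14;

function buildNanoBananaArgs(o: RefineOptions): string[] {
  if (o.refs.length === 0) {
    throw new Error("Refinement needs at least one --refine image");
  }
  if (o.refs.length > MAX_REFS) {
    throw new Error(`At most ${MAX_REFS} reference images are supported`);
  }
  const args: string[] = ["--prompt", o.prompt];
  // Repeat --refine once per reference image, preserving the given order.
  for (const ref of o.refs) {
    args.push("--refine", ref);
  }
  if (o.output) args.push("--output", o.output);
  if (o.aspectRatio) args.push("--aspect-ratio", o.aspectRatio);
  if (o.imageSize) args.push("--image-size", o.imageSize);
  if (o.model) args.push("--model", o.model);
  return args;
}

const args = buildNanoBananaArgs({
  prompt: "Make the sky sunset orange",
  refs: ["/tmp/base-image.png"],
  aspectRatio: "16:9",
  output: "/tmp/refined.png",
});
console.log(args);
```

Returning an array rather than a shell string avoids quoting problems with prompts that contain spaces or apostrophes. The array could be handed directly to Node's `child_process.spawn("npx", ["ts-node", "nano-banana.ts", ...args])` from the plugin directory.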