Refactor: split google-image-gen-prompting into google-imagen and nano-banana skills

2026-05-24 02:28:53 +00:00
parent c490570d6e
commit 550349be2d
1 changed file with 71 additions and 293 deletions
--- a/skills/creative/SKILL.md
+++ b/skills/creative/SKILL.md
@@ -1,345 +1,123 @@
 ---
-name: google-image-gen-prompting
-description: "Generate or refine images in any artstyle, and in multiple formats"
-version: 3.1.0
+name: nano-banana
+description: "Image refinement, img2img, and text-in-image with Gemini Flash/Pro Image"
+version: 1.0.0
 author: Kay Kayyali + Hermes Agent
 license: MIT
 metadata:
  hermes:
-    tags: [image-generation, google-imagen, nano-banana, prompt-engineering, pixel-art, two-plugin-workflow, google-genai-sdk]
+    tags: [image-refinement, img2img, nano-banana, gemini]
    category: creative
 ---

-# Google Image Generation Prompting Guide
+# Nano Banana (Gemini Flash/Pro Image)

-**Powered by:** Google GenAI SDK (`@google/genai`) — TypeScript CLI handles all API calls.
+**Powered by:** Google GenAI SDK (`@google/genai`)

-## Quick Setup & Verification
+**Use for:** Image refinement, img2img, text-in-image, and conversational editing.
+
+**⚠️ NOT for initial text-to-image generation.** Requires an existing image to refine.
+
+## Quick Start

-**1. Set API key** (if not already in `~/.hermes/.env`):
 ```bash
+# Set API key
 export GOOGLE_API_KEY="your-key-here"
-# Or add to ~/.hermes/.env: GOOGLE_API_KEY=your-key-here
+
+# Refine an existing image
+hermes chat -q "Refine this image to be more vibrant" --attachment /path/to/image.png
 ```

-**2. Verify setup** — run a smoke test:
-```bash
-# Using the image_generate tool (recommended):
-hermes chat -q "Generate a test image: a red cube on white background"
+## Models

-# Or direct TypeScript CLI test:
-cd /usr/local/lib/hermes-agent/plugins/image_gen/google-imagen
-npx ts-node google-image-gen.ts --imagen --prompt "test" --output /tmp/test.png
-```
-
-**3. If skill doesn't appear in `hermes skills list`**:
-```bash
-# Reload skills from disk:
-hermes chat -q "/reload-skills"
-# Or restart the gateway: hermes gateway restart
-```
-
-Best practices for generating high-quality images with Google Imagen 4
-and Nano Banana (Gemini Flash/Pro Image) models. These models are accessed
-via the `image_generate` tool configured with `image_gen.provider: google`.
-
-## Two-Plugin Workflow
-
-**⚠️ Critical:** `nano-banana` is **NOT** for initial text-to-image generation. It is an **image refinement** tool only.
-
-**Use `google-imagen` for:** Initial text-to-image generation. This is your primary image gen plugin.
- Text prompts → images
- Best for: getting a base image from scratch
- Model: `imagen-4.0-generate-001`
- Supports: `--aspect-ratio` (1:1, 16:9, 9:16, 4:3, 3:2, etc.), `--sample-count`, `--negative-prompt`, `--style-reference`
-
-**Use `nano-banana` for:** Image refinement, img2img, and text-in-image. **Requires an existing image to refine.**
- "Keep everything the same but change X"
- Style transfer from a reference image
- Adding text/logos to images
- Iterative conversational editing
- Models: `gemini-3.1-flash-image-preview` or `gemini-3-pro-image-preview`
- Supports: `--aspect-ratio`, `--image-size` (512, 1K, 2K, 4K), `--grounding`, `--thinking-level`, `--include-thoughts`
-
-**Typical workflow:**
-1. Generate base image with `google-imagen` at your desired aspect ratio
-2. Switch to `nano-banana` for refinements (keeps same dimensions)
-3. Switch back to `google-imagen` for new generations
-
-**Config:**
-```yaml
-image_gen:
-  provider: google-imagen  # or: nano-banana
-```
-
-**Switch providers:**
-```bash
-# For initial generation:
-hermes config set image_gen.provider google-imagen
-
-# For refinement:
-hermes config set image_gen.provider nano-banana
-
-# Or one-off via env:
-GOOGLE_IMAGE_PROVIDER=nano-banana image_generate --prompt "refine this..."
-```
-
-## Quick Reference
-
-### google-imagen (Imagen 4.0)
-| Model | Best For | Speed |
-|-------|----------|-------|
-| `imagen-4.0-generate-001` | Photorealism, high detail, text in images | ~5-15s |
-
-### nano-banana (Gemini Flash/Pro Image)
 | Model | Best For | Speed |
 |-------|----------|-------|
 | `gemini-3.1-flash-image-preview` | img2img, refinement, text-in-image | ~5-15s |
 | `gemini-3-pro-image-preview` | Professional quality, complex text | ~15-45s |
 | `gemini-2.5-flash-image` | Fastest, high-volume | ~3-10s |

-## Core Principle: Describe, Don't List
+## Supported Parameters

-A narrative paragraph beats keyword soup every time. These models excel
-at language understanding. Example:
+- `--aspect-ratio` — `1:1`, `16:9`, `9:16`, `4:3`, `3:2`, `21:9`, etc.
+- `--image-size` — `512`, `1K`, `2K`, `4K` (resolution control)
+- `--grounding` — `web`, `image`, or `both` (Google Search grounding)
+- `--thinking-level` — `minimal` or `high`
+- `--include-thoughts` — Show model reasoning steps

-**Bad:** `pixel art tank snow bleak cold`
+## Core Workflows

-**Good:** `A pixel art scene in 16-bit style: a weathered German Panzer IV
-tank sits motionless on the frozen Russian tundra under a grey sky. Snow
-drifts against its tracks. Orange glow from a dying campfire. Limited
-color palette — dark greys, muted blues, pale whites, one point of warm
-orange. No text.`
+### 1. Image Refinement
+"Keep everything the same but change X"

-## Style Recipes
+Examples:
+- "Make the sky sunset orange"
+- "Add more contrast and saturation"
+- "Change the lighting to golden hour"
+- "Remove the background clutter"

-### Pixel Art
-For Iron Requiem and other pixel art games:
+### 2. img2img / Style Transfer
+Provide reference image(s) + text prompt:
+- "Apply this art style to my scene"
+- "Make it look like a 1950s poster"
+- "Convert to watercolor painting style"

-```
-pixel art, 16-bit style, [specific palette description], crisp clean
-edges, no anti-aliasing, limited color palette, [era] video game
-aesthetic, sprite art
-```
-
-Key modifiers:
- `no anti-aliasing` — keeps hard edges
- `limited color palette` — enforces pixel look
- `N-bit style` (8-bit, 16-bit, 32-bit) — era control
- `sprite art` — character/enemy focus
- `tile-based` — background emphasis
-
-### Photography
-```
-A photo of [subject], [lens type], [lighting], [camera angle],
-[detail/focus], [mood], [orientation]
-```
-
-Modifiers: `85mm portrait lens`, `golden hour`, `soft box lighting`,
-`macro lens`, `aerial shot`, `fisheye`, `motion blur`, `bokeh`,
-`black and white`, `polaroid`
-
-### Illustration & Art
-```
-A [art style] of [subject] in the style of [artist/movement], [medium]
-```
-
-Styles: `pencil sketch`, `charcoal drawing`, `pastel painting`,
-`watercolor`, `digital art`, `isometric 3D`, `art deco poster`,
-`impressionist painting`, `renaissance painting`
-
-### Product Mockups
-```
-A studio photograph of [product], [material], on [surface].
-[Lighting setup]. [Camera angle]. [Background]
-```
-
-### Text in Images (Nano Banana)
+### 3. Text-in-Image
+Add logos, posters, signs, menus:
 - Keep text under 25 characters
- Specify font style descriptively: `bold sans-serif`, `elegant serif`,
-  `handwritten script`
+- Specify font: `bold sans-serif`, `elegant serif`, `handwritten script`
 - Use keywords: `poster`, `logo`, `magazine cover`, `menu`
- Include font size: `small`, `medium`, `large`
- Iterate — text rendering may need 2-3 attempts
+- Include size: `small`, `medium`, `large`
+- May need 2-3 iterations
+
+### 4. Conversational Editing
+Iterate naturally:
+1. Start from an existing image, e.g. a forest scene generated with `google-imagen`
+2. "Make it mistier"
+3. "Add a stone altar in the center"
+4. "Now place a glowing sword on the altar"
+
+### 5. Multi-Reference Composition
+Up to 14 reference images for character consistency, scene composition, or style blending.
+
+### 6. Google Search Grounding
+Real-time data in images:
+- `--grounding web` — Current events, weather, news
+- `--grounding image` — Visual search results
+- `--grounding both` — Combined search

 ## Aspect Ratios

-Hermes `aspect_ratio` → Google format:
 | Hermes | Google |
 |--------|--------|
 | `landscape` | `16:9` |
 | `square` | `1:1` |
 | `portrait` | `9:16` |

-Nano Banana also supports: `4:3`, `3:4`, `21:9`, `1:4`, `4:1`, `1:8`, `8:1`
+Also supports: `4:3`, `3:4`, `21:9`, `1:4`, `4:1`, `1:8`, `8:1`

-## Nano Banana Superpowers
+## Prompt Engineering

-Nano Banana (Gemini Flash/Pro Image) has capabilities Imagen lacks:
+**Be hyper-specific:** "ornate elven plate armor etched with silver leaf patterns" beats "fantasy armor"

-1. **Image refinement** — take an initial image and ask for changes:
-   "Keep everything the same but change the sky to sunset orange"
+**Provide context:** "Create a logo for a high-end minimalist skincare brand" beats "Create a logo"

-2. **img2img / style transfer** — provide reference image + text prompt
+**Use semantic negatives:** Describe what you WANT, not what you don't: "an empty deserted street" not "no cars"

-3. **Text in images** — logos, posters, menus, infographics
-
-4. **Google Search grounding** — real-time data in images (weather, news, stocks)
-
-5. **Multi-reference composition** — up to 14 images for character consistency
-
-6. **Step-by-step instructions** — "First, create a misty forest. Then add
-   a stone altar. Finally, place a glowing sword on the altar."
-
-## Prompt Engineering Tips
-
-1. **Be hyper-specific** — "ornate elven plate armor etched with silver
-   leaf patterns" beats "fantasy armor"
-
-2. **Provide context and intent** — "Create a logo for a high-end,
-   minimalist skincare brand" beats "Create a logo"
-
-3. **Iterate conversationally** (Nano Banana) — "That's great, but can
-   you make the lighting warmer?"
-
-4. **Use semantic negatives** — describe what you WANT, not what you
-   don't: "an empty deserted street" not "no cars"
-
-5. **Control the camera** — `wide-angle shot`, `macro shot`,
-   `low-angle perspective`
-
-6. **Max prompt length**: 480 tokens for Imagen
-
-## Gritty Dark Style (Iron Requiem Art Direction)
-
-Kay prefers a **dark, gritty, weathered** aesthetic over clean pixel art.
-The tank should look battle-scarred — rust, scorch marks, mud, oil stains,
-smoke. Lighting is moody with deep shadows. Color palette is desaturated:
-dirty browns, rusted greys, oil blacks. This is a war machine that has
-been fighting for weeks in frozen hell.
-
-Key gritty prompt modifiers:
- `dark gritty pixel art`, `weathered battle-damaged`, `rust and scorch marks`
- `dark moody lighting with deep shadows`, `mud-splattered`, `smoke rising`
- `muted desaturated colors`, `dirty browns and rusted greys and oil blacks`
- `grim war-torn atmosphere`
-
-Proven prompts that produced good results:
-
-```
-dark gritty pixel art, side view of a weathered battle-damaged tank,
-rust and scorch marks on armor, dark moody lighting with deep shadows,
-mud-splattered tracks, smoke rising from engine deck, grim war-torn
-atmosphere, muted desaturated colors, dirty browns and rusted greys
-and oil blacks, 2D game art style, no text
-```
-
-```
-dark gritty pixel art, top-down view of a battle-scarred tank,
-rusted armor plates, oil stains, deep shadows, mud and dirt texture
-on hull, open commander hatch showing darkness inside, muted war-torn
-color palette of rust browns, dirty greys, oil blacks, grim atmosphere,
-2D game sprite, no text, no background
-```
-
-## Iron Requiem Pixel Art Prompts
-
-Templates for the game designer:
-
-```
-Tank in tundra: pixel art, 16-bit, side view of a Panzer IV tank
-half-buried in snow on the Russian tundra, grey overcast sky,
-muzzle flash from the main gun, limited palette of steel greys,
-ice blues, off-whites, and one point of orange fire, no text,
-crisp edges, sprite art scale
-
-Enemy Type 59: pixel art, 16-bit, isometric view of a Chinese
-Type 59 tank advancing through snow, red star markings on turret,
-platoon formation visible in background, cold war aesthetic,
-limited palette adding olive green and red to the tundra tones,
-bullet hell projectiles as orange dots, no text
-
-Commander portrait: pixel art, 32-bit, portrait of a weary German
-tank commander, late 30s, stubble, hollow eyes, looking through
-a periscope, dim green glow from the optics, limited palette of
-dark greys and muted greens, dialogue box ready, visual novel style,
-no text
-```
-
-## Imagen API Response Field (Pitfall)
-
-The Imagen REST API (`:predict` endpoint) returns base64 image data in
-the field `bytesBase64Encoded`, **NOT** `imageBytes` or `image.imageBytes`.
-This is different from the Imagen GenAI SDK (which wraps it in an `image`
-object). When writing plugins or calling the REST API directly, use:
-
-```python
-# Correct:
-b64_bytes = pred["bytesBase64Encoded"]
-
-# WRONG (silently produces empty response):
-image_obj = pred.get("image", {})
-b64_bytes = image_obj.get("imageBytes", "")
-```
-
-**See `references/imagen-api-response-structure.md`** for the full response shape, common bug patterns, and verification commands.
-
-## TypeScript CLI Troubleshooting
-
-The shared TypeScript CLI (`google-image-gen.ts`) handles all API calls.
-Common issues:
-
-**TypeScript compile errors** — if you see `error TS18046: 'error' is of type 'unknown'`:
- Add type assertions: `as any` for JSON results, `(): unknown =>` for catch blocks
- The script uses ES modules — ensure ts-node is installed
-
-**Safety setting error** — `Error 400: Only block_low_and_above is supported`:
- The API requires `safetySetting: 'block_low_and_above'`
- Other values (`block_some`, `block_most`, etc.) are rejected
-
-**Empty response with no error** — check that `GOOGLE_API_KEY` is passed to the TS script:
-```bash
-GOOGLE_API_KEY="${GOOGLE_API_KEY}" npx ts-node google-image-gen.ts ...
-```
-
-**⚠️ Pitfall: Use the SDK, not REST API directly** — The Google GenAI SDK
-(`@google/genai`) handles all API transformation internally. Do NOT curl the
-REST endpoint directly — it accepts different parameter formats than the SDK.
-If you see `Invalid value at 'generation_config.response_format.image.aspect_ratio'`,
-you're using REST when you should be using the SDK. The SDK example:
-
-```typescript
-import { GoogleGenAI } from "@google/genai";
-const ai = new GoogleGenAI({});
-const response = await ai.models.generateContent({
-  model: 'gemini-3.1-flash-image-preview',
-  contents: [{ text: prompt }],
-  config: { responseFormat: { image: { aspectRatio: '16:9' } } },
-});
-```
-
-See `references/imagen-api-quirks.md` for full API quirks and working examples.
+**Control the camera:** `wide-angle shot`, `macro shot`, `low-angle perspective`

 ## Limitations

- English prompts only (plus select languages for Nano Banana)
- Maximum 480 tokens per prompt
- Person generation: `allow_adult` default, block children in EU/UK/CH/MENA
+- English prompts, plus select other languages
+- Max 480 tokens per prompt
+- Person generation: `allow_adult` by default; generation of children is blocked in EU/UK/CH/MENA
 - All images include SynthID watermark
 - No transparent backgrounds
- Text in images works best after first generating the text then requesting
-  image rendering
- Imagen: no img2img, no conversational editing — use Nano Banana for that
+- Text rendering may need 2-3 attempts

-## Model Switching
+## Switch to Google Imagen For

-```bash
-# Set provider for all image_generate calls:
-hermes config set image_gen.provider google-imagen   # or: nano-banana
+- Initial text-to-image generation from scratch
+- When you need precise aspect ratio control from the start
+- High-volume batch generation (faster, simpler)

-# Nano Banana model selection (if using nano-banana provider):
-hermes config set image_gen.nano-banana.model gemini-3.1-flash-image-preview
-
-# Or one-off via env:
-GOOGLE_IMAGE_PROVIDER=nano-banana hermes -p game-designer ...
-```
+Use the `google-imagen` skill for initial generation workflows.
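
Reviewer sketch (not part of this commit): the Hermes-to-Google aspect ratio table in the new SKILL.md could be applied in the TypeScript CLI roughly as follows. The function name `toGoogleAspectRatio` and both constant names are hypothetical. Rejecting unlisted ratios is also an assumption, since the skill only lists example values and ends its parameter list with "etc.".

```typescript
// Hypothetical helper, not taken from the commit.
// Maps Hermes aspect_ratio names to the Google ratio strings in the table.
const HERMES_TO_GOOGLE = new Map<string, string>([
  ["landscape", "16:9"],
  ["square", "1:1"],
  ["portrait", "9:16"],
]);

// Raw ratios named in the skill text; the real API may accept more (assumption).
const KNOWN_RATIOS = new Set<string>([
  "1:1", "16:9", "9:16", "4:3", "3:4", "3:2", "21:9", "1:4", "4:1", "1:8", "8:1",
]);

function toGoogleAspectRatio(value: string): string {
  // Named Hermes values are translated first.
  const mapped = HERMES_TO_GOOGLE.get(value);
  if (mapped !== undefined) {
    return mapped;
  }
  // Raw ratios listed in the skill pass through unchanged.
  if (KNOWN_RATIOS.has(value)) {
    return value;
  }
  // Anything else is rejected so typos fail early instead of reaching the API.
  throw new Error(`Unsupported aspect ratio: ${value}`);
}

console.log(toGoogleAspectRatio("landscape"));
console.log(toGoogleAspectRatio("21:9"));
```

Failing early on an unknown value keeps a typo such as `potrait` from being sent to the API; if the CLI is meant to forward arbitrary ratios, the final `throw` would become a plain `return value`.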