Refactor: google-image-generation skill with sub-files for imagen and nano-banana

2026-05-24 02:36:40 +00:00
parent 550349be2d
commit 6f9405ceee
4 changed files with 383 additions and 123 deletions


@@ -1,123 +0,0 @@
---
name: nano-banana
description: "Image refinement, img2img, and text-in-image with Gemini Flash/Pro Image"
version: 1.0.0
author: Kay Kayyali + Hermes Agent
license: MIT
metadata:
  hermes:
    tags: [image-refinement, img2img, nano-banana, gemini]
    category: creative
---
# Nano Banana (Gemini Flash/Pro Image)
**Powered by:** Google GenAI SDK (`@google/genai`)
**Use for:** Image refinement, img2img, text-in-image, and conversational editing.
**⚠️ NOT for initial text-to-image generation.** Requires an existing image to refine.
## Quick Start
```bash
# Set API key
export GOOGLE_API_KEY="your-key-here"
# Refine an existing image
hermes chat -q "Refine this image to be more vibrant" --attachment /path/to/image.png
```
## Models
| Model | Best For | Speed |
|-------|----------|-------|
| `gemini-3.1-flash-image-preview` | img2img, refinement, text-in-image | ~5-15s |
| `gemini-3-pro-image-preview` | Professional quality, complex text | ~15-45s |
| `gemini-2.5-flash-image` | Fastest, high-volume | ~3-10s |
## Supported Parameters
- `--aspect-ratio`: `1:1`, `16:9`, `9:16`, `4:3`, `3:2`, `21:9`, etc.
- `--image-size`: `512`, `1K`, `2K`, `4K` (resolution control)
- `--grounding`: `web`, `image`, or `both` (Google Search grounding)
- `--thinking-level`: `minimal` or `high`
- `--include-thoughts`: Show model reasoning steps
## Core Workflows
### 1. Image Refinement
"Keep everything the same but change X"
Examples:
- "Make the sky sunset orange"
- "Add more contrast and saturation"
- "Change the lighting to golden hour"
- "Remove the background clutter"
### 2. img2img / Style Transfer
Provide reference image(s) + text prompt:
- "Apply this art style to my scene"
- "Make it look like a 1950s poster"
- "Convert to watercolor painting style"
### 3. Text-in-Image
Add logos, posters, signs, menus:
- Keep text under 25 characters
- Specify font: `bold sans-serif`, `elegant serif`, `handwritten script`
- Use keywords: `poster`, `logo`, `magazine cover`, `menu`
- Include size: `small`, `medium`, `large`
- May need 2-3 iterations
### 4. Conversational Editing
Iterate naturally:
1. "Generate a forest scene"
2. "Make it mistier"
3. "Add a stone altar in the center"
4. "Now place a glowing sword on the altar"
### 5. Multi-Reference Composition
Up to 14 reference images for character consistency, scene composition, or style blending.
### 6. Google Search Grounding
Real-time data in images:
- `--grounding web` — Current events, weather, news
- `--grounding image` — Visual search results
- `--grounding both` — Combined search
## Aspect Ratios
| Hermes | Google |
|--------|--------|
| `landscape` | `16:9` |
| `square` | `1:1` |
| `portrait` | `9:16` |
Also supports: `4:3`, `3:4`, `21:9`, `1:4`, `4:1`, `1:8`, `8:1`
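The mapping above can be sketched as a small shell helper. This is only an illustration: `to_google_ratio` is a hypothetical name, not something the plugin provides, and it assumes explicit ratios should pass through unchanged.

```bash
# Hypothetical helper (not part of the plugin): translate a Hermes
# aspect-ratio name into the Google ratio string from the table above.
to_google_ratio() {
  case "$1" in
    landscape) echo "16:9" ;;
    square)    echo "1:1" ;;
    portrait)  echo "9:16" ;;
    # Explicit ratios listed in this section pass through unchanged.
    16:9|1:1|9:16|4:3|3:4|21:9|1:4|4:1|1:8|8:1) echo "$1" ;;
    *) echo "unsupported aspect ratio: $1" >&2; return 1 ;;
  esac
}
```

For example, `to_google_ratio landscape` prints `16:9`, and an unlisted value such as `5:7` returns a non-zero status.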
## Prompt Engineering
**Be hyper-specific:** "ornate elven plate armor etched with silver leaf patterns" beats "fantasy armor"
**Provide context:** "Create a logo for a high-end minimalist skincare brand" beats "Create a logo"
**Use semantic negatives:** Describe what you WANT, not what you don't: "an empty deserted street" not "no cars"
**Control the camera:** `wide-angle shot`, `macro shot`, `low-angle perspective`
## Limitations
- English prompts (plus select other languages)
- Max 480 tokens per prompt
- Person generation: `allow_adult` by default; generating children is blocked in EU/UK/CH/MENA
- All images include SynthID watermark
- No transparent backgrounds
- Text rendering may need 2-3 attempts
## Switch to Google Imagen For
- Initial text-to-image generation from scratch
- When you need precise aspect ratio control from the start
- High-volume batch generation (faster, simpler)
Use `google-imagen` skill for initial generation workflows.


@@ -0,0 +1,106 @@
---
name: google-image-generation
description: "Generate and refine images using Google Imagen 4.0 and Nano Banana"
version: 4.0.0
author: Kay Kayyali + Hermes Agent
license: MIT
metadata:
  hermes:
    tags: [image-generation, google-imagen, nano-banana, text-to-image, img2img]
    category: creative
---
# Google Image Generation
Two tools for image work. Pick the right one for your task.
## Prerequisites
**1. API Key** — Verify before starting:
```bash
# -q keeps the key value out of the terminal output
grep -q "^GOOGLE_API_KEY=" ~/.hermes/.env && echo "GOOGLE_API_KEY is set" || echo "Missing! Set with: echo 'GOOGLE_API_KEY=your-key' >> ~/.hermes/.env"
```
**2. Scripts Available** — Both CLI tools are installed:
- `/usr/local/lib/hermes-agent/plugins/image_gen/google-imagen/imagen.ts` — Initial generation
- `/usr/local/lib/hermes-agent/plugins/image_gen/nano-banana/nano-banana.ts` — Refinement
If either script is missing, the plugin was not installed correctly.
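Both prerequisites can be checked in one pass. The following is a minimal sketch, not part of the plugin: `check_image_gen_setup` is a hypothetical helper that takes the env file and the `image_gen` plugin directory as arguments, and it assumes the two script paths listed above.

```bash
# Hypothetical helper: check_image_gen_setup ENV_FILE PLUGIN_DIR
# Prints one "missing: ..." line per problem; returns non-zero if any are found.
check_image_gen_setup() {
  env_file="$1"
  plugin_dir="$2"
  rc=0
  # grep -q only tests for the key, so its value is never printed.
  if ! grep -q '^GOOGLE_API_KEY=' "$env_file" 2>/dev/null; then
    echo "missing: GOOGLE_API_KEY in $env_file"
    rc=1
  fi
  # Script locations relative to the image_gen plugin directory.
  for script in google-imagen/imagen.ts nano-banana/nano-banana.ts; do
    if [ ! -f "$plugin_dir/$script" ]; then
      echo "missing: $plugin_dir/$script"
      rc=1
    fi
  done
  return "$rc"
}
```

With the paths from this section, the call would be `check_image_gen_setup ~/.hermes/.env /usr/local/lib/hermes-agent/plugins/image_gen`.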
## Which Tool to Use
### Use **Imagen** for: Initial Generation
- Text prompt → new image
- Starting from scratch
- Need precise aspect ratio control
- Batch generation (multiple variants)
→ See `imagen.md` for detailed usage
### Use **Nano Banana** for: Refinement
- "Keep everything but change X"
- img2img / style transfer
- Adding text/logos to existing images
- Iterative conversational editing
- Multi-reference composition (up to 14 images)
→ See `nano-banana.md` for detailed usage
## Quick Decision Tree
```
Do you have an existing image to work with?
├─ NO → Use Imagen (generate from text)
└─ YES → What do you want to do?
    ├─ "Change this one thing" → Nano Banana
    ├─ "Apply this style" → Nano Banana
    ├─ "Add text/logo" → Nano Banana
    └─ "Make a completely new one" → Imagen
```
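The decision tree can also be written as a small shell function. This is a sketch only: `choose_tool` and its intent keywords (`change`, `style`, `text`, `new`) are hypothetical names chosen for illustration, one per branch of the tree.

```bash
# Hypothetical helper: choose_tool HAS_IMAGE [INTENT]
#   HAS_IMAGE: "yes" if there is an existing image, anything else means no.
#   INTENT (only read when HAS_IMAGE=yes): change | style | text | new
choose_tool() {
  # No existing image: generate from text with Imagen.
  if [ "$1" != "yes" ]; then
    echo "imagen"
    return 0
  fi
  case "$2" in
    change|style|text) echo "nano-banana" ;;  # edit, restyle, or add text
    new)               echo "imagen" ;;       # start over from scratch
    *) echo "unknown intent: $2" >&2; return 1 ;;
  esac
}
```

`choose_tool no` prints `imagen`, while `choose_tool yes style` prints `nano-banana`.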
## Common Workflows
### Workflow 1: Generate → Refine
1. Generate base image with Imagen at desired aspect ratio
2. Switch to Nano Banana for refinements (preserves dimensions)
3. Iterate conversationally: "make it darker", "add more contrast", etc.
### Workflow 2: Style Transfer
1. Generate or provide reference image with desired style
2. Use Nano Banana: "Apply this art style to [my scene]"
### Workflow 3: Text-in-Image
1. Generate base with Imagen (clean composition)
2. Use Nano Banana: "Add a sign that says 'X' in the corner"
3. May need 2-3 iterations for clean text
## Prompt Engineering (Both Tools)
**Describe, don't list.** Narrative paragraphs beat keyword soup.
**Bad:** `pixel art tank snow bleak cold`
**Good:** `A pixel art scene in 16-bit style: a weathered German Panzer IV tank sits motionless on the frozen Russian tundra under a grey sky. Snow drifts against its tracks. Orange glow from a dying campfire. Limited color palette — dark greys, muted blues, pale whites, one point of warm orange. No text.`
## Gritty Dark Style (Iron Requiem)
Kay prefers a **dark, gritty, weathered** aesthetic:
```
dark gritty pixel art, weathered battle-damaged, rust and scorch marks,
dark moody lighting with deep shadows, mud-splattered, smoke rising,
muted desaturated colors, dirty browns and rusted greys and oil blacks,
grim war-torn atmosphere
```
## Limitations (Both)
- English prompts only
- Max 480 tokens per prompt
- Person generation: `allow_adult` default
- All images include SynthID watermark
- No transparent backgrounds
---
**Next:** Load `imagen.md` or `nano-banana.md` for detailed script usage.


@@ -0,0 +1,117 @@
# Imagen 4.0 — Initial Image Generation
**Use when:** Generating new images from text prompts (no existing image to work with).
## CLI Usage
```bash
cd /usr/local/lib/hermes-agent/plugins/image_gen/google-imagen
npx ts-node imagen.ts --prompt "YOUR PROMPT" --aspect-ratio 16:9 --output /path/to/output.png
```
## Parameters
| Flag | Required | Default | Description |
|------|----------|---------|-------------|
| `--prompt` | **Yes** | — | Text description of the image |
| `--aspect-ratio` | No | `1:1` | `1:1`, `16:9`, `9:16`, `4:3`, `3:2`, `21:9`, `1:4`, `4:1`, `1:8`, `8:1` |
| `--output` | No | `output.png` | Output file path |
| `--sample-count` | No | `1` | Generate multiple variants (2-10) |
| `--negative-prompt` | No | — | What to exclude from the image |
| `--style-reference` | No | — | Path to reference image for style transfer |
| `--person-generation` | No | `allow_adult` | `allow_adult` or `dont_allow` |
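A wrapper script could validate flag values against the table before calling `imagen.ts`. The sketch below is an assumption about how such a check might look: `validate_imagen_args` is a hypothetical helper, and it treats `--sample-count` as valid from the default `1` up to `10`.

```bash
# Hypothetical helper: validate_imagen_args ASPECT_RATIO SAMPLE_COUNT PERSON_GENERATION
# Returns 0 when all three values match the Parameters table, 1 otherwise.
validate_imagen_args() {
  case "$1" in
    1:1|16:9|9:16|4:3|3:2|21:9|1:4|4:1|1:8|8:1) ;;
    *) echo "invalid --aspect-ratio: $1" >&2; return 1 ;;
  esac
  # Reject empty or non-numeric counts before the numeric comparison.
  case "$2" in
    ''|*[!0-9]*) echo "invalid --sample-count: $2" >&2; return 1 ;;
  esac
  if [ "$2" -lt 1 ] || [ "$2" -gt 10 ]; then
    echo "--sample-count out of range (1-10): $2" >&2
    return 1
  fi
  case "$3" in
    allow_adult|dont_allow) ;;
    *) echo "invalid --person-generation: $3" >&2; return 1 ;;
  esac
}
```

For instance, `validate_imagen_args 16:9 4 allow_adult` succeeds, while a count of `11` or a ratio of `5:4` fails with a message on stderr.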
## Examples
### Basic Generation
```bash
npx ts-node imagen.ts \
--prompt "A pixel art scene: a weathered tank on frozen tundra, grey sky, orange campfire glow" \
--aspect-ratio 16:9 \
--output /tmp/tank-scene.png
```
### Multiple Variants
```bash
npx ts-node imagen.ts \
--prompt "A medieval castle on a cliff, dramatic lighting" \
--sample-count 4 \
--output /tmp/castle-variants.png
```
### With Negative Prompt
```bash
npx ts-node imagen.ts \
--prompt "A serene forest clearing with sunlight filtering through trees" \
--negative-prompt "people, animals, buildings, text, watermarks" \
--output /tmp/forest.png
```
### Style Reference
```bash
npx ts-node imagen.ts \
--prompt "A futuristic cityscape at dusk" \
--style-reference /path/to/style-reference.png \
--output /tmp/city.png
```
## Prompt Engineering
### Structure
1. **Medium/Style first:** `pixel art, 16-bit style`, `A photo of`, `A watercolor painting of`
2. **Subject:** What's in the image
3. **Setting/Context:** Where, when, atmosphere
4. **Lighting/Color:** Mood, palette, time of day
5. **Technical modifiers:** `no anti-aliasing`, `limited color palette`, `crisp edges`
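The five-part structure above can be assembled mechanically. This is a minimal sketch under the assumption that the parts are simply joined with commas in the listed order; `build_prompt` is a hypothetical helper, not something the plugin ships.

```bash
# Hypothetical helper: build_prompt MEDIUM SUBJECT SETTING LIGHTING MODIFIERS
# Joins the non-empty parts with ", " in the order given.
build_prompt() {
  prompt=""
  for part in "$@"; do
    [ -n "$part" ] || continue   # skip any part left empty
    if [ -z "$prompt" ]; then
      prompt="$part"
    else
      prompt="$prompt, $part"
    fi
  done
  printf '%s\n' "$prompt"
}
```

The result can be passed straight to `--prompt`, for example `--prompt "$(build_prompt "pixel art, 16-bit style" "a weathered tank" "frozen tundra" "grey sky" "crisp edges")"`. Each part can itself be a full descriptive phrase rather than a single keyword.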
### Style Recipes
**Pixel Art:**
```
pixel art, 16-bit style, [palette description], crisp clean edges,
no anti-aliasing, limited color palette, [era] video game aesthetic, sprite art
```
**Photography:**
```
A photo of [subject], [lens type], [lighting], [camera angle], [detail/focus], [mood]
```
Modifiers: `85mm portrait lens`, `golden hour`, `soft box lighting`, `macro lens`, `aerial shot`, `bokeh`
**Illustration:**
```
A [art style] of [subject] in the style of [artist/movement], [medium]
```
Styles: `pencil sketch`, `charcoal drawing`, `watercolor`, `digital art`, `isometric 3D`, `art deco poster`
### Gritty Dark Style (Iron Requiem)
```
dark gritty pixel art, weathered battle-damaged, rust and scorch marks,
dark moody lighting with deep shadows, mud-splattered, smoke rising,
muted desaturated colors, dirty browns and rusted greys and oil blacks,
grim war-torn atmosphere
```
## When to Switch to Nano Banana
After generating with Imagen, switch to Nano Banana (`nano-banana.md`) when you want to:
- Refine the image ("make it darker", "more contrast")
- Add or modify specific elements
- Add text/logos
- Apply a different style
- Iterate conversationally
## Troubleshooting
**Error: No candidates in response**
- Prompt may be too vague or violate content policy
- Try simplifying or rephrasing
**Error: GOOGLE_API_KEY not set**
- Run: `echo 'GOOGLE_API_KEY=your-key' >> ~/.hermes/.env`
- Restart Hermes gateway: `hermes gateway restart`
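Appending with `>>` adds a second `GOOGLE_API_KEY=` line if one already exists. A possible alternative, sketched here with a hypothetical `set_env_var` helper, replaces any existing line instead; whether Hermes reads the first or last duplicate is not stated in this document, so this only avoids the ambiguity.

```bash
# Hypothetical helper: set_env_var FILE NAME VALUE
# Ensures FILE contains exactly one NAME=VALUE line.
set_env_var() {
  file="$1"; name="$2"; value="$3"
  touch "$file"
  # Copy every line except an existing NAME= entry (grep exits 1 if nothing is left).
  grep -v "^$name=" "$file" > "$file.tmp" || true
  printf '%s=%s\n' "$name" "$value" >> "$file.tmp"
  # Write back in place so the original file keeps its permissions.
  cat "$file.tmp" > "$file"
  rm -f "$file.tmp"
}
```

Usage would be `set_env_var ~/.hermes/.env GOOGLE_API_KEY your-key`, followed by the gateway restart described above.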
**Image quality issues**
- Increase detail in prompt
- Try a different aspect ratio
- Use `--sample-count 4` and pick the best variant


@@ -0,0 +1,160 @@
# Nano Banana — Image Refinement & img2img
**Use when:** You have an existing image to refine, modify, or build upon.
**⚠️ NOT for initial generation.** Requires at least one reference image.
## CLI Usage
```bash
cd /usr/local/lib/hermes-agent/plugins/image_gen/nano-banana
npx ts-node nano-banana.ts --prompt "YOUR PROMPT" --refine /path/to/image.png --output /path/to/output.png
```
## Parameters
| Flag | Required | Default | Description |
|------|----------|---------|-------------|
| `--prompt` | **Yes** | — | What to change/add/do |
| `--refine` | **Yes**\* | — | Path to reference image (repeatable, up to 14) |
| `--output` | No | `output.png` | Output file path |
| `--aspect-ratio` | No | `1:1` | `1:1`, `16:9`, `9:16`, `4:3`, `3:2`, `21:9`, etc. |
| `--image-size` | No | `1K` | Resolution: `512`, `1K`, `2K`, `4K` |
| `--model` | No | `gemini-3.1-flash-image-preview` | Model to use |
| `--grounding` | No | — | `web`, `image`, or `both` (Google Search grounding) |
| `--thinking-level` | No | — | `minimal` or `high` |
| `--include-thoughts` | No | — | Show model reasoning steps |
\*Required unless using only a text prompt (not recommended; always provide at least one reference).
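Since `--refine` is repeatable up to 14 times, a wrapper may want to build the flags from a list of files and reject lists that are empty or too long. The sketch below uses a hypothetical `refine_flags` helper and assumes paths contain no whitespace, because the flags are returned as one line.

```bash
# Hypothetical helper: refine_flags IMAGE...
# Prints "--refine A --refine B ..." for 1 to 14 images; fails otherwise.
# Assumes image paths contain no whitespace.
refine_flags() {
  if [ "$#" -lt 1 ] || [ "$#" -gt 14 ]; then
    echo "expected 1-14 reference images, got $#" >&2
    return 1
  fi
  flags=""
  for img in "$@"; do
    flags="$flags --refine $img"
  done
  # Strip the leading space added by the first iteration.
  printf '%s\n' "${flags# }"
}
```

The output can then be spliced, unquoted, into a `nano-banana.ts` call, for example `npx ts-node nano-banana.ts --prompt "..." $(refine_flags a.png b.png)`.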
## Models
| Model | Speed | Best For |
|-------|-------|----------|
| `gemini-3.1-flash-image-preview` | ~5-15s | Default — refinement, img2img, text-in-image |
| `gemini-3-pro-image-preview` | ~15-45s | Professional quality, complex text |
| `gemini-2.5-flash-image` | ~3-10s | High-volume, fastest |
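Picking a value for `--model` from the table can be reduced to a lookup. In this sketch, `pick_model` and the priority keywords `default`, `quality`, and `speed` are hypothetical; only the model names come from the table.

```bash
# Hypothetical helper: pick_model [PRIORITY]
#   PRIORITY: default (or empty) | quality | speed
pick_model() {
  case "$1" in
    ''|default) echo "gemini-3.1-flash-image-preview" ;;  # general refinement
    quality)    echo "gemini-3-pro-image-preview" ;;      # professional quality, complex text
    speed)      echo "gemini-2.5-flash-image" ;;          # fastest, high-volume
    *) echo "unknown priority: $1" >&2; return 1 ;;
  esac
}
```

A call might then look like `--model "$(pick_model quality)"`.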
## Examples
### Basic Refinement
```bash
npx ts-node nano-banana.ts \
--prompt "Make the sky sunset orange, add more dramatic clouds" \
--refine /tmp/base-image.png \
--output /tmp/refined.png
```
### Multiple Reference Images
```bash
npx ts-node nano-banana.ts \
--prompt "Combine these characters into a group scene, same art style" \
--refine /tmp/char1.png \
--refine /tmp/char2.png \
--refine /tmp/char3.png \
--output /tmp/group.png
```
### Style Transfer
```bash
npx ts-node nano-banana.ts \
--prompt "Apply this watercolor style to my scene" \
--refine /tmp/my-scene.png \
--refine /tmp/watercolor-style.png \
--output /tmp/styled.png
```
### Add Text/Logo
```bash
npx ts-node nano-banana.ts \
--prompt "Add a wooden sign in the bottom corner that says 'TAVERN' in bold rustic letters" \
--refine /tmp/building.png \
--output /tmp/with-sign.png
```
### With Grounding (Real-time Data)
```bash
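# Text-only call: no --refine image is passed (allowed, but not recommended per the Parameters note)
npx ts-node nano-banana.ts \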
--prompt "Create an image showing today's weather in New York" \
--grounding web \
--output /tmp/weather.png
```
### High-Quality Output
```bash
npx ts-node nano-banana.ts \
--prompt "Refine this character portrait with professional quality" \
--refine /tmp/draft.png \
--model gemini-3-pro-image-preview \
--image-size 4K \
--output /tmp/final.png
```
## Core Workflows
### 1. Iterative Refinement
```
1. Generate base with Imagen
2. "Make it darker, more contrast" → Nano Banana
3. "Add a character in the foreground" → Nano Banana (use previous output as ref)
4. "Change the lighting to golden hour" → Nano Banana
```
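The chaining in the steps above, where each refinement takes the previous output as its reference, can be sketched as a dry run that only prints the commands. `refine_chain` and the `/tmp/refine-step-N.png` naming are hypothetical; the flags are the ones documented under Parameters.

```bash
# Hypothetical helper: refine_chain BASE_IMAGE PROMPT...
# Dry run: prints one nano-banana.ts command per prompt. Each command uses
# the previous step's output as its --refine input. Nothing is executed.
refine_chain() {
  ref="$1"
  shift
  step=1
  for p in "$@"; do
    out="/tmp/refine-step-$step.png"   # assumed naming scheme
    printf 'npx ts-node nano-banana.ts --prompt "%s" --refine %s --output %s\n' \
      "$p" "$ref" "$out"
    ref="$out"
    step=$((step + 1))
  done
}
```

Calling `refine_chain /tmp/base.png "Make it darker, more contrast" "Change the lighting to golden hour"` prints two commands, the second of which refines `/tmp/refine-step-1.png`. Prompts containing double quotes would need escaping before the printed commands are run.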
### 2. Character Consistency
Provide multiple reference images of the same character from different angles:
```bash
npx ts-node nano-banana.ts \
--refine char-front.png --refine char-side.png --refine char-back.png \
--prompt "Generate a new scene with this character from a 3/4 view"
```
### 3. Text-in-Image (Best Practices)
- Keep text under 25 characters
- Specify font: `bold sans-serif`, `elegant serif`, `handwritten script`
- Include context: `poster`, `logo`, `magazine cover`, `menu`
- Specify size: `small`, `medium`, `large`
- May need 2-3 iterations for clean text
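These practices can be folded into a prompt template. The following is a sketch, not the plugin's behavior: `text_prompt` is a hypothetical helper, the sentence wording is only one possible phrasing, and the length check counts bytes in some shells, so it is reliable for ASCII text only.

```bash
# Hypothetical helper: text_prompt TEXT FONT SIZE CONTEXT
# Rejects text of 25 characters or more, otherwise prints a refinement prompt
# that names the font, size, and context keyword explicitly.
text_prompt() {
  if [ "${#1}" -ge 25 ]; then
    echo "text is ${#1} characters; keep it under 25" >&2
    return 1
  fi
  printf "Add a %s %s with the text '%s' in %s lettering\n" "$3" "$4" "$1" "$2"
}
```

For example, `text_prompt TAVERN "bold sans-serif" small sign` produces a prompt that can be passed to `--prompt` together with a `--refine` image.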
### 4. Google Search Grounding
- `--grounding web` — Current events, weather, news, real-time data
- `--grounding image` — Visual search results for composition
- `--grounding both` — Combined search
## Prompt Engineering for Refinement
### Be Specific About Changes
**Vague:** "Make it better"
**Specific:** "Increase saturation by 20%, add rim lighting on the subject, deepen the shadows"
### Preserve Context
**Bad:** "A tank" (loses all previous detail)
**Good:** "Keep the exact same tank and composition, but change the time of day to dusk with orange sky"
### Semantic Negatives
Describe what you **want**, not what you don't:
- ✅ "an empty deserted street"
- ❌ "no cars, no people"
## When to Switch to Imagen
Switch back to Imagen (`imagen.md`) when:
- Starting a completely new image
- Need precise aspect ratio from scratch
- Batch generation (multiple variants of same prompt)
- Nano Banana keeps losing important details
## Troubleshooting
**Text rendering is garbled**
- Try shorter text (under 15 characters)
- Specify font style explicitly
- Use `gemini-3-pro-image-preview` for complex text
- May need 2-3 iterations
**Image loses important details**
- Be more explicit: "Keep [X] exactly the same, only change [Y]"
- Provide multiple reference images
- Try `--thinking-level high`
**Error: No candidates in response**
- Ensure at least one valid `--refine` image is provided
- Check prompt doesn't violate content policy