Refactor: google-image-generation skill with sub-files for imagen and nano-banana
@@ -1,123 +0,0 @@
---
name: nano-banana
description: "Image refinement, img2img, and text-in-image with Gemini Flash/Pro Image"
version: 1.0.0
author: Kay Kayyali + Hermes Agent
license: MIT
metadata:
  hermes:
    tags: [image-refinement, img2img, nano-banana, gemini]
    category: creative
---

# Nano Banana (Gemini Flash/Pro Image)

**Powered by:** Google GenAI SDK (`@google/genai`)

**Use for:** Image refinement, img2img, text-in-image, and conversational editing.

**⚠️ NOT for initial text-to-image generation.** Requires an existing image to refine.

## Quick Start

```bash
# Set API key
export GOOGLE_API_KEY="your-key-here"

# Refine an existing image
hermes chat -q "Refine this image to be more vibrant" --attachment /path/to/image.png
```

## Models

| Model | Best For | Speed |
|-------|----------|-------|
| `gemini-3.1-flash-image-preview` | img2img, refinement, text-in-image | ~5-15s |
| `gemini-3-pro-image-preview` | Professional quality, complex text | ~15-45s |
| `gemini-2.5-flash-image` | Fastest, high-volume | ~3-10s |

## Supported Parameters

- `--aspect-ratio` — `1:1`, `16:9`, `9:16`, `4:3`, `3:2`, `21:9`, etc.
- `--image-size` — `512`, `1K`, `2K`, `4K` (resolution control)
- `--grounding` — `web`, `image`, or `both` (Google Search grounding)
- `--thinking-level` — `minimal` or `high`
- `--include-thoughts` — Show model reasoning steps

## Core Workflows

### 1. Image Refinement
"Keep everything the same but change X"

Examples:
- "Make the sky sunset orange"
- "Add more contrast and saturation"
- "Change the lighting to golden hour"
- "Remove the background clutter"

### 2. img2img / Style Transfer
Provide reference image(s) + text prompt:
- "Apply this art style to my scene"
- "Make it look like a 1950s poster"
- "Convert to watercolor painting style"

### 3. Text-in-Image
Add logos, posters, signs, menus:
- Keep text under 25 characters
- Specify font: `bold sans-serif`, `elegant serif`, `handwritten script`
- Use keywords: `poster`, `logo`, `magazine cover`, `menu`
- Include size: `small`, `medium`, `large`
- May need 2-3 iterations

### 4. Conversational Editing
Iterate naturally:
1. "Generate a forest scene"
2. "Make it mistier"
3. "Add a stone altar in the center"
4. "Now place a glowing sword on the altar"

### 5. Multi-Reference Composition
Up to 14 reference images for character consistency, scene composition, or style blending.

### 6. Google Search Grounding
Real-time data in images:
- `--grounding web` — Current events, weather, news
- `--grounding image` — Visual search results
- `--grounding both` — Combined search

## Aspect Ratios

| Hermes | Google |
|--------|--------|
| `landscape` | `16:9` |
| `square` | `1:1` |
| `portrait` | `9:16` |

Also supports: `4:3`, `3:4`, `21:9`, `1:4`, `4:1`, `1:8`, `8:1`

## Prompt Engineering

**Be hyper-specific:** "ornate elven plate armor etched with silver leaf patterns" beats "fantasy armor"

**Provide context:** "Create a logo for a high-end minimalist skincare brand" beats "Create a logo"

**Use semantic negatives:** Describe what you WANT, not what you don't: "an empty deserted street" not "no cars"

**Control the camera:** `wide-angle shot`, `macro shot`, `low-angle perspective`

## Limitations

- English prompts only (plus select languages)
- Max 480 tokens per prompt
- Person generation: `allow_adult` default, blocks children in EU/UK/CH/MENA
- All images include SynthID watermark
- No transparent backgrounds
- Text rendering may need 2-3 attempts

## Switch to Google Imagen For

- Initial text-to-image generation from scratch
- When you need precise aspect ratio control from the start
- High-volume batch generation (faster, simpler)

Use `google-imagen` skill for initial generation workflows.
106
skills/creative/google-image-generation/SKILL.md
Normal file
@@ -0,0 +1,106 @@
---
name: google-image-generation
description: "Generate and refine images using Google Imagen 4.0 and Nano Banana"
version: 4.0.0
author: Kay Kayyali + Hermes Agent
license: MIT
metadata:
  hermes:
    tags: [image-generation, google-imagen, nano-banana, text-to-image, img2img]
    category: creative
---

# Google Image Generation

Two tools for image work. Pick the right one for your task.

## Prerequisites

**1. API Key** — Verify before starting:
```bash
grep "^GOOGLE_API_KEY=" ~/.hermes/.env || echo "Missing! Set with: echo 'GOOGLE_API_KEY=your-key' >> ~/.hermes/.env"
```
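
The same check can be sketched in TypeScript for callers that already have the contents of `~/.hermes/.env` in memory. This is only an illustration: `hasGoogleApiKey` is a hypothetical helper, not part of the plugin, and it is slightly stricter than the `grep` above because it also requires a non-empty value.

```typescript
// Hypothetical helper (not part of the plugin): reports whether the text of a
// .env file defines GOOGLE_API_KEY with a non-empty value.
function hasGoogleApiKey(envText: string): boolean {
  // The "m" flag makes ^ match at the start of every line, as grep does per line.
  // \S requires at least one non-whitespace character right after the "=".
  return /^GOOGLE_API_KEY=\S/m.test(envText);
}
```

A commented-out line such as `# GOOGLE_API_KEY=...` does not count, because the pattern is anchored to the start of the line.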

**2. Scripts Available** — Both CLI tools are installed:
- `/usr/local/lib/hermes-agent/plugins/image_gen/google-imagen/imagen.ts` — Initial generation
- `/usr/local/lib/hermes-agent/plugins/image_gen/nano-banana/nano-banana.ts` — Refinement

If missing, the plugin wasn't installed correctly.

## Which Tool to Use

### Use **Imagen** for: Initial Generation
- Text prompt → new image
- Starting from scratch
- Need precise aspect ratio control
- Batch generation (multiple variants)

→ See `imagen.md` for detailed usage

### Use **Nano Banana** for: Refinement
- "Keep everything but change X"
- img2img / style transfer
- Adding text/logos to existing images
- Iterative conversational editing
- Multi-reference composition (up to 14 images)

→ See `nano-banana.md` for detailed usage

## Quick Decision Tree

```
Do you have an existing image to work with?
├─ NO → Use Imagen (generate from text)
└─ YES → What do you want to do?
    ├─ "Change this one thing" → Nano Banana
    ├─ "Apply this style" → Nano Banana
    ├─ "Add text/logo" → Nano Banana
    └─ "Make a completely new one" → Imagen
```
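
The decision tree above can also be written as a small function. This is a minimal sketch: the `Tool` and `Intent` types and the `chooseTool` name are made up for illustration and do not exist in either script.

```typescript
// Hypothetical names: the two tools and the four intents from the decision tree.
type Tool = "imagen" | "nano-banana";
type Intent =
  | "change-one-thing"
  | "apply-style"
  | "add-text-or-logo"
  | "completely-new";

function chooseTool(hasExistingImage: boolean, intent: Intent): Tool {
  // Without an existing image there is nothing to refine, so generate from text.
  if (!hasExistingImage) return "imagen";
  // With an image, only "make a completely new one" goes back to Imagen.
  return intent === "completely-new" ? "imagen" : "nano-banana";
}
```

For example, `chooseTool(true, "add-text-or-logo")` returns `"nano-banana"`, while `chooseTool(false, "apply-style")` returns `"imagen"`.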

## Common Workflows

### Workflow 1: Generate → Refine
1. Generate base image with Imagen at desired aspect ratio
2. Switch to Nano Banana for refinements (preserves dimensions)
3. Iterate conversationally: "make it darker", "add more contrast", etc.

### Workflow 2: Style Transfer
1. Generate or provide reference image with desired style
2. Use Nano Banana: "Apply this art style to [my scene]"

### Workflow 3: Text-in-Image
1. Generate base with Imagen (clean composition)
2. Use Nano Banana: "Add a sign that says 'X' in the corner"
3. May need 2-3 iterations for clean text

## Prompt Engineering (Both Tools)

**Describe, don't list.** Narrative paragraphs beat keyword soup.

**Bad:** `pixel art tank snow bleak cold`

**Good:** `A pixel art scene in 16-bit style: a weathered German Panzer IV tank sits motionless on the frozen Russian tundra under a grey sky. Snow drifts against its tracks. Orange glow from a dying campfire. Limited color palette — dark greys, muted blues, pale whites, one point of warm orange. No text.`

## Gritty Dark Style (Iron Requiem)

Kay prefers a **dark, gritty, weathered** aesthetic:

```
dark gritty pixel art, weathered battle-damaged, rust and scorch marks,
dark moody lighting with deep shadows, mud-splattered, smoke rising,
muted desaturated colors, dirty browns and rusted greys and oil blacks,
grim war-torn atmosphere
```

## Limitations (Both)

- English prompts only
- Max 480 tokens per prompt
- Person generation: `allow_adult` default
- All images include SynthID watermark
- No transparent backgrounds

---

**Next:** Load `imagen.md` or `nano-banana.md` for detailed script usage.
117
skills/creative/google-image-generation/imagen.md
Normal file
@@ -0,0 +1,117 @@
# Imagen 4.0 — Initial Image Generation

**Use when:** Generating new images from text prompts (no existing image to work with).

## CLI Usage

```bash
cd /usr/local/lib/hermes-agent/plugins/image_gen/google-imagen
npx ts-node imagen.ts --prompt "YOUR PROMPT" --aspect-ratio 16:9 --output /path/to/output.png
```

## Parameters

| Flag | Required | Default | Description |
|------|----------|---------|-------------|
| `--prompt` | **Yes** | — | Text description of the image |
| `--aspect-ratio` | No | `1:1` | `1:1`, `16:9`, `9:16`, `4:3`, `3:2`, `21:9`, `1:4`, `4:1`, `1:8`, `8:1` |
| `--output` | No | `output.png` | Output file path |
| `--sample-count` | No | `1` | Generate multiple variants (2-10) |
| `--negative-prompt` | No | — | What to exclude from the image |
| `--style-reference` | No | — | Path to reference image for style transfer |
| `--person-generation` | No | `allow_adult` | `allow_adult` or `dont_allow` |
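
When the script is launched from another program, the flags in this table can be assembled as an argument array instead of a hand-quoted shell string. The sketch below assumes only the flag names and value ranges listed above; `ImagenCliOptions`, `buildImagenArgs`, and `pushImagenFlag` are hypothetical names, and the validation describes the table, not necessarily what `imagen.ts` itself enforces.

```typescript
// Hypothetical option bag mirroring the parameter table above.
interface ImagenCliOptions {
  prompt: string;
  aspectRatio?: string;
  output?: string;
  sampleCount?: number;
  negativePrompt?: string;
  styleReference?: string;
  personGeneration?: "allow_adult" | "dont_allow";
}

// Aspect ratios copied from the --aspect-ratio row of the table.
const IMAGEN_ASPECT_RATIOS: string[] = [
  "1:1", "16:9", "9:16", "4:3", "3:2", "21:9", "1:4", "4:1", "1:8", "8:1",
];

// Appends "flag value" only when a value was supplied.
function pushImagenFlag(args: string[], flag: string, value: string | undefined): void {
  if (value !== undefined) args.push(flag, value);
}

function buildImagenArgs(opts: ImagenCliOptions): string[] {
  if (opts.prompt.trim() === "") throw new Error("--prompt is required");
  if (opts.aspectRatio !== undefined && IMAGEN_ASPECT_RATIOS.indexOf(opts.aspectRatio) === -1) {
    throw new Error("unsupported aspect ratio: " + opts.aspectRatio);
  }
  const n = opts.sampleCount;
  // The table documents a default of 1 and a range of 2-10 for variants.
  if (n !== undefined && (Math.floor(n) !== n || n < 1 || n > 10)) {
    throw new Error("--sample-count must be an integer from 1 to 10");
  }
  const args: string[] = ["--prompt", opts.prompt];
  pushImagenFlag(args, "--aspect-ratio", opts.aspectRatio);
  pushImagenFlag(args, "--output", opts.output);
  pushImagenFlag(args, "--sample-count", n === undefined ? undefined : String(n));
  pushImagenFlag(args, "--negative-prompt", opts.negativePrompt);
  pushImagenFlag(args, "--style-reference", opts.styleReference);
  pushImagenFlag(args, "--person-generation", opts.personGeneration);
  return args;
}
```

Passing the resulting array to a process launcher as separate arguments avoids shell-quoting problems with prompts that contain quotes or commas.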

## Examples

### Basic Generation
```bash
npx ts-node imagen.ts \
  --prompt "A pixel art scene: a weathered tank on frozen tundra, grey sky, orange campfire glow" \
  --aspect-ratio 16:9 \
  --output /tmp/tank-scene.png
```

### Multiple Variants
```bash
npx ts-node imagen.ts \
  --prompt "A medieval castle on a cliff, dramatic lighting" \
  --sample-count 4 \
  --output /tmp/castle-variants.png
```

### With Negative Prompt
```bash
npx ts-node imagen.ts \
  --prompt "A serene forest clearing with sunlight filtering through trees" \
  --negative-prompt "people, animals, buildings, text, watermarks" \
  --output /tmp/forest.png
```

### Style Reference
```bash
npx ts-node imagen.ts \
  --prompt "A futuristic cityscape at dusk" \
  --style-reference /path/to/style-reference.png \
  --output /tmp/city.png
```

## Prompt Engineering

### Structure
1. **Medium/Style first:** `pixel art, 16-bit style`, `A photo of`, `A watercolor painting of`
2. **Subject:** What's in the image
3. **Setting/Context:** Where, when, atmosphere
4. **Lighting/Color:** Mood, palette, time of day
5. **Technical modifiers:** `no anti-aliasing`, `limited color palette`, `crisp edges`
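
The five-part ordering above can be captured in a small helper that always emits the parts in the same sequence. This is an illustrative sketch only; `ImagenPromptParts` and `composeImagenPrompt` are hypothetical names, and joining with commas is one possible convention, not something the script requires.

```typescript
// Hypothetical structure following the five-part order listed above.
interface ImagenPromptParts {
  medium: string;        // 1. medium/style, always first
  subject: string;       // 2. what is in the image
  setting?: string;      // 3. where, when, atmosphere
  lighting?: string;     // 4. mood, palette, time of day
  modifiers?: string[];  // 5. technical modifiers, always last
}

function composeImagenPrompt(parts: ImagenPromptParts): string {
  const pieces: string[] = [parts.medium, parts.subject];
  if (parts.setting !== undefined) pieces.push(parts.setting);
  if (parts.lighting !== undefined) pieces.push(parts.lighting);
  if (parts.modifiers !== undefined) {
    parts.modifiers.forEach(function (m) { pieces.push(m); });
  }
  // Drop empty pieces so optional parts never leave stray commas behind.
  return pieces
    .map(function (s) { return s.trim(); })
    .filter(function (s) { return s.length > 0; })
    .join(", ");
}
```

Each part can itself be a descriptive phrase, so the result stays closer to a narrative description than to a bare keyword list.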

### Style Recipes

**Pixel Art:**
```
pixel art, 16-bit style, [palette description], crisp clean edges,
no anti-aliasing, limited color palette, [era] video game aesthetic, sprite art
```

**Photography:**
```
A photo of [subject], [lens type], [lighting], [camera angle], [detail/focus], [mood]
```
Modifiers: `85mm portrait lens`, `golden hour`, `soft box lighting`, `macro lens`, `aerial shot`, `bokeh`

**Illustration:**
```
A [art style] of [subject] in the style of [artist/movement], [medium]
```
Styles: `pencil sketch`, `charcoal drawing`, `watercolor`, `digital art`, `isometric 3D`, `art deco poster`

### Gritty Dark Style (Iron Requiem)
```
dark gritty pixel art, weathered battle-damaged, rust and scorch marks,
dark moody lighting with deep shadows, mud-splattered, smoke rising,
muted desaturated colors, dirty browns and rusted greys and oil blacks,
grim war-torn atmosphere
```

## When to Switch to Nano Banana

After generating with Imagen, switch to Nano Banana (`nano-banana.md`) when you want to:
- Refine the image ("make it darker", "more contrast")
- Add or modify specific elements
- Add text/logos
- Apply a different style
- Iterate conversationally

## Troubleshooting

**Error: No candidates in response**
- Prompt may be too vague or violate content policy
- Try simplifying or rephrasing

**Error: GOOGLE_API_KEY not set**
- Run: `echo 'GOOGLE_API_KEY=your-key' >> ~/.hermes/.env`
- Restart Hermes gateway: `hermes gateway restart`

**Image quality issues**
- Increase detail in prompt
- Try different aspect ratio
- Use `--sample-count 4` and pick the best variant
160
skills/creative/google-image-generation/nano-banana.md
Normal file
@@ -0,0 +1,160 @@
# Nano Banana — Image Refinement & img2img

**Use when:** You have an existing image to refine, modify, or build upon.

**⚠️ NOT for initial generation.** Requires at least one reference image.

## CLI Usage

```bash
cd /usr/local/lib/hermes-agent/plugins/image_gen/nano-banana
npx ts-node nano-banana.ts --prompt "YOUR PROMPT" --refine /path/to/image.png --output /path/to/output.png
```

## Parameters

| Flag | Required | Default | Description |
|------|----------|---------|-------------|
| `--prompt` | **Yes** | — | What to change/add/do |
| `--refine` | **Yes*** | — | Path to reference image (repeatable, max 14) |
| `--output` | No | `output.png` | Output file path |
| `--aspect-ratio` | No | `1:1` | `1:1`, `16:9`, `9:16`, `4:3`, `3:2`, `21:9`, etc. |
| `--image-size` | No | `1K` | Resolution: `512`, `1K`, `2K`, `4K` |
| `--model` | No | `gemini-3.1-flash-image-preview` | Model to use |
| `--grounding` | No | — | `web`, `image`, or `both` (Google Search grounding) |
| `--thinking-level` | No | — | `minimal` or `high` |
| `--include-thoughts` | No | — | Show model reasoning steps |

*Required unless using only a text prompt (not recommended — always provide at least one reference)
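
As a rough sketch of how these flags combine, the helper below builds an argument array with one `--refine` pair per reference image and enforces the 14-image cap from the table. `NanoBananaCliOptions`, `buildNanoBananaArgs`, and `pushNanoFlag` are hypothetical names, and treating `--include-thoughts` as a value-less switch is an assumption, since the table does not say whether it takes a value.

```typescript
// Hypothetical option bag mirroring the parameter table above.
interface NanoBananaCliOptions {
  prompt: string;
  refine: string[]; // reference image paths, at most 14
  output?: string;
  aspectRatio?: string;
  imageSize?: "512" | "1K" | "2K" | "4K";
  model?: string;
  grounding?: "web" | "image" | "both";
  thinkingLevel?: "minimal" | "high";
  includeThoughts?: boolean;
}

const MAX_REFERENCE_IMAGES = 14;

// Appends "flag value" only when a value was supplied.
function pushNanoFlag(args: string[], flag: string, value: string | undefined): void {
  if (value !== undefined) args.push(flag, value);
}

function buildNanoBananaArgs(opts: NanoBananaCliOptions): string[] {
  if (opts.prompt.trim() === "") throw new Error("--prompt is required");
  if (opts.refine.length > MAX_REFERENCE_IMAGES) {
    throw new Error("at most " + MAX_REFERENCE_IMAGES + " reference images are supported");
  }
  const args: string[] = ["--prompt", opts.prompt];
  // --refine is repeated once per reference image.
  opts.refine.forEach(function (path) { args.push("--refine", path); });
  pushNanoFlag(args, "--output", opts.output);
  pushNanoFlag(args, "--aspect-ratio", opts.aspectRatio);
  pushNanoFlag(args, "--image-size", opts.imageSize);
  pushNanoFlag(args, "--model", opts.model);
  pushNanoFlag(args, "--grounding", opts.grounding);
  pushNanoFlag(args, "--thinking-level", opts.thinkingLevel);
  // Assumption: --include-thoughts is a boolean switch with no value.
  if (opts.includeThoughts === true) args.push("--include-thoughts");
  return args;
}
```

An empty `refine` array is allowed here because the footnote permits text-only prompts, even though providing at least one reference is recommended.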

## Models

| Model | Speed | Best For |
|-------|-------|----------|
| `gemini-3.1-flash-image-preview` | ~5-15s | Default — refinement, img2img, text-in-image |
| `gemini-3-pro-image-preview` | ~15-45s | Professional quality, complex text |
| `gemini-2.5-flash-image` | ~3-10s | High-volume, fastest |

## Examples

### Basic Refinement
```bash
npx ts-node nano-banana.ts \
  --prompt "Make the sky sunset orange, add more dramatic clouds" \
  --refine /tmp/base-image.png \
  --output /tmp/refined.png
```

### Multiple Reference Images
```bash
npx ts-node nano-banana.ts \
  --prompt "Combine these characters into a group scene, same art style" \
  --refine /tmp/char1.png \
  --refine /tmp/char2.png \
  --refine /tmp/char3.png \
  --output /tmp/group.png
```

### Style Transfer
```bash
npx ts-node nano-banana.ts \
  --prompt "Apply this watercolor style to my scene" \
  --refine /tmp/my-scene.png \
  --refine /tmp/watercolor-style.png \
  --output /tmp/styled.png
```

### Add Text/Logo
```bash
npx ts-node nano-banana.ts \
  --prompt "Add a wooden sign in the bottom corner that says 'TAVERN' in bold rustic letters" \
  --refine /tmp/building.png \
  --output /tmp/with-sign.png
```

### With Grounding (Real-time Data)
```bash
npx ts-node nano-banana.ts \
  --prompt "Create an image showing today's weather in New York" \
  --grounding web \
  --output /tmp/weather.png
```

### High-Quality Output
```bash
npx ts-node nano-banana.ts \
  --prompt "Refine this character portrait with professional quality" \
  --refine /tmp/draft.png \
  --model gemini-3-pro-image-preview \
  --image-size 4K \
  --output /tmp/final.png
```

## Core Workflows

### 1. Iterative Refinement
```
1. Generate base with Imagen
2. "Make it darker, more contrast" → Nano Banana
3. "Add a character in the foreground" → Nano Banana (use previous output as ref)
4. "Change the lighting to golden hour" → Nano Banana
```
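
The "use previous output as ref" rule in the steps above can be planned ahead of time. The sketch below only computes which file each step reads and writes; it does not call the CLI. `RefinementStep`, `planRefinementChain`, and the `step-N.png` naming scheme are assumptions made for illustration.

```typescript
// Hypothetical description of one Nano Banana invocation in the chain.
interface RefinementStep {
  prompt: string;
  refine: string; // image passed to --refine
  output: string; // image written by --output
}

function planRefinementChain(
  baseImage: string,
  prompts: string[],
  outputDir: string
): RefinementStep[] {
  const steps: RefinementStep[] = [];
  let current = baseImage; // the first step refines the Imagen base image
  prompts.forEach(function (prompt, index) {
    // Assumed naming scheme: step-1.png, step-2.png, ...
    const output = outputDir + "/step-" + (index + 1) + ".png";
    steps.push({ prompt: prompt, refine: current, output: output });
    current = output; // the next step refines what this step produced
  });
  return steps;
}
```

Each planned step maps onto one command of the form `npx ts-node nano-banana.ts --prompt ... --refine ... --output ...`, using the flags documented above.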

### 2. Character Consistency
Provide multiple reference images of the same character from different angles:
```bash
npx ts-node nano-banana.ts \
  --refine char-front.png --refine char-side.png --refine char-back.png \
  --prompt "Generate a new scene with this character from a 3/4 view"
```

### 3. Text-in-Image (Best Practices)
- Keep text under 25 characters
- Specify font: `bold sans-serif`, `elegant serif`, `handwritten script`
- Include context: `poster`, `logo`, `magazine cover`, `menu`
- Specify size: `small`, `medium`, `large`
- May need 2-3 iterations for clean text
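
These practices can be bundled into a prompt helper that refuses text at or over the 25-character guideline and always states font, context, and size. This is a sketch under stated assumptions: `textOverlayPrompt` and its sentence template are hypothetical, and the model does not require this exact wording.

```typescript
// "Under 25 characters" from the list above, so the length must be below 25.
const MAX_TEXT_LENGTH = 25;

type TextSize = "small" | "medium" | "large";

// Hypothetical helper: builds a text-in-image prompt from the four ingredients.
function textOverlayPrompt(
  text: string,
  font: string,     // e.g. "bold sans-serif"
  context: string,  // e.g. "poster", "logo", "menu"
  size: TextSize
): string {
  if (text.length >= MAX_TEXT_LENGTH) {
    throw new Error("keep text under " + MAX_TEXT_LENGTH + " characters, got " + text.length);
  }
  return "Add a " + context + " with " + size + " text that says '" + text +
    "' in a " + font + " font";
}
```

For instance, `textOverlayPrompt("TAVERN", "bold sans-serif", "menu", "large")` yields a prompt that can be passed to `--prompt` together with a `--refine` image.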

### 4. Google Search Grounding
- `--grounding web` — Current events, weather, news, real-time data
- `--grounding image` — Visual search results for composition
- `--grounding both` — Combined search

## Prompt Engineering for Refinement

### Be Specific About Changes
**Vague:** "Make it better"
**Specific:** "Increase saturation by 20%, add rim lighting on the subject, deepen the shadows"

### Preserve Context
**Bad:** "A tank" (loses all previous detail)
**Good:** "Keep the exact same tank and composition, but change the time of day to dusk with orange sky"

### Semantic Negatives
Describe what you **want**, not what you don't:
- ✅ "an empty deserted street"
- ❌ "no cars, no people"

## When to Switch to Imagen

Switch back to Imagen (`imagen.md`) when:
- Starting a completely new image
- Need precise aspect ratio from scratch
- Batch generation (multiple variants of same prompt)
- Nano Banana keeps losing important details

## Troubleshooting

**Text rendering is garbled**
- Try shorter text (under 15 chars)
- Specify font style explicitly
- Use `gemini-3-pro-image-preview` for complex text
- May need 2-3 iterations

**Image loses important details**
- Be more explicit: "Keep [X] exactly the same, only change [Y]"
- Provide multiple reference images
- Try `--thinking-level high`

**Error: No candidates in response**
- Ensure at least one valid `--refine` image is provided
- Check prompt doesn't violate content policy