Refactor: split google-image-gen-prompting into google-imagen and nano-banana skills

2026-05-24 02:28:53 +00:00
parent c490570d6e
commit 550349be2d

@@ -1,345 +1,123 @@
---
name: google-image-gen-prompting
description: "Generate or refine images in any artstyle, and in multiple formats"
version: 3.1.0
name: nano-banana
description: "Image refinement, img2img, and text-in-image with Gemini Flash/Pro Image"
version: 1.0.0
author: Kay Kayyali + Hermes Agent
license: MIT
metadata:
hermes:
tags: [image-generation, google-imagen, nano-banana, prompt-engineering, pixel-art, two-plugin-workflow, google-genai-sdk]
tags: [image-refinement, img2img, nano-banana, gemini]
category: creative
---
# Google Image Generation Prompting Guide
# Nano Banana (Gemini Flash/Pro Image)
**Powered by:** Google GenAI SDK (`@google/genai`) — TypeScript CLI handles all API calls.
**Powered by:** Google GenAI SDK (`@google/genai`)
## Quick Setup & Verification
**Use for:** Image refinement, img2img, text-in-image, and conversational editing.
**⚠️ NOT for initial text-to-image generation.** Requires an existing image to refine.
## Quick Start
**1. Set API key** (if not already in `~/.hermes/.env`):
```bash
# Set API key
export GOOGLE_API_KEY="your-key-here"
# Or add to ~/.hermes/.env: GOOGLE_API_KEY=your-key-here
# Refine an existing image
hermes chat -q "Refine this image to be more vibrant" --attachment /path/to/image.png
```
**2. Verify setup** — run a smoke test:
```bash
# Using the image_generate tool (recommended):
hermes chat -q "Generate a test image: a red cube on white background"
## Models
# Or direct TypeScript CLI test:
cd /usr/local/lib/hermes-agent/plugins/image_gen/google-imagen
npx ts-node google-image-gen.ts --imagen --prompt "test" --output /tmp/test.png
```
**3. If the skill doesn't appear in `hermes skills list`**:
```bash
# Reload skills from disk:
hermes chat -q "/reload-skills"
# Or restart the gateway: hermes gateway restart
```
Best practices for generating high-quality images with Google Imagen 4
and Nano Banana (Gemini Flash/Pro Image) models. These models are accessed
via the `image_generate` tool configured with `image_gen.provider: google`.
## Two-Plugin Workflow
**⚠️ Critical:** `nano-banana` is **NOT** for initial text-to-image generation. It is an **image refinement** tool only.
**Use `google-imagen` for:** Initial text-to-image generation. This is your primary image gen plugin.
- Text prompts → images
- Best for: getting a base image from scratch
- Model: `imagen-4.0-generate-001`
- Supports: `--aspect-ratio` (1:1, 16:9, 9:16, 4:3, 3:2, etc.), `--sample-count`, `--negative-prompt`, `--style-reference`
**Use `nano-banana` for:** Image refinement, img2img, and text-in-image. **Requires an existing image to refine.**
- "Keep everything the same but change X"
- Style transfer from a reference image
- Adding text/logos to images
- Iterative conversational editing
- Models: `gemini-3.1-flash-image-preview` or `gemini-3-pro-image-preview`
- Supports: `--aspect-ratio`, `--image-size` (512, 1K, 2K, 4K), `--grounding`, `--thinking-level`, `--include-thoughts`
**Typical workflow:**
1. Generate base image with `google-imagen` at your desired aspect ratio
2. Switch to `nano-banana` for refinements (keeps same dimensions)
3. Switch back to `google-imagen` for new generations
**Config:**
```yaml
image_gen:
provider: google-imagen # or: nano-banana
```
**Switch providers:**
```bash
# For initial generation:
hermes config set image_gen.provider google-imagen
# For refinement:
hermes config set image_gen.provider nano-banana
# Or one-off via env:
GOOGLE_IMAGE_PROVIDER=nano-banana image_generate --prompt "refine this..."
```
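The routing rule above (no input image means `google-imagen`, an existing image means `nano-banana`) can be sketched as a small helper. This is a minimal illustration, not part of either plugin: `pick_provider` and `provider_switch_command` are hypothetical names, and only the `hermes config set image_gen.provider ...` command itself comes from this guide.

```python
from typing import List, Optional


def pick_provider(input_image: Optional[str]) -> str:
    """Hypothetical helper: choose a provider from the two-plugin rule.

    nano-banana needs an existing image to refine, so it is only
    selected when an input image path is supplied.
    """
    return "nano-banana" if input_image else "google-imagen"


def provider_switch_command(input_image: Optional[str]) -> List[str]:
    """Build the documented `hermes config set` command as an argv list."""
    return ["hermes", "config", "set", "image_gen.provider", pick_provider(input_image)]
```

Returning an argv list rather than a shell string keeps the command safe to pass to `subprocess.run` without quoting concerns.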
## Quick Reference
### google-imagen (Imagen 4.0)
| Model | Best For | Speed |
|-------|----------|-------|
| `imagen-4.0-generate-001` | Photorealism, high detail, text in images | ~5-15s |
### nano-banana (Gemini Flash/Pro Image)
| Model | Best For | Speed |
|-------|----------|-------|
| `gemini-3.1-flash-image-preview` | img2img, refinement, text-in-image | ~5-15s |
| `gemini-3-pro-image-preview` | Professional quality, complex text | ~15-45s |
| `gemini-2.5-flash-image` | Fastest, high-volume | ~3-10s |
## Core Principle: Describe, Don't List
## Supported Parameters
A narrative paragraph beats keyword soup every time. These models excel
at language understanding. Example:
- `--aspect-ratio`: `1:1`, `16:9`, `9:16`, `4:3`, `3:2`, `21:9`, etc.
- `--image-size`: `512`, `1K`, `2K`, `4K` (resolution control)
- `--grounding`: `web`, `image`, or `both` (Google Search grounding)
- `--thinking-level`: `minimal` or `high`
- `--include-thoughts`: Show model reasoning steps
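One way to catch a mistyped value before it reaches the API is to validate the flags locally. The sketch below assumes the flag names and values listed above are the full set; `build_nano_banana_args` is a hypothetical helper, and the real CLI may accept additional values (the aspect-ratio list ends in "etc.", so only its `W:H` shape is checked here).

```python
import re
from typing import List, Optional

# Allowed values copied from the parameter list; the real CLI may accept more.
IMAGE_SIZES = {"512", "1K", "2K", "4K"}
GROUNDING_MODES = {"web", "image", "both"}
THINKING_LEVELS = {"minimal", "high"}
ASPECT_RATIO_RE = re.compile(r"^\d+:\d+$")


def _check(flag: str, value: str, allowed: set) -> None:
    """Raise ValueError when value is not in the documented set for flag."""
    if value not in allowed:
        raise ValueError(f"{flag} must be one of {sorted(allowed)}, got {value!r}")


def build_nano_banana_args(
    prompt: str,
    aspect_ratio: Optional[str] = None,
    image_size: Optional[str] = None,
    grounding: Optional[str] = None,
    thinking_level: Optional[str] = None,
    include_thoughts: bool = False,
) -> List[str]:
    """Hypothetical helper: turn options into a flag list, rejecting unknown values."""
    args = ["--prompt", prompt]
    if aspect_ratio is not None:
        if not ASPECT_RATIO_RE.match(aspect_ratio):
            raise ValueError(f"--aspect-ratio must look like W:H, got {aspect_ratio!r}")
        args += ["--aspect-ratio", aspect_ratio]
    if image_size is not None:
        _check("--image-size", image_size, IMAGE_SIZES)
        args += ["--image-size", image_size]
    if grounding is not None:
        _check("--grounding", grounding, GROUNDING_MODES)
        args += ["--grounding", grounding]
    if thinking_level is not None:
        _check("--thinking-level", thinking_level, THINKING_LEVELS)
        args += ["--thinking-level", thinking_level]
    if include_thoughts:
        args.append("--include-thoughts")
    return args
```

For example, `build_nano_banana_args("Make the sky sunset orange", aspect_ratio="16:9", image_size="2K")` yields the prompt followed by the two flag/value pairs, while `image_size="8K"` raises a `ValueError`.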
**Bad:** `pixel art tank snow bleak cold`
## Core Workflows
**Good:** `A pixel art scene in 16-bit style: a weathered German Panzer IV
tank sits motionless on the frozen Russian tundra under a grey sky. Snow
drifts against its tracks. Orange glow from a dying campfire. Limited
color palette — dark greys, muted blues, pale whites, one point of warm
orange. No text.`
### 1. Image Refinement
"Keep everything the same but change X"
## Style Recipes
Examples:
- "Make the sky sunset orange"
- "Add more contrast and saturation"
- "Change the lighting to golden hour"
- "Remove the background clutter"
### Pixel Art
For Iron Requiem and other pixel art games:
### 2. img2img / Style Transfer
Provide reference image(s) + text prompt:
- "Apply this art style to my scene"
- "Make it look like a 1950s poster"
- "Convert to watercolor painting style"
```
pixel art, 16-bit style, [specific palette description], crisp clean
edges, no anti-aliasing, limited color palette, [era] video game
aesthetic, sprite art
```
Key modifiers:
- `no anti-aliasing` — keeps hard edges
- `limited color palette` — enforces pixel look
- `N-bit style` (8-bit, 16-bit, 32-bit) — era control
- `sprite art` — character/enemy focus
- `tile-based` — background emphasis
### Photography
```
A photo of [subject], [lens type], [lighting], [camera angle],
[detail/focus], [mood], [orientation]
```
Modifiers: `85mm portrait lens`, `golden hour`, `soft box lighting`,
`macro lens`, `aerial shot`, `fisheye`, `motion blur`, `bokeh`,
`black and white`, `polaroid`
### Illustration & Art
```
A [art style] of [subject] in the style of [artist/movement], [medium]
```
Styles: `pencil sketch`, `charcoal drawing`, `pastel painting`,
`watercolor`, `digital art`, `isometric 3D`, `art deco poster`,
`impressionist painting`, `renaissance painting`
### Product Mockups
```
A studio photograph of [product], [material], on [surface].
[Lighting setup]. [Camera angle]. [Background]
```
### Text in Images (Nano Banana)
### 3. Text-in-Image
Add logos, posters, signs, menus:
- Keep text under 25 characters
- Specify font style descriptively: `bold sans-serif`, `elegant serif`,
`handwritten script`
- Specify font: `bold sans-serif`, `elegant serif`, `handwritten script`
- Use keywords: `poster`, `logo`, `magazine cover`, `menu`
- Include font size: `small`, `medium`, `large`
- Iterate — text rendering may need 2-3 attempts
- Include size: `small`, `medium`, `large`
- May need 2-3 iterations
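The text-in-image guidelines above can be folded into a small prompt builder. This is a sketch under assumptions: `text_in_image_prompt` is a hypothetical helper, and the sentence template is illustrative wording rather than a phrasing the models are known to require. Only the 25-character limit, the size keywords, and the font and keyword examples come from the list.

```python
MAX_TEXT_CHARS = 25  # "Keep text under 25 characters"
FONT_SIZES = {"small", "medium", "large"}


def text_in_image_prompt(text: str, kind: str, font: str, size: str = "medium") -> str:
    """Hypothetical helper: build a text-in-image prompt from the guidelines.

    text: the literal string to render (must be under 25 characters)
    kind: a keyword such as "poster", "logo", "magazine cover", or "menu"
    font: a descriptive font style such as "bold sans-serif"
    size: one of "small", "medium", "large"
    """
    if len(text) >= MAX_TEXT_CHARS:
        raise ValueError(f"text has {len(text)} characters; keep it under {MAX_TEXT_CHARS}")
    if size not in FONT_SIZES:
        raise ValueError(f"size must be one of {sorted(FONT_SIZES)}, got {size!r}")
    # The template below is an assumed wording, not a documented format.
    return f'Turn this image into a {kind} with the text "{text}" in a {size} {font} font.'
```

Because text rendering may take two or three attempts, keeping the prompt construction in one place makes it easy to retry with the same wording.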
### 4. Conversational Editing
Iterate naturally:
1. "Generate a forest scene"
2. "Make it mistier"
3. "Add a stone altar in the center"
4. "Now place a glowing sword on the altar"
### 5. Multi-Reference Composition
Supply up to 14 reference images to keep characters consistent, compose scenes, or blend styles.
### 6. Google Search Grounding
Real-time data in images:
- `--grounding web` — Current events, weather, news
- `--grounding image` — Visual search results
- `--grounding both` — Combined search
## Aspect Ratios
Hermes `aspect_ratio` → Google format:
| Hermes | Google |
|--------|--------|
| `landscape` | `16:9` |
| `square` | `1:1` |
| `portrait` | `9:16` |
Nano Banana also supports: `4:3`, `3:4`, `21:9`, `1:4`, `4:1`, `1:8`, `8:1`
Also supports: `4:3`, `3:4`, `21:9`, `1:4`, `4:1`, `1:8`, `8:1`
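The mapping in the table can be expressed as a lookup that also passes explicit ratios through unchanged. This is a minimal sketch: `to_google_aspect_ratio` is a hypothetical name, and the set of extra ratios is copied from the line above rather than from the API itself.

```python
# Hermes keyword -> Google ratio, taken from the table above.
HERMES_TO_GOOGLE = {"landscape": "16:9", "square": "1:1", "portrait": "9:16"}

# Additional ratios listed as supported; the API may accept others.
EXTRA_RATIOS = {"4:3", "3:4", "21:9", "1:4", "4:1", "1:8", "8:1"}


def to_google_aspect_ratio(value: str) -> str:
    """Hypothetical helper: translate a Hermes aspect_ratio to Google format."""
    key = value.strip().lower()
    if key in HERMES_TO_GOOGLE:
        return HERMES_TO_GOOGLE[key]
    # Already a ratio: accept it only if it is one of the listed values.
    if key in HERMES_TO_GOOGLE.values() or key in EXTRA_RATIOS:
        return key
    raise ValueError(f"unsupported aspect ratio: {value!r}")
```

Rejecting unknown values locally gives a clearer error than waiting for the API to refuse the request.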
## Nano Banana Superpowers
## Prompt Engineering
Nano Banana (Gemini Flash/Pro Image) has capabilities Imagen lacks:
**Be hyper-specific:** "ornate elven plate armor etched with silver leaf patterns" beats "fantasy armor"
1. **Image refinement** — take an initial image and ask for changes:
"Keep everything the same but change the sky to sunset orange"
**Provide context:** "Create a logo for a high-end minimalist skincare brand" beats "Create a logo"
2. **img2img / style transfer** — provide reference image + text prompt
**Use semantic negatives:** Describe what you WANT, not what you don't: "an empty deserted street" not "no cars"
3. **Text in images** — logos, posters, menus, infographics
4. **Google Search grounding** — real-time data in images (weather, news, stocks)
5. **Multi-reference composition** — up to 14 images for character consistency
6. **Step-by-step instructions** — "First, create a misty forest. Then add
a stone altar. Finally, place a glowing sword on the altar."
## Prompt Engineering Tips
1. **Be hyper-specific** — "ornate elven plate armor etched with silver
leaf patterns" beats "fantasy armor"
2. **Provide context and intent** — "Create a logo for a high-end,
minimalist skincare brand" beats "Create a logo"
3. **Iterate conversationally** (Nano Banana) — "That's great, but can
you make the lighting warmer?"
4. **Use semantic negatives** — describe what you WANT, not what you
don't: "an empty deserted street" not "no cars"
5. **Control the camera:** `wide-angle shot`, `macro shot`,
`low-angle perspective`
6. **Max prompt length**: 480 tokens for Imagen
## Gritty Dark Style (Iron Requiem Art Direction)
Kay prefers a **dark, gritty, weathered** aesthetic over clean pixel art.
The tank should look battle-scarred — rust, scorch marks, mud, oil stains,
smoke. Lighting is moody with deep shadows. Color palette is desaturated:
dirty browns, rusted greys, oil blacks. This is a war machine that has
been fighting for weeks in frozen hell.
Key gritty prompt modifiers:
- `dark gritty pixel art`, `weathered battle-damaged`, `rust and scorch marks`
- `dark moody lighting with deep shadows`, `mud-splattered`, `smoke rising`
- `muted desaturated colors`, `dirty browns and rusted greys and oil blacks`
- `grim war-torn atmosphere`
Proven prompts that produced good results:
```
dark gritty pixel art, side view of a weathered battle-damaged tank,
rust and scorch marks on armor, dark moody lighting with deep shadows,
mud-splattered tracks, smoke rising from engine deck, grim war-torn
atmosphere, muted desaturated colors, dirty browns and rusted greys
and oil blacks, 2D game art style, no text
```
```
dark gritty pixel art, top-down view of a battle-scarred tank,
rusted armor plates, oil stains, deep shadows, mud and dirt texture
on hull, open commander hatch showing darkness inside, muted war-torn
color palette of rust browns, dirty greys, oil blacks, grim atmosphere,
2D game sprite, no text, no background
```
## Iron Requiem Pixel Art Prompts
Templates for the game designer:
```
Tank in tundra: pixel art, 16-bit, side view of a Panzer IV tank
half-buried in snow on the Russian tundra, grey overcast sky,
muzzle flash from the main gun, limited palette of steel greys,
ice blues, off-whites, and one point of orange fire, no text,
crisp edges, sprite art scale
Enemy Type 59: pixel art, 16-bit, isometric view of a Chinese
Type 59 tank advancing through snow, red star markings on turret,
platoon formation visible in background, cold war aesthetic,
limited palette adding olive green and red to the tundra tones,
bullet hell projectiles as orange dots, no text
Commander portrait: pixel art, 32-bit, portrait of a weary German
tank commander, late 30s, stubble, hollow eyes, looking through
a periscope, dim green glow from the optics, limited palette of
dark greys and muted greens, dialogue box ready, visual novel style,
no text
```
## Imagen API Response Field (Pitfall)
The Imagen REST API (`:predict` endpoint) returns base64 image data in
the field `bytesBase64Encoded`, **NOT** `imageBytes` or `image.imageBytes`.
This is different from the Imagen GenAI SDK (which wraps it in an `image`
object). When writing plugins or calling the REST API directly, use:
```python
# Correct:
b64_bytes = pred["bytesBase64Encoded"]
# WRONG (silently produces empty response):
image_obj = pred.get("image", {})
b64_bytes = image_obj.get("imageBytes", "")
```
**See `references/imagen-api-response-structure.md`** for the full response shape, common bug patterns, and verification commands.
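To avoid the silent empty result described above, a plugin can fail loudly when the expected field is missing. The sketch below assumes the REST response wraps its results in a top-level `predictions` list (the snippet above only shows a single `pred`); `extract_imagen_images` is a hypothetical helper name.

```python
import base64
from typing import Any, Dict, List


def extract_imagen_images(response: Dict[str, Any]) -> List[bytes]:
    """Hypothetical helper: decode every image in an Imagen REST response.

    Assumes a top-level "predictions" list whose items carry the base64
    payload in "bytesBase64Encoded", as described in the pitfall above.
    """
    images: List[bytes] = []
    for index, pred in enumerate(response.get("predictions", [])):
        b64_bytes = pred.get("bytesBase64Encoded")
        if not b64_bytes:
            # Fail loudly instead of returning an empty image.
            raise KeyError(
                f"prediction {index} has no 'bytesBase64Encoded' field; "
                f"keys present: {sorted(pred)}"
            )
        images.append(base64.b64decode(b64_bytes))
    if not images:
        raise ValueError("response contained no predictions")
    return images
```

Listing the keys that are present in the error message makes it obvious when a caller is handed an SDK-shaped object (with an `image` wrapper) instead of the raw REST shape.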
## TypeScript CLI Troubleshooting
The shared TypeScript CLI (`google-image-gen.ts`) handles all API calls.
Common issues:
**TypeScript compile errors** — if you see `error TS18046: 'error' is of type 'unknown'`:
- Add type assertions: `as any` for JSON results, and annotate or narrow the caught value in catch blocks (for example `catch (error: any)` or an `error instanceof Error` check)
- The script uses ES modules — ensure ts-node is installed
**Safety setting error** (`Error 400: Only block_low_and_above is supported`):
- The API requires `safetySetting: 'block_low_and_above'`
- Other values (`block_some`, `block_most`, etc.) are rejected
**Empty response with no error** — check that `GOOGLE_API_KEY` is passed to the TS script:
```bash
GOOGLE_API_KEY="${GOOGLE_API_KEY}" npx ts-node google-image-gen.ts ...
```
**⚠️ Pitfall: Use the SDK, not REST API directly** — The Google GenAI SDK
(`@google/genai`) handles all API transformation internally. Do NOT curl the
REST endpoint directly — it accepts different parameter formats than the SDK.
If you see `Invalid value at 'generation_config.response_format.image.aspect_ratio'`,
you're using REST when you should be using the SDK. The SDK example:
```typescript
import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({});
const response = await ai.models.generateContent({
model: 'gemini-3.1-flash-image-preview',
contents: [{ text: prompt }],
config: { responseFormat: { image: { aspectRatio: '16:9' } } },
});
```
See `references/imagen-api-quirks.md` for full API quirks and working examples.
**Control the camera:** `wide-angle shot`, `macro shot`, `low-angle perspective`
## Limitations
- English prompts only (plus select languages for Nano Banana)
- Maximum 480 tokens per prompt
- Person generation: `allow_adult` default, block children in EU/UK/CH/MENA
- English prompts only (plus select languages)
- Max 480 tokens per prompt
- Person generation: `allow_adult` default, blocks children in EU/UK/CH/MENA
- All images include SynthID watermark
- No transparent backgrounds
- Text in images works best after first generating the text then requesting
image rendering
- Imagen: no img2img, no conversational editing — use Nano Banana for that
- Text rendering may need 2-3 attempts
## Model Switching
## Switch to Google Imagen For
```bash
# Set provider for all image_generate calls:
hermes config set image_gen.provider google-imagen # or: nano-banana
- Initial text-to-image generation from scratch
- When you need precise aspect ratio control from the start
- High-volume batch generation (faster, simpler)
# Nano Banana model selection (if using nano-banana provider):
hermes config set image_gen.nano-banana.model gemini-3.1-flash-image-preview
# Or one-off via env:
GOOGLE_IMAGE_PROVIDER=nano-banana hermes -p game-designer ...
```
Use the `google-imagen` skill for initial generation workflows.