Refactor: split google-image-gen-prompting into google-imagen and nano-banana skills
@@ -1,345 +1,123 @@
---
name: nano-banana
description: "Image refinement, img2img, and text-in-image with Gemini Flash/Pro Image"
version: 1.0.0
author: Kay Kayyali + Hermes Agent
license: MIT
metadata:
  hermes:
    tags: [image-refinement, img2img, nano-banana, gemini]
    category: creative
---

# Nano Banana (Gemini Flash/Pro Image)

**Powered by:** Google GenAI SDK (`@google/genai`)

**Use for:** Image refinement, img2img, text-in-image, and conversational editing.

**⚠️ NOT for initial text-to-image generation.** Requires an existing image to refine.

## Quick Start

```bash
# Set API key
export GOOGLE_API_KEY="your-key-here"
# Or add to ~/.hermes/.env: GOOGLE_API_KEY=your-key-here

# Refine an existing image
hermes chat -q "Refine this image to be more vibrant" --attachment /path/to/image.png
```

## Models

| Model | Best For | Speed |
|-------|----------|-------|
| `gemini-3.1-flash-image-preview` | img2img, refinement, text-in-image | ~5-15s |
| `gemini-3-pro-image-preview` | Professional quality, complex text | ~15-45s |
| `gemini-2.5-flash-image` | Fastest, high-volume | ~3-10s |

## Supported Parameters

- `--aspect-ratio`: `1:1`, `16:9`, `9:16`, `4:3`, `3:2`, `21:9`, etc.
- `--image-size`: `512`, `1K`, `2K`, or `4K` (resolution control)
- `--grounding`: `web`, `image`, or `both` (Google Search grounding)
- `--thinking-level`: `minimal` or `high`
- `--include-thoughts`: include the model's reasoning steps in the output

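If another tool shells out to the CLI, a thin wrapper can reject malformed values before the API sees them. The following is a minimal TypeScript sketch. It assumes the flag names and values listed above are complete and that `--include-thoughts` is a bare switch that takes no value. `NanoBananaFlags` and `toCliArgs` are hypothetical names and are not part of the skill.

```typescript
// Hypothetical option shape; the allowed values mirror the documented flags.
interface NanoBananaFlags {
  aspectRatio?: string; // free-form "W:H", e.g. "16:9"
  imageSize?: "512" | "1K" | "2K" | "4K";
  grounding?: "web" | "image" | "both";
  thinkingLevel?: "minimal" | "high";
  includeThoughts?: boolean;
}

// Only the W:H shape is checked, not whether a given model accepts the ratio.
const ASPECT_RATIO_SHAPE = /^\d+:\d+$/;

// Build an argv array (e.g. for child_process.spawn) from validated options.
function toCliArgs(flags: NanoBananaFlags): string[] {
  const args: string[] = [];
  if (flags.aspectRatio !== undefined) {
    if (!ASPECT_RATIO_SHAPE.test(flags.aspectRatio)) {
      throw new Error(`aspect ratio must look like W:H, got "${flags.aspectRatio}"`);
    }
    args.push("--aspect-ratio", flags.aspectRatio);
  }
  if (flags.imageSize !== undefined) args.push("--image-size", flags.imageSize);
  if (flags.grounding !== undefined) args.push("--grounding", flags.grounding);
  if (flags.thinkingLevel !== undefined) args.push("--thinking-level", flags.thinkingLevel);
  // Assumed to be a bare switch that takes no value.
  if (flags.includeThoughts) args.push("--include-thoughts");
  return args;
}

console.log(toCliArgs({ aspectRatio: "16:9", imageSize: "2K", includeThoughts: true }).join(" "));
// --aspect-ratio 16:9 --image-size 2K --include-thoughts
```

The literal-union types catch invalid `--image-size`, `--grounding`, and `--thinking-level` values at compile time. The runtime check covers only the free-form aspect ratio.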
## Core Workflows

### 1. Image Refinement

"Keep everything the same but change X"

Examples:
- "Make the sky sunset orange"
- "Add more contrast and saturation"
- "Change the lighting to golden hour"
- "Remove the background clutter"

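At the API level, a refinement is a single `generateContent` request. Its contents hold the existing image as base64 inline data, followed by the edit instruction. The sketch below builds that request and extracts the returned image. It does not import `@google/genai`, so it runs offline. The `inlineData`/`text` part fields and the `candidates[0].content.parts` response path follow the Gemini API's documented shapes. The local type names and both helper functions are hypothetical.

```typescript
// Local mirrors of the Gemini API shapes used below (no SDK import needed).
interface InlineData { mimeType: string; data: string } // data is base64
interface Part { text?: string; inlineData?: InlineData }
interface GenerateResponseLike {
  candidates?: { content?: { parts?: Part[] } }[];
}

// Hypothetical helper: existing image first, then a "keep everything the same" edit.
function buildRefinementParts(imageBase64: string, mimeType: string, change: string): Part[] {
  return [
    { inlineData: { mimeType, data: imageBase64 } },
    { text: `Keep everything the same but ${change}.` },
  ];
}

// Hypothetical helper: return the first image part's base64 data, if there is one.
function firstImageBase64(response: GenerateResponseLike): string | undefined {
  const parts = response.candidates?.[0]?.content?.parts ?? [];
  for (const part of parts) {
    // Responses can mix text parts and image parts; skip the text.
    if (part.inlineData?.data) return part.inlineData.data;
  }
  return undefined;
}

// "BASE64_PNG_DATA" is a placeholder for real base64-encoded image bytes.
const request = buildRefinementParts("BASE64_PNG_DATA", "image/png", "make the sky sunset orange");
console.log(request[1].text); // Keep everything the same but make the sky sunset orange.
```

With the SDK, these parts would be passed as `contents` to `ai.models.generateContent({ model: "gemini-3.1-flash-image-preview", contents })`. The string returned by `firstImageBase64` would then be decoded from base64 and written to disk.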
### 2. img2img / Style Transfer

Provide reference image(s) plus a text prompt:
- "Apply this art style to my scene"
- "Make it look like a 1950s poster"
- "Convert to watercolor painting style"

### 3. Text-in-Image

Add logos, posters, signs, and menus:
- Keep text under 25 characters
- Specify the font style: `bold sans-serif`, `elegant serif`, `handwritten script`
- Use format keywords: `poster`, `logo`, `magazine cover`, `menu`
- Specify the text size: `small`, `medium`, `large`
- Expect 2-3 iterations before the text renders correctly

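The checklist above can be folded into a prompt builder that enforces the 25-character guideline. This is an illustrative sketch. `TextSpec` and `buildTextPrompt` are hypothetical, and the sentence template is one possible phrasing rather than a required format.

```typescript
type FontStyle = "bold sans-serif" | "elegant serif" | "handwritten script";
type TextSize = "small" | "medium" | "large";

// Hypothetical spec for one piece of rendered text.
interface TextSpec {
  text: string;   // exact words to render
  font: FontStyle;
  size: TextSize;
  format: string; // e.g. "poster", "logo", "magazine cover", "menu"
}

// Guideline from this skill: keep rendered text under 25 characters.
const MAX_TEXT_CHARS = 25;

function buildTextPrompt(spec: TextSpec): string {
  // .length counts UTF-16 code units, which equals the character count for plain ASCII.
  if (spec.text.length >= MAX_TEXT_CHARS) {
    throw new Error(`"${spec.text}" is ${spec.text.length} characters; keep it under ${MAX_TEXT_CHARS}`);
  }
  return `A ${spec.format} with the text "${spec.text}" in ${spec.size} ${spec.font} lettering.`;
}

console.log(buildTextPrompt({ text: "GRAND OPENING", font: "bold sans-serif", size: "large", format: "poster" }));
// A poster with the text "GRAND OPENING" in large bold sans-serif lettering.
```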
### 4. Conversational Editing

Iterate naturally:
1. Start from a forest scene generated with `google-imagen`
2. "Make it mistier"
3. "Add a stone altar in the center"
4. "Now place a glowing sword on the altar"

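Conversational editing amounts to resending a growing turn history, so that each instruction applies to the latest image. Below is a minimal sketch of that bookkeeping. It assumes the Gemini API's `{ role, parts }` content shape. `EditSession` is a hypothetical class. In practice, the SDK's chat helper (`ai.chats.create`) can keep this history for you.

```typescript
// Local mirrors of the Gemini API content shapes used below.
interface InlineData { mimeType: string; data: string } // data is base64
interface Part { text?: string; inlineData?: InlineData }
interface Content { role: "user" | "model"; parts: Part[] }

// Hypothetical helper that accumulates turns so each edit applies to the latest image.
class EditSession {
  private readonly history: Content[] = [];
  private baseImage: InlineData | undefined;

  // Nano Banana refines an existing image, so a session starts from one.
  constructor(baseImage: InlineData) {
    this.baseImage = baseImage;
  }

  // Append the next user instruction and return the full history to send.
  nextRequest(instruction: string): Content[] {
    const parts: Part[] = [];
    if (this.baseImage !== undefined) {
      // Only the first turn carries the starting image.
      parts.push({ inlineData: this.baseImage });
      this.baseImage = undefined;
    }
    parts.push({ text: instruction });
    this.history.push({ role: "user", parts });
    return this.history.slice(); // shallow copy, so later turns don't change it
  }

  // Record the model's reply (typically the edited image) for the next turn.
  recordReply(parts: Part[]): void {
    this.history.push({ role: "model", parts });
  }
}
```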
### 5. Multi-Reference Composition

Up to 14 reference images for character consistency, scene composition, or style blending.

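Here is a sketch of enforcing the 14-image ceiling when assembling a composition request. It also rejects an empty list, since this workflow is built on references. `buildCompositionParts` is hypothetical. Placing the reference images before the instruction text is an illustrative choice, not a documented requirement.

```typescript
interface InlineData { mimeType: string; data: string } // data is base64
interface Part { text?: string; inlineData?: InlineData }

// Ceiling stated in this skill for multi-reference composition.
const MAX_REFERENCE_IMAGES = 14;

// Hypothetical helper: reference images first, then the composition instruction.
function buildCompositionParts(references: InlineData[], instruction: string): Part[] {
  if (references.length === 0) {
    throw new Error("at least one reference image is required");
  }
  if (references.length > MAX_REFERENCE_IMAGES) {
    throw new Error(`got ${references.length} reference images; the limit is ${MAX_REFERENCE_IMAGES}`);
  }
  const parts: Part[] = references.map((image) => ({ inlineData: image }));
  parts.push({ text: instruction });
  return parts;
}
```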
### 6. Google Search Grounding

Pull real-time data into images:
- `--grounding web`: current events, weather, news
- `--grounding image`: visual search results
- `--grounding both`: combined search

## Aspect Ratios

Hermes `aspect_ratio` values map to Google ratios as follows:

| Hermes | Google |
|--------|--------|
| `landscape` | `16:9` |
| `square` | `1:1` |
| `portrait` | `9:16` |

Nano Banana also supports: `4:3`, `3:4`, `21:9`, `1:4`, `4:1`, `1:8`, `8:1`

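The mapping table, plus pass-through for explicit ratios, fits in a small lookup. This is a hypothetical sketch. `toGoogleAspectRatio` is an illustrative name. The allowed list is simply the union of the ratios named in this skill (Supported Parameters plus this section), not a list queried from the API.

```typescript
// Hermes presets from the mapping table.
const HERMES_TO_GOOGLE = new Map<string, string>([
  ["landscape", "16:9"],
  ["square", "1:1"],
  ["portrait", "9:16"],
]);

// Union of the ratios this skill names; the API may accept others.
const DIRECT_RATIOS: readonly string[] = [
  "1:1", "16:9", "9:16", "4:3", "3:4", "3:2", "21:9", "1:4", "4:1", "1:8", "8:1",
];

function toGoogleAspectRatio(value: string): string {
  const preset = HERMES_TO_GOOGLE.get(value);
  if (preset !== undefined) return preset;
  if (DIRECT_RATIOS.includes(value)) return value;
  throw new Error(`unsupported aspect ratio: ${value}`);
}

console.log(toGoogleAspectRatio("landscape")); // 16:9
console.log(toGoogleAspectRatio("21:9"));      // 21:9
```

A `Map` is used instead of a plain object so that keys such as `toString` cannot match inherited object properties.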
## Prompt Engineering

**Be hyper-specific:** "ornate elven plate armor etched with silver leaf patterns" beats "fantasy armor".

**Provide context:** "Create a logo for a high-end minimalist skincare brand" beats "Create a logo".

**Use semantic negatives:** Describe what you WANT, not what you don't: "an empty deserted street", not "no cars".

**Control the camera:** `wide-angle shot`, `macro shot`, `low-angle perspective`

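The semantic-negatives tip is easy to check mechanically. This rough sketch assumes a simple regular expression is good enough to spot phrasings such as "no cars" or "without people". `findNegations` is hypothetical. It only flags phrases for a human to rewrite (for example, turning "no cars" into "an empty deserted street"). It does not rewrite them.

```typescript
// Case-insensitive match for "no X", "without X", "do not include X", "don't include X".
const NEGATION_PATTERN = /\b(?:no|without|do not include|don't include)\s+[a-z][a-z-]*/gi;

function findNegations(prompt: string): string[] {
  // String.prototype.match with a /g regex returns every match, or null if none.
  return prompt.match(NEGATION_PATTERN) ?? [];
}

console.log(findNegations("A quiet city street at dawn, no cars, without people"));
```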
## Limitations

- Prompt language: English, plus select other languages
- Max 480 tokens per prompt
- Person generation defaults to `allow_adult`; generating children is blocked in EU/UK/CH/MENA
- All images include a SynthID watermark
- No transparent backgrounds
- Text rendering may need 2-3 attempts

## When to Switch to Google Imagen

- Initial text-to-image generation from scratch
- When you need precise aspect-ratio control from the start
- High-volume batch generation (faster, simpler)

Use the `google-imagen` skill for initial generation workflows.