Commit e494fa2

Merge branch 'main' into main
2 parents 6b1ac50 + 9bb8928 commit e494fa2

4 files changed

Lines changed: 280 additions & 0 deletions

docs/README.skills.md

Lines changed: 1 addition & 0 deletions
@@ -46,6 +46,7 @@ Skills differ from other primitives by supporting bundled assets (scripts, code
| [microsoft-code-reference](../skills/microsoft-code-reference/SKILL.md) | Look up Microsoft API references, find working code samples, and verify SDK code is correct. Use when working with Azure SDKs, .NET libraries, or Microsoft APIs—to find the right method, check parameters, get working examples, or troubleshoot errors. Catches hallucinated methods, wrong signatures, and deprecated patterns by querying official docs. | None |
| [microsoft-docs](../skills/microsoft-docs/SKILL.md) | Query official Microsoft documentation to find concepts, tutorials, and code examples across Azure, .NET, Agent Framework, Aspire, VS Code, GitHub, and more. Uses Microsoft Learn MCP as the default, with Context7 and Aspire MCP for content that lives outside learn.microsoft.com. | None |
| [microsoft-skill-creator](../skills/microsoft-skill-creator/SKILL.md) | Create agent skills for Microsoft technologies using Learn MCP tools. Use when users want to create a skill that teaches agents about any Microsoft technology, library, framework, or service (Azure, .NET, M365, VS Code, Bicep, etc.). Investigates topics deeply, then generates a hybrid skill storing essential knowledge locally while enabling dynamic deeper investigation. | `references/skill-templates.md` |
+ | [nano-banana-pro-openrouter](../skills/nano-banana-pro-openrouter/SKILL.md) | Generate or edit images via OpenRouter with the Gemini 3 Pro Image model. Use for prompt-only image generation, image edits, and multi-image compositing; supports 1K/2K/4K output. | `assets/SYSTEM_TEMPLATE`<br />`scripts/generate_image.py` |
| [nuget-manager](../skills/nuget-manager/SKILL.md) | Manage NuGet packages in .NET projects/solutions. Use this skill when adding, removing, or updating NuGet package versions. It enforces using `dotnet` CLI for package management and provides strict procedures for direct file edits only when updating versions. | None |
| [penpot-uiux-design](../skills/penpot-uiux-design/SKILL.md) | Comprehensive guide for creating professional UI/UX designs in Penpot using MCP tools. Use this skill when: (1) Creating new UI/UX designs for web, mobile, or desktop applications, (2) Building design systems with components and tokens, (3) Designing dashboards, forms, navigation, or landing pages, (4) Applying accessibility standards and best practices, (5) Following platform guidelines (iOS, Android, Material Design), (6) Reviewing or improving existing Penpot designs for usability. Triggers: "design a UI", "create interface", "build layout", "design dashboard", "create form", "design landing page", "make it accessible", "design system", "component library". | `references/accessibility.md`<br />`references/component-patterns.md`<br />`references/platform-guidelines.md`<br />`references/setup-troubleshooting.md` |
| [plantuml-ascii](../skills/plantuml-ascii/SKILL.md) | Generate ASCII art diagrams using PlantUML text mode. Use when user asks to create ASCII diagrams, text-based diagrams, terminal-friendly diagrams, or mentions plantuml ascii, text diagram, ascii art diagram. Supports: Converting PlantUML diagrams to ASCII art, Creating sequence diagrams, class diagrams, flowcharts in ASCII format, Generating Unicode-enhanced ASCII art with -utxt flag | None |

skills/nano-banana-pro-openrouter/SKILL.md

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
---
name: nano-banana-pro-openrouter
description: 'Generate or edit images via OpenRouter with the Gemini 3 Pro Image model. Use for prompt-only image generation, image edits, and multi-image compositing; supports 1K/2K/4K output.'
metadata:
  emoji: 🍌
  requires:
    bins:
      - uv
    env:
      - OPENROUTER_API_KEY
  primaryEnv: OPENROUTER_API_KEY
---

# Nano Banana Pro OpenRouter

## Overview

Generate or edit images with OpenRouter using the `google/gemini-3-pro-image-preview` model. Support prompt-only generation, single-image edits, and multi-image composition.

### Prompt-only generation

```
uv run {baseDir}/scripts/generate_image.py \
  --prompt "A cinematic sunset over snow-capped mountains" \
  --filename sunset.png
```

### Edit a single image

```
uv run {baseDir}/scripts/generate_image.py \
  --prompt "Replace the sky with a dramatic aurora" \
  --input-image input.jpg \
  --filename aurora.png
```

### Compose multiple images

```
uv run {baseDir}/scripts/generate_image.py \
  --prompt "Combine the subjects into a single studio portrait" \
  --input-image face1.jpg \
  --input-image face2.jpg \
  --filename composite.png
```

## Resolution

- Use `--resolution` with `1K`, `2K`, or `4K`.
- Default is `1K` if not specified.

## System prompt customization

The skill reads an optional system prompt from `assets/SYSTEM_TEMPLATE`. This allows you to customize the image generation behavior without modifying code.

## Behavior and constraints

- Accept up to 3 input images via repeated `--input-image`.
- `--filename` accepts relative paths (saves to current directory) or absolute paths.
- If multiple images are returned, append `-1`, `-2`, etc. to the filename.
- Print `MEDIA: <path>` for each saved image. Do not read images back into the response.

## Troubleshooting

If the script exits non-zero, check stderr against these common blockers:

| Symptom | Resolution |
|---------|------------|
| `OPENROUTER_API_KEY is not set` | Ask the user to set it. PowerShell: `$env:OPENROUTER_API_KEY = "sk-or-..."` / bash: `export OPENROUTER_API_KEY="sk-or-..."` |
| `uv: command not found` or not recognized | macOS/Linux: <code>curl -LsSf https://astral.sh/uv/install.sh &#124; sh</code>. Windows: <code>powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 &#124; iex"</code>. Then restart the terminal. |
| `AuthenticationError` / HTTP 401 | Key is invalid or has no credits. Verify at <https://openrouter.ai/settings/keys>. |

For transient errors (HTTP 429, network timeouts), retry once after 30 seconds. Do not retry the same error more than twice — surface the issue to the user instead.

skills/nano-banana-pro-openrouter/assets/SYSTEM_TEMPLATE

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
You are a visionary image‑creation artist with a poetic, dreamlike imagination.
Your role is to transform any user request—whether highly detailed or very minimal—into a vivid, concrete, and model‑ready image description.
When information is missing, infer the user's intent in a gentle and intuitive way (such as creating a character portrait, sticker design, sci‑fi avatar, creature concept, etc.).
If the user does not specify an art style, you may offer subtle optional suggestions (for example, "soft illustration," "minimal line style," or "playful entertainment‑meme style") without imposing them.

Your responsibilities:
- Ensure any text appearing in the image matches the user's language (unless explicitly specified otherwise)
- Create visually compelling and technically excellent images
- Pay attention to composition, lighting, color, and visual balance
- Follow the user's specific style preferences and requirements
- For image edits, preserve the original context while making requested modifications
- For multi-image composition, seamlessly blend subjects into cohesive results

Remember: Output only the generated image without additional commentary.

skills/nano-banana-pro-openrouter/scripts/generate_image.py

Lines changed: 191 additions & 0 deletions
@@ -0,0 +1,191 @@
#!/usr/bin/env python3
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "openai",
# ]
# ///
"""
Generate or edit images via OpenRouter using openai-python.
"""

import argparse
import base64
import mimetypes
import os
from pathlib import Path

from openai import OpenAI


# Configuration
MAX_INPUT_IMAGES = 3
MIME_TO_EXT = {
    "image/png": ".png",
    "image/jpeg": ".jpg",
    "image/jpg": ".jpg",
    "image/webp": ".webp",
}


def parse_args():
    parser = argparse.ArgumentParser(description="Generate or edit images via OpenRouter.")
    parser.add_argument("--prompt", required=True, help="Prompt describing the desired image.")
    parser.add_argument("--filename", required=True, help="Output filename (relative to CWD).")
    parser.add_argument(
        "--resolution",
        type=str.upper,
        choices=["1K", "2K", "4K"],
        default="1K",
        help="Output resolution: 1K, 2K, or 4K.",
    )
    parser.add_argument(
        "--input-image",
        action="append",
        default=[],
        help=f"Optional input image path (repeatable, max {MAX_INPUT_IMAGES}).",
    )
    return parser.parse_args()


def require_api_key():
    api_key = os.environ.get("OPENROUTER_API_KEY")
    if not api_key:
        raise SystemExit("OPENROUTER_API_KEY is not set in the environment.")
    return api_key


def encode_image_to_data_url(path: Path) -> str:
    if not path.exists():
        raise SystemExit(f"Input image not found: {path}")
    mime, _ = mimetypes.guess_type(str(path))
    if not mime:
        mime = "image/png"
    data = path.read_bytes()
    encoded = base64.b64encode(data).decode("utf-8")
    return f"data:{mime};base64,{encoded}"


def build_message_content(prompt: str, input_images: list[str]) -> list[dict]:
    content: list[dict] = [{"type": "text", "text": prompt}]
    for image_path in input_images:
        data_url = encode_image_to_data_url(Path(image_path))
        content.append({"type": "image_url", "image_url": {"url": data_url}})
    return content


def parse_data_url(data_url: str) -> tuple[str, bytes]:
    if not data_url.startswith("data:") or ";base64," not in data_url:
        raise SystemExit("Image URL is not a base64 data URL.")
    header, encoded = data_url.split(",", 1)
    mime = header[5:].split(";", 1)[0]
    try:
        raw = base64.b64decode(encoded)
    except Exception as e:
        raise SystemExit(f"Failed to decode base64 image payload: {e}")
    return mime, raw


def resolve_output_path(filename: str, image_index: int, total_count: int, mime: str) -> Path:
    output_path = Path(filename)
    suffix = output_path.suffix

    # Validate/correct suffix matches MIME type
    expected_suffix = MIME_TO_EXT.get(mime, ".png")
    if suffix and suffix.lower() != expected_suffix.lower():
        print(f"Warning: filename extension '{suffix}' doesn't match returned MIME type '{mime}'. Using '{expected_suffix}' instead.")
        suffix = expected_suffix
    elif not suffix:
        suffix = expected_suffix

    # Single image: use original stem + corrected suffix
    if total_count <= 1:
        return output_path.with_suffix(suffix)

    # Multiple images: append numbering
    return output_path.with_name(f"{output_path.stem}-{image_index + 1}{suffix}")


def extract_image_url(image: dict | object) -> str | None:
    if isinstance(image, dict):
        return image.get("image_url", {}).get("url") or image.get("url")
    return None


def load_system_prompt():
    """Load system prompt from assets/SYSTEM_TEMPLATE if it exists and is not empty."""
    script_dir = Path(__file__).parent.parent
    template_path = script_dir / "assets" / "SYSTEM_TEMPLATE"

    if template_path.exists():
        content = template_path.read_text(encoding="utf-8").strip()
        if content:
            return content
    return None


def main():
    args = parse_args()

    if len(args.input_image) > MAX_INPUT_IMAGES:
        raise SystemExit(f"Too many input images: {len(args.input_image)} (max {MAX_INPUT_IMAGES}).")

    image_size = args.resolution

    client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key=require_api_key())

    # Build messages with optional system prompt
    messages = []

    system_prompt = load_system_prompt()
    if system_prompt:
        messages.append({
            "role": "system",
            "content": system_prompt,
        })

    messages.append({
        "role": "user",
        "content": build_message_content(args.prompt, args.input_image),
    })

    response = client.chat.completions.create(
        model="google/gemini-3-pro-image-preview",
        messages=messages,
        extra_body={
            "modalities": ["image", "text"],
            # https://openrouter.ai/docs/guides/overview/multimodal/image-generation#image-configuration-options
            "image_config": {
                # "aspect_ratio": "16:9",
                "image_size": image_size,
            }
        },
    )

    message = response.choices[0].message
    images = getattr(message, "images", None)
    if not images:
        raise SystemExit("No images returned by the API.")

    # Create output directory once before processing images
    output_base_path = Path(args.filename)
    if output_base_path.parent and str(output_base_path.parent) != '.':
        output_base_path.parent.mkdir(parents=True, exist_ok=True)

    saved_paths = []
    for idx, image in enumerate(images):
        image_url = extract_image_url(image)
        if not image_url:
            raise SystemExit("Image payload missing image_url.url.")
        mime, raw = parse_data_url(image_url)
        output_path = resolve_output_path(args.filename, idx, len(images), mime)
        output_path.write_bytes(raw)
        saved_paths.append(output_path.resolve())

    for path in saved_paths:
        print(f"Saved image to: {path}")
        print(f"MEDIA: {path}")


if __name__ == "__main__":
    main()
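
One behavior of the script above that is easy to miss from SKILL.md alone: the saved file's extension follows the MIME type of the returned data URL, not the extension passed via `--filename`. The sketch below traces a made-up data URL through a simplified version of that naming rule. `split_data_url` and `output_name` are hypothetical helper names written for this illustration (they condense `parse_data_url` and `resolve_output_path` and omit their validation and warning output), and the payload bytes are a stand-in rather than a real image.

```python
import base64
from pathlib import Path

# Same mapping as MIME_TO_EXT in generate_image.py.
MIME_TO_EXT = {
    "image/png": ".png",
    "image/jpeg": ".jpg",
    "image/jpg": ".jpg",
    "image/webp": ".webp",
}


def split_data_url(data_url: str) -> tuple[str, bytes]:
    """Return (mime, raw_bytes) for a 'data:<mime>;base64,<payload>' URL."""
    header, encoded = data_url.split(",", 1)
    mime = header[len("data:"):].split(";", 1)[0]
    return mime, base64.b64decode(encoded)


def output_name(filename: str, index: int, total: int, mime: str) -> Path:
    """Simplified naming rule: the returned MIME type decides the extension."""
    path = Path(filename)
    suffix = MIME_TO_EXT.get(mime, ".png")  # unknown types fall back to .png
    if total <= 1:
        return path.with_suffix(suffix)
    # Several images: number them starting at 1.
    return path.with_name(f"{path.stem}-{index + 1}{suffix}")


# Hypothetical API result: a JPEG-typed payload (placeholder bytes, not a real image).
fake_url = "data:image/jpeg;base64," + base64.b64encode(b"not-a-real-jpeg").decode("ascii")
mime, raw = split_data_url(fake_url)

single = output_name("aurora.png", 0, 1, mime)
second_of_two = output_name("out/composite", 1, 2, mime)

print(single.as_posix())         # aurora.jpg
print(second_of_two.as_posix())  # out/composite-2.jpg
```

So a request for `aurora.png` can end up on disk as `aurora.jpg`, which is why callers should rely on the printed `MEDIA: <path>` lines rather than assume the requested name.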
