Add generate_image tool for Nanobanana Pro / Nanobanana 2 (#11)

danishi · claude · web-flow · commit 5659331ccece · 2026-03-26T00:27:33.000+09:00
* Add generate_image tool for Nanobanana Pro / Nanobanana 2 image generation

  Reference: https://github.com/danishi/slack-nano-banana-bot-on-google-cloud

  - Create app/tools/generate_image.py: ADK tool that calls Gemini image generation models via google-genai SDK
  - Supports gemini-3-pro-image-preview (Nanobanana Pro) for higher quality
  - Supports gemini-3.1-flash-image-preview (Nanobanana 2) for faster generation
  - Stores generated images in thread-safe dict for Slack upload
  - Update app/main.py: register tool, set session_id in state, upload generated images to Slack thread after agent response
  - Add google-genai>=1.56.0 dependency for image generation support
  - Add IMAGE_MODEL_NAME env var to .env.example

  https://claude.ai/code/session_01VycrggwZLpx8ZRV8AiWFpq

* Update README with image generation feature and files:write scope

  https://claude.ai/code/session_01VycrggwZLpx8ZRV8AiWFpq

* Fix: register generate_image directly on agent tools list

  SkillToolset only exposes skill-related tools (list_skills, load_skill, etc.). Move generate_image to the agent's tools list so it is directly available to the LLM.

  https://claude.ai/code/session_01VycrggwZLpx8ZRV8AiWFpq

* Fix: use contextvars instead of session state for image keying

  tool_context.state did not reliably reflect session.state, causing generated images to be stored under "unknown" key. Switch to contextvars.ContextVar which is set by the request handler and read by the tool within the same async context.

  https://claude.ai/code/session_01VycrggwZLpx8ZRV8AiWFpq

* Add IMAGE_MODEL_NAME env var to Cloud Run deploy script

  https://claude.ai/code/session_01VycrggwZLpx8ZRV8AiWFpq

* Fix: use content=bytes instead of file=BytesIO for files_upload_v2

  BytesIO may not report file length correctly in some slack_sdk versions, causing files.getUploadURLExternal to fail. Pass raw bytes via content parameter instead.

  https://claude.ai/code/session_01VycrggwZLpx8ZRV8AiWFpq

---------

Co-authored-by: Claude <noreply@anthropic.com>
diff --git a/.env.example b/.env.example
@@ -6,5 +6,6 @@ GOOGLE_CLOUD_PROJECT="your-gcp-project"
 GOOGLE_CLOUD_LOCATION="global"
 CLOUD_RUN_LOCATION="asia-northeast1"
 MODEL_NAME="gemini-3.1-pro-preview"
+IMAGE_MODEL_NAME="gemini-3.1-flash-image-preview"
 REACTION_PROCESSING="eyes"
 REACTION_COMPLETED="white_check_mark"
diff --git a/README.md b/README.md
@@ -13,6 +13,10 @@ If you want a simpler, lightweight Slack bot without the ADK framework, check ou
 ## Features
 - Responds to `@mention` messages in Slack channels.
 - Supports text, image, PDF, text file, video, and audio inputs from Slack messages. Files are fetched via authenticated URLs and sent to Gemini for multimodal understanding.
+- **Image generation** via `generate_image` tool using Gemini image generation models:
+  - `gemini-3-pro-image-preview` ([Nanobanana Pro](https://github.com/danishi/slack-nano-banana-bot-on-google-cloud)) — higher quality
+  - `gemini-3.1-flash-image-preview` ([Nanobanana 2](https://github.com/danishi/slack-nano-banana-bot-on-google-cloud)) — faster generation
+  - Generated images are automatically uploaded to the Slack thread.
 - Maintains conversation context by retrieving prior messages in a thread and sending them as conversation history to Gemini.
 - Formats responses using Slack-compatible Markdown for rich text output.
 - FastAPI-based web server suitable for Cloud Run.
@@ -25,6 +29,7 @@ app/
   agents/
     comedian.py     # ex: Comedian agent implementation
   tools/
+    generate_image.py        # ex: Image generation tool (Nanobanana Pro / Nanobanana 2)
     get_current_datetime.py  # ex: Date/time utility tool
   skills/
     greeting-skill/          # ex: Greeting skill (file-based ADK Skill)
@@ -90,6 +95,7 @@ The Agent Development Kit includes a built-in web-based Development UI that you
    - `im:history`
    - `mpim:history`
    - `files:read`
+   - `files:write`
    - `reactions:write`
    - `users:read`
 3. Install the app to your workspace to obtain `SLACK_BOT_TOKEN` and `SLACK_SIGNING_SECRET`.
diff --git a/app/main.py b/app/main.py
@@ -21,6 +21,7 @@
 
 from .agents.comedian import comedian_agent
 from .tools.get_current_datetime import get_current_datetime
+from .tools.generate_image import generate_image, get_and_clear_images, current_session_id
 
 # Environment variables
 load_dotenv()
@@ -171,6 +172,15 @@ async def _populate_session_from_thread(
 - If asked to summarize a thread, list each person's key points by name.
 - Do NOT include the `[Speaker: ...]` tag in your replies.
 
+### Image Generation
+- When the user asks you to create, draw, generate, or design an image, use the `generate_image` tool.
+- Available models:
+  - `gemini-3.1-flash-image-preview` (Nanobanana 2): Fast generation (default)
+  - `gemini-3-pro-image-preview` (Nanobanana Pro): Higher quality
+- If the user requests a specific model or quality level, set the `model` parameter accordingly.
+- Write a detailed, descriptive prompt for best results.
+- Generated images will be automatically uploaded to the Slack thread.
+
 ### Formatting Rules
 - **Headings / emphasis**: Use `*bold*` for section titles or important words.
 - *Italics*: Use `_underscores_` for emphasis when needed.
@@ -182,6 +192,7 @@ async def _populate_session_from_thread(
 Always structure your response clearly, using these rules so it renders correctly in Slack.""",
     tools=[
         skill_toolset,
+        generate_image,
     ],
     sub_agents=[
         comedian_agent,
@@ -237,6 +248,8 @@ async def handle_mention(body, say, client, logger, ack):
     session = await session_service.get_session(
         app_name=APP_NAME, user_id=user_id, session_id=thread_ts
     )
+    # Set context var so generate_image tool can key images by session
+    current_session_id.set(thread_ts)
     if session and not session.events:
         await _populate_session_from_thread(
             session=session,
@@ -267,6 +280,20 @@ async def handle_mention(body, say, client, logger, ack):
         thread_ts=thread_ts,
     )
 
+    # Upload any images generated by the generate_image tool
+    generated_images = get_and_clear_images(thread_ts)
+    for idx, image_bytes in enumerate(generated_images, start=1):
+        try:
+            await client.files_upload_v2(
+                channel=channel,
+                thread_ts=thread_ts,
+                filename=f"generated-image-{idx}.png",
+                title=f"Generated image {idx}",
+                content=image_bytes,
+            )
+        except Exception:
+            logger.exception("Failed to upload generated image %d", idx)
+
     # Add ✅ reaction to indicate the reply is complete
     try:
         await client.reactions_add(channel=channel, timestamp=message_ts, name=REACTION_COMPLETED)
diff --git a/app/tools/generate_image.py b/app/tools/generate_image.py
@@ -0,0 +1,96 @@
+import asyncio
+import contextvars
+import os
+import threading
+from typing import List
+
+from google import genai
+from google.adk.tools import ToolContext
+from google.genai.types import GenerateContentConfig, Modality
+
+# Default image generation model (Nanobanana 2)
+DEFAULT_IMAGE_MODEL = "gemini-3.1-flash-image-preview"
+
+# Thread-safe storage for generated images keyed by session_id
+_generated_images: dict[str, List[bytes]] = {}
+_images_lock = threading.Lock()
+
+# ContextVar set by the request handler before running the agent
+current_session_id: contextvars.ContextVar[str] = contextvars.ContextVar(
+    "current_session_id", default="unknown"
+)
+
+
+def get_and_clear_images(session_id: str) -> List[bytes]:
+    """Retrieve and remove generated images for a session."""
+    with _images_lock:
+        return _generated_images.pop(session_id, [])
+
+
+async def generate_image(prompt: str, tool_context: ToolContext, model: str = ""):
+    """Generates images using Gemini image generation models (Nanobanana Pro / Nanobanana 2).
+
+    Use this tool when the user asks you to create, draw, generate, or design an image.
+
+    Args:
+        prompt: A detailed description of the image to generate.
+        model: The model to use for image generation.
+               Use "gemini-3-pro-image-preview" (Nanobanana Pro) for higher quality.
+               Use "gemini-3.1-flash-image-preview" (Nanobanana 2) for faster generation.
+               Defaults to Nanobanana 2 if not specified.
+    """
+    image_model = model if model else os.environ.get(
+        "IMAGE_MODEL_NAME", DEFAULT_IMAGE_MODEL
+    )
+
+    project_id = os.environ.get("GOOGLE_CLOUD_PROJECT")
+    location = os.environ.get("GOOGLE_CLOUD_LOCATION", "global")
+
+    def call_gemini():
+        client = genai.Client(vertexai=True, project=project_id, location=location)
+        response = client.models.generate_content(
+            model=image_model,
+            contents=prompt,
+            config=GenerateContentConfig(
+                response_modalities=[Modality.TEXT, Modality.IMAGE],
+            ),
+        )
+        return response
+
+    try:
+        response = await asyncio.to_thread(call_gemini)
+    except Exception as e:
+        return {"error": f"Image generation failed: {e}"}
+
+    text_parts = []
+    images = []
+
+    candidates = getattr(response, "candidates", None)
+    if candidates:
+        for part in candidates[0].content.parts or []:
+            if getattr(part, "thought", None):
+                continue
+            if getattr(part, "text", None):
+                text_parts.append(part.text)
+                continue
+            inline = getattr(part, "inline_data", None)
+            if inline and getattr(inline, "data", None):
+                images.append(inline.data)
+
+    if not images:
+        return {
+            "status": "no_image_generated",
+            "text": "\n".join(text_parts) if text_parts else "No image was generated.",
+        }
+
+    # Store images for the main handler to upload to Slack
+    session_id = current_session_id.get()
+    with _images_lock:
+        _generated_images.setdefault(session_id, []).extend(images)
+
+    return {
+        "status": "success",
+        "model": image_model,
+        "image_count": len(images),
+        "text": "\n".join(text_parts) if text_parts else f"{len(images)} image(s) generated successfully.",
+    }
diff --git a/requirements.txt b/requirements.txt
@@ -5,5 +5,6 @@ uvicorn[standard]>=0.41.0
 google-adk>=1.25.1
 httpx>=0.28.1
 python-dotenv>=1.2.1
+google-genai>=1.56.0
 aiohttp>=3.13.3
 pytz>=2025.2
diff --git a/scripts/deploy.sh b/scripts/deploy.sh
@@ -65,7 +65,7 @@ SERVICE_URL=$(gcloud run deploy "${SERVICE_NAME}" \
   --allow-unauthenticated \
   --no-cpu-throttling  \
   --project "${PROJECT_ID}" \
-  --set-env-vars "SLACK_BOT_TOKEN=${SLACK_BOT_TOKEN},SLACK_SIGNING_SECRET=${SLACK_SIGNING_SECRET},GOOGLE_GENAI_USE_VERTEXAI=${GOOGLE_GENAI_USE_VERTEXAI},GOOGLE_CLOUD_PROJECT=${PROJECT_ID},GOOGLE_CLOUD_LOCATION=global,ALLOWED_SLACK_WORKSPACE=${ALLOWED_SLACK_WORKSPACE:-},MODEL_NAME=${MODEL_NAME:-gemini-3.1-pro-preview},REACTION_PROCESSING=${REACTION_PROCESSING:-},REACTION_COMPLETED=${REACTION_COMPLETED:-}" \
+  --set-env-vars "SLACK_BOT_TOKEN=${SLACK_BOT_TOKEN},SLACK_SIGNING_SECRET=${SLACK_SIGNING_SECRET},GOOGLE_GENAI_USE_VERTEXAI=${GOOGLE_GENAI_USE_VERTEXAI},GOOGLE_CLOUD_PROJECT=${PROJECT_ID},GOOGLE_CLOUD_LOCATION=global,ALLOWED_SLACK_WORKSPACE=${ALLOWED_SLACK_WORKSPACE:-},MODEL_NAME=${MODEL_NAME:-gemini-3.1-pro-preview},IMAGE_MODEL_NAME=${IMAGE_MODEL_NAME:-gemini-3.1-flash-image-preview},REACTION_PROCESSING=${REACTION_PROCESSING:-},REACTION_COMPLETED=${REACTION_COMPLETED:-}" \
   --format 'value(status.url)')
 
 echo "--------------------------------------------"