--- name: nano_banana description: Generates or edits an image using the Gemini 2.5 Flash Image (Nano Banana) model via the API. license: Google Generative AI SDK License context: fork --- ## Skill Instructions Provide clear, step-by-step instructions for Claude to follow when this skill is invoked. Use specific, action-oriented language. **Goal:** To generate or edit an image based on a text prompt (and optionally, input images) by calling the Gemini API's dedicated image model. **Steps:** 1. **Analyze the Request:** Determine the user's need: - **Text-to-Image Generation:** Only a text prompt is provided. - **Image Editing/Fusion:** A text prompt and one or more input images (e.g., file paths, URLs, or Base64 data) are provided. 2. **Utilize Tools (if any):** - Use the **Generative AI SDK** (Python, Node.js, etc.) for API interaction. - Use file handling tools (e.g., `read` in a `bash` context, or file I/O libraries in code) to read local image files and convert them into API-compatible `Part` objects. 3. **Process Data:** - **API Configuration:** Ensure the `GEMINI_API_KEY` is loaded from the environment. The model name for the API call must be **`gemini-2.5-flash-image`**. - **Prepare `contents` List:** - **Generation:** `contents = [prompt_text]` - **Editing/Fusion:** `contents = [image_part_1, ..., image_part_n, prompt_text]`. Input images must be placed _before_ the text prompt. - **Execute Call:** Make the `generate_content()` API call with the specified model and contents. - **Decode Output:** Access the generated image data, which is returned as **Base64-encoded binary data** in the `inline_data` field of a `Part` object. Decode this data back into a binary image file (e.g., PNG or JPEG). 4. **Format Output:** Respond to the user with a confirmation message, indicating the successful generation or edit and confirming the location/availability of the resulting image file. If an error occurs (e.g., API key issue, safety violation), report the error clearly. ## Examples Include example inputs and the expected outputs to help Claude understand success. ### Example 1: Basic Input/Output (Text-to-Image) - **User Prompt:** "Generate a surreal image of a golden mechanical banana floating in space near a constellation." - **Expected Behavior:** Claude uses the prompt in the `contents` list, calls the `gemini-2.5-flash-image` model, decodes the Base64 response, and outputs a confirmation like: "Image generation successful. The surreal image has been saved to the working directory." ### Example 2: Edge Case (Image Editing with File Input) - **User Prompt:** "Please edit the image at 'input/photo.jpg' by changing the person's shirt to bright yellow." - **Expected Behavior:** Claude first converts 'input/photo.jpg' into a `Part` object. The API call is made with `contents=[image_part, "change the person's shirt to bright yellow"]`. The model performs the localized edit, and Claude saves the final image and outputs: "Image editing complete. The updated image with the yellow shirt has been saved." ## Best Practices & Constraints - Keep this skill focused on one specific workflow; do not try to make a "Swiss Army knife" skill. - Ensure all referenced files exist in the correct locations within the skill's directory. - Do not hardcode sensitive information like API keys or passwords. - Always specify the full model name: **`gemini-2.5-flash-image`**. --- ## Practical work through Before you can use the model, you need to complete a few setup steps: 1. **Get an API Key**: Obtain a Gemini API key from Google AI Studio. 2. **Install the SDK**: You'll need the appropriate Google Generative AI SDK for your programming language (e.g., `google-generativeai` for Python). 3. **Set Up Environment**: For security, store your API key in an environment variable, typically named `GEMINI_API_KEY`. For Python, you'll generally install it like this: ```bash pip install google-generativeai pillow ``` --- ## 💻 API Usage: Python Example You use the same `generate_content` call as with other Gemini models, but specify the image model and provide your prompt (and optionally, input images). The model name to use is **`gemini-2.5-flash-image`**. ### 1\. Text-to-Image Generation (Simple Prompt) This is a basic text-to-image request. ```python from google import genai from google.genai import types from PIL import Image from io import BytesIO # The client automatically picks up the GEMINI_API_KEY from your environment client = genai.Client() prompt = "A hyper-realistic image of a cat wearing a party hat, sitting on a banana-shaped sofa." response = client.models.generate_content( model="gemini-2.5-flash-image", # The Nano Banana model contents=[prompt], ) # Extract and save the generated image for part in response.candidates[0].content.parts: if part.inline_data is not None: # The image is returned as base64-encoded data image_data = part.inline_data.data image = Image.open(BytesIO(image_data)) image.save("generated_image.png") print("Image generated and saved as generated_image.png") ``` --- ### 2\. Image Editing (Image + Text-to-Image) To edit an existing image, you pass **both the image data and the text prompt** as the `contents`. 1. **Load the Image**: You need a function to convert your local image file into a format the API can accept. 2. **Call the API**: Send the image and your editing instruction. ```python from google import genai from google.genai import types from PIL import Image from io import BytesIO client = genai.Client() # Helper function to convert a local file to a Part object for the API def file_to_part(path: str, mime_type: str): """Converts a local file path to a GenerativePart object.""" return types.Part.from_uri(uri=path, mime_type=mime_type) # --- Example of editing a local image --- # NOTE: Replace 'path/to/your/image.png' with a real image file path # For this example to run, you must have an image at this path. image_part = file_to_part(path="path/to/your/image.png", mime_type="image/png") edit_prompt = "Change the background of this image to a vibrant, neon-lit cityscape at night." response = client.models.generate_content( model="gemini-2.5-flash-image", contents=[image_part, edit_prompt], # Pass both the image and the prompt ) # Extract and save the edited image (same logic as before) for part in response.candidates[0].content.parts: if part.inline_data is not None: image_data = part.inline_data.data edited_image = Image.open(BytesIO(image_data)) edited_image.save("edited_image.png") print("Image edited and saved as edited_image.png") ```