1. The May 2026 Migration Checklist
The Gemini API has evolved significantly. If you have existing code using the older patterns, here is the systematic migration path to the current SDK:
The current API revision (Api-Revision: 2026-05-20) introduces renamed parameters and new response structures. Older code will continue to work temporarily but should be migrated for long-term stability.
1.1 Before/After Comparison
| Concept | Old Pattern | New Pattern (May 2026) |
|---|---|---|
| Import | import google.generativeai as genai | from google import genai |
| Client | genai.configure(api_key=...) | client = genai.Client() |
| Model | genai.GenerativeModel("gemini-pro") | client.models.generate_content(model=...) |
| Input | contents=[...] | contents="..." or input="..." |
| Config | generation_config={...} | config=types.GenerateContentConfig(...) |
| JSON mode | response_mime_type="application/json" | response_format=types.JsonSchema(...) |
| Multi-turn | chat = model.start_chat(history=...) | previous_interaction_id=... |
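To find code that still uses the left-hand column of the table, a small scanner can help. The sketch below is a hypothetical helper: `OLD_PATTERNS` and `find_old_patterns` are illustrative names, not part of any SDK, and the regular expressions are a rough line-by-line heuristic rather than a real parser.

```python
import re

# Hypothetical mapping: regex for an old pattern -> replacement suggested by the table.
OLD_PATTERNS = {
    r"import\s+google\.generativeai": "from google import genai",
    r"genai\.configure\(": "client = genai.Client()",
    r"genai\.GenerativeModel\(": "client.models.generate_content(model=...)",
    r"generation_config\s*=": "config=types.GenerateContentConfig(...)",
    r"\.start_chat\(": "previous_interaction_id=...",
}


def find_old_patterns(source: str) -> list[tuple[int, str, str]]:
    """Return (line_number, matched_text, suggestion) for each old-pattern hit."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, suggestion in OLD_PATTERNS.items():
            match = re.search(pattern, line)
            if match:
                hits.append((lineno, match.group(0), suggestion))
    return hits


# Sample legacy snippet to scan (never executed, only searched as text).
legacy = '''import google.generativeai as genai
genai.configure(api_key="YOUR_KEY")
model = genai.GenerativeModel("gemini-pro")
chat = model.start_chat(history=[])
'''

for lineno, matched, suggestion in find_old_patterns(legacy):
    print(f"line {lineno}: {matched!r} -> {suggestion}")
```

Running the same function over each file in a repository gives a quick worklist before the manual migration starts.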
```python
# ❌ OLD PATTERN (deprecated)
# import google.generativeai as genai
# genai.configure(api_key="YOUR_KEY")
# model = genai.GenerativeModel("gemini-pro")
# response = model.generate_content("Hello")
# print(response.text)

# ✅ NEW PATTERN (current SDK - May 2026)
from google import genai
from google.genai import types

client = genai.Client()  # Reads GEMINI_API_KEY from env

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Hello, world!",
    config=types.GenerateContentConfig(
        temperature=0.7,
        max_output_tokens=1024
    )
)
print(response.text)
```
```python
# ❌ OLD: Manual multi-turn with history array
# chat = model.start_chat(history=[])
# response1 = chat.send_message("My name is Alice")
# response2 = chat.send_message("What's my name?")

# ✅ NEW: Server-managed state via Interactions API
from google import genai

client = genai.Client()

# First turn
interaction1 = client.interactions.create(
    model="gemini-3.5-flash",
    input="My name is Alice."
)
print(f"Turn 1: {interaction1.output_text}")

# Second turn: the server remembers context automatically
interaction2 = client.interactions.create(
    model="gemini-3.5-flash",
    previous_interaction_id=interaction1.id,
    input="What's my name?"
)
print(f"Turn 2: {interaction2.output_text}")
# Output: "Your name is Alice."
```
2. GCP OAuth & Service Accounts
For enterprise deployments, move from API keys to OAuth service accounts. This provides fine-grained access control, audit logging, and integration with Google Cloud’s IAM system:
```python
from google import genai

# Option 1: Vertex AI with Application Default Credentials (ADC)
# Requires: gcloud auth application-default login
client = genai.Client(
    vertexai=True,
    project="my-gcp-project-id",
    location="us-central1"
)

# Now all requests go through Vertex AI with full IAM controls
response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Explain Kubernetes RBAC in one paragraph."
)
print(response.text)
```
```shell
# Set up Application Default Credentials
gcloud auth application-default login

# Or use a service account key (for CI/CD pipelines)
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"

# Verify authentication
gcloud auth application-default print-access-token
```
2.1 IAM Roles & Cloud Logging
```shell
# Grant Vertex AI User role to a service account
gcloud projects add-iam-policy-binding my-gcp-project-id \
  --member="serviceAccount:my-app@my-gcp-project-id.iam.gserviceaccount.com" \
  --role="roles/aiplatform.user"

# Enable Cloud Logging for audit trails
gcloud services enable logging.googleapis.com
gcloud services enable aiplatform.googleapis.com

# View Gemini API logs
gcloud logging read 'resource.type="aiplatform.googleapis.com/Endpoint"' \
  --limit=10 --format=json
```
From 99% to 99.95% Uptime
A SaaS company improved their Gemini-powered feature from 99% to 99.95% availability through: multi-region deployment with automatic failover, circuit breakers that fall back to cached responses, request queuing during rate limit periods, and a “degraded mode” that serves simpler responses when the primary model is unavailable.
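The circuit-breaker and degraded-mode ideas in this case study can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the company's implementation: `CircuitBreaker`, `call_with_fallback`, the thresholds, and the in-memory `cache` dict are all hypothetical, and `primary` stands in for any function that calls the model.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after `max_failures` consecutive errors; allows a trial call after `reset_after` seconds."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        if self.clock() - self.opened_at >= self.reset_after:
            # Cool-down elapsed: permit one trial call; a single failure reopens the breaker.
            self.opened_at = None
            self.failures = self.max_failures - 1
            return False
        return True

    def record_success(self) -> None:
        self.failures = 0

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()


def call_with_fallback(prompt: str, primary: Callable[[str], str],
                       breaker: CircuitBreaker, cache: dict) -> str:
    """Use the primary model unless the breaker is open; otherwise serve a cached or degraded reply."""
    if not breaker.is_open():
        try:
            answer = primary(prompt)
            breaker.record_success()
            cache[prompt] = answer  # Remember good answers for later fallback.
            return answer
        except Exception:
            breaker.record_failure()
    return cache.get(prompt, "Service is degraded; please try again shortly.")


def flaky_model(prompt: str) -> str:
    """Stand-in for a model call during an outage."""
    raise RuntimeError("simulated outage")


breaker = CircuitBreaker(max_failures=2, reset_after=60.0)
cache = {"hello": "Hi there! (cached)"}
print(call_with_fallback("hello", flaky_model, breaker, cache))
print(call_with_fallback("hello", flaky_model, breaker, cache))
print("breaker open:", breaker.is_open())
```

Once the breaker is open, the primary model is skipped entirely until the cool-down passes, which keeps a failing dependency from adding latency to every request.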
3. OpenAI Compatibility Layer
Gemini offers an OpenAI-compatible endpoint, enabling you to use the OpenAI SDK with Gemini models as a drop-in replacement. This is ideal for migrating existing OpenAI-based applications:
3.1 Supported Endpoints
```python
from openai import OpenAI

# Point the OpenAI SDK at Gemini's compatibility endpoint
client = OpenAI(
    api_key="YOUR_GEMINI_API_KEY",
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

# Use standard OpenAI SDK methods; they work with Gemini
response = client.chat.completions.create(
    model="gemini-3.5-flash",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What are the benefits of microservices?"}
    ],
    temperature=0.7,
    max_tokens=500
)
print(response.choices[0].message.content)
```
```python
from openai import OpenAI

# Embeddings via OpenAI compatibility
client = OpenAI(
    api_key="YOUR_GEMINI_API_KEY",
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)

# Generate embeddings using Gemini's embedding model
embedding_response = client.embeddings.create(
    model="text-embedding-004",
    input="The quick brown fox jumps over the lazy dog."
)
print(f"Embedding dimension: {len(embedding_response.data[0].embedding)}")
print(f"First 5 values: {embedding_response.data[0].embedding[:5]}")
```
The compatibility layer supports chat.completions, embeddings, and models.list. It does not support assistants, threads, files, or fine-tuning endpoints. For those features, use the native Gemini SDK (Interactions API, Files API, etc.).
4. LangChain & LangGraph Integration
LangChain provides a first-class Gemini integration via the langchain-google-genai package:
```shell
pip install langchain-google-genai langchain langgraph
```
```python
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import HumanMessage, SystemMessage

# Initialize Gemini as a LangChain ChatModel
llm = ChatGoogleGenerativeAI(
    model="gemini-3.5-flash",
    temperature=0.3
)

# Use in a standard LangChain chain
messages = [
    SystemMessage(content="You are a technical architect."),
    HumanMessage(content="Design a cache invalidation strategy for a CDN.")
]
response = llm.invoke(messages)
print(response.content)
```
4.1 LangGraph: Cyclic Agent Workflows
```python
import operator
from typing import Annotated, TypedDict

from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.graph import StateGraph, END


# Define state
class AgentState(TypedDict):
    messages: Annotated[list, operator.add]
    next_step: str


# Initialize Gemini
llm = ChatGoogleGenerativeAI(model="gemini-3.5-flash", temperature=0)


# Define nodes
def researcher(state: AgentState) -> AgentState:
    """Research node: gathers information."""
    response = llm.invoke(f"Research this topic: {state['messages'][-1]}")
    return {"messages": [f"[Research] {response.content}"], "next_step": "writer"}


def writer(state: AgentState) -> AgentState:
    """Writer node: synthesizes research into content."""
    response = llm.invoke(f"Write a summary based on: {state['messages'][-1]}")
    return {"messages": [f"[Draft] {response.content}"], "next_step": "end"}


# Build graph
workflow = StateGraph(AgentState)
workflow.add_node("researcher", researcher)
workflow.add_node("writer", writer)
workflow.set_entry_point("researcher")
workflow.add_edge("researcher", "writer")
workflow.add_edge("writer", END)

# Compile and run
app = workflow.compile()
result = app.invoke({"messages": ["Quantum computing applications in drug discovery"], "next_step": "researcher"})
print(result["messages"][-1])
```
5. CrewAI, LlamaIndex & Vercel AI SDK
5.1 CrewAI: Multi-Agent Teams
```python
from crewai import Agent, Task, Crew
from langchain_google_genai import ChatGoogleGenerativeAI

# Gemini as the LLM backbone for CrewAI agents
llm = ChatGoogleGenerativeAI(model="gemini-3.5-flash", temperature=0.4)

# Define specialized agents
researcher = Agent(
    role="Senior Research Analyst",
    goal="Find comprehensive market data on renewable energy trends",
    backstory="Expert at analyzing market reports and extracting key insights.",
    llm=llm,
    verbose=True
)
writer = Agent(
    role="Content Strategist",
    goal="Transform research into an executive brief",
    backstory="Skilled at distilling complex data into clear executive summaries.",
    llm=llm,
    verbose=True
)

# Define tasks
research_task = Task(
    description="Research the global renewable energy market for 2026. Focus on solar and wind.",
    agent=researcher,
    expected_output="A detailed research brief with statistics and trends."
)
writing_task = Task(
    description="Write a 1-page executive summary from the research findings.",
    agent=writer,
    expected_output="A polished executive brief suitable for C-suite."
)

# Assemble crew
crew = Crew(agents=[researcher, writer], tasks=[research_task, writing_task], verbose=True)
result = crew.kickoff()
print(result)
```
5.2 LlamaIndex: Multi-Modal RAG
```python
from llama_index.llms.gemini import Gemini
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Initialize Gemini LLM for LlamaIndex
llm = Gemini(model="models/gemini-3.5-flash", temperature=0.2)

# Load documents and create index
documents = SimpleDirectoryReader("./data/").load_data()
index = VectorStoreIndex.from_documents(documents)

# Query with Gemini
query_engine = index.as_query_engine(llm=llm)
response = query_engine.query("What are the key findings from the Q1 report?")
print(response)
```
5.3 Vercel AI SDK: Streaming in Next.js
```typescript
// app/api/chat/route.ts (Next.js App Router)
import { google } from "@ai-sdk/google";
import { streamText } from "ai";

// Typing `req` as the standard Request avoids an implicit-any error under strict mode.
export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: google("gemini-3.5-flash"),
    messages,
    system: "You are a helpful assistant for a SaaS product.",
  });

  return result.toDataStreamResponse();
}
```
```tsx
// components/Chat.tsx (React client component)
"use client";

import { useChat } from "ai/react";

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: "/api/chat"
  });

  return (
    <div>
      {messages.map(m => (
        <div key={m.id}>
          <strong>{m.role}:</strong> {m.content}
        </div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} placeholder="Ask anything..." />
        <button type="submit">Send</button>
      </form>
    </div>
  );
}
```
6. Production Deployment Checklist
Before going live, verify every item on this checklist:
- Rotate API keys every 90 days (or use OAuth service accounts)
- Restrict keys by IP address and referrer
- Never expose keys in client-side code (use ephemeral tokens for Live API)
- Enable VPC Service Controls for data residency
- Log all requests with token counts and latency metrics
- Set billing alerts at 50%, 80%, and 100% of budget
- Implement circuit breakers for cascading failure protection
- Configure fallback models (e.g., Flash Lite if Flash is unavailable)
- Retry with exponential backoff (max 5 retries, 32s cap)
- Use context caching for repeated prefixes (>1024 tokens)
- Use Flex inference for non-urgent workloads (50% savings)
- Use Batch API for bulk processing (50% savings)
- Set thinking_budget=0 for simple lookups
- Monitor usage_metadata on every response
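The retry item in the checklist (exponential backoff, at most 5 retries, delays capped at 32 seconds) might look like the sketch below. The helper names `backoff_delays` and `retry_with_backoff`, the 1-second base delay, and the use of full jitter are assumptions made for illustration. Catching bare `Exception` keeps the example short; production code would more likely retry only on rate-limit or transient server errors.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(max_retries: int = 5, base: float = 1.0, cap: float = 32.0) -> list[float]:
    """Upper bound on the wait before each retry: base * 2**attempt, capped at `cap` seconds."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]


def retry_with_backoff(fn: Callable[[], T], max_retries: int = 5, base: float = 1.0,
                       cap: float = 32.0, sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `fn`, retrying on any exception up to `max_retries` times."""
    delays = backoff_delays(max_retries, base, cap)
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # Retries exhausted: surface the last error.
            # Full jitter: wait a random time between 0 and the capped delay.
            sleep(random.uniform(0, delays[attempt]))
    raise AssertionError("unreachable")


attempts = {"n": 0}


def sometimes_fails() -> str:
    """Stand-in for a model call that fails twice, then succeeds."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("simulated rate limit")
    return "ok"


print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0, 16.0]
print(retry_with_backoff(sometimes_fails, sleep=lambda s: None))  # ok
```

With a 1-second base, five retries never reach the 32-second cap; the cap matters when the base delay or the retry count is raised.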
```python
import logging
import time

from google import genai
from google.genai import types

client = genai.Client()

# Production-ready request with full observability
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("gemini-prod")


def production_generate(prompt: str, model: str = "gemini-3.5-flash") -> str:
    """Production wrapper with logging, metrics, and error handling."""
    start = time.time()
    try:
        response = client.models.generate_content(
            model=model,
            contents=prompt,
            config=types.GenerateContentConfig(
                temperature=0.3,
                max_output_tokens=2048,
                thinking_config=types.ThinkingConfig(thinking_budget=-1)
            )
        )
        latency = time.time() - start
        usage = response.usage_metadata
        logger.info(
            f"Gemini call | model={model} | "
            f"input_tokens={usage.prompt_token_count} | "
            f"output_tokens={usage.candidates_token_count} | "
            f"cached_tokens={usage.cached_content_token_count} | "
            f"latency={latency:.2f}s"
        )
        return response.text
    except Exception as e:
        latency = time.time() - start
        logger.error(f"Gemini error | model={model} | error={e} | latency={latency:.2f}s")
        raise


# Usage
result = production_generate("What is the capital of Japan?")
print(result)
```
Gemini SDK Track Complete!
Congratulations! You’ve completed all 14 parts of the Gemini SDK Track — from platform setup and text generation through multimodal AI, function calling, agents, deep research, live streaming, and enterprise deployment. You now have production-ready knowledge of the full Google GenAI ecosystem. Return to the AI App Dev Series to explore the Anthropic SDK track or Foundation articles.