1. The Architectural Shift
The Interactions API changes how you build conversational AI with Gemini. Instead of managing conversation history client-side (appending messages to a contents array), you let the server maintain state and reference previous interactions by ID.
Setting the Api-Revision: 2026-05-20 header pins your app to a specific API surface for stability.
| Aspect | generateContent | Interactions API |
|---|---|---|
| State Management | Client-side (you manage history array) | Server-side (referenced by ID) |
| Caching | Manual (explicit cache creation) | Automatic (implicit cache hits) |
| Multi-turn | Append to contents array | Set previous_interaction_id |
| Thought Signatures | Must preserve opaque bytes | Handled automatically |
| Response Structure | candidates[0].content.parts | steps[] timeline |
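To make the state-management row concrete, here is a minimal sketch of the idea behind ID-based chaining. It is a toy in-memory model, not the real service: `ToyInteractionStore` and its methods are hypothetical names invented for illustration, and the real server presumably stores far more than the input text.

```python
import uuid


class ToyInteractionStore:
    """In-memory stand-in for server-side state (hypothetical, for illustration)."""

    def __init__(self):
        self._records = {}

    def create(self, input_text, previous_interaction_id=None):
        # Reject IDs the store has never issued.
        if previous_interaction_id is not None and previous_interaction_id not in self._records:
            raise KeyError(f"unknown interaction: {previous_interaction_id}")
        interaction_id = uuid.uuid4().hex
        self._records[interaction_id] = {
            "input": input_text,
            "previous": previous_interaction_id,
        }
        return interaction_id

    def history(self, interaction_id):
        """Walk the chain backwards, then return the inputs oldest-first."""
        inputs = []
        current = interaction_id
        while current is not None:
            record = self._records[current]
            inputs.append(record["input"])
            current = record["previous"]
        return list(reversed(inputs))


store = ToyInteractionStore()
first = store.create("I'm building a FastAPI app.")
second = store.create("Add JWT validation.", previous_interaction_id=first)
print(store.history(second))
```

The client only ever holds an ID; the full history is reconstructed on the storing side by following the `previous` links.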
1.1 Your First Interaction
from google import genai
client = genai.Client()
# The simplest possible Interactions API call
interaction = client.interactions.create(
    model="gemini-3.5-flash",
    input="What are the three laws of thermodynamics?"
)
# Access the response text directly
print("Response:", interaction.output_text)
# The interaction has a unique ID for chaining
print(f"Interaction ID: {interaction.id}")
import { GoogleGenAI } from "@google/genai";
const client = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
// JavaScript equivalent
const interaction = await client.interactions.create({
  model: "gemini-3.5-flash",
  input: "What are the three laws of thermodynamics?"
});
console.log("Response:", interaction.outputText);
console.log("ID:", interaction.id);
2. The Steps Timeline
Unlike generateContent which returns a flat candidates array, the Interactions API returns a rich steps timeline — a sequence of typed entries showing exactly what the model did: thinking, searching, generating output.
2.1 Step Types
| Step Type | Description | Contains |
|---|---|---|
| user_input | The user’s input message | Text content |
| model_output | The model’s generated text | Text parts |
| thought | Internal reasoning (thinking tokens) | Thought text (may be redacted) |
| google_search_call | Model invoked Google Search | Search query |
| google_search_result | Search results returned | Retrieved content |
| file_search_call | Model invoked File Search | Search query, store names |
| file_search_result | File Search results | Retrieved chunks |
| function_call | Model called a function | Function name, arguments |
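Because the timeline is a list of typed entries, a quick way to inspect one is to count entries per type and flag anything the table does not list. The sketch below assumes steps can be represented as plain dicts with a `type` key; that representation, the `summarize_steps` helper, and the `future_step_type` entry are all hypothetical stand-ins used only for illustration.

```python
from collections import Counter

# Step types taken from the table above.
KNOWN_STEP_TYPES = {
    "user_input", "model_output", "thought",
    "google_search_call", "google_search_result",
    "file_search_call", "file_search_result", "function_call",
}


def summarize_steps(steps):
    """Count steps per type and list any types that are not in the table."""
    counts = Counter(step["type"] for step in steps)
    unknown = sorted(t for t in counts if t not in KNOWN_STEP_TYPES)
    return dict(counts), unknown


# Hand-written sample timeline (not real API output).
sample_steps = [
    {"type": "user_input"},
    {"type": "thought"},
    {"type": "google_search_call"},
    {"type": "google_search_result"},
    {"type": "model_output"},
    {"type": "future_step_type"},  # made-up type to exercise the unknown path
]
counts, unknown = summarize_steps(sample_steps)
print(counts)
print("Unrecognized:", unknown)
```

Treating unknown types as data rather than errors keeps a parser working if new step types appear later.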
2.2 Parsing the Steps Timeline
from google import genai
client = genai.Client()
# Create an interaction that might use multiple step types
interaction = client.interactions.create(
    model="gemini-3.5-flash",
    input="What happened in tech news today?",
    tools=[{"type": "google_search"}]
)
# Iterate through the steps timeline
print("Steps Timeline:")
print("=" * 60)
for i, step in enumerate(interaction.steps):
    print(f"\nStep {i + 1}: {step.type}")
    if step.type == "user_input":
        print(f" Input: {step.text}")
    elif step.type == "thought":
        print(f" Thinking: {step.text[:100]}...")
    elif step.type == "google_search_call":
        print(f" Query: {step.google_search_call.query}")
    elif step.type == "google_search_result":
        print(f" Results: {len(step.google_search_result.chunks)} chunks")
    elif step.type == "model_output":
        print(f" Output: {step.text[:200]}...")
print(f"\nFull response: {interaction.output_text[:300]}...")
A client UI can render thought steps, display search queries during google_search_call, and stream the final model_output, giving users full transparency into the model’s reasoning process.
Real-Time Financial Advisory
A robo-advisor platform grounds Gemini with live market data, enabling answers like “Should I buy NVIDIA stock?” to include today’s price, recent earnings, analyst ratings, and market trends — not just general investment advice. Grounding reduced hallucinated financial data from 15% to under 1%.
3. Multi-Turn with previous_interaction_id
3.1 Chaining Conversations
The killer feature of the Interactions API: continuing conversations without manually managing history. Just pass the previous interaction’s ID:
from google import genai
client = genai.Client()
# Turn 1: Establish context
turn1 = client.interactions.create(
    model="gemini-3.5-flash",
    input="I'm building a Python web app with FastAPI. I need help with authentication."
)
print(f"Turn 1: {turn1.output_text[:200]}...")
# Turn 2: Continue the conversation — server remembers context
turn2 = client.interactions.create(
    model="gemini-3.5-flash",
    previous_interaction_id=turn1.id,
    input="Can you show me how to implement JWT token validation?"
)
print(f"\nTurn 2: {turn2.output_text[:200]}...")
# Turn 3: Reference earlier context without repeating it
turn3 = client.interactions.create(
    model="gemini-3.5-flash",
    previous_interaction_id=turn2.id,
    input="Now add refresh token rotation to that implementation."
)
print(f"\nTurn 3: {turn3.output_text[:200]}...")
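Passing the previous ID by hand on every turn gets repetitive, so one option is a small wrapper that remembers it. The sketch below is an illustration, not part of any SDK: `ChainedConversation` is a hypothetical helper, and `fake_create` is a stub that stands in for the create call so the example runs without network access or credentials.

```python
from itertools import count
from types import SimpleNamespace


class ChainedConversation:
    """Remembers the last interaction ID so callers only pass new input."""

    def __init__(self, create_fn, model):
        self._create = create_fn
        self._model = model
        self.last_id = None

    def send(self, text):
        kwargs = {"model": self._model, "input": text}
        # Only chain once there is a previous interaction to point at.
        if self.last_id is not None:
            kwargs["previous_interaction_id"] = self.last_id
        interaction = self._create(**kwargs)
        self.last_id = interaction.id
        return interaction


# Stub standing in for the real create call; it records what it was given.
_ids = count(1)
calls = []


def fake_create(**kwargs):
    calls.append(kwargs)
    return SimpleNamespace(id=f"fake-{next(_ids)}", output_text="(stub)")


chat = ChainedConversation(fake_create, model="gemini-3.5-flash")
chat.send("I need help with FastAPI authentication.")
chat.send("Show me JWT token validation.")
print(calls[1]["previous_interaction_id"])
```

In an application you would pass the create callable used in the examples above (for instance `client.interactions.create`) in place of `fake_create`.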
3.2 Implicit Caching Benefits
When you chain turns with previous_interaction_id, the server automatically caches the conversation history. Subsequent turns only process the new input, so you don’t pay to re-process the entire history on every turn. This can reduce costs by 75% or more for long conversations.
from google import genai
client = genai.Client()
# First interaction — full price for all tokens
interaction1 = client.interactions.create(
    model="gemini-3.5-flash",
    input="Explain the CAP theorem in distributed systems with examples."
)
print(f"Turn 1 output tokens: check usage_metadata")
# Second interaction — previous context is cached, only new input is charged
interaction2 = client.interactions.create(
    model="gemini-3.5-flash",
    previous_interaction_id=interaction1.id,
    input="How does this apply to choosing between Redis and PostgreSQL?"
)
print(f"Turn 2: {interaction2.output_text[:200]}...")
# The server implicitly cached interaction1's context
# You pay input tokens only for the new question, not the full history
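A rough back-of-the-envelope model shows why the savings grow with conversation length. The sketch below is arithmetic only, under stated assumptions: the token counts are invented, `history_rate` is a hypothetical knob for the fraction of full price charged on prior history (1.0 means history is re-processed at full price, 0.0 means it is free), and actual billing may follow different rules.

```python
def billable_input_tokens(turns, history_rate):
    """Sum billable input tokens over a conversation.

    turns: list of (new_input_tokens, output_tokens), one pair per turn.
    history_rate: fraction of full price charged for prior history.
    """
    total = 0.0
    history = 0
    for new_input, output in turns:
        total += new_input + history_rate * history
        # Both the question and the answer become history for later turns.
        history += new_input + output
    return total


# Ten turns, each with a 50-token question and a 400-token answer (made up).
turns = [(50, 400)] * 10
full_price = billable_input_tokens(turns, history_rate=1.0)
idealized = billable_input_tokens(turns, history_rate=0.0)
savings = 1 - idealized / full_price
print(f"Re-sent history: {full_price:.0f} tokens")
print(f"Cached history:  {idealized:.0f} tokens")
print(f"Savings: {savings:.1%}")
```

Without caching, the billable input grows roughly quadratically with the number of turns, because each turn pays again for everything before it. With caching it grows linearly.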
4. Polymorphic response_format
4.1 The New Format System
The Interactions API replaces the old response_mime_type string with a polymorphic response_format object. This supports multiple output modalities and structured schemas:
| Format Type | Description | Use Case |
|---|---|---|
| {"type": "text"} | Plain text output (default) | General conversation |
| {"type": "json", "schema": {...}} | Structured JSON with schema enforcement | APIs, data extraction |
| {"type": "audio"} | Audio output | Voice assistants |
| {"type": "image"} | Image generation | Creative applications |
4.2 Structured JSON Output
from google import genai
from google.genai import types
client = genai.Client()
# Define a JSON schema for structured output
recipe_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "prep_time_minutes": {"type": "integer"},
        "ingredients": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "item": {"type": "string"},
                    "quantity": {"type": "string"}
                },
                "required": ["item", "quantity"]
            }
        },
        "steps": {
            "type": "array",
            "items": {"type": "string"}
        },
        "difficulty": {"type": "string", "enum": ["easy", "medium", "hard"]}
    },
    "required": ["name", "prep_time_minutes", "ingredients", "steps", "difficulty"]
}
# Use response_format with the Interactions API
interaction = client.interactions.create(
    model="gemini-3.5-flash",
    input="Give me a recipe for chocolate lava cake.",
    config={
        "response_format": {
            "type": "json",
            "schema": recipe_schema
        }
    }
)
import json
recipe = json.loads(interaction.output_text)
print(f"Recipe: {recipe['name']}")
print(f"Difficulty: {recipe['difficulty']}")
print(f"Prep time: {recipe['prep_time_minutes']} minutes")
print(f"Ingredients: {len(recipe['ingredients'])}")
for step_num, step in enumerate(recipe['steps'], 1):
    print(f" {step_num}. {step}")
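Even with schema enforcement, it can be worth checking a payload before indexing into it, since a bare `json.loads` followed by key access raises on anything unexpected. The following is a minimal sketch of such a check for the recipe shape above. `check_recipe` is a hypothetical helper, the sample strings are hand-written rather than real model output, and it covers only a few of the schema's constraints.

```python
import json

REQUIRED_KEYS = {"name", "prep_time_minutes", "ingredients", "steps", "difficulty"}
DIFFICULTIES = {"easy", "medium", "hard"}


def check_recipe(raw_text):
    """Return a list of problems; an empty list means the payload looks usable."""
    try:
        recipe = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(recipe, dict):
        return ["top-level value is not an object"]
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(recipe))]
    if "difficulty" in recipe and recipe["difficulty"] not in DIFFICULTIES:
        problems.append(f"unexpected difficulty: {recipe['difficulty']!r}")
    if "prep_time_minutes" in recipe and not isinstance(recipe["prep_time_minutes"], int):
        problems.append("prep_time_minutes is not an integer")
    return problems


good = ('{"name": "Lava cake", "prep_time_minutes": 20, "ingredients": [], '
        '"steps": [], "difficulty": "medium"}')
bad = '{"name": "Lava cake", "difficulty": "extreme"}'
print(check_recipe(good))
print(check_recipe(bad))
```

For full validation against the schema, a dedicated JSON Schema validator would be a better fit than hand-written checks.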
from google import genai
client = genai.Client()
# Request multiple output modalities (text + image)
interaction = client.interactions.create(
    model="gemini-3.5-flash",
    input="Describe and draw a simple flowchart for a login process.",
    config={
        "response_format": [
            {"type": "text"},
            {"type": "image"}
        ]
    }
)
# Access text and image parts separately
for step in interaction.steps:
    if step.type == "model_output":
        for part in step.parts:
            if part.text:
                print(f"Text: {part.text[:200]}...")
            elif part.inline_data:
                print(f"Image: {part.inline_data.mime_type}, {len(part.inline_data.data)} bytes")
5. Streaming Interactions
5.1 Event Types
Streaming with the Interactions API emits a sequence of typed events that map to the steps lifecycle:
| Event | When Emitted | Contains |
|---|---|---|
| interaction.created | Start of response | Interaction ID, model |
| step.start | New step begins | Step type, index |
| step.delta | Incremental content | Text delta, partial data |
| step.stop | Step completed | Final step data |
| interaction.completed | Full response ready | Usage metadata, final ID |
5.2 Processing Streaming Events
from google import genai
client = genai.Client()
# Stream an interaction
stream = client.interactions.create(
    model="gemini-3.5-flash",
    input="Write a detailed explanation of how garbage collection works in Python.",
    stream=True
)
# Process events as they arrive
full_text = ""
for event in stream:
    if event.type == "interaction.created":
        print(f"[Started] ID: {event.interaction.id}")
    elif event.type == "step.start":
        print(f"\n[Step {event.step.index}] Type: {event.step.type}")
    elif event.type == "step.delta":
        if event.step.delta.text:
            # Print text deltas in real-time
            print(event.step.delta.text, end="", flush=True)
            full_text += event.step.delta.text
    elif event.type == "step.stop":
        print(f"\n[Step complete]")
    elif event.type == "interaction.completed":
        print(f"\n\n[Done] Total tokens: {event.interaction.usage_metadata}")
print(f"\nFull response length: {len(full_text)} chars")
from google import genai
client = genai.Client()
# Streaming multi-turn conversation
turn1 = client.interactions.create(
    model="gemini-3.5-flash",
    input="I need to design a message queue system."
)
# Stream the follow-up turn
print("Streaming Turn 2:")
print("-" * 40)
stream = client.interactions.create(
    model="gemini-3.5-flash",
    previous_interaction_id=turn1.id,
    input="Show me a Python implementation using Redis as the backend.",
    stream=True
)
for event in stream:
    if event.type == "step.delta" and event.step.delta.text:
        print(event.step.delta.text, end="", flush=True)
print("\n[Stream complete]")
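The start/delta/stop lifecycle can be exercised offline by replaying a hand-written event list through a small accumulator. This sketch assumes events are flat dicts with `type`, `index`, `step_type`, and `text` keys. That layout and the `collect_step_text` helper are hypothetical simplifications, so real events would need to be adapted before being fed in.

```python
def collect_step_text(events):
    """Group text deltas by step index.

    Returns {index: {"type": str, "text": str, "done": bool}}.
    """
    steps = {}
    current = None
    for event in events:
        kind = event["type"]
        if kind == "step.start":
            current = event["index"]
            steps[current] = {"type": event["step_type"], "text": "", "done": False}
        elif kind == "step.delta" and current is not None:
            steps[current]["text"] += event.get("text", "")
        elif kind == "step.stop" and current is not None:
            steps[current]["done"] = True
            current = None
    return steps


# Simulated stream (hand-written, not real API output).
events = [
    {"type": "interaction.created", "id": "demo"},
    {"type": "step.start", "index": 0, "step_type": "thought"},
    {"type": "step.delta", "text": "Plan the answer."},
    {"type": "step.stop"},
    {"type": "step.start", "index": 1, "step_type": "model_output"},
    {"type": "step.delta", "text": "Hello"},
    {"type": "step.delta", "text": " world"},
    {"type": "step.stop"},
    {"type": "interaction.completed"},
]
steps = collect_step_text(events)
for index, step in steps.items():
    print(index, step["type"], repr(step["text"]), step["done"])
```

Keeping text per step, rather than in one running string, lets a UI show reasoning and final output in separate panes.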
6. Migration from generateContent
6.1 The May 2026 Breaking Changes
The generateContent endpoint remains functional but frozen: no new capabilities will be added. Set the Api-Revision: 2026-05-20 header to opt into the latest surface.
| generateContent (Old) | Interactions API (New) | Notes |
|---|---|---|
| contents (array) | input (string or parts) | Server manages history |
| generationConfig | config.generation_config | Nested under config |
| response_mime_type | response_format object | Polymorphic type system |
| candidates[0].content | steps[] timeline | Richer structure |
| systemInstruction | system_instruction | Same concept, snake_case |
| Manual history append | previous_interaction_id | Automatic state |
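The field mapping in the table can be sketched as a plain dict-to-dict translation. The function below is an illustration under several assumptions that the table does not settle: requests are modelled as flat dicts, `system_instruction` is placed at the top level, and only the last entry of `contents` is carried over as `input` (earlier turns would be replaced by chaining). `migrate_request` is a hypothetical helper, not an SDK function.

```python
def migrate_request(old):
    """Translate a generateContent-style request dict using the mapping table."""
    new = {"model": old["model"]}

    contents = old.get("contents", [])
    if isinstance(contents, str):
        new["input"] = contents
    elif contents:
        # Keep only the latest turn; prior turns are assumed to be handled
        # by previous_interaction_id chaining instead.
        last_parts = contents[-1]["parts"]
        new["input"] = "".join(part.get("text", "") for part in last_parts)

    config = {}
    if "generationConfig" in old:
        config["generation_config"] = dict(old["generationConfig"])
    if old.get("response_mime_type") == "application/json":
        response_format = {"type": "json"}
        if "response_schema" in old:
            response_format["schema"] = old["response_schema"]
        config["response_format"] = response_format
    if config:
        new["config"] = config

    if "systemInstruction" in old:
        new["system_instruction"] = old["systemInstruction"]
    return new


old_request = {
    "model": "gemini-3.5-flash",
    "contents": [{"role": "user", "parts": [{"text": "Extract entities"}]}],
    "generationConfig": {"temperature": 0.2},
    "response_mime_type": "application/json",
    "systemInstruction": "Be terse.",
}
print(migrate_request(old_request))
```

A translation layer like this is mainly useful during a gradual migration, when some call sites still build old-style requests.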
6.2 Migration Checklist
from google import genai
from google.genai import types
client = genai.Client()
# ===== OLD PATTERN (generateContent) =====
# history = []
# history.append({"role": "user", "parts": [{"text": "Hello"}]})
# response = client.models.generate_content(
# model="gemini-3.5-flash",
# contents=history
# )
# history.append({"role": "model", "parts": response.candidates[0].content.parts})
# ===== NEW PATTERN (Interactions API) =====
# No history management needed!
interaction1 = client.interactions.create(
    model="gemini-3.5-flash",
    input="Hello, I need help with my Python project."
)
print(f"Response: {interaction1.output_text}")
# Continue — just reference the previous ID
interaction2 = client.interactions.create(
    model="gemini-3.5-flash",
    previous_interaction_id=interaction1.id,
    input="It's a FastAPI app that needs WebSocket support."
)
print(f"Response: {interaction2.output_text}")
from google import genai
from google.genai import types
client = genai.Client()
# Migration example: structured output
# OLD: response_mime_type string
# response = client.models.generate_content(
# model="gemini-3.5-flash",
# contents="Extract entities",
# config=types.GenerateContentConfig(
# response_mime_type="application/json",
# response_schema=my_schema
# )
# )
# NEW: response_format object
interaction = client.interactions.create(
    model="gemini-3.5-flash",
    input="Extract all person names and locations from: 'Alice went to Paris and met Bob in London.'",
    config={
        "response_format": {
            "type": "json",
            "schema": {
                "type": "object",
                "properties": {
                    "entities": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "name": {"type": "string"},
                                "type": {"type": "string", "enum": ["person", "location"]}
                            },
                            "required": ["name", "type"]
                        }
                    }
                },
                "required": ["entities"]
            }
        }
    }
)
import json
result = json.loads(interaction.output_text)
for entity in result["entities"]:
    print(f" {entity['name']} ({entity['type']})")
1. Replace client.models.generate_content() with client.interactions.create()
2. Replace contents array with input string/parts
3. Replace history management with previous_interaction_id chaining
4. Replace response_mime_type with response_format object
5. Update response parsing: response.text → interaction.output_text
6. Add Api-Revision: 2026-05-20 header for latest features
7. Update streaming: event-based instead of chunk-based
Next in the Gemini SDK Track
In Part 10: Autonomous Agents & Antigravity SDK, we’ll build autonomous agents with the Antigravity SDK — managed remote Linux sandboxes, custom agents with inline environments and skills, the hook interception engine, and multimodal agent inputs.