
Gemini SDK Track Part 9: The Interactions API (Beta)

May 24, 2026 · Wasil Zafar · 45 min read

Master the Interactions API — the architectural shift from generateContent to stateful server-managed conversations, the steps timeline, previous_interaction_id chaining, polymorphic response_format, streaming events, and the May 2026 migration guide.

Table of Contents

  1. The Architectural Shift
  2. The Steps Timeline
  3. Multi-Turn Conversations
  4. Polymorphic response_format
  5. Streaming Interactions
  6. Migration from generateContent
What You’ll Learn: How the Interactions API moves conversation state from your client to the server. You’ll read the steps timeline that records what the model did (thinking, searching, generating output), chain turns with previous_interaction_id instead of resending history, request structured or multimodal output with the polymorphic response_format, consume typed streaming events, and migrate existing generateContent code.

1. The Architectural Shift

The Interactions API represents a fundamental paradigm change in how you build conversational AI with Gemini. Instead of managing conversation history client-side (appending messages to a contents array), the server maintains state — you simply reference previous interactions by ID.

Beta Status: The Interactions API is currently in beta. It is usable in production for many use cases, but some features may still change. Sending the Api-Revision: 2026-05-20 header pins your app to a specific API surface so later revisions don’t change its behavior unexpectedly.

Aspect | generateContent | Interactions API
State Management | Client-side (you manage the history array) | Server-side (referenced by ID)
Caching | Manual (explicit cache creation) | Automatic (implicit cache hits)
Multi-turn | Append to the contents array | Set previous_interaction_id
Thought Signatures | You must preserve the opaque bytes | Handled automatically
Response Structure | candidates[0].content.parts | steps[] timeline
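To make the State Management row concrete, here is a toy, in-memory model of what “server-side, referenced by ID” means. ToyInteractionStore and StoredInteraction are hypothetical names invented for this sketch, and the reply is a placeholder string rather than a model call. The point is only that the client sends one ID and the store rebuilds the history by following previous_interaction_id links.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StoredInteraction:
    id: str
    input_text: str
    output_text: str
    previous_interaction_id: Optional[str]


class ToyInteractionStore:
    """Hypothetical in-memory stand-in for server-side state; not the real service."""

    def __init__(self) -> None:
        self._items: Dict[str, StoredInteraction] = {}
        self._ids = itertools.count(1)

    def create(self, input_text: str, previous_interaction_id: Optional[str] = None) -> StoredInteraction:
        # The "server" rebuilds the full context from stored turns; the client sent only an ID.
        context = self.history(previous_interaction_id)
        new_id = f"int_{next(self._ids)}"
        # Placeholder reply: a real service would run the model over context + input_text.
        reply = f"reply to {input_text!r} (context: {len(context)} earlier turns)"
        item = StoredInteraction(new_id, input_text, reply, previous_interaction_id)
        self._items[new_id] = item
        return item

    def history(self, interaction_id: Optional[str]) -> List[StoredInteraction]:
        """Follow previous_interaction_id links back to the first turn, oldest first."""
        turns: List[StoredInteraction] = []
        while interaction_id is not None:
            if interaction_id not in self._items:
                raise KeyError(f"unknown interaction: {interaction_id}")
            item = self._items[interaction_id]
            turns.append(item)
            interaction_id = item.previous_interaction_id
        return turns[::-1]


store = ToyInteractionStore()
turn1 = store.create("I'm building a FastAPI app.")
turn2 = store.create("Add JWT validation.", previous_interaction_id=turn1.id)
print(turn2.output_text)
print([t.input_text for t in store.history(turn2.id)])
```

In this model, the client’s request stays the same size no matter how long the conversation gets; the trade-off is that the history only exists where the store keeps it.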

1.1 Your First Interaction

from google import genai

client = genai.Client()

# The simplest possible Interactions API call
interaction = client.interactions.create(
    model="gemini-3.5-flash",
    input="What are the three laws of thermodynamics?"
)

# Access the response text directly
print("Response:", interaction.output_text)

# The interaction has a unique ID for chaining
print(f"Interaction ID: {interaction.id}")
The same call with the JavaScript SDK:

import { GoogleGenAI } from "@google/genai";

const client = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// JavaScript equivalent
const interaction = await client.interactions.create({
    model: "gemini-3.5-flash",
    input: "What are the three laws of thermodynamics?"
});

console.log("Response:", interaction.outputText);
console.log("ID:", interaction.id);

2. The Steps Timeline

Unlike generateContent, which returns a flat candidates array, the Interactions API returns a rich steps timeline: a sequence of typed entries showing exactly what the model did (thinking, searching, generating output).

2.1 Step Types

Step Type | Description | Contains
user_input | The user’s input message | Text content
model_output | The model’s generated text | Text parts
thought | Internal reasoning (thinking tokens) | Thought text (may be redacted)
google_search_call | Model invoked Google Search | Search query
google_search_result | Search results returned | Retrieved content
file_search_call | Model invoked File Search | Search query, store names
file_search_result | File Search results | Retrieved chunks
function_call | Model called a function | Function name, arguments
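Because every step carries a type, questions like “did the model search?” become simple filters over the list. This sketch uses plain dicts shaped after the table. The query and chunks fields mirror the attributes read in the parsing example in 2.2, but the dict layout itself is an assumption; real SDK steps are objects. That keeps it runnable without an API key.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical timeline using the step types from the table. Real SDK steps are
# objects with attributes (step.type, step.text); plain dicts keep this runnable offline.
sample_steps: List[Dict] = [
    {"type": "user_input", "text": "What happened in tech news today?"},
    {"type": "thought", "text": None},  # thought text may be redacted
    {"type": "google_search_call", "query": "tech news today"},
    {"type": "google_search_result", "chunks": ["chunk-a", "chunk-b"]},
    {"type": "model_output", "text": "Here are today's top stories."},
]


def step_type_counts(steps: List[Dict]) -> Counter:
    """Count how many steps of each type the model produced."""
    return Counter(step["type"] for step in steps)


def tool_calls(steps: List[Dict]) -> List[str]:
    """List the tool-invocation step types (..._call) in the order they happened."""
    return [step["type"] for step in steps if step["type"].endswith("_call")]


def final_output_text(steps: List[Dict]) -> str:
    """Join the text of all model_output steps, treating missing text as empty."""
    return "".join(step.get("text") or "" for step in steps if step["type"] == "model_output")


print(step_type_counts(sample_steps))
print(tool_calls(sample_steps))
print(final_output_text(sample_steps))
```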

2.2 Parsing the Steps Timeline

from google import genai

client = genai.Client()

# Create an interaction that might use multiple step types
interaction = client.interactions.create(
    model="gemini-3.5-flash",
    input="What happened in tech news today?",
    tools=[{"type": "google_search"}]
)

# Iterate through the steps timeline
print("Steps Timeline:")
print("=" * 60)
for i, step in enumerate(interaction.steps):
    print(f"\nStep {i + 1}: {step.type}")

    if step.type == "user_input":
        print(f"  Input: {step.text}")

    elif step.type == "thought":
        # Thought text may be redacted, so fall back to a placeholder
        print(f"  Thinking: {(step.text or '[redacted]')[:100]}...")

    elif step.type == "google_search_call":
        print(f"  Query: {step.google_search_call.query}")

    elif step.type == "google_search_result":
        print(f"  Results: {len(step.google_search_result.chunks)} chunks")

    elif step.type == "model_output":
        print(f"  Output: {step.text[:200]}...")

print(f"\nFull response: {interaction.output_text[:300]}...")
UI Building: The steps timeline is designed for building rich UIs. Show a “Thinking...” indicator during thought steps, display search queries during google_search_call, and stream the final model_output — giving users full transparency into the model’s reasoning process.
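A minimal sketch of that UI mapping, assuming steps arrive as plain dicts with a type key plus the fields from the step-type table. The name key on function_call steps is an assumption. The status wording is arbitrary. The design point is that unknown step types return None instead of raising, so the UI keeps working if new step types appear while the API is in beta.

```python
from typing import Dict, List, Optional


def status_line(step: Dict) -> Optional[str]:
    """Map one timeline step to the text a chat UI might show, or None to show nothing."""
    kind = step["type"]
    if kind == "thought":
        return "Thinking..."
    if kind == "google_search_call":
        return f"Searching the web: {step.get('query', '')}"
    if kind == "file_search_call":
        return "Searching your files..."
    if kind in ("google_search_result", "file_search_result"):
        return "Reading results..."
    if kind == "function_call":
        return f"Calling {step.get('name', 'a tool')}..."
    if kind == "model_output":
        return step.get("text") or ""
    return None  # user_input is already on screen; ignore unknown types


def render(steps: List[Dict]) -> List[str]:
    """Produce the sequence of UI updates for a timeline, skipping steps with nothing to show."""
    return [line for line in map(status_line, steps) if line is not None]


steps = [
    {"type": "user_input", "text": "What happened in tech news today?"},
    {"type": "thought", "text": None},
    {"type": "google_search_call", "query": "tech news today"},
    {"type": "google_search_result", "chunks": []},
    {"type": "model_output", "text": "Here are today's top stories."},
]
for line in render(steps):
    print(line)
```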
Real-World Application

Real-Time Financial Advisory

A robo-advisor platform grounds Gemini with live market data, enabling answers like “Should I buy NVIDIA stock?” to include today’s price, recent earnings, analyst ratings, and market trends — not just general investment advice. Grounding reduced hallucinated financial data from 15% to under 1%.


3. Multi-Turn with previous_interaction_id

3.1 Chaining Conversations

The killer feature of the Interactions API: continuing conversations without manually managing history. Just pass the previous interaction’s ID:

from google import genai

client = genai.Client()

# Turn 1: Establish context
turn1 = client.interactions.create(
    model="gemini-3.5-flash",
    input="I'm building a Python web app with FastAPI. I need help with authentication."
)
print(f"Turn 1: {turn1.output_text[:200]}...")

# Turn 2: Continue the conversation — server remembers context
turn2 = client.interactions.create(
    model="gemini-3.5-flash",
    previous_interaction_id=turn1.id,
    input="Can you show me how to implement JWT token validation?"
)
print(f"\nTurn 2: {turn2.output_text[:200]}...")

# Turn 3: Reference earlier context without repeating it
turn3 = client.interactions.create(
    model="gemini-3.5-flash",
    previous_interaction_id=turn2.id,
    input="Now add refresh token rotation to that implementation."
)
print(f"\nTurn 3: {turn3.output_text[:200]}...")
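If you chain turns in several places, it helps to keep the latest ID in one object. ChainedConversation is a hypothetical helper, not part of the SDK. Its create function is injected, so the sketch runs offline with a fake. In real code you would pass client.interactions.create.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Any, Callable, Dict, List, Optional


@dataclass
class ChainedConversation:
    """Remembers the latest interaction ID so every turn chains onto the previous one.

    `create` is any callable accepting the keyword arguments the article passes to
    client.interactions.create (model, input, previous_interaction_id).
    """

    create: Callable[..., Any]
    model: str = "gemini-3.5-flash"
    last_id: Optional[str] = None
    ids: List[str] = field(default_factory=list)

    def send(self, text: str) -> Any:
        kwargs: Dict[str, Any] = {"model": self.model, "input": text}
        if self.last_id is not None:  # the first turn has nothing to chain onto
            kwargs["previous_interaction_id"] = self.last_id
        interaction = self.create(**kwargs)
        self.last_id = interaction.id
        self.ids.append(interaction.id)
        return interaction


# Offline stand-in for client.interactions.create that records its arguments.
calls: List[Dict[str, Any]] = []


def fake_create(**kwargs: Any) -> SimpleNamespace:
    calls.append(kwargs)
    return SimpleNamespace(id=f"int_{len(calls)}", output_text="(reply)")


chat = ChainedConversation(create=fake_create)
chat.send("I'm building a FastAPI app.")
chat.send("Show me JWT token validation.")
print(calls)
```

With the real SDK this would be ChainedConversation(create=client.interactions.create). The ids list doubles as a record of every turn in the conversation.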

3.2 Implicit Caching Benefits

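Automatic Cost Savings: When you chain interactions with previous_interaction_id, the server implicitly caches the conversation history. On later turns, the cached history is billed at a discounted rate instead of full price, while the new input is billed normally. You no longer pay full price to re-process the entire history on every turn. Depending on model pricing and how much of each request hits the cache, this can reduce input costs by 75% or more for long conversations.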
from google import genai

client = genai.Client()

# First interaction: all input tokens are billed at the full rate
interaction1 = client.interactions.create(
    model="gemini-3.5-flash",
    input="Explain the CAP theorem in distributed systems with examples."
)
print(f"Turn 1 usage: {interaction1.usage_metadata}")

# Second interaction: the history from turn 1 can be served from the implicit cache
interaction2 = client.interactions.create(
    model="gemini-3.5-flash",
    previous_interaction_id=interaction1.id,
    input="How does this apply to choosing between Redis and PostgreSQL?"
)
print(f"Turn 2: {interaction2.output_text[:200]}...")
print(f"Turn 2 usage: {interaction2.usage_metadata}")

# Cached history tokens are billed at a discounted rate; only the new question
# is billed at the full input rate. Compare the two usage reports to see cache hits.
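A back-of-the-envelope model shows why the savings grow with conversation length. The 0.75 discount below is a hypothetical number chosen for illustration, not a published rate; check current Gemini pricing for the real figure. The model also ignores output tokens and assumes every history token is a cache hit.

```python
from typing import List


def billed_input_units(turn_tokens: List[int], cached_discount: float = 0.0) -> float:
    """Total input cost, in full-price token units, across a chained conversation.

    turn_tokens[i] is how many tokens turn i adds to the context. Earlier history is
    billed at (1 - cached_discount) of the full rate; new tokens always pay the full
    rate. Assumes every history token is a cache hit, which real traffic won't always get.
    """
    total = 0.0
    history = 0
    for new_tokens in turn_tokens:
        total += history * (1 - cached_discount) + new_tokens
        history += new_tokens
    return total


turns = [500] * 10  # ten turns, each adding 500 tokens of context
no_cache = billed_input_units(turns)
with_cache = billed_input_units(turns, cached_discount=0.75)  # hypothetical discount
print(no_cache, with_cache, f"saving: {1 - with_cache / no_cache:.0%}")
```

With ten 500-token turns, billed input falls from 27,500 to 10,625 full-price units, a saving of about 61%. That is less than the discount itself because every turn’s new tokens pay the full rate; the saving approaches the discount as the history grows relative to each new turn.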

4. Polymorphic response_format

4.1 The New Format System

The Interactions API replaces the old response_mime_type string with a polymorphic response_format object. This supports multiple output modalities and structured schemas:

Format Type | Description | Use Case
{"type": "text"} | Plain text output (default) | General conversation
{"type": "json", "schema": {...}} | Structured JSON with schema enforcement | APIs, data extraction
{"type": "audio"} | Audio output | Voice assistants
{"type": "image"} | Image generation | Creative applications
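Because the format is polymorphic, both a single object and a list of objects are valid values. A small client-side normalizer can fold both shapes into a list and catch typos before a request is sent. normalize_response_format is a hypothetical helper. The set of types comes from the table, and requiring a schema for json is this sketch’s own, stricter rule.

```python
from typing import Dict, List, Union

# Format types listed in the response_format table.
SUPPORTED_FORMAT_TYPES = {"text", "json", "audio", "image"}


def normalize_response_format(response_format: Union[Dict, List[Dict]]) -> List[Dict]:
    """Accept one format object or a list of them and return a validated list."""
    formats = response_format if isinstance(response_format, list) else [response_format]
    if not formats:
        raise ValueError("response_format must contain at least one format")
    for fmt in formats:
        kind = fmt.get("type")
        if kind not in SUPPORTED_FORMAT_TYPES:
            raise ValueError(f"unsupported response_format type: {kind!r}")
        # Stricter than the table requires: insist on a schema for JSON output.
        if kind == "json" and "schema" not in fmt:
            raise ValueError("json response_format should include a 'schema'")
    return formats


print(normalize_response_format({"type": "text"}))
print(normalize_response_format([{"type": "text"}, {"type": "image"}]))
```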

4.2 Structured JSON Output

from google import genai

client = genai.Client()

# Define a JSON schema for structured output
recipe_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "prep_time_minutes": {"type": "integer"},
        "ingredients": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "item": {"type": "string"},
                    "quantity": {"type": "string"}
                },
                "required": ["item", "quantity"]
            }
        },
        "steps": {
            "type": "array",
            "items": {"type": "string"}
        },
        "difficulty": {"type": "string", "enum": ["easy", "medium", "hard"]}
    },
    "required": ["name", "prep_time_minutes", "ingredients", "steps", "difficulty"]
}

# Use response_format with the Interactions API
interaction = client.interactions.create(
    model="gemini-3.5-flash",
    input="Give me a recipe for chocolate lava cake.",
    config={
        "response_format": {
            "type": "json",
            "schema": recipe_schema
        }
    }
)

import json
recipe = json.loads(interaction.output_text)
print(f"Recipe: {recipe['name']}")
print(f"Difficulty: {recipe['difficulty']}")
print(f"Prep time: {recipe['prep_time_minutes']} minutes")
print(f"Ingredients: {len(recipe['ingredients'])}")
for step_num, step in enumerate(recipe['steps'], 1):
    print(f"  {step_num}. {step}")
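json.loads only checks that the reply is valid JSON, not that it matches recipe_schema. While the API is in beta, a cheap client-side check is a reasonable guard before trusting fields like prep_time_minutes. The sketch below covers only the keywords recipe_schema uses (type, properties, required, items, enum); for full JSON Schema validation, the third-party jsonschema package is the usual choice.

```python
import json
from typing import Any, Callable, Dict, List

# Type checks for the JSON Schema types used by recipe_schema.
TYPE_CHECKS: Dict[str, Callable[[Any], bool]] = {
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
}


def schema_errors(value: Any, schema: Dict, path: str = "$") -> List[str]:
    """Check value against a small JSON Schema subset and return readable problems."""
    expected = schema.get("type")
    if expected in TYPE_CHECKS and not TYPE_CHECKS[expected](value):
        return [f"{path}: expected {expected}"]
    errors: List[str] = []
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{path}: {value!r} not in {schema['enum']}")
    if expected == "object":
        errors += [f"{path}.{key}: missing" for key in schema.get("required", []) if key not in value]
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                errors += schema_errors(value[key], sub_schema, f"{path}.{key}")
    if expected == "array" and "items" in schema:
        for i, item in enumerate(value):
            errors += schema_errors(item, schema["items"], f"{path}[{i}]")
    return errors


# A trimmed-down version of recipe_schema, checked against a deliberately wrong reply.
recipe_subset_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "prep_time_minutes": {"type": "integer"},
        "steps": {"type": "array", "items": {"type": "string"}},
        "difficulty": {"type": "string", "enum": ["easy", "medium", "hard"]},
    },
    "required": ["name", "prep_time_minutes", "difficulty"],
}
raw_reply = '{"name": "Lava cake", "prep_time_minutes": "20", "difficulty": "easy"}'
print(schema_errors(json.loads(raw_reply), recipe_subset_schema))
```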
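Because response_format is polymorphic, it also accepts a list of formats, which requests several output modalities in one interaction (here text plus an image; this requires a model that supports image output):

from google import genai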

client = genai.Client()

# Request multiple output modalities (text + image)
interaction = client.interactions.create(
    model="gemini-3.5-flash",
    input="Describe and draw a simple flowchart for a login process.",
    config={
        "response_format": [
            {"type": "text"},
            {"type": "image"}
        ]
    }
)

# Access text and image parts separately
for step in interaction.steps:
    if step.type == "model_output":
        for part in step.parts:
            if part.text:
                print(f"Text: {part.text[:200]}...")
            elif part.inline_data:
                print(f"Image: {part.inline_data.mime_type}, {len(part.inline_data.data)} bytes")

5. Streaming Interactions

5.1 Event Types

Streaming with the Interactions API emits a sequence of typed events that map to the steps lifecycle:

Event | When Emitted | Contains
interaction.created | Start of response | Interaction ID, model
step.start | New step begins | Step type, index
step.delta | Incremental content | Text delta, partial data
step.stop | Step completed | Final step data
interaction.completed | Full response ready | Usage metadata, final ID
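Because the events mirror the step lifecycle, a consumer can rebuild the same steps timeline that a non-streaming call returns. The sketch below uses plain dicts with hypothetical field names (interaction_id, index, step_type, text) in place of the SDK’s event objects, which the example in 5.2 reads through attributes such as event.step.delta.text.

```python
from typing import Dict, List, Optional, Tuple


def rebuild_timeline(events: List[Dict]) -> Tuple[Optional[str], List[Dict], bool]:
    """Fold streaming lifecycle events back into (interaction_id, ordered steps, completed)."""
    interaction_id: Optional[str] = None
    steps: Dict[int, Dict] = {}
    completed = False
    for event in events:
        kind = event["type"]
        if kind == "interaction.created":
            interaction_id = event["interaction_id"]
        elif kind == "step.start":
            steps[event["index"]] = {"type": event["step_type"], "text": "", "done": False}
        elif kind == "step.delta":
            # Deltas are keyed by step index, so each chunk lands on the step it belongs to.
            steps[event["index"]]["text"] += event.get("text") or ""
        elif kind == "step.stop":
            steps[event["index"]]["done"] = True
        elif kind == "interaction.completed":
            completed = True
    return interaction_id, [steps[i] for i in sorted(steps)], completed


# Simulated event sequence following the event table.
events = [
    {"type": "interaction.created", "interaction_id": "int_7"},
    {"type": "step.start", "index": 0, "step_type": "thought"},
    {"type": "step.stop", "index": 0},
    {"type": "step.start", "index": 1, "step_type": "model_output"},
    {"type": "step.delta", "index": 1, "text": "Python uses reference "},
    {"type": "step.delta", "index": 1, "text": "counting plus a cycle collector."},
    {"type": "step.stop", "index": 1},
    {"type": "interaction.completed"},
]
print(rebuild_timeline(events))
```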

5.2 Processing Streaming Events

from google import genai

client = genai.Client()

# Stream an interaction
stream = client.interactions.create(
    model="gemini-3.5-flash",
    input="Write a detailed explanation of how garbage collection works in Python.",
    stream=True
)

# Process events as they arrive
full_text = ""
for event in stream:
    if event.type == "interaction.created":
        print(f"[Started] ID: {event.interaction.id}")

    elif event.type == "step.start":
        print(f"\n[Step {event.step.index}] Type: {event.step.type}")

    elif event.type == "step.delta":
        if event.step.delta.text:
            # Print text deltas in real-time
            print(event.step.delta.text, end="", flush=True)
            full_text += event.step.delta.text

    elif event.type == "step.stop":
        print("\n[Step complete]")

    elif event.type == "interaction.completed":
        print(f"\n\n[Done] Usage: {event.interaction.usage_metadata}")

print(f"\nFull response length: {len(full_text)} chars")
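Streaming also combines with previous_interaction_id: create the first turn normally, then stream the follow-up that chains onto it.

from google import genai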

client = genai.Client()

# Streaming multi-turn conversation
turn1 = client.interactions.create(
    model="gemini-3.5-flash",
    input="I need to design a message queue system."
)

# Stream the follow-up turn
print("Streaming Turn 2:")
print("-" * 40)
stream = client.interactions.create(
    model="gemini-3.5-flash",
    previous_interaction_id=turn1.id,
    input="Show me a Python implementation using Redis as the backend.",
    stream=True
)

for event in stream:
    if event.type == "step.delta" and event.step.delta.text:
        print(event.step.delta.text, end="", flush=True)

print("\n[Stream complete]")
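To chain another turn after a streamed one, you need the streamed interaction’s ID, which, per the event table, arrives in the interaction.created and interaction.completed events. consume_stream is a hypothetical helper that collects the text and that ID in one pass. It reads the same attribute paths as the streaming loop in 5.2, and the SimpleNamespace events stand in for a real stream.

```python
from types import SimpleNamespace
from typing import Callable, Iterable, Optional, Tuple


def consume_stream(
    stream: Iterable,
    on_text: Callable[[str], None] = lambda chunk: print(chunk, end="", flush=True),
) -> Tuple[str, Optional[str]]:
    """Forward text deltas to on_text; return (full_text, interaction_id).

    Attribute paths follow the article's streaming loop: event.type,
    event.interaction.id, and event.step.delta.text.
    """
    chunks = []
    interaction_id = None
    for event in stream:
        if event.type in ("interaction.created", "interaction.completed"):
            interaction_id = event.interaction.id
        elif event.type == "step.delta" and event.step.delta.text:
            chunks.append(event.step.delta.text)
            on_text(event.step.delta.text)
    return "".join(chunks), interaction_id


def delta(text: str) -> SimpleNamespace:
    """Build a fake step.delta event shaped like the article's events."""
    return SimpleNamespace(type="step.delta", step=SimpleNamespace(delta=SimpleNamespace(text=text)))


fake_stream = [
    SimpleNamespace(type="interaction.created", interaction=SimpleNamespace(id="int_42")),
    delta("Use Redis lists "),
    delta("as a simple queue."),
    SimpleNamespace(type="interaction.completed", interaction=SimpleNamespace(id="int_42")),
]
text, last_id = consume_stream(fake_stream)
print(f"\n[chain the next turn with previous_interaction_id={last_id!r}]")
```

With the real client, pass the returned ID as previous_interaction_id on the next client.interactions.create call.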

6. Migration from generateContent

6.1 The May 2026 Breaking Changes

API Revision Required: Starting May 2026, new features are only available via the Interactions API. The generateContent endpoint remains functional but frozen: no new capabilities will be added. Set the Api-Revision: 2026-05-20 header to opt into the latest surface.

generateContent (Old) | Interactions API (New) | Notes
contents (array) | input (string or parts) | Server manages history
generationConfig | config.generation_config | Nested under config
response_mime_type | response_format object | Polymorphic type system
candidates[0].content | steps[] timeline | Richer structure
systemInstruction | system_instruction | Same concept, snake_case
Manual history append | previous_interaction_id | Automatic state
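For codebases with many similar calls, the request-side rows of the table can be applied mechanically. to_interactions_kwargs is a hypothetical helper that works on a flat dict of old request fields. That is a simplification: in the real Python SDK, response_mime_type and response_schema live inside GenerateContentConfig. It raises on anything it has no mapping for, including system_instruction, rather than guessing where that field belongs.

```python
from typing import Any, Dict

KNOWN_OLD_KEYS = {"model", "contents", "generation_config", "response_mime_type", "response_schema"}


def to_interactions_kwargs(old: Dict[str, Any]) -> Dict[str, Any]:
    """Translate flat generateContent-style request fields using the mapping table.

    Only single-turn string `contents` are translated; a multi-turn history should be
    replaced by previous_interaction_id chaining, not converted.
    """
    unknown = set(old) - KNOWN_OLD_KEYS
    if unknown:
        raise ValueError(f"no mapping defined for: {sorted(unknown)}")
    if not isinstance(old.get("contents"), str):
        raise ValueError("translate single-turn string contents only; chain turns instead")
    new: Dict[str, Any] = {"model": old["model"], "input": old["contents"]}
    config: Dict[str, Any] = {}
    if "generation_config" in old:
        config["generation_config"] = old["generation_config"]  # nested under config
    mime = old.get("response_mime_type")
    if mime == "application/json":
        fmt: Dict[str, Any] = {"type": "json"}
        if "response_schema" in old:
            fmt["schema"] = old["response_schema"]
        config["response_format"] = fmt
    elif "response_schema" in old or mime not in (None, "text/plain"):
        raise ValueError(f"no mapping defined for response_mime_type={mime!r}")
    if config:
        new["config"] = config
    return new


old_request = {
    "model": "gemini-3.5-flash",
    "contents": "Extract all person names and locations.",
    "response_mime_type": "application/json",
    "response_schema": {"type": "object"},
}
print(to_interactions_kwargs(old_request))
```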

6.2 Migration Checklist

from google import genai

client = genai.Client()

# ===== OLD PATTERN (generateContent) =====
# history = []
# history.append({"role": "user", "parts": [{"text": "Hello"}]})
# response = client.models.generate_content(
#     model="gemini-3.5-flash",
#     contents=history
# )
# history.append({"role": "model", "parts": response.candidates[0].content.parts})

# ===== NEW PATTERN (Interactions API) =====
# No history management needed!
interaction1 = client.interactions.create(
    model="gemini-3.5-flash",
    input="Hello, I need help with my Python project."
)
print(f"Response: {interaction1.output_text}")

# Continue — just reference the previous ID
interaction2 = client.interactions.create(
    model="gemini-3.5-flash",
    previous_interaction_id=interaction1.id,
    input="It's a FastAPI app that needs WebSocket support."
)
print(f"Response: {interaction2.output_text}")
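The same approach applies to structured output: the old response_mime_type and response_schema pair becomes a single response_format object.

from google import genai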

client = genai.Client()

# Migration example: structured output
# OLD: response_mime_type string
# response = client.models.generate_content(
#     model="gemini-3.5-flash",
#     contents="Extract entities",
#     config=types.GenerateContentConfig(
#         response_mime_type="application/json",
#         response_schema=my_schema
#     )
# )

# NEW: response_format object
interaction = client.interactions.create(
    model="gemini-3.5-flash",
    input="Extract all person names and locations from: 'Alice went to Paris and met Bob in London.'",
    config={
        "response_format": {
            "type": "json",
            "schema": {
                "type": "object",
                "properties": {
                    "entities": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "name": {"type": "string"},
                                "type": {"type": "string", "enum": ["person", "location"]}
                            },
                            "required": ["name", "type"]
                        }
                    }
                },
                "required": ["entities"]
            }
        }
    }
)

import json
result = json.loads(interaction.output_text)
for entity in result["entities"]:
    print(f"  {entity['name']} ({entity['type']})")
Migration Checklist:
1. Replace client.models.generate_content() with client.interactions.create()
2. Replace contents array with input string/parts
3. Replace history management with previous_interaction_id chaining
4. Replace response_mime_type with response_format object
5. Update response parsing: response.text → interaction.output_text
6. Add Api-Revision: 2026-05-20 header for latest features
7. Update streaming: event-based instead of chunk-based
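Before working through the checklist, it can help to locate every place that still needs changing. The scanner below is a rough, regex-based sketch with hypothetical helper names. It flags generate_content calls, contents= arguments, response_mime_type, and candidates[0] parsing. It will miss calls written in unusual ways and can report false positives, so treat its output as a to-do list, not a guarantee.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Regexes for generateContent-era constructs named in the checklist and mapping table.
LEGACY_PATTERNS = {
    "generate_content call": re.compile(r"\.generate_content\s*\("),
    "contents= argument": re.compile(r"\bcontents\s*="),
    "response_mime_type": re.compile(r"\bresponse_mime_type\b"),
    "candidates[0] parsing": re.compile(r"\.candidates\s*\[\s*0\s*\]"),
}


def find_legacy_usage(source: str) -> List[Tuple[int, str]]:
    """Return (line number, label) pairs for lines that still use old-API patterns."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue  # skip commented-out code such as the OLD PATTERN examples
        for label, pattern in LEGACY_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


def scan_tree(root: str = ".") -> None:
    """Print path:line: label for every .py file under root."""
    for path in Path(root).rglob("*.py"):
        for lineno, label in find_legacy_usage(path.read_text(encoding="utf-8")):
            print(f"{path}:{lineno}: {label}")


sample = (
    "resp = client.models.generate_content(\n"
    '    model="gemini-3.5-flash",\n'
    '    contents="Hello",\n'
    ")\n"
    "print(resp.candidates[0].content.parts)\n"
    "# response_mime_type mentioned in a comment is ignored\n"
)
print(find_legacy_usage(sample))
```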
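Try It Yourself: Build a ‘fact-checked news summarizer’ on the Interactions API: (1) call client.interactions.create with the google_search tool to get today’s top news, (2) chain a follow-up turn with previous_interaction_id that summarizes each story, (3) verify key claims in further chained turns, inspecting the google_search_call and google_search_result steps to see which searches backed each answer, (4) return the daily briefing as JSON via response_format, with a confidence score and source links for each claim.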

Next in the Gemini SDK Track

In Part 10: Autonomous Agents & Antigravity SDK, we’ll build autonomous agents with the Antigravity SDK — managed remote Linux sandboxes, custom agents with inline environments and skills, the hook interception engine, and multimodal agent inputs.