Gemini SDK Track Part 9: The Interactions API (Beta)

                        
What You’ll Learn: The Interactions API moves conversation state from your client to the server. Instead of resending a growing contents array on every turn, you chain requests with previous_interaction_id. This part covers the steps timeline, multi-turn chaining and implicit caching, the polymorphic response_format, streaming events, and migration from generateContent.
                    

1. The Architectural Shift

The Interactions API changes where conversation state lives. Instead of managing history client-side by appending messages to a contents array, you let the server hold the state and simply reference previous interactions by ID.

                        
Beta Status: The Interactions API is currently in beta. It is usable in production for many use cases, but some features may still change. The Api-Revision: 2026-05-20 header pins your app to a specific API surface for stability.
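
If you call the REST endpoint directly rather than through the SDK, pinning the revision amounts to sending one extra header. The sketch below builds such a request with the standard library and does not send it. The endpoint URL and the JSON body shape are assumptions made for illustration; only the Api-Revision header name and value come from the callout above.

```python
import json
import urllib.request

# Assumption: URL and payload shape are illustrative, not confirmed by this article.
INTERACTIONS_URL = "https://generativelanguage.googleapis.com/v1beta/interactions"
API_REVISION = "2026-05-20"  # revision named in the Beta Status callout


def build_interaction_request(api_key: str, model: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request pinned to one API revision."""
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        INTERACTIONS_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "x-goog-api-key": api_key,
            "Api-Revision": API_REVISION,
        },
    )


request = build_interaction_request("YOUR_API_KEY", "gemini-3.5-flash", "Hello")
print(request.get_method(), request.full_url)
```

Keeping the revision in a single constant makes a later upgrade a one-line change that can be reviewed and tested on its own.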
                    

Aspect	generateContent	Interactions API
State Management	Client-side (you manage history array)	Server-side (referenced by ID)
Caching	Manual (explicit cache creation)	Automatic (implicit cache hits)
Multi-turn	Append to contents array	Set previous_interaction_id
Thought Signatures	Must preserve opaque bytes	Handled automatically
Response Structure	candidates[0].content.parts	steps[] timeline
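
To make the state-management row of the table concrete, here is a toy in-memory model of server-side state. ToyInteractionStore and StoredInteraction are hypothetical names invented for this sketch; the real service is not implemented this way as far as this article states. The point is only that a chain of previous IDs is enough to reconstruct history on the server.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StoredInteraction:
    id: str
    user_input: str
    output: str
    previous_id: Optional[str]


class ToyInteractionStore:
    """Hypothetical in-memory stand-in for server-side conversation state."""

    def __init__(self) -> None:
        self._items: Dict[str, StoredInteraction] = {}
        self._counter = itertools.count(1)

    def history(self, interaction_id: Optional[str]) -> List[StoredInteraction]:
        """Walk the previous-ID chain back to the first turn, oldest first."""
        chain = []
        while interaction_id is not None:
            item = self._items[interaction_id]  # KeyError for unknown IDs
            chain.append(item)
            interaction_id = item.previous_id
        return list(reversed(chain))

    def create(self, user_input: str,
               previous_interaction_id: Optional[str] = None) -> StoredInteraction:
        earlier = self.history(previous_interaction_id)
        # A real model call would go here; the toy only reports how much context it saw.
        output = f"reply to {user_input!r} with {len(earlier)} earlier turn(s)"
        item = StoredInteraction(
            id=f"int-{next(self._counter)}",
            user_input=user_input,
            output=output,
            previous_id=previous_interaction_id,
        )
        self._items[item.id] = item
        return item


store = ToyInteractionStore()
first = store.create("What is FastAPI?")
second = store.create("How do I add auth?", previous_interaction_id=first.id)
print(second.output)
```

The client in this model sends one string and one ID per turn, while the store rebuilds the full context from the chain.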

1.1 Your First Interaction

from google import genai

client = genai.Client()

# The simplest possible Interactions API call
interaction = client.interactions.create(
    model="gemini-3.5-flash",
    input="What are the three laws of thermodynamics?"
)

# Access the response text directly
print("Response:", interaction.output_text)

# The interaction has a unique ID for chaining
print(f"Interaction ID: {interaction.id}")

import { GoogleGenAI } from "@google/genai";

const client = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// JavaScript equivalent
const interaction = await client.interactions.create({
    model: "gemini-3.5-flash",
    input: "What are the three laws of thermodynamics?"
});

console.log("Response:", interaction.outputText);
console.log("ID:", interaction.id);

2. The Steps Timeline

Unlike generateContent, which returns a flat candidates array, the Interactions API returns a steps timeline: a sequence of typed entries showing what the model did (thinking, searching, generating output).

2.1 Step Types

Step Type	Description	Contains
`user_input`	The user’s input message	Text content
`model_output`	The model’s generated text	Text parts
`thought`	Internal reasoning (thinking tokens)	Thought text (may be redacted)
`google_search_call`	Model invoked Google Search	Search query
`google_search_result`	Search results returned	Retrieved content
`file_search_call`	Model invoked File Search	Search query, store names
`file_search_result`	File Search results	Retrieved chunks
`function_call`	Model called a function	Function name, arguments

2.2 Parsing the Steps Timeline

from google import genai

client = genai.Client()

# Create an interaction that might use multiple step types
interaction = client.interactions.create(
    model="gemini-3.5-flash",
    input="What happened in tech news today?",
    tools=[{"type": "google_search"}]
)

# Iterate through the steps timeline
print("Steps Timeline:")
print("=" * 60)
for i, step in enumerate(interaction.steps):
    print(f"\nStep {i + 1}: {step.type}")

    if step.type == "user_input":
        print(f"  Input: {step.text}")

    elif step.type == "thought":
        # Thought text may be redacted, so guard against a missing value
        print(f"  Thinking: {(step.text or '')[:100]}...")

    elif step.type == "google_search_call":
        print(f"  Query: {step.google_search_call.query}")

    elif step.type == "google_search_result":
        print(f"  Results: {len(step.google_search_result.chunks)} chunks")

    elif step.type == "model_output":
        print(f"  Output: {step.text[:200]}...")

print(f"\nFull response: {interaction.output_text[:300]}...")

                        
                        UI Building: The steps timeline is designed for building rich UIs. Show a “Thinking...” indicator during thought steps, display search queries during google_search_call, and stream the final model_output — giving users full transparency into the model’s reasoning process.
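
One way to drive such a UI is a lookup from step type to a progress label. This is a minimal sketch: the step type names are taken from the table in section 2.1, while the label wording and the helper names status_for and status_sequence are made up for illustration.

```python
from typing import List, Optional

# Step type names come from the Step Types table; the labels are illustrative.
STATUS_LABELS = {
    "thought": "Thinking...",
    "google_search_call": "Searching the web...",
    "google_search_result": "Reading search results...",
    "file_search_call": "Searching your files...",
    "file_search_result": "Reading file results...",
    "function_call": "Calling a tool...",
    "model_output": "Writing the answer...",
}


def status_for(step_type: str) -> Optional[str]:
    """Return a progress label for a step type, or None if nothing should be shown."""
    return STATUS_LABELS.get(step_type)


def status_sequence(step_types: List[str]) -> List[str]:
    """Collapse a timeline of step types into the labels a UI would show, in order."""
    labels: List[str] = []
    for step_type in step_types:
        label = status_for(step_type)
        # Skip steps with no label (such as user_input) and consecutive repeats.
        if label is not None and (not labels or labels[-1] != label):
            labels.append(label)
    return labels


timeline = [
    "user_input",
    "thought",
    "google_search_call",
    "google_search_result",
    "thought",
    "model_output",
]
print(status_sequence(timeline))
```

Unknown step types fall through to None, so a new step type added in a later revision would be ignored by the indicator rather than crash it.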
                    

Real-World Application

Real-Time Financial Advisory

A robo-advisor platform grounds Gemini with live market data, enabling answers like “Should I buy NVIDIA stock?” to include today’s price, recent earnings, analyst ratings, and market trends — not just general investment advice. Grounding reduced hallucinated financial data from 15% to under 1%.

Tags: Grounding, Finance, Real-Time Data

3. Multi-Turn with previous_interaction_id

3.1 Chaining Conversations

The main convenience of the Interactions API is continuing a conversation without managing history yourself. Just pass the previous interaction’s ID:

from google import genai

client = genai.Client()

# Turn 1: Establish context
turn1 = client.interactions.create(
    model="gemini-3.5-flash",
    input="I'm building a Python web app with FastAPI. I need help with authentication."
)
print(f"Turn 1: {turn1.output_text[:200]}...")

# Turn 2: Continue the conversation — server remembers context
turn2 = client.interactions.create(
    model="gemini-3.5-flash",
    previous_interaction_id=turn1.id,
    input="Can you show me how to implement JWT token validation?"
)
print(f"\nTurn 2: {turn2.output_text[:200]}...")

# Turn 3: Reference earlier context without repeating it
turn3 = client.interactions.create(
    model="gemini-3.5-flash",
    previous_interaction_id=turn2.id,
    input="Now add refresh token rotation to that implementation."
)
print(f"\nTurn 3: {turn3.output_text[:200]}...")
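
Passing IDs by hand gets repetitive after a few turns. The sketch below wraps the pattern in a small helper that remembers the latest ID. Conversation is a hypothetical class, not part of the SDK, and _StubInteractions is an offline stand-in used so the example runs without network access; the only client call assumed is interactions.create with the keyword arguments shown in the example above.

```python
from types import SimpleNamespace


class Conversation:
    """Tracks the latest interaction ID so callers never pass it by hand."""

    def __init__(self, client, model: str) -> None:
        self._client = client
        self._model = model
        self.last_id = None  # nothing to chain to before the first turn

    def send(self, text: str):
        kwargs = {"model": self._model, "input": text}
        if self.last_id is not None:
            # Only chain once there is a previous interaction.
            kwargs["previous_interaction_id"] = self.last_id
        interaction = self._client.interactions.create(**kwargs)
        self.last_id = interaction.id
        return interaction


class _StubInteractions:
    """Offline stand-in that records calls; swap in a real client's .interactions."""

    def __init__(self) -> None:
        self.calls = []

    def create(self, **kwargs):
        self.calls.append(kwargs)
        turn = len(self.calls)
        return SimpleNamespace(id=f"stub-{turn}", output_text=f"answer {turn}")


stub_client = SimpleNamespace(interactions=_StubInteractions())
chat = Conversation(stub_client, model="gemini-3.5-flash")
chat.send("I need help with authentication.")
chat.send("Show me JWT token validation.")
print(stub_client.interactions.calls[1]["previous_interaction_id"])
```

With a real client you would construct Conversation(genai.Client(), model=...) instead of using the stub; the helper itself would stay the same.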

3.2 Implicit Caching Benefits

                        
                        Automatic Cost Savings: When you chain interactions with previous_interaction_id, the server automatically caches the conversation history. Subsequent turns process only the new input, so you avoid paying full price to re-process the entire history on every turn. This can reduce costs by 75% or more for long conversations; actual savings depend on cache hit rates and current pricing.
                    

from google import genai

client = genai.Client()

# First interaction — full price for all tokens
interaction1 = client.interactions.create(
    model="gemini-3.5-flash",
    input="Explain the CAP theorem in distributed systems with examples."
)
# usage_metadata carries the token counts for this turn
print("Turn 1 usage:", interaction1.usage_metadata)

# Second interaction — previous context is cached, only new input is charged
interaction2 = client.interactions.create(
    model="gemini-3.5-flash",
    previous_interaction_id=interaction1.id,
    input="How does this apply to choosing between Redis and PostgreSQL?"
)
print(f"Turn 2: {interaction2.output_text[:200]}...")

# The server implicitly cached interaction1's context
# You pay input tokens only for the new question, not the full history
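
The saving is easiest to see as arithmetic. The sketch below compares total input tokens when every request resends the full history against the case described above, where only each turn's new input is processed. The token counts are made up, and the model ignores pricing details such as discounted rates for cached tokens, so treat the result as an upper bound on the idea rather than a billing estimate.

```python
from typing import List, Tuple

Turn = Tuple[int, int]  # (user_tokens, model_tokens) for one turn


def input_tokens_with_resend(turns: List[Turn]) -> int:
    """Total input tokens when every request resends the full history."""
    total = 0
    history = 0  # tokens accumulated from all earlier turns
    for user_tokens, model_tokens in turns:
        total += history + user_tokens
        history += user_tokens + model_tokens
    return total


def input_tokens_new_only(turns: List[Turn]) -> int:
    """Total input tokens if only each turn's new user input is processed."""
    return sum(user_tokens for user_tokens, _ in turns)


turns = [(100, 400), (50, 300), (60, 500)]  # made-up token counts
resend = input_tokens_with_resend(turns)
new_only = input_tokens_new_only(turns)
print(resend, new_only, f"{1 - new_only / resend:.0%}")
```

Because the resend total grows with the square of the conversation length while the new-only total grows linearly, the gap widens as conversations get longer.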

4. Polymorphic response_format

4.1 The New Format System

The Interactions API replaces the old response_mime_type string with a polymorphic response_format object. This supports multiple output modalities and structured schemas:

Format Type	Description	Use Case
`{"type": "text"}`	Plain text output (default)	General conversation
`{"type": "json", "schema": {...}}`	Structured JSON with schema enforcement	APIs, data extraction
`{"type": "audio"}`	Audio output	Voice assistants
`{"type": "image"}`	Image generation	Creative applications

4.2 Structured JSON Output

from google import genai

client = genai.Client()

# Define a JSON schema for structured output
recipe_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "prep_time_minutes": {"type": "integer"},
        "ingredients": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "item": {"type": "string"},
                    "quantity": {"type": "string"}
                },
                "required": ["item", "quantity"]
            }
        },
        "steps": {
            "type": "array",
            "items": {"type": "string"}
        },
        "difficulty": {"type": "string", "enum": ["easy", "medium", "hard"]}
    },
    "required": ["name", "prep_time_minutes", "ingredients", "steps", "difficulty"]
}

# Use response_format with the Interactions API
interaction = client.interactions.create(
    model="gemini-3.5-flash",
    input="Give me a recipe for chocolate lava cake.",
    config={
        "response_format": {
            "type": "json",
            "schema": recipe_schema
        }
    }
)

import json
recipe = json.loads(interaction.output_text)
print(f"Recipe: {recipe['name']}")
print(f"Difficulty: {recipe['difficulty']}")
print(f"Prep time: {recipe['prep_time_minutes']} minutes")
print(f"Ingredients: {len(recipe['ingredients'])}")
for step_num, step in enumerate(recipe['steps'], 1):
    print(f"  {step_num}. {step}")
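
Even with schema enforcement requested, it can be worth checking the parsed object before using it, for instance when the same code also handles responses from other sources. The following is a minimal sketch of such a check in plain Python. schema_problems is a hypothetical helper that covers only the keywords used in the recipe schema (type, properties, required, items, enum); it is not a full JSON Schema validator.

```python
import json
from typing import List


def schema_problems(value, schema: dict, path: str = "$") -> List[str]:
    """Return human-readable problems; an empty list means the value looks valid."""
    type_checks = {"object": dict, "array": list, "string": str, "integer": int}
    expected = schema.get("type")
    if expected in type_checks:
        ok = isinstance(value, type_checks[expected])
        if expected == "integer" and isinstance(value, bool):
            ok = False  # bool is a subclass of int in Python
        if not ok:
            return [f"{path}: expected {expected}"]
    problems: List[str] = []
    if "enum" in schema and value not in schema["enum"]:
        problems.append(f"{path}: {value!r} not in {schema['enum']}")
    if expected == "object":
        for key in schema.get("required", []):
            if key not in value:
                problems.append(f"{path}.{key}: missing required field")
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value:
                problems.extend(schema_problems(value[key], sub_schema, f"{path}.{key}"))
    if expected == "array" and "items" in schema:
        for index, item in enumerate(value):
            problems.extend(schema_problems(item, schema["items"], f"{path}[{index}]"))
    return problems


demo_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "difficulty": {"type": "string", "enum": ["easy", "medium", "hard"]},
        "steps": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "difficulty", "steps"],
}
demo = json.loads('{"name": "Lava cake", "difficulty": "extreme", "steps": ["Mix", 2]}')
for problem in schema_problems(demo, demo_schema):
    print(problem)
```

For production use, an established validator library is usually a better choice than a hand-rolled check like this one.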

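response_format also accepts a list when you want several output modalities in one response, for example text plus an image:

from google import genai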

client = genai.Client()

# Request multiple output modalities (text + image)
interaction = client.interactions.create(
    model="gemini-3.5-flash",
    input="Describe and draw a simple flowchart for a login process.",
    config={
        "response_format": [
            {"type": "text"},
            {"type": "image"}
        ]
    }
)

# Access text and image parts separately
for step in interaction.steps:
    if step.type == "model_output":
        for part in step.parts:
            if part.text:
                print(f"Text: {part.text[:200]}...")
            elif part.inline_data:
                print(f"Image: {part.inline_data.mime_type}, {len(part.inline_data.data)} bytes")
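
Once you have an image part, you will usually want to write it to disk. The sketch below picks a file extension from the MIME type using the standard library. save_inline_data is a hypothetical helper, and it assumes that inline_data.data holds raw bytes, as the byte count in the example above suggests; if your SDK version returns base64 text, decode it first.

```python
import mimetypes
import tempfile
from pathlib import Path


def save_inline_data(data: bytes, mime_type: str, directory: Path,
                     stem: str = "output") -> Path:
    """Write raw bytes to directory/stem.<ext>, choosing <ext> from the MIME type."""
    # Fall back to .bin when the MIME type is not recognised.
    extension = mimetypes.guess_extension(mime_type) or ".bin"
    path = directory / f"{stem}{extension}"
    path.write_bytes(data)
    return path


with tempfile.TemporaryDirectory() as tmp:
    fake_png = b"\x89PNG\r\n\x1a\n"  # placeholder bytes, not a complete image
    saved = save_inline_data(fake_png, "image/png", Path(tmp), stem="flowchart")
    print(saved.name, saved.stat().st_size)
```

In the loop above you would call save_inline_data(part.inline_data.data, part.inline_data.mime_type, some_directory) for each image part.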

5. Streaming Interactions

5.1 Event Types

Streaming with the Interactions API emits a sequence of typed events that map to the steps lifecycle:

Event	When Emitted	Contains
`interaction.created`	Start of response	Interaction ID, model
`step.start`	New step begins	Step type, index
`step.delta`	Incremental content	Text delta, partial data
`step.stop`	Step completed	Final step data
`interaction.completed`	Full response ready	Usage metadata, final ID
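
The lifecycle in this table can be read as a small state machine: a step opens on step.start, accumulates text on step.delta, and closes on step.stop. The sketch below folds a list of events into completed steps. Events are modelled as plain dictionaries whose field names mirror the table; the exact payload shape is an assumption, and collect_steps is a hypothetical helper.

```python
from typing import List, Optional


def collect_steps(events: List[dict]) -> List[dict]:
    """Fold a stream of events into one {'type', 'text'} record per completed step."""
    steps: List[dict] = []
    current: Optional[dict] = None  # step being assembled between start and stop
    for event in events:
        kind = event["type"]
        if kind == "step.start":
            current = {"type": event["step"]["type"], "text": ""}
        elif kind == "step.delta" and current is not None:
            # Deltas without text (partial non-text data) contribute nothing here.
            current["text"] += event["step"].get("delta", {}).get("text") or ""
        elif kind == "step.stop" and current is not None:
            steps.append(current)
            current = None
    return steps


demo_events = [
    {"type": "interaction.created", "interaction": {"id": "demo-1"}},
    {"type": "step.start", "step": {"index": 0, "type": "thought"}},
    {"type": "step.delta", "step": {"delta": {"text": "Plan the answer."}}},
    {"type": "step.stop", "step": {}},
    {"type": "step.start", "step": {"index": 1, "type": "model_output"}},
    {"type": "step.delta", "step": {"delta": {"text": "Garbage collection "}}},
    {"type": "step.delta", "step": {"delta": {"text": "frees unused objects."}}},
    {"type": "step.stop", "step": {}},
    {"type": "interaction.completed", "interaction": {"id": "demo-1"}},
]
for step in collect_steps(demo_events):
    print(step["type"], "->", step["text"])
```

A step that never receives step.stop (for example, because the stream was cut off) is dropped rather than reported as complete, which is a deliberate choice in this sketch.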

5.2 Processing Streaming Events

from google import genai

client = genai.Client()

# Stream an interaction
stream = client.interactions.create(
    model="gemini-3.5-flash",
    input="Write a detailed explanation of how garbage collection works in Python.",
    stream=True
)

# Process events as they arrive
full_text = ""
for event in stream:
    if event.type == "interaction.created":
        print(f"[Started] ID: {event.interaction.id}")

    elif event.type == "step.start":
        print(f"\n[Step {event.step.index}] Type: {event.step.type}")

    elif event.type == "step.delta":
        if event.step.delta.text:
            # Print text deltas in real-time
            print(event.step.delta.text, end="", flush=True)
            full_text += event.step.delta.text

    elif event.type == "step.stop":
        print("\n[Step complete]")

    elif event.type == "interaction.completed":
        print(f"\n\n[Done] Usage: {event.interaction.usage_metadata}")

print(f"\nFull response length: {len(full_text)} chars")

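Streaming also works on chained turns. Here the first turn is a regular call and only the follow-up is streamed:

from google import genai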

client = genai.Client()

# Streaming multi-turn conversation
turn1 = client.interactions.create(
    model="gemini-3.5-flash",
    input="I need to design a message queue system."
)

# Stream the follow-up turn
print("Streaming Turn 2:")
print("-" * 40)
stream = client.interactions.create(
    model="gemini-3.5-flash",
    previous_interaction_id=turn1.id,
    input="Show me a Python implementation using Redis as the backend.",
    stream=True
)

for event in stream:
    if event.type == "step.delta" and event.step.delta.text:
        print(event.step.delta.text, end="", flush=True)

print("\n[Stream complete]")

6. Migration from generateContent

6.1 The May 2026 Breaking Changes

                        
                        API Revision Required: Starting May 2026, new features are only available via the Interactions API. The generateContent endpoint remains functional but frozen, and no new capabilities will be added to it. Set the Api-Revision: 2026-05-20 header to opt into the latest surface.
                    

generateContent (Old)	Interactions API (New)	Notes
`contents` (array)	`input` (string or parts)	Server manages history
`generationConfig`	`config.generation_config`	Nested under config
`response_mime_type`	`response_format` object	Polymorphic type system
`candidates[0].content`	`steps[]` timeline	Richer structure
`systemInstruction`	`system_instruction`	Same concept, snake_case
Manual history append	`previous_interaction_id`	Automatic state

6.2 Migration Checklist

from google import genai

client = genai.Client()

# ===== OLD PATTERN (generateContent) =====
# history = []
# history.append({"role": "user", "parts": [{"text": "Hello"}]})
# response = client.models.generate_content(
#     model="gemini-3.5-flash",
#     contents=history
# )
# history.append({"role": "model", "parts": response.candidates[0].content.parts})

# ===== NEW PATTERN (Interactions API) =====
# No history management needed!
interaction1 = client.interactions.create(
    model="gemini-3.5-flash",
    input="Hello, I need help with my Python project."
)
print(f"Response: {interaction1.output_text}")

# Continue — just reference the previous ID
interaction2 = client.interactions.create(
    model="gemini-3.5-flash",
    previous_interaction_id=interaction1.id,
    input="It's a FastAPI app that needs WebSocket support."
)
print(f"Response: {interaction2.output_text}")

from google import genai
from google.genai import types

client = genai.Client()

# Migration example: structured output
# OLD: response_mime_type string
# response = client.models.generate_content(
#     model="gemini-3.5-flash",
#     contents="Extract entities",
#     config=types.GenerateContentConfig(
#         response_mime_type="application/json",
#         response_schema=my_schema
#     )
# )

# NEW: response_format object
interaction = client.interactions.create(
    model="gemini-3.5-flash",
    input="Extract all person names and locations from: 'Alice went to Paris and met Bob in London.'",
    config={
        "response_format": {
            "type": "json",
            "schema": {
                "type": "object",
                "properties": {
                    "entities": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "name": {"type": "string"},
                                "type": {"type": "string", "enum": ["person", "location"]}
                            },
                            "required": ["name", "type"]
                        }
                    }
                },
                "required": ["entities"]
            }
        }
    }
)

import json
result = json.loads(interaction.output_text)
for entity in result["entities"]:
    print(f"  {entity['name']} ({entity['type']})")
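
During a gradual migration you may have many call sites that still pass a MIME type and a schema. A small adapter can translate those into the new shape in one place. This is a sketch under the mapping shown in the tables above: to_response_format is a hypothetical helper, it handles only the text and JSON cases, and whether a JSON format without a schema is accepted by the service is an assumption.

```python
from typing import Optional


def to_response_format(response_mime_type: Optional[str] = None,
                       response_schema: Optional[dict] = None) -> dict:
    """Translate an old-style MIME type and schema into a response_format dict."""
    if response_mime_type in (None, "text/plain"):
        if response_schema is not None:
            raise ValueError("a schema only makes sense with application/json")
        return {"type": "text"}
    if response_mime_type == "application/json":
        response_format = {"type": "json"}
        if response_schema is not None:
            response_format["schema"] = response_schema
        return response_format
    # Refuse to guess for MIME types this sketch does not know about.
    raise ValueError(f"no known mapping for {response_mime_type!r}")


entity_schema = {"type": "object", "properties": {"name": {"type": "string"}}}
print(to_response_format("application/json", entity_schema))
print(to_response_format())
```

Raising on unknown MIME types keeps the adapter from silently producing a format the new API might interpret differently.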

                        
                        Migration Checklist:

                        1. Replace client.models.generate_content() with client.interactions.create()

                        2. Replace contents array with input string/parts

                        3. Replace history management with previous_interaction_id chaining

                        4. Replace response_mime_type with response_format object

                        5. Update response parsing: response.text → interaction.output_text

                        6. Add Api-Revision: 2026-05-20 header for latest features

                        7. Update streaming: event-based instead of chunk-based

                        
                        Try It Yourself: Build a ‘fact-checked news summarizer’: (1) Use Google Search grounding to get today’s top news, (2) summarize each story with Gemini, (3) verify key claims by grounding against additional search results, (4) output a daily briefing with confidence scores and source links for each claim.
                    

Next in the Gemini SDK Track

Part 10: Autonomous Agents & Antigravity SDK covers building autonomous agents with the Antigravity SDK: managed remote Linux sandboxes, custom agents with inline environments and skills, the hook interception engine, and multimodal agent inputs.