Gemini SDK Track Part 12: Live API & Real-Time Streaming

                        
                        What You’ll Learn: The Live API moves beyond request-response chat. A persistent WebSocket session streams audio, video, and text in both directions, ephemeral tokens keep API keys out of client-side code, session management and reconnection logic keep long-running connections reliable, and tool calling lets the model fetch external data in the middle of a conversation.
                    

1. Bidirectional WebSocket Architecture

The Live API establishes a persistent, low-latency WebSocket connection between your application and Gemini. Unlike request-response APIs, this enables continuous bidirectional streaming — you push audio/video frames in real-time while simultaneously receiving model responses.
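
To make the "send while receiving" idea concrete, here is a minimal local simulation that uses only the standard library. It does not call Gemini: two asyncio queues stand in for the two directions of the WebSocket, and fake_model is a hypothetical stand-in for the server that answers each frame as soon as it arrives. All names in it are illustrative assumptions.

```python
import asyncio

async def send_frames(outbound: asyncio.Queue, frames: list) -> None:
    """Push input frames without waiting for any response."""
    for frame in frames:
        await outbound.put(frame)
        await asyncio.sleep(0)  # Yield so the other tasks can run between sends
    await outbound.put(None)  # Sentinel: no more input

async def fake_model(outbound: asyncio.Queue, inbound: asyncio.Queue) -> None:
    """Hypothetical stand-in for the server: reply to each frame on arrival."""
    while (frame := await outbound.get()) is not None:
        await inbound.put(f"echo:{frame}")
    await inbound.put(None)  # Sentinel: no more responses

async def receive_responses(inbound: asyncio.Queue) -> list:
    """Collect responses as they arrive, independently of the sender."""
    received = []
    while (message := await inbound.get()) is not None:
        received.append(message)
    return received

async def duplex_demo(frames: list) -> list:
    outbound, inbound = asyncio.Queue(), asyncio.Queue()
    # Sender, server stand-in, and receiver all run concurrently
    _, _, responses = await asyncio.gather(
        send_frames(outbound, frames),
        fake_model(outbound, inbound),
        receive_responses(inbound),
    )
    return responses

if __name__ == "__main__":
    print(asyncio.run(duplex_demo(["frame-1", "frame-2", "frame-3"])))
```

In a real application the sender and receiver would be the same two concurrent tasks, with the Live session taking the place of the queues.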

                        
                        Key Concept: The Live API uses model gemini-3.1-flash-live-preview, optimized for low-latency streaming with support for audio input, audio output, text output, and tool calling — all within the same persistent session.
                    

1.1 Use Cases

Voice Assistants: Push microphone audio, receive spoken responses in real-time
Real-Time Translation: Continuous speech-to-speech translation during live conversations
Live Coaching: Screen sharing with continuous AI feedback and guidance
Interactive Tutoring: Multi-modal sessions combining voice, drawing, and text

2. Getting Started with the SDK

2.1 Python: Async Live Connection

The Python SDK uses async/await for the Live API. The client.aio.live.connect() method establishes the WebSocket session:

import asyncio
from google import genai
from google.genai import types

client = genai.Client()

async def live_text_session():
    """Basic live session with text input/output."""
    config = types.LiveConnectConfig(
        response_modalities=[types.Modality.TEXT]
    )

    async with client.aio.live.connect(
        model="gemini-3.1-flash-live-preview",
        config=config
    ) as session:
        # Send a text message
        await session.send_client_content(
            turns=types.Content(
                role="user",
                parts=[types.Part.from_text(text="Hello! What can you help me with today?")]
            )
        )

        # Receive streaming response
        async for message in session.receive():
            if message.text:
                print(message.text, end="")

        print()  # Newline after response

asyncio.run(live_text_session())

2.2 JavaScript: Live Connection with Callbacks

import { GoogleGenAI, Modality } from "@google/genai";

const client = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

async function liveTextSession() {
    const session = await client.live.connect({
        model: "gemini-3.1-flash-live-preview",
        config: {
            responseModalities: [Modality.TEXT]
        },
        callbacks: {
            // The SDK expects lowercase callback names
            onmessage(message) {
                if (message.text) {
                    process.stdout.write(message.text);
                }
            },
            onerror(error) {
                console.error("Live session error:", error);
            },
            onclose() {
                console.log("\nSession closed.");
            }
        }
    });

    // Send text input
    session.sendClientContent({
        turns: { role: "user", parts: [{ text: "Explain WebSockets in 2 sentences." }] }
    });
}

liveTextSession();

Real-World Application

Automated Data Science Pipeline

A consulting firm uses Gemini’s code execution to prototype data analyses for clients. Analysts describe what they want (“find the correlation between marketing spend and revenue by region”), Gemini writes and executes the Python code, and produces visualizations — turning 4-hour analysis tasks into 15-minute conversations.

Tags: Code Execution, Data Science, Consulting

3. Ephemeral Tokens for Client Security

For mobile and web applications, you cannot embed API keys in client-side code. Ephemeral tokens solve this — your backend generates a short-lived token that the client uses for its WebSocket connection.

                        
                        Security Critical: Ephemeral tokens are the ONLY safe way to connect to the Live API from client-side code. They expire after a short period and can only be used for Live API connections — not for other Gemini endpoints.
                    

3.1 Backend Token Generation

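import datetime
from google import genai

# Ephemeral tokens are created through the v1alpha API surface in current
# google-genai releases; check the SDK reference for your version.
client = genai.Client(http_options={"api_version": "v1alpha"})

expires_at = datetime.datetime.now(tz=datetime.timezone.utc) + datetime.timedelta(minutes=30)

# Backend: Generate an ephemeral token for the client
token = client.auth_tokens.create(
    config={
        "uses": 1,  # The token can start a single session
        "expire_time": expires_at,
        # Lock the token to this model and session configuration
        "live_connect_constraints": {
            "model": "gemini-3.1-flash-live-preview",
            "config": {
                "response_modalities": ["AUDIO"],
                "speech_config": {
                    "voice_config": {"prebuilt_voice_config": {"voice_name": "Kore"}}
                },
            },
        },
    }
)

# Send this token to your client application
ephemeral_token = token.name
print(f"Token (send to client): {ephemeral_token[:50]}...")
print(f"Expires: {expires_at.isoformat()}")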
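
Because the token is short-lived, a client that caches it should check its age before opening a new connection and ask the backend for a fresh one when it is close to expiring. The sketch below shows that check in isolation. The function name, the 30-second safety margin, and the idea that the backend returns the expiry timestamp with the token are assumptions made for illustration, not part of the SDK.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def needs_refresh(expires_at: datetime,
                  now: Optional[datetime] = None,
                  safety_margin: timedelta = timedelta(seconds=30)) -> bool:
    """Return True when a cached token has expired or is about to expire.

    expires_at must be timezone-aware. The safety margin leaves time to
    complete the WebSocket handshake before the token lapses.
    """
    current = now if now is not None else datetime.now(timezone.utc)
    return current >= expires_at - safety_margin

if __name__ == "__main__":
    issued = datetime.now(timezone.utc)
    print(needs_refresh(issued + timedelta(minutes=30)))  # Fresh token
    print(needs_refresh(issued + timedelta(seconds=5)))   # Inside the safety margin
```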

On the client side (JavaScript in browser or mobile app):

import { GoogleGenAI, Modality } from "@google/genai";

// Client receives ephemeral token from backend
const ephemeralToken = await fetch("/api/get-live-token").then(r => r.json());

// Connect using the ephemeral token (NOT an API key)
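// The v1alpha API version reflects current SDK releases; check the reference for your version
const client = new GoogleGenAI({
    apiKey: ephemeralToken.token,
    httpOptions: { apiVersion: "v1alpha" }
});

const session = await client.live.connect({
    model: "gemini-3.1-flash-live-preview",
    config: {
        responseModalities: [Modality.AUDIO]
    },
    callbacks: {
        onmessage(message) {
            // message.data carries the base64-encoded audio in this server message
            if (message.data) {
                playAudioChunk(message.data);  // Your audio playback function
            }
        }
    }
});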

// Stream microphone audio to the session.
// The Live API expects raw 16-bit PCM rather than MediaRecorder's WebM output,
// so convert captured audio before sending. onPcmChunk is a placeholder for
// your own capture pipeline (for example an AudioWorklet) that delivers
// base64-encoded PCM chunks roughly every 100ms.
onPcmChunk((base64Pcm) => {
    session.sendRealtimeInput({
        audio: { data: base64Pcm, mimeType: "audio/pcm;rate=16000" }
    });
});

4. Session Management

Live sessions have a defined lifecycle: connect, configure, send/receive, and close. Proper session management ensures reliable real-time applications:

import asyncio
from google import genai
from google.genai import types

client = genai.Client()

async def managed_live_session():
    """Live session with proper lifecycle management."""
    # A Live session accepts a single response modality. This console demo
    # prints text, so it requests TEXT; switch to AUDIO and add a speech_config
    # to receive spoken output instead.
    config = types.LiveConnectConfig(
        response_modalities=[types.Modality.TEXT],
        system_instruction="You are a helpful voice assistant. Keep responses concise and natural."
    )

    try:
        async with client.aio.live.connect(
            model="gemini-3.1-flash-live-preview",
            config=config
        ) as session:
            print("Session connected. Type messages (or 'quit' to exit):")

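            while True:
                # input() blocks, so run it in a worker thread to keep the event loop free
                user_input = await asyncio.to_thread(input, "> ")
                if user_input.lower() == "quit":
                    break

                await session.send_client_content(
                    turns=types.Content(
                        role="user",
                        parts=[types.Part.from_text(text=user_input)]
                    )
                )

                # Print streamed chunks on one line instead of one line per chunk
                print("Assistant: ", end="")
                async for message in session.receive():
                    if message.text:
                        print(message.text, end="", flush=True)
                    if message.server_content and message.server_content.turn_complete:
                        break
                print()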

    except Exception as e:
        print(f"Session error: {e}")
    finally:
        print("Session closed.")

asyncio.run(managed_live_session())

4.1 Handling Reconnections

                        
                        Connection Resilience: Live sessions can drop due to network issues. Implement reconnection logic with exponential backoff. Unless you use the Live API's session resumption feature, session state is lost on disconnect, so you must re-establish context by resending system instructions and any critical conversation history.
                    

import asyncio
from google import genai
from google.genai import errors, types
from websockets.exceptions import ConnectionClosed  # websockets is a google-genai dependency

client = genai.Client()

async def resilient_live_session(max_retries: int = 3):
    """Live session with automatic reconnection."""
    retry_count = 0

    while retry_count < max_retries:
        try:
            config = types.LiveConnectConfig(
                response_modalities=[types.Modality.TEXT],
                system_instruction="You are a concise assistant."
            )

            async with client.aio.live.connect(
                model="gemini-3.1-flash-live-preview",
                config=config
            ) as session:
                retry_count = 0  # Reset on successful connect
                print("Connected. Streaming...")

                # Your session logic here
                await session.send_client_content(
                    turns=types.Content(
                        role="user",
                        parts=[types.Part.from_text(text="Hello!")]
                    )
                )

                async for message in session.receive():
                    if message.text:
                        print(message.text, end="")

            print()
            return  # Session finished normally, so do not reconnect

        # Depending on the SDK version, a dropped socket surfaces as websockets'
        # ConnectionClosed or as google.genai.errors.APIError
        except (ConnectionError, ConnectionClosed, errors.APIError) as e:
            retry_count += 1
            wait_time = 2 ** retry_count  # Exponential backoff
            print(f"\nConnection lost ({e}). Retrying in {wait_time}s... ({retry_count}/{max_retries})")
            await asyncio.sleep(wait_time)

    print("Max retries reached. Session terminated.")

asyncio.run(resilient_live_session())
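
The loop above doubles the wait on every retry without limit. A common refinement, sketched here as one possible approach rather than anything the SDK prescribes, is to cap the delay and add random jitter so that many clients dropped at the same moment do not all reconnect together. The function names, the 30-second cap, and the "full jitter" strategy are illustrative assumptions.

```python
import random
from typing import Optional

def backoff_ceiling(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Upper bound for the wait before retry number `attempt` (1-based)."""
    return min(cap, base * 2 ** attempt)

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Pick a random wait between 0 and the capped exponential ceiling."""
    source = rng if rng is not None else random
    return source.uniform(0.0, backoff_ceiling(attempt, base, cap))

if __name__ == "__main__":
    for attempt in range(1, 7):
        print(attempt, backoff_ceiling(attempt), round(backoff_delay(attempt), 2))
```

To use it in the reconnection loop, replace the fixed `2 ** retry_count` wait with `backoff_delay(retry_count)`.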

5. Tool Use in Live Sessions

The Live API supports function calling within streaming sessions. When the model needs external data, it pauses audio output, requests a tool call, waits for your response, then continues speaking with the new information:

import asyncio
import json
from google import genai
from google.genai import types

client = genai.Client()

# Define tools available in the live session
weather_tool = types.Tool(
    function_declarations=[
        types.FunctionDeclaration(
            name="get_weather",
            description="Get current weather for a city",
            parameters=types.Schema(
                type=types.Type.OBJECT,
                properties={
                    "city": types.Schema(type=types.Type.STRING, description="City name"),
                    "unit": types.Schema(type=types.Type.STRING, enum=["celsius", "fahrenheit"])
                },
                required=["city"]
            )
        )
    ]
)

def execute_weather_tool(city: str, unit: str = "celsius") -> dict:
    """Simulate weather API call."""
    return {"city": city, "temperature": 22, "unit": unit, "condition": "Sunny"}

async def live_session_with_tools():
    """Live session with function calling."""
    config = types.LiveConnectConfig(
        response_modalities=[types.Modality.TEXT],
        tools=[weather_tool],
        system_instruction="You are a helpful assistant. Use the weather tool when asked about weather."
    )

    async with client.aio.live.connect(
        model="gemini-3.1-flash-live-preview",
        config=config
    ) as session:
        await session.send_client_content(
            turns=types.Content(
                role="user",
                parts=[types.Part.from_text(text="What's the weather like in Tokyo?")]
            )
        )

        async for message in session.receive():
            # Handle tool call requests
            if message.tool_call:
                function_responses = []
                # The model may request several calls in one message
                for func_call in message.tool_call.function_calls:
                    print(f"[Tool Call] {func_call.name}")
                    args = dict(func_call.args or {})

                    # Execute the tool
                    result = execute_weather_tool(**args)

                    function_responses.append(
                        types.FunctionResponse(
                            id=func_call.id,  # Matches the response to its call
                            name=func_call.name,
                            response=result
                        )
                    )

                # Send results back to the session
                await session.send_tool_response(function_responses=function_responses)

            # Handle text responses
            if message.text:
                print(f"Assistant: {message.text}")

            if message.server_content and message.server_content.turn_complete:
                break

asyncio.run(live_session_with_tools())
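
Once a session exposes more than one tool, calling a single function directly stops scaling. One way to handle this, shown as a sketch with hypothetical names, is a dispatch table that maps each declared tool name to a local handler and returns a structured error instead of raising, so the model can recover in conversation. The block is independent of the SDK: it takes the name and arguments you would read from each function call.

```python
from typing import Any, Callable, Dict

def get_weather(city: str, unit: str = "celsius") -> dict:
    """Simulated weather lookup, mirroring the example above."""
    return {"city": city, "temperature": 22, "unit": unit, "condition": "Sunny"}

# Hypothetical registry: tool name as declared to the model -> local handler
TOOL_HANDLERS: Dict[str, Callable[..., dict]] = {
    "get_weather": get_weather,
}

def dispatch_tool_call(name: str, args: Dict[str, Any]) -> dict:
    """Run the handler for `name`, returning an error payload on failure."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return {"error": f"Unknown tool: {name}"}
    try:
        return handler(**args)
    except TypeError as exc:
        # Raised for missing or unexpected arguments; note this would also
        # catch a TypeError raised inside the handler itself
        return {"error": f"Bad arguments for {name}: {exc}"}

if __name__ == "__main__":
    print(dispatch_tool_call("get_weather", {"city": "Tokyo"}))
    print(dispatch_tool_call("get_stock_price", {"ticker": "XYZ"}))
```

The returned dictionary can be passed as the response payload when sending the tool result back to the session.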

6. Best Practices

Optimizing Live API sessions for production quality:

Chunk Size: Send audio in 100ms chunks for optimal latency. Larger chunks increase delay; smaller chunks increase overhead.
Language Specification: Always set the language in system instructions (e.g., “Respond in English”). Without this, the model may switch languages based on audio input characteristics.
Voice Selection: Choose voices matching your use case. Available voices include Puck, Charon, Kore, Fenrir, and Aoede — each with distinct characteristics.
Graceful Disconnection: Always close sessions properly. Abrupt disconnections waste server resources and may count against rate limits.
Audio Format: Use 16-bit PCM at 16kHz for input audio. Output audio is also 16-bit PCM but at a 24kHz sample rate, so configure playback accordingly.
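
The chunk-size and audio-format guidance translates into simple arithmetic: 16,000 samples per second at 2 bytes per sample is 32,000 bytes per second, so a 100ms chunk is 3,200 bytes. The sketch below computes that and splits a PCM buffer accordingly. It assumes mono audio, and the function names are illustrative.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16_000   # Input sample rate recommended above
BYTES_PER_SAMPLE = 2      # 16-bit PCM
CHUNK_MS = 100            # Recommended chunk duration

def chunk_size_bytes(sample_rate_hz: int = SAMPLE_RATE_HZ,
                     bytes_per_sample: int = BYTES_PER_SAMPLE,
                     chunk_ms: int = CHUNK_MS) -> int:
    """Bytes of mono PCM audio in one chunk of the given duration."""
    return sample_rate_hz * bytes_per_sample * chunk_ms // 1000

def iter_pcm_chunks(pcm: bytes, size: int) -> Iterator[bytes]:
    """Yield consecutive chunks of `size` bytes; the last may be shorter."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]

if __name__ == "__main__":
    one_second_of_silence = b"\x00" * (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
    chunks = list(iter_pcm_chunks(one_second_of_silence, chunk_size_bytes()))
    print(chunk_size_bytes(), len(chunks))
```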

import asyncio
from google import genai
from google.genai import types

client = genai.Client()

async def production_live_config():
    """Production-optimized live session configuration."""
    config = types.LiveConnectConfig(
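        response_modalities=[types.Modality.AUDIO],  # A session accepts one response modality
        # Request a text transcript alongside the spoken reply
        output_audio_transcription=types.AudioTranscriptionConfig(),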
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
            )
        ),
        system_instruction=(
            "You are a professional voice assistant. "
            "Always respond in English. "
            "Keep responses under 3 sentences unless asked for detail. "
            "If you don't know something, say so clearly."
        ),
        # Input audio configuration
        realtime_input_config=types.RealtimeInputConfig(
            automatic_activity_detection=types.AutomaticActivityDetection(
                disabled=False  # Auto-detect when user stops speaking
            )
        )
    )

    async with client.aio.live.connect(
        model="gemini-3.1-flash-live-preview",
        config=config
    ) as session:
        print("Production session ready.")
        # Your production session logic here

asyncio.run(production_live_config())

                        
                        Latency Tip: For the lowest possible latency, use gemini-3.1-flash-live-preview with the text response modality and keep system instructions short. Audio output and longer instructions both add processing time.
                    

                        
                        Try It Yourself: Build a ‘data analysis assistant’ using code execution: (1) upload a CSV dataset to Gemini, (2) ask it to perform statistical analysis (mean, median, correlations), (3) have it generate and execute matplotlib visualization code, (4) retrieve the generated chart image. Compare the quality of Gemini’s code execution vs manually writing the analysis.
                    

Next in the Gemini SDK Track

In Part 13: Optimization & Production Operations, we’ll optimize cost and performance with context caching (explicit + implicit), Flex and Priority inference, Batch API, webhooks, data logging, and billing architecture.