1. Bidirectional WebSocket Architecture
The Live API establishes a persistent, low-latency WebSocket connection between your application and Gemini. Unlike request-response APIs, this enables continuous bidirectional streaming — you push audio/video frames in real-time while simultaneously receiving model responses.
This guide uses gemini-3.1-flash-live-preview, which is optimized for low-latency streaming and supports audio input, audio output, text output, and tool calling, all within the same persistent session.
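To make the bidirectional idea concrete before touching the SDK, here is a minimal sketch that uses only asyncio queues. Nothing in it talks to Gemini: fake_model, send_frames, and the two queues are hypothetical stand-ins for the server and the two directions of the socket. It only illustrates that sending and receiving run as separate concurrent tasks rather than as one request followed by one response.

```python
import asyncio


async def send_frames(uplink: asyncio.Queue, frames: list) -> None:
    """Push input frames without waiting for any reply."""
    for frame in frames:
        await uplink.put(frame)
        await asyncio.sleep(0)  # Yield so the other tasks can run in between
    await uplink.put(None)  # Sentinel: no more input


async def fake_model(uplink: asyncio.Queue, downlink: asyncio.Queue) -> None:
    """Hypothetical server stand-in: replies to each frame as it arrives."""
    while (frame := await uplink.get()) is not None:
        await downlink.put(f"reply to {frame}")
    await downlink.put(None)  # Sentinel: no more output


async def receive_replies(downlink: asyncio.Queue) -> list:
    """Collect replies while the sender is still free to push frames."""
    replies = []
    while (reply := await downlink.get()) is not None:
        replies.append(reply)
    return replies


async def duplex_demo(frames: list) -> list:
    uplink, downlink = asyncio.Queue(), asyncio.Queue()
    # Sender, server stand-in, and receiver all run concurrently
    _, _, replies = await asyncio.gather(
        send_frames(uplink, frames),
        fake_model(uplink, downlink),
        receive_replies(downlink),
    )
    return replies


if __name__ == "__main__":
    print(asyncio.run(duplex_demo(["frame-1", "frame-2", "frame-3"])))
```

In a real application the two queues are replaced by the session object, with one task feeding input and another iterating over responses.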
1.1 Use Cases
- Voice Assistants: Push microphone audio, receive spoken responses in real-time
- Real-Time Translation: Continuous speech-to-speech translation during live conversations
- Live Coaching: Screen sharing with continuous AI feedback and guidance
- Interactive Tutoring: Multi-modal sessions combining voice, drawing, and text
2. Getting Started with the SDK
2.1 Python: Async Live Connection
The Python SDK uses async/await for the Live API. The client.aio.live.connect() method establishes the WebSocket session:
import asyncio

from google import genai
from google.genai import types

client = genai.Client()


async def live_text_session():
    """Basic live session with text input/output."""
    config = types.LiveConnectConfig(
        response_modalities=[types.Modality.TEXT]
    )
    async with client.aio.live.connect(
        model="gemini-3.1-flash-live-preview",
        config=config
    ) as session:
        # Send a text message
        await session.send_client_content(
            turns=types.Content(
                role="user",
                # from_text takes the text as a keyword argument
                parts=[types.Part.from_text(text="Hello! What can you help me with today?")]
            )
        )
        # Receive streaming response
        async for message in session.receive():
            if message.text:
                print(message.text, end="")
        print()  # Newline after response


asyncio.run(live_text_session())
2.2 JavaScript: Live Connection with Callbacks
import { GoogleGenAI, Modality } from "@google/genai";

const client = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

async function liveTextSession() {
  const session = await client.live.connect({
    model: "gemini-3.1-flash-live-preview",
    config: {
      responseModalities: [Modality.TEXT]
    },
    // Callback names are all lowercase in the JavaScript SDK
    callbacks: {
      onmessage(message) {
        if (message.text) {
          process.stdout.write(message.text);
        }
      },
      onerror(error) {
        console.error("Live session error:", error);
      },
      onclose() {
        console.log("\nSession closed.");
      }
    }
  });

  // Send text input
  session.sendClientContent({
    turns: { role: "user", parts: [{ text: "Explain WebSockets in 2 sentences." }] }
  });
}

liveTextSession();
Automated Data Science Pipeline
A consulting firm uses Gemini’s code execution to prototype data analyses for clients. Analysts describe what they want (“find the correlation between marketing spend and revenue by region”), Gemini writes and executes the Python code, and produces visualizations — turning 4-hour analysis tasks into 15-minute conversations.
3. Ephemeral Tokens for Client Security
For mobile and web applications, you should never embed API keys in client-side code, because anyone who inspects the app can extract them. Ephemeral tokens solve this: your backend generates a short-lived token that the client uses for its WebSocket connection.
3.1 Backend Token Generation
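import datetime

from google import genai

# Ephemeral tokens are issued through the v1alpha API version
client = genai.Client(http_options={"api_version": "v1alpha"})

# Backend: Generate an ephemeral token for the client
expire_time = datetime.datetime.now(tz=datetime.timezone.utc) + datetime.timedelta(minutes=30)
token = client.auth_tokens.create(
    config={
        "uses": 1,  # Token can start a single session
        "expire_time": expire_time,
        # Lock the token to this model and session configuration
        "live_connect_constraints": {
            "model": "gemini-3.1-flash-live-preview",
            "config": {
                "response_modalities": ["AUDIO"],
                "speech_config": {
                    "voice_config": {"prebuilt_voice_config": {"voice_name": "Kore"}}
                },
            },
        },
    }
)
# Send this token to your client application
ephemeral_token = token.name
print(f"Token (send to client): {ephemeral_token[:50]}...")
print(f"Expires: {expire_time.isoformat()}")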
On the client side (JavaScript in browser or mobile app):
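import { GoogleGenAI, Modality } from "@google/genai";

// Client receives ephemeral token from backend
const ephemeralToken = await fetch("/api/get-live-token").then(r => r.json());

// Connect using the ephemeral token (NOT an API key)
const client = new GoogleGenAI({
  apiKey: ephemeralToken.token,
  httpOptions: { apiVersion: "v1alpha" } // Assumption: ephemeral tokens need the v1alpha API version
});

const session = await client.live.connect({
  model: "gemini-3.1-flash-live-preview",
  config: {
    responseModalities: [Modality.AUDIO]
  },
  callbacks: {
    onmessage(message) {
      // message.data holds the base64-encoded audio of this chunk, if any
      if (message.data) {
        playAudioChunk(message.data); // Your audio playback function
      }
    }
  }
});

// Stream microphone audio to the session as 16-bit PCM at 16kHz.
// MediaRecorder produces compressed WebM, so raw samples are captured instead.
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const audioContext = new AudioContext({ sampleRate: 16000 });
const source = audioContext.createMediaStreamSource(stream);
// ScriptProcessorNode is deprecated but keeps the example short; 2048 samples is about 128ms
const processor = audioContext.createScriptProcessor(2048, 1, 1);
processor.onaudioprocess = (event) => {
  const float32 = event.inputBuffer.getChannelData(0);
  const int16 = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    const sample = Math.max(-1, Math.min(1, float32[i]));
    int16[i] = sample < 0 ? sample * 0x8000 : sample * 0x7fff;
  }
  const base64 = btoa(String.fromCharCode(...new Uint8Array(int16.buffer)));
  session.sendRealtimeInput({ audio: { data: base64, mimeType: "audio/pcm;rate=16000" } });
};
source.connect(processor);
processor.connect(audioContext.destination);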
4. Session Management
Live sessions have a defined lifecycle: connect, configure, send/receive, and close. Proper session management ensures reliable real-time applications:
import asyncio

from google import genai
from google.genai import types

client = genai.Client()


async def managed_live_session():
    """Live session with proper lifecycle management."""
    config = types.LiveConnectConfig(
        # A live session takes a single response modality; the transcription
        # setting below supplies text for the spoken output.
        response_modalities=[types.Modality.AUDIO],
        output_audio_transcription=types.AudioTranscriptionConfig(),
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Puck")
            )
        ),
        system_instruction="You are a helpful voice assistant. Keep responses concise and natural."
    )
    try:
        async with client.aio.live.connect(
            model="gemini-3.1-flash-live-preview",
            config=config
        ) as session:
            print("Session connected. Type messages (or 'quit' to exit):")
            while True:
                # Read input in a worker thread so the event loop is not blocked
                user_input = await asyncio.to_thread(input, "> ")
                if user_input.lower() == "quit":
                    break
                await session.send_client_content(
                    turns=types.Content(
                        role="user",
                        parts=[types.Part.from_text(text=user_input)]
                    )
                )
                print("Assistant: ", end="")
                async for message in session.receive():
                    content = message.server_content
                    # Print the transcript of the audio as it streams in
                    if content and content.output_transcription and content.output_transcription.text:
                        print(content.output_transcription.text, end="", flush=True)
                    if content and content.turn_complete:
                        print()
                        break
    except Exception as e:
        print(f"Session error: {e}")
    finally:
        print("Session closed.")


asyncio.run(managed_live_session())
4.1 Handling Reconnections
import asyncio

from google import genai
from google.genai import types
from websockets.exceptions import ConnectionClosed

client = genai.Client()


async def resilient_live_session(max_retries: int = 3):
    """Live session with automatic reconnection."""
    config = types.LiveConnectConfig(
        response_modalities=[types.Modality.TEXT],
        system_instruction="You are a concise assistant."
    )
    retry_count = 0
    while retry_count < max_retries:
        try:
            async with client.aio.live.connect(
                model="gemini-3.1-flash-live-preview",
                config=config
            ) as session:
                retry_count = 0  # Reset on successful connect
                print("Connected. Streaming...")
                # Your session logic here
                await session.send_client_content(
                    turns=types.Content(
                        role="user",
                        parts=[types.Part.from_text(text="Hello!")]
                    )
                )
                async for message in session.receive():
                    if message.text:
                        print(message.text, end="")
            return  # Session finished normally, so do not reconnect
        # The websockets library reports a dropped socket as ConnectionClosed
        except (ConnectionError, ConnectionClosed) as e:
            retry_count += 1
            wait_time = 2 ** retry_count  # Exponential backoff
            print(f"\nConnection lost ({e}). Retrying in {wait_time}s... ({retry_count}/{max_retries})")
            await asyncio.sleep(wait_time)
    print("Max retries reached. Session terminated.")


asyncio.run(resilient_live_session())
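The retry loop above waits 2, 4, then 8 seconds. If many clients drop at once, they will all retry on the same schedule, so a common refinement is to cap the delay and add random jitter. The helper below is a sketch of that idea, not part of the SDK: backoff_delay and its base, cap, and jitter parameters are hypothetical names chosen for illustration.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  jitter: bool = True, rng=None) -> float:
    """Return the delay in seconds before retry number `attempt` (1-based)."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    # Same doubling schedule as 2 ** retry_count, but never above the cap
    ceiling = min(cap, base * 2 ** attempt)
    if not jitter:
        return ceiling
    # "Full jitter": pick a random delay between 0 and the ceiling
    source = rng if rng is not None else random
    return source.uniform(0, ceiling)


if __name__ == "__main__":
    for attempt in range(1, 7):
        print(attempt, backoff_delay(attempt, jitter=False))
```

To use it in the loop above, replace the wait_time line with wait_time = backoff_delay(retry_count).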
5. Tool Use in Live Sessions
The Live API supports function calling within streaming sessions. When the model needs external data, it pauses audio output, requests a tool call, waits for your response, then continues speaking with the new information:
import asyncio

from google import genai
from google.genai import types

client = genai.Client()

# Define tools available in the live session
weather_tool = types.Tool(
    function_declarations=[
        types.FunctionDeclaration(
            name="get_weather",
            description="Get current weather for a city",
            parameters=types.Schema(
                type=types.Type.OBJECT,
                properties={
                    "city": types.Schema(type=types.Type.STRING, description="City name"),
                    "unit": types.Schema(type=types.Type.STRING, enum=["celsius", "fahrenheit"])
                },
                required=["city"]
            )
        )
    ]
)


def execute_weather_tool(city: str, unit: str = "celsius") -> dict:
    """Simulate weather API call."""
    return {"city": city, "temperature": 22, "unit": unit, "condition": "Sunny"}


async def live_session_with_tools():
    """Live session with function calling."""
    config = types.LiveConnectConfig(
        response_modalities=[types.Modality.TEXT],
        tools=[weather_tool],
        system_instruction="You are a helpful assistant. Use the weather tool when asked about weather."
    )
    async with client.aio.live.connect(
        model="gemini-3.1-flash-live-preview",
        config=config
    ) as session:
        await session.send_client_content(
            turns=types.Content(
                role="user",
                parts=[types.Part.from_text(text="What's the weather like in Tokyo?")]
            )
        )
        async for message in session.receive():
            # Handle tool call requests (one message may carry several calls)
            if message.tool_call:
                function_responses = []
                for func_call in message.tool_call.function_calls:
                    print(f"[Tool Call] {func_call.name}")
                    # Execute the tool
                    result = execute_weather_tool(**dict(func_call.args))
                    function_responses.append(
                        types.FunctionResponse(
                            id=func_call.id,  # Ties the response to its call
                            name=func_call.name,
                            response=result
                        )
                    )
                # Send results back to the session
                await session.send_tool_response(function_responses=function_responses)
            # Handle text responses
            if message.text:
                print(f"Assistant: {message.text}")
            if message.server_content and message.server_content.turn_complete:
                break


asyncio.run(live_session_with_tools())
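Once a session exposes more than one tool, a lookup table keeps the receive loop small. The sketch below runs without the SDK: FakeFunctionCall is a hypothetical stand-in that carries the id, name, and args fields the loop above reads, and TOOL_HANDLERS and run_tool_calls are illustrative names rather than library features.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class FakeFunctionCall:
    """Hypothetical stand-in for the SDK's function-call object."""
    id: str
    name: str
    args: Dict[str, Any] = field(default_factory=dict)


def get_weather(city: str, unit: str = "celsius") -> dict:
    """Simulated weather lookup, mirroring the example above."""
    return {"city": city, "temperature": 22, "unit": unit, "condition": "Sunny"}


# Map each declared tool name to the Python function that implements it
TOOL_HANDLERS: Dict[str, Callable[..., dict]] = {"get_weather": get_weather}


def run_tool_calls(function_calls) -> List[dict]:
    """Run every requested call and return one result record per call."""
    results = []
    for call in function_calls:
        handler = TOOL_HANDLERS.get(call.name)
        if handler is None:
            payload = {"error": f"unknown tool: {call.name}"}
        else:
            try:
                payload = handler(**dict(call.args))
            except TypeError as exc:  # Typically missing or unexpected arguments
                payload = {"error": str(exc)}
        results.append({"id": call.id, "name": call.name, "response": payload})
    return results


if __name__ == "__main__":
    calls = [FakeFunctionCall("1", "get_weather", {"city": "Tokyo"})]
    print(run_tool_calls(calls))
```

Each record has the three fields a function response needs, so in a live session you would build one types.FunctionResponse per record. Returning an error payload instead of raising lets the model explain the failure to the user rather than ending the session.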
6. Best Practices
Optimizing Live API sessions for production quality:
- Chunk Size: Send audio in 100ms chunks for optimal latency. Larger chunks increase delay; smaller chunks increase overhead.
- Language Specification: Always set the language in system instructions (e.g., “Respond in English”). Without this, the model may switch languages based on audio input characteristics.
- Voice Selection: Choose voices matching your use case. Available voices include Puck, Charon, Kore, Fenrir, and Aoede — each with distinct characteristics.
- Graceful Disconnection: Always close sessions properly. Abrupt disconnections waste server resources and may count against rate limits.
- Audio Format: Use PCM 16-bit at 16kHz for input audio. Output audio is 16-bit PCM at 24kHz, so configure playback for that rate.
import asyncio

from google import genai
from google.genai import types

client = genai.Client()


async def production_live_config():
    """Production-optimized live session configuration."""
    config = types.LiveConnectConfig(
        # A live session takes a single response modality
        response_modalities=[types.Modality.AUDIO],
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name="Kore")
            )
        ),
        system_instruction=(
            "You are a professional voice assistant. "
            "Always respond in English. "
            "Keep responses under 3 sentences unless asked for detail. "
            "If you don't know something, say so clearly."
        ),
        # Input audio configuration
        realtime_input_config=types.RealtimeInputConfig(
            automatic_activity_detection=types.AutomaticActivityDetection(
                disabled=False  # Auto-detect when user stops speaking
            )
        )
    )
    async with client.aio.live.connect(
        model="gemini-3.1-flash-live-preview",
        config=config
    ) as session:
        print("Production session ready.")
        # Your production session logic here


asyncio.run(production_live_config())
For the lowest latency, use gemini-3.1-flash-live-preview with a text-only response modality and keep system instructions short. Each additional modality, and every increase in instruction length, adds processing time.
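The chunk-size and audio-format recommendations above translate into a concrete byte count. Assuming mono input, 16-bit PCM at 16kHz is 32,000 bytes per second, so a 100ms chunk is 3,200 bytes. The helper names below (chunk_size_bytes, iter_pcm_chunks) are illustrative and not part of the SDK.

```python
from typing import Iterator, Optional

SAMPLE_RATE_HZ = 16_000  # Input sample rate recommended above
BYTES_PER_SAMPLE = 2     # 16-bit PCM, mono assumed
CHUNK_MS = 100           # Chunk duration recommended above


def chunk_size_bytes(sample_rate: int = SAMPLE_RATE_HZ,
                     bytes_per_sample: int = BYTES_PER_SAMPLE,
                     chunk_ms: int = CHUNK_MS) -> int:
    """Number of bytes in one chunk of the given duration."""
    return sample_rate * bytes_per_sample * chunk_ms // 1000


def iter_pcm_chunks(pcm: bytes, size: Optional[int] = None) -> Iterator[bytes]:
    """Yield consecutive chunks of raw PCM; the last one may be shorter."""
    size = size or chunk_size_bytes()
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


if __name__ == "__main__":
    one_second_of_silence = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
    chunks = list(iter_pcm_chunks(one_second_of_silence))
    print(len(chunks), len(chunks[0]))
```

Each yielded chunk can be sent as one realtime audio input. When streaming from a file rather than a microphone, sleep for the chunk duration between sends to keep real-time pacing.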
Next in the Gemini SDK Track
In Part 13: Optimization & Production Operations, we’ll optimize cost and performance with context caching (explicit + implicit), Flex and Priority inference, Batch API, webhooks, data logging, and billing architecture.