1. Deep Research Agent (Preview)
Deep Research is not just another model endpoint — it is a fully autonomous agent designed for multi-step investigation. When you submit a research query, the agent formulates a plan, searches multiple sources, synthesizes findings with citations, and delivers a structured report.
The agent ID used throughout this section is deep-research-preview-04-2026. Unlike standard generation, the agent runs asynchronously and may take anywhere from 30 seconds to several minutes, depending on query complexity.
1.1 Background Mode & Polling
Since deep research tasks can take significant time, you launch them in background mode and poll for completion:
from google import genai

client = genai.Client()

# Launch a deep research task in background mode
interaction = client.interactions.create(
    agent="deep-research-preview-04-2026",
    input="What are the latest advances in solid-state battery technology for EVs in 2026? Include key companies, breakthroughs, and timeline to commercialization.",
    background=True
)
print(f"Research started: {interaction.id}")
print(f"Status: {interaction.status}")
# Status will be "PROCESSING" initially
Poll for completion using the interaction ID:
import time

from google import genai

client = genai.Client()

# Assume interaction_id from previous launch
interaction_id = "interaction-abc123"

# Poll until complete
while True:
    result = client.interactions.get(id=interaction_id)
    print(f"Status: {result.status}")
    if result.status == "COMPLETED":
        print("\n--- Research Report ---")
        print(result.output_text)
        break
    elif result.status == "FAILED":
        print(f"Error: {result.error}")
        break
    time.sleep(5)  # Check every 5 seconds
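Once the task completes, output_text contains the report as Markdown with numbered references. Use the citations attribute of the completed interaction (result.citations in the loop above) to access structured citation metadata.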
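Because the report cites sources by number, it can help to confirm that every marker in the text has a matching citation entry before publishing it. The sketch below is illustrative only: it assumes references appear as bracketed numbers such as [1], numbered from 1, and that the citation metadata can be reduced to a simple count. Neither assumption is confirmed by the SDK, and find_unmatched_references is a hypothetical helper name.

```python
import re


def find_unmatched_references(report_markdown: str, citation_count: int) -> list[int]:
    """Return reference numbers used in the text that have no citation entry.

    Assumption: markers look like [1], [2], ... and are numbered from 1.
    """
    # Collect every distinct bracketed number that appears in the report
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", report_markdown)}
    # A marker is unmatched when it falls outside 1..citation_count
    return sorted(n for n in cited if n < 1 or n > citation_count)


sample_report = "Solid-state cells improved energy density [1]. Pilot lines are planned [3]."
print(find_unmatched_references(sample_report, citation_count=2))
```

A non-empty result suggests the report and its citation metadata are out of sync and should be reviewed by hand.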
2. Research with Documents & Multi-Turn Follow-up
Deep Research can analyze uploaded documents (PDFs, text files) and URLs alongside your text query. This enables grounded research over your own proprietary data:
from google import genai
from google.genai import types

client = genai.Client()

# Upload a PDF for research context
uploaded_file = client.files.upload(file="quarterly-report-q1-2026.pdf")

# Research with document context
# (Part.from_text and Part.from_uri take keyword-only arguments)
interaction = client.interactions.create(
    agent="deep-research-preview-04-2026",
    input=[
        types.Part.from_text(text="Analyze the trends in this quarterly report and compare with public market data for the same period."),
        types.Part.from_uri(file_uri=uploaded_file.uri, mime_type="application/pdf")
    ],
    background=True
)
print(f"Research with document started: {interaction.id}")
2.1 Multi-Turn Follow-up
Continue a research conversation with follow-up questions using previous_interaction_id:
from google import genai

client = genai.Client()

# First research interaction (already completed)
first_interaction_id = "interaction-abc123"

# Follow-up question referencing previous research
followup = client.interactions.create(
    agent="deep-research-preview-04-2026",
    previous_interaction_id=first_interaction_id,
    input="Based on your findings, which company has the strongest patent portfolio in this space?",
    background=True
)
print(f"Follow-up research started: {followup.id}")
Global Bank AI Deployment
A multinational bank deployed Gemini via Vertex AI with full compliance: data residency in EU (Frankfurt region), CMEK encryption for all prompts/responses, VPC-SC perimeter blocking external data exfiltration, and audit logs sent to their SIEM. Approval took 6 months but the architecture now serves 50+ internal AI applications.
3. Streaming Deep Research
For real-time visibility into the research process, use streaming mode. The agent emits events as it progresses through planning, searching, and synthesizing:
3.1 Streaming Implementation
from google import genai

client = genai.Client()

# Stream research progress
stream = client.interactions.create_stream(
    agent="deep-research-preview-04-2026",
    input="Compare the regulatory frameworks for autonomous vehicles in the EU, US, and China as of 2026.",
    stream=True
)
for event in stream:
    if event.type == "thought":
        print(f"[PLAN] {event.text}")
    elif event.type == "tool_use":
        print(f"[SEARCH] Querying: {event.tool_name}")
    elif event.type == "model_output":
        print("\n--- Final Report ---")
        print(event.text)
    elif event.type == "citation":
        print(f"[SOURCE] {event.url} - {event.title}")
thought events show the agent’s planning steps. tool_use events indicate searches being performed. model_output delivers the final synthesized report. citation events provide structured source references.
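Printing events is enough for a demo, but an application usually wants to keep them. The following sketch gathers the four event types described above into one record. It runs against stand-in events built with SimpleNamespace rather than a live stream; the attribute names (text, tool_name, url, title) mirror the loop above, while ResearchTrace and collect_events are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Iterable


@dataclass
class ResearchTrace:
    """Accumulated view of one streamed research run (hypothetical structure)."""
    plan_steps: list = field(default_factory=list)
    searches: list = field(default_factory=list)
    sources: list = field(default_factory=list)
    report: str = ""


def collect_events(events: Iterable) -> ResearchTrace:
    """Sort streamed events into plan steps, searches, sources, and report text."""
    trace = ResearchTrace()
    for event in events:
        if event.type == "thought":
            trace.plan_steps.append(event.text)
        elif event.type == "tool_use":
            trace.searches.append(event.tool_name)
        elif event.type == "citation":
            trace.sources.append((event.url, event.title))
        elif event.type == "model_output":
            # Concatenate in case the report arrives in more than one chunk
            trace.report += event.text
        # Unknown event types are skipped so new kinds do not break the loop
    return trace


# Stand-in events for illustration; a real stream would come from the SDK
fake_stream = [
    SimpleNamespace(type="thought", text="Outline the three jurisdictions"),
    SimpleNamespace(type="tool_use", tool_name="web_search"),
    SimpleNamespace(type="citation", url="https://example.com/a", title="Example A"),
    SimpleNamespace(type="model_output", text="# Report\n"),
    SimpleNamespace(type="model_output", text="Findings..."),
    SimpleNamespace(type="heartbeat"),
]
trace = collect_events(fake_stream)
print(f"{len(trace.plan_steps)} plan steps, {len(trace.sources)} sources")
```

Keeping the collector separate from the SDK call makes it straightforward to unit-test event handling without spending a research run.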
4. Computer Use (Preview)
Computer Use enables Gemini to interact with graphical user interfaces — clicking buttons, filling forms, navigating websites, and reading screen content. It is a vision-driven browser automation system where the model observes screenshots, reasons about what to do, and executes actions.
4.1 Prerequisites & Environment
You need a secure execution environment with a browser automation framework:
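# Set up a sandboxed environment with Playwright
pip install google-genai playwright
playwright install chromium
# Run in Docker for production safety. Mount the current directory so the
# container can see your_script.py, and install Chromium's system dependencies.
docker run -it --rm -v "$PWD":/app -w /app python:3.12 bash -c "pip install google-genai playwright && playwright install --with-deps chromium && python your_script.py"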
5. The Computer Use Action Loop
Computer Use follows a 5-step cycle: Initialize → Capture Screen → Model Evaluate → Safety Check → Execute & Recurse. The loop continues until the model signals task completion.
import base64
import json
from types import SimpleNamespace

from google import genai
from google.genai import types
from playwright.sync_api import sync_playwright

client = genai.Client()

def capture_screenshot(page):
    """Capture and encode the current page screenshot."""
    screenshot_bytes = page.screenshot()
    return base64.b64encode(screenshot_bytes).decode("utf-8")

def execute_action(page, action):
    """Execute a model-requested action on the page."""
    if action.type == "click":
        page.mouse.click(action.x, action.y)
    elif action.type == "type":
        page.keyboard.type(action.text)
    elif action.type == "scroll":
        page.mouse.wheel(action.delta_x, action.delta_y)
    elif action.type == "navigate":
        page.goto(action.url)

def computer_use_loop(task: str, start_url: str, max_steps: int = 20):
    """Run the observe-think-act cycle for browser automation."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page(viewport={"width": 1280, "height": 720})
        page.goto(start_url)
        for step in range(max_steps):
            # Step 1: Capture screen
            screenshot_b64 = capture_screenshot(page)
            # Step 2: Model evaluates and decides action
            response = client.models.generate_content(
                model="gemini-3.5-flash",
                contents=[
                    types.Part.from_text(text=f"Task: {task}\nStep {step + 1}. What action should I take next?"),
                    types.Part.from_bytes(
                        data=base64.b64decode(screenshot_b64),
                        mime_type="image/png"
                    )
                ],
                config=types.GenerateContentConfig(
                    response_mime_type="application/json"
                )
            )
            # No response schema is set, so parse the JSON text directly.
            # This assumes the model returns an object with keys such as
            # "type", "x", "y", "status", and "requires_confirmation".
            action_data = json.loads(response.text)
            print(f"Step {step + 1}: {action_data}")
            # Step 3: Safety check
            if action_data.get("requires_confirmation"):
                confirm = input(f"Confirm action: {action_data}? (y/n): ")
                if confirm.lower() != "y":
                    print("Action cancelled by user.")
                    break
            # Step 4: Check for completion
            if action_data.get("status") == "complete":
                print(f"Task complete: {action_data.get('result')}")
                break
            # Step 5: Execute action (SimpleNamespace is from the standard
            # library's types module, not google.genai.types)
            execute_action(page, SimpleNamespace(**action_data))
        browser.close()

# Example usage
computer_use_loop(
    task="Find the current price of AAPL stock on Google Finance",
    start_url="https://www.google.com/finance"
)
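The loop above passes whatever JSON the model returns straight to the browser, so a malformed or out-of-bounds action would only fail at execution time. A small validation step can catch such cases first. The sketch below assumes the action shape used in this section (a type key plus x, y, text, delta_x, delta_y, or url); validate_action is a hypothetical helper, not part of the SDK.

```python
ALLOWED_ACTION_TYPES = {"click", "type", "scroll", "navigate"}


def validate_action(action_data: dict, width: int = 1280, height: int = 720) -> list[str]:
    """Return a list of problems with a model-proposed action (empty if it looks valid)."""
    problems = []
    action_type = action_data.get("type")
    if action_type not in ALLOWED_ACTION_TYPES:
        return [f"unsupported action type: {action_type!r}"]

    if action_type == "click":
        x, y = action_data.get("x"), action_data.get("y")
        if not isinstance(x, (int, float)) or not isinstance(y, (int, float)):
            problems.append("click requires numeric x and y")
        elif not (0 <= x < width and 0 <= y < height):
            problems.append(f"click ({x}, {y}) is outside the {width}x{height} viewport")
    elif action_type == "type":
        if not isinstance(action_data.get("text"), str):
            problems.append("type requires a text string")
    elif action_type == "scroll":
        for key in ("delta_x", "delta_y"):
            if not isinstance(action_data.get(key), (int, float)):
                problems.append(f"scroll requires numeric {key}")
    elif action_type == "navigate":
        if not isinstance(action_data.get("url"), str):
            problems.append("navigate requires a url string")
    return problems


print(validate_action({"type": "click", "x": 2000, "y": 10}))
```

In the loop, this check would sit after the completion check and before execute_action, since a completion message carries a status rather than an action type. When problems are found, one option is to feed them back to the model as text on the next step instead of aborting.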
5.1 Safety & Confirmation Policies
Always implement safety boundaries for high-risk actions:
# Define actions that require human confirmation
HIGH_RISK_ACTIONS = {"purchase", "delete", "submit_form", "login", "payment"}

def safety_check(action_data: dict) -> bool:
    """Evaluate whether an action requires human confirmation."""
    action_type = action_data.get("action_category", "")
    if action_type in HIGH_RISK_ACTIONS:
        print(f"\n⚠️ HIGH-RISK ACTION DETECTED: {action_type}")
        print(f" Details: {action_data.get('description', 'No description')}")
        confirm = input(" Approve? (yes/no): ")
        return confirm.lower() == "yes"
    return True  # Low-risk actions proceed automatically
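Category-based confirmation covers what the agent does, but not where it goes. A complementary boundary is a host allowlist applied to navigate actions before they reach the browser. The sketch below uses only the standard library; is_allowed_url and the example hosts are illustrative assumptions, and a real deployment would pair this with network-level restrictions on the sandbox.

```python
from urllib.parse import urlparse


def is_allowed_url(url: str, allowed_hosts: set[str]) -> bool:
    """Allow only HTTPS URLs whose host is an allowed host or one of its subdomains."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False  # Rejects http://, file://, javascript:, and similar schemes
    host = (parsed.hostname or "").lower()
    # Match the exact host or a true subdomain; "evilgoogle.com" must not match "google.com"
    return any(host == allowed or host.endswith("." + allowed) for allowed in allowed_hosts)


ALLOWED_HOSTS = {"google.com"}  # Illustrative; set this to the task's target site
print(is_allowed_url("https://www.google.com/finance", ALLOWED_HOSTS))
```

Matching on the parsed hostname rather than on a substring of the URL avoids lookalike hosts and URLs that embed the allowed name in a path or query string.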
6. Best Practices & Limitations
Key considerations for production deployments:
- Rate Limiting: Deep Research tasks consume significant compute. Implement queue-based submission with exponential backoff for retry logic.
- Timeouts: Set reasonable timeouts (5–10 minutes for deep research, 60 seconds per computer use step). Kill long-running tasks gracefully.
- Sandboxing: Computer Use must run in isolated environments. Never expose host filesystem, credentials, or network access beyond the target site.
- Citations: Always validate Deep Research citations. The agent provides source URLs — verify they are accessible and relevant.
- Cost Management: Deep Research can generate 10,000+ tokens per report. Monitor usage and set billing alerts.
import asyncio

from google import genai

client = genai.Client()

# Production-ready research with timeout and error handling
async def research_with_timeout(query: str, timeout_seconds: int = 300):
    """Run deep research with a timeout safeguard."""
    # The client calls are blocking, so run them in a worker thread
    # to avoid stalling the event loop.
    interaction = await asyncio.to_thread(
        client.interactions.create,
        agent="deep-research-preview-04-2026",
        input=query,
        background=True
    )
    loop = asyncio.get_running_loop()
    start_time = loop.time()
    while True:
        elapsed = loop.time() - start_time
        if elapsed > timeout_seconds:
            print(f"Timeout after {timeout_seconds}s")
            return None
        result = await asyncio.to_thread(client.interactions.get, id=interaction.id)
        if result.status == "COMPLETED":
            return result.output_text
        elif result.status == "FAILED":
            print(f"Research failed: {result.error}")
            return None
        await asyncio.sleep(5)

# Usage
# report = asyncio.run(research_with_timeout("Latest quantum computing breakthroughs"))
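The rate-limiting advice above mentions exponential backoff for retries. A minimal sketch of that idea follows, with optional full jitter so that many clients do not retry in lockstep. backoff_delays and retry_with_backoff are hypothetical helpers; the base delay, cap, and retry count are illustrative defaults, and the broad except clause is a placeholder for whatever rate-limit error the SDK actually raises.

```python
import random
import time


def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 60.0,
                   jitter: bool = True, rng=None) -> list[float]:
    """Return the wait before each retry: base * 2**attempt, capped at `cap`."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_retries):
        delay = min(cap, base * (2 ** attempt))
        if jitter:
            # Full jitter: pick a random wait between 0 and the computed delay
            delay = rng.uniform(0, delay)
        delays.append(delay)
    return delays


def retry_with_backoff(submit, max_retries: int = 4, sleep=time.sleep, jitter: bool = True):
    """Call submit() until it succeeds, or re-raise after max_retries retries."""
    delays = backoff_delays(max_retries, jitter=jitter)
    for attempt in range(max_retries + 1):
        try:
            return submit()
        except Exception:  # Placeholder: catch the SDK's rate-limit error in real code
            if attempt == max_retries:
                raise
            sleep(delays[attempt])


print(backoff_delays(5, jitter=False))
```

Passing sleep as a parameter keeps the helper testable: a test can substitute a function that records the delays instead of waiting.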
Next in the Gemini SDK Track
In Part 12: Live API & Real-Time Streaming, we’ll build real-time audio/video applications with bidirectional WebSocket connections, ephemeral tokens for client-side security, and tool use within live sessions.