
Gemini SDK Track Part 11: Deep Research & Computer Use

May 24, 2026 Wasil Zafar 40 min read

Deploy the Deep Research Agent for autonomous multi-step investigations with background mode and citations. Build Computer Use browser automation loops with the observe-think-act cycle and safety policy evaluation.

Table of Contents

  1. Deep Research Agent
  2. Research with Documents
  3. Streaming Deep Research
  4. Computer Use (Preview)
  5. The Computer Use Action Loop
  6. Best Practices & Limitations

What You’ll Learn: This article covers two preview capabilities of the Gemini SDK. The Deep Research Agent runs autonomous multi-step investigations in background mode and returns reports with citations. Computer Use automates a browser through an observe-think-act loop with safety policy evaluation. You will launch and poll research tasks, add documents and follow-up questions, stream progress events, and build a sandboxed action loop.

1. Deep Research Agent (Preview)

Deep Research is not just another model endpoint — it is a fully autonomous agent designed for multi-step investigation. When you submit a research query, the agent formulates a plan, searches multiple sources, synthesizes findings with citations, and delivers a structured report.

Key Insight: Deep Research is addressed by the agent ID deep-research-preview-04-2026. Unlike standard generation, it runs asynchronously and may take from 30 seconds to several minutes, depending on query complexity.

1.1 Background Mode & Polling

Since deep research tasks can take significant time, you launch them in background mode and poll for completion:

from google import genai

client = genai.Client()

# Launch a deep research task in background mode
interaction = client.interactions.create(
    agent="deep-research-preview-04-2026",
    input="What are the latest advances in solid-state battery technology for EVs in 2026? Include key companies, breakthroughs, and timeline to commercialization.",
    background=True
)

print(f"Research started: {interaction.id}")
print(f"Status: {interaction.status}")
# Status will be "PROCESSING" initially

Poll for completion using the interaction ID:

import time
from google import genai

client = genai.Client()

# Assume interaction_id from previous launch
interaction_id = "interaction-abc123"

# Poll until complete
while True:
    result = client.interactions.get(id=interaction_id)
    print(f"Status: {result.status}")

    if result.status == "COMPLETED":
        print("\n--- Research Report ---")
        print(result.output_text)
        break
    elif result.status == "FAILED":
        print(f"Error: {result.error}")
        break

    time.sleep(5)  # Check every 5 seconds

Important: Deep Research produces reports with inline citations linked to source URLs. The output_text contains Markdown with numbered references. Use interaction.citations to access structured citation metadata.
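
Because the report is Markdown with numbered references, a quick consistency check can catch inline markers that point at nothing. The sketch below is only an illustration: the article does not specify the exact reference layout, so it assumes inline markers such as [2] and one reference per line in the form "[2] Title - https://...". The function names are hypothetical, not part of the SDK.

```python
import re

# Assumed layout (not confirmed by the article): inline markers like "[2]" in the
# body, and one reference per line of the form "[2] Title - https://example.com".
REFERENCE_LINE = re.compile(r"^\[(\d+)\]\s+(.*?)(https?://\S+)\s*$")
INLINE_MARKER = re.compile(r"\[(\d+)\]")


def parse_references(report_md: str) -> dict[int, str]:
    """Map reference number -> URL for every reference-list line found."""
    refs = {}
    for line in report_md.splitlines():
        match = REFERENCE_LINE.match(line.strip())
        if match:
            refs[int(match.group(1))] = match.group(3)
    return refs


def unresolved_markers(report_md: str) -> set[int]:
    """Return inline citation numbers that have no reference-list entry."""
    refs = parse_references(report_md)
    # Exclude the reference list itself so its own "[n]" prefixes are not counted
    body_lines = [
        line for line in report_md.splitlines()
        if not REFERENCE_LINE.match(line.strip())
    ]
    cited = {int(n) for n in INLINE_MARKER.findall("\n".join(body_lines))}
    return cited - set(refs)


sample_report = (
    "Solid-state cells improved this year [1][3].\n"
    "\n"
    "[1] Battery Weekly - https://example.com/a\n"
    "[2] Lab Notes - https://example.com/b\n"
)
print(parse_references(sample_report))
print(unresolved_markers(sample_report))
```

If the real report uses a different reference format, only the two regular expressions should need adjusting.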

2. Research with Documents & Multi-Turn Follow-up

Deep Research can analyze uploaded documents (PDFs, text files) and URLs alongside your text query. This enables grounded research over your own proprietary data:

from google import genai
from google.genai import types

client = genai.Client()

# Upload a PDF for research context
uploaded_file = client.files.upload(file="quarterly-report-q1-2026.pdf")

# Research with document context
interaction = client.interactions.create(
    agent="deep-research-preview-04-2026",
    input=[
        # Part.from_text takes its argument by keyword
        types.Part.from_text(text="Analyze the trends in this quarterly report and compare with public market data for the same period."),
        types.Part.from_uri(file_uri=uploaded_file.uri, mime_type="application/pdf")
    ],
    background=True
)

print(f"Research with document started: {interaction.id}")

2.1 Multi-Turn Follow-up

Continue a research conversation with follow-up questions using previous_interaction_id:

from google import genai

client = genai.Client()

# First research interaction (already completed)
first_interaction_id = "interaction-abc123"

# Follow-up question referencing previous research
followup = client.interactions.create(
    agent="deep-research-preview-04-2026",
    previous_interaction_id=first_interaction_id,
    input="Based on your findings, which company has the strongest patent portfolio in this space?",
    background=True
)

print(f"Follow-up research started: {followup.id}")
Real-World Application

Global Bank AI Deployment

A multinational bank deployed Gemini via Vertex AI with full compliance: data residency in EU (Frankfurt region), CMEK encryption for all prompts/responses, VPC-SC perimeter blocking external data exfiltration, and audit logs sent to their SIEM. Approval took 6 months but the architecture now serves 50+ internal AI applications.

Tags: Enterprise, Compliance, Banking

3. Streaming Deep Research

For real-time visibility into the research process, use streaming mode. The agent emits events as it progresses through planning, searching, and synthesizing:

3.1 Streaming Implementation

from google import genai

client = genai.Client()

# Stream research progress
stream = client.interactions.create_stream(
    agent="deep-research-preview-04-2026",
    input="Compare the regulatory frameworks for autonomous vehicles in the EU, US, and China as of 2026.",
    stream=True
)

for event in stream:
    if event.type == "thought":
        print(f"[PLAN] {event.text}")
    elif event.type == "tool_use":
        print(f"[SEARCH] Querying: {event.tool_name}")
    elif event.type == "model_output":
        print(f"\n--- Final Report ---")
        print(event.text)
    elif event.type == "citation":
        print(f"[SOURCE] {event.url} - {event.title}")

Event Types: thought events show the agent’s planning steps. tool_use events indicate searches being performed. model_output delivers the final synthesized report. citation events provide structured source references.
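
When the events feed a UI or a log rather than the console, it can help to group them by type first. The following is a minimal sketch that relies only on the four event types and attribute names used in the streaming loop above. The summarize_events helper and the stand-in events are hypothetical, so the sketch runs without calling the API.

```python
from collections import defaultdict
from types import SimpleNamespace


def summarize_events(events):
    """Group streamed events by type so progress and sources can be logged separately."""
    summary = defaultdict(list)
    for event in events:
        if event.type == "citation":
            summary["citation"].append((event.title, event.url))
        elif event.type == "tool_use":
            summary["tool_use"].append(event.tool_name)
        elif event.type in ("thought", "model_output"):
            summary[event.type].append(event.text)
        else:
            # Keep unrecognized event types visible instead of silently dropping them
            summary["unknown"].append(event.type)
    return dict(summary)


# Stand-in events for a dry run; a real stream would come from the SDK.
fake_stream = [
    SimpleNamespace(type="thought", text="Plan: compare three jurisdictions"),
    SimpleNamespace(type="tool_use", tool_name="web_search"),
    SimpleNamespace(type="citation", url="https://example.com/eu", title="EU rules"),
    SimpleNamespace(type="model_output", text="Final report body"),
]
summary = summarize_events(fake_stream)
print(summary["citation"])
```

Routing unknown types into their own bucket is a deliberate choice for a preview feature, where new event types may appear without notice.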

4. Computer Use (Preview)

Computer Use enables Gemini to interact with graphical user interfaces — clicking buttons, filling forms, navigating websites, and reading screen content. It is a vision-driven browser automation system where the model observes screenshots, reasons about what to do, and executes actions.

Security Requirement: Computer Use MUST run in a sandboxed environment (Docker container or VM). Never run it on your host machine or give it access to sensitive accounts. The model can make mistakes and click unintended elements.

4.1 Prerequisites & Environment

You need a secure execution environment with a browser automation framework:

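# Set up a sandboxed environment with Playwright
pip install google-genai playwright
playwright install chromium

# Run in Docker for production safety. The project directory is mounted read-only
# so your_script.py exists inside the container, --with-deps installs the system
# libraries Chromium needs, and -e passes the API key through from the host.
docker run -it --rm -v "$PWD":/app:ro -w /app -e GEMINI_API_KEY python:3.12 \
  bash -c "pip install google-genai playwright && playwright install --with-deps chromium && python your_script.py"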

5. The Computer Use Action Loop

Computer Use follows a 5-step cycle: Initialize → Capture Screen → Model Evaluate → Safety Check → Execute & Recurse. The loop continues until the model signals task completion.

import json
from types import SimpleNamespace

from google import genai
from google.genai import types
from playwright.sync_api import sync_playwright

client = genai.Client()

# Assumed reply shape: the prompt asks for JSON whose keys match execute_action
ACTION_FORMAT = (
    'Reply with one JSON object. To act: {"type": "click"|"type"|"scroll"|"navigate", '
    'plus x, y, text, delta_x, delta_y, or url as needed, "requires_confirmation": bool}. '
    'When done: {"status": "complete", "result": "..."}.'
)

def capture_screenshot(page) -> bytes:
    """Capture the current page as PNG bytes."""
    return page.screenshot()

def execute_action(page, action):
    """Execute a model-requested action on the page."""
    if action.type == "click":
        page.mouse.click(action.x, action.y)
    elif action.type == "type":
        page.keyboard.type(action.text)
    elif action.type == "scroll":
        page.mouse.wheel(action.delta_x, action.delta_y)
    elif action.type == "navigate":
        page.goto(action.url)
    else:
        raise ValueError(f"Unsupported action type: {action.type!r}")

def computer_use_loop(task: str, start_url: str, max_steps: int = 20):
    """Run the observe-think-act cycle for browser automation."""
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page(viewport={"width": 1280, "height": 720})
        page.goto(start_url)

        for step in range(max_steps):
            # Step 1: Capture screen
            screenshot_png = capture_screenshot(page)

            # Step 2: Model evaluates and decides action
            response = client.models.generate_content(
                model="gemini-3.5-flash",
                contents=[
                    types.Part.from_text(
                        text=f"Task: {task}\nStep {step + 1}. What action should I take next?\n{ACTION_FORMAT}"
                    ),
                    types.Part.from_bytes(data=screenshot_png, mime_type="image/png"),
                ],
                config=types.GenerateContentConfig(
                    response_mime_type="application/json"
                )
            )

            # response.parsed is tied to a response schema; without one,
            # decode the JSON text directly
            action_data = json.loads(response.text)
            print(f"Step {step + 1}: {action_data}")

            # Step 3: Safety check
            if action_data.get("requires_confirmation"):
                confirm = input(f"Confirm action: {action_data}? (y/n): ")
                if confirm.lower() != "y":
                    print("Action cancelled by user.")
                    break

            # Step 4: Check for completion
            if action_data.get("status") == "complete":
                print(f"Task complete: {action_data.get('result')}")
                break

            # Step 5: Execute action
            execute_action(page, SimpleNamespace(**action_data))

        browser.close()

# Example usage
computer_use_loop(
    task="Find the current price of AAPL stock on Google Finance",
    start_url="https://www.google.com/finance"
)
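
Since the model can return malformed or unintended actions, it may be worth validating each one before it reaches execute_action. The sketch below is one possible check, not part of the SDK: validate_action, ALLOWED_HOSTS, and the rule set are hypothetical, and the action keys mirror those used by execute_action above.

```python
from urllib.parse import urlparse

VIEWPORT = {"width": 1280, "height": 720}   # matches the viewport used above
ALLOWED_HOSTS = {"www.google.com"}           # hypothetical allowlist for this task
KNOWN_ACTIONS = {"click", "type", "scroll", "navigate"}


def _is_number(value) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_action(action: dict) -> list[str]:
    """Return a list of problems; an empty list means the action passed every check."""
    kind = action.get("type")
    if kind not in KNOWN_ACTIONS:
        return [f"unknown action type: {kind!r}"]

    problems = []
    if kind == "click":
        x, y = action.get("x"), action.get("y")
        if not (_is_number(x) and _is_number(y)):
            problems.append("click needs numeric x and y")
        elif not (0 <= x < VIEWPORT["width"] and 0 <= y < VIEWPORT["height"]):
            problems.append(f"click ({x}, {y}) is outside the viewport")
    elif kind == "type":
        if not isinstance(action.get("text"), str):
            problems.append("type needs a text string")
    elif kind == "scroll":
        if not (_is_number(action.get("delta_x")) and _is_number(action.get("delta_y"))):
            problems.append("scroll needs numeric delta_x and delta_y")
    elif kind == "navigate":
        host = urlparse(str(action.get("url", ""))).hostname
        if host not in ALLOWED_HOSTS:
            problems.append(f"navigation to {host!r} is not allowlisted")
    return problems


print(validate_action({"type": "click", "x": 5000, "y": 10}))
print(validate_action({"type": "navigate", "url": "https://www.google.com/finance"}))
```

In the loop, a non-empty result could be fed back to the model as an error message or treated as a reason to stop. Which is appropriate depends on how much autonomy the task should have.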

5.1 Safety & Confirmation Policies

Always implement safety boundaries for high-risk actions:

from google import genai

# Define actions that require human confirmation
HIGH_RISK_ACTIONS = {"purchase", "delete", "submit_form", "login", "payment"}

def safety_check(action_data: dict) -> bool:
    """Evaluate whether an action requires human confirmation."""
    action_type = action_data.get("action_category", "")

    if action_type in HIGH_RISK_ACTIONS:
        print(f"\n⚠️  HIGH-RISK ACTION DETECTED: {action_type}")
        print(f"   Details: {action_data.get('description', 'No description')}")
        confirm = input("   Approve? (yes/no): ")
        return confirm.lower() == "yes"

    return True  # Low-risk actions proceed automatically

Custom Actions: Beyond standard mouse/keyboard actions, you can define custom action types for specialized platforms. For Android automation, extend the action vocabulary with tap, swipe, and app-launch commands.
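
One way to keep an extended action vocabulary manageable is a small handler registry, so adding an action does not mean editing a growing if/elif chain. This is a sketch under assumptions: the action names tap, swipe, and launch_app, the dictionary keys, and the RecordingDevice stand-in are all hypothetical. A real Android driver would replace the stand-in.

```python
ACTION_HANDLERS = {}


def register_action(name):
    """Decorator that adds a handler to the action vocabulary."""
    def decorator(func):
        ACTION_HANDLERS[name] = func
        return func
    return decorator


@register_action("tap")
def tap(device, action):
    device.tap(action["x"], action["y"])


@register_action("swipe")
def swipe(device, action):
    device.swipe(action["x1"], action["y1"], action["x2"], action["y2"])


@register_action("launch_app")
def launch_app(device, action):
    device.launch(action["package"])


def dispatch(device, action: dict):
    """Look up the handler for an action and run it, rejecting unknown types."""
    handler = ACTION_HANDLERS.get(action.get("type"))
    if handler is None:
        raise ValueError(f"Unsupported action type: {action.get('type')!r}")
    handler(device, action)


class RecordingDevice:
    """Stand-in device that records calls instead of driving a real phone."""

    def __init__(self):
        self.calls = []

    def tap(self, x, y):
        self.calls.append(("tap", x, y))

    def swipe(self, x1, y1, x2, y2):
        self.calls.append(("swipe", x1, y1, x2, y2))

    def launch(self, package):
        self.calls.append(("launch", package))


device = RecordingDevice()
dispatch(device, {"type": "launch_app", "package": "com.example.app"})
dispatch(device, {"type": "tap", "x": 10, "y": 20})
print(device.calls)
```

A recording stand-in like this also makes the action layer testable without an emulator.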

6. Best Practices & Limitations

Resolution Settings: Use a 1280×720 viewport for optimal model performance. Higher resolutions increase token costs without proportional accuracy gains. The model works best with standard web layouts.

Key considerations for production deployments:

  • Rate Limiting: Deep Research tasks consume significant compute. Implement queue-based submission with exponential backoff for retry logic.
  • Timeouts: Set reasonable timeouts (5–10 minutes for deep research, 60 seconds per computer use step). Kill long-running tasks gracefully.
  • Sandboxing: Computer Use must run in isolated environments. Never expose host filesystem, credentials, or network access beyond the target site.
  • Citations: Always validate Deep Research citations. The agent provides source URLs — verify they are accessible and relevant.
  • Cost Management: Deep Research can generate 10,000+ tokens per report. Monitor usage and set billing alerts.

import asyncio

from google import genai

client = genai.Client()

# Production-ready research with timeout and error handling
async def research_with_timeout(query: str, timeout_seconds: int = 300):
    """Run deep research with a timeout safeguard."""
    # The client calls below are blocking, so run them in a worker thread
    # to keep the event loop responsive
    interaction = await asyncio.to_thread(
        client.interactions.create,
        agent="deep-research-preview-04-2026",
        input=query,
        background=True,
    )

    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_seconds

    while loop.time() < deadline:
        result = await asyncio.to_thread(client.interactions.get, id=interaction.id)
        if result.status == "COMPLETED":
            return result.output_text
        elif result.status == "FAILED":
            print(f"Research failed: {result.error}")
            return None

        await asyncio.sleep(5)

    print(f"Timeout after {timeout_seconds}s")
    return None

# Usage
# report = asyncio.run(research_with_timeout("Latest quantum computing breakthroughs"))
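
The rate-limiting bullet above mentions exponential backoff for retries. A minimal, SDK-independent sketch of that idea follows. retry_with_backoff and its parameters are hypothetical names, and the sleep argument is injectable so the behaviour can be checked without actually waiting.

```python
import random
import time


def retry_with_backoff(submit, max_attempts=5, base_delay=1.0, max_delay=60.0,
                       sleep=time.sleep, retry_on=(Exception,)):
    """Call submit() until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return submit()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% jitter so queued clients do not retry in lockstep
            sleep(delay + random.uniform(0, delay * 0.1))


# Dry run with a stand-in task that fails twice before succeeding
attempts = {"count": 0}
recorded_delays = []


def flaky_submit():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated rate limit")
    return "submitted"


outcome = retry_with_backoff(flaky_submit, sleep=recorded_delays.append)
print(outcome, len(recorded_delays))
```

In a real deployment, retry_on would be narrowed to whatever exception the client raises for rate limiting. Retrying on every Exception, as the default does here, would also retry genuine bugs.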
Try It Yourself: Design an enterprise architecture for a Gemini deployment: (1) draw the network topology (VPC, private endpoints, load balancer), (2) define IAM roles for 3 teams (ML engineers, developers, auditors), (3) set up a cost management policy with per-project budgets and alerts, (4) implement a simple rate limiter in Python that enforces per-user quotas.

Next in the Gemini SDK Track

In Part 12: Live API & Real-Time Streaming, we’ll build real-time audio/video applications with bidirectional WebSocket connections, ephemeral tokens for client-side security, and tool use within live sessions.