Gemini SDK Track Part 8: File Search & RAG

                        
What You’ll Learn: File Search is Gemini’s built-in retrieval-augmented generation (RAG) tool. You upload documents to a managed FileSearchStore, Gemini chunks, embeds, and indexes them server-side, and the model retrieves the relevant passages at query time and cites them in its answer. This part covers creating and managing stores, uploading documents, querying through generateContent and the Interactions API, and combining File Search with structured output and multi-store queries.
                    

1. File Search Overview

Gemini’s File Search is a native retrieval-augmented generation (RAG) tool built directly into the API. Unlike traditional RAG pipelines where you manage embedding infrastructure, chunk documents manually, and orchestrate vector databases, File Search handles all of this server-side — you just upload documents and query them.

                        
                        Key Concept: File Search uses semantic search (embedding-based similarity) rather than keyword matching. It understands the meaning of your query and retrieves relevant document passages even when exact keywords don’t match.
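To make the idea of embedding-based similarity concrete, here is a minimal sketch in plain Python. The three-dimensional vectors are hand-picked, hypothetical values (real embeddings have far more dimensions and come from the embedding model), and cosine similarity is used here as a common similarity measure, not as a statement of what File Search computes internally.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embedding for the query "How do I reset my password?"
query = [0.9, 0.1, 0.0]

# Hypothetical passage embeddings. The first passage shares no keywords
# with the query but points in a similar direction in vector space.
passages = {
    "Recovering account credentials": [0.8, 0.2, 0.1],
    "Quarterly revenue summary": [0.0, 0.1, 0.9],
}

# Rank passages by similarity to the query, best match first.
ranked = sorted(
    passages,
    key=lambda title: cosine_similarity(query, passages[title]),
    reverse=True,
)
print(ranked[0])
```

The credentials passage ranks first even though it never contains the words "reset" or "password", which is the behavior keyword matching cannot provide.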
                    

1.1 When to Use File Search

Approach	Best For	Limitations
File Search (RAG)	Large document collections (100s-1000s of files), dynamic content, need citations	Latency from retrieval step, retrieval quality depends on embedding
Context Caching	Repeated queries against same large document, cost optimization	Fixed context, requires cache management
Long Context (1M tokens)	Single document analysis, full-context reasoning	Expensive for large docs, no citation tracking

from google import genai

client = genai.Client()

# File Search is used as a "tool" — similar to function calling
# The model decides when to search based on the query
response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="What are the key findings in the Q4 report?",
    config={
        "tools": [{"file_search": {"file_search_store_names": ["fileSearchStores/my-store"]}}]
    }
)
print(response.text)

2. FileSearchStores

A FileSearchStore is a managed container for your documents. Behind the scenes, Gemini chunks your documents, generates embeddings using gemini-embedding-2, and indexes them for fast semantic retrieval.
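The chunking step can be pictured with a small sketch. This is an illustration of fixed-size chunking with overlap, a common RAG technique; the function name, the word-based splitting, and the default sizes are assumptions for the example and do not describe the chunker Gemini actually runs.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into word-based chunks that overlap by `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks

sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_text(sample, chunk_size=4, overlap=1):
    print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, which tends to help retrieval quality.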

2.1 Creating Stores

from google import genai
from google.genai import types

client = genai.Client()

# Create a new FileSearchStore with an embedding model
store = client.file_search_stores.create(
    config={
        "display_name": "product-documentation",
        "embedding_model": "models/gemini-embedding-2"
    }
)
print(f"Store created: {store.name}")
print(f"Display name: {store.display_name}")
print(f"Embedding model: {store.embedding_model}")

                        
                        Gemini Embedding 2: The models/gemini-embedding-2 model supports multimodal embeddings — it can embed text, images, audio, video, and PDFs into the same vector space. This means your File Search store can contain mixed media documents and still retrieve semantically relevant results.
                    

from google import genai

client = genai.Client()

# Create a store optimized for code documentation
code_store = client.file_search_stores.create(
    config={
        "display_name": "api-reference-docs",
        "embedding_model": "models/gemini-embedding-2"
    }
)
print(f"Code store: {code_store.name}")

# Create a store for internal knowledge base
kb_store = client.file_search_stores.create(
    config={
        "display_name": "internal-knowledge-base",
        "embedding_model": "models/gemini-embedding-2"
    }
)
print(f"KB store: {kb_store.name}")

2.2 Listing & Deleting Stores

from google import genai

client = genai.Client()

# List all stores
print("Your FileSearchStores:")
print("-" * 50)
for store in client.file_search_stores.list():
    print(f"  Name: {store.name}")
    print(f"  Display: {store.display_name}")
    print(f"  Model: {store.embedding_model}")
    print()

# Get a specific store by name
store = client.file_search_stores.get(name="fileSearchStores/abc123")
print(f"Retrieved: {store.display_name}")

# Delete a store; force=True also removes the documents and embeddings it still contains
client.file_search_stores.delete(
    name="fileSearchStores/old-store-id",
    config={"force": True}
)
print("Store deleted successfully")
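Store resource names are generated IDs, so in practice you often need to look a store up by its display name. The helper below is a hypothetical convenience function, not part of the SDK; it only assumes that each store object exposes `name` and `display_name`, as in the listing above. Stand-in objects are used so the sketch runs without an API key.

```python
from types import SimpleNamespace

def find_stores_by_display_name(stores, display_name):
    """Return every store whose display_name matches exactly."""
    return [store for store in stores if store.display_name == display_name]

# Stand-in objects that mimic the two attributes the helper reads.
fake_stores = [
    SimpleNamespace(name="fileSearchStores/abc123", display_name="product-documentation"),
    SimpleNamespace(name="fileSearchStores/def456", display_name="internal-knowledge-base"),
]

matches = find_stores_by_display_name(fake_stores, "product-documentation")
print([store.name for store in matches])
```

With a real client you would pass `client.file_search_stores.list()` as the first argument. The helper returns a list rather than a single store in case several stores share a display name.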

Real-World Application

Legal Contract Analysis Platform

A law firm caches 200-page contracts and lets attorneys ask unlimited questions. Without caching, each question costs $0.50, because the entire document is re-processed every time. With caching, the first load costs $2.00 and each subsequent question costs $0.05. Caching becomes cheaper from the fifth question onward: a contract that needs 20 questions costs $3.00 instead of $10.00, a 70% saving, and the saving passes 75% at around 27 questions.

Context Caching · Legal Tech · Cost Optimization
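The arithmetic behind this scenario can be checked with a few lines of Python. The prices are the illustrative figures from the example above ($0.50 per uncached question, $2.00 to load the cache, $0.05 per cached question), not actual API pricing, and the function names are made up for this sketch.

```python
def uncached_cost(queries, per_query=0.50):
    """Total cost when the full document is re-processed for every question."""
    return queries * per_query

def cached_cost(queries, load_cost=2.00, per_query=0.05):
    """Total cost when the document is cached once, then queried."""
    return load_cost + queries * per_query

def break_even_queries(max_queries=1000):
    """Smallest number of questions at which caching is strictly cheaper."""
    for n in range(1, max_queries + 1):
        if cached_cost(n) < uncached_cost(n):
            return n
    return None

def savings_fraction(queries):
    """Fraction of the uncached cost that caching saves."""
    return 1 - cached_cost(queries) / uncached_cost(queries)

print("Break-even at", break_even_queries(), "questions")
for n in (5, 20, 27, 100):
    print(f"{n:>3} questions: {savings_fraction(n):.0%} saved")
```

Under these figures the saving is 0.9 - 4/n for n questions, so it climbs toward 90% as the query count grows but never reaches it.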

3. Managing Documents

3.1 Uploading Documents

Once you have a store, add documents to it. Gemini automatically chunks, embeds, and indexes each document:

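import time

from google import genai

client = genai.Client()

store_name = "fileSearchStores/my-store-id"

# Upload a local file and import it into the store in one call.
# The call returns a long-running operation, because chunking,
# embedding, and indexing happen asynchronously.
operation = client.file_search_stores.upload_to_file_search_store(
    file="docs/architecture-guide.pdf",
    file_search_store_name=store_name,
    config={"display_name": "Architecture Guide v2.1"}
)

# Poll until the import has finished
while not operation.done:
    time.sleep(5)
    operation = client.operations.get(operation)
print("Upload and indexing complete")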

from google import genai

client = genai.Client()

store_name = "fileSearchStores/my-store-id"

# List all documents in a store
print("Documents in store:")
for doc in client.file_search_stores.documents.list(parent=store_name):
    print(f"  {doc.display_name} — {doc.name}")

# Get details about a specific document
doc = client.file_search_stores.documents.get(
    name=f"{store_name}/documents/doc123"
)
print(f"\nDocument: {doc.display_name}")
print(f"Size: {doc.size_bytes} bytes")
print(f"State: {doc.state}")

# Delete a document from the store
client.file_search_stores.documents.delete(
    name=f"{store_name}/documents/doc123"
)
print("Document removed from store")

3.2 Supported File Types

Category	Formats	Notes
Text	.txt, .md, .csv, .html	Direct text extraction
Documents	.pdf, .docx	OCR for scanned PDFs
Code	.py, .js, .ts, .java, .go, .rs	Language-aware chunking
Data	.json, .xml, .yaml	Structure-preserving chunking
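Before a bulk upload it can help to filter out files the store will not accept. The sketch below encodes the extensions from the table above in a set and splits a list of paths into supported and skipped files. The helper name and the example paths are hypothetical, and the extension list is only as complete as the table.

```python
from pathlib import Path

# Extensions taken from the table above.
SUPPORTED_EXTENSIONS = {
    ".txt", ".md", ".csv", ".html",           # Text
    ".pdf", ".docx",                          # Documents
    ".py", ".js", ".ts", ".java", ".go", ".rs",  # Code
    ".json", ".xml", ".yaml",                 # Data
}

def split_by_support(paths):
    """Return (supported, skipped) lists based on each file's extension."""
    supported, skipped = [], []
    for path in map(Path, paths):
        # suffix is compared case-insensitively, so "REPORT.PDF" is accepted
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            skipped.append(path)
    return supported, skipped

supported, skipped = split_by_support(
    ["docs/guide.pdf", "src/main.rs", "assets/logo.psd", "notes/README.MD"]
)
print("Upload:", [p.name for p in supported])
print("Skip:  ", [p.name for p in skipped])
```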

4. Querying with File Search

4.1 Using File Search as a Tool in generateContent

File Search is passed as a tool to generateContent. The model autonomously decides when to invoke the search based on whether the query requires external knowledge:

from google import genai
from google.genai import types

client = genai.Client()

# Query using File Search tool
response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Summarize the authentication flow described in our architecture docs.",
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(
                file_search=types.FileSearch(
                    file_search_store_names=["fileSearchStores/my-store-id"]
                )
            )
        ]
    )
)

# The response includes the generated text
print("Answer:", response.text)

# Access grounding metadata: retrieved chunks and citations.
# Both the metadata and its chunk list can be absent when the model answers without searching.
metadata = response.candidates[0].grounding_metadata
if metadata and metadata.grounding_chunks:
    print(f"\nRetrieved {len(metadata.grounding_chunks)} chunks")
    for chunk in metadata.grounding_chunks:
        print(f"  Source: {chunk.retrieved_context.title}")
        print(f"  Text: {chunk.retrieved_context.text[:100]}...")
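When you show citations to users you usually want each source listed once, even if several chunks came from the same document. The helper below is a hypothetical sketch that reads only the attributes used in the example above (`candidates`, `grounding_metadata`, `grounding_chunks`, `retrieved_context.title`) and tolerates any of them being missing. A stand-in response object keeps it runnable without an API call.

```python
from types import SimpleNamespace

def cited_sources(response):
    """Return unique source titles from a response, in first-seen order."""
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return []
    metadata = getattr(candidates[0], "grounding_metadata", None)
    chunks = getattr(metadata, "grounding_chunks", None) or []
    titles = []
    for chunk in chunks:
        context = getattr(chunk, "retrieved_context", None)
        title = getattr(context, "title", None)
        if title and title not in titles:
            titles.append(title)
    return titles

def fake_chunk(title):
    """Build a stand-in chunk with the same shape as a grounding chunk."""
    return SimpleNamespace(retrieved_context=SimpleNamespace(title=title, text="..."))

fake_response = SimpleNamespace(candidates=[SimpleNamespace(
    grounding_metadata=SimpleNamespace(grounding_chunks=[
        fake_chunk("Architecture Guide v2.1"),
        fake_chunk("Security Policy"),
        fake_chunk("Architecture Guide v2.1"),
    ])
)])

print(cited_sources(fake_response))
```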

4.2 Interactions API Variant

The Interactions API uses a slightly different tool specification format with type discriminators:

from google import genai

client = genai.Client()

# File Search via the Interactions API
interaction = client.interactions.create(
    model="gemini-3.5-flash",
    input="What deployment strategies does our documentation recommend?",
    tools=[
        {
            "type": "file_search",
            "file_search_store_names": ["fileSearchStores/my-store-id"]
        }
    ]
)

print("Answer:", interaction.output_text)

# Inspect steps to see the retrieval
for step in interaction.steps:
    print(f"  Step type: {step.type}")
    if step.type == "file_search_call":
        print(f"  Query: {step.file_search_call.query}")
    elif step.type == "file_search_result":
        print(f"  Results: {len(step.file_search_result.chunks)} chunks")

5. Combining File Search with Generation

5.1 Structured Outputs + File Search

Combine File Search with structured output to get type-safe responses grounded in your documents:

import json

from google import genai
from google.genai import types

client = genai.Client()

# Define a schema for the response
summary_schema = types.Schema(
    type="object",
    properties={
        "title": types.Schema(type="string", description="Document title"),
        "key_points": types.Schema(
            type="array",
            items=types.Schema(type="string"),
            description="Key points from the document"
        ),
        "confidence": types.Schema(type="number", description="Confidence score 0-1")
    },
    required=["title", "key_points", "confidence"]
)

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Extract the top 5 key points from the security policy document.",
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(
                file_search=types.FileSearch(
                    file_search_store_names=["fileSearchStores/security-docs"]
                )
            )
        ],
        response_mime_type="application/json",
        response_schema=summary_schema
    )
)

# Parse the JSON payload returned by the model
result = json.loads(response.text)
print(f"Title: {result['title']}")
print(f"Confidence: {result['confidence']}")
for i, point in enumerate(result['key_points'], 1):
    print(f"  {i}. {point}")
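Even with a response schema, it is reasonable to validate the payload before trusting it downstream. The sketch below is a hypothetical validator for the three fields defined in the schema above; the function name and the sample JSON string are made up for illustration, and it checks only what the schema describes (required keys, a list of key points, a confidence between 0 and 1).

```python
import json

REQUIRED_KEYS = ("title", "key_points", "confidence")

def parse_summary(raw_text):
    """Parse and validate a summary payload; raise ValueError if malformed."""
    data = json.loads(raw_text)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    if not isinstance(data["key_points"], list):
        raise ValueError("key_points must be a list")
    if not 0 <= data["confidence"] <= 1:
        raise ValueError("confidence must be between 0 and 1")
    return data

# Hand-written sample standing in for response.text
sample = '{"title": "Security Policy", "key_points": ["Rotate keys"], "confidence": 0.8}'
summary = parse_summary(sample)
print(summary["title"], summary["confidence"])
```

Note that `json.JSONDecodeError` is a subclass of `ValueError`, so a single `except ValueError` around the call covers both invalid JSON and failed checks.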

5.2 Multi-Store Queries

Query multiple stores simultaneously to cross-reference different document collections:

from google import genai
from google.genai import types

client = genai.Client()

# Search across multiple stores at once
response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Compare the authentication approaches in our API docs vs our security policy.",
    config=types.GenerateContentConfig(
        tools=[
            types.Tool(
                file_search=types.FileSearch(
                    file_search_store_names=[
                        "fileSearchStores/api-docs",
                        "fileSearchStores/security-policy"
                    ]
                )
            )
        ]
    )
)

print(response.text)

# Check which documents contributed to the answer.
# The chunk list can be absent when the model answers without searching.
metadata = response.candidates[0].grounding_metadata
if metadata and metadata.grounding_chunks:
    for chunk in metadata.grounding_chunks:
        print(f"  Source: {chunk.retrieved_context.title}")

                        
                        Performance Tip: Multi-store queries run searches in parallel but increase latency slightly. For latency-sensitive applications, prefer a single consolidated store. Use multi-store when you need clear separation (e.g., public docs vs internal docs) or when access control requires different stores per team.
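Whether the extra latency matters for your application is best settled by measuring it. The sketch below is a small, generic timing helper; the function name is hypothetical and the two lambdas are placeholders that you would replace with your own single-store and multi-store `generate_content` calls.

```python
import statistics
import time

def median_latency(call, runs=5):
    """Run `call` several times and return the median wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Placeholders: swap in functions that issue your real queries.
single_store_query = lambda: time.sleep(0.01)
multi_store_query = lambda: time.sleep(0.02)

print(f"single store: {median_latency(single_store_query, runs=3):.3f}s")
print(f"multi store:  {median_latency(multi_store_query, runs=3):.3f}s")
```

The median is used rather than the mean so that one slow outlier, such as a cold start, does not dominate the comparison.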
                    

                        
                        Try It Yourself: Build a ‘codebase Q&A system’: (1) Cache an entire codebase (10+ files) using context caching, (2) ask 5 different questions about the code architecture, (3) compare the cost of cached queries vs non-cached queries, (4) measure response time differences. Calculate the break-even point (how many queries make caching worthwhile).
                    

Next in the Gemini SDK Track

In Part 9: The Interactions API (Beta), we’ll explore the architectural shift from stateless generateContent to stateful server-managed conversations — the steps timeline, previous_interaction_id chaining, polymorphic response formats, and streaming events.