1. Text Generation Fundamentals
The generate_content method is the core of all Gemini text interactions. It accepts a model identifier, content (text, images, or multi-turn history), and an optional configuration object controlling generation behavior.
1.1 Basic Generation
The simplest pattern — a single text prompt producing a single text response:
from google import genai
client = genai.Client()
# Basic single-turn generation
response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Explain the difference between TCP and UDP in 3 bullet points."
)
print(response.text)
You can also provide system instructions to set the model’s persona and behavior constraints:
from google import genai
from google.genai import types
client = genai.Client()
response = client.models.generate_content(
    model="gemini-3.5-flash",
    config=types.GenerateContentConfig(
        system_instruction="You are a senior software architect. Answer concisely with concrete examples. Use bullet points."
    ),
    contents="What are the trade-offs between microservices and monoliths?"
)
print(response.text)
1.2 Multi-Turn Conversations
For multi-turn conversations, pass a list of Content objects alternating between user and model roles:
from google import genai
from google.genai import types
client = genai.Client()
# Build a conversation history
history = [
    types.Content(role="user", parts=[types.Part(text="My name is Alex and I'm building a REST API.")]),
    types.Content(role="model", parts=[types.Part(text="Hi Alex! I'd be happy to help with your REST API. What technology stack are you using?")]),
    types.Content(role="user", parts=[types.Part(text="FastAPI with Python. How should I structure authentication?")])
]
response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents=history
)
print(response.text)
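The alternating-role pattern is easy to break when history is assembled programmatically. The following is a minimal sketch of a pre-flight check. It assumes a simplified history of (role, text) tuples as a stand-in for types.Content objects, and it assumes the conversation should start and end with a user turn. check_history is a hypothetical helper, not part of the SDK.

```python
from typing import List, Tuple

Turn = Tuple[str, str]  # (role, text): simplified stand-in for types.Content


def check_history(history: List[Turn]) -> None:
    """Raise ValueError unless roles alternate user/model, starting and ending with user."""
    if not history:
        raise ValueError("history is empty")
    for index, (role, _text) in enumerate(history):
        expected = "user" if index % 2 == 0 else "model"
        if role != expected:
            raise ValueError(f"turn {index}: expected role {expected!r}, got {role!r}")
    if history[-1][0] != "user":
        raise ValueError("last turn must come from the user")


history = [
    ("user", "My name is Alex and I'm building a REST API."),
    ("model", "Hi Alex! What technology stack are you using?"),
    ("user", "FastAPI with Python. How should I structure authentication?"),
]
check_history(history)  # passes silently for a well-formed history
```

Running a check like this before each call surfaces a malformed history as a local error instead of an unexpected model response.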
1.3 Generation Configuration
Fine-tune output behavior with GenerateContentConfig. Key parameters control randomness, length, and stop conditions:
from google import genai
from google.genai import types
client = genai.Client()
response = client.models.generate_content(
    model="gemini-3.5-flash",
    config=types.GenerateContentConfig(
        temperature=0.2,          # Lower = more deterministic (0.0-2.0)
        top_p=0.8,                # Nucleus sampling threshold
        top_k=40,                 # Top-K token sampling
        max_output_tokens=1024,   # Maximum response length
        stop_sequences=["\n\n"],  # Stop generation at double newline
        candidate_count=1         # Number of response candidates
    ),
    contents="Write a Python function to validate an email address using regex."
)
print(response.text)
Temperature guidance: use 0.0–0.3 for factual and code tasks (more deterministic), 0.7–1.0 for creative writing (more diverse), and 1.0–2.0 for brainstorming (highly varied). The default is model-dependent but typically around 1.0.
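To build intuition for what temperature, top_k, and top_p do, the sketch below applies the textbook versions of these operations to a toy next-token distribution. It illustrates the general technique only. The logit values are made up, and nothing here is taken from Gemini's internal sampling code.

```python
import math
from typing import Dict


def softmax_with_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def filter_top_k_top_p(probs: Dict[str, float], top_k: int, top_p: float) -> Dict[str, float]:
    """Keep the top_k most likely tokens, then the smallest prefix whose mass reaches top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}  # renormalize the survivors


toy_logits = {"TCP": 2.0, "UDP": 1.0, "HTTP": 0.5, "banana": -1.0}  # made-up values
cold = softmax_with_temperature(toy_logits, temperature=0.2)
warm = softmax_with_temperature(toy_logits, temperature=1.0)
print(f"P(TCP) at T=0.2: {cold['TCP']:.3f}, at T=1.0: {warm['TCP']:.3f}")
print(filter_top_k_top_p(warm, top_k=3, top_p=0.8))
```

This toy function divides by the temperature, so it does not accept 0.0. In this formulation, a temperature approaching zero behaves like always picking the most likely token.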
2. Streaming Responses
Streaming delivers response chunks as they are generated, so users see the first output sooner and the UI can update in real time. This is essential for chat interfaces and long-form generation.
2.1 Python Streaming
from google import genai
client = genai.Client()
# Stream response chunks as they arrive
response = client.models.generate_content_stream(
    model="gemini-3.5-flash",
    contents="Write a comprehensive guide to Python decorators with examples."
)
# Iterate over streamed chunks
full_text = ""
for chunk in response:
    text = chunk.text or ""  # chunk.text can be None for chunks without text parts
    print(text, end="", flush=True)
    full_text += text
print(f"\n\n--- Total length: {len(full_text)} characters ---")
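To see the latency benefit of streaming in numbers, a consumer can record when the first chunk arrives versus when the stream ends. The sketch below does this against a fake stream so it runs without an API key. consume_stream and fake_stream are hypothetical helpers, and the only assumption about real chunks is that they expose a text attribute.

```python
import time
from types import SimpleNamespace
from typing import Iterable, Iterator, Optional, Tuple


def consume_stream(chunks: Iterable) -> Tuple[str, Optional[float], float]:
    """Join chunk text; return (full_text, seconds_to_first_chunk, total_seconds)."""
    start = time.perf_counter()
    first_chunk_at = None
    parts = []
    for chunk in chunks:
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter() - start
        parts.append(getattr(chunk, "text", None) or "")  # tolerate chunks without text
    return "".join(parts), first_chunk_at, time.perf_counter() - start


def fake_stream() -> Iterator[SimpleNamespace]:
    """Hypothetical stand-in for a streamed response: three chunks, one with no text."""
    for text in ("Decorators wrap ", None, "functions."):
        time.sleep(0.01)  # simulate generation latency
        yield SimpleNamespace(text=text)


full_text, first, total = consume_stream(fake_stream())
print(f"first chunk after {first:.3f}s, finished after {total:.3f}s")
print(full_text)
```

In a real application the iterable would be the object returned by generate_content_stream, and the two timings could be logged per request.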
2.2 JavaScript Streaming
import { GoogleGenAI } from "@google/genai";
const client = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const response = await client.models.generateContentStream({
  model: "gemini-3.5-flash",
  contents: "Explain event-driven architecture with a real-world example."
});
let fullText = "";
for await (const chunk of response) {
  const text = chunk.text ?? ""; // chunk.text may be undefined for chunks without text
  process.stdout.write(text);
  fullText += text;
}
console.log(`\n\n--- Total length: ${fullText.length} characters ---`);
AI-Powered Marketing Agency
A digital agency generates 200+ social media posts daily using Gemini with tailored system instructions per brand. Each brand has a “voice profile” encoded in the system instruction, and temperature is tuned per content type (0.3 for product descriptions, 0.9 for engagement posts).
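One way to organize such a setup is to keep voice profiles and per-content-type temperatures as plain data and derive request settings from them. This is a sketch under stated assumptions: the brand, its instruction text, and the helper names are hypothetical, and only the two temperature values come from the case study.

```python
from dataclasses import dataclass
from typing import Dict

# Temperatures per content type, taken from the case study above.
TEMPERATURE_BY_CONTENT_TYPE: Dict[str, float] = {
    "product_description": 0.3,
    "engagement_post": 0.9,
}


@dataclass(frozen=True)
class BrandVoice:
    name: str
    system_instruction: str


def build_request_settings(brand: BrandVoice, content_type: str) -> Dict[str, object]:
    """Return plain keyword settings for one brand and content type."""
    if content_type not in TEMPERATURE_BY_CONTENT_TYPE:
        raise KeyError(f"unknown content type: {content_type}")
    return {
        "system_instruction": brand.system_instruction,
        "temperature": TEMPERATURE_BY_CONTENT_TYPE[content_type],
    }


acme = BrandVoice(  # hypothetical brand
    name="Acme Outdoors",
    system_instruction="You write for Acme Outdoors. Tone: warm, practical, no jargon.",
)
settings = build_request_settings(acme, "engagement_post")
print(settings)
```

The resulting dictionary uses the same parameter names as the earlier examples, so it could be unpacked into types.GenerateContentConfig(**settings) when making the call.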
3. Structured Outputs & JSON Schema
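Structured outputs constrain the model to return JSON that conforms to a schema you define. This removes most post-processing, regex extraction, and retry loops, because the response is machine-parseable by design. Validating the parsed result is still worthwhile, since a response cut off by max_output_tokens can be incomplete.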
3.1 JSON Mode with Inline Schema
Specify response_mime_type and response_schema in the config to enforce structured JSON output:
from google import genai
from google.genai import types
client = genai.Client()
# Define the schema inline
response = client.models.generate_content(
    model="gemini-3.5-flash",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema={
            "type": "object",
            "properties": {
                "recipe_name": {"type": "string"},
                "prep_time_minutes": {"type": "integer"},
                "difficulty": {"type": "string", "enum": ["easy", "medium", "hard"]},
                "ingredients": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "name": {"type": "string"},
                            "quantity": {"type": "string"}
                        },
                        "required": ["name", "quantity"]
                    }
                },
                "steps": {
                    "type": "array",
                    "items": {"type": "string"}
                }
            },
            "required": ["recipe_name", "prep_time_minutes", "difficulty", "ingredients", "steps"]
        }
    ),
    contents="Give me a recipe for classic French onion soup."
)
import json
recipe = json.loads(response.text)
print(f"Recipe: {recipe['recipe_name']}")
print(f"Difficulty: {recipe['difficulty']}")
print(f"Prep time: {recipe['prep_time_minutes']} min")
print(f"Ingredients: {len(recipe['ingredients'])}")
for step in recipe['steps'][:3]:
    print(f" - {step}")
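A defensive parse step is cheap insurance even when the schema is enforced, for example if a response is cut short by a token limit. The sketch below checks the recipe payload with the standard library only. parse_recipe is a hypothetical helper, and sample_text is a hand-written stand-in for response.text.

```python
import json
from typing import Any, Dict, List

REQUIRED_KEYS = ["recipe_name", "prep_time_minutes", "difficulty", "ingredients", "steps"]
ALLOWED_DIFFICULTY = {"easy", "medium", "hard"}


def parse_recipe(raw_text: str) -> Dict[str, Any]:
    """Parse and sanity-check a recipe payload; raise ValueError on any mismatch."""
    try:
        recipe = json.loads(raw_text)
    except json.JSONDecodeError as exc:  # e.g. output truncated mid-object
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(recipe, dict):
        raise ValueError("expected a JSON object at the top level")
    missing: List[str] = [key for key in REQUIRED_KEYS if key not in recipe]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    if recipe["difficulty"] not in ALLOWED_DIFFICULTY:
        raise ValueError(f"unexpected difficulty: {recipe['difficulty']!r}")
    if not isinstance(recipe["prep_time_minutes"], int):
        raise ValueError("prep_time_minutes must be an integer")
    return recipe


# Hand-written stand-in for response.text
sample_text = (
    '{"recipe_name": "French Onion Soup", "prep_time_minutes": 20, '
    '"difficulty": "medium", '
    '"ingredients": [{"name": "onions", "quantity": "1 kg"}], '
    '"steps": ["Slice the onions.", "Caramelize slowly."]}'
)
recipe = parse_recipe(sample_text)
print(recipe["recipe_name"], recipe["difficulty"])
```

Raising a single exception type keeps the calling code simple: one except ValueError branch can decide whether to retry or report the failure.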
3.2 Pydantic Schema Approach
For Python developers, you can pass a Pydantic BaseModel class directly as the schema — the SDK handles serialization automatically:
from google import genai
from google.genai import types
from pydantic import BaseModel
client = genai.Client()
# Define schema as a Pydantic model
class MovieReview(BaseModel):
    title: str
    year: int
    genre: str
    rating: float
    summary: str
    pros: list[str]
    cons: list[str]
# Pass the class directly — SDK converts to JSON Schema
response = client.models.generate_content(
    model="gemini-3.5-flash",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=MovieReview
    ),
    contents="Review the movie 'Inception' (2010) by Christopher Nolan."
)
import json
review = json.loads(response.text)
print(f"{review['title']} ({review['year']}) — {review['rating']}/10")
print(f"Genre: {review['genre']}")
print(f"Summary: {review['summary'][:100]}...")
print(f"Pros: {', '.join(review['pros'][:3])}")
3.3 Supported Schema Types & Properties
The Gemini structured output system supports the following JSON Schema features:
| Type | Description | Extra Properties |
|---|---|---|
| string | Text values | enum, format (date, uri, email) |
| number | Floating-point | minimum, maximum |
| integer | Whole numbers | minimum, maximum |
| boolean | true/false | — |
| object | Nested objects | properties, required, additionalProperties |
| array | Lists | items, minItems, maxItems |
| null | Nullable fields | Use with anyOf for optional fields |
Avoid $ref, oneOf, allOf, and recursive schemas. Keep schemas flat or use at most 2 levels of nesting. For complex data, break the work into multiple API calls with simpler schemas.
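The guidance above can be turned into a small pre-flight check on a schema dictionary. The sketch below reports the deepest object nesting level and any of the keywords $ref, oneOf, or allOf it finds. inspect_schema is a hypothetical helper, and counting depth by nested object types is one possible reading of "levels of nesting". The example schema also shows a nullable field expressed with anyOf, as described in the table.

```python
from typing import Any, Dict, List, Tuple

AVOIDED_KEYWORDS = {"$ref", "oneOf", "allOf"}  # keywords named in the note above


def inspect_schema(schema: Dict[str, Any], level: int = 1) -> Tuple[int, List[str]]:
    """Return (deepest object nesting level, sorted list of avoided keywords found)."""
    is_object = schema.get("type") == "object"
    found = sorted(AVOIDED_KEYWORDS & schema.keys())
    deepest = level if is_object else level - 1
    children = list(schema.get("properties", {}).values())
    if "items" in schema:
        children.append(schema["items"])
    children.extend(schema.get("anyOf", []))
    for child in children:
        # Only object types add a nesting level; arrays and anyOf pass through.
        child_level = level + 1 if is_object else level
        child_depth, child_found = inspect_schema(child, child_level)
        deepest = max(deepest, child_depth)
        found.extend(child_found)
    return deepest, sorted(set(found))


schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "subtitle": {"anyOf": [{"type": "string"}, {"type": "null"}]},  # nullable field
        "tags": {"type": "array", "items": {"type": "string"}, "maxItems": 5},
    },
    "required": ["title", "subtitle", "tags"],
}
depth, flagged = inspect_schema(schema)
print(f"object nesting depth: {depth}, avoided keywords: {flagged}")
```

Under this definition, the recipe schema from section 3.1 has a depth of 2, because each ingredient is an object inside the root object.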
4. Token Counting & Usage Metadata
4.1 Count Tokens API
Before sending expensive prompts, use count_tokens to preview the token cost without making a generation call:
from google import genai
client = genai.Client()
# Count tokens before generation (no cost incurred)
prompt = "Explain the complete history of the Roman Empire from founding to fall."
token_count = client.models.count_tokens(
    model="gemini-3.5-flash",
    contents=prompt
)
print(f"Prompt tokens: {token_count.total_tokens}")
print(f"Estimated cost at $0.15/1M input: ${token_count.total_tokens * 0.15 / 1_000_000:.6f}")
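The same arithmetic can gate a request on a spending limit. This is a minimal sketch: the $0.15 per million input tokens figure is the one used in the example above, while the token count and the budget are hypothetical values chosen for illustration.

```python
def estimate_input_cost(total_tokens: int, usd_per_million: float = 0.15) -> float:
    """Input cost in USD for a given token count and price per million tokens."""
    return total_tokens * usd_per_million / 1_000_000


def within_budget(total_tokens: int, max_usd: float, usd_per_million: float = 0.15) -> bool:
    """True if the estimated input cost does not exceed the budget."""
    return estimate_input_cost(total_tokens, usd_per_million) <= max_usd


prompt_tokens = 250_000  # hypothetical value standing in for token_count.total_tokens
print(f"Estimated input cost: ${estimate_input_cost(prompt_tokens):.4f}")
print(f"Within a $0.01 budget: {within_budget(prompt_tokens, max_usd=0.01)}")
```

This covers input tokens only. Output tokens are typically billed separately, so a full estimate needs the output price as well.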
4.2 Usage Metadata from Responses
Every generation response includes detailed token usage in usage_metadata:
from google import genai
client = genai.Client()
response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Write a haiku about distributed systems."
)
print(f"Response: {response.text}")
print(f"\n--- Token Usage ---")
print(f"Prompt tokens: {response.usage_metadata.prompt_token_count}")
print(f"Output tokens: {response.usage_metadata.candidates_token_count}")
print(f"Thinking tokens: {response.usage_metadata.thoughts_token_count}")
print(f"Total tokens: {response.usage_metadata.total_token_count}")
Use count_tokens with multimodal content to get exact counts before generation.
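For applications that make many calls, it can help to accumulate these counts across responses. The sketch below keeps a running tally using the field names shown above. UsageTally is a hypothetical helper, the usage objects are fakes, and treating a missing or None count as zero is an assumption.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class UsageTally:
    """Running token totals across several responses."""
    prompt: int = 0
    output: int = 0
    thinking: int = 0
    calls: int = 0

    def add(self, usage) -> None:
        # Treat a missing or None count as zero (an assumption about the metadata).
        self.prompt += getattr(usage, "prompt_token_count", None) or 0
        self.output += getattr(usage, "candidates_token_count", None) or 0
        self.thinking += getattr(usage, "thoughts_token_count", None) or 0
        self.calls += 1


# Hypothetical stand-ins for response.usage_metadata from two calls.
fake_usages = [
    SimpleNamespace(prompt_token_count=12, candidates_token_count=20, thoughts_token_count=None),
    SimpleNamespace(prompt_token_count=30, candidates_token_count=45, thoughts_token_count=100),
]
tally = UsageTally()
for usage in fake_usages:
    tally.add(usage)
print(tally)
```

In real code, each response.usage_metadata object would be passed to tally.add after the call returns.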
5. Long Context (1M Tokens)
Gemini 3.5 Flash supports a 1,048,576 token context window — enough to process entire codebases, books, or hours of video in a single API call. This is one of Gemini’s most distinctive capabilities.
5.1 When to Use Long Context
from google import genai
client = genai.Client()
# Example: Analyzing a large document in full context
with open("technical_spec.md", encoding="utf-8") as f:  # Imagine a 200-page document
    large_document = f.read()
response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents=f"""Analyze the following technical specification and provide:
1. A one-paragraph executive summary
2. The top 5 risks identified
3. Any inconsistencies between sections
Document:
{large_document}"""
)
print(response.text)
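Before sending a very large document, a quick local estimate can flag inputs that are unlikely to fit. This is a rough sketch: the four-characters-per-token ratio is a common heuristic for English text and an assumption here, the safety margin is arbitrary, and count_tokens remains the way to get an exact figure.

```python
CONTEXT_WINDOW_TOKENS = 1_048_576  # context window quoted in this section
ASSUMED_CHARS_PER_TOKEN = 4        # rough heuristic for English text; an assumption


def rough_token_estimate(text: str) -> int:
    """Estimate tokens from character count using ceiling division."""
    return -(-len(text) // ASSUMED_CHARS_PER_TOKEN)


def probably_fits(text: str, safety_margin: float = 0.9) -> bool:
    """True if the estimate stays under a fraction of the context window."""
    return rough_token_estimate(text) <= CONTEXT_WINDOW_TOKENS * safety_margin


small_doc = "x" * 2_000_000  # about 500,000 estimated tokens
huge_doc = "x" * 5_000_000   # about 1,250,000 estimated tokens
print(probably_fits(small_doc), probably_fits(huge_doc))
```

The margin leaves room for the instructions wrapped around the document and for error in the estimate itself.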
5.2 Long Context vs RAG
Choosing between stuffing the full context and using RAG (retrieval-augmented generation) depends on your use case:
| Factor | Long Context | RAG |
|---|---|---|
| Best for | Complete understanding of a document | Searching across millions of documents |
| Accuracy | Higher (sees everything) | Depends on retrieval quality |
| Cost per query | Higher (large input token count) | Lower (only relevant chunks) |
| Latency | Higher for first call | Lower per query |
| Setup complexity | Minimal (just send the text) | Requires embeddings, vector DB, indexing |
| Dynamic data | Always fresh (send latest) | Requires re-indexing |
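The cost row can be made concrete with a little arithmetic. The sketch below compares input cost only, using the $0.15 per million input tokens figure quoted earlier in this article. The document size, chunk sizes, and query volume are hypothetical, and the comparison ignores RAG's embedding and indexing costs.

```python
USD_PER_MILLION_INPUT = 0.15  # input price quoted earlier in this article


def input_cost(tokens_per_query: int, queries: int) -> float:
    """Total input cost in USD for a number of queries of equal size."""
    return tokens_per_query * queries * USD_PER_MILLION_INPUT / 1_000_000


document_tokens = 400_000   # hypothetical: the whole document is sent on every query
rag_tokens = 5 * 800 + 200  # hypothetical: 5 retrieved chunks of 800 tokens plus the question
queries = 1_000

long_context_usd = input_cost(document_tokens, queries)
rag_usd = input_cost(rag_tokens, queries)
print(f"Long context: ${long_context_usd:.2f} for {queries} queries")
print(f"RAG:          ${rag_usd:.2f} for {queries} queries")
```

With these made-up numbers the gap is roughly two orders of magnitude, which is why the accuracy and setup rows in the table matter for the decision and not cost alone.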
Next in the Gemini SDK Track
In Part 3: Multimodal — Vision, Video & Documents, we’ll explore image understanding vs. generation, video analysis with timestamps, PDF processing, and the Files API for managing media assets.