1. SDK Installation & Environment Setup
The modern Google GenAI SDK replaces the deprecated google-generativeai package. The new unified library — google-genai for Python and @google/genai for JavaScript — provides a consistent client-first API pattern across both the classic generateContent and the new Interactions API.
If you come across import google.generativeai as genai or genai.GenerativeModel in older tutorials, those patterns are deprecated. See the Migration Guide for the transition path. This series teaches only the current SDK.
1.1 Python Setup
Requires Python 3.9+. Install the unified package:
pip install -U google-genai
Create a client instance. The SDK reads your API key from the GEMINI_API_KEY environment variable by default:
from google import genai
# Option 1: Reads GEMINI_API_KEY from environment (recommended)
client = genai.Client()
# Option 2: Explicit API key (for quick testing only)
client = genai.Client(api_key="YOUR_API_KEY")
# Verify connection — list available models
for model in client.models.list():
    print(model.name)
1.2 JavaScript / TypeScript Setup
Requires Node.js v18+. Install via npm:
npm install @google/genai
Create a client and make your first call:
import { GoogleGenAI } from "@google/genai";
const client = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const response = await client.models.generateContent({
  model: "gemini-3.5-flash",
  contents: "Explain quantum entanglement in one paragraph."
});
console.log(response.text);
1.3 Your First API Call (Python)
The simplest possible generation — a single text prompt with a single text response:
from google import genai
client = genai.Client()
response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="What is the capital of France?"
)
print(response.text)
# Output: The capital of France is Paris.
Notice that you call client.models.generate_content() directly rather than creating a GenerativeModel object first. The client is your single entry point to all API operations: models, files, caches, interactions, and agents.
2. API Key Management & Security
2.1 Creating & Restricting Keys
Generate your API key from Google AI Studio. For production deployments, restrict keys in the GCP Console. Whichever key you use, load it from the environment and confirm it works with a minimal call:
from google import genai
import os
# Production pattern: key from environment variable
# Set in your shell: export GEMINI_API_KEY="your-key-here"
client = genai.Client()
# Verify the key works with a minimal call
response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Hello!"
)
print(f"API key valid. Response: {response.text[:50]}...")
2.2 Environment Security Best Practices
Never hardcode API keys in source files. Use environment variables or secret managers:
import os
from google import genai
# Pattern 1: Environment variable (development)
api_key = os.environ.get("GEMINI_API_KEY")
if not api_key:
    raise ValueError("GEMINI_API_KEY environment variable not set")
client = genai.Client(api_key=api_key)
# Pattern 2: GCP Secret Manager (production)
# from google.cloud import secretmanager
# sm_client = secretmanager.SecretManagerServiceClient()
# secret = sm_client.access_secret_version(name="projects/my-project/secrets/gemini-key/versions/latest")
# client = genai.Client(api_key=secret.payload.data.decode("utf-8"))
# Pattern 3: For GCP-hosted apps, use OAuth (no API key needed)
# client = genai.Client(vertexai=True, project="my-project", location="us-central1")
print("Client initialized. Model: gemini-3.5-flash")
Security checklist: (1) Add .env to .gitignore. (2) Rotate keys regularly via GCP Console. (3) Restrict keys to specific APIs and IP ranges. (4) Use GCP OAuth service accounts for server-side production workloads instead of API keys.
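The .gitignore rule in the checklist above can be checked automatically, for example in a pre-commit hook. The following is a minimal sketch, not part of the Gemini SDK: the function names are hypothetical, and the matching is deliberately simplified (it recognises a few literal patterns and ignores negation rules such as `!.env`).

```python
from pathlib import Path

# Literal patterns this sketch treats as "covers .env".
# Real gitignore matching is richer; this is a simplification.
ACCEPTED_PATTERNS = {".env", "/.env", ".env*"}


def gitignore_covers_env(gitignore_text: str) -> bool:
    """Return True if the .gitignore body contains a rule matching `.env`."""
    for raw_line in gitignore_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line in ACCEPTED_PATTERNS:
            return True
    return False


def check_repo(repo_root: str = ".") -> bool:
    """Read <repo_root>/.gitignore, if it exists, and run the check."""
    path = Path(repo_root) / ".gitignore"
    if not path.is_file():
        return False
    return gitignore_covers_env(path.read_text(encoding="utf-8"))
```

Calling `check_repo()` from the repository root returns False when no rule is found, which a hook can turn into a warning before the first commit.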
Multilingual Content Platform
A global education company uses Gemini Flash for real-time translation of course content into 20 languages. Flash’s speed (sub-second latency) enables live subtitle generation during video lectures, while Pro handles the offline translation of complex academic papers.
3. The Gemini Model Directory
Google provides a tiered model family optimized for different use cases. Understanding the directory is critical for cost-effective architecture decisions.
3.1 Gemini 3.5 Flash — The Flagship
Gemini 3.5 Flash is the primary model for most use cases. It provides sustained frontier-level intelligence optimized for real-world tasks at high speed and low cost. Designed for the agentic era, it excels at sub-agent deployment, multi-step workflows, and rapid coding iterations.
| Property | Value |
|---|---|
| Model code | gemini-3.5-flash |
| Input types | Text, Image, Video, Audio, PDF |
| Context window | 1,048,576 tokens (1M) |
| Max output | 65,536 tokens |
| Thinking | Supported (default on) |
| Function calling | Supported |
| Grounding (Search/Maps) | Supported |
| Context caching | Supported |
| File Search (RAG) | Supported |
| Structured outputs | Supported |
| Knowledge cutoff | January 2025 |
from google import genai
client = genai.Client()
# Using Gemini 3.5 Flash — the default workhorse
response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Explain the CAP theorem in distributed systems. Be concise."
)
print(response.text)
print(f"\nToken usage: {response.usage_metadata}")
3.2 Gemini 3.1 Pro — Deep Intelligence
Gemini 3.1 Pro is the frontier reasoning model for complex logic, long-horizon planning, and deep analysis where accuracy matters more than latency.
from google import genai
client = genai.Client()
# Using Gemini 3.1 Pro for complex reasoning
response = client.models.generate_content(
    model="gemini-3.1-pro-preview",
    contents="""Analyze this system design:
- 100M daily active users
- Sub-100ms p99 latency requirement
- Strong consistency needed for financial transactions
- Global deployment across 5 regions
What architecture would you recommend? Consider trade-offs."""
)
print(response.text)
3.3 Specialized Models
Beyond the core Gemini family, Google offers task-specialized models:
| Model | Purpose | Key Capability |
|---|---|---|
| gemini-3.1-flash-lite | High-volume, low-cost tasks | Fastest latency, cheapest per-token |
| gemini-3.1-flash-live-preview | Real-time audio dialogue | Bidirectional streaming, acoustic nuance |
| gemini-embedding-2 | Semantic embeddings | Text, image, audio, video → vectors |
| imagen-4.0-generate-preview | Image generation | Hyper-realistic spatial synthesis |
| veo-3.1-generate-preview | Video generation | HD video from text prompts |
| lyria-realtime-preview | Music/audio generation | Real-time musical creation |
from google import genai
client = genai.Client()
# List all available models programmatically
print("Available Gemini Models:")
print("-" * 50)
for model in client.models.list():
    print(f"  {model.name}")
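One way to act on the directory is to centralise model choice in a single lookup, so that switching tiers is a one-line change. The sketch below is illustrative only: the model codes are copied from this section, while the task labels (`bulk`, `general`, `deep_reasoning`, `embedding`) and the function name are hypothetical and not part of any SDK.

```python
# Model codes come from the tables in this section.
# The task labels are invented for this sketch.
MODEL_BY_TASK = {
    "bulk": "gemini-3.1-flash-lite",             # high-volume, low-cost work
    "general": "gemini-3.5-flash",               # default workhorse
    "deep_reasoning": "gemini-3.1-pro-preview",  # accuracy over latency
    "embedding": "gemini-embedding-2",           # semantic vectors
}

DEFAULT_MODEL = "gemini-3.5-flash"


def choose_model(task: str) -> str:
    """Map a task label to a model code, falling back to the default."""
    return MODEL_BY_TASK.get(task, DEFAULT_MODEL)


print(choose_model("deep_reasoning"))
print(choose_model("something-else"))
```

The returned string can be passed straight to the model argument of client.models.generate_content().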
4. Token Pricing Architecture
4.1 Free vs Paid Tiers
Google offers a generous free tier for development and a cost-effective paid tier for production:
| Model | Free Tier | Paid (per 1M tokens) |
|---|---|---|
| Gemini 3.5 Flash | Not available | Input: $0.15 / Output: $0.60 |
| Gemini 3.1 Flash Lite | Available | Input: $0.02 / Output: $0.10 |
| Gemini 2.5 Flash | Available | Input: $0.15 / Output: $0.60 |
| Gemini 3.1 Pro | Not available | Input: $1.50 / Output: $6.00 |
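To make the table concrete, here is a small cost estimator. It is a sketch that assumes the per-million-token rates listed above are current; treat the numbers as placeholders and check the live pricing page before budgeting. The function name and dictionary are hypothetical helpers.

```python
# USD per 1M tokens as (input, output), copied from the table above.
PRICES_PER_MILLION = {
    "Gemini 3.5 Flash": (0.15, 0.60),
    "Gemini 3.1 Flash Lite": (0.02, 0.10),
    "Gemini 2.5 Flash": (0.15, 0.60),
    "Gemini 3.1 Pro": (1.50, 6.00),
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the paid-tier cost of one workload in US dollars."""
    input_rate, output_rate = PRICES_PER_MILLION[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Same workload on two tiers: 200k input tokens, 50k output tokens.
for name in ("Gemini 3.5 Flash", "Gemini 3.1 Pro"):
    print(f"{name}: ${estimate_cost_usd(name, 200_000, 50_000):.2f}")
```

With these rates the workload costs about $0.06 on Gemini 3.5 Flash and about $0.60 on Gemini 3.1 Pro, a tenfold difference for identical token counts.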
4.2 Thinking Token Costs
Starting with Gemini 2.5, models generate internal “thinking” tokens before responding, up to a thinking budget. These thinking tokens are charged at the output token rate and contribute to overall cost. You can control the budget with the thinking_budget parameter:
from google import genai
from google.genai import types
client = genai.Client()
# Default: model decides thinking budget dynamically
response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="What is 15% of 340?"
)
print(f"Answer: {response.text}")
print(f"Total tokens: {response.usage_metadata}")
# Explicit: limit thinking tokens to reduce cost
response_fast = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="What is 15% of 340?",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=0)  # No thinking
    )
)
print(f"\nNo-thinking answer: {response_fast.text}")
# High budget: maximize reasoning for complex problems
response_deep = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Prove that there are infinitely many prime numbers.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=8192)
    )
)
print(f"\nDeep reasoning answer: {response_deep.text[:200]}...")
Use thinking_budget=0 for simple lookups and thinking_budget=-1 for dynamic (model-chosen) depth.
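Because thinking tokens are billed at the output rate, it helps to see how much of a response's billable output was thinking. The sketch below assumes the usage metadata exposes `candidates_token_count` and `thoughts_token_count` fields, which is my understanding of the current SDK; verify the field names against the SDK reference. A `SimpleNamespace` with made-up counts stands in for `response.usage_metadata` so the example runs offline, and the helper names are hypothetical.

```python
from types import SimpleNamespace


def billable_output_tokens(usage) -> int:
    """Visible output plus thinking tokens; a missing or None field counts as 0."""
    visible = getattr(usage, "candidates_token_count", None) or 0
    thinking = getattr(usage, "thoughts_token_count", None) or 0
    return visible + thinking


def thinking_share(usage) -> float:
    """Fraction of billable output tokens that were spent on thinking."""
    total = billable_output_tokens(usage)
    if total == 0:
        return 0.0
    thinking = getattr(usage, "thoughts_token_count", None) or 0
    return thinking / total


# Stand-in for response.usage_metadata; the counts are invented for illustration.
fake_usage = SimpleNamespace(
    prompt_token_count=12,
    candidates_token_count=40,
    thoughts_token_count=360,
)

print(f"Billable output tokens: {billable_output_tokens(fake_usage)}")
print(f"Thinking share: {thinking_share(fake_usage):.0%}")
```

In real code, pass `response.usage_metadata` instead of `fake_usage`. A consistently high thinking share on simple prompts suggests a lower budget may be worth trying.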
5. MCP Coding Agent Setup
The Gemini Docs MCP (Model Context Protocol) server allows your coding IDE to query the latest Gemini SDK documentation in real-time. This ensures your code assistant always references current APIs rather than stale training data.
Add this to your VS Code mcp.json configuration:
{
  "servers": {
    "Gemini API Docs": {
      "url": "https://gemini-api-docs-mcp.dev",
      "type": "http"
    }
  }
}
Once configured, your coding agent can search the live Gemini documentation to find current method signatures, parameters, and code samples — particularly useful for the rapidly evolving Interactions API.
from google import genai
# After MCP setup, your coding agent knows the current SDK patterns.
# For example, it will recommend this (current):
client = genai.Client()
response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Hello, world!"
)
# Instead of this (deprecated):
# import google.generativeai as genai # ❌ OLD
# model = genai.GenerativeModel("gemini-pro") # ❌ OLD
# response = model.generate_content("Hello") # ❌ OLD
6. Interactions API Preview
The Interactions API (Beta) represents the future of Gemini — a stateful, server-side conversation system that replaces manual history management. While Part 9 covers this in depth, here is a taste of the new paradigm:
from google import genai
client = genai.Client()
# The Interactions API manages state server-side
interaction1 = client.interactions.create(
    model="gemini-3.5-flash",
    input="Hi, my name is Phil."
)
print(f"Response 1: {interaction1.output_text}")
# Continue the conversation — no history array needed!
interaction2 = client.interactions.create(
    model="gemini-3.5-flash",
    previous_interaction_id=interaction1.id,
    input="What is my name?"
)
print(f"Response 2: {interaction2.output_text}")
# Output: "Your name is Phil."
With generateContent, you manually manage the contents array (appending user/model turns). With the Interactions API, the server tracks conversation state via previous_interaction_id, which enables automatic cache hits, implicit thought signature preservation, and dramatically simplified multi-turn code.
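For contrast, the manual bookkeeping that generateContent requires looks roughly like the sketch below. It builds the history as plain dictionaries in the role/parts shape, which I understand the SDK accepts for contents. The `add_turn` helper is hypothetical, and the model reply is a placeholder string so the example runs without a network call.

```python
def add_turn(history: list, role: str, text: str) -> list:
    """Append one conversation turn in the role/parts shape."""
    if role not in ("user", "model"):
        raise ValueError(f"unexpected role: {role!r}")
    history.append({"role": role, "parts": [{"text": text}]})
    return history


history = []
add_turn(history, "user", "Hi, my name is Phil.")
add_turn(history, "model", "Nice to meet you, Phil!")  # placeholder reply
add_turn(history, "user", "What is my name?")

# With a live client, the full list is resent on every request:
# response = client.models.generate_content(
#     model="gemini-3.5-flash",
#     contents=history,
# )
# add_turn(history, "model", response.text)

for turn in history:
    print(f"{turn['role']}: {turn['parts'][0]['text']}")
```

Every request carries the entire list, and the caller must remember to append each reply. That is the bookkeeping previous_interaction_id removes.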
Next in the Gemini SDK Track
In Part 2: Text Generation & Structured Outputs, we’ll dive into generateContent parameters, type-safe JSON responses enforced via response_format with schemas, streaming patterns, token counting, and working with the 1M token context window.