Gemini SDK Track Part 1: Platform Setup & SDK Fundamentals

                        
What You’ll Learn: This article gets you started with Google’s Gemini platform, from creating a project in Google AI Studio to making your first API call with the Python SDK. You’ll understand the model family (Flash for speed, Pro for quality), configure authentication, and build your first generative AI application. Think of Google AI Studio as your control panel for Gemini, similar to how the GCP Console manages cloud resources.
                    

1. SDK Installation & Environment Setup

The modern Google GenAI SDK replaces the deprecated google-generativeai package. The new unified library — google-genai for Python and @google/genai for JavaScript — provides a consistent client-first API pattern across both the classic generateContent and the new Interactions API.

Migration Warning: If you have existing code using import google.generativeai as genai or genai.GenerativeModel, those are deprecated. See the Migration Guide for the transition path. This series teaches only the current SDK.
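One way to find code that still depends on the deprecated package is to scan source files for the two patterns named in the warning. The following is a minimal sketch: the function name find_deprecated_usage and the regular expressions are illustrative assumptions, not part of any Google tooling, and a plain-text scan can miss aliased or dynamically built imports.

```python
import re

# Patterns taken from the migration warning; the regexes themselves are an
# assumption and only catch the most common spellings.
DEPRECATED_PATTERNS = [
    re.compile(r"^\s*import\s+google\.generativeai\b"),
    re.compile(r"^\s*from\s+google\.generativeai\b"),
    re.compile(r"\bgenai\.GenerativeModel\b"),
]


def find_deprecated_usage(source: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) pairs that match a deprecated pattern."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        if any(pattern.search(line) for pattern in DEPRECATED_PATTERNS):
            hits.append((number, line.strip()))
    return hits


# Two small in-memory samples: one using the old SDK, one using the new one.
legacy = '''import google.generativeai as genai
model = genai.GenerativeModel("gemini-pro")
print("unrelated line")
'''
current = '''from google import genai
client = genai.Client()
'''

legacy_hits = find_deprecated_usage(legacy)
current_hits = find_deprecated_usage(current)
print(legacy_hits)
print(current_hits)
```

In a real project you would read each .py file from disk and pass its text to the function; any non-empty result marks a file that needs migrating.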

1.1 Python Setup

Requires Python 3.9+. Install the unified package:

pip install -U google-genai

Create a client instance. The SDK reads your API key from the GEMINI_API_KEY environment variable by default:

from google import genai

# Option 1: Reads GEMINI_API_KEY from environment (recommended)
client = genai.Client()

# Option 2: Explicit API key (for quick testing only)
client = genai.Client(api_key="YOUR_API_KEY")

# Verify connection — list available models
for model in client.models.list():
    print(model.name)

1.2 JavaScript / TypeScript Setup

Requires Node.js v18+. Install via npm:

npm install @google/genai

Create a client and make your first call:

import { GoogleGenAI } from "@google/genai";

const client = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await client.models.generateContent({
    model: "gemini-3.5-flash",
    contents: "Explain quantum entanglement in one paragraph."
});

console.log(response.text);

1.3 Your First API Call (Python)

The simplest possible generation — a single text prompt with a single text response:

from google import genai

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="What is the capital of France?"
)

print(response.text)
# Example output (exact wording may vary): The capital of France is Paris.

                        
Key Difference from Old SDK: The new SDK uses client.models.generate_content() rather than creating a GenerativeModel object first. The client is your single entry point to all API operations: models, files, caches, interactions, and agents.
                    

2. API Key Management & Security

2.1 Creating & Restricting Keys

Generate your API key from Google AI Studio. For production deployments, restrict the key in the GCP Console. Once the key is set in your environment, a minimal call confirms that it works:

from google import genai

# Production pattern: key from environment variable
# Set in your shell: export GEMINI_API_KEY="your-key-here"
client = genai.Client()

# Verify the key works with a minimal call
response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Hello!"
)
print(f"API key valid. Response: {response.text[:50]}...")

2.2 Environment Security Best Practices

Never hardcode API keys in source files. Use environment variables or secret managers:

import os
from google import genai

# Pattern 1: Environment variable (development)
api_key = os.environ.get("GEMINI_API_KEY")
if not api_key:
    raise ValueError("GEMINI_API_KEY environment variable not set")
client = genai.Client(api_key=api_key)

# Pattern 2: GCP Secret Manager (production)
# from google.cloud import secretmanager
# sm_client = secretmanager.SecretManagerServiceClient()
# secret = sm_client.access_secret_version(name="projects/my-project/secrets/gemini-key/versions/latest")
# client = genai.Client(api_key=secret.payload.data.decode("utf-8"))

# Pattern 3: For GCP-hosted apps, use Vertex AI with Application Default Credentials (no API key needed)
# client = genai.Client(vertexai=True, project="my-project", location="us-central1")

print("Client initialized. Model: gemini-3.5-flash")

                        
Security Rules: (1) Add .env to .gitignore. (2) Rotate keys regularly via GCP Console. (3) Restrict keys to specific APIs and IP ranges. (4) For server-side production workloads, authenticate with a GCP service account instead of an API key.
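Rule (1) is easy to check automatically, for example in a pre-commit hook. The sketch below is one possible approach under stated assumptions: the function name env_file_is_ignored is hypothetical, and it compares .gitignore entries as literal strings rather than implementing Git's full pattern-matching rules.

```python
import tempfile
from pathlib import Path

# Literal entries that would cover a file named ".env". This is a naive
# string comparison, not a full implementation of gitignore matching.
ENV_PATTERNS = {".env", "/.env", ".env*"}


def env_file_is_ignored(repo_root: str) -> bool:
    """Return True if the .gitignore in repo_root lists an entry covering .env."""
    gitignore = Path(repo_root) / ".gitignore"
    if not gitignore.is_file():
        return False
    entries = {
        line.strip()
        for line in gitignore.read_text(encoding="utf-8").splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }
    return not entries.isdisjoint(ENV_PATTERNS)


# Demonstrate on a throwaway directory so nothing in a real repo is touched.
with tempfile.TemporaryDirectory() as root:
    before = env_file_is_ignored(root)  # no .gitignore exists yet
    (Path(root) / ".gitignore").write_text("# secrets\n.env\n", encoding="utf-8")
    after = env_file_is_ignored(root)   # .env is now listed

print(before, after)
```

A check like this only guards against the most common mistake; it does not detect keys that were already committed in earlier revisions.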
                    

Real-World Application

Multilingual Content Platform

A global education company uses Gemini Flash for real-time translation of course content into 20 languages. Flash’s speed (sub-second latency) enables live subtitle generation during video lectures, while Pro handles the offline translation of complex academic papers.

Tags: Gemini Flash, Multilingual, EdTech

3. The Gemini Model Directory

Google provides a tiered model family optimized for different use cases. Understanding the directory is critical for cost-effective architecture decisions.

3.1 Gemini 3.5 Flash — The Flagship

Gemini 3.5 Flash is the primary model for most use cases. It provides sustained frontier-level intelligence optimized for real-world tasks at high speed and low cost. Designed for the agentic era, it excels at sub-agent deployment, multi-step workflows, and rapid coding iterations.

Property	Value
Model code	`gemini-3.5-flash`
Input types	Text, Image, Video, Audio, PDF
Context window	1,048,576 tokens (1M)
Max output	65,536 tokens
Thinking	Supported (default on)
Function calling	Supported
Grounding (Search/Maps)	Supported
Context caching	Supported
File Search (RAG)	Supported
Structured outputs	Supported
Knowledge cutoff	January 2025

from google import genai

client = genai.Client()

# Using Gemini 3.5 Flash — the default workhorse
response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Explain the CAP theorem in distributed systems. Be concise."
)
print(response.text)
print(f"\nToken usage: {response.usage_metadata}")
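Printing usage_metadata directly gives you the raw object. For logging, it is often handier to flatten it into a plain dictionary. The sketch below assumes the attribute names prompt_token_count, candidates_token_count, thoughts_token_count, and total_token_count, which is how the current SDK appears to expose them; verify them against your installed version. The helper name summarize_usage and the stand-in numbers are illustrative only.

```python
from types import SimpleNamespace


def summarize_usage(usage) -> dict:
    """Flatten a usage_metadata-like object into a plain dict of ints."""
    fields = (
        "prompt_token_count",
        "candidates_token_count",
        "thoughts_token_count",
        "total_token_count",
    )
    # A missing attribute or a None value is reported as zero.
    return {name: getattr(usage, name, None) or 0 for name in fields}


# Stand-in object with made-up numbers so the sketch runs without an API call.
fake_usage = SimpleNamespace(
    prompt_token_count=12,
    candidates_token_count=85,
    thoughts_token_count=None,
    total_token_count=97,
)

summary = summarize_usage(fake_usage)
print(summary)
```

With a real response, you would call summarize_usage(response.usage_metadata) instead of passing the stand-in object.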

3.2 Gemini 3.1 Pro — Deep Intelligence

Gemini 3.1 Pro is the frontier reasoning model for complex logic, long-horizon planning, and deep analysis where accuracy matters more than latency.

from google import genai

client = genai.Client()

# Using Gemini 3.1 Pro for complex reasoning
response = client.models.generate_content(
    model="gemini-3.1-pro-preview",
    contents="""Analyze this system design:
    - 100M daily active users
    - Sub-100ms p99 latency requirement
    - Strong consistency needed for financial transactions
    - Global deployment across 5 regions

    What architecture would you recommend? Consider trade-offs."""
)
print(response.text)

3.3 Specialized Models

Beyond the core Gemini family, Google offers task-specialized models:

Model	Purpose	Key Capability
`gemini-3.1-flash-lite`	High-volume, low-cost tasks	Fastest latency, cheapest per-token
`gemini-3.1-flash-live-preview`	Real-time audio dialogue	Bidirectional streaming, acoustic nuance
`gemini-embedding-2`	Semantic embeddings	Text, image, audio, video → vectors
`imagen-4.0-generate-preview`	Image generation	Hyper-realistic spatial synthesis
`veo-3.1-generate-preview`	Video generation	HD video from text prompts
`lyria-realtime-preview`	Music/audio generation	Real-time musical creation

from google import genai

client = genai.Client()

# List all available models programmatically
print("Available Gemini Models:")
print("-" * 50)
for model in client.models.list():
    print(f"  {model.name}")
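Because each model targets a different trade-off, applications often pick a model per request rather than hardcoding one. Here is a minimal routing sketch: the model codes are copied from the tables above, while the task categories and the pick_model helper are hypothetical names chosen for illustration.

```python
# Model codes come from the tables in this section. The task categories and
# the mapping itself are assumptions made for this example.
MODEL_BY_TASK = {
    "bulk": "gemini-3.1-flash-lite",        # high-volume, low-cost work
    "general": "gemini-3.5-flash",          # default workhorse
    "reasoning": "gemini-3.1-pro-preview",  # accuracy over latency
    "embedding": "gemini-embedding-2",      # semantic vectors
}

DEFAULT_MODEL = "gemini-3.5-flash"


def pick_model(task: str) -> str:
    """Return the model code for a task category, falling back to Flash."""
    return MODEL_BY_TASK.get(task, DEFAULT_MODEL)


for task in ("bulk", "reasoning", "something-else"):
    print(f"{task:>15} -> {pick_model(task)}")
```

Keeping the mapping in one place means a model upgrade is a one-line change instead of a search across the codebase.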

4. Token Pricing Architecture

4.1 Free vs Paid Tiers

Google offers a free tier for development and a paid tier for production. As the table shows, not every model is available on the free tier:

Model	Free Tier	Paid (per 1M tokens)
Gemini 3.5 Flash	Not available	Input: $0.15 / Output: $0.60
Gemini 3.1 Flash Lite	Available	Input: $0.02 / Output: $0.10
Gemini 2.5 Flash	Available	Input: $0.15 / Output: $0.60
Gemini 3.1 Pro	Not available	Input: $1.50 / Output: $6.00

4.2 Thinking Token Costs

Starting with Gemini 2.5, models can generate internal “thinking” tokens before responding, up to a configurable budget. These thinking tokens are billed at the output token rate, so they add directly to the cost of a request. You can control this with the thinking_budget parameter:

from google import genai
from google.genai import types

client = genai.Client()

# Default: model decides thinking budget dynamically
response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="What is 15% of 340?"
)
print(f"Answer: {response.text}")
print(f"Total tokens: {response.usage_metadata}")

# Explicit: limit thinking tokens to reduce cost
response_fast = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="What is 15% of 340?",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=0)  # No thinking
    )
)
print(f"\nNo-thinking answer: {response_fast.text}")

# High budget: maximize reasoning for complex problems
response_deep = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Prove that there are infinitely many prime numbers.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=8192)
    )
)
print(f"\nDeep reasoning answer: {response_deep.text[:200]}...")

                        
Cost Formula: Total cost = (input_tokens × input_rate) + (thinking_tokens × output_rate) + (output_tokens × output_rate). Cached tokens are charged at roughly a 75% discount. Use thinking_budget=0 for simple lookups and thinking_budget=-1 for dynamic (model-chosen) depth.
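To make the formula concrete, the sketch below applies it to the paid-tier rates from the pricing table in 4.1. The function name estimate_cost and the token counts are made up for illustration, the mapping from the table's model names to model codes is an assumption, and the cached-token discount is left out for simplicity.

```python
# Paid-tier rates in USD per 1M tokens, copied from the pricing table in 4.1.
# Keying them by model code is an assumption made for this example.
RATES = {
    "gemini-3.5-flash": {"input": 0.15, "output": 0.60},
    "gemini-3.1-pro-preview": {"input": 1.50, "output": 6.00},
}


def estimate_cost(model: str, input_tokens: int, thinking_tokens: int, output_tokens: int) -> float:
    """Apply the cost formula: thinking tokens are billed at the output rate."""
    rate = RATES[model]
    input_cost = input_tokens * rate["input"] / 1_000_000
    output_cost = (thinking_tokens + output_tokens) * rate["output"] / 1_000_000
    return input_cost + output_cost


# Same hypothetical request on both models: 10,000 input tokens,
# 2,000 thinking tokens, 500 visible output tokens.
flash_cost = estimate_cost("gemini-3.5-flash", 10_000, 2_000, 500)
pro_cost = estimate_cost("gemini-3.1-pro-preview", 10_000, 2_000, 500)

# The Flash request again with thinking disabled (thinking_budget=0).
flash_no_thinking = estimate_cost("gemini-3.5-flash", 10_000, 0, 500)

print(f"Flash:               ${flash_cost:.4f}")
print(f"Pro:                 ${pro_cost:.4f}")
print(f"Flash, no thinking:  ${flash_no_thinking:.4f}")
```

In this example the thinking tokens account for 40% of the Flash request's cost, which is why disabling thinking for simple lookups can matter at scale.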
                    

5. MCP Coding Agent Setup

The Gemini Docs MCP (Model Context Protocol) server allows your coding IDE to query the latest Gemini SDK documentation in real time. This helps your code assistant reference current APIs rather than stale training data.

Add this to your VS Code mcp.json configuration:

{
    "servers": {
        "Gemini API Docs": {
            "url": "https://gemini-api-docs-mcp.dev",
            "type": "http"
        }
    }
}

Once configured, your coding agent can search the live Gemini documentation to find current method signatures, parameters, and code samples — particularly useful for the rapidly evolving Interactions API.

from google import genai

# After MCP setup, your coding agent knows the current SDK patterns.
# For example, it will recommend this (current):
client = genai.Client()
response = client.models.generate_content(
    model="gemini-3.5-flash",
    contents="Hello, world!"
)

# Instead of this (deprecated):
# import google.generativeai as genai  # ❌ OLD
# model = genai.GenerativeModel("gemini-pro")  # ❌ OLD
# response = model.generate_content("Hello")  # ❌ OLD

6. Interactions API Preview

The Interactions API (Beta) represents the future of Gemini — a stateful, server-side conversation system that replaces manual history management. While Part 9 covers this in depth, here is a taste of the new paradigm:

from google import genai

client = genai.Client()

# The Interactions API manages state server-side
interaction1 = client.interactions.create(
    model="gemini-3.5-flash",
    input="Hi, my name is Phil."
)
print(f"Response 1: {interaction1.output_text}")

# Continue the conversation — no history array needed!
interaction2 = client.interactions.create(
    model="gemini-3.5-flash",
    previous_interaction_id=interaction1.id,
    input="What is my name?"
)
print(f"Response 2: {interaction2.output_text}")
# Example output (exact wording may vary): "Your name is Phil."

                        
Key Architectural Difference: With generateContent, you manually manage the contents array (appending user/model turns). With the Interactions API, the server tracks conversation state via previous_interaction_id, which enables automatic cache hits, implicit thought signature preservation, and dramatically simplified multi-turn code.
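For contrast, here is roughly what the manual bookkeeping looks like on the generateContent side. This is a sketch under stated assumptions: the ManualChat class and make_turn helper are hypothetical, the role/parts dictionary shape is the one generateContent accepts for contents, and the API call is replaced by a stand-in function so the example runs offline.

```python
from typing import Callable


def make_turn(role: str, text: str) -> dict:
    """Build one conversation turn in the role/parts shape used for contents."""
    return {"role": role, "parts": [{"text": text}]}


class ManualChat:
    """Keeps the whole conversation client-side and resends it on every call."""

    def __init__(self, send: Callable[[list], str]):
        self._send = send  # hypothetical callable: full contents list -> reply text
        self.history: list = []

    def ask(self, text: str) -> str:
        self.history.append(make_turn("user", text))
        reply = self._send(self.history)  # the entire history goes over the wire
        self.history.append(make_turn("model", reply))
        return reply


def fake_send(contents: list) -> str:
    """Stand-in for the API call: reports how many turns it was sent."""
    return f"received {len(contents)} turn(s)"


chat = ManualChat(fake_send)
first = chat.ask("Hi, my name is Phil.")
second = chat.ask("What is my name?")
print(first)
print(second)
print(len(chat.history))
```

In real use, send would wrap client.models.generate_content(model=..., contents=contents) and return response.text. Notice that the payload grows with every turn, which is the bookkeeping that previous_interaction_id removes.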
                    

                        
Try It Yourself: Build a ‘language detector and translator’: use Gemini Flash for fast language detection, then Gemini Pro for high-quality translation. Compare response times and quality between the two models on 5 sample texts in different languages. Log token usage for cost analysis.
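If you want a starting point for the timing part of this exercise, the scaffold below shows one possible shape. Everything in it is an assumption for illustration: the timed helper, the placeholder detection and translation functions, and the sample texts. The placeholders stand in for real model calls so the scaffold runs offline.

```python
import time
from typing import Callable


def timed(label: str, fn: Callable[[str], str], text: str) -> dict:
    """Run fn(text) once and record the wall-clock time it took."""
    start = time.perf_counter()
    output = fn(text)
    return {
        "label": label,
        "text": text,
        "output": output,
        "seconds": time.perf_counter() - start,
    }


# Placeholder callables. In the exercise, each would wrap a
# client.models.generate_content call for the model being measured.
def fake_detect(text: str) -> str:
    return "fr" if "Bonjour" in text else "unknown"


def fake_translate(text: str) -> str:
    return f"[translated] {text}"


samples = ["Bonjour le monde", "Hola mundo"]
results = []
for sample in samples:
    results.append(timed("detect", fake_detect, sample))
    results.append(timed("translate", fake_translate, sample))

for row in results:
    print(f"{row['label']:<10} {row['seconds']:.6f}s  {row['output']}")
```

A single run per sample is noisy, so for a fair comparison you would repeat each call several times and compare medians.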
                    

Next in the Gemini SDK Track

In Part 2: Text Generation & Structured Outputs, we’ll dive into generateContent parameters, type-safe JSON responses enforced via response_format with schemas, streaming patterns, token counting, and working with the 1M token context window.