
OpenAI SDK Track Part 4: Function Calling & Tools

May 22, 2026 Wasil Zafar 45 min read

Implement OpenAI function calling — define tools with JSON Schema, parallel and forced tool calling, multi-step tool orchestration loops, production patterns (retries, idempotency, timeouts, permissions), and building robust tool-augmented AI applications.

Table of Contents

  1. Tool Calling Fundamentals
  2. Tool Schemas
  3. Parallel Tool Calling
  4. Multi-Step Orchestration
  5. Production Patterns
  6. Built-In Tools & Advanced Config
What You’ll Learn: Function calling is how you give an AI model ‘hands’ — the ability to interact with the real world through your code. Instead of just generating text, the model can search databases, call APIs, update records, and trigger workflows. This article teaches you to design functions that the model understands intuitively, handle parallel calls, and enforce execution policies. Think of it like teaching a new employee which internal tools exist and how to use them.

1. Tool Calling Fundamentals

Function Calling Flow
sequenceDiagram
    participant User
    participant App
    participant Model as OpenAI Model
    participant Tool as Your Function

    User->>App: "What's the weather in London?"
    App->>Model: messages + tool definitions
    Model-->>App: tool_call: get_weather(city="London")
    App->>Tool: Execute get_weather("London")
    Tool-->>App: {"temp": 15, "condition": "cloudy"}
    App->>Model: messages + tool result
    Model-->>App: "It's 15°C and cloudy in London."
    App-->>User: Final answer
                        

Function calling is easiest to understand as a four-step handshake: define a schema, let the model decide whether a tool is needed, execute the tool in your own runtime, then feed the result back for synthesis. The example below walks through that exact loop end to end.

from openai import OpenAI
import json

client = OpenAI()

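# Define tools the model can call.
# The Responses API takes a flat function definition: name, description, and
# parameters sit next to "type", not inside a nested "function" object
# (that nested form belongs to Chat Completions).
tools = [
    {
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "units": {"type": "string", "enum": ["celsius", "fahrenheit"], "description": "Temperature units"},
            },
            "required": ["city"],
            "additionalProperties": False,
        },
        "strict": False,  # "units" is optional; strict mode expects every property in "required"
    }
]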

# Step 1: Send message with tools
response = client.responses.create(
    model="gpt-4.1-mini",
    input="What's the weather in London?",
    tools=tools,
)

# Step 2: Check for tool calls in output
for item in response.output:
    if item.type == "function_call":
        print(f"Tool called: {item.name}({item.arguments})")

        # Step 3: Execute the function (your code)
        args = json.loads(item.arguments)
        weather_result = {"city": args["city"], "temp": 15, "condition": "cloudy", "units": "celsius"}

        # Step 4: Send tool result back
        final = client.responses.create(
            model="gpt-4.1-mini",
            input=[
                {"role": "user", "content": "What's the weather in London?"},
                item,  # the function_call item
                {"type": "function_call_output", "call_id": item.call_id, "output": json.dumps(weather_result)},
            ],
            tools=tools,
        )
        print(f"Final: {final.output_text}")
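
Step 3 above hard-codes the weather result. In an application, that step usually routes the call to a Python function by name. The following is a minimal sketch of that routing, assuming a hypothetical `TOOL_REGISTRY` dict and a stubbed `get_weather`; neither is part of the OpenAI SDK.

```python
import json

def get_weather(city: str, units: str = "celsius") -> dict:
    """Stub implementation; a real version would call a weather service."""
    return {"city": city, "temp": 15, "condition": "cloudy", "units": units}

# Hypothetical registry: tool names (as declared in `tools`) mapped to callables.
TOOL_REGISTRY = {"get_weather": get_weather}

def run_tool(name: str, arguments: str) -> str:
    """Parse the model's JSON arguments, call the matching function, and
    return a JSON string suitable for a function_call_output item."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return json.dumps({"error": f"Unknown tool: {name}"})
    try:
        args = json.loads(arguments)
        return json.dumps(handler(**args))
    except (json.JSONDecodeError, TypeError) as exc:
        # Report bad arguments to the model instead of crashing the app.
        return json.dumps({"error": str(exc)})

print(run_tool("get_weather", '{"city": "London"}'))
```

Returning errors as JSON rather than raising gives the model a chance to correct its arguments on the next turn.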

2. Tool Schemas

Good tool schemas do more than satisfy JSON validation. They shape model behavior. Clear names, tight enums, and nested objects reduce hallucinated arguments and make the model much more reliable when it needs to select filters or complex parameter bundles.

from openai import OpenAI
import json

client = OpenAI()

# Complex tool with a nested schema (flat Responses API tool format)
tools = [
    {
        "type": "function",
        "name": "search_products",
        "description": "Search for products in the catalog with filters",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query text"},
                "category": {
                    "type": "string",
                    "enum": ["electronics", "clothing", "books", "food"],
                    "description": "Product category to filter by",
                },
                "price_range": {
                    "type": "object",
                    "properties": {
                        "min": {"type": "number", "description": "Minimum price"},
                        "max": {"type": "number", "description": "Maximum price"},
                    },
                    "required": ["min", "max"],
                    "additionalProperties": False,
                },
                "sort_by": {
                    "type": "string",
                    "enum": ["price_asc", "price_desc", "rating", "newest"],
                },
                "limit": {"type": "integer", "description": "Max results (1-50)"},
            },
            "required": ["query"],
            "additionalProperties": False,
        },
        "strict": False,  # the filters are optional; strict mode expects every property in "required"
    },
    {
        "type": "function",
        "name": "add_to_cart",
        "description": "Add a product to the user's shopping cart",
        "parameters": {
            "type": "object",
            "properties": {
                "product_id": {"type": "string"},
                "quantity": {"type": "integer", "minimum": 1},
            },
            "required": ["product_id", "quantity"],
            "additionalProperties": False,
        },
    },
]

response = client.responses.create(
    model="gpt-4.1-mini",
    input="Find me wireless headphones under $100, sorted by rating",
    tools=tools,
)

for item in response.output:
    if item.type == "function_call":
        print(f"Tool: {item.name}")
        print(f"Args: {json.dumps(json.loads(item.arguments), indent=2)}")
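
A schema guides the model, but without strict mode the arguments are still best effort, so it can help to check them before executing anything. The sketch below is a hand-rolled check, not a full JSON Schema validator: `validate_args` is a hypothetical helper that covers only required keys, unknown keys, enums, and numeric bounds. The `minimum`/`maximum` bounds on `limit` are an assumption that turns the "1-50" hint from the description above into enforceable values.

```python
def validate_args(args: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments passed."""
    errors = [
        f"missing required field: {key}"
        for key in schema.get("required", [])
        if key not in args
    ]
    props = schema.get("properties", {})
    for key, value in args.items():
        spec = props.get(key)
        if spec is None:
            if schema.get("additionalProperties") is False:
                errors.append(f"unexpected field: {key}")
            continue
        if "enum" in spec and value not in spec["enum"]:
            errors.append(f"{key} must be one of {spec['enum']}")
        # bool is a subclass of int in Python, so exclude it from numeric checks.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            if "minimum" in spec and value < spec["minimum"]:
                errors.append(f"{key} must be >= {spec['minimum']}")
            if "maximum" in spec and value > spec["maximum"]:
                errors.append(f"{key} must be <= {spec['maximum']}")
    return errors

# Trimmed copy of the search_products parameters, with assumed bounds on limit.
SEARCH_SCHEMA = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "sort_by": {"type": "string", "enum": ["price_asc", "price_desc", "rating", "newest"]},
        "limit": {"type": "integer", "minimum": 1, "maximum": 50},
    },
    "required": ["query"],
    "additionalProperties": False,
}

good = validate_args({"query": "headphones", "sort_by": "rating", "limit": 10}, SEARCH_SCHEMA)
bad = validate_args({"sort_by": "cheapest", "limit": 500, "color": "red"}, SEARCH_SCHEMA)
print(good)
print(bad)
```

Feeding the error list back as the tool output lets the model retry with corrected arguments instead of failing silently.
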
Real-World Application

Autonomous CRM Agent

A sales team built an agent with functions for their CRM (get_contact, log_call, create_follow_up, update_deal_stage). Sales reps describe calls in natural language, and the agent automatically logs activities, updates deal stages, and creates follow-up tasks. Result: 2 hours/day saved per rep on admin work.

CRM Automation, Sales Productivity

3. Parallel Tool Calling

Parallel calling matters when independent facts can be gathered at the same time. Instead of forcing the model through a long serial chain, you can let it request several tools in one response and then return every tool result together for a single final synthesis pass.

from openai import OpenAI
import json

client = OpenAI()

# Flat Responses API tool format: name, description, and parameters sit next to "type"
tools = [
    {"type": "function", "name": "get_weather", "description": "Get weather for a city", "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"], "additionalProperties": False}},
    {"type": "function", "name": "get_time", "description": "Get current time in a timezone", "parameters": {"type": "object", "properties": {"timezone": {"type": "string"}}, "required": ["timezone"], "additionalProperties": False}},
]

# Model may call MULTIPLE tools in parallel
response = client.responses.create(
    model="gpt-4.1-mini",
    input="What's the weather and time in Tokyo and New York?",
    tools=tools,
)

# Collect all tool calls
tool_calls = [item for item in response.output if item.type == "function_call"]
print(f"Model made {len(tool_calls)} parallel tool calls:")
for tc in tool_calls:
    print(f"  - {tc.name}({tc.arguments})")

# Execute all tools and return results
tool_results = []
for tc in tool_calls:
    args = json.loads(tc.arguments)
    # Simulate execution
    if tc.name == "get_weather":
        result = {"city": args["city"], "temp": 22, "condition": "sunny"}
    else:
        result = {"timezone": args["timezone"], "time": "14:30"}

    tool_results.append(tc)  # Include the function_call item
    tool_results.append({"type": "function_call_output", "call_id": tc.call_id, "output": json.dumps(result)})

# Send all results back at once
final = client.responses.create(
    model="gpt-4.1-mini",
    input=[{"role": "user", "content": "What's the weather and time in Tokyo and New York?"}] + tool_results,
    tools=tools,
)
print(f"\nFinal: {final.output_text}")
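
The listing above executes the requested tools one after another. When the tools are independent and I/O-bound, the local execution step can also run concurrently. Here is a minimal sketch using the standard library's `ThreadPoolExecutor`; the `HANDLERS` dict and the `SimpleNamespace` stand-ins for the model's `function_call` items are assumptions made so the example runs without an API call.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace

def get_weather(city: str) -> dict:
    return {"city": city, "temp": 22, "condition": "sunny"}

def get_time(timezone: str) -> dict:
    return {"timezone": timezone, "time": "14:30"}

# Hypothetical mapping from tool name to local implementation.
HANDLERS = {"get_weather": get_weather, "get_time": get_time}

def run_one(call) -> dict:
    """Execute one tool call and build its function_call_output item."""
    try:
        result = HANDLERS[call.name](**json.loads(call.arguments))
    except Exception as exc:
        result = {"error": str(exc)}
    return {"type": "function_call_output", "call_id": call.call_id, "output": json.dumps(result)}

def run_parallel(tool_calls, max_workers: int = 4) -> list[dict]:
    """Run independent tool calls concurrently; map() preserves input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, tool_calls))

# Stand-ins for the function_call items the model would return.
calls = [
    SimpleNamespace(name="get_weather", arguments='{"city": "Tokyo"}', call_id="call_1"),
    SimpleNamespace(name="get_time", arguments='{"timezone": "Asia/Tokyo"}', call_id="call_2"),
]
outputs = run_parallel(calls)
print([o["call_id"] for o in outputs])
```

Each output keeps the `call_id` of the call it answers, which is how the model matches results to requests when several arrive together.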

4. Multi-Step Tool Orchestration

Once the model can call tools repeatedly, you are effectively building an execution engine. The loop below demonstrates the production idea clearly: keep the message state, keep executing tool calls until the model produces final text, and cap the number of iterations so a bad prompt cannot spin forever.

from openai import OpenAI
import json

client = OpenAI()

def execute_tool(name: str, arguments: str) -> str:
    """Route and execute tool calls."""
    args = json.loads(arguments)
    if name == "search_database":
        return json.dumps({"results": [{"id": "doc-1", "title": "RAG Guide", "score": 0.95}]})
    elif name == "read_document":
        return json.dumps({"content": "RAG combines retrieval with generation for grounded responses..."})
    elif name == "summarize":
        return json.dumps({"summary": "RAG uses retrieved context to ground LLM responses."})
    return json.dumps({"error": f"Unknown tool: {name}"})

# Flat Responses API tool format: name, description, and parameters sit next to "type"
tools = [
    {"type": "function", "name": "search_database", "description": "Search knowledge base", "parameters": {"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"], "additionalProperties": False}},
    {"type": "function", "name": "read_document", "description": "Read a document by ID", "parameters": {"type": "object", "properties": {"doc_id": {"type": "string"}}, "required": ["doc_id"], "additionalProperties": False}},
    # max_words is optional, so strict mode is switched off for this tool
    {"type": "function", "name": "summarize", "description": "Summarize text", "strict": False, "parameters": {"type": "object", "properties": {"text": {"type": "string"}, "max_words": {"type": "integer"}}, "required": ["text"], "additionalProperties": False}},
]

# Agentic loop — keep going until model stops calling tools
messages = [{"role": "user", "content": "Find information about RAG and summarize it in 20 words."}]
max_iterations = 5

for i in range(max_iterations):
    response = client.responses.create(
        model="gpt-4.1",
        input=messages,
        tools=tools,
    )

    # Check if model wants to call tools
    tool_calls = [item for item in response.output if item.type == "function_call"]

    if not tool_calls:
        # No more tool calls — we have the final answer
        print(f"\n[Iteration {i+1}] Final answer: {response.output_text}")
        break

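    # Execute each tool call
    for tc in tool_calls:
        print(f"[Iteration {i+1}] Calling: {tc.name}({tc.arguments})")
        result = execute_tool(tc.name, tc.arguments)
        messages.append(tc)
        messages.append({"type": "function_call_output", "call_id": tc.call_id, "output": result})
else:
    # The outer loop finished without hitting `break`: the iteration cap was reached
    print(f"Stopped after {max_iterations} iterations without a final answer.")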
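
The same loop can be packaged as a function that takes the model call as a parameter, which makes the iteration cap testable without network access. This is a sketch under assumptions: `run_tool_loop` and the scripted fake model are hypothetical, and `call_model` is expected to return an object with `.output` and `.output_text`, like the responses in the listing above.

```python
import json
from types import SimpleNamespace

def run_tool_loop(call_model, execute_tool, user_input: str, max_iterations: int = 5) -> str:
    """Drive the tool loop until the model returns no function calls."""
    items = [{"role": "user", "content": user_input}]
    for _ in range(max_iterations):
        response = call_model(items)
        tool_calls = [it for it in response.output if it.type == "function_call"]
        if not tool_calls:
            return response.output_text  # final answer
        for tc in tool_calls:
            items.append(tc)
            items.append({
                "type": "function_call_output",
                "call_id": tc.call_id,
                "output": execute_tool(tc.name, tc.arguments),
            })
    raise RuntimeError(f"No final answer after {max_iterations} iterations")

def make_fake_model():
    """Scripted stand-in: one tool call, then a final text answer."""
    script = iter([
        SimpleNamespace(
            output=[SimpleNamespace(type="function_call", name="search_database",
                                    arguments='{"query": "RAG"}', call_id="c1")],
            output_text="",
        ),
        SimpleNamespace(output=[], output_text="RAG grounds answers in retrieved text."),
    ])
    return lambda items: next(script)

answer = run_tool_loop(
    make_fake_model(),
    lambda name, arguments: json.dumps({"ok": True}),
    "Find information about RAG",
)
print(answer)
```

Raising when the cap is hit, rather than returning an empty string, makes a runaway loop visible to the caller.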

5. Production Patterns

Production tool systems fail in ways demo scripts never reveal. Arguments can be malformed, tools can time out, users can invoke capabilities they should not have, and the same tool call can be retried after a network failure. A dedicated execution layer is where you enforce those guardrails.

from openai import OpenAI
import json
import time

client = OpenAI()

class ToolExecutor:
    """Tool executor with permission checks, argument parsing, timing, and logging."""

    def __init__(self, allowed_tools: set[str] | None = None):
        self.allowed_tools = allowed_tools
        self.execution_log: list[dict] = []

    def execute(self, name: str, arguments: str, call_id: str) -> str:
        """Execute a tool call with safety checks."""
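        # Permission check. Compare against None so that an empty allow-list
        # denies every tool instead of silently allowing all of them.
        if self.allowed_tools is not None and name not in self.allowed_tools:
            return json.dumps({"error": f"Tool '{name}' not permitted"})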

        # Parse and validate arguments
        try:
            args = json.loads(arguments)
        except json.JSONDecodeError:
            return json.dumps({"error": "Invalid JSON arguments"})

        # Execute with timing
        start = time.time()
        try:
            result = self._dispatch(name, args)
        except Exception as e:
            result = {"error": str(e)}
        elapsed = time.time() - start

        # Log execution
        self.execution_log.append({
            "call_id": call_id,
            "tool": name,
            "args": args,
            "result": result,
            "duration_ms": round(elapsed * 1000),
        })

        return json.dumps(result)

    def _dispatch(self, name: str, args: dict) -> dict:
        """Route to actual tool implementations."""
        handlers = {
            "get_weather": self._get_weather,
            "search": self._search,
        }
        handler = handlers.get(name)
        if not handler:
            return {"error": f"No handler for tool: {name}"}
        return handler(**args)

    def _get_weather(self, city: str, **kwargs) -> dict:
        return {"city": city, "temp": 20, "condition": "clear"}

    def _search(self, query: str, **kwargs) -> dict:
        return {"results": [f"Result for: {query}"]}

# Usage
executor = ToolExecutor(allowed_tools={"get_weather", "search"})
result = executor.execute("get_weather", '{"city": "London"}', "call-001")
print(result)
print(f"Log: {executor.execution_log}")
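
The executor above covers permissions, parsing, and logging. Retries, timeouts, and idempotency can be layered around the handler call. The sketch below shows one possible shape using only the standard library; `call_with_guards`, `idempotency_key`, and the in-memory `_results` cache are hypothetical names, and the in-memory cache is an assumption that would need a shared store in a multi-process deployment.

```python
import hashlib
import json
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=4)
_results: dict[str, dict] = {}  # idempotency cache: key -> stored result

def idempotency_key(name: str, args: dict) -> str:
    """Stable hash of tool name plus arguments (key order does not matter)."""
    payload = json.dumps({"tool": name, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def call_with_guards(name, handler, args, timeout_s=2.0, max_attempts=3, base_delay_s=0.05) -> dict:
    key = idempotency_key(name, args)
    if key in _results:
        return _results[key]  # same call seen before: reuse the stored result
    last_error = "unknown error"
    for attempt in range(max_attempts):
        future = _pool.submit(handler, **args)
        try:
            result = future.result(timeout=timeout_s)
            _results[key] = result
            return result
        except FutureTimeout:
            last_error = f"timed out after {timeout_s}s"
        except Exception as exc:
            last_error = str(exc)
        if attempt < max_attempts - 1:
            time.sleep(base_delay_s * 2 ** attempt)  # exponential backoff
    return {"error": last_error, "attempts": max_attempts}

# Demo: a handler that fails twice before succeeding.
attempts = {"n": 0}

def flaky_create_ticket(title: str) -> dict:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("upstream unavailable")
    return {"ticket_id": "T-1", "title": title}

first = call_with_guards("create_ticket", flaky_create_ticket, {"title": "Refund"})
second = call_with_guards("create_ticket", flaky_create_ticket, {"title": "Refund"})
print(first, second, attempts["n"])
```

Two caveats apply. The timeout only stops waiting: the worker thread keeps running, so a write that timed out may still complete, and retrying it is safe only if the downstream operation is itself idempotent. Whether identical arguments should count as the same operation is also application-specific; for some tools a key derived from the `call_id` or a request ID may fit better.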

6. Built-In Tools, MCP & Advanced Configuration

The OpenAI platform provides a rich ecosystem of built-in tools (web search, file search, code interpreter, image generation, computer use, shell) and native MCP server integration. The design implication is important: not every capability has to be implemented as your own Python function. Use custom function tools when your app owns the execution logic. Use built-in tools when OpenAI already provides the capability. Use MCP servers when you want standardized access to external systems.

from openai import OpenAI

client = OpenAI()

# Built-in web search tool: no local execution loop required.
response = client.responses.create(
    model="gpt-4.1",
    input="Find recent guidance on prompt caching and summarize the operational benefits.",
    tools=[
        {"type": "web_search_preview"},
    ],
)

print(response.output_text)

6.1 Tool Choice Configuration

The tool_choice parameter controls whether and how the model uses tools. The default ("auto") lets the model decide. Use "required" to force at least one tool call, or specify a function name to force a particular tool. The allowed_tools variant restricts which tools the model can use while keeping the full tool list in context for prompt caching benefits.

from openai import OpenAI

client = OpenAI()

tools = [
    {"type": "function", "name": "get_weather", "description": "Get weather for a city.",
     "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"], "additionalProperties": False}, "strict": True},
    {"type": "function", "name": "search_docs", "description": "Search internal documents.",
     "parameters": {"type": "object", "properties": {"query": {"type": "string"}}, "required": ["query"], "additionalProperties": False}, "strict": True},
]

# Auto (default): model decides whether to call tools
response = client.responses.create(
    model="gpt-4.1", input="What's the weather in Paris?",
    tools=tools, tool_choice="auto",
)

# Required: model MUST call at least one tool
response = client.responses.create(
    model="gpt-4.1", input="Look up the weather.",
    tools=tools, tool_choice="required",
)

# Forced function: model MUST call this specific tool
response = client.responses.create(
    model="gpt-4.1", input="Tell me about Paris.",
    tools=tools, tool_choice={"type": "function", "name": "get_weather"},
)

# Allowed tools: restrict which tools can be called (cache-friendly)
response = client.responses.create(
    model="gpt-4.1", input="Find docs about rate limits.",
    tools=tools,
    tool_choice={"type": "allowed_tools", "mode": "auto", "tools": [
        {"type": "function", "name": "search_docs"},
    ]},
)
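# When the model responds with a function call, output_text is empty,
# so inspect the output items instead.
for item in response.output:
    if item.type == "function_call":
        print(f"{item.name}({item.arguments})")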

6.2 Tool Namespaces

Namespaces group related tools by domain (e.g., crm, billing, shipping). This helps the model choose between tools that serve different systems, and it enables deferred loading of infrequently used tools within a namespace.

from openai import OpenAI

client = OpenAI()

# Namespace groups related tools by domain
tools = [
    {
        "type": "namespace",
        "name": "crm",
        "description": "CRM tools for customer lookup and order management.",
        "tools": [
            {
                "type": "function",
                "name": "get_customer_profile",
                "description": "Fetch a customer profile by customer ID.",
                "parameters": {
                    "type": "object",
                    "properties": {"customer_id": {"type": "string"}},
                    "required": ["customer_id"],
                    "additionalProperties": False,
                },
            },
            {
                "type": "function",
                "name": "list_open_orders",
                "description": "List open orders for a customer.",
                "defer_loading": True,  # Only loaded when model needs it
                "parameters": {
                    "type": "object",
                    "properties": {"customer_id": {"type": "string"}},
                    "required": ["customer_id"],
                    "additionalProperties": False,
                },
            },
        ],
    },
    {
        "type": "namespace",
        "name": "shipping",
        "description": "Shipping tools for tracking and logistics.",
        "tools": [
            {
                "type": "function",
                "name": "track_package",
                "description": "Track a package by tracking number.",
                "parameters": {
                    "type": "object",
                    "properties": {"tracking_number": {"type": "string"}},
                    "required": ["tracking_number"],
                    "additionalProperties": False,
                },
            },
        ],
    },
]

response = client.responses.create(
    model="gpt-4.1",
    input="What orders are open for customer C-1234?",
    tools=tools,
)
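# Function tools run in your code, so this response normally contains a
# function_call item and output_text is empty; print the requested call.
for item in response.output:
    if item.type == "function_call":
        print(f"{item.name}({item.arguments})")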

6.3 Tool Search

When your application exposes dozens or hundreds of tools, you can use tool_search to let the model dynamically search for and load relevant tools at runtime. This keeps the initial context small and is supported on gpt-5.4 and later models.

from openai import OpenAI

client = OpenAI()

# tool_search allows the model to discover tools at runtime (gpt-5.4+)
tools = [
    {"type": "tool_search"},  # Enables dynamic tool discovery
    {
        "type": "function",
        "name": "get_weather",
        "description": "Get current weather for a location.",
        "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
            "additionalProperties": False,
        },
    },
    {
        "type": "function",
        "name": "book_flight",
        "description": "Book a flight between two cities.",
        "defer_loading": True,  # Not loaded until model searches for it
        "parameters": {
            "type": "object",
            "properties": {
                "origin": {"type": "string"},
                "destination": {"type": "string"},
                "date": {"type": "string"},
            },
            "required": ["origin", "destination", "date"],
            "additionalProperties": False,
        },
    },
]

# Model will search for relevant tools, load them, then call them
response = client.responses.create(
    model="gpt-5.4",
    input="Book a flight from London to Tokyo on June 15th.",
    tools=tools,
)
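# book_flight is a function tool executed by your code, so expect output items
# rather than final text; list what came back.
for item in response.output:
    print(item.type, getattr(item, "name", ""))
print(response.output_text)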
Best Practice: Aim for fewer than 20 tools loaded at the start of a turn. Use tool_search or defer_loading within namespaces for the rest. This improves accuracy and reduces input token costs.
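
One way to apply that budget mechanically is to mark everything outside a short always-on list as deferred before sending the request. The helper below is a hypothetical sketch: `apply_loading_budget` is not an SDK function, and it only sets the `defer_loading` field shown in the listings above on flat function-tool dicts.

```python
def apply_loading_budget(tools: list[dict], always_loaded: set[str], budget: int = 20) -> list[dict]:
    """Return copies of the tools; anything outside always_loaded, or past
    the budget, gets defer_loading=True."""
    loaded = 0
    result = []
    for tool in tools:
        tool = dict(tool)  # shallow copy so the caller's definitions stay unchanged
        if tool["name"] in always_loaded and loaded < budget:
            tool.pop("defer_loading", None)
            loaded += 1
        else:
            tool["defer_loading"] = True
        result.append(tool)
    return result

catalog = [
    {"type": "function", "name": "get_weather"},
    {"type": "function", "name": "book_flight"},
    {"type": "function", "name": "cancel_flight"},
]
budgeted = apply_loading_budget(catalog, always_loaded={"get_weather", "book_flight"}, budget=1)
print([(t["name"], t.get("defer_loading", False)) for t in budgeted])
```

With a budget of 1, only the first always-on tool stays loaded up front; the rest are left for the model to discover.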

6.4 Remote MCP Integration

The Responses API natively supports connecting to remote MCP (Model Context Protocol) servers. This gives the model access to external systems without writing custom function wrappers — the MCP server advertises its capabilities and the model calls them directly.

from openai import OpenAI

client = OpenAI()

# Remote MCP server: model discovers and calls tools from the server
response = client.responses.create(
    model="gpt-4.1",
    tools=[
        {
            "type": "mcp",
            "server_label": "github",
            "server_description": "GitHub MCP server for repository operations.",
            "server_url": "https://mcp.github.com/sse",
            "require_approval": "never",
        },
    ],
    input="List the open issues in the openai/openai-python repository.",
)

print(response.output_text)
Architecture choice: Custom functions are best when you need full control, deterministic business logic, or private runtime state. Built-in tools reduce code. MCP servers provide a standard contract for external systems across multiple agents or products. Use require_approval: "always" for sensitive operations.
Try It Yourself: Build a ‘travel planning assistant’ with 4 functions: search_flights(origin, destination, date), search_hotels(city, check_in, check_out, budget), get_weather(city, date), and convert_currency(amount, from_currency, to_currency). Have the model plan a 3-day trip to Tokyo, making parallel function calls where possible. Measure how many API rounds it takes.

Next in the SDK Track

In OA Part 5: Multimodal — Vision & Image Generation, we’ll use the Vision API for image understanding and DALL-E/GPT-Image for production image generation workflows.