CrewAI SDK Track Part 11: Human-in-the-Loop & Multimodal

May 24, 2026 Wasil Zafar 35 min read

Implement human-in-the-loop workflows for agent oversight, configure human input on task execution, integrate human feedback in Flows, build multimodal agents with vision capabilities, and generate images with DALL-E integration.

Table of Contents

  1. Human-in-the-Loop Overview
  2. Human Input on Execution
  3. Human Feedback in Flows
  4. Multimodal Agents
  5. Image Generation with DALL-E

What You’ll Learn: Human-in-the-loop (HITL) puts humans at strategic decision points in your AI workflow: approving actions, providing feedback, or correcting course. Multimodal capabilities let agents work with images and other visual content. This article covers both: building agents that know when to ask for help and agents that can see. Think of HITL like the ‘approve’ button in a deployment pipeline: automation does the work, humans make the critical decisions.

1. Human-in-the-Loop Overview

Production AI systems need human oversight. Autonomous agents can hallucinate, make costly mistakes, or take irreversible actions. Human-in-the-loop (HITL) patterns let you inject human judgment at critical decision points while preserving the speed benefits of automation for routine tasks.

When to Use HITL: High-stakes decisions (financial transactions, publishing content), ambiguous inputs requiring domain expertise, compliance requirements mandating human approval, and any action that’s difficult to reverse.
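The four criteria above can be turned into an explicit gate that runs before an agent acts. The following is a minimal sketch in plain Python, not part of CrewAI: `ProposedAction`, `review_reasons`, and `needs_human_review` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProposedAction:
    """Hypothetical description of something an agent wants to do."""
    name: str
    high_stakes: bool = False          # e.g. financial transaction, publishing
    ambiguous_input: bool = False      # needs domain expertise to interpret
    compliance_required: bool = False  # regulation mandates human sign-off
    reversible: bool = True            # can the action be undone cheaply?


def review_reasons(action: ProposedAction) -> List[str]:
    """List which of the four HITL criteria apply to this action."""
    reasons = []
    if action.high_stakes:
        reasons.append("high stakes")
    if action.ambiguous_input:
        reasons.append("ambiguous input")
    if action.compliance_required:
        reasons.append("compliance requirement")
    if not action.reversible:
        reasons.append("hard to reverse")
    return reasons


def needs_human_review(action: ProposedAction) -> bool:
    """True when at least one criterion applies."""
    return bool(review_reasons(action))


if __name__ == "__main__":
    wire = ProposedAction("send_wire_transfer", high_stakes=True, reversible=False)
    print(wire.name, "->", review_reasons(wire))
```

Returning the reasons rather than a bare boolean makes it easier to tell the reviewer why they were pulled in.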

1.1 CrewAI’s HITL Architecture

CrewAI provides HITL at multiple levels: task-level input, agent-level feedback, and Flow-level approval gates. The simplest mechanism is setting human_input=True on a task, which pauses execution and prompts the user before the agent finalizes its output.

from crewai import Agent, Task, Crew, Process

# Basic HITL agent — human reviews output before finalization
reviewer_agent = Agent(
    role="Content Reviewer",
    goal="Review and approve content for publication",
    backstory="You are a senior editor who ensures quality standards.",
    llm="gpt-4o",
    verbose=True
)

# Task with human input enabled
review_task = Task(
    description="Review the following draft article and provide feedback: {draft}",
    expected_output="Approved content with any necessary corrections.",
    agent=reviewer_agent,
    human_input=True  # Pauses for human review before finalizing
)

crew = Crew(
    agents=[reviewer_agent],
    tasks=[review_task],
    process=Process.sequential,
    verbose=True
)

result = crew.kickoff(inputs={"draft": "AI is transforming healthcare..."})
print(result.raw)

1.2 Approval, Feedback, and Correction Patterns

Beyond simple approval, HITL supports iterative correction loops where humans can reject agent output and request revisions:

from crewai import Agent, Task, Crew, Process

# Multi-agent crew with human checkpoints
research_agent = Agent(
    role="Research Analyst",
    goal="Gather and synthesize market data",
    backstory="Expert at finding relevant market intelligence.",
    llm="gpt-4o"
)

strategy_agent = Agent(
    role="Strategy Advisor",
    goal="Develop actionable business strategies",
    backstory="Senior consultant with 20 years of experience.",
    llm="gpt-4o"
)

# Research runs autonomously
research_task = Task(
    description="Research market trends for {industry} in 2026.",
    expected_output="Comprehensive market analysis with data points.",
    agent=research_agent,
    human_input=False  # Runs without human intervention
)

# Strategy requires human approval before delivery
strategy_task = Task(
    description="Based on the research, develop a go-to-market strategy.",
    expected_output="Detailed GTM strategy with timeline and budget.",
    agent=strategy_agent,
    human_input=True,  # Human approves final strategy
    context=[research_task]
)

crew = Crew(
    agents=[research_agent, strategy_agent],
    tasks=[research_task, strategy_task],
    process=Process.sequential,
    verbose=True
)

result = crew.kickoff(inputs={"industry": "renewable energy"})
print(result.raw)

Production Consideration: When human_input=True is set, execution blocks until the human responds. For production systems with SLAs, implement timeout mechanisms or async notification patterns to prevent indefinite hangs.

2. Human Input on Execution

2.1 Agent Asking for Clarification

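When human_input=True is set on a task, CrewAI presents the agent’s draft output to the user and asks for feedback. The user can accept the draft as-is or describe the changes they want, in which case the agent revises its answer and presents it again. The pause happens after the draft is produced, so the agent does not interrupt itself mid-task to ask questions:

from crewai import Agent, Task, Crew, Process

# Agent whose backstory tells it to flag ambiguous inputs rather than guess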
data_analyst = Agent(
    role="Data Analyst",
    goal="Analyze datasets and produce accurate insights",
    backstory="""You are meticulous about data quality.
    When inputs are ambiguous, you ask for clarification
    rather than making assumptions.""",
    llm="gpt-4o",
    verbose=True
)

# Task requiring human input for ambiguous data interpretation
analysis_task = Task(
    description="""Analyze the sales data for Q1 2026.
    Focus on: {focus_areas}
    Data source: {data_source}""",
    expected_output="""Detailed analysis report with:
    - Key metrics and trends
    - Anomalies identified
    - Actionable recommendations""",
    agent=data_analyst,
    human_input=True  # Agent shows draft, human approves/corrects
)

crew = Crew(
    agents=[data_analyst],
    tasks=[analysis_task],
    process=Process.sequential
)

# When kicked off, the agent will produce a draft, then pause
# for human review before finalizing the output
result = crew.kickoff(inputs={
    "focus_areas": "revenue growth, customer churn",
    "data_source": "CRM export from March 2026"
})
print(result.raw)

2.2 Input Prompts and Timeouts

For production deployments, you may want to configure how long the system waits for human input. CrewAI’s default behavior blocks indefinitely, but you can wrap execution in timeout logic:

import signal
from crewai import Agent, Task, Crew, Process

class HumanInputTimeout(Exception):
    pass

def timeout_handler(signum, frame):
    raise HumanInputTimeout("Human input timed out after 300 seconds")

# SIGALRM exists only on Unix, and the handler runs only when kickoff()
# is called from the main thread.
timeouts_supported = hasattr(signal, "SIGALRM")

approval_agent = Agent(
    role="Compliance Officer",
    goal="Ensure all outputs meet regulatory requirements",
    backstory="You review content for regulatory compliance.",
    llm="gpt-4o"
)

compliance_task = Task(
    description="Review this financial report for compliance: {report}",
    expected_output="Compliance-approved report or list of violations.",
    agent=approval_agent,
    human_input=True
)

crew = Crew(
    agents=[approval_agent],
    tasks=[compliance_task],
    process=Process.sequential
)

# Arm the alarm right before the blocking call. Note that the 5 minutes
# cover the whole kickoff (LLM calls included), not only the human wait.
if timeouts_supported:
    signal.signal(signal.SIGALRM, timeout_handler)
    signal.alarm(300)

try:
    result = crew.kickoff(inputs={"report": "Q1 earnings summary..."})
    print("Approved:", result.raw)
except HumanInputTimeout:
    print("Escalating: human review not completed within timeout")
finally:
    if timeouts_supported:
        signal.alarm(0)  # Cancel the alarm if it has not fired yet
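Signal-based timeouts are limited to Unix and to the main thread. Where your own code collects the human response, a thread plus a queue gives a portable alternative. The sketch below uses only the standard library; `ask_with_timeout` is a hypothetical helper, and it wraps a prompt you issue yourself rather than the prompt CrewAI shows for human_input=True.

```python
import queue
import threading
from typing import Callable, Optional


def ask_with_timeout(
    prompt: str,
    timeout_seconds: float,
    reader: Callable[[str], str] = input,
) -> Optional[str]:
    """Return the reviewer's answer, or None if nothing arrives in time."""
    answers: "queue.Queue[str]" = queue.Queue(maxsize=1)

    def worker() -> None:
        # Runs the blocking read off the calling thread.
        answers.put(reader(prompt))

    # Daemon thread: a reader that never returns will not block interpreter exit.
    threading.Thread(target=worker, daemon=True).start()
    try:
        return answers.get(timeout=timeout_seconds)
    except queue.Empty:
        return None


if __name__ == "__main__":
    answer = ask_with_timeout("Approve (y/n)? ", timeout_seconds=300)
    if answer is None:
        print("Escalating: human review not completed within timeout")
    else:
        print("Reviewer answered:", answer)
```

The `reader` parameter defaults to `input` but can be swapped for any callable, which keeps the helper testable and lets the same timeout logic front a different input channel.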

3. Human Feedback in Flows

3.1 HITL Steps in Flow Execution

CrewAI Flows provide more sophisticated HITL patterns through dedicated steps that pause for human interaction. Unlike task-level human_input, Flow-based HITL can route execution paths based on human decisions:

from crewai.flow.flow import Flow, start, listen, router
from pydantic import BaseModel
from typing import Optional

class ContentState(BaseModel):
    draft: str = ""
    human_feedback: Optional[str] = None
    approved: bool = False
    revision_count: int = 0

class ContentReviewFlow(Flow[ContentState]):

    @start()
    def generate_draft(self):
        """AI generates initial content draft."""
        self.state.draft = "AI-generated article about machine learning trends..."
        print(f"Draft generated: {self.state.draft[:50]}...")
        return self.state.draft

    @listen(generate_draft)
    def human_review_step(self, draft):
        """Pause for human review and feedback."""
        print("\n" + "=" * 50)
        print("HUMAN REVIEW REQUIRED")
        print("=" * 50)
        print(f"\nDraft:\n{draft}\n")

        # In production, this would be an API call to a review UI
        # Normalize once so " Y " and "y" are treated the same
        feedback = input("Approve (y), Reject with feedback (n), or Edit (e): ").strip().lower()

        if feedback == "y":
            self.state.approved = True
            self.state.human_feedback = "Approved as-is"
        elif feedback == "e":
            edit = input("Enter your edits: ")
            self.state.draft = edit
            self.state.approved = True
            self.state.human_feedback = "Edited by human"
        else:
            self.state.human_feedback = input("Feedback for revision: ")
            self.state.approved = False
            self.state.revision_count += 1

        return self.state.approved

    @router(human_review_step)
    def route_after_review(self, approved):
        """Route based on human decision."""
        if approved:
            return "publish"
        elif self.state.revision_count >= 3:
            return "escalate"
        else:
            return "revise"

    @listen("publish")
    def publish_content(self):
        """Publish approved content."""
        print(f"\nPublished: {self.state.draft[:50]}...")
        return self.state.draft

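    @listen("revise")
    def revise_draft(self):
        """Revise based on human feedback."""
        # Note: no step listens to this one, so the revised draft is returned
        # without a second review, and the "escalate" branch cannot be
        # reached within a single run of this example.
        print(f"\nRevising based on: {self.state.human_feedback}")
        self.state.draft = f"Revised: {self.state.draft} [Addressed: {self.state.human_feedback}]"
        return self.state.draft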

    @listen("escalate")
    def escalate_to_manager(self):
        """Escalate after too many revisions."""
        print("\nEscalated to manager after 3 revision attempts.")
        return "ESCALATED"

# Run the flow
flow = ContentReviewFlow()
result = flow.kickoff()
print(f"\nFinal result: {result}")
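In the flow above, a rejected draft is revised once and never reviewed again. The intended cycle (review, revise, review again, escalate after three rejections) can be sketched in plain Python, independent of CrewAI. `review_loop` and its `reviewer` and `revise` callables are hypothetical names for this illustration; in a real flow they would be the human review step and an agent call.

```python
from typing import Callable, Optional, Tuple

# A reviewer takes the draft and returns (approved, feedback).
Reviewer = Callable[[str], Tuple[bool, Optional[str]]]
# A reviser takes (draft, feedback) and returns the new draft.
Reviser = Callable[[str, str], str]


def review_loop(
    draft: str,
    reviewer: Reviewer,
    revise: Reviser,
    max_rejections: int = 3,
) -> Tuple[str, str]:
    """Return (status, final_draft); status is 'published' or 'escalated'."""
    rejections = 0
    while True:
        approved, feedback = reviewer(draft)
        if approved:
            return "published", draft
        rejections += 1
        if rejections >= max_rejections:
            # Same rule as the router above: escalate on the third rejection.
            return "escalated", draft
        draft = revise(draft, feedback or "")


if __name__ == "__main__":
    def console_reviewer(text: str) -> Tuple[bool, Optional[str]]:
        print(f"\nDraft:\n{text}\n")
        if input("Approve (y/n)? ").strip().lower() == "y":
            return True, None
        return False, input("Feedback for revision: ")

    status, final = review_loop(
        "AI-generated article...",
        console_reviewer,
        lambda d, f: f"{d} [Addressed: {f}]",
    )
    print(status, "->", final)
```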

3.2 Routing Based on Human Decisions

Flows can create complex decision trees where human input determines which branch of execution to follow:

from crewai.flow.flow import Flow, start, listen, router
from pydantic import BaseModel
from typing import Literal

class ApprovalState(BaseModel):
    proposal: str = ""
    budget: float = 0.0
    risk_level: Literal["low", "medium", "high"] = "low"
    approval_status: str = "pending"

class BudgetApprovalFlow(Flow[ApprovalState]):

    @start()
    def analyze_proposal(self):
        """AI analyzes the budget proposal."""
        self.state.proposal = "Cloud infrastructure upgrade: $50,000"
        self.state.budget = 50000.0
        self.state.risk_level = "medium"
        print(f"Proposal: {self.state.proposal}")
        print(f"Risk: {self.state.risk_level}")
        return self.state.risk_level

    @router(analyze_proposal)
    def route_by_risk(self, risk_level):
        """Route to appropriate approval level."""
        if risk_level == "low":
            return "auto_approve"
        elif risk_level == "medium":
            return "manager_review"
        else:
            return "executive_review"

    @listen("auto_approve")
    def auto_approve(self):
        """Low-risk items auto-approved."""
        self.state.approval_status = "approved"
        print("Auto-approved (low risk)")
        return self.state.approval_status

    @listen("manager_review")
    def manager_review(self):
        """Medium-risk items need manager approval."""
        print(f"\nManager Review: {self.state.proposal} (${self.state.budget:,.0f})")
        decision = input("Manager: Approve (y/n)? ").strip().lower()
        self.state.approval_status = "approved" if decision == "y" else "rejected"
        return self.state.approval_status

    @listen("executive_review")
    def executive_review(self):
        """High-risk items need executive approval."""
        print(f"\nExecutive Review: {self.state.proposal} (${self.state.budget:,.0f})")
        decision = input("Executive: Approve (y/n)? ").strip().lower()
        self.state.approval_status = "approved" if decision == "y" else "rejected"
        return self.state.approval_status

flow = BudgetApprovalFlow()
result = flow.kickoff()
print(f"\nFinal status: {result}")
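
Production Pattern: In real deployments, replace input() calls with webhook endpoints, Slack integrations, or dedicated approval UIs. Flow state lives in memory for the duration of a run, so waiting hours or days for an asynchronous response requires persisting that state (CrewAI offers a @persist decorator for this) and resuming the flow once the decision arrives.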
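One way to decouple the reviewer from the running process is to record each request in a store that a webhook handler or approval UI can update later. The sketch below keeps the store in memory for brevity. `ApprovalInbox` and `ApprovalRequest` are hypothetical names, not CrewAI APIs, and a real deployment would back them with a database or queue.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ApprovalRequest:
    request_id: str
    summary: str
    status: str = "pending"  # pending | approved | rejected
    feedback: Optional[str] = None


class ApprovalInbox:
    """In-memory stand-in for a database table or message queue."""

    def __init__(self) -> None:
        self._requests: Dict[str, ApprovalRequest] = {}

    def submit(self, summary: str) -> str:
        """Called by the workflow; returns an id to hand to the reviewer."""
        request_id = uuid.uuid4().hex
        self._requests[request_id] = ApprovalRequest(request_id, summary)
        return request_id

    def decide(
        self, request_id: str, approved: bool, feedback: Optional[str] = None
    ) -> ApprovalRequest:
        """Called by the webhook or UI handler when the human responds."""
        request = self._requests[request_id]
        if request.status != "pending":
            raise ValueError(f"request {request_id} is already {request.status}")
        request.status = "approved" if approved else "rejected"
        request.feedback = feedback
        return request

    def get(self, request_id: str) -> ApprovalRequest:
        """Called by the workflow to check whether a decision exists yet."""
        return self._requests[request_id]


if __name__ == "__main__":
    inbox = ApprovalInbox()
    rid = inbox.submit("Cloud infrastructure upgrade: $50,000")
    inbox.decide(rid, approved=True, feedback="Within budget")
    print(inbox.get(rid))
```

Rejecting a second decision on the same request guards against double-clicks and replayed webhooks.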
Real-World Application

Regulated Content Publishing

A pharmaceutical company uses HITL for their medical content crew: AI drafts patient-facing content, a human medical reviewer approves claims, and the agent revises based on feedback. The HITL checkpoint is mandatory for regulatory compliance (HIPAA/FDA). Result: content production 5x faster while maintaining 100% compliance with regulatory review requirements.


4. Multimodal Agents

4.1 Using Multimodal Models

CrewAI agents can use vision-capable models such as GPT-4o or Gemini to understand images, documents, and other visual content. Configure the agent with a vision-capable model, set multimodal=True so that it can load the image instead of merely reading the URL as text, and pass image references through the task description:

from crewai import Agent, Task, Crew, Process

# Multimodal agent with vision capabilities
vision_agent = Agent(
    role="Visual Analyst",
    goal="Analyze images and extract meaningful insights",
    backstory="""You are an expert at understanding visual content
    including charts, diagrams, photographs, and documents.
    You provide detailed, accurate descriptions and analysis.""",
    llm="gpt-4o",  # Supports vision/multimodal
    multimodal=True,  # Enables CrewAI's image handling for this agent
    verbose=True
)

# Task with image analysis
image_analysis_task = Task(
    description="""Analyze the architectural diagram at this URL:
    {image_url}

    Provide:
    1. Component identification
    2. Data flow description
    3. Potential bottlenecks
    4. Improvement recommendations""",
    expected_output="Detailed architectural analysis with recommendations.",
    agent=vision_agent
)

crew = Crew(
    agents=[vision_agent],
    tasks=[image_analysis_task],
    process=Process.sequential
)

result = crew.kickoff(inputs={
    "image_url": "https://example.com/system-architecture.png"
})
print(result.raw)
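The example above assumes the image is hosted at a public URL. For local files, a common approach with vision models is to embed the image as a base64 data URL. The helper below is a standard-library sketch: `image_to_data_url` is a hypothetical name, and whether a given CrewAI version or model provider accepts data URLs in place of hosted URLs is an assumption to verify before relying on it.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Encode a local image file as a data URL (data:<mime>;base64,<payload>)."""
    file_path = Path(path)
    # Guess the MIME type from the file extension only; contents are not inspected.
    mime_type, _ = mimetypes.guess_type(file_path.name)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"not a recognised image type: {file_path.name}")
    encoded = base64.b64encode(file_path.read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 2:
        url = image_to_data_url(sys.argv[1])
        print(url[:60] + "...")
```

Base64 inflates the payload by roughly a third, so large images are usually better served from a URL.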

4.2 Processing Visual Inputs in Workflows

Combine multimodal understanding with traditional agents for comprehensive analysis pipelines:

from crewai import Agent, Task, Crew, Process

# Agent 1: Extracts information from visual content
chart_reader = Agent(
    role="Chart Analyst",
    goal="Extract numerical data and trends from charts and graphs",
    backstory="You excel at reading financial charts and extracting precise data points.",
    llm="gpt-4o",
    multimodal=True,  # Needed so the agent can load the chart image itself
    verbose=True
)

# Agent 2: Analyzes extracted data
data_interpreter = Agent(
    role="Data Interpreter",
    goal="Interpret data trends and provide business insights",
    backstory="Senior business analyst who transforms data into actionable insights.",
    llm="gpt-4o",
    verbose=True
)

# Step 1: Read the chart
extract_task = Task(
    description="""Examine the quarterly revenue chart at: {chart_url}
    Extract all data points, labels, and trend indicators.""",
    expected_output="Structured data with all values from the chart.",
    agent=chart_reader
)

# Step 2: Interpret the data
interpret_task = Task(
    description="Analyze the extracted chart data and provide business insights.",
    expected_output="Executive summary with trends, anomalies, and recommendations.",
    agent=data_interpreter,
    context=[extract_task]
)

crew = Crew(
    agents=[chart_reader, data_interpreter],
    tasks=[extract_task, interpret_task],
    process=Process.sequential
)

result = crew.kickoff(inputs={"chart_url": "https://example.com/revenue-q1.png"})
print(result.raw)

5. Image Generation with DALL-E

5.1 DallETool Configuration

The crewai_tools package provides a DallETool that wraps OpenAI’s DALL-E image generation API, so it needs an OpenAI API key in the environment. Agents can generate images as part of their task workflows:

from crewai import Agent, Task, Crew, Process
from crewai_tools import DallETool

# Configure the DALL-E tool
dalle_tool = DallETool(
    model="dall-e-3",
    size="1024x1024",
    quality="hd",
    n=1
)

# Creative agent with image generation capabilities
creative_agent = Agent(
    role="Creative Director",
    goal="Design compelling visual content for marketing campaigns",
    backstory="""You are an award-winning creative director who
    combines strategic thinking with visual artistry. You create
    images that tell stories and drive engagement.""",
    llm="gpt-4o",
    tools=[dalle_tool],
    verbose=True
)

# Task: Generate marketing visuals
visual_task = Task(
    description="""Create a hero image for a blog post about {topic}.
    The image should be:
    - Professional and modern
    - Suitable for a tech company blog
    - Visually striking with good composition
    Generate the image using the DALL-E tool.""",
    expected_output="Generated image URL with description of the creative choices made.",
    agent=creative_agent
)

crew = Crew(
    agents=[creative_agent],
    tasks=[visual_task],
    process=Process.sequential
)

result = crew.kickoff(inputs={"topic": "AI-powered code review"})
print(result.raw)

For multi-step creative workflows, combine image generation with content creation:

from crewai import Agent, Task, Crew, Process
from crewai_tools import DallETool

dalle_tool = DallETool(model="dall-e-3", size="1792x1024", quality="hd")

# Writer creates the concept
writer = Agent(
    role="Content Writer",
    goal="Write engaging blog posts with creative visual concepts",
    backstory="You craft compelling narratives and describe vivid imagery.",
    llm="gpt-4o"
)

# Designer brings the concept to life
designer = Agent(
    role="Visual Designer",
    goal="Generate images that perfectly match content themes",
    backstory="You translate written concepts into stunning visuals.",
    llm="gpt-4o",
    tools=[dalle_tool]
)

# Step 1: Write content with image descriptions
write_task = Task(
    description="Write a short blog post about {topic}. Include a detailed description of an ideal hero image.",
    expected_output="Blog post draft with detailed image description.",
    agent=writer
)

# Step 2: Generate matching imagery
design_task = Task(
    description="Based on the blog post, generate a hero image using DALL-E that matches the described visual concept.",
    expected_output="Image URL with explanation of design choices.",
    agent=designer,
    context=[write_task]
)

crew = Crew(
    agents=[writer, designer],
    tasks=[write_task, design_task],
    process=Process.sequential
)

result = crew.kickoff(inputs={"topic": "sustainable technology"})
print(result.raw)

Cost Tip: At the time of writing, DALL-E 3 HD images cost $0.080 each (1024×1024) or $0.120 (1792×1024). For iterative workflows, start with standard quality during development and switch to HD for final outputs.
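To see how these prices add up over an iterative workflow, the arithmetic can be wrapped in a small helper. This sketch uses only the two HD prices quoted in the cost tip; `estimate_hd_cost_usd` is a hypothetical name, and the price table should be updated if OpenAI's pricing changes.

```python
from typing import Dict

# HD prices in US cents per image, as quoted in the cost tip.
HD_PRICE_CENTS: Dict[str, int] = {
    "1024x1024": 8,
    "1792x1024": 12,
}


def estimate_hd_cost_usd(image_counts: Dict[str, int]) -> float:
    """Sum the HD cost in dollars for a mapping of size -> number of images."""
    total_cents = 0
    for size, count in image_counts.items():
        if size not in HD_PRICE_CENTS:
            raise KeyError(f"no price listed for size {size!r}")
        # Work in integer cents to avoid floating-point drift while summing.
        total_cents += HD_PRICE_CENTS[size] * count
    return total_cents / 100


if __name__ == "__main__":
    # Ten square drafts plus five wide hero images.
    cost = estimate_hd_cost_usd({"1024x1024": 10, "1792x1024": 5})
    print(f"Estimated cost: ${cost:.2f}")
```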

Try It Yourself: Build a ‘content approval’ crew with HITL: (1) a Writer agent drafts content, (2) the task has human_input=True so you review the draft, (3) based on your feedback, the Writer revises, (4) a Publisher agent formats the final version. Test the full loop with 2 revision cycles. Then add a multimodal agent that can describe uploaded images for alt-text generation.

Next in the CrewAI SDK Track

In Part 12: Advanced Patterns & Hooks, we’ll customize agent prompts, implement fingerprinting for reproducibility, use event listeners for real-time monitoring, enable checkpointing for fault tolerance, and leverage execution hooks with annotations.