
Part 36: Agile Testing — Secrets & In-Sprint Automation

May 13, 2026 Wasil Zafar 38 min read

Testing in agile is not a phase — it is woven into every sprint. Master the testing quadrants, BDD and ATDD, exploratory testing sessions, in-sprint automation patterns, and how to make quality a whole-team responsibility rather than a QA bottleneck.

Table of Contents

  1. Introduction
  2. The Agile Testing Quadrants
  3. In-Sprint Testing
  4. Behavior-Driven Development
  5. Acceptance Test-Driven Development
  6. Exploratory Testing
  7. Testing & the Sprint Cycle
  8. Test-First Acceptance
  9. Whole-Team Quality
  10. Common Agile Testing Mistakes
  11. Exercises
  12. Conclusion & Next Steps

Introduction — Testing in Agile Is Fundamentally Different

In waterfall, testing happens after development is "complete." There is a test phase, a test team, and a handoff. In agile, there is no test phase. There is no handoff. Testing is a continuous activity performed by the whole team, within the sprint, alongside development.

This fundamental difference means that agile testing is not just "doing the same testing faster." It requires different techniques, different roles, different thinking. The goal shifts from "finding defects" to "preventing defects" — and from "proving it works" to "building confidence continuously."

Key Insight: In agile, testing is not a safety net at the end — it is the structure of how you build. If testing is not part of your sprint from day one, you are doing waterfall with shorter cycles, not agile.

The Mindset Shift

| Aspect          | Waterfall Testing                              | Agile Testing                        |
| When            | After development phase                        | Continuously within each sprint      |
| Who             | Dedicated QA team                              | Whole team (devs + testers + PO)     |
| Goal            | Find defects before release                    | Prevent defects; build confidence    |
| Automation      | Separate automation phase after manual testing | Automation written alongside code    |
| Feedback loop   | Weeks to months                                | Minutes to hours                     |
| Documentation   | Test plans, test cases, RTM                    | Living documentation (BDD scenarios) |
| Defect handling | Defect backlog, triaging meetings              | Same-day fix, zero-bug sprints       |

The Agile Testing Quadrants

Brian Marick's Testing Quadrants (popularised by Lisa Crispin and Janet Gregory in Agile Testing) provide a framework for understanding what kinds of testing serve what purposes. The quadrants map tests along two axes: business-facing vs technology-facing and guiding development vs critiquing the product.

Agile Testing Quadrants (Brian Marick)
quadrantChart
    title Agile Testing Quadrants
    x-axis "Technology-Facing" --> "Business-Facing"
    y-axis "Critique Product" --> "Guide Development"
    quadrant-1 "Q2: Business-Facing, Guide Dev"
    quadrant-2 "Q1: Technology-Facing, Guide Dev"
    quadrant-3 "Q4: Technology-Facing, Critique"
    quadrant-4 "Q3: Business-Facing, Critique"
    "Unit Tests": [0.2, 0.8]
    "Component Tests": [0.3, 0.7]
    "Functional Tests": [0.7, 0.8]
    "Story Tests (BDD)": [0.8, 0.75]
    "Prototypes": [0.85, 0.65]
    "Exploratory Testing": [0.8, 0.3]
    "Usability Testing": [0.75, 0.2]
    "UAT": [0.85, 0.35]
    "Performance Tests": [0.25, 0.3]
    "Security Tests": [0.2, 0.2]
    "Load Tests": [0.3, 0.25]
                            

What Fits in Each Quadrant

| Quadrant                | Purpose                                           | Activities                                                   | Automated?               | Who             |
| Q1 (Tech, Guide)        | Support the team: verify code works as designed   | Unit tests, component tests, integration tests               | Fully automated          | Developers      |
| Q2 (Business, Guide)    | Validate we are building the right thing          | BDD scenarios, story tests, prototypes, simulations          | Automated where possible | Team + PO       |
| Q3 (Business, Critique) | Evaluate the product from user perspective        | Exploratory testing, usability testing, UAT, beta testing    | Manual (human judgment)  | Testers + Users |
| Q4 (Tech, Critique)     | Critique non-functional properties                | Performance tests, security scans, load tests, stress tests  | Automated with tools     | Specialists     |
Key Insight: A healthy agile team works across all four quadrants every sprint. Teams that only do Q1 (unit tests) have no business confidence. Teams that only do Q3 (manual exploratory) cannot keep up with sprint velocity. Balance is everything.
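
The two axes can be treated as a small data model, which makes a quadrant audit easy to script. The following is a minimal sketch rather than part of Marick's or Crispin and Gregory's material: the `Activity` class and the `neglected_quadrants` helper are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Facing(Enum):
    TECHNOLOGY = "technology-facing"
    BUSINESS = "business-facing"


class Purpose(Enum):
    GUIDE = "guide development"
    CRITIQUE = "critique product"


# Quadrant numbering as used in the table above.
QUADRANTS = {
    (Facing.TECHNOLOGY, Purpose.GUIDE): "Q1",
    (Facing.BUSINESS, Purpose.GUIDE): "Q2",
    (Facing.BUSINESS, Purpose.CRITIQUE): "Q3",
    (Facing.TECHNOLOGY, Purpose.CRITIQUE): "Q4",
}


@dataclass(frozen=True)
class Activity:
    """A testing activity placed on the two quadrant axes (hypothetical model)."""
    name: str
    facing: Facing
    purpose: Purpose

    @property
    def quadrant(self) -> str:
        return QUADRANTS[(self.facing, self.purpose)]


def neglected_quadrants(activities):
    """Return the quadrants with no activity at all, in Q1..Q4 order."""
    covered = {activity.quadrant for activity in activities}
    return [q for q in ("Q1", "Q2", "Q3", "Q4") if q not in covered]
```

A team that lists only unit tests and exploratory testing would get `["Q2", "Q4"]` back, which points at missing story-level tests and missing non-functional checks.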

In-Sprint Testing — Same-Day Quality

In-sprint testing means that every user story is developed, tested, and verified within the same sprint — ideally within the same day or two. There is no "we'll test it next sprint" and no separate QA sprint. The Definition of Done includes testing.

The Same-Day Defect Resolution Goal

When a defect is found during the sprint (not weeks later), the cost of fixing it is minimal. The developer still has context. The code is fresh. The fix is a 30-minute task, not a 2-day archaeological dig through code you wrote months ago.

  • Day 1: Developer picks up story, writes code + unit tests
  • Day 1-2: Tester pairs with developer to write acceptance tests
  • Day 2: Story is "code complete" — tests passing, ready for review
  • Day 2-3: Exploratory testing finds edge case
  • Day 3: Developer fixes edge case same day, story is Done

Definition of Done — Testing Criteria

# Definition of Done — includes testing requirements
definition_of_done:
  code:
    - Code peer-reviewed and approved
    - No compiler warnings or linter errors
    - Feature flag configured (if applicable)

  testing:
    - Unit tests written and passing (>80% branch coverage for new code)
    - Integration tests passing
    - Acceptance criteria verified (BDD scenarios green)
    - Exploratory testing session completed (30 min minimum)
    - No open defects of severity Critical or High
    - Performance baseline not degraded (p99 latency)

  automation:
    - New automated tests added to CI pipeline
    - No increase in flaky test count
    - Test data factories updated (if new domain objects)

  documentation:
    - BDD scenarios serve as living documentation
    - API changes documented in OpenAPI spec
    - Release notes updated
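
Some of the testing criteria above are measurable, so a team could check them automatically before a story is moved to Done. The sketch below is one possible shape for such a check, not an established tool: `StoryStatus` and its field names are hypothetical, and only the criteria with a clear numeric or boolean signal are covered.

```python
from dataclasses import dataclass


@dataclass
class StoryStatus:
    """Hypothetical snapshot of a story's quality signals."""
    branch_coverage_new_code: float   # percent, 0-100
    bdd_scenarios_green: bool
    exploratory_minutes: int
    open_critical_or_high_defects: int
    flaky_tests_before: int
    flaky_tests_after: int


def unmet_criteria(story: StoryStatus) -> list[str]:
    """Return the Definition of Done testing criteria the story still fails."""
    unmet = []
    if story.branch_coverage_new_code <= 80:
        unmet.append("branch coverage for new code must exceed 80%")
    if not story.bdd_scenarios_green:
        unmet.append("BDD scenarios must be green")
    if story.exploratory_minutes < 30:
        unmet.append("exploratory session must last at least 30 minutes")
    if story.open_critical_or_high_defects > 0:
        unmet.append("no open Critical or High defects allowed")
    if story.flaky_tests_after > story.flaky_tests_before:
        unmet.append("flaky test count must not increase")
    return unmet
```

An empty list means the measurable part of the checklist is satisfied. Items such as peer review or release notes still need a human decision.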

Behavior-Driven Development (BDD)

BDD bridges the gap between business requirements and automated tests by expressing behaviour in a structured natural language that both humans and machines can read. The Given-When-Then format is a shared language between Product Owners, Developers, and Testers.

Given-When-Then (Gherkin Language)

# features/shopping_cart.feature
Feature: Shopping Cart
  As a customer
  I want to manage items in my shopping cart
  So that I can purchase products I need

  Background:
    Given the product catalog contains:
      | name       | price  | stock |
      | Widget A   | 19.99  | 100   |
      | Gadget B   | 49.99  | 50    |
      | Doohickey  | 9.99   | 200   |

  Scenario: Add item to empty cart
    Given my cart is empty
    When I add "Widget A" to my cart
    Then my cart should contain 1 item
    And the cart total should be $19.99

  Scenario: Add multiple quantities
    Given my cart is empty
    When I add 3 of "Widget A" to my cart
    Then my cart should contain 3 items
    And the cart total should be $59.97

  Scenario: Remove item from cart
    Given my cart contains 2 of "Gadget B"
    When I remove "Gadget B" from my cart
    Then my cart should be empty
    And the cart total should be $0.00

  Scenario: Cannot add out-of-stock item
    Given "Widget A" has 0 in stock
    When I try to add "Widget A" to my cart
    Then I should see an error "Widget A is out of stock"
    And my cart should be empty

  Scenario Outline: Discount tiers
    Given my cart total is <total>
    When the discount is calculated
    Then the discount should be <discount>%

    Examples:
      | total   | discount |
      | $49.99  | 0        |
      | $100.00 | 5        |
      | $200.00 | 10       |
      | $500.00 | 15       |
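
The Examples table in the "Discount tiers" outline reads naturally as a lookup. As a hedged sketch of what the code under test might look like: the function name `discount_percent` is hypothetical, and the tier boundaries ($100, $200, $500, inclusive) are an assumption inferred from the four example rows, since the feature file does not state where each tier starts.

```python
from decimal import Decimal

# (minimum cart total, discount percent), highest tier first.
# ASSUMPTION: boundaries inferred from the Examples table; the feature
# file only gives four sample points, not the exact thresholds.
DISCOUNT_TIERS = [
    (Decimal("500.00"), 15),
    (Decimal("200.00"), 10),
    (Decimal("100.00"), 5),
]


def discount_percent(cart_total: Decimal) -> int:
    """Return the discount percentage for a cart total."""
    for minimum, percent in DISCOUNT_TIERS:
        if cart_total >= minimum:
            return percent
    return 0
```

Values just below a boundary, such as $99.99 or $199.99, would make good extra rows in the Examples table, because they force the team to agree on the thresholds.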

Implementation with pytest-bdd

import pytest
from pytest_bdd import scenarios, given, when, then, parsers
from shopping_cart import Cart, Product, ProductCatalog

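# Load all scenarios from the feature file.
# Every scenario collected here needs matching step definitions; the steps
# for the remove, out-of-stock, and discount scenarios are not shown below.
scenarios('features/shopping_cart.feature')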

@pytest.fixture
def catalog():
    return ProductCatalog()

@pytest.fixture
def cart():
    return Cart()

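@given("the product catalog contains:", target_fixture="catalog")
def catalog_with_products(catalog, datatable):
    # pytest-bdd (8.0+) passes the data table as a list of rows with the
    # header row first, so build a dict for each data row.
    header, *rows = datatable
    for values in rows:
        row = dict(zip(header, values))
        catalog.add(Product(
            name=row["name"],
            price=float(row["price"]),
            stock=int(row["stock"]),
        ))
    return catalog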

@given("my cart is empty", target_fixture="cart")
def empty_cart():
    return Cart()

@given(parsers.parse('my cart contains {quantity:d} of "{product_name}"'))
def cart_with_items(cart, catalog, quantity, product_name):
    product = catalog.get(product_name)
    cart.add(product, quantity)

@when(parsers.parse('I add "{product_name}" to my cart'))
def add_to_cart(cart, catalog, product_name):
    product = catalog.get(product_name)
    cart.add(product, 1)

@when(parsers.parse('I add {quantity:d} of "{product_name}" to my cart'))
def add_quantity_to_cart(cart, catalog, quantity, product_name):
    product = catalog.get(product_name)
    cart.add(product, quantity)

@then(parsers.parse("my cart should contain {count:d} item"))
@then(parsers.parse("my cart should contain {count:d} items"))
def cart_item_count(cart, count):
    assert cart.item_count == count

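@then(parsers.parse("the cart total should be ${total:f}"))
def cart_total(cart, total):
    # Absolute tolerance of half a cent; a 1% relative tolerance would
    # hide errors of several cents on larger totals.
    assert cart.total == pytest.approx(total, abs=0.005)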
Key Insight: BDD scenarios are not test scripts — they are living documentation. When you run your BDD suite, you are simultaneously testing the system AND generating up-to-date documentation of its behaviour. This eliminates the "docs are outdated" problem because the docs are the tests.
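
The step definitions import `Cart`, `Product`, and `ProductCatalog` from a `shopping_cart` module that is not shown. The following is a minimal sketch of what that module could look like, assuming only the calls the steps make (`catalog.add`, `catalog.get`, `cart.add`, `cart.item_count`, `cart.total`). The `remove` method and `OutOfStockError` are hypothetical additions suggested by the scenarios, not something the step code confirms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    name: str
    price: float   # kept as float to match the step definitions
    stock: int


class OutOfStockError(Exception):
    """Raised when more units are requested than are in stock (hypothetical)."""


class ProductCatalog:
    def __init__(self):
        self._products = {}

    def add(self, product):
        self._products[product.name] = product

    def get(self, name):
        return self._products[name]


class Cart:
    def __init__(self):
        self._quantities = {}   # Product -> quantity

    def add(self, product, quantity=1):
        if quantity > product.stock:
            raise OutOfStockError(f"{product.name} is out of stock")
        self._quantities[product] = self._quantities.get(product, 0) + quantity

    def remove(self, product):
        # Removes every unit of the product, as in the "Remove item" scenario.
        self._quantities.pop(product, None)

    @property
    def item_count(self):
        return sum(self._quantities.values())

    @property
    def total(self):
        return sum(p.price * q for p, q in self._quantities.items())
```

Prices are floats here only because the steps call `float(row["price"])`. Production code that handles money would more likely use `decimal.Decimal` throughout.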

Acceptance Test-Driven Development (ATDD)

ATDD takes BDD one step further: acceptance tests are written before development begins, as part of sprint planning. The "Three Amigos" — Product Owner, Developer, and Tester — collaborate to define acceptance criteria as executable specifications.

The Three Amigos Conversation

Three Amigos ATDD Workflow
sequenceDiagram
    participant PO as Product Owner
    participant Dev as Developer
    participant QA as Tester

    PO->>Dev: Here's the user story
    PO->>QA: Here's the user story
    Note over PO,QA: Three Amigos Session (30 min)
    QA->>PO: What about edge case X?
    PO->>QA: Good catch — here's the expected behaviour
    Dev->>PO: Is this technically feasible in sprint?
    PO->>Dev: Yes, simplified version is fine
    QA->>Dev: I'll write acceptance scenarios
    Dev->>QA: I'll make them pass
    Note over Dev,QA: Development + Testing in parallel
    QA-->>Dev: Scenarios are green ✓
    Dev-->>PO: Story is Done ✓
                            

TDD vs BDD vs ATDD — When to Use Each

| Practice | Scope                   | Written By              | Language                   | Primary Benefit                      |
| TDD      | Method/function level   | Developer alone         | Code (test framework)      | Better code design, regression safety |
| BDD      | Feature/behaviour level | Team collaboration      | Gherkin (natural language) | Shared understanding, living docs    |
| ATDD     | Story/acceptance level  | Three Amigos before dev | Executable specifications  | Right thing built, no rework         |

Rule of thumb: Use TDD for internal code quality. Use BDD for features that cross team boundaries. Use ATDD for stories where requirements are ambiguous or have high business impact.

Exploratory Testing — Human Intelligence

Exploratory testing is simultaneous learning, test design, and test execution. Unlike scripted testing (where you follow predefined steps), exploratory testing uses human creativity, domain knowledge, and intuition to find bugs that automation cannot.

That definition comes from James Bach. Exploratory testing is not ad hoc or unstructured: in practice it is session-based, with clear charters, time-boxes, and debriefs.

Session-Based Test Management (SBTM)

# Exploratory Testing Charter
session:
  id: ET-2026-05-034
  tester: "Jane Developer"
  date: "2026-05-13"
  duration: 45 minutes

charter: |
  Explore the checkout flow with international addresses
  to discover edge cases in address validation
  using various country formats (UK, Germany, Japan, Brazil)

areas:
  - Checkout address form
  - Address validation API
  - Shipping cost calculation
  - Order confirmation display

notes: |
  - UK postcodes with spaces (SW1A 1AA) accepted correctly ✓
  - German addresses with umlauts (Müller Straße) display correctly ✓
  - Japanese addresses: 3-line format not supported — BUG filed
  - Brazilian CEP format (12345-678) rejected by validator — BUG filed
  - Empty state/province field causes 500 error for countries without states — BUG filed

bugs_found: 3
bugs:
  - JIRA-4521: Japanese address format not supported (Medium)
  - JIRA-4522: Brazilian CEP postal code regex too restrictive (Low)
  - JIRA-4523: 500 error when state/province is empty (High)

insights: |
  The address validation relies on a US-centric regex pattern.
  Recommend: Replace with Google Address Validation API or
  country-specific validation libraries.

debrief_notes: |
  Shared findings in standup. High-severity bug fixed same day.
  Medium bugs added to next sprint backlog.
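
Session records like the charter above become more useful when they can be summarised across sessions. The sketch below shows one way to do that in code. The `Session` and `Bug` classes are hypothetical and mirror only a few fields of the charter. Bugs per hour is used here as an illustrative metric, not as something SBTM prescribes.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Bug:
    key: str
    severity: str


@dataclass
class Session:
    """A time-boxed exploratory session (hypothetical record)."""
    charter: str
    duration_minutes: int
    bugs: list[Bug] = field(default_factory=list)

    def bugs_by_severity(self) -> Counter:
        """Count the session's bugs per severity label."""
        return Counter(bug.severity for bug in self.bugs)

    def bugs_per_hour(self) -> float:
        """Normalise the bug count to a one-hour session."""
        return len(self.bugs) / (self.duration_minutes / 60)


session = Session(
    charter="Explore the checkout flow with international addresses",
    duration_minutes=45,
    bugs=[
        Bug("JIRA-4521", "Medium"),
        Bug("JIRA-4522", "Low"),
        Bug("JIRA-4523", "High"),
    ],
)
```

For the 45-minute session above, three bugs work out to four bugs per hour. A rate like this is best read alongside the severity breakdown and the debrief notes rather than on its own.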
Research

Exploratory Testing Effectiveness (Itkonen et al., 2012)

A study at the University of Helsinki compared exploratory testing with scripted test case execution across four software projects. Key findings: exploratory testing found 48% more defects per hour than scripted testing, and the defects found were of higher severity (more likely to affect users). However, scripted tests provided better coverage of specified requirements. The researchers concluded that the approaches are complementary, not competing — exploratory testing excels at finding unexpected issues that scripted tests cannot anticipate, while scripted/automated tests provide regression safety. The optimal strategy uses both: automated tests for the known, exploratory testing for the unknown.


Testing & the Sprint Cycle

Testing is not a separate activity that happens after coding in a sprint. It is integrated into every sprint ceremony:

Testing Activities in the Sprint Cycle
flowchart TD
    A[Sprint Planning] --> B[Daily Development]
    B --> C[Daily Standup]
    C --> B
    B --> D[Sprint Review]
    D --> E[Retrospective]
    E --> A

    A -.- A1[Testability discussions]
    A -.- A2[Write acceptance criteria]
    A -.- A3[Identify test data needs]
    B -.- B1[TDD / Write tests alongside code]
    B -.- B2[Pair testing sessions]
    B -.- B3[Exploratory testing]
    C -.- C1[Test progress updates]
    C -.- C2[Blocker escalation]
    D -.- D1[Demo with confidence]
    D -.- D2[No surprises - all tested]
    E -.- E1[Testing process improvements]
    E -.- E2[Flakiness review]
                            

Testing in Each Ceremony

  • Sprint Planning: Discuss testability of stories. Estimate testing effort. Identify risky areas needing exploratory sessions. Define acceptance criteria as team.
  • Daily Standup: Report test progress (not just code progress). Raise blockers — "I can't test story X because test data isn't ready." Coordinate pairing sessions.
  • Sprint Review: Demonstrate features with confidence because they are already tested. No "we still need to verify this" disclaimers. Show BDD scenario results as evidence.
  • Retrospective: Review testing process. Were stories held up by testing? Were there defects that should have been caught earlier? Is the automation suite growing sustainably?
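
The flakiness review in the retrospective needs data to work from. As a rough sketch, assuming CI results can be exported as `(test_name, commit_sha, passed)` tuples, a test can be flagged when it both passed and failed on the same commit. That is one common working definition of a flaky test, and the function name `find_flaky_tests` is hypothetical.

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """Return names of tests that both passed and failed on the same commit.

    runs: iterable of (test_name, commit_sha, passed) tuples.
    """
    outcomes = defaultdict(set)
    for test_name, commit_sha, passed in runs:
        outcomes[(test_name, commit_sha)].add(passed)
    flaky = {
        test_name
        for (test_name, _commit), seen in outcomes.items()
        if seen == {True, False}
    }
    return sorted(flaky)
```

A test that passes on one commit and fails on a later one is not flagged, because the code changed between the runs and the failure may be a real regression.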

Test-First Acceptance — Specifications as Tests

The most powerful pattern in agile testing is writing acceptance criteria as executable specifications before development starts. This eliminates ambiguity, prevents rework, and creates living documentation automatically.

# Write BEFORE development — this IS the requirement
Feature: Loyalty Points Calculation
  As a returning customer
  I want to earn loyalty points on purchases
  So that I can redeem them for discounts

  Rule: Earn 1 point per $10 spent (rounded down)

  Scenario: Standard purchase earns points
    Given I am a loyalty member
    When I complete a purchase of $75.50
    Then I should earn 7 loyalty points
    And my points balance should increase by 7

  Rule: Double points on birthday month

  Scenario: Birthday month doubles points
    Given I am a loyalty member
    And today is within my birthday month
    When I complete a purchase of $50.00
    Then I should earn 10 loyalty points

  Rule: Points expire after 12 months of inactivity

  Scenario: Points expire when inactive
    Given I am a loyalty member with 150 points
    And I have not made a purchase in 13 months
    When the nightly expiration job runs
    Then my points balance should be 0
    And I should receive an expiration notification email

When the developer picks up this story, they have zero ambiguity about what to build. The tester can run these scenarios immediately when code is delivered. The Product Owner can read them and confirm they match intent. Everyone speaks the same language.
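
To show how directly the rules translate into code, here is a minimal sketch of an implementation that would satisfy the three scenarios. The function names and signatures are hypothetical. One detail is an assumption: points are treated as expired once 12 full months have passed since the last purchase, because the scenario only pins down the 13-month case.

```python
from datetime import date
from decimal import Decimal


def points_earned(amount: Decimal, purchase_date: date, birthday: date) -> int:
    """1 point per $10 spent (rounded down), doubled in the birthday month."""
    base = int(amount // Decimal("10"))
    if purchase_date.month == birthday.month:
        return base * 2
    return base


def full_months_between(earlier: date, later: date) -> int:
    """Number of complete calendar months from earlier to later."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1
    return months


def balance_after_expiry(balance: int, last_purchase: date, today: date) -> int:
    """Zero the balance after 12 months of inactivity.

    ASSUMPTION: exactly 12 full months already counts as expired.
    """
    if full_months_between(last_purchase, today) >= 12:
        return 0
    return balance
```

The boundary assumption is the kind of question a Three Amigos session should settle, ideally by adding a scenario for exactly 12 months of inactivity.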

Whole-Team Quality

In mature agile teams, quality is everyone's responsibility, not just the tester's job. This does not mean everyone does the same testing — it means everyone contributes to quality in their area of expertise.

| Role                 | Quality Contribution                             | Testing Activities                                     |
| Developer            | Code quality, design quality, test automation    | Unit tests, integration tests, TDD, code review        |
| Tester / QA Engineer | Test strategy, exploratory skills, risk analysis | Exploratory testing, E2E automation, test data strategy |
| Product Owner        | Clear acceptance criteria, priority decisions    | Define acceptance scenarios, UAT, prioritise bug fixes |
| Scrum Master         | Process quality, removing impediments            | Ensure testing is in DoD, facilitate three amigos      |
| DevOps/Platform      | Pipeline reliability, environment quality        | CI/CD quality gates, test environment provisioning     |
Anti-Pattern — "Throw it over the wall": If developers say "it works on my machine, over to QA" and testers say "I found 12 bugs, back to dev," you do not have an agile team. You have a waterfall team pretending to be agile. Quality is a shared outcome, not a handoff.

Common Agile Testing Mistakes

Anti-Patterns

The Seven Deadly Sins of Agile Testing

Based on patterns observed across hundreds of agile teams, these are the most common testing anti-patterns that undermine sprint delivery:

| Mistake               | Symptom                                          | Root Cause                                        | Fix                                                  |
| Mini-Waterfall        | Dev in week 1, testing in week 2 of sprint       | Testing not integrated into Definition of Done    | Pair testing, story-level DoD with tests             |
| QA Sprint             | "Hardening sprint" or "stabilisation sprint"     | Quality debt accumulating sprint over sprint      | Zero-bug policy, same-sprint testing                 |
| Automation Backlog    | Growing list of "tests to automate later"        | Automation not part of story estimation           | Include automation in story points                   |
| Testing Not Estimated | Stories finish "code complete" but not "done"    | Only development effort estimated                 | Estimate dev + test + automation together            |
| Tester Bottleneck     | Stories queue up waiting for the one tester      | Team relies on single person for all testing      | Developers write tests, share testing responsibility |
| No Exploratory Time   | Only scripted/automated tests, subtle bugs escape | 100% focus on automation, no time for thinking    | Budget 20% of testing time for exploration           |
| Requirements as Tests | Hundreds of UI tests mirroring requirements doc  | Confusing "acceptance criteria" with "test the UI" | Test at the right level (API > UI)                   |

The Zero-Bug Sprint Goal

A "zero-bug sprint" does not mean "no bugs are found." It means every bug found during the sprint is fixed during the sprint. No defect backlog. No "we'll fix it later." This forces the team to:

  • Limit work-in-progress (fewer stories = more focus = fewer bugs)
  • Write tests before code (prevents defects instead of finding them)
  • Fix bugs immediately while context is fresh (30-minute fix vs 2-day fix)
  • Improve quality over time (fewer defects per sprint as practices mature)
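
Whether a sprint met the zero-bug goal can be checked from defect dates alone. The following is a small sketch under that assumption. The `Defect` record and the `carried_over` helper are hypothetical and not tied to any particular issue tracker.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Defect:
    key: str
    found_on: date
    fixed_on: Optional[date] = None   # None means still open


def carried_over(defects, sprint_start: date, sprint_end: date):
    """Return keys of defects found in the sprint but not fixed by its end."""
    return [
        d.key
        for d in defects
        if sprint_start <= d.found_on <= sprint_end
        and (d.fixed_on is None or d.fixed_on > sprint_end)
    ]
```

An empty result means the sprint met the goal: bugs were found, but none of them left the sprint unfixed.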

Exercises

Put agile testing concepts into practice with your current team or a sample project.

Exercise 1 — BDD Scenario Writing: Take a feature from your current project (or choose an e-commerce checkout flow). Write 5 BDD scenarios in Gherkin format covering: happy path, error case, edge case, boundary value, and a business rule. Share with a non-technical stakeholder and verify they can understand the expected behaviour without explanation.
Exercise 2 — Exploratory Testing Session: Conduct a 45-minute exploratory testing session using session-based test management. Write a charter, set a timer, explore the system, take notes, and produce a debrief document listing: bugs found, insights gained, areas needing further exploration, and time spent. Compare the defects found with what your automated suite catches.
Exercise 3 — Testing Quadrants Audit: Map your current team's testing activities to the four quadrants. For each quadrant, list: what you do, what you do not do, and what you should do. Identify the biggest gap — which quadrant is most neglected? Create an action plan to address it within the next 2 sprints.
Exercise 4 — Three Amigos Practice: Pick an upcoming user story. Before development begins, conduct a Three Amigos session (30 minutes) with a developer, tester, and product representative. Document: (1) acceptance criteria as executable scenarios, (2) edge cases discovered during discussion, (3) testability concerns raised, and (4) test data requirements identified. Compare the story's outcome (defects found post-development) with stories that did not have a Three Amigos session.

Conclusion & Next Steps

Agile testing is not "do testing faster." It is a fundamental rethinking of when, how, and by whom testing happens. The key practices are: testing quadrants (balance all four), BDD/ATDD (executable specifications before code), exploratory testing (human creativity finds what automation cannot), and whole-team quality (no handoffs, shared responsibility).

The most impactful change a team can make is adopting the zero-bug sprint policy: every defect found in the sprint is fixed in the sprint. This single practice forces all the other good habits — smaller stories, test-first development, immediate feedback, and continuous quality improvement.

Next in the Series

In Part 37: AI in Software Development, we will explore how artificial intelligence is transforming every stage of the software delivery lifecycle — from AI-assisted coding and automated test generation to intelligent deployment decisions and self-healing systems.