Detection & Localization Geometry

Prerequisites: Part 2: Set Theory (intersection, union), Part 8: Calculus (gradients, optimization). This note is the canonical math reference for detection content in PyTorch Mastery: YOLO and TensorFlow Mastery: YOLOv8.

Intersection over Union (IoU)

Given two axis-aligned bounding boxes $B^p = (x_1^p, y_1^p, x_2^p, y_2^p)$ (predicted) and $B^{gt} = (x_1^{gt}, y_1^{gt}, x_2^{gt}, y_2^{gt})$ (ground truth), their Intersection over Union is:

$$\boxed{\text{IoU}(B^p, B^{gt}) = \frac{|B^p \cap B^{gt}|}{|B^p \cup B^{gt}|} = \frac{|B^p \cap B^{gt}|}{|B^p| + |B^{gt}| - |B^p \cap B^{gt}|}}$$

Geometric computation: The intersection rectangle has corners at:

$$x_1^I = \max(x_1^p, x_1^{gt}), \quad y_1^I = \max(y_1^p, y_1^{gt}), \quad x_2^I = \min(x_2^p, x_2^{gt}), \quad y_2^I = \min(y_2^p, y_2^{gt})$$

Intersection area: $|B^p \cap B^{gt}| = \max(x_2^I - x_1^I, 0) \cdot \max(y_2^I - y_1^I, 0)$
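
As a quick worked example (the two boxes are chosen here purely for illustration), take $B^p = (1, 1, 4, 4)$ and $B^{gt} = (2, 2, 5, 5)$, both $3 \times 3$:

$$x_1^I = \max(1, 2) = 2, \quad y_1^I = \max(1, 2) = 2, \quad x_2^I = \min(4, 5) = 4, \quad y_2^I = \min(4, 5) = 4$$

$$|B^p \cap B^{gt}| = (4 - 2)(4 - 2) = 4, \quad |B^p \cup B^{gt}| = 9 + 9 - 4 = 14, \quad \text{IoU} = \frac{4}{14} \approx 0.286$$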

Properties:

$\text{IoU} \in [0, 1]$ and is scale-invariant (scaling both boxes by the same factor preserves IoU)
$\text{IoU} = 1$ iff $B^p = B^{gt}$ (perfect overlap)
$\text{IoU} = 0$ when boxes don't overlap at all
IoU is a Jaccard index applied to area (connection to Part 2: Set Theory)
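
The properties above can be checked numerically. The following is a minimal sketch, not a reference implementation; the helper names `iou` and `scale` and the example boxes are assumptions introduced here for illustration.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(min(a[2], b[2]) - max(a[0], b[0]), 0.0)
    inter_h = max(min(a[3], b[3]) - max(a[1], b[1]), 0.0)
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def scale(box, s):
    """Multiply every coordinate of a box by the same factor s (hypothetical helper)."""
    return tuple(s * c for c in box)


a = (1.0, 1.0, 4.0, 4.0)
b = (2.0, 2.0, 5.0, 5.0)

base = iou(a, b)                            # overlapping boxes
scaled = iou(scale(a, 2.0), scale(b, 2.0))  # same boxes, coordinates doubled
identical = iou(a, a)                       # a box compared with itself
disjoint = iou((0.0, 0.0, 2.0, 2.0), (5.0, 5.0, 7.0, 7.0))

print(f"base={base:.4f} scaled={scaled:.4f} identical={identical:.1f} disjoint={disjoint:.1f}")
```

Scaling both boxes multiplies the intersection and union areas by the same factor, which is why the ratio is unchanged.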

IoU as a Loss Function

The IoU loss is simply:

$$\mathcal{L}_{\text{IoU}} = 1 - \text{IoU}(B^p, B^{gt})$$

Problem with IoU loss: When $B^p \cap B^{gt} = \emptyset$ (non-overlapping boxes), $\text{IoU} = 0$ regardless of how far apart the boxes are. The loss is constant at 1, so its gradient with respect to the box coordinates is zero and it provides no learning signal to move the boxes closer.
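
To make the zero-gradient problem concrete, the sketch below estimates the derivative of the IoU loss with respect to a horizontal shift of the predicted box using a central finite difference. It is an illustration under stated assumptions: the helpers `shift_x` and `dloss_dx`, the step size `eps`, and the example boxes are not from the original text.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(min(a[2], b[2]) - max(a[0], b[0]), 0.0)
    inter_h = max(min(a[3], b[3]) - max(a[1], b[1]), 0.0)
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred, gt):
    return 1.0 - iou(pred, gt)


def shift_x(box, dx):
    """Translate a box horizontally by dx (hypothetical helper)."""
    return (box[0] + dx, box[1], box[2] + dx, box[3])


def dloss_dx(pred, gt, eps=1e-4):
    """Central finite-difference estimate of d(loss)/d(horizontal shift of pred)."""
    plus = iou_loss(shift_x(pred, eps), gt)
    minus = iou_loss(shift_x(pred, -eps), gt)
    return (plus - minus) / (2 * eps)


gt = (5.0, 5.0, 7.0, 7.0)
near = (4.0, 5.0, 6.0, 7.0)    # overlaps gt
far = (0.0, 0.0, 2.0, 2.0)     # disjoint from gt
closer = (2.0, 2.0, 4.0, 4.0)  # nearer to gt, but still disjoint

print("overlapping:", dloss_dx(near, gt))  # negative: shifting right lowers the loss
print("disjoint:   ", dloss_dx(far, gt))   # zero: no signal
print("losses:     ", iou_loss(far, gt), iou_loss(closer, gt))  # both equal 1.0
```

The disjoint prediction gets the same loss whether it is far from or close to the target, which is exactly the failure mode the later variants address.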

Generalized IoU (GIoU)

GIoU (Rezatofighi et al., 2019) fixes the zero-gradient problem by penalizing the empty space between non-overlapping boxes. Define $C$ as the smallest enclosing box containing both $B^p$ and $B^{gt}$:

$$\boxed{\text{GIoU} = \text{IoU} - \frac{|C \setminus (B^p \cup B^{gt})|}{|C|}}$$

The penalty term $|C \setminus (B^p \cup B^{gt})| / |C|$ measures the fraction of the enclosing box that is "wasted" (covered by neither the predicted nor the ground-truth box). When the boxes don't overlap, this term grows as they move apart, so the loss still has a non-zero gradient that pulls them together.

Properties of GIoU:

$\text{GIoU} \in (-1, 1]$, so it can be negative (when boxes are far apart); the lower bound $-1$ is approached but never reached
$\text{GIoU} = \text{IoU}$ when one box contains the other (penalty is zero since $C = $ larger box)
$\text{GIoU} \to -1$ when boxes are infinitely far apart
Always provides non-zero gradients for non-degenerate configurations

$\mathcal{L}_{\text{GIoU}} = 1 - \text{GIoU} \in [0, 2)$
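
The GIoU properties listed above can be illustrated with a small numeric sketch. The function name `giou`, the example boxes, and the chosen gaps are assumptions made for illustration only.

```python
def giou(a, b):
    """Generalized IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(min(a[2], b[2]) - max(a[0], b[0]), 0.0)
    inter_h = max(min(a[3], b[3]) - max(a[1], b[1]), 0.0)
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    iou = inter / union if union > 0 else 0.0

    # Smallest enclosing box C
    c_w = max(a[2], b[2]) - min(a[0], b[0])
    c_h = max(a[3], b[3]) - min(a[1], b[1])
    area_c = c_w * c_h
    if area_c <= 0:
        return iou
    return iou - (area_c - union) / area_c


gt = (0.0, 0.0, 2.0, 2.0)

# Slide a 2x2 prediction to the right; IoU is 0 throughout, GIoU keeps decreasing.
values = []
for gap in (1.0, 10.0, 100.0, 1000.0):
    pred = (2.0 + gap, 0.0, 4.0 + gap, 2.0)
    values.append(giou(pred, gt))
    print(f"gap={gap:7.1f}  GIoU={values[-1]:.4f}")

# Containment: the enclosing box equals the larger box, so the penalty vanishes.
outer = (0.0, 0.0, 4.0, 4.0)
inner = (1.0, 1.0, 2.0, 2.0)
print("contained:", giou(inner, outer))  # equals plain IoU = 1/16
```

For this particular setup GIoU works out to $-g / (4 + g)$ for a gap $g$, which tends to $-1$ as $g$ grows without ever reaching it.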

Distance-IoU (DIoU)

DIoU (Zheng et al., 2020) directly penalizes center distance. Let $\mathbf{b}^p, \mathbf{b}^{gt}$ be the center coordinates of the two boxes and $c$ be the diagonal length of the smallest enclosing box $C$:

$$\boxed{\text{DIoU} = \text{IoU} - \frac{\|\mathbf{b}^p - \mathbf{b}^{gt}\|^2}{c^2}}$$

where $c = \text{diag}(C) = \sqrt{(x_2^C - x_1^C)^2 + (y_2^C - y_1^C)^2}$.

Advantage over GIoU: DIoU typically converges faster because the center-distance penalty provides a direct gradient toward the target center, whereas GIoU's penalty is indirect (through the enclosing box area). When the boxes share a center but differ in size, the distance penalty is zero and DIoU reduces to IoU, which still penalizes the size mismatch.
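
One configuration where the difference is easy to see is a small prediction lying entirely inside the ground-truth box: the enclosing box is then the ground-truth box itself, so GIoU equals IoU wherever the prediction sits, while DIoU still distinguishes positions. The sketch below illustrates this; the function name `overlap_metrics` and the boxes are assumptions chosen for the example.

```python
def overlap_metrics(a, b):
    """Return (IoU, GIoU, DIoU) for boxes given as (x1, y1, x2, y2)."""
    inter_w = max(min(a[2], b[2]) - max(a[0], b[0]), 0.0)
    inter_h = max(min(a[3], b[3]) - max(a[1], b[1]), 0.0)
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    iou = inter / union if union > 0 else 0.0

    # Smallest enclosing box C: width, height, area, squared diagonal
    c_w = max(a[2], b[2]) - min(a[0], b[0])
    c_h = max(a[3], b[3]) - min(a[1], b[1])
    area_c = c_w * c_h
    c2 = c_w ** 2 + c_h ** 2

    giou = iou - (area_c - union) / area_c if area_c > 0 else iou

    # Squared distance between box centers
    d2 = ((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2) ** 2 \
        + ((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2) ** 2
    diou = iou - d2 / c2 if c2 > 0 else iou
    return iou, giou, diou


gt = (0.0, 0.0, 10.0, 10.0)
centered = (4.0, 4.0, 6.0, 6.0)  # 2x2 box at the center of gt
corner = (0.0, 0.0, 2.0, 2.0)    # same size, pushed into a corner of gt

for name, pred in (("centered", centered), ("corner", corner)):
    iou, giou, diou = overlap_metrics(pred, gt)
    print(f"{name:9s} IoU={iou:.3f} GIoU={giou:.3f} DIoU={diou:.3f}")
```

Both placements score IoU = GIoU = 0.04, but DIoU drops from 0.04 to -0.12 for the corner placement, so only DIoU prefers the centered prediction.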

Complete IoU (CIoU)

CIoU (also Zheng et al., 2020) adds an aspect ratio consistency term to DIoU:

$$\boxed{\mathcal{L}_{\text{CIoU}} = 1 - \text{IoU} + \frac{\|\mathbf{b}^p - \mathbf{b}^{gt}\|^2}{c^2} + \alpha v}$$

where the aspect ratio penalty is:

$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w^p}{h^p}\right)^2, \quad \alpha = \frac{v}{(1 - \text{IoU}) + v}$$

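The trade-off weight $\alpha$ grows with IoU: for a fixed $v$, the denominator $(1 - \text{IoU}) + v$ shrinks as overlap improves, so the aspect ratio term only dominates when IoU is already high (the box is close but has the wrong shape). When IoU is low, $\alpha$ is small and the overlap and center-distance terms drive the loss.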
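
The behavior of $\alpha$ can be tabulated for a fixed shape mismatch. This is a minimal sketch: the function names `aspect_penalty` and `trade_off`, the box sizes, and the sampled IoU values are assumptions for illustration.

```python
import math


def aspect_penalty(w_gt, h_gt, w_p, h_p):
    """v from the CIoU definition: squared difference of arctan aspect ratios."""
    return (4 / math.pi ** 2) * (math.atan(w_gt / h_gt) - math.atan(w_p / h_p)) ** 2


def trade_off(v, iou):
    """alpha = v / ((1 - IoU) + v), with a guard for the IoU = 1, v = 0 case."""
    denom = (1 - iou) + v
    return v / denom if denom > 0 else 0.0


# Ground truth is wide (4 x 2), prediction is tall (2 x 4).
v = aspect_penalty(4.0, 2.0, 2.0, 4.0)
print(f"v = {v:.4f}")

alphas = []
for iou in (0.0, 0.5, 0.9):
    alphas.append(trade_off(v, iou))
    print(f"IoU={iou:.1f}  alpha={alphas[-1]:.4f}  alpha*v={alphas[-1] * v:.4f}")

# Matching aspect ratios give v = 0 regardless of absolute size.
print("same shape:", aspect_penalty(4.0, 2.0, 8.0, 4.0))
```

With the same $v$, the weighted penalty $\alpha v$ is several times larger at IoU = 0.9 than at IoU = 0, matching the description above.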

Summary of gradient behavior:

IoU loss: Zero gradient when boxes don't overlap
GIoU: Always non-zero, but slow for distant boxes (indirect signal)
DIoU: Direct center-to-center gradient, fast convergence
CIoU: Optimizes overlap, center distance, and aspect ratio together; typically the fastest to converge of the four

import numpy as np

def compute_iou_variants(box_p, box_gt):
    """
    Compute IoU, GIoU, DIoU, CIoU for two boxes.
    Boxes: [x1, y1, x2, y2] format.
    """
    x1p, y1p, x2p, y2p = box_p
    x1g, y1g, x2g, y2g = box_gt
    
    # Areas
    area_p = (x2p - x1p) * (y2p - y1p)
    area_g = (x2g - x1g) * (y2g - y1g)
    
    # Intersection
    xi1 = max(x1p, x1g)
    yi1 = max(y1p, y1g)
    xi2 = min(x2p, x2g)
    yi2 = min(y2p, y2g)
    inter = max(xi2 - xi1, 0) * max(yi2 - yi1, 0)
    
    # Union and IoU
    union = area_p + area_g - inter
    iou = inter / union if union > 0 else 0
    
    # Enclosing box C
    xc1 = min(x1p, x1g)
    yc1 = min(y1p, y1g)
    xc2 = max(x2p, x2g)
    yc2 = max(y2p, y2g)
    area_c = (xc2 - xc1) * (yc2 - yc1)
    
    # GIoU
    giou = iou - (area_c - union) / area_c if area_c > 0 else iou
    
    # Center distance (DIoU)
    bp = np.array([(x1p + x2p)/2, (y1p + y2p)/2])
    bg = np.array([(x1g + x2g)/2, (y1g + y2g)/2])
    d2 = np.sum((bp - bg)**2)
    c2 = (xc2 - xc1)**2 + (yc2 - yc1)**2
    diou = iou - d2/c2 if c2 > 0 else iou
    
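    # Aspect ratio (CIoU). arctan2(w, h) equals arctan(w / h) for positive sizes
    # and avoids a ZeroDivisionError when a box has zero height.
    wp, hp = x2p - x1p, y2p - y1p
    wg, hg = x2g - x1g, y2g - y1g
    v = (4 / np.pi**2) * (np.arctan2(wg, hg) - np.arctan2(wp, hp))**2
    alpha = v / ((1 - iou) + v) if (1 - iou + v) > 0 else 0
    ciou = diou - alpha * v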
    
    return {'IoU': iou, 'GIoU': giou, 'DIoU': diou, 'CIoU': ciou}

# Case 1: Overlapping boxes
box_p = [1, 1, 4, 4]    # 3x3 box
box_gt = [2, 2, 5, 5]   # 3x3 box, shifted right+down
print("Case 1: Overlapping boxes")
results = compute_iou_variants(box_p, box_gt)
for k, v in results.items():
    print(f"  {k}: {v:.4f}, Loss = {1-v:.4f}")

# Case 2: Non-overlapping boxes (IoU = 0, but GIoU/DIoU still informative)
box_p2 = [0, 0, 2, 2]
box_gt2 = [5, 5, 7, 7]
print("\nCase 2: Non-overlapping (distant) boxes")
results2 = compute_iou_variants(box_p2, box_gt2)
for k, v in results2.items():
    print(f"  {k}: {v:.4f}, Loss = {1-v:.4f}")
print("  Note: IoU gives zero signal, GIoU/DIoU still provide gradients!")

Non-Maximum Suppression (NMS)

After a detector produces many overlapping candidate boxes, NMS greedily selects the best non-redundant detections:

Sort all detections by confidence score (descending)
Select the highest-scoring box $B_{\text{best}}$, add to output
Remove all remaining boxes where $\text{IoU}(B_i, B_{\text{best}}) > \tau_{\text{NMS}}$
Repeat until no boxes remain
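
The four steps above can be sketched with NumPy as follows. This is an illustrative version under stated assumptions, not a production implementation: the function names `iou_one_to_many` and `nms`, the default `iou_threshold=0.5`, and the example detections are all introduced here.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one box (4,) and an array of boxes (n, 4), all as x1, y1, x2, y2."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union = area + areas - inter
    # Guard against zero-area unions without triggering divide warnings.
    return np.where(union > 0, inter / np.maximum(union, 1e-12), 0.0)


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS; returns indices of kept boxes in descending score order."""
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores)      # step 1: sort by confidence, descending
    keep = []
    while order.size > 0:            # step 4: repeat until nothing remains
        best = order[0]              # step 2: highest-scoring remaining box
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou_one_to_many(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]  # step 3: drop heavy overlaps
    return keep


boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # the second box overlaps the first with IoU of about 0.68 and is suppressed
```

Each pass compares the selected box against all remaining ones, so the worst case (no box is ever suppressed) performs on the order of $n^2$ IoU evaluations.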

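Typical thresholds are $\tau_{\text{NMS}} \in [0.3, 0.7]$. The algorithm is greedy: it is not guaranteed to be optimal, but it runs in $O(n^2)$ time in the worst case and is effective in practice. Soft-NMS replaces the hard removal with score decay; in its Gaussian form, each remaining box's score is updated as $s_i \leftarrow s_i \cdot e^{-\text{IoU}(B_i, B_{\text{best}})^2 / \sigma}$.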
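
A minimal sketch of the Gaussian score decay is shown below. The function names `box_iou` and `soft_nms_gaussian`, the default `sigma=0.5`, and the `score_threshold` cutoff used to stop processing negligible scores are assumptions for illustration; the formula itself is the one given above.

```python
import numpy as np


def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    inter_w = max(min(a[2], b[2]) - max(a[0], b[0]), 0.0)
    inter_h = max(min(a[3], b[3]) - max(a[1], b[1]), 0.0)
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms_gaussian(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Return (selection order, decayed scores) using Gaussian score decay."""
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    scores = np.asarray(scores, dtype=float).copy()  # leave the caller's scores intact
    remaining = list(range(len(scores)))
    keep = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        if scores[best] < score_threshold:  # assumed cutoff: ignore negligible scores
            break
        keep.append(best)
        remaining.remove(best)
        for i in remaining:
            overlap = box_iou(boxes[best], boxes[i])
            scores[i] *= np.exp(-(overlap ** 2) / sigma)  # decay instead of removal
    return keep, scores


boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
order, decayed = soft_nms_gaussian(boxes, scores)
print(order)    # the overlapping box is kept but ranked last
print(decayed)  # only the overlapping box's score has been reduced
```

Unlike hard NMS, the heavily overlapping box survives with a lower score, which can help when two real objects genuinely overlap.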
