Complete Math + Probability + Statistics for ML/AI/DS Bootcamp
- Mathematical Thinking: mindset, notation & functions
- Set Theory & Foundations: sets, operations & ML connections
- Combinatorics: counting, permutations & combinations
- Probability Fundamentals: rules, Bayes & distributions
- Statistics: descriptive to inferential
- Information Theory: entropy, cross-entropy & KL divergence
- Linear Algebra: vectors, matrices & transformations
- Calculus & Optimization: derivatives, gradients & descent
- ML-Specific Math: loss functions & regularization
- Computational Math: NumPy, Pandas & simulation
- Advanced Topics: multivariate stats & Bayesian inference
- Projects & Applications: build from scratch (regression, Bayes, PCA)
Why Set Theory for ML?
Before learning probability, you need sets. Before defining a feature space, you need sets. Before talking about "predicted class A" or "ground truth label B", you need sets. Set theory is the bedrock of the mathematical language used across all of ML.
- Probability: Events are sets. $P(A)$ is "the probability that the outcome falls in set $A$."
- Feature spaces: Your training data $\mathcal{X} \subseteq \mathbb{R}^d$ is a subset of a Euclidean space.
- Classification: A classifier partitions the feature space into disjoint class regions.
- Evaluation: Precision and recall are set-theoretic ratios — TP, FP, FN are sets of examples.
- Boolean logic: Decision tree splits use AND (∩) and NOT (complement) implicitly.
Email Classification as Set Partition
Let $\mathcal{U}$ = all emails received. A spam filter classifies each email into one of two disjoint sets: $S$ = {spam emails} and $S^c$ = {non-spam emails}. These two sets cover all of $\mathcal{U}$ (every email is classified) and never overlap (an email is either spam or not-spam, never both). This is a partition of $\mathcal{U}$ — one of the most fundamental set-theoretic concepts in ML.
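A minimal sketch of this partition in Python, using made-up example emails:

```python
# Hypothetical toy universe of emails; the filter splits it into S and S^c.
U = {"win a prize", "meeting at 3pm", "free money now", "project update"}
S = {"win a prize", "free money now"}   # classified as spam
S_complement = U - S                    # everything else: not spam

# A partition must cover U and have no overlap:
assert S | S_complement == U            # every email is classified
assert S & S_complement == set()        # no email is both spam and not-spam
print("S and S^c partition U")
```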
Sets and Notation
A set is an unordered collection of distinct objects (called elements or members). The key properties: order doesn't matter, and elements can't repeat. Sets can be written by listing elements, e.g. $A = \{1, 2, 3\}$, or by a rule, e.g. $B = \{x \in \mathbb{R} : x > 0\}$.
The second form uses set-builder notation: "$B$ is the set of all real numbers $x$ such that $x > 0$" — this is how infinite sets are described compactly.
1 — Membership & Cardinality
Membership: $x \in A$ means "$x$ is an element of $A$". Its negation: $x \notin A$.
Cardinality $|A|$ is the number of elements in set $A$: for example, $|\{1, 2, 3\}| = 3$ and $|\{\text{cat}, \text{dog}\}| = 2$.
The empty set $\emptyset = \{\}$ contains no elements. It is a subset of every set (by vacuous truth).
2 — Subsets & Power Sets
$A$ is a subset of $B$, written $A \subseteq B$, if every element of $A$ is also in $B$: $x \in A \Rightarrow x \in B$ for all $x$.
If additionally $B$ has elements not in $A$, it's a proper subset: $A \subsetneq B$.
The power set $\mathcal{P}(A)$ is the set of ALL subsets of $A$ (including $\emptyset$ and $A$ itself). For example, $\mathcal{P}(\{1, 2\}) = \{\emptyset, \{1\}, \{2\}, \{1, 2\}\}$.
The size of the power set grows exponentially: $|\mathcal{P}(A)| = 2^{|A|}$.
3 — Special Number Sets
| Symbol | Name | Examples | ML Use |
|---|---|---|---|
| $\mathbb{N}$ | Natural numbers | $0, 1, 2, 3, \ldots$ | Indices, counts |
| $\mathbb{Z}$ | Integers | $\ldots, -2, -1, 0, 1, 2, \ldots$ | Discrete labels |
| $\mathbb{Q}$ | Rationals | $\frac{1}{2}, \frac{3}{4}, \ldots$ | Rational learning rates |
| $\mathbb{R}$ | Real numbers | $\pi, \sqrt{2}, 3.14\ldots$ | Model parameters, features |
| $\mathbb{R}^d$ | $d$-dim real space | $(x_1, \ldots, x_d)$ | Feature vectors |
| $[0,1]$ | Unit interval | $0.3, 0.7$ | Probabilities, dropout rates |
Note: $\mathbb{N} \subsetneq \mathbb{Z} \subsetneq \mathbb{Q} \subsetneq \mathbb{R}$ — each is a proper subset of the next.
Python sets — basic operations
```python
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

# Membership
print(3 in A)  # True
print(7 in A)  # False

# Cardinality
print(len(A))  # 4

# Subset check
print({1, 2} <= A)  # True (subset)
print({1, 2} < A)   # True (proper subset)

# Power set (manual implementation)
def power_set(s):
    s = list(s)
    result = [[]]
    for elem in s:
        result += [subset + [elem] for subset in result]
    return [set(x) for x in result]

ps = power_set({1, 2, 3})
print(f"Power set size: {len(ps)}")  # 8 = 2^3
for s in sorted(ps, key=len):
    print(s)
```
Universal Set
The universal set $\mathcal{U}$ is the "master set" containing ALL objects under discussion in a given context. Every other set in the problem is automatically a subset of $\mathcal{U}$.
- Classifying emails: $\mathcal{U}$ = all emails received
- ML dataset: $\mathcal{U}$ = all possible input examples
- Survey: $\mathcal{U}$ = all survey respondents
- Number theory: $\mathcal{U}$ could be $\mathbb{N}$, $\mathbb{Z}$, or $\mathbb{R}$ depending on context
In probability, $\mathcal{U}$ directly corresponds to the sample space $\Omega$ — the set of all possible outcomes. Complement always depends on $\mathcal{U}$: if $\mathcal{U} = \{1,2,3,4,5\}$ and $A = \{1,2\}$, then $A^c = \{3,4,5\}$. Change $\mathcal{U}$ and you change the complement.
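A short check of this, using the values from the example above plus a second, hypothetical universal set for contrast:

```python
# The complement of A changes when the universal set changes.
A = {1, 2}

U1 = {1, 2, 3, 4, 5}          # universal set from the text
U2 = {1, 2, 3, 4, 5, 6, 7}    # a different (made-up) universal set

print("A^c relative to U1:", U1 - A)  # {3, 4, 5}
print("A^c relative to U2:", U2 - A)  # {3, 4, 5, 6, 7}
```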
Singleton Set & Empty (Null) Set
A singleton set is a set with exactly one element, such as $\{5\}$. Note the distinction between the element and the set containing it:
- $5 \in \{5\}$ — the number 5 is a member of the singleton
- $|\{5\}| = 1$ — the singleton has cardinality 1
- $\{5\} \subseteq \{1, 2, 5\}$ — the singleton is a subset, but $5 \in \{1,2,5\}$ is membership
The empty set (also called the null set) $\emptyset = \{\}$ contains no elements at all: $|\emptyset| = 0$. Key properties: $\emptyset \subseteq A$ for every set $A$, $A \cup \emptyset = A$, and $A \cap \emptyset = \emptyset$.
The empty set appears in ML when: a filter returns no results, two disjoint classes have no overlap, or a search returns no valid solutions. It is the unique set of cardinality 0.
Set Builder Notation
Set builder notation defines a set by a rule (a property members must satisfy) rather than listing elements individually. It is essential for defining infinite sets and feature spaces:
| Set Builder Form | Roster / Description | ML Use |
|---|---|---|
| $\{x \in \mathbb{Z} : x \gt 0\}$ | $\{1, 2, 3, \ldots\}$ | Positive integer class labels |
| $\{x \in \mathbb{R} : 0 \leq x \leq 1\}$ | The unit interval $[0, 1]$ | Probabilities, dropout rates |
| $\{(x,y) \in \mathbb{R}^2 : x^2 + y^2 \leq 1\}$ | Unit disk | Constrained parameter space |
| $\{(\mathbf{x}_i, y_i) : i = 1,\ldots,N\}$ | Labeled dataset of $N$ examples | Training data $\mathcal{D}$ |
| $\{\mathbf{w} \in \mathbb{R}^d : \|\mathbf{w}\|_2 \leq \lambda\}$ | $\ell_2$-ball of radius $\lambda$ | L2 regularization constraint |
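As a sketch of the last row's constraint set, here is a hypothetical membership test and projection onto the $\ell_2$-ball (the function names and values are ours, not from any library):

```python
import numpy as np

def in_l2_ball(w, lam):
    """Is w inside the constraint set {w : ||w||_2 <= lam}?"""
    return np.linalg.norm(w) <= lam

def project_l2_ball(w, lam):
    """Project w onto the l2-ball of radius lam (rescale if outside)."""
    norm = np.linalg.norm(w)
    return w if norm <= lam else w * (lam / norm)

w = np.array([3.0, 4.0])    # ||w||_2 = 5
lam = 2.0
print(in_l2_ball(w, lam))   # False — outside the ball
w_proj = project_l2_ball(w, lam)
print(np.linalg.norm(w_proj))  # ≈ 2.0 — projected onto the boundary
```

Projection onto this set is exactly what constrained L2 regularization does during optimization.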
```python
import numpy as np

# Set builder notation ↔ Python list/set comprehensions

# "All even numbers from 0 to 20"
even_numbers = {x for x in range(0, 21) if x % 2 == 0}
print("Even numbers:", sorted(even_numbers))
# [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

# "All real x in [-3, 3] such that x^2 < 4"
xs = np.linspace(-3, 3, 1000)
satisfying = xs[xs**2 < 4]
print(f"\nx in [-3,3] satisfying x^2 < 4: range ({satisfying.min():.3f}, {satisfying.max():.3f})")
# Mathematically: (-2, 2)

# Feature filtering — select features with |correlation| > 0.5
correlations = {'age': 0.72, 'income': 0.61, 'height': 0.12, 'weight': 0.08, 'score': 0.89}
high_corr = {f for f, c in correlations.items() if abs(c) > 0.5}
print(f"\nHighly correlated features: {high_corr}")
# {'age', 'income', 'score'}
```
Equal & Equivalent Sets
These are two distinct relationships beginners frequently confuse:
Equal sets contain exactly the same elements (order does not matter in sets): $\{1, 2, 3\} = \{3, 2, 1\}$.
Equivalent sets (also called equinumerous) have the same cardinality but may contain completely different elements: $\{1, 2, 3\} \sim \{\text{cat}, \text{dog}, \text{fish}\}$, since both have cardinality 3.
- $A = B \Rightarrow A \sim B$ ✓ (equal implies equivalent)
- $A \sim B \not\Rightarrow A = B$ (equivalent does NOT imply equal)
```python
# Equal sets: same elements regardless of order
A = {1, 2, 3}
B = {3, 2, 1}               # same elements
C = {'cat', 'dog', 'fish'}  # same size, different elements

print("A == B:", A == B)                # True — equal (same elements)
print("A == C:", A == C)                # False — not equal
print("|A| == |C|:", len(A) == len(C))  # True — equivalent (same cardinality)

# ML context: predicted classes vs actual class labels
predicted = {0, 1, 2}                  # numeric class indices
actual_names = {'cat', 'dog', 'bird'}  # string class names
print("\nSame number of classes:", len(predicted) == len(actual_names))  # True
print("Same class sets:", predicted == actual_names)  # False — equivalent, not equal
```
Proper Subsets, Improper Subsets & Supersets
The general concept of "subset" has important subtypes that appear frequently in formal ML writing:
| Type | Symbol | Meaning | Example |
|---|---|---|---|
| Subset | $A \subseteq B$ | Every element of $A$ is in $B$ (includes $A = B$) | $\{1,2\} \subseteq \{1,2,3\}$ |
| Proper subset | $A \subsetneq B$ | $A \subseteq B$ AND $A \neq B$ — $B$ has at least one extra element | $\{1,2\} \subsetneq \{1,2,3\}$ |
| Improper subset | $A \subseteq A$ | A set is always a subset of itself — the only non-proper subset | $\{1,2,3\} \subseteq \{1,2,3\}$ |
| Superset | $B \supseteq A$ | $B$ contains all elements of $A$ (and possibly more) | $\{1,2,3\} \supseteq \{1,2\}$ |
| Proper superset | $B \supsetneq A$ | $B \supseteq A$ AND $B \neq A$ | $\{1,2,3\} \supsetneq \{1,2\}$ |
```python
# Python set comparison operators
A = frozenset({1, 2})
B = frozenset({1, 2, 3})

print("A ⊆ B (subset):", A <= B)          # True
print("A ⊂ B (proper subset):", A < B)    # True — A ≠ B
print("B ⊆ A (subset):", B <= A)          # False
print("B ⊇ A (superset):", B >= A)        # True
print("B ⊃ A (proper superset):", B > A)  # True
print("A ⊆ A (improper):", A <= A)        # True — every set is its own subset
print("A ⊂ A (proper):", A < A)           # False — no set is a proper subset of itself
```
Finite & Infinite Sets
A finite set has finitely many elements ($|A| = n$ for some $n \in \mathbb{N}$). An infinite set continues without bound, and infinite sets themselves come in two importantly different sizes:
| Type | Description | Example | ML context |
|---|---|---|---|
| Finite | $|A| = n$, $n \in \mathbb{N}$ | $\{0, 1\}$, $\{\text{cat, dog, bird}\}$ | Class label set, training dataset |
| Countably infinite | Can be listed as a sequence $a_1, a_2, \ldots$ | $\mathbb{N}$, $\mathbb{Z}$, $\mathbb{Q}$ | Token vocabulary, layer indices |
| Uncountably infinite | Cannot be listed — strictly larger than $\mathbb{N}$ | $\mathbb{R}$, $(0,1)$, $\mathbb{R}^d$ | Feature values, model weights |
- Classification: output space is finite ($k$ classes). Regression: output space is $\mathbb{R}$ — uncountably infinite
- Continuous RVs: because $\mathbb{R}$ is uncountable, $P(X = x_0) = 0$ for any specific value — hence probability density rather than probability mass
- Neural weights: live in $\mathbb{R}^n$ — the search space is uncountably infinite, making exhaustive search impossible
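The second point can be illustrated with a quick simulation (illustrative only) using a uniform draw on $[0, 1]$:

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 1.0, size=100_000)

# Hitting one exact real value essentially never happens...
print("empirical P(X == 0.5):", np.mean(samples == 0.5))

# ...but intervals carry positive probability: P(0.4 <= X <= 0.6) = 0.2
print("empirical P(0.4 <= X <= 0.6):", np.mean((samples >= 0.4) & (samples <= 0.6)))
```

This is exactly why continuous random variables get a density $f(x)$ (probability per unit length) instead of a probability mass at each point.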
Set Operations
Set operations are the algebraic tools for combining and manipulating sets. They have direct parallels in Boolean logic and SQL WHERE clauses.
4 — Union & Intersection
Union $A \cup B = \{x : x \in A \text{ or } x \in B\}$: all elements in $A$ OR in $B$ (or both).
Intersection $A \cap B = \{x : x \in A \text{ and } x \in B\}$: all elements in BOTH $A$ AND $B$.
Two sets are disjoint if $A \cap B = \emptyset$.
```mermaid
flowchart LR
    A["Set A\n{1,2,3,4}"] --> U["A ∪ B\n{1,2,3,4,5,6}\nOR — everything"]
    B["Set B\n{3,4,5,6}"] --> U
    A --> I["A ∩ B\n{3,4}\nAND — overlap only"]
    B --> I
    A --> D["A \\ B\n{1,2}\nIN A but NOT B"]
    B --> D
    A --> S["A △ B\n{1,2,5,6}\nXOR — one but not both"]
    B --> S
    style A fill:#e8f4f4,stroke:#3B9797,color:#132440
    style B fill:#f0f4f8,stroke:#16476A,color:#132440
    style U fill:#d4edda,stroke:#3B9797,color:#132440
    style I fill:#fff3cd,stroke:#F5A623,color:#132440
    style D fill:#fde8ec,stroke:#BF092F,color:#132440
    style S fill:#e8e0f0,stroke:#6f42c1,color:#132440
```
5 — Complement & Set Difference
The complement $A^c$ (also written $\bar{A}$ or $A'$) is everything in the universal set $\mathcal{U}$ that is NOT in $A$: $A^c = \{x \in \mathcal{U} : x \notin A\}$.
Set difference $A \setminus B$ (also $A - B$) removes elements of $B$ from $A$: $A \setminus B = \{x : x \in A \text{ and } x \notin B\}$.
Symmetric difference $A \triangle B$, the elements in exactly one of $A$ or $B$: $A \triangle B = (A \setminus B) \cup (B \setminus A)$.
```python
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

print("Union:", A | B)           # {1, 2, 3, 4, 5, 6}
print("Intersection:", A & B)    # {3, 4}
print("Difference A-B:", A - B)  # {1, 2}
print("Difference B-A:", B - A)  # {5, 6}
print("Symmetric diff:", A ^ B)  # {1, 2, 5, 6}

# Complement requires knowing the universal set
U = set(range(1, 11))  # Universal set: {1,...,10}
A_complement = U - A
print("Complement of A:", A_complement)  # {5, 6, 7, 8, 9, 10}
```
6 — The Inclusion-Exclusion Principle
When counting elements in a union, naively adding $|A|$ and $|B|$ double-counts the intersection:
$$|A \cup B| = |A| + |B| - |A \cap B|$$
For three sets:
$$|A \cup B \cup C| = |A| + |B| + |C| - |A \cap B| - |A \cap C| - |B \cap C| + |A \cap B \cap C|$$
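These counting identities can be checked numerically on random sets (the set sizes are arbitrary, chosen for illustration):

```python
import random

random.seed(42)
# Random subsets of {0,...,49}
A = set(random.sample(range(50), 20))
B = set(random.sample(range(50), 25))
C = set(random.sample(range(50), 15))

# Two sets: |A ∪ B| = |A| + |B| - |A ∩ B|
assert len(A | B) == len(A) + len(B) - len(A & B)

# Three sets: subtract pairwise overlaps, then add back the triple overlap
lhs = len(A | B | C)
rhs = (len(A) + len(B) + len(C)
       - len(A & B) - len(A & C) - len(B & C)
       + len(A & B & C))
assert lhs == rhs
print("Inclusion-exclusion verified:", lhs, "==", rhs)
```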
Precision & Recall as Set Operations
Let $P$ = {examples predicted positive} and $T$ = {examples actually positive}. Then:
- True Positives: $|P \cap T|$ — correctly predicted positive
- False Positives: $|P \setminus T|$ — predicted positive but actually negative
- False Negatives: $|T \setminus P|$ — actually positive but missed
$$\text{Precision} = \frac{|P \cap T|}{|P|} \qquad \text{Recall} = \frac{|P \cap T|}{|T|}$$
F1 = harmonic mean of precision and recall. Every classification metric reduces to set arithmetic over predictions and ground truth labels.
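A minimal sketch of these metrics as pure set arithmetic (the example IDs are made up):

```python
# Set-based precision / recall / F1
P = {1, 2, 3, 4, 5, 6}  # examples predicted positive
T = {4, 5, 6, 7, 8}     # examples actually positive

tp = len(P & T)          # true positives: |P ∩ T| = 3
fp = len(P - T)          # false positives: |P \ T| = 3
fn = len(T - P)          # false negatives: |T \ P| = 2

precision = tp / len(P)  # 3/6 = 0.5
recall = tp / len(T)     # 3/5 = 0.6
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```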
Venn Diagrams
A Venn diagram is a visual tool for representing sets as overlapping circles inside a rectangle (the universal set). They make set relationships immediately intuitive — especially for beginners.
Standard layout: the rectangle = $\mathcal{U}$, each circle = one set, overlapping regions = intersections.
- Left only = $A \setminus B$ — elements in A but not in B
- Overlap = $A \cap B$ — elements in both A and B
- Right only = $B \setminus A$ — elements in B but not in A
- Both circles = $A \cup B$ — elements in A or B or both
- Outside both = $(A \cup B)^c$ — elements in neither
Venn diagrams are powerful for verifying set identities visually. To check De Morgan's Law $(A \cup B)^c = A^c \cap B^c$: shade $A \cup B$ (both circles), then take its complement (everything outside both circles). This exactly equals the region outside A AND outside B — confirming the law.
Three-Set Venn Diagram: Email Features
Let $S$ = spam emails, $W_1$ = emails with "free", $W_2$ = emails with "prize". Three overlapping circles divide the interior into 7 regions (plus the region outside all three circles):
- Only $W_1$ (contains "free", not spam, no "prize")
- Only $W_2$ (contains "prize", not spam, no "free")
- Only $S$ (spam with neither word)
- $W_1 \cap W_2$ only (both words, not spam)
- $S \cap W_1$ only (spam with "free", no "prize")
- $S \cap W_2$ only (spam with "prize", no "free")
- $S \cap W_1 \cap W_2$ (spam with both words — core region)
Counting emails in each region and dividing by totals gives the conditional probabilities Naive Bayes needs. Venn diagrams make these conditional counts concrete.
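With hypothetical counts for each region (the numbers below are invented for illustration), the conditional counting looks like this:

```python
# Made-up counts of emails in each Venn region (S=spam, W1="free", W2="prize")
counts = {
    "S_only": 20, "W1_only": 30, "W2_only": 10,
    "S_W1": 25, "S_W2": 15, "W1_W2": 5, "S_W1_W2": 40,
    "outside": 355,  # emails in none of the three sets
}
total = sum(counts.values())  # |U| = 500

# |S| and |W1| are sums over every region inside that circle
n_S = counts["S_only"] + counts["S_W1"] + counts["S_W2"] + counts["S_W1_W2"]
n_W1 = counts["W1_only"] + counts["S_W1"] + counts["W1_W2"] + counts["S_W1_W2"]
n_S_and_W1 = counts["S_W1"] + counts["S_W1_W2"]  # |S ∩ W1|

# Conditional probability by counting: P(S | W1) = |S ∩ W1| / |W1|
print(f"P(spam) = {n_S / total:.2f}")            # 0.20
print(f"P(spam | 'free') = {n_S_and_W1 / n_W1:.2f}")  # 0.65
```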
Laws of Sets
Set theory has a complete algebra with named laws — analogous to arithmetic laws for numbers. These laws let you simplify complex expressions and are the mathematical foundation of Boolean logic (the basis of all digital computing and ML decision conditions).
| Law | Union form ($\cup$) | Intersection form ($\cap$) |
|---|---|---|
| Commutative | $A \cup B = B \cup A$ | $A \cap B = B \cap A$ |
| Associative | $(A \cup B) \cup C = A \cup (B \cup C)$ | $(A \cap B) \cap C = A \cap (B \cap C)$ |
| Distributive | $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$ | $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$ |
| Identity | $A \cup \emptyset = A$ | $A \cap \mathcal{U} = A$ |
| Domination | $A \cup \mathcal{U} = \mathcal{U}$ | $A \cap \emptyset = \emptyset$ |
| Idempotent | $A \cup A = A$ | $A \cap A = A$ |
| Complement | $A \cup A^c = \mathcal{U}$ | $A \cap A^c = \emptyset$ |
| Double complement | $(A^c)^c = A$ | |
| Absorption | $A \cup (A \cap B) = A$ | $A \cap (A \cup B) = A$ |
| De Morgan's I | $(A \cup B)^c = A^c \cap B^c$ | |
| De Morgan's II | $(A \cap B)^c = A^c \cup B^c$ | |
- $\cup$ ↔ OR, $\cap$ ↔ AND, $A^c$ ↔ NOT
- Commutative: `A or B` = `B or A`
- Distributive: `A and (B or C)` = `(A and B) or (A and C)` — compiler optimization
- De Morgan's: `not (A and B)` = `not A or not B` — crucial for simplifying filter conditions
```python
# Verify Laws of Sets programmatically
U = frozenset(range(1, 11))  # Universal set: {1,...,10}
A = frozenset({1, 2, 3, 4, 5})
B = frozenset({3, 4, 5, 6, 7})
C = frozenset({5, 6, 7, 8, 9})
Ac = U - A
Bc = U - B

# Commutative
assert A | B == B | A and A & B == B & A
print("Commutative: A∪B = B∪A and A∩B = B∩A ✓")

# Associative
assert (A | B) | C == A | (B | C)
assert (A & B) & C == A & (B & C)
print("Associative: (A∪B)∪C = A∪(B∪C) ✓")

# Distributive
assert A | (B & C) == (A | B) & (A | C)
assert A & (B | C) == (A & B) | (A & C)
print("Distributive: A∪(B∩C) = (A∪B)∩(A∪C) ✓")

# Identity and Domination
assert A | frozenset() == A and A & U == A
assert A | U == U and A & frozenset() == frozenset()
print("Identity and Domination: ✓")

# Complement
assert A | Ac == U and A & Ac == frozenset()
print("Complement: A∪Aᶜ = U and A∩Aᶜ = ∅ ✓")

# De Morgan's
assert U - (A | B) == Ac & Bc  # (A∪B)^c = A^c ∩ B^c
assert U - (A & B) == Ac | Bc  # (A∩B)^c = A^c ∪ B^c
print("De Morgan's: both laws verified ✓")

# Absorption
assert A | (A & B) == A and A & (A | B) == A
print("Absorption: A∪(A∩B) = A ✓")
```
De Morgan's Laws
De Morgan's Laws are among the most useful identities in set theory — and they appear directly in Boolean logic, probability, and decision tree conditions.
Reading Law 1: "NOT (A or B)" is the same as "NOT A AND NOT B." If an email is neither spam nor phishing, that's the same as "it's not spam" AND "it's not phishing."
Reading Law 2: "NOT (A and B)" is the same as "NOT A OR NOT B." If a transaction is not (large AND international), then it's either not large OR not international.
```python
U = set(range(1, 11))  # Universal set: 1..10
A = {1, 2, 3, 4, 5}
B = {3, 4, 5, 6, 7}

# De Morgan's Law 1: (A ∪ B)^c = A^c ∩ B^c
left_1 = U - (A | B)
right_1 = (U - A) & (U - B)
print("Law 1 holds:", left_1 == right_1)  # True
print("(A ∪ B)^c =", left_1)              # {8, 9, 10}

# De Morgan's Law 2: (A ∩ B)^c = A^c ∪ B^c
left_2 = U - (A & B)
right_2 = (U - A) | (U - B)
print("Law 2 holds:", left_2 == right_2)  # True
print("(A ∩ B)^c =", left_2)              # {1, 2, 6, 7, 8, 9, 10}
```
Cartesian Products & Feature Spaces
The Cartesian product $A \times B$ creates a new set of all ordered pairs $(a, b)$ where $a \in A$ and $b \in B$:
$$A \times B = \{(a, b) : a \in A,\ b \in B\}, \qquad |A \times B| = |A| \cdot |B|$$
Mixed feature spaces are also Cartesian products. A dataset with 3 numeric features and 2 binary categorical features lives in $\mathbb{R}^3 \times \{0,1\}^2$.
```python
import itertools

# Cartesian product of two sets
A = {1, 2, 3}
B = {'a', 'b'}
AxB = list(itertools.product(A, B))
print("A × B:", AxB)
print("|A × B| =", len(AxB))  # 6 = 3 × 2

# Feature space example: 2 continuous + 1 binary feature
# A data point from R × R × {0,1}
data_point = (3.14, -0.5, 1)  # (income, credit_score, employed)
print("Feature vector:", data_point)

# Cartesian product for hyperparameter grid search
learning_rates = [0.001, 0.01, 0.1]
hidden_sizes = [32, 64, 128]
grid = list(itertools.product(learning_rates, hidden_sizes))
print(f"\nHyperparameter grid ({len(grid)} combinations):")
for lr, hs in grid:
    print(f"  lr={lr}, hidden_size={hs}")
```
Partitions & ML Decision Regions
A partition of set $S$ is a collection of non-empty subsets $\{A_1, A_2, \ldots, A_k\}$ such that the subsets are pairwise disjoint ($A_i \cap A_j = \emptyset$ for $i \neq j$) and together cover $S$ ($A_1 \cup A_2 \cup \cdots \cup A_k = S$).
```mermaid
flowchart TD
    FS["Feature Space ℝ^d\n(All possible inputs)"]
    FS --> R1["Region R₁\n(Class A predictions)"]
    FS --> R2["Region R₂\n(Class B predictions)"]
    FS --> R3["Region R₃\n(Class C predictions)"]
    R1 --> P1["R₁ ∩ R₂ = ∅\nDisjoint"]
    R2 --> P1
    R2 --> P2["R₁ ∪ R₂ ∪ R₃ = ℝ^d\nEvery point classified"]
    R1 --> P2
    R3 --> P2
    style FS fill:#132440,stroke:#132440,color:#fff
    style R1 fill:#e8f4f4,stroke:#3B9797,color:#132440
    style R2 fill:#f0f4f8,stroke:#16476A,color:#132440
    style R3 fill:#fff5f5,stroke:#BF092F,color:#132440
    style P1 fill:#d4edda,stroke:#3B9797,color:#132440
    style P2 fill:#d4edda,stroke:#3B9797,color:#132440
```
A $k$-class classifier implicitly partitions $\mathbb{R}^d$ into $k$ decision regions. The decision boundaries are the borders between these regions. For a linear classifier, each region is a convex polytope; for a neural network, the regions can be arbitrarily shaped.
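A toy sketch (thresholds chosen arbitrarily) showing that a rule-based classifier's regions are pairwise disjoint and cover the input grid:

```python
import numpy as np

def classify(x):
    """Toy 1-D classifier: thresholds at 0 and 2 partition the line into 3 regions."""
    if x < 0:
        return "A"
    elif x < 2:
        return "B"
    return "C"

xs = np.linspace(-5, 5, 1001)
labels = [classify(x) for x in xs]

# Partition check: every grid point gets exactly one label (cover + disjoint)
regions = {c: {i for i, lab in enumerate(labels) if lab == c} for c in "ABC"}
assert regions["A"] | regions["B"] | regions["C"] == set(range(len(xs)))
assert regions["A"] & regions["B"] == set()
assert regions["B"] & regions["C"] == set()
print({c: len(ix) for c, ix in regions.items()})
```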
Jaccard Similarity & Set-Based Distance
The Jaccard similarity measures how much two sets overlap, on a scale from 0 (no overlap) to 1 (identical):
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
Its complement, the Jaccard distance: $d_J(A, B) = 1 - J(A, B)$.
Jaccard in Practice
- Document similarity: Represent documents as sets of words. Jaccard measures vocabulary overlap.
- Recommendation systems: Represent users as sets of purchased items. Similar users have high Jaccard similarity.
- MinHash: Approximates Jaccard similarity efficiently for large sets — used by Google, Amazon, Netflix at scale.
- IoU in object detection: Intersection over Union is exactly Jaccard — measures overlap between predicted bounding box and ground truth box. Standard metric for YOLO, R-CNN etc.
```python
def jaccard_similarity(A, B):
    """Jaccard similarity between two sets."""
    intersection = len(A & B)
    union = len(A | B)
    if union == 0:
        return 1.0  # both empty sets are identical
    return intersection / union

def jaccard_distance(A, B):
    return 1 - jaccard_similarity(A, B)

# Document similarity example
doc1_words = {"machine", "learning", "neural", "network", "deep"}
doc2_words = {"machine", "learning", "algorithm", "model", "deep"}
doc3_words = {"cooking", "recipe", "pasta", "sauce", "italian"}

print(f"doc1 vs doc2: {jaccard_similarity(doc1_words, doc2_words):.3f}")  # ~0.429
print(f"doc1 vs doc3: {jaccard_similarity(doc1_words, doc3_words):.3f}")  # 0.000

# Object detection IoU
def iou(box_pred, box_true):
    """
    box = (x1, y1, x2, y2) — top-left and bottom-right corners
    Returns Intersection over Union (Jaccard for 2D boxes)
    """
    xi1 = max(box_pred[0], box_true[0])
    yi1 = max(box_pred[1], box_true[1])
    xi2 = min(box_pred[2], box_true[2])
    yi2 = min(box_pred[3], box_true[3])
    inter_area = max(0, xi2 - xi1) * max(0, yi2 - yi1)
    pred_area = (box_pred[2] - box_pred[0]) * (box_pred[3] - box_pred[1])
    true_area = (box_true[2] - box_true[0]) * (box_true[3] - box_true[1])
    union_area = pred_area + true_area - inter_area
    return inter_area / union_area if union_area > 0 else 0.0

predicted = (50, 50, 150, 150)
ground_truth = (75, 75, 175, 175)
print(f"\nIoU: {iou(predicted, ground_truth):.3f}")  # 0.391 — overlap 5625 / union 14375
```
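The MinHash technique mentioned above can be sketched as follows. Seeded hashes stand in for random permutations, and the function names are illustrative — this is a simplified teaching version, not a production implementation:

```python
import hashlib

def minhash_signature(s, num_hashes=200):
    """One minimum per seeded hash function; seeds emulate random permutations."""
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int(hashlib.md5(f"{seed}:{x}".encode()).hexdigest(), 16)
            for x in s
        ))
    return sig

def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching minima estimates |A ∩ B| / |A ∪ B|."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)

A = {"machine", "learning", "neural", "network", "deep"}
B = {"machine", "learning", "algorithm", "model", "deep"}

est = estimate_jaccard(minhash_signature(A), minhash_signature(B))
exact = len(A & B) / len(A | B)
print(f"exact Jaccard = {exact:.3f}, MinHash estimate ≈ {est:.3f}")
```

The key property: for each hash, the minimum over $A \cup B$ lands in $A \cap B$ with probability exactly $J(A, B)$, so averaging over many hashes converges to the true Jaccard while letting you compare fixed-size signatures instead of full sets.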
Practice Exercises
Enumerate the Power Set
For $A = \{a, b, c, d\}$: (1) What is $|\mathcal{P}(A)|$? (2) List all subsets of cardinality exactly 2. (3) How many subsets contain element $a$?
Show Answer
(1) $|\mathcal{P}(A)| = 2^4 = 16$. (2) $\binom{4}{2} = 6$ subsets of size 2: $\{a,b\},\{a,c\},\{a,d\},\{b,c\},\{b,d\},\{c,d\}$. (3) Half of all subsets contain any fixed element: $2^4 / 2 = 8$ subsets contain $a$ (for each subset, you either include $a$ or not, and the remaining 3 elements can vary freely: $2^3 = 8$).
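These counts can be double-checked with `itertools` (a quick verification sketch):

```python
from itertools import chain, combinations

A = ['a', 'b', 'c', 'd']
# All subsets of A, by size 0 through 4
all_subsets = list(chain.from_iterable(combinations(A, r) for r in range(len(A) + 1)))

print(len(all_subsets))  # 16 = 2^4
size_2 = [s for s in all_subsets if len(s) == 2]
print(len(size_2), size_2)  # 6 subsets of size 2
with_a = [s for s in all_subsets if 'a' in s]
print(len(with_a))  # 8 subsets contain 'a'
```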
Compute Precision, Recall & F1
Ground truth positives: $T = \{1, 3, 5, 7, 9\}$. Predicted positives: $P = \{1, 2, 3, 4, 5\}$. Compute precision, recall, F1-score, and Jaccard similarity between $P$ and $T$.
Show Answer
$|P \cap T| = |\{1,3,5\}| = 3$. Precision $= 3/5 = 0.6$. Recall $= 3/5 = 0.6$. $F1 = 2 \cdot 0.6 \cdot 0.6 / (0.6+0.6) = 0.6$. $|P \cup T| = |\{1,2,3,4,5,7,9\}| = 7$. Jaccard $= 3/7 \approx 0.43$. Note: Jaccard and F1 differ, but they are related by $F1 = \frac{2J}{1+J}$ (here $\frac{2 \cdot 3/7}{1 + 3/7} = 0.6$), so they always agree on ranking even though they are not equal.
Apply De Morgan's in NLP
A content filter blocks a post if it contains "hate" AND "violence". Using De Morgan's Law, rewrite the condition for when a post passes the filter (is NOT blocked).
Show Answer
Blocked $= H \cap V$ where $H$ = {posts with "hate"}, $V$ = {posts with "violence"}. Passes filter $= (H \cap V)^c = H^c \cup V^c$ (De Morgan's Law 2). A post passes if it does NOT contain "hate" OR does NOT contain "violence" — it only needs to be missing at least one of the two terms.
Conclusion & Next Steps
Set theory gives us a precise language for talking about collections, membership, and relationships. The key takeaways:
- Sets are unordered, distinct collections — the atoms of probability and feature spaces
- Power sets grow as $2^{|A|}$ — explaining why exhaustive search over features is impossible
- Operations (union ∪, intersection ∩, complement, difference) have direct parallels in Boolean logic, SQL, and ML evaluation
- De Morgan's Laws let you convert OR conditions to AND conditions and vice versa
- Cartesian products define feature spaces — your data lives in $\mathbb{R}^d$
- Partitions are what classifiers create — disjoint regions covering the entire input space
- Jaccard / IoU quantifies set similarity — fundamental to recommendation and detection tasks
Next in the Series
In Part 3: Combinatorics, we'll build on sets to count: permutations, combinations, the binomial theorem, and the exponential growth of search spaces — giving you the tools to reason about hyperparameter grids, model configurations, and probabilistic counting arguments.