Complete Math + Probability + Statistics for ML/AI/DS Bootcamp
- Mathematical Thinking: Mindset, notation & functions
- Set Theory & Foundations: Sets, operations & ML connections
- Combinatorics: Counting, permutations & combinations
- Probability Fundamentals: Rules, Bayes & distributions
- Statistics: Descriptive to inferential
- Information Theory: Entropy, cross-entropy & KL divergence
- Linear Algebra: Vectors, matrices & transformations
- Calculus & Optimization: Derivatives, gradients & descent
- ML-Specific Math: Loss functions & regularization
- Computational Math with Python: NumPy, Pandas & simulation
- Advanced Topics: Multivariate stats & Bayesian inference
- Projects & Applications: Build from scratch: regression, Bayes, PCA
Why Mathematical Thinking?
Imagine two engineers. Both copy a neural network from a tutorial. It runs. Then the model starts predicting garbage. Engineer A opens Stack Overflow and tweaks random hyperparameters hoping something improves. Engineer B reads the loss curve, recalls that the sigmoid activation is causing vanishing gradients for deep networks, and swaps in ReLU. The model converges in minutes.
The difference isn't raw intelligence. It's mathematical literacy.
- Level 1 — Black Box User: "I run the library and get predictions." No understanding of what happens inside.
- Level 2 — Configurator: "I know which hyperparameters to tune and why." Some intuition, limited by library docs.
- Level 3 — Builder: "I can derive the algorithm from first principles and modify it." Full mathematical fluency.
This bootcamp takes you from Level 1 to Level 3 — systematically. But before any formula, we need to talk about how mathematicians think. The mindset comes before the mechanics.
How Math Powers ML
Every concept in ML is underpinned by a branch of mathematics:
flowchart LR
A[Your Data] --> B[Linear Algebra\nVectors & Matrices]
A --> C[Probability\nUncertainty & Distributions]
B --> D[Model\nFunction Composition]
C --> D
D --> E[Calculus\nGradient Descent]
E --> F[Trained Model\nPredictions]
G[Statistics\nEvaluation & Inference] --> F
H[Information Theory\nLoss Functions] --> E
None of these areas exist in isolation — they connect and reinforce each other. This series follows a deliberate order: each phase builds the vocabulary needed for the next. We start here, at Phase 0, with the prerequisite that makes everything else learnable: mathematical thinking itself.
The Mathematical Mindset
0.1.1 — Abstraction & Modeling
Abstraction is the act of stripping away irrelevant details to reveal underlying structure. It's not dumbing things down — it's identifying what actually matters for a given question.
The mathematical model that emerges is:
$$y = f(x) + \varepsilon$$
where $\varepsilon$ captures the noise we've chosen to ignore. That $\varepsilon$ is not laziness — it's an intentional modelling decision. All of machine learning is about building and refining such abstractions.
Spam Classification as Abstraction
An email has a subject, body, sender, timestamp, HTML formatting, attached images, and metadata. A spam filter abstracts this to a vector of word frequencies — a bag of words. The model then learns: emails containing "FREE!!!", "CLICK HERE", and "WINNER" are spam. By abstracting text into numbers, we made an unstructured problem solvable with algebra.
The abstraction discards meaning, grammar, and context. That's a deliberate tradeoff: we lose nuance but gain tractability. More sophisticated models (like Transformers) preserve more context — they use a richer abstraction, not a different principle.
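To make this concrete, here is a minimal sketch of the bag-of-words abstraction. The five-word vocabulary and both emails are made up purely for illustration; real spam filters use far larger vocabularies and learned weights.
import numpy as np
# Minimal sketch of the bag-of-words abstraction.
# The vocabulary and emails are invented for illustration only.
vocab = ["free", "click", "winner", "meeting", "report"]
emails = [
    "FREE prize! CLICK here, you are a WINNER",
    "Please review the report before the meeting",
]
def bag_of_words(text, vocab):
    # lowercase, strip simple punctuation, count vocabulary words
    words = text.lower().replace("!", " ").replace(",", " ").split()
    return np.array([words.count(w) for w in vocab])
for email in emails:
    print(bag_of_words(email, vocab), "<-", email)
# [1 1 1 0 0] for the spammy email, [0 0 0 1 1] for the legitimate one:
# the unstructured text is now a vector we can do algebra on.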
Key insight: When you see a mathematical formula, ask yourself: what real-world phenomenon is this abstracting, and what is it intentionally ignoring? That question will unlock 90% of what a formula is trying to say.
0.1.2 — Notation Fluency
Mathematical notation is a compressed language. Like any language, it feels intimidating until you build vocabulary. The secret is that most ML papers use a small, consistent set of symbols repeatedly.
| Symbol | Name | Meaning in ML | Example |
|---|---|---|---|
| $\sum_{i=1}^{n}$ | Summation | Sum over $n$ items | $\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$ |
| $\prod_{i=1}^{n}$ | Product | Multiply $n$ items | $\prod_{i=1}^{n} p_i = p_1 \cdot p_2 \cdots p_n$ |
| $\forall$ | For all | Applies to every element | $\forall x \in \mathbb{R}$: for every real number $x$ |
| $\exists$ | There exists | At least one element satisfies | $\exists\, w$ such that $f(w) = 0$ |
| $\in$ | Element of | Belongs to a set | $x_i \in \mathbb{R}^d$: $x_i$ is a $d$-dimensional vector |
| $\mathbb{R}^n$ | Real $n$-space | Feature vector space | $\mathbf{x} \in \mathbb{R}^{784}$ (28×28 image flattened) |
| $\hat{y}$ | "y hat" | Predicted value | $\hat{y} = \mathbf{w}^\top \mathbf{x} + b$ |
| $\|\cdot\|$ | Norm | Size/length of vector | $\|\mathbf{w}\|^2 = \sum_i w_i^2$ |
| $\nabla$ | Nabla / Gradient | Direction of steepest ascent | $\nabla_\theta \mathcal{L}$: gradient of loss w.r.t. $\theta$ |
| $\mathbb{E}[\cdot]$ | Expectation | Average over distribution | $\mathbb{E}[X] = \sum_x x \cdot P(X=x)$ |
Greek letters you'll see constantly: $\alpha$ (learning rate), $\beta$ (regression coefficients), $\theta$ (model parameters), $\mu$ (mean), $\sigma$ (standard deviation), $\lambda$ (regularisation strength), $\varepsilon$ (small noise / error), $\eta$ (step size).
import numpy as np
# Notation in code: the Sigma (summation) symbol
# Mathematical: sum_{i=1}^{n} x_i
x = np.array([3, 7, 2, 9, 1])
total = np.sum(x) # Σ x_i
print("Sum:", total) # 22
# The Product symbol: prod_{i=1}^{n} x_i
product = np.prod(x)
print("Product:", product) # 378
# Norm ||w||^2 = sum_i w_i^2
w = np.array([0.3, -0.7, 0.1])
norm_squared = np.sum(w ** 2)
print("||w||²:", round(norm_squared, 4)) # 0.59
0.1.3 — Proof vs Intuition
Pure mathematics demands proof: a rigorous logical argument showing a statement is always true. Machine learning is more pragmatic — you'll use both proof and intuition, knowing when each is appropriate.
Example — Why does gradient descent converge?
- Intuition: Imagine a ball rolling down a hilly surface. At every step, it moves in the direction of steepest descent. Eventually it reaches a valley (minimum). The ball "knows" where to go locally without needing to see the whole landscape.
- Proof: For a convex function $f$ with Lipschitz-continuous gradient, with step size $\alpha \leq \frac{1}{L}$ (where $L$ is the Lipschitz constant), gradient descent converges at rate $O(1/k)$ after $k$ iterations.
The proof tells you when it works and how fast. The intuition tells you why it makes sense. Non-convex neural network loss landscapes complicate the proof — but the intuition still guides practice.
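Here is the intuition in code: a minimal sketch of gradient descent on the convex quadratic $f(w) = (w-3)^2$. The step size 0.1 is an illustrative choice, not a tuned value; watch the loss shrink toward the minimum at $w = 3$.
# The 'ball rolling downhill' intuition in code: gradient descent on the
# convex quadratic loss(w) = (w - 3)^2, whose minimum sits at w = 3.
def loss(w):
    return (w - 3) ** 2
def grad(w):
    return 2 * (w - 3)
w = 0.0          # arbitrary starting point
alpha = 0.1      # step size (learning rate), chosen for illustration
for k in range(1, 31):
    w = w - alpha * grad(w)          # step opposite the gradient
    if k % 10 == 0:
        print(f"iteration {k:2d}: w = {w:.5f}, loss = {loss(w):.8f}")
# w approaches 3 and the loss approaches 0, matching the intuition above.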
For this bootcamp: We'll develop both. Some sections are proof-heavy (set theory, probability axioms), others are intuition-led (gradient descent, neural networks). You'll learn to recognise which is appropriate for each context.
0.1.4 — Approximation vs Exactness
Here's a fact that surprises many beginners: virtually nothing in machine learning is exact. And that's completely intentional.
Why Computers Can't Be Exact
The number $\frac{1}{3} = 0.33333...$ has infinitely many decimal places. Computers store numbers in 32-bit or 64-bit floating point format, so any number whose binary expansion doesn't terminate (including $\frac{1}{3}$, $0.1$, and irrationals like $\pi$, $e$, $\sqrt{2}$) gets truncated. This creates floating point error. In most ML applications, this error is negligible — but in certain operations (like subtracting two nearly-equal numbers), it can amplify catastrophically.
import numpy as np
# Floating point arithmetic — not always exact
a = 0.1 + 0.2
print(a) # 0.30000000000000004 (not 0.3!)
print(a == 0.3) # False
# Use np.isclose() for safe comparisons
print(np.isclose(a, 0.3)) # True
# Catastrophic cancellation — subtracting nearby numbers
x = 1.0000001
y = 1.0000000
result = x - y
print(result) # ~1e-7, but accumulated errors can dominate
# ML implication: always use log-sum-exp trick for numerical stability
# Instead of log(exp(a) + exp(b)):
a_val, b_val = 1000.0, 999.0
# Naive (overflows):
# np.log(np.exp(a_val) + np.exp(b_val)) # inf!
# Stable version:
m = max(a_val, b_val)
stable = m + np.log(np.exp(a_val - m) + np.exp(b_val - m))
print("log-sum-exp:", stable) # 1000.3132...
In ML, we accept approximations willingly — stochastic gradient descent uses a random subset of data each step (an approximation of the full gradient). Monte Carlo methods estimate integrals by sampling. Variational inference approximates intractable posterior distributions. The art is knowing how much error is acceptable and where exactness genuinely matters.
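To see how cheaply a sampling approximation can buy a good answer, here is a minimal Monte Carlo sketch: estimating $\mathbb{E}[X^2]$ for $X \sim \mathcal{N}(0,1)$ by averaging samples. The exact answer is 1; the estimate typically improves as the sample size grows.
import numpy as np
# Monte Carlo approximation: estimate E[X^2] for X ~ N(0, 1) by sampling.
# The exact answer is 1; the sampling error typically shrinks with more samples.
rng = np.random.default_rng(0)
for n in [100, 10_000, 1_000_000]:
    samples = rng.standard_normal(n)
    estimate = np.mean(samples ** 2)
    print(f"n = {n:>9,}   estimate = {estimate:.4f}   |error| = {abs(estimate - 1):.4f}")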
Where exactness and numerical care genuinely matter: loss computation (use log_softmax, not log(softmax(x))); matrix inversion (prefer the pseudoinverse for ill-conditioned matrices); probability calculations (work in log-space to prevent underflow with very small values).
Functions & Graphs
A function is a rule that takes an input and produces a unique output. Every ML model is, fundamentally, a function: it takes in features and outputs a prediction. Before building complex models, you need deep familiarity with the building blocks.
For example, a classifier maps from feature space to label space: $f: \mathbb{R}^d \to \{0, 1\}$. A regressor maps to real numbers: $f: \mathbb{R}^d \to \mathbb{R}$.
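As a tiny illustration, a classifier really is just a function from $\mathbb{R}^d$ to $\{0, 1\}$. The weights below are made up purely for demonstration:
import numpy as np
# A classifier is literally a function f: R^d -> {0, 1}.
# Minimal sketch: a thresholded linear score with made-up weights.
def classify(x, w, b):
    return int(w @ x + b > 0)    # 1 if the score is positive, else 0
w = np.array([1.5, -2.0])
b = 0.5
print(classify(np.array([2.0, 0.5]), w, b))  # score = 3.0 - 1.0 + 0.5 = 2.5 -> 1
print(classify(np.array([0.0, 1.0]), w, b))  # score = 0.0 - 2.0 + 0.5 = -1.5 -> 0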
0.2.1 — Linear Functions
The simplest and most important class. A linear function of one variable:
$$f(x) = mx + b$$
where $m$ is the slope (rate of change) and $b$ is the y-intercept (value when $x=0$). The slope tells you: "for every unit increase in $x$, $f(x)$ increases by $m$."
In multiple dimensions (the foundation of linear regression):
$$f(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + b$$
where $\mathbf{w} = [w_1, w_2, \ldots, w_n]^\top$ is the weight vector and $b$ is the bias. Every linear regression model, every neuron in a neural network before activation, is exactly this formula.
import numpy as np
import matplotlib.pyplot as plt
# Linear function: f(x) = 2x + 1
x = np.linspace(-5, 5, 100)
f = 2 * x + 1 # slope=2, intercept=1
plt.figure(figsize=(7, 4))
plt.plot(x, f, color='#3B9797', linewidth=2.5, label='f(x) = 2x + 1')
plt.axhline(0, color='gray', linewidth=0.5)
plt.axvline(0, color='gray', linewidth=0.5)
plt.scatter([0], [1], color='#BF092F', s=80, zorder=5, label='y-intercept (0, 1)')
plt.xlabel('x')
plt.ylabel('f(x)')
plt.title('Linear Function: f(x) = mx + b')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
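The same formula in multiple dimensions is just a dot product plus a bias. A minimal sketch with made-up weights and a single three-feature input:
import numpy as np
# The formula in n dimensions: f(x) = w^T x + b (a dot product plus a bias).
# Weights and features below are made up for illustration.
w = np.array([2.0, -1.0, 0.5])    # weight vector
b = 1.0                            # bias
x = np.array([3.0, 4.0, 2.0])     # one 3-feature input
y_hat = w @ x + b                  # 2*3 + (-1)*4 + 0.5*2 + 1 = 4.0
print("prediction:", y_hat)        # 4.0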
0.2.2 — Polynomial Functions
A polynomial extends the linear idea to higher powers:
$$f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$$
The degree $n$ determines the shape and complexity:
- Degree 1 (linear): straight line — $f(x) = 2x + 1$
- Degree 2 (quadratic): parabola — $f(x) = x^2 - 4$, opens up or down
- Degree 3 (cubic): S-curve — one inflection point, can model more complex relationships
- Degree $n$: up to $n-1$ turning points
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
# Original feature: one variable x
x = np.array([[1], [2], [3], [4], [5]])
# Add polynomial features up to degree 3
poly = PolynomialFeatures(degree=3, include_bias=False)
x_poly = poly.fit_transform(x)
# Now each sample has features: [x, x^2, x^3]
print("Original features:\n", x.T)
print("Polynomial features:\n", x_poly.T)
# Output row 0: [1, 2, 3, 4, 5] <- x
# Output row 1: [1, 4, 9, 16, 25] <- x^2
# Output row 2: [1, 8, 27, 64, 125] <- x^3
0.2.3 — Exponential & Logarithmic Functions
The exponential function is everywhere in ML:
$$f(x) = e^x, \qquad e \approx 2.71828\ldots$$
Its defining property: the derivative of $e^x$ is $e^x$ itself. It's the only function that equals its own rate of change. This makes it uniquely tractable in calculus-heavy ML derivations.
The natural logarithm $\ln(x) = \log_e(x)$ is the inverse:
$$\ln(e^x) = x \quad \text{and} \quad e^{\ln(x)} = x \ \ (x > 0)$$
Where you'll see these in ML:
- Sigmoid activation: $\sigma(x) = \dfrac{1}{1 + e^{-x}}$ — squashes any value into $(0, 1)$
- Softmax: $\text{softmax}(x_i) = \dfrac{e^{x_i}}{\sum_j e^{x_j}}$ — turns logits into a probability distribution
- Cross-entropy loss: $\mathcal{L} = -\sum_i y_i \ln(\hat{p}_i)$ — penalises wrong probability assignments harshly
- Log-likelihood: products of small probabilities become sums of logs — numerically stable and analytically convenient
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-3, 3, 200)
# Three core exponential/log functions in ML
sigmoid = 1 / (1 + np.exp(-x))
exp_x = np.exp(x)
x_pos = np.linspace(0.01, 3, 200)
log_x = np.log(x_pos)
fig, axes = plt.subplots(1, 3, figsize=(13, 4))
axes[0].plot(x, sigmoid, color='#3B9797', linewidth=2.5)
axes[0].set_title('Sigmoid: 1 / (1 + e^{-x})')
axes[0].axhline(0.5, linestyle='--', color='gray', alpha=0.5)
axes[0].set_xlabel('x'); axes[0].set_ylabel('σ(x)')
axes[1].plot(x, exp_x, color='#16476A', linewidth=2.5)
axes[1].set_title('Exponential: e^x')
axes[1].set_xlabel('x'); axes[1].set_ylabel('e^x')
axes[2].plot(x_pos, log_x, color='#BF092F', linewidth=2.5)
axes[2].set_title('Natural Log: ln(x)')
axes[2].set_xlabel('x'); axes[2].set_ylabel('ln(x)')
for ax in axes:
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
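One of the bullet points above said that products of small probabilities become sums of logs. A quick numeric check, as a minimal sketch with made-up probabilities:
import numpy as np
# "Products of small probabilities become sums of logs": a quick numeric check.
# Multiplying 1000 made-up probabilities of 0.01 underflows to zero,
# while the equivalent sum of logs stays perfectly representable.
p = np.full(1000, 0.01)
print("product of probs:", np.prod(p))          # 0.0 (underflow)
print("sum of log-probs:", np.sum(np.log(p)))   # about -4605.17, no underflow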
0.2.4 — Function Transformations
Every activation function is a transformed version of a simpler function. Understanding transformations means you can decode any function visually:
| Transformation | Formula | Effect on Graph | ML Example |
|---|---|---|---|
| Vertical shift up by $c$ | $f(x) + c$ | Move graph $c$ units up | Adding bias $b$: $\mathbf{w}^\top\mathbf{x} + b$ |
| Horizontal shift right by $c$ | $f(x - c)$ | Move graph $c$ units right | Shifted activation threshold |
| Vertical scale by $a$ | $a \cdot f(x)$ | Stretch ($|a|>1$) or compress ($|a|<1$) vertically | Learning rate scaling gradient |
| Reflection about $x$-axis | $-f(x)$ | Flip upside down | Negating loss for maximisation |
| Horizontal scale by $\frac{1}{a}$ | $f(ax)$ | Compress horizontally by $a$ | Temperature in softmax: $e^{x/T}$ |
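The last row of the table is easy to see numerically. A minimal sketch assuming the common temperature-scaled softmax $\text{softmax}(x/T)$, with made-up logits:
import numpy as np
# Horizontal scaling in practice: temperature T in softmax(x / T).
# The logits are made up; the softmax definition is the standard one.
def softmax(z):
    e = np.exp(z - np.max(z))    # subtract max for numerical stability
    return e / e.sum()
logits = np.array([2.0, 1.0, 0.1])
for T in [0.5, 1.0, 5.0]:
    print(f"T = {T:>3}: {softmax(logits / T).round(3)}")
# Small T sharpens the distribution; large T flattens it toward uniform.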
0.2.5 — Composition of Functions
Composition means applying one function to the output of another:
$$(f \circ g)(x) = f(g(x))$$
Read it right to left: first apply $g$, then apply $f$ to the result. A deep neural network is exactly such a chain of composed layers:
flowchart LR
X["x ∈ ℝ^d\n(Input)"] --> L1["Layer 1\nh₁ = σ(W₁x + b₁)"]
L1 --> L2["Layer 2\nh₂ = σ(W₂h₁ + b₂)"]
L2 --> OUT["Output\nŷ = softmax(W₃h₂ + b₃)"]
OUT --> LOSS["Loss ℒ(ŷ, y)"]
LOSS -->|"Backprop:\nchain rule"| L2
LOSS -->|"chain rule"| L1
During training, we need $\frac{\partial \mathcal{L}}{\partial W_1}$ — how much does the loss change with respect to the first layer's weights? The chain rule for compositions gives us exactly that, layer by layer. We'll derive this fully in Part 8 (Calculus). For now, appreciate that knowing how composition works is what makes that derivation possible.
import numpy as np
# Composition of functions — the neural network forward pass
def sigmoid(z):
return 1 / (1 + np.exp(-z))
def softmax(z):
e = np.exp(z - np.max(z)) # subtract max for numerical stability
return e / e.sum()
# Simulate a 3-layer network: input -> hidden1 -> hidden2 -> output
np.random.seed(42)
x = np.random.randn(4) # 4 input features
W1 = np.random.randn(6, 4) * 0.1 # 6 hidden units, 4 inputs
b1 = np.zeros(6)
W2 = np.random.randn(6, 6) * 0.1 # 6 -> 6
b2 = np.zeros(6)
W3 = np.random.randn(3, 6) * 0.1 # 6 -> 3 output classes
b3 = np.zeros(3)
# Forward pass: composition f3(f2(f1(x)))
h1 = sigmoid(W1 @ x + b1) # g1(x)
h2 = sigmoid(W2 @ h1 + b2) # g2(g1(x))
y_hat = softmax(W3 @ h2 + b3) # g3(g2(g1(x)))
print("Input: ", x.round(3))
print("Hidden layer 1:", h1.round(3))
print("Hidden layer 2:", h2.round(3))
print("Output probs: ", y_hat.round(4))
print("Predicted class:", np.argmax(y_hat))
Interactive Function Explorer
The chart below visualises the four core function families on the same axes. Use it to develop geometric intuition — notice how their rates of change differ.
Practice Exercises
Decode These Expressions
Translate each mathematical expression into plain English, then compute the value:
- $\displaystyle\sum_{i=1}^{4} i^2$ (hint: $1^2 + 2^2 + \ldots$)
- $\displaystyle\prod_{k=1}^{3} (2k - 1)$ (hint: $1 \times 3 \times 5$)
- $\mathbf{w}^\top \mathbf{x}$ where $\mathbf{w} = [2, -1, 3]^\top$ and $\mathbf{x} = [1, 4, 2]^\top$
Show Answers
- $1 + 4 + 9 + 16 = 30$
- $(2\cdot1-1)(2\cdot2-1)(2\cdot3-1) = 1 \cdot 3 \cdot 5 = 15$
- $2(1) + (-1)(4) + 3(2) = 2 - 4 + 6 = 4$
Identify the Function Family
For each ML formula below, identify which function family it belongs to (linear, polynomial, exponential, or composition), and explain why:
- Sigmoid: $\sigma(z) = \dfrac{1}{1 + e^{-z}}$
- Linear regression prediction: $\hat{y} = \mathbf{w}^\top \mathbf{x} + b$
- ReLU activation: $\text{ReLU}(x) = \max(0, x)$
- A two-layer neural net: $\hat{y} = \sigma(W_2 \sigma(W_1 x + b_1) + b_2)$
Show Answers
- Composition of exponential and linear: First compute $-z$ (linear), then $e^{-z}$ (exponential), then $1/(1 + \cdot)$ (rational/transformation)
- Linear: It's $\mathbf{w}^\top \mathbf{x} + b$ — a dot product (sum of products) plus a constant. Exactly $f(x) = mx + b$ in multiple dimensions.
- Piecewise linear: Two linear pieces joined at $x=0$. Not polynomial — it has a non-smooth "kink".
- Nested composition: Apply linear, then sigmoid, then linear again, then sigmoid — $(f_4 \circ f_3 \circ f_2 \circ f_1)(x)$.
Implement from Scratch
Without using any ML library, implement a linear function and evaluate it on a dataset. Then plot it against the data points.
import numpy as np
import matplotlib.pyplot as plt
# TODO: complete the linear function
def linear(x, w, b):
# Your implementation here
pass
# Generate toy data: y = 3x - 2 + noise
np.random.seed(0)
x_data = np.linspace(-3, 3, 30)
y_data = 3 * x_data - 2 + np.random.randn(30) * 0.8
# Evaluate your function with w=3, b=-2
# y_pred = linear(x_data, w=3, b=-2)
# Plot: scatter data, line prediction
# plt.scatter(x_data, y_data, ...)
# plt.plot(x_data, y_pred, ...)
# plt.show()
Show Solution
import numpy as np
import matplotlib.pyplot as plt
def linear(x, w, b):
return w * x + b # f(x) = wx + b
np.random.seed(0)
x_data = np.linspace(-3, 3, 30)
y_data = 3 * x_data - 2 + np.random.randn(30) * 0.8
y_pred = linear(x_data, w=3, b=-2)
plt.figure(figsize=(7, 4))
plt.scatter(x_data, y_data, color='#3B9797', label='Data', alpha=0.7)
plt.plot(x_data, y_pred, color='#BF092F', linewidth=2, label='f(x) = 3x - 2')
plt.xlabel('x'); plt.ylabel('y')
plt.legend(); plt.grid(True, alpha=0.3)
plt.title('Linear Function vs Data')
plt.tight_layout(); plt.show()
Conclusion & Next Steps
In this first part of the bootcamp, we've laid the psychological and conceptual groundwork for everything that follows:
- Abstraction & Modeling — every ML model is an intentional simplification of reality; the art is choosing what to keep
- Notation Fluency — $\Sigma$, $\Pi$, $\nabla$, $\mathbb{E}[\cdot]$, Greek letters — these are vocabulary, not barriers
- Proof vs Intuition — use intuition to understand, use proof to validate; both are necessary
- Approximation — nearly everything in ML is approximate by design; know when exactness matters (log-space, numerical stability)
- Functions — linear, polynomial, exponential, and compositions are the atoms of all ML models
Next in the Series
In Part 2: Set Theory & Foundations, we'll build the rigorous language for talking about collections of objects — sets, subsets, power sets, operations (union, intersection, complement), De Morgan's Laws, and their direct connections to feature spaces and probability event spaces in ML.