
Part 10: Computational Math with Python

April 26, 2026 Wasil Zafar 18 min read

Theory only takes you so far. This part is the hands-on counterpart to the rest of the series — using Python's scientific stack (NumPy, SciPy, SymPy) to compute, simulate, and verify the mathematical concepts we've built up, from numerical integration to symbolic differentiation and Monte Carlo methods.

Table of Contents

  1. NumPy Foundation
  2. Broadcasting
  3. np.linalg
  4. SciPy: Optimization & Stats
  5. SymPy: Symbolic Math
  6. Monte Carlo Methods
  7. Numerical Integration
  8. Vectorization vs Loops
  9. Practice Exercises
  10. Conclusion & Next Steps

NumPy Foundation

NumPy provides contiguous-memory arrays and BLAS-backed math operations, making it 10–1000× faster than pure Python loops for numerical work.

import numpy as np

# Create arrays from Python lists
a = np.array([1, 2, 3, 4, 5], dtype=float)
b = np.array([5, 4, 3, 2, 1], dtype=float)

# Element-wise operations
print("a + b =", a + b)
print("a * b =", a * b)
print("a ** 2 =", a ** 2)
print("dot product:", np.dot(a, b))  # 1*5 + 2*4 + 3*3 + 4*2 + 5*1 = 35

# Aggregation
print("\nStatistics on a:")
print(f"  mean={a.mean():.2f}, std={a.std():.2f}, min={a.min()}, max={a.max()}")
print(f"  sum={a.sum()}, cumsum={np.cumsum(a)}")

# 2-D arrays (matrices)
M = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]], dtype=float)

print("\nMatrix M:")
print(M)
print(f"Shape: {M.shape}, Rank (actual): {np.linalg.matrix_rank(M)}")
print(f"Trace (sum of diagonal): {np.trace(M):.2f}")
print(f"Column sums: {M.sum(axis=0)}")
print(f"Row sums:    {M.sum(axis=1)}")
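One foundation-level gotcha worth internalizing before moving on: basic slicing returns a *view* that shares memory with the original array, so in-place edits propagate back. A quick sketch (my own illustrative addition):

```python
import numpy as np

M = np.arange(9, dtype=float).reshape(3, 3)

row_view = M[0]        # basic slicing returns a view sharing M's memory
row_view[:] = -1.0     # writes through to M
print(M[0])            # first row of M is now all -1

row_copy = M[1].copy() # .copy() allocates independent memory
row_copy[:] = 99.0
print(M[1])            # unchanged: [3. 4. 5.]

# np.shares_memory confirms which is which
print(np.shares_memory(M, row_view))   # True
print(np.shares_memory(M, row_copy))   # False
```

This is why preprocessing code often calls `.copy()` defensively before mutating a slice.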

Broadcasting

Broadcasting lets NumPy perform operations on arrays of different (but compatible) shapes without explicit copying. The rule: dimensions are aligned from the right; a size-1 dimension is stretched to match the other.

import numpy as np

# Feature normalization via broadcasting (common ML preprocessing)
X = np.array([[1.0, 200.0,  0.5],
              [2.0, 150.0,  1.5],
              [3.0, 300.0,  0.8],
              [4.0, 250.0,  2.1]])  # (4, 3) matrix: 4 samples, 3 features

mu  = X.mean(axis=0)   # (3,) — mean per feature
std = X.std(axis=0)    # (3,) — std per feature
X_norm = (X - mu) / (std + 1e-8)   # broadcast: (4,3) - (3,) → (4,3)

print("Feature means:", np.round(mu, 2))
print("Feature stds: ", np.round(std, 2))
print("\nNormalized X (should have ~zero mean, ~unit variance per column):")
print(np.round(X_norm, 3))
print("\nVerification — column means:", np.round(X_norm.mean(axis=0), 6))
print("Verification — column stds: ", np.round(X_norm.std(axis=0),  4))

# Broadcasting with outer product (no loops needed)
u = np.array([1, 2, 3])
v = np.array([10, 20])
outer = u[:, np.newaxis] * v[np.newaxis, :]   # (3,1) * (1,2) → (3,2)
print("\nOuter product u ⊗ v:")
print(outer)
print("Expected:", np.outer(u, v))
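The right-alignment rule can be checked before doing any arithmetic with `np.broadcast_shapes` (available since NumPy 1.20) — a sketch:

```python
import numpy as np

# Shapes align from the right; a size-1 axis stretches to match.
print(np.broadcast_shapes((4, 3), (3,)))     # (4, 3) — the X - mu case above
print(np.broadcast_shapes((3, 1), (1, 2)))   # (3, 2) — the outer-product case
print(np.broadcast_shapes((4, 1), (3,)))     # (4, 3) — column against row

# Incompatible trailing dimensions (3 vs 2) raise immediately
try:
    np.broadcast_shapes((4, 3), (2,))
except ValueError as e:
    print("Incompatible:", e)
```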

np.linalg — Linear Algebra Operations

NumPy's linear algebra submodule wraps LAPACK/BLAS for high-performance matrix operations:

import numpy as np

# Construct a well-conditioned symmetric positive definite matrix
rng = np.random.default_rng(42)
A_raw = rng.standard_normal((4, 4))
A = A_raw @ A_raw.T + 4 * np.eye(4)  # guaranteed SPD

print("Matrix A (SPD):")
print(np.round(A, 3))

# Core operations
det   = np.linalg.det(A)
inv_A = np.linalg.inv(A)
eigvals, eigvecs = np.linalg.eigh(A)   # eigh for symmetric matrices
U, S, Vt = np.linalg.svd(A)

print(f"\nDeterminant:        {det:.4f}")
print(f"Eigenvalues:        {np.round(eigvals, 4)}")
print(f"Singular values:    {np.round(S, 4)}")
print(f"Condition number:   {S.max() / S.min():.4f}  (S_max / S_min)")

# Solve linear system Ax = b (preferred over inverting A — numerically stable)
b = np.array([1.0, 2.0, 3.0, 4.0])
x = np.linalg.solve(A, b)
print(f"\nSolution x to Ax=b: {np.round(x, 4)}")
print(f"Verification Ax:     {np.round(A @ x, 4)}")
print(f"Residual norm:       {np.linalg.norm(A @ x - b):.2e}")

# Norm varieties
v = np.array([3.0, 4.0])
print(f"\nL1 norm: {np.linalg.norm(v, 1):.2f}, L2 norm: {np.linalg.norm(v, 2):.2f}, L∞ norm: {np.linalg.norm(v, np.inf):.2f}")
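Because A above is SPD by construction, it also admits a Cholesky factorization — the fastest and most stable route to solving SPD systems. A minimal sketch, rebuilding the same matrix for self-containedness:

```python
import numpy as np

rng = np.random.default_rng(42)
A_raw = rng.standard_normal((4, 4))
A = A_raw @ A_raw.T + 4 * np.eye(4)   # SPD, same construction as above
b = np.array([1.0, 2.0, 3.0, 4.0])

L = np.linalg.cholesky(A)             # lower-triangular L with A = L @ L.T
print("Reconstruction error:", np.linalg.norm(L @ L.T - A))

# Solve Ax = b via two triangular solves: L y = b, then L.T x = y
# (scipy.linalg.solve_triangular would exploit the structure; np.linalg.solve
#  works here but doesn't know L is triangular)
y = np.linalg.solve(L, b)
x = np.linalg.solve(L.T, y)
print("Residual norm:", np.linalg.norm(A @ x - b))
```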

SciPy: Optimization & Statistics

SciPy extends NumPy with optimization algorithms, probability distributions, and numerical methods used throughout scientific computing and ML.

import numpy as np
from scipy import optimize, stats

# --- Optimization: minimize a function ---
def rosenbrock(x):
    """Classic optimization test function: banana-shaped valley."""
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

x0 = np.array([0.0, 0.0])    # initial guess

result_nelder = optimize.minimize(rosenbrock, x0, method='Nelder-Mead')
result_bfgs   = optimize.minimize(rosenbrock, x0, method='BFGS')

print("Rosenbrock minimization (true minimum at [1, 1]):")
print(f"  Nelder-Mead: x={np.round(result_nelder.x, 6)}, f={result_nelder.fun:.2e}, iters={result_nelder.nit}")
print(f"  BFGS:        x={np.round(result_bfgs.x, 6)}, f={result_bfgs.fun:.2e}, iters={result_bfgs.nit}")

# --- Scalar root finding ---
def f_scalar(x):
    return x**3 - x - 2

root = optimize.brentq(f_scalar, 1, 2)
print(f"\nRoot of x³ - x - 2 = 0 on [1,2]: x ≈ {root:.6f}")
print(f"Verification f({root:.4f}) = {f_scalar(root):.2e}")

# --- Statistical testing ---
np.random.seed(42)
sample_a = np.random.normal(loc=5.0, scale=1.0, size=30)
sample_b = np.random.normal(loc=5.5, scale=1.0, size=30)

t_stat, p_value = stats.ttest_ind(sample_a, sample_b)
print(f"\nTwo-sample t-test:")
print(f"  Mean A={sample_a.mean():.3f}, Mean B={sample_b.mean():.3f}")
print(f"  t-statistic={t_stat:.3f}, p-value={p_value:.4f}")
print(f"  Conclusion: {'Reject H₀ (significant difference)' if p_value < 0.05 else 'Fail to reject H₀'} at α=0.05")
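SciPy's `optimize` module also handles nonlinear least squares via `curve_fit`. A sketch fitting synthetic exponential-decay data (the model and noise level here are my own illustration, not part of the series):

```python
import numpy as np
from scipy import optimize

def model(t, a, k):
    """Exponential decay a * exp(-k t)."""
    return a * np.exp(-k * t)

rng = np.random.default_rng(0)
t = np.linspace(0, 5, 50)
y_obs = model(t, 2.5, 1.3) + 0.05 * rng.standard_normal(t.size)  # noisy data

popt, pcov = optimize.curve_fit(model, t, y_obs, p0=[1.0, 1.0])
perr = np.sqrt(np.diag(pcov))          # 1-sigma parameter uncertainties
print(f"a = {popt[0]:.3f} ± {perr[0]:.3f}  (true 2.5)")
print(f"k = {popt[1]:.3f} ± {perr[1]:.3f}  (true 1.3)")
```

The covariance matrix `pcov` comes for free, so every fitted parameter gets an error bar.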

SymPy: Symbolic Mathematics

SymPy performs exact symbolic computation — computing derivatives, integrals, and solving equations in closed form. Essential for verifying gradient derivations and exploring mathematical relationships.

from sympy import symbols, diff, integrate, solve, simplify, exp, log, pi, sqrt, latex, sin, cos

# Define symbolic variables
x, mu, sigma, lam = symbols('x mu sigma lambda', real=True, positive=True)

# --- Differentiate: sigmoid function ---
sigmoid_expr = 1 / (1 + exp(-x))
sigmoid_grad = diff(sigmoid_expr, x)
print("σ(x) =", sigmoid_expr)
print("σ'(x) =", simplify(sigmoid_grad))
# Should simplify to σ(x)(1 - σ(x))

# --- Chain rule: compose functions ---
u_expr = x**2 + 1
composite = sin(u_expr)
print("\nd/dx sin(x² + 1) =", diff(composite, x))  # cos(x²+1) * 2x

# --- Integrate: Gaussian normalizer ---
from sympy import oo
gaussian_unnorm = exp(-x**2 / (2 * sigma**2))
exact_integral = integrate(gaussian_unnorm, (x, -oo, oo))  # exact improper integral
print("\n∫ exp(-x²/2σ²) dx from -∞ to ∞ =", simplify(exact_integral))
# Output: sqrt(2)*sqrt(pi)*sigma = σ√(2π) — the Gaussian normalizer

# --- Solve: critical points ---
f_poly = x**4 - 4*x**3 + 6*x**2 - 4*x + 1
f_prime = diff(f_poly, x)
critical_points = solve(f_prime, x)
print(f"\nf(x) = (x-1)⁴  →  f'(x) = {f_prime}")
print(f"Critical points: {critical_points}")

# --- LaTeX output for typesetting ---
expr = x**2 / (1 + sqrt(x))
print(f"\nLaTeX for x²/(1+√x): {latex(expr)}")
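Symbolic results can be cross-checked numerically with `lambdify`, which compiles a SymPy expression into a fast NumPy-backed function — a sketch verifying the sigmoid derivative derived above:

```python
import numpy as np
from sympy import symbols, exp, diff, lambdify

x = symbols('x', real=True)
sigmoid = 1 / (1 + exp(-x))
grad = diff(sigmoid, x)

# Compile both expressions to NumPy callables
f  = lambdify(x, sigmoid, modules='numpy')
fp = lambdify(x, grad, modules='numpy')

xs = np.linspace(-5, 5, 101)
# The identity σ'(x) = σ(x)(1 - σ(x)) should hold to machine precision
print("Max deviation from σ(1-σ):", np.max(np.abs(fp(xs) - f(xs) * (1 - f(xs)))))

# Central finite differences as an independent sanity check
h = 1e-6
fd = (f(xs + h) - f(xs - h)) / (2 * h)
print("Max deviation from finite differences:", np.max(np.abs(fp(xs) - fd)))
```

This symbolic-then-numeric workflow is a cheap way to catch sign errors in hand-derived gradients.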

Monte Carlo Methods

Monte Carlo methods approximate deterministic quantities using random sampling. They shine when analytical solutions are intractable or integration is high-dimensional.

import numpy as np

# --- Monte Carlo π estimation ---
np.random.seed(42)
n_samples = 1_000_000

x = np.random.uniform(-1, 1, n_samples)
y = np.random.uniform(-1, 1, n_samples)

inside_circle = (x**2 + y**2) <= 1.0
pi_estimate = 4 * inside_circle.mean()
print(f"Monte Carlo π estimate (n={n_samples:,}): {pi_estimate:.6f}")
print(f"True π:                                  {np.pi:.6f}")
print(f"Error: {abs(pi_estimate - np.pi):.6f}")

# Standard error of the estimator (σ/√n)
p_hat = inside_circle.mean()
se = np.sqrt(p_hat * (1 - p_hat) / n_samples)
print(f"95% CI: ({4*(p_hat - 1.96*se):.5f}, {4*(p_hat + 1.96*se):.5f})")

# --- Monte Carlo integration: E[X²] for X ~ Uniform(0,1) ---
# True value: ∫₀¹ x² dx = 1/3 ≈ 0.3333
np.random.seed(0)
samples = np.random.uniform(0, 1, 100_000)
mc_integral = (samples**2).mean()
print(f"\nMC estimate of ∫₀¹ x² dx = {mc_integral:.6f}  (true: {1/3:.6f})")

# --- Bootstrap confidence interval for the median ---
np.random.seed(1)
data = np.random.lognormal(mean=0, sigma=0.5, size=50)
sample_median = np.median(data)

boot_medians = [np.median(np.random.choice(data, size=len(data), replace=True))
                for _ in range(5000)]
boot_ci = np.percentile(boot_medians, [2.5, 97.5])
print(f"\nBootstrap median CI (n=50, B=5000 resamples):")
print(f"  Sample median: {sample_median:.4f}")
print(f"  95% Bootstrap CI: [{boot_ci[0]:.4f}, {boot_ci[1]:.4f}]")
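The 1/√n convergence of Monte Carlo can be observed directly by re-running the π estimator at increasing sample sizes — a sketch (seed and sizes are my choice):

```python
import numpy as np

rng = np.random.default_rng(7)
print(f"{'n':>10}  {'estimate':>10}  {'|error|':>10}  {'1.96·SE':>10}")
for n in [10**3, 10**4, 10**5, 10**6]:
    x = rng.uniform(-1, 1, n)
    y = rng.uniform(-1, 1, n)
    p_hat = ((x**2 + y**2) <= 1.0).mean()
    est = 4 * p_hat
    se = 4 * np.sqrt(p_hat * (1 - p_hat) / n)   # error bar shrinks like 1/√n
    print(f"{n:>10,}  {est:>10.5f}  {abs(est - np.pi):>10.5f}  {1.96*se:>10.5f}")
```

Each 100× increase in n buys roughly one extra decimal digit — the practical cost of the 1/√n rate.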

Numerical Integration

When analytical antiderivatives are unavailable, we approximate integrals numerically. The accuracy improves with smaller step sizes.

import numpy as np
from scipy import integrate as sci_integrate

def f(x):
    """sin(x²) — no elementary antiderivative (a Fresnel-type integral)."""
    return np.sin(x**2)

a, b = 0.0, np.sqrt(np.pi)

# Trapezoidal rule: O(h²) error
def trapezoidal(func, a, b, n):
    x = np.linspace(a, b, n + 1)
    y = func(x)
    h = (b - a) / n
    return h * (0.5 * y[0] + np.sum(y[1:-1]) + 0.5 * y[-1])

# Simpson's 1/3 rule: O(h⁴) error (requires even n)
def simpsons(func, a, b, n):
    if n % 2 != 0:
        n += 1
    x = np.linspace(a, b, n + 1)
    y = func(x)
    h = (b - a) / n
    return h / 3 * (y[0] + 4*np.sum(y[1:-1:2]) + 2*np.sum(y[2:-2:2]) + y[-1])

# SciPy adaptive quadrature (high accuracy reference)
scipy_result, scipy_error = sci_integrate.quad(f, a, b)

print(f"∫₀^√π sin(x²) dx:")
print(f"  SciPy adaptive quadrature: {scipy_result:.8f} (error bound: {scipy_error:.2e})")
for n in [10, 100, 1000]:
    trap = trapezoidal(f, a, b, n)
    simp = simpsons(f, a, b, n)
    print(f"  n={n:5d}: Trapezoidal={trap:.8f} (err={abs(trap-scipy_result):.2e}), "
          f"Simpson={simp:.8f} (err={abs(simp-scipy_result):.2e})")

# Expected: Simpson's converges ~1000× faster than Trapezoidal for smooth functions
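NumPy also ships a built-in trapezoidal helper — `np.trapezoid` in NumPy ≥ 2.0, previously `np.trapz` — which should agree closely with the hand-rolled version above. A sketch handling both names:

```python
import numpy as np
from scipy import integrate as sci_integrate

f = lambda x: np.sin(x**2)
a, b = 0.0, np.sqrt(np.pi)
xs = np.linspace(a, b, 1001)          # 1000 intervals

# np.trapz was renamed to np.trapezoid in NumPy 2.0; support both
trapezoid = getattr(np, 'trapezoid', None) or np.trapz
approx = trapezoid(f(xs), xs)

ref, _ = sci_integrate.quad(f, a, b)  # adaptive-quadrature reference
print(f"NumPy trapezoid: {approx:.8f}")
print(f"SciPy quad:      {ref:.8f}")
print(f"abs error:       {abs(approx - ref):.2e}")
```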

Vectorization vs Loops

Python loops are slow for numerical work because they are interpreted one step at a time. NumPy vectorization delegates computation to compiled C/Fortran routines operating on entire arrays at once.

import numpy as np
import time

n = 5_000_000

# --- Python loop ---
a = list(range(n))
b = list(range(n, 2*n))

t0 = time.perf_counter()
result_loop = [a[i] * b[i] for i in range(n)]
t_loop = time.perf_counter() - t0

# --- NumPy vectorized ---
a_np = np.arange(n, dtype=float)
b_np = np.arange(n, n + n, dtype=float)

t0 = time.perf_counter()
result_np = a_np * b_np
t_np = time.perf_counter() - t0

print(f"Element-wise multiplication of {n:,} elements:")
print(f"  Python loop: {t_loop:.3f} s")
print(f"  NumPy:       {t_np:.4f} s")
print(f"  Speedup:     {t_loop / t_np:.0f}×")

# --- Softmax: loop vs vectorized ---
def softmax_loop(z):
    exp_z = [np.exp(zi) for zi in z]
    total = sum(exp_z)
    return [e / total for e in exp_z]

def softmax_vectorized(z):
    z_shifted = z - z.max()   # numerical stability
    exp_z = np.exp(z_shifted)
    return exp_z / exp_z.sum()

z = np.random.randn(10000)
t0 = time.perf_counter(); _ = softmax_loop(z);        t_loop = time.perf_counter() - t0
t0 = time.perf_counter(); _ = softmax_vectorized(z);  t_vec  = time.perf_counter() - t0
print(f"\nSoftmax on 10,000-dim vector:")
print(f"  Loop:        {t_loop*1000:.2f} ms")
print(f"  Vectorized:  {t_vec*1000:.2f} ms")
print(f"  Speedup:     {t_loop/t_vec:.0f}×")

Key rule: in NumPy/ML code, replace any loop that operates elementwise on an array with a vectorized operation. Use loops only for true sequential dependencies (e.g., iterating over epochs, processing one document at a time). Profile first with %timeit in Jupyter before optimizing.
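A concrete example of a true sequential dependency: an exponential moving average, where each output depends on the previous one. The recurrence itself needs a loop, but `scipy.signal.lfilter` evaluates the same recurrence in compiled code — a sketch, using zero initial state for simplicity (my choice):

```python
import numpy as np
from scipy.signal import lfilter

alpha = 0.1
rng = np.random.default_rng(3)
x = rng.standard_normal(100_000)

# Pure-Python loop: y[n] = alpha*x[n] + (1 - alpha)*y[n-1], starting from 0
y_loop = np.empty_like(x)
prev = 0.0
for i, xi in enumerate(x):
    prev = alpha * xi + (1 - alpha) * prev
    y_loop[i] = prev

# Same recurrence expressed as an IIR filter: b = [alpha], a = [1, -(1-alpha)]
y_filt = lfilter([alpha], [1.0, -(1 - alpha)], x)

print("Max difference:", np.max(np.abs(y_loop - y_filt)))
```

So even "inherently sequential" recurrences sometimes have a compiled escape hatch worth checking before reaching for a loop.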

Practice Exercises

Exercise 1: Monte Carlo
Monte Carlo Integration — Expected Value of a Normal

Estimate $\mathbb{E}[X^2]$ for $X \sim \mathcal{N}(0, 1)$ using Monte Carlo with $n = 100{,}000$ samples. The true answer is $\text{Var}(X) + \mathbb{E}[X]^2 = 1 + 0 = 1$.

Answer:
import numpy as np
np.random.seed(42)
samples = np.random.randn(100_000)
estimate = (samples**2).mean()
print(f"MC estimate of E[X²]: {estimate:.4f}  (true: 1.0)")
print(f"Error: {abs(estimate - 1.0):.4f}")
# Typical output: 0.9989, error ≈ 0.001
Exercise 2: SymPy
Symbolic Gradient of Cross-Entropy + Softmax

Using SymPy, define softmax for 2 classes ($z_1, z_2$) and compute $\partial \mathcal{L} / \partial z_1$ where $\mathcal{L} = -\log(\text{softmax}_1)$. Confirm it equals $p_1 - 1$.

Answer:
from sympy import symbols, exp, log, diff, simplify
z1, z2 = symbols('z1 z2', real=True)
p1 = exp(z1) / (exp(z1) + exp(z2))
loss = -log(p1)
grad = simplify(diff(loss, z1))
print("∂L/∂z1 =", grad)  # simplifies to -exp(z2)/(exp(z1)+exp(z2)) = -(1 - p1) = p1 - 1
# i.e., the gradient is p1 - 1 when class 1 is the true class — the familiar "p - y" form

Conclusion & Next Steps

Python's scientific stack maps almost perfectly onto the math of this series:

  • NumPy — vectors, matrices, norms, eigenvalues (Parts 7, 3)
  • SciPy optimize — gradient descent and constrained optimization (Part 8)
  • SciPy stats — hypothesis testing, distributions (Parts 4, 5)
  • SymPy — symbolic calculus, exact derivatives (Part 8), MLE derivations (Part 9)
  • Monte Carlo — integration, bootstrap CI, simulation (Parts 4, 5)
  • Vectorization — the performance foundation of every ML framework

Next in the Series

In Part 11: Advanced Topics, we push into graduate-level material: multivariate Gaussians, Bayesian inference, Markov chains, concentration inequalities, and the EM algorithm — the math behind some of ML's most powerful techniques.