Training, Alignment & Evaluation Math

Modern training math has three parts: optimization determines whether a model learns, alignment objectives determine what it prefers, and evaluation statistics determine whether a measured improvement is real.

AdamW

Adam keeps exponential moving averages of gradients and squared gradients:

$$m_t=\beta_1m_{t-1}+(1-\beta_1)g_t,\quad v_t=\beta_2v_{t-1}+(1-\beta_2)g_t^2$$

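Because $m_0=v_0=0$, both averages are biased toward zero early in training, so they are rescaled as $\hat{m}_t=m_t/(1-\beta_1^t)$ and $\hat{v}_t=v_t/(1-\beta_2^t)$.

AdamW decouples weight decay from the adaptive update: instead of adding an L2 term to the gradient, it shrinks the weights directly, which is often more stable for large neural networks. With learning rate $\eta$ and weight-decay coefficient $\lambda$, one step is:

$$w_t=w_{t-1}-\eta\left(\frac{\hat{m}_t}{\sqrt{\hat{v}_t}+\epsilon}+\lambda w_{t-1}\right)$$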

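import numpy as np

# One AdamW step (t = 1) on a three-parameter toy problem.
grad = np.array([0.4, -0.2, 0.1])
w = np.array([1.0, -1.0, 0.5])
m = np.zeros_like(w)  # moving average of gradients
v = np.zeros_like(w)  # moving average of squared gradients
lr, beta1, beta2, eps, wd = 1e-3, 0.9, 0.999, 1e-8, 0.01
t = 1  # step counter, starting at 1

m = beta1 * m + (1 - beta1) * grad
v = beta2 * v + (1 - beta2) * grad**2
# Bias correction: divide by 1 - beta**t because m and v start at zero.
m_hat = m / (1 - beta1**t)
v_hat = v / (1 - beta2**t)
# Decoupled weight decay: wd * w sits outside the adaptive ratio.
w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * w)
print(np.round(w, 6))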

Schedules & Clipping

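Warmup ramps the learning rate up from a small value over the first steps, which prevents unstable early updates. Cosine decay then gradually reduces the learning rate along half a cosine wave. Gradient clipping rescales the gradient to norm $c$ whenever $\|g\|_2$ exceeds a threshold $c$, so rare huge updates cannot destabilize training.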
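The three ideas above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function names `lr_at` and `clip_by_global_norm`, the linear shape of the warmup, and all numeric settings (100 steps, 10 warmup steps, peak rate 1e-3, threshold 1.0) are assumptions chosen for the example.

```python
import math
import numpy as np

def lr_at(step, total_steps, warmup_steps, peak_lr, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay toward min_lr (hypothetical helper)."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their joint L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    scale = min(1.0, max_norm / (total_norm + 1e-12))
    return [g * scale for g in grads], total_norm

# Learning rate over a 100-step run with 10 warmup steps.
lrs = [lr_at(s, total_steps=100, warmup_steps=10, peak_lr=1e-3) for s in range(100)]

# Two toy gradient tensors whose joint norm is sqrt(9 + 16 + 144) = 13.
grads = [np.array([3.0, 4.0]), np.array([12.0])]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(float(np.sum(g ** 2)) for g in clipped))

print(lrs[0], lrs[9], lrs[99])
print(norm_before, norm_after)
```

Clipping by the joint norm of all gradients preserves the update direction and only shortens its length, which is why it is usually preferred over clipping each coordinate separately.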

Perplexity & Calibration

For language models, perplexity is exponentiated average negative log likelihood:

$$\text{PPL}=\exp\left(-\frac{1}{N}\sum_{i=1}^{N}\log p(x_i|x_{<i})\right)$$
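As a quick check of the formula, the sketch below computes perplexity from the probabilities a model assigned to the true next tokens. The four values in `token_probs` are made up for illustration; in practice they would come from the model's softmax output.

```python
import numpy as np

# Hypothetical probabilities p(x_i | x_<i) assigned to the correct next tokens.
token_probs = np.array([0.5, 0.25, 0.125, 0.5])

nll = -np.log(token_probs)          # per-token negative log likelihood
ppl = float(np.exp(nll.mean()))     # exponentiated average NLL

# Sanity check: a model that is uniform over 4 tokens has perplexity 4.
uniform_ppl = float(np.exp(-np.log(np.full(4, 0.25)).mean()))

print(ppl, uniform_ppl)
```

The uniform case gives a useful reading of the number: a perplexity of $k$ means the model is, on average, as uncertain as if it were choosing uniformly among $k$ tokens.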

Calibration asks whether probabilities mean what they say. If a model predicts 80% confidence on 100 examples, about 80 should be correct.
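One common way to put a number on this is expected calibration error (ECE), which the text above does not name: bin predictions by confidence, then average the gap between accuracy and mean confidence in each bin, weighted by bin size. The sketch below is an illustration with assumed equal-width bins and synthetic data; `expected_calibration_error` is a hypothetical helper, not a library function.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |accuracy - mean confidence| over equal-width bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    # Bin b covers [b / n_bins, (b + 1) / n_bins); confidence 1.0 goes in the last bin.
    bin_ids = np.minimum((confidences * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # in_bin.mean() is the fraction of examples in the bin
    return float(ece)

# 100 predictions at 80% confidence.
conf = np.full(100, 0.8)
ece_good = expected_calibration_error(conf, np.array([1] * 80 + [0] * 20))  # 80 correct
ece_bad = expected_calibration_error(conf, np.array([1] * 60 + [0] * 40))   # 60 correct

print(ece_good, ece_bad)
```

The first case matches the example in the text (80 of 100 correct at 80% confidence) and gives an ECE near zero; the second is overconfident by 20 points.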

Preference Optimization

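RLHF trains a reward model from comparisons, then optimizes a policy against it with a KL penalty that keeps the policy from drifting too far from the reference model. DPO skips the separate reward model and writes the preference objective directly in terms of policy log probabilities. For a prompt $x$ with preferred response $y_w$ and rejected response $y_l$, where $\beta$ sets how strongly the policy is tied to the reference: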

$$\mathcal{L}_{DPO}=-\log\sigma\left(\beta\left[\log\frac{\pi_\theta(y_w|x)}{\pi_{ref}(y_w|x)}-\log\frac{\pi_\theta(y_l|x)}{\pi_{ref}(y_l|x)}\right]\right)$$
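The loss is straightforward to evaluate once the four sequence log probabilities are available. The sketch below assumes they have already been computed (summed over response tokens); the numbers, the `dpo_loss` helper, and the default `beta=0.1` are illustrative assumptions.

```python
import numpy as np

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Mean DPO loss over a batch of (preferred, rejected) sequence log-probabilities."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(z)) = log(1 + exp(-z)) = logaddexp(0, -z), which is numerically stable.
    return float(np.mean(np.logaddexp(0.0, -margin)))

# Hypothetical reference-model log-probs for two preference pairs.
ref_w = np.array([-12.0, -8.0])
ref_l = np.array([-11.0, -9.0])

# Policy identical to the reference: the margin is 0, so the loss is log 2.
loss_at_ref = dpo_loss(ref_w, ref_l, ref_w, ref_l)

# Policy raises the preferred response and lowers the rejected one.
loss_better = dpo_loss(ref_w + 2.0, ref_l - 2.0, ref_w, ref_l)

print(loss_at_ref, loss_better)
```

Only the change relative to the reference matters: the loss falls when the policy increases its log-probability ratio on $y_w$ more than on $y_l$, regardless of the absolute log probabilities.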

Evaluation Uncertainty

Benchmarks are samples. A 1-percentage-point improvement on 200 examples is a difference of only 2 examples and may be noise; the same improvement on 20,000 examples (200 examples) is more convincing. Always pair score changes with uncertainty estimates and error analysis.
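To make the comparison concrete, the sketch below computes the half-width of a normal-approximation 95% interval for a single accuracy estimate at both benchmark sizes. The baseline accuracy of 0.80 is an assumption for illustration, and `ci_half_width` is a hypothetical helper. Comparing two models on the same examples would call for a paired analysis, which this sketch does not cover.

```python
import math

def ci_half_width(p_hat, n, z=1.96):
    """Half-width of the normal-approximation interval for a proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

baseline_acc = 0.80  # assumed accuracy, for illustration only
for n in (200, 20_000):
    print(n, round(ci_half_width(baseline_acc, n), 4))
```

Under these assumptions the interval is roughly ±5.5 points at 200 examples and ±0.55 points at 20,000, so a 1-point change sits well inside the noise in the first case and outside it in the second.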

Exercise: Evaluation

Confidence Interval for Accuracy

A model gets 870 out of 1000 examples correct. Estimate a 95% confidence interval using $\hat{p}\pm1.96\sqrt{\hat{p}(1-\hat{p})/n}$.
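One way to check your answer is to substitute $\hat{p}=870/1000=0.87$ and $n=1000$ into the formula; the values below are rounded:

$$\sqrt{\frac{0.87\times0.13}{1000}}=\sqrt{0.0001131}\approx0.0106,\qquad 1.96\times0.0106\approx0.021$$

$$0.87\pm0.021\;\Rightarrow\;[0.849,\ 0.891]$$

So the data are consistent with a true accuracy between roughly 84.9% and 89.1%, assuming the 1000 examples are independent draws from the target distribution.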
