Generative Model Math

Big picture: generative models estimate, transform, or sample from distributions. The core language is likelihood, divergence, latent variables, and stochastic processes.

Model Family Map

Family	Core Objective	Math Tool
VAE	Maximize ELBO	Variational inference + KL
GAN	Adversarial min-max game	JS divergence intuition
Flow	Exact likelihood	Change of variables
Diffusion	Denoising / score matching	Markov chains + gradients of log density

VAEs & ELBO

A VAE introduces latent variables $z$ with an approximate posterior $q_\phi(z|x)$ and maximizes the evidence lower bound (ELBO), a lower bound on the log likelihood:

$$\log p_\theta(x) \ge \mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - D_{KL}(q_\phi(z|x)\|p(z))$$

The first term rewards reconstruction. The KL term regularizes the latent space toward a simple prior, usually $\mathcal{N}(0,I)$.

import numpy as np

# KL between diagonal Gaussian q(z|x)=N(mu, sigma^2) and p(z)=N(0, I)
mu = np.array([0.2, -0.5, 1.0])
log_var = np.array([-0.1, 0.3, 0.0])
kl = -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))
print("KL(q || p):", round(float(kl), 4))
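The closed-form KL above can be sanity-checked against its definition, $\mathbb{E}_{q}[\log q(z|x) - \log p(z)]$, by sampling. The following is a minimal sketch rather than part of any training loop; the function names, sample count, and seed are illustrative assumptions.

```python
import numpy as np

def kl_closed_form(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))

def kl_monte_carlo(mu, log_var, n_samples=200_000, seed=0):
    """Estimate the same KL as the sample mean of log q(z) - log p(z), with z ~ q."""
    rng = np.random.default_rng(seed)
    sigma = np.exp(0.5 * log_var)
    # Draw z ~ N(mu, sigma^2) by shifting and scaling standard normal noise.
    z = mu + sigma * rng.standard_normal((n_samples, mu.size))
    log_q = -0.5 * np.sum(((z - mu) / sigma) ** 2 + log_var + np.log(2 * np.pi), axis=1)
    log_p = -0.5 * np.sum(z**2 + np.log(2 * np.pi), axis=1)
    return float(np.mean(log_q - log_p))

# Same illustrative values as the snippet above.
mu = np.array([0.2, -0.5, 1.0])
log_var = np.array([-0.1, 0.3, 0.0])

kl_exact = float(kl_closed_form(mu, log_var))
kl_mc = kl_monte_carlo(mu, log_var)
print("closed form:", round(kl_exact, 4))
print("Monte Carlo:", round(kl_mc, 4))
```

The two numbers should agree up to sampling noise, which shrinks as `n_samples` grows.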

GAN Objectives

The original GAN objective is a game:

$$\min_G \max_D \mathbb{E}_{x\sim p_{data}}[\log D(x)] + \mathbb{E}_{z\sim p_z}[\log(1-D(G(z)))]$$

The discriminator $D$ outputs the probability that a sample is real; the generator $G$ maps noise $z$ to samples and learns to make them fool $D$.
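To make the JS-divergence intuition concrete, the sketch below evaluates the objective on a toy discrete example. For fixed distributions, the inner maximum is attained at $D^*(x)=p_{data}(x)/(p_{data}(x)+p_g(x))$, where the value equals $-\log 4 + 2\,\mathrm{JSD}(p_{data}\|p_g)$. The three-outcome distributions are made-up assumptions, and `p_gen` stands in for the distribution of $G(z)$.

```python
import numpy as np

# Toy distributions over three outcomes (illustrative values only).
p_data = np.array([0.5, 0.3, 0.2])
p_gen = np.array([0.2, 0.3, 0.5])

def value_fn(d, p_data, p_gen):
    """GAN value E_data[log D(x)] + E_gen[log(1 - D(x))] for a tabular D."""
    return float(np.sum(p_data * np.log(d)) + np.sum(p_gen * np.log(1 - d)))

def kl_divergence(p, q):
    """KL(p || q) for strictly positive discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL to the mixture m = (p + q) / 2."""
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Optimal discriminator for a fixed generator.
d_star = p_data / (p_data + p_gen)
v_star = value_fn(d_star, p_data, p_gen)
v_from_jsd = float(-np.log(4) + 2 * js_divergence(p_data, p_gen))

# A discriminator that always answers 0.5 scores exactly -log 4.
v_uninformed = value_fn(np.full(3, 0.5), p_data, p_gen)

print("V(D*):          ", round(v_star, 6))
print("-log4 + 2*JSD:  ", round(v_from_jsd, 6))
print("V(D = 0.5):     ", round(v_uninformed, 6))
```

When the generator matches the data distribution, the JSD term vanishes and the value drops to $-\log 4$, which is why the generator's side of the game can be read as minimizing JS divergence.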

Normalizing Flows

Flows use invertible transformations $x=f(z)$ and the change-of-variables formula:

$$\log p_X(x)=\log p_Z(f^{-1}(x)) + \log\left|\det \frac{\partial f^{-1}}{\partial x}\right|$$

They trade architectural flexibility for exact likelihood computation: every layer must be invertible, and its Jacobian determinant must be cheap to evaluate.
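As a worked instance of the change-of-variables formula, the sketch below uses a one-dimensional affine map $f(z)=az+b$ with a standard normal base density. The values of `a` and `b` are arbitrary assumptions; the result is compared against the known density $\mathcal{N}(b, a^2)$.

```python
import numpy as np

def standard_normal_logpdf(z):
    """Log density of N(0, 1)."""
    return -0.5 * (z**2 + np.log(2 * np.pi))

# Hypothetical flow parameters: x = f(z) = a * z + b, invertible when a != 0.
a, b = 2.0, 1.0

def f(z):
    return a * z + b

def f_inv(x):
    return (x - b) / a

def flow_logpdf(x):
    """log p_X(x) = log p_Z(f^{-1}(x)) + log |d f^{-1} / dx|."""
    z = f_inv(x)
    log_abs_det = -np.log(np.abs(a))  # derivative of f_inv is 1 / a
    return standard_normal_logpdf(z) + log_abs_det

def gaussian_logpdf(x, mean, std):
    """Reference log density of N(mean, std^2)."""
    return -0.5 * (((x - mean) / std) ** 2 + np.log(2 * np.pi)) - np.log(std)

x = np.array([-1.0, 0.0, 1.0, 3.5])
lp_flow = flow_logpdf(x)
lp_ref = gaussian_logpdf(x, mean=b, std=abs(a))
print("flow log-density:     ", np.round(lp_flow, 4))
print("reference log-density:", np.round(lp_ref, 4))
```

In higher dimensions the scalar derivative becomes a Jacobian determinant, which is why practical flow layers are usually designed so that this determinant is easy to compute.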

Diffusion & Score Matching

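A diffusion model gradually adds Gaussian noise to data, then trains a network to reverse the process. With cumulative signal level $\bar\alpha_t$, the noisy sample at step $t$ is $x_t=\sqrt{\bar\alpha_t}\,x_0+\sqrt{1-\bar\alpha_t}\,\epsilon$ with $\epsilon\sim\mathcal{N}(0,I)$. A common simplified objective trains $\epsilon_\theta$ to predict the noise $\epsilon$ that was added to the clean sample $x_0$: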

$$\mathcal{L}=\mathbb{E}_{t,x_0,\epsilon}\|\epsilon - \epsilon_\theta(x_t,t)\|_2^2$$

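import numpy as np

# Forward (noising) step: sample x_t directly from a clean x_0
np.random.seed(0)
x0 = np.array([1.0, -0.5])            # clean sample
alpha_bar = 0.7                       # cumulative signal level at step t
eps = np.random.randn(*x0.shape)      # standard Gaussian noise
# x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps
xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1 - alpha_bar) * eps
print("noisy sample:", np.round(xt, 3))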
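The link between noise prediction and score matching can be sketched with the same quantities. Since $q(x_t|x_0)=\mathcal{N}(\sqrt{\bar\alpha_t}\,x_0,(1-\bar\alpha_t)I)$, its score is $\nabla_{x_t}\log q(x_t|x_0)=-\epsilon/\sqrt{1-\bar\alpha_t}$, so predicting $\epsilon$ amounts to predicting a rescaled score. No network is trained here: `eps_hat` is a hypothetical stand-in for a model output, and the generator seed is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = np.array([1.0, -0.5])
alpha_bar = 0.7
eps = rng.standard_normal(x0.shape)
xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1 - alpha_bar) * eps

# Score of the Gaussian q(x_t | x_0), evaluated at x_t.
score = -(xt - np.sqrt(alpha_bar) * x0) / (1 - alpha_bar)
# The same quantity written in terms of the noise that was added.
score_from_eps = -eps / np.sqrt(1 - alpha_bar)

# Hypothetical imperfect prediction standing in for eps_theta(x_t, t).
eps_hat = eps + 0.1

# Simplified objective for this single (x_0, eps) pair.
loss = float(np.sum((eps - eps_hat) ** 2))

# Inverting the noising equation gives an estimate of the clean sample.
x0_hat = (xt - np.sqrt(1 - alpha_bar) * eps_hat) / np.sqrt(alpha_bar)
x0_exact = (xt - np.sqrt(1 - alpha_bar) * eps) / np.sqrt(alpha_bar)

print("scores match:", np.allclose(score, score_from_eps))
print("loss:", round(loss, 4))
print("x0 estimate from eps_hat:", np.round(x0_hat, 3))
```

With a perfect noise prediction the inversion recovers $x_0$ exactly; the error in `x0_hat` grows as $\bar\alpha_t$ shrinks, because the noise term is divided by $\sqrt{\bar\alpha_t}$.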

Guidance

Classifier-free guidance combines unconditional and conditional denoising predictions:

$$\epsilon_{guided}=\epsilon_{uncond}+s(\epsilon_{cond}-\epsilon_{uncond})$$

The scale $s$ increases prompt adherence but can reduce diversity or create artifacts when pushed too high.
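The guidance rule is a linear extrapolation between the two predictions, as the sketch below illustrates. The prediction vectors are made-up numbers rather than outputs of a real model, and `classifier_free_guidance` is a hypothetical helper name.

```python
import numpy as np

def classifier_free_guidance(eps_uncond, eps_cond, scale):
    """eps_guided = eps_uncond + scale * (eps_cond - eps_uncond)."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

# Illustrative denoiser outputs for one sample (not from a real model).
eps_uncond = np.array([0.10, -0.20, 0.05])
eps_cond = np.array([0.30, -0.10, -0.15])

for s in (0.0, 1.0, 3.0, 7.5):
    guided = classifier_free_guidance(eps_uncond, eps_cond, s)
    print(f"s={s:>4}: {np.round(guided, 3)}  norm={np.linalg.norm(guided):.3f}")

guided_zero = classifier_free_guidance(eps_uncond, eps_cond, 0.0)
guided_one = classifier_free_guidance(eps_uncond, eps_cond, 1.0)
guided_high = classifier_free_guidance(eps_uncond, eps_cond, 7.5)
```

Setting $s=0$ returns the unconditional prediction and $s=1$ the conditional one; values above 1 extrapolate past the conditional prediction, which is one way to see why very large scales can push samples off-distribution.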
