
Generative Model Math

April 30, 2026 · Wasil Zafar · 24 min read

Generative models learn probability distributions, not just decision boundaries. This extension explains the math families behind VAEs, GANs, normalizing flows, and diffusion models.

Table of Contents

  1. Model Family Map
  2. VAEs & ELBO
  3. GAN Objectives
  4. Normalizing Flows
  5. Diffusion & Score Matching
  6. Guidance
Big picture: generative models estimate, transform, or sample from distributions. The core language is likelihood, divergence, latent variables, and stochastic processes.

Model Family Map

Family | Core Objective | Math Tool
VAE | Maximize ELBO | Variational inference + KL
GAN | Adversarial min-max game | JS divergence intuition
Flow | Exact likelihood | Change of variables
Diffusion | Denoising / score matching | Markov chains + gradients of log density

VAEs & ELBO

A VAE introduces latent variables $z$ and optimizes a lower bound on the log likelihood:

$$\log p_\theta(x) \ge \mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - D_{KL}(q_\phi(z|x)\|p(z))$$

The first term rewards reconstruction. The KL term regularizes the latent space toward a simple prior, usually $\mathcal{N}(0,I)$.

import numpy as np

# KL between diagonal Gaussian q(z|x)=N(mu, sigma^2) and p(z)=N(0, I)
mu = np.array([0.2, -0.5, 1.0])
log_var = np.array([-0.1, 0.3, 0.0])
kl = -0.5 * np.sum(1 + log_var - mu**2 - np.exp(log_var))
print("KL(q || p):", round(float(kl), 4))

GAN Objectives

The original GAN objective is a game:

$$\min_G \max_D \mathbb{E}_{x\sim p_{data}}[\log D(x)] + \mathbb{E}_{z\sim p_z}[\log(1-D(G(z)))]$$

The discriminator estimates whether a sample is real; the generator learns to produce samples that fool it.
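
The value function can be made concrete with stand-in numbers. A minimal sketch, assuming hypothetical discriminator outputs $D(x)$ and $D(G(z))$ rather than a trained model:

import numpy as np

# Hypothetical discriminator probabilities for real and generated samples
d_real = np.array([0.90, 0.80, 0.95])   # D(x) on real data
d_fake = np.array([0.30, 0.10, 0.20])   # D(G(z)) on generated data

# Value of the min-max objective at these outputs
value = np.mean(np.log(d_real)) + np.mean(np.log(1 - d_fake))
print("V(D, G):", round(float(value), 4))

# In practice the generator usually maximizes log D(G(z)) instead
# (the non-saturating loss), which gives stronger gradients early in training.
gen_loss = -np.mean(np.log(d_fake))
print("non-saturating generator loss:", round(float(gen_loss), 4))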

Normalizing Flows

Flows use invertible transformations $x=f(z)$ and the change-of-variables formula:

$$\log p_X(x)=\log p_Z(f^{-1}(x)) + \log\left|\det \frac{\partial f^{-1}}{\partial x}\right|$$

They trade architectural flexibility for exact likelihood computation: every layer must be invertible with a tractable Jacobian determinant.
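
A minimal sketch of the formula with a toy invertible map: an elementwise affine flow $x = f(z) = a \odot z + b$ over a standard normal base density. The values of a, b, and x below are illustrative.

import numpy as np

# Elementwise affine flow x = a*z + b with base density p_Z = N(0, I)
a = np.array([2.0, 0.5])
b = np.array([1.0, -1.0])
x = np.array([1.5, -0.8])

z = (x - b) / a                                       # inverse map f^{-1}(x)
log_pz = -0.5 * np.sum(z**2 + np.log(2 * np.pi))      # log N(z; 0, I)
log_det = -np.sum(np.log(np.abs(a)))                  # log|det d f^{-1}/dx| = -sum log|a|
log_px = log_pz + log_det
print("log p_X(x):", round(float(log_px), 4))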

Diffusion & Score Matching

Diffusion models gradually add noise to data through a fixed Markov chain, then train a network to reverse the process. A common simplified objective predicts the noise $\epsilon$ added to a clean sample $x_0$:

$$\mathcal{L}=\mathbb{E}_{t,x_0,\epsilon}\|\epsilon - \epsilon_\theta(x_t,t)\|_2^2$$

import numpy as np

# Forward (noising) step: sample x_t from q(x_t | x_0) in closed form
np.random.seed(0)
x0 = np.array([1.0, -0.5])
alpha_bar = 0.7                          # alpha_bar_t: fraction of clean signal remaining at step t
eps = np.random.randn(*x0.shape)         # the noise the network learns to predict
xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1 - alpha_bar) * eps
print("noisy sample:", np.round(xt, 3))

Guidance

Classifier-free guidance combines unconditional and conditional denoising predictions:

$$\epsilon_{guided}=\epsilon_{uncond}+s(\epsilon_{cond}-\epsilon_{uncond})$$

The scale $s$ increases prompt adherence but can reduce diversity or create artifacts when pushed too high.
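
A minimal sketch of the combination step, assuming stand-in noise predictions; in a real sampler eps_uncond and eps_cond would come from the same network evaluated without and with the conditioning signal.

import numpy as np

# Hypothetical unconditional and conditional noise predictions
eps_uncond = np.array([0.10, -0.20])
eps_cond = np.array([0.30, -0.05])
s = 3.0                                  # guidance scale

eps_guided = eps_uncond + s * (eps_cond - eps_uncond)
print("guided prediction:", np.round(eps_guided, 3))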