
Tiny Next-Token Model

April 30, 2026 · Wasil Zafar · 16 min read

A small language-modeling capstone that predicts the next character from the current character and connects probability tables to cross-entropy and perplexity.

Table of Contents

  1. Count Model
  2. Cross-Entropy & Perplexity
  3. Sampling

Count Model

A next-token model estimates $p(x_t \mid x_{t-1})$. This toy version is a character bigram model: it counts how often each character follows each other character and applies add-one (Laplace) smoothing, so transitions never seen in the training text still get a small nonzero probability.

import numpy as np

text = "math for ai math for ml"
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> index
itos = {i: ch for ch, i in stoi.items()}      # index -> char
V = len(chars)

# Start every count at 1 (add-one smoothing), then tally observed bigrams.
counts = np.ones((V, V))
for a, b in zip(text[:-1], text[1:]):
    counts[stoi[a], stoi[b]] += 1

# Normalize each row so counts become conditional probabilities p(next | current).
probs = counts / counts.sum(axis=1, keepdims=True)
print("vocab:", chars)
print("p(next | 'm'):", np.round(probs[stoi['m']], 3))

Cross-Entropy & Perplexity

Cross-entropy is the model's average surprise on the data, $H = -\frac{1}{N}\sum_{t} \log p(x_t \mid x_{t-1})$, and perplexity is $\exp(H)$: roughly, the effective number of choices the model hesitates between at each step. The snippet below scores a deliberately uninformed uniform model, for which every transition costs $\log V$ nats.

import numpy as np

text = "math for ai"
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
V = len(chars)

# A uniform model: every next character is equally likely.
probs = np.ones((V, V)) / V

# Average negative log-likelihood over the observed bigrams.
losses = []
for a, b in zip(text[:-1], text[1:]):
    losses.append(-np.log(probs[stoi[a], stoi[b]]))
ce = float(np.mean(losses))
print("cross entropy:", round(ce, 4))      # ln(V) = ln(9) ≈ 2.1972
print("perplexity:", round(np.exp(ce), 4)) # exactly V = 9

Sampling

Temperature changes randomness by rescaling logits before the softmax: $\mathrm{softmax}(z/T)$. A temperature below 1 sharpens the distribution toward the highest logit; a temperature above 1 flattens it toward uniform.

import numpy as np

logits = np.array([2.0, 1.0, 0.2])
for temp in [0.5, 1.0, 2.0]:
    # Rescale by temperature, then softmax (max subtraction for numerical stability).
    scaled = logits / temp
    p = np.exp(scaled - scaled.max())
    p = p / p.sum()
    print(temp, np.round(p, 3))
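To tie the pieces together, here is a minimal sampling sketch, assuming we treat the logs of the smoothed counts as logits; at temperature 1 the softmax then recovers exactly the probs table from the count model. Low temperatures tend to replay frequent training fragments, while high temperatures drift toward uniform noise.

import numpy as np

rng = np.random.default_rng(0)

text = "math for ai math for ml"
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for ch, i in stoi.items()}
V = len(chars)

# Smoothed bigram counts, as in the count-model section.
counts = np.ones((V, V))
for a, b in zip(text[:-1], text[1:]):
    counts[stoi[a], stoi[b]] += 1

def sample(start, n, temp):
    # Treat log-counts as logits; rescale by temperature before the softmax.
    out = [start]
    for _ in range(n):
        logits = np.log(counts[stoi[out[-1]]]) / temp
        p = np.exp(logits - logits.max())
        p = p / p.sum()
        out.append(itos[int(rng.choice(V, p=p))])
    return "".join(out)

for temp in [0.5, 1.0, 2.0]:
    print(temp, repr(sample("m", 20, temp)))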