Bigram Count Model
A next-token model estimates $p(x_t \mid x_{t-1})$. This toy version learns character transition counts from a short string and applies add-one (Laplace) smoothing, so transitions never seen in the text still get nonzero probability.
import numpy as np
text = "math for ai math for ml"
chars = sorted(set(text))  # character vocabulary
stoi = {ch: i for i, ch in enumerate(chars)}  # char -> index
itos = {i: ch for ch, i in stoi.items()}  # index -> char
V = len(chars)
counts = np.ones((V, V))  # start every count at 1: add-one smoothing
for a, b in zip(text[:-1], text[1:]):
    counts[stoi[a], stoi[b]] += 1  # tally each observed character transition
probs = counts / counts.sum(axis=1, keepdims=True)  # normalize rows into p(next | current)
print("vocab:", chars)
print("p(next | 'm'):", np.round(probs[stoi['m']], 3))Cross-Entropy & Perplexity
Cross-Entropy & Perplexity
Cross-entropy is the model's average surprise on the data, $H = -\frac{1}{N}\sum_{t} \log p(x_t \mid x_{t-1})$ over the $N$ observed transitions; perplexity is $\exp(H)$, roughly the number of equally likely choices the model is hedging between at each step.
import numpy as np
text = "math for ai"
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
V = len(chars)
probs = np.ones((V, V)) / V  # uniform baseline: every next character equally likely
losses = []
for a, b in zip(text[:-1], text[1:]):
    losses.append(-np.log(probs[stoi[a], stoi[b]]))  # surprise at each observed transition
ce = float(np.mean(losses))
print("cross entropy:", round(ce, 4))
print("perplexity:", round(np.exp(ce), 4))Sampling
Sampling
Temperature $T$ changes randomness by rescaling logits before the softmax, $p_i = \exp(z_i/T) / \sum_j \exp(z_j/T)$: $T < 1$ sharpens the distribution toward the largest logit, $T > 1$ flattens it toward uniform.
import numpy as np
logits = np.array([2.0, 1.0, 0.2])
for temp in [0.5, 1.0, 2.0]:
    scaled = logits / temp  # divide logits by temperature
    p = np.exp(scaled - scaled.max())  # subtract the max for numerical stability
    p = p / p.sum()  # softmax
    print(temp, np.round(p, 3))
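To turn the tempered distribution into actual tokens, hand it to a sampler and compare the draws. A minimal sketch; the token labels "a", "b", "c" and the seed are illustrative assumptions, not from the snippets above:
import numpy as np
logits = np.array([2.0, 1.0, 0.2])
tokens = np.array(["a", "b", "c"])  # hypothetical labels for the three logits
rng = np.random.default_rng(0)  # fixed seed so the demo repeats
for temp in [0.5, 2.0]:
    scaled = logits / temp
    p = np.exp(scaled - scaled.max())
    p = p / p.sum()
    print(temp, "".join(rng.choice(tokens, size=20, p=p)))  # T=0.5 favors "a" heavily; T=2.0 mixes more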