Cosine Similarity
Cosine similarity compares direction, not magnitude: $$\cos(x,y)=\frac{x\cdot y}{\|x\|\|y\|}.$$ For embedding search, normalize every vector once up front; after that, a plain dot product is a cosine score. The snippet below normalizes a small document matrix and a query, then scores every document at once:
import numpy as np
def normalize(X):
    # Epsilon keeps an all-zero row from dividing by zero.
    return X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
docs = np.array([[1, 0, 0], [0.8, 0.2, 0], [0, 1, 0], [0, 0.2, 0.9]], dtype=float)
query = np.array([[1, 0.1, 0]], dtype=float)
D = normalize(docs)
q = normalize(query)
scores = (D @ q.T).ravel()
print(np.round(scores, 3))

Retriever
import numpy as np
documents = ["linear algebra and vectors", "bread recipes", "attention uses dot products", "probability and Bayes"]
embeddings = np.array([[1, .8, 0], [0, 0, 1], [1, .9, .1], [.2, 1, 0]], dtype=float)
query = np.array([[1, .85, 0]], dtype=float)
embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
query = query / np.linalg.norm(query, axis=1, keepdims=True)
scores = embeddings @ query.T
order = np.argsort(-scores.ravel())[:2]
for idx in order:
    print(round(float(scores[idx, 0]), 3), documents[idx])

Ranking Metrics
Common retrieval metrics include recall@k (the fraction of relevant documents that appear in the top k), precision@k (the fraction of the top k that are relevant), mean reciprocal rank (the inverse rank of the first relevant hit), and nDCG (which discounts each hit by its rank). All of them ask the same question: does useful context appear early enough for the generator to use it?
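These metrics are short enough to implement directly. The sketch below assumes binary relevance labels (a document is either relevant or not) and a non-empty relevant set; the doc ids and rankings are made up for illustration:

```python
import numpy as np

def recall_at_k(ranked, relevant, k):
    # Fraction of the relevant docs that appear in the top k.
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)

def precision_at_k(ranked, relevant, k):
    # Fraction of the top k that are relevant.
    return len(set(ranked[:k]) & set(relevant)) / k

def mrr(ranked, relevant):
    # Reciprocal rank of the first relevant doc; 0 if none is retrieved.
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked, relevant, k):
    # Binary-relevance nDCG: each hit at rank r contributes 1/log2(r + 1),
    # normalized by the best achievable ordering.
    dcg = sum(1.0 / np.log2(r + 1)
              for r, doc in enumerate(ranked[:k], start=1) if doc in relevant)
    ideal = sum(1.0 / np.log2(r + 1)
                for r in range(1, min(len(relevant), k) + 1))
    return dcg / ideal

ranked = [3, 2, 1, 0]   # doc ids, best first (toy ranking)
relevant = {0, 2}       # ground-truth relevant doc ids
print(recall_at_k(ranked, relevant, 2))     # 0.5: one of two relevant docs in top 2
print(precision_at_k(ranked, relevant, 2))  # 0.5: one of the top 2 is relevant
print(mrr(ranked, relevant))                # 0.5: first hit at rank 2
print(round(ndcg_at_k(ranked, relevant, 2), 3))
```

Note that recall@k and precision@k ignore order within the top k, while MRR and nDCG reward putting relevant documents earlier, which is why ranking-aware metrics are usually preferred for RAG evaluation.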
Chunking Experiment
Split a paragraph into 50-token and 200-token chunks. Compare which chunk size retrieves the most precise context for a specific question.
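The experiment can be sketched with whitespace words standing in for tokenizer tokens, and with chunk sizes scaled down so a toy paragraph yields more than one chunk (5 and 20 words here play the role of 50 and 200 tokens):

```python
def chunk_words(text, size):
    # Split text into consecutive chunks of `size` whitespace words.
    # A real pipeline would count tokenizer tokens instead.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

paragraph = (
    "Cosine similarity compares direction not magnitude. "
    "Bread rises when yeast ferments sugar. "
    "Attention layers use scaled dot products between queries and keys."
)

for size in (5, 20):
    chunks = chunk_words(paragraph, size)
    print(f"{size}-word chunks: {len(chunks)}")
    # Small chunks keep the attention sentence separate from the bread
    # sentence; the large chunk mixes unrelated topics into one block,
    # which dilutes its embedding for a question about attention.
```

Embed each chunk set with the retriever above and ask "how does attention work?": the small chunks should return only the attention sentence, while the large chunk drags the bread recipe along with it.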