
Embedding Search Mini-RAG

April 30, 2026 · Wasil Zafar · 18 min read

A practical capstone for the mathematical core of retrieval-augmented generation: normalize vectors, rank documents by cosine similarity, and evaluate retrieval quality.

Table of Contents

  1. Cosine Similarity
  2. Retriever
  3. Ranking Metrics

Cosine Similarity

Cosine similarity compares direction, not magnitude: $$\cos(x,y)=\frac{x\cdot y}{\|x\|\|y\|}.$$ For embedding search, normalize every vector once, then dot products become cosine scores.

import numpy as np

def normalize(X):
    # Scale each row to unit length; the small epsilon guards against division by zero.
    return X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)

docs = np.array([[1, 0, 0], [0.8, 0.2, 0], [0, 1, 0], [0, 0.2, 0.9]], dtype=float)
query = np.array([[1, 0.1, 0]], dtype=float)

D = normalize(docs)
q = normalize(query)
scores = (D @ q.T).ravel()  # dot products of unit vectors are cosine similarities
print(np.round(scores, 3))

Retriever
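
The retriever below reuses the same recipe end to end: embed a tiny corpus, normalize, score every document against the normalized query, and keep the top matches.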

import numpy as np

documents = ["linear algebra and vectors", "bread recipes", "attention uses dot products", "probability and Bayes"]
embeddings = np.array([[1, .8, 0], [0, 0, 1], [1, .9, .1], [.2, 1, 0]], dtype=float)
query = np.array([[1, .85, 0]], dtype=float)

# Normalize rows so dot products become cosine scores.
embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
query = query / np.linalg.norm(query, axis=1, keepdims=True)

scores = embeddings @ query.T
order = np.argsort(-scores.ravel())[:2]  # indices of the top-2 scores, highest first
for idx in order:
    print(round(float(scores[idx, 0]), 3), documents[idx])

Ranking Metrics

Common retrieval metrics include recall@k, precision@k, mean reciprocal rank, and nDCG. They measure whether useful context appears early enough for the generator to use it.
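Here is a minimal sketch of three of these metrics, assuming binary relevance labels; the function and variable names (recall_at_k, precision_at_k, reciprocal_rank, ranking, relevant) are illustrative, not from a library.

def recall_at_k(ranking, relevant, k):
    # Fraction of the relevant documents that appear in the top-k results.
    hits = len(set(ranking[:k]) & set(relevant))
    return hits / max(len(relevant), 1)

def precision_at_k(ranking, relevant, k):
    # Fraction of the top-k results that are relevant.
    hits = len(set(ranking[:k]) & set(relevant))
    return hits / k

def reciprocal_rank(ranking, relevant):
    # 1 / rank of the first relevant document; 0 if none was retrieved.
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

ranking = [2, 0, 3, 1]   # document indices sorted by score, best first
relevant = {0, 2}        # ground-truth relevant documents
print(recall_at_k(ranking, relevant, 2), precision_at_k(ranking, relevant, 2), reciprocal_rank(ranking, relevant))

With this toy ranking, both relevant documents land in the top two, so recall@2, precision@2, and the reciprocal rank are all 1.0.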

Exercise: Chunking Experiment

Split a paragraph into 50-token and 200-token chunks. Compare which chunk size retrieves the most precise context for a specific question.
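One possible starting point is a fixed-size token window; the sketch below assumes whitespace splitting as a stand-in for a real tokenizer, and the chunk helper is a hypothetical name.

def chunk(text, size):
    # Split on whitespace and group the tokens into fixed-size windows.
    tokens = text.split()
    return [" ".join(tokens[i:i + size]) for i in range(0, len(tokens), size)]

paragraph = "Cosine similarity compares direction not magnitude so normalized dot products rank documents by relevance"
print(chunk(paragraph, 5))

Embed each chunk with the retriever above at both sizes, then compare which size puts the answer-bearing chunk at rank one for your question.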