search that understands me, running on my own laptop

Keyword search over my own notes has always let me down in the same way. I know I wrote something about a printer that kept dropping off the network, but I called it "the bloody printer issue" at the time, and grepping for "printer offline" finds nothing. The words I search with are never the words I wrote. What I actually want is search by meaning, and that is exactly what embeddings give you.

The good news, and the reason I am writing this, is that you do not need an API or a cloud service to do it. A small embedding model runs perfectly well on a laptop CPU, and the whole thing stays local.

the idea in one paragraph

An embedding model turns a piece of text into a vector, a list of a few hundred numbers, positioned so that text with similar meaning ends up close together in that space. "Printer keeps disconnecting" and "the bloody printer issue" land near each other even though they share no words. To search, you embed your query the same way and find the nearest stored vectors. That nearness is the whole trick.
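
To make "nearness" concrete, here is a toy sketch with made-up three-dimensional vectors standing in for real embeddings. The numbers, the extra note titles and the cosine helper are all invented for illustration; a real model would produce vectors with a few hundred dimensions.

```python
import numpy as np

# Invented 3-dimensional "embeddings", purely for illustration.
notes = {
    "the bloody printer issue": np.array([0.9, 0.1, 0.0]),
    "sourdough starter schedule": np.array([0.0, 0.2, 0.9]),
    "router firmware update": np.array([0.6, 0.7, 0.1]),
}
# Stands in for the embedding of "printer keeps disconnecting".
query = np.array([0.8, 0.2, 0.1])


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the lengths."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Sort note titles from most to least similar to the query.
ranked = sorted(notes, key=lambda title: cosine(query, notes[title]), reverse=True)
print(ranked[0])
```

With these invented numbers the printer note ranks first and the sourdough note last, even though the query shares no words with either title.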

doing it locally

I used sentence-transformers with the all-MiniLM-L6-v2 model. It is small, around 80 MB, it produces 384-dimensional vectors, and it runs comfortably on CPU. For a few thousand notes you do not need a GPU and you do not need a vector database. A NumPy array and a dot product are plenty.

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

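# docs is a list of note bodies (strings). Keep it in the same order as the
# rows of the index, because search results are row numbers into docs.
embeddings = model.encode(docs, normalize_embeddings=True)  # shape (len(docs), 384)
np.save("index.npy", embeddings)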

Normalising the embeddings means a dot product is the same as cosine similarity, which saves a step at query time. Searching is then just embedding the query and taking the top scores:

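# Embed the query the same way as the notes; encode returns one row per input.
q = model.encode(["printer keeps dropping off wifi"],
                 normalize_embeddings=True)
scores = embeddings @ q[0]           # cosine similarity against every note
top = np.argsort(scores)[::-1][:5]   # row numbers of the five best matches
for i in top:
    print(f"{scores[i]:.2f}  {docs[i][:60]}")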

That ran the query against a couple of thousand notes in a few milliseconds, and the top hit was indeed my furious note about the printer, titled nothing of the sort.
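
If you want to convince yourself that normalising really does turn the dot product into cosine similarity, a quick check with random vectors is enough. This sketch uses fake 384-dimensional vectors from NumPy's random generator, not real model output, so it runs without downloading anything.

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.normal(size=(4, 384))   # four fake, unnormalised "embeddings"
query = rng.normal(size=384)      # a fake, unnormalised query vector

# Cosine similarity the long way: divide by both lengths at query time.
cos = raw @ query / (np.linalg.norm(raw, axis=1) * np.linalg.norm(query))

# Normalise once up front, then a plain dot product gives the same numbers.
unit = raw / np.linalg.norm(raw, axis=1, keepdims=True)
unit_q = query / np.linalg.norm(query)
dots = unit @ unit_q

print(np.allclose(cos, dots))
```

The division by the lengths has simply moved from query time to indexing time, which is the step the normalised index saves.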

what it is good and bad at

It is excellent at the "I know roughly what I meant" search, which is most of how I look for old notes. It is not a replacement for exact search. If I want a specific error string or a function name, grep is still faster and more precise, because embeddings smooth over exactly the detail you sometimes need. So I kept both: grep for the literal, embeddings for the vague. They cover different failure modes.
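
I run the two as separate tools, but if you would rather have one entry point, something like the following could work. It is only a sketch: the search function, its embed parameter and the toy data are all hypothetical, and embed stands for any callable that maps a string to a unit-length vector (for example a thin wrapper around model.encode).

```python
import numpy as np


def search(query, docs, embeddings, embed, k=5):
    """Return up to k row numbers: literal matches first, then nearest by meaning.

    embed is a hypothetical callable mapping a string to a unit-length vector.
    """
    # Exact, case-insensitive substring matches, in document order.
    literal = [i for i, d in enumerate(docs) if query.lower() in d.lower()]
    # Similarity of the query to every note, best first, skipping literal hits.
    scores = embeddings @ embed(query)
    semantic = [int(i) for i in np.argsort(scores)[::-1] if int(i) not in literal]
    return (literal + semantic)[:k]


# Toy data: three notes with invented one-hot "embeddings".
docs = ["ERR_CONN_RESET in sync script", "the bloody printer issue", "holiday packing list"]
embeddings = np.eye(3)


def fake_embed(text):
    """Stand-in for a real model: always points mostly at the printer note."""
    v = np.array([0.1, 0.9, 0.2])
    return v / np.linalg.norm(v)


print(search("ERR_CONN_RESET", docs, embeddings, fake_embed, k=2))
print(search("printer keeps dropping", docs, embeddings, fake_embed, k=2))
```

Putting literal hits first is a design choice, not a rule: it assumes that if you typed an exact string that exists somewhere, that is probably the note you wanted.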

The encoding is the only slow part, and it is one-off. I embed a note when I save it and append the vector to the index, so search itself is always instant. Re-embedding the whole corpus from scratch takes under a minute on this laptop, which is fast enough that I have not bothered optimising it.
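
The append step can be as simple as loading the array, stacking one row on the end and saving it again. This is a minimal sketch rather than my exact script: append_to_index is a hypothetical helper, and the demo uses placeholder vectors and a temporary directory so it does not touch a real index.

```python
import os
import tempfile

import numpy as np


def append_to_index(path, vector):
    """Append one embedding to the saved index, creating the file if needed.

    Returns the row number of the new vector.
    """
    vector = np.asarray(vector, dtype=np.float32).reshape(1, -1)
    if os.path.exists(path):
        index = np.vstack([np.load(path), vector])
    else:
        index = vector
    np.save(path, index)
    return index.shape[0] - 1


# Demo with placeholder vectors in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.npy")
    first = append_to_index(path, np.ones(384))
    second = append_to_index(path, np.zeros(384))
    shape = np.load(path).shape

print(first, second, shape)
```

Rewriting the whole file on every save is wasteful in principle, but at a few thousand rows of 384 floats the array is only a few megabytes. Whatever holds the note bodies has to be appended to in the same order, so that row numbers keep lining up.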

why bother keeping it local

Two reasons, and neither is ideology. The first is latency: a local dot product is faster than a round trip to any API, full stop. The second is privacy: these are my private notes, and sending every note I have ever written to a third party just so I can search them is a poor trade when an 80 MB model on my own disk does the job. Nothing leaves the machine, there is no key to rotate, and it works on a train with no signal.

Semantic search used to feel like infrastructure: a service, a database, an account. It turns out that for a personal corpus it is a small model, a NumPy array, and about thirty lines of Python. That is a much nicer place for it to live.