I wanted to find things in my notes by meaning rather than by exact wording, and I wanted it to happen entirely on my own machine. The first part is what embeddings are for. The second part is the bit people assume needs an API key, and it doesn't.
The whole thing is smaller than you'd think. I run every note through a sentence-transformer model to turn it into a vector and keep the vectors in memory. When I search, I embed the query the same way and pull back the closest matches by cosine similarity. No cloud, no token bill, nothing leaving the laptop.
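from sentence_transformers import SentenceTransformer, util

# Placeholder corpus for illustration; in practice, one string per note loaded from disk.
notes = ["Backup job got scheduled twice after the clock change.", "Sourdough starter feeding schedule."]
model = SentenceTransformer("all-MiniLM-L6-v2")
corpus_emb = model.encode(notes, convert_to_tensor=True)  # one 384-dim vector per note
q = model.encode("that thing about cron firing twice", convert_to_tensor=True)
# Returns one result list per query; hits[0] is a list of {"corpus_id", "score"} dicts.
hits = util.semantic_search(q, corpus_emb, top_k=5)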
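For a sense of what that last call is doing, here is a rough, dependency-free sketch of brute-force cosine search. It is not the library's implementation, just the same idea written out: score every stored vector against the query and keep the best few. The names `cosine`, `top_k`, `toy_corpus`, and `toy_query` are made up for illustration, and the three-dimensional vectors stand in for real 384-dimensional embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms.

    Assumes neither vector is all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, corpus_vecs, k=5):
    """Return (index, score) pairs for the k corpus vectors closest to the query."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(corpus_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy vectors standing in for note embeddings (hypothetical values).
toy_corpus = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
]
toy_query = [1.0, 0.0, 0.0]

ranked = top_k(toy_query, toy_corpus, k=2)
for index, score in ranked:
    print(index, round(score, 3))
```

The index in each pair plays the same role as `corpus_id` in the real results: it points back into the list of notes. Every query touches every vector, which is fine at a few thousand notes and is exactly the cost an index like FAISS exists to avoid.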
all-MiniLM-L6-v2 is the model that made this practical. It's about 80 MB and runs fine on CPU, so there's no GPU to feed and no warm-up to wait for. The vectors have 384 dimensions, which is small enough that a few thousand notes fit in memory without me thinking about a vector database at all. When I eventually have too many to brute-force, I'll reach for FAISS, but I'm nowhere near that and I refuse to add the dependency early.
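To put a number on "fits in memory", here is a back-of-the-envelope sketch. It assumes the embeddings are stored as 32-bit floats (4 bytes per value) and ignores any overhead from the tensor or list holding them. The helper `embedding_bytes` and the note counts are hypothetical, chosen only to show the scale.

```python
def embedding_bytes(num_notes, dims=384, bytes_per_value=4):
    """Raw size of a dense embedding matrix, assuming float32 values."""
    return num_notes * dims * bytes_per_value

for n in (1_000, 5_000, 100_000):
    mib = embedding_bytes(n) / (1024 ** 2)
    print(f"{n:>7} notes: {mib:.1f} MiB")
```

Under those assumptions, 5,000 notes come to 5,000 × 384 × 4 = 7,680,000 bytes, or roughly 7.3 MiB. Even a hundred thousand notes would stay under 150 MiB, so memory is unlikely to be the first thing that forces a move away from brute force.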
What surprised me was how good "good enough" felt. I searched for "the cron job that ran twice" and it found a note where I'd never used the word "cron" in the title, because it understood I meant duplicate scheduled jobs. That's the entire point. Keyword search can't do that, and I didn't have to send a word of my notes to anyone to get it.