If you've spent any time on AI architecture Twitter lately, you'd think it was impossible to build a Retrieval-Augmented Generation (RAG) system without a dedicated vector database like Pinecone, Milvus, or Qdrant, usually consumed as a managed cloud service.
And if you're building a semantic search engine over Wikipedia's 6 million-plus articles, that advice is right: you need that scale. But what if you are building an autonomous agent for a specific enterprise client?
The Reality of Agentic Memory
When Odigos builds an agent for a client to monitor their servers or manage internal documents, the memory store it needs isn't millions of rows. It's usually thousands, maybe tens of thousands, of highly specific entities and conversational logs.
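To put that scale in concrete terms, here is a rough back-of-the-envelope calculation of how much space the raw embeddings take. The workload (50,000 memories) and the embedding width (768 float32 values) are illustrative assumptions, not Odigos figures, and the function name is hypothetical.

```python
def embedding_footprint_mib(num_vectors: int, dims: int, bytes_per_value: int = 4) -> float:
    """Raw size of the embeddings alone, in MiB (ignores row and index overhead)."""
    return num_vectors * dims * bytes_per_value / (1024 ** 2)


# Assumed workload: 50,000 memories with 768-dimensional float32 embeddings.
print(f"{embedding_footprint_mib(50_000, 768):.1f} MiB")  # prints "146.5 MiB"
```

Under those assumptions the whole memory fits comfortably on local disk, or even in RAM, on a modest server.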
Putting a third-party managed database behind an agent of that size creates two significant architectural flaws:
- Latency: Every recall requires an external HTTPS request, JSON parsing, and a network round trip.
- Data Sovereignty: You are piping proprietary knowledge into another startup's cloud infrastructure.
Enter SQLite with Vector Extensions
By loading a vector search extension such as sqlite-vec into SQLite (or compiling it in), we achieve something elegant. The entire database, including the high-dimensional embeddings, lives in a single .db file alongside the agent's execution environment.
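The idea can be sketched with nothing but Python's standard library. Note that this is a stand-in, not sqlite-vec itself: the distance function is a plain Python callback registered with `create_function`, and the search is a brute-force scan, where a real deployment would let the extension do that work natively. The table name, the helper names (`open_memory`, `remember`, `recall`), and the toy 3-dimensional embeddings are all assumptions made for illustration.

```python
import math
import sqlite3
import struct


def pack(vec):
    """Serialize a list of floats into a compact float32 BLOB."""
    return struct.pack(f"{len(vec)}f", *vec)


def cosine_distance(a: bytes, b: bytes) -> float:
    """Return 1 - cosine similarity for two float32 BLOBs of equal length."""
    va = struct.unpack(f"{len(a) // 4}f", a)
    vb = struct.unpack(f"{len(b) // 4}f", b)
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return 1.0 - dot / norm if norm else 1.0


def open_memory(path=":memory:"):
    """Open (or create) the single-file memory store."""
    db = sqlite3.connect(path)
    # Expose the Python distance function to SQL (stand-in for the extension).
    db.create_function("cosine_distance", 2, cosine_distance)
    db.execute(
        "CREATE TABLE IF NOT EXISTS memories ("
        "id INTEGER PRIMARY KEY, content TEXT NOT NULL, embedding BLOB NOT NULL)"
    )
    return db


def remember(db, content, embedding):
    """Store one memory and its embedding in a single transaction."""
    with db:
        db.execute(
            "INSERT INTO memories (content, embedding) VALUES (?, ?)",
            (content, pack(embedding)),
        )


def recall(db, query_embedding, k=3):
    """Return the k memories closest to the query, nearest first."""
    return db.execute(
        "SELECT content, cosine_distance(embedding, ?) AS distance "
        "FROM memories ORDER BY distance LIMIT ?",
        (pack(query_embedding), k),
    ).fetchall()


db = open_memory()  # pass a file path such as "agent.db" to persist
remember(db, "disk usage alert on web-01", [0.9, 0.1, 0.0])
remember(db, "quarterly invoice policy", [0.0, 0.2, 0.9])
remember(db, "web-01 restarted after OOM", [0.8, 0.3, 0.1])

for content, distance in recall(db, [1.0, 0.0, 0.0], k=2):
    print(f"{distance:.3f}  {content}")
```

The query embedding points along the first axis, so the two server-related memories come back ahead of the invoice policy. A brute-force scan like this is only reasonable at the small scale discussed here; an extension would replace the Python callback, not the overall shape of the code.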
- Zero Network Latency: The database is read from local disk (or RAM), so at this scale retrieving the top-K nearest neighbors typically takes single-digit milliseconds.
- ACID Compliance: We store ordinary relational tables (entities, their relationships, conversation logs) in the same file as the embeddings, so the agent's state updates are transactional.
- True Self-Hosting: When we hand over an agent, we hand over a Docker container. There are no external database API keys to manage. The container holds the logic, the tools, and the memory, and it can run air-gapped.
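The transactional point is the easiest one to demonstrate. The following is a minimal sketch, assuming a hypothetical two-table schema (`entities` and `memories`) and a hypothetical helper `record_observation`; it uses only the standard `sqlite3` module and treats the embedding as an opaque BLOB.

```python
import sqlite3


def open_agent_db(path=":memory:"):
    """Create the (hypothetical) relational and memory tables in one file."""
    db = sqlite3.connect(path)
    db.executescript(
        """
        CREATE TABLE IF NOT EXISTS entities (
            id INTEGER PRIMARY KEY,
            name TEXT UNIQUE NOT NULL,
            status TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS memories (
            id INTEGER PRIMARY KEY,
            entity_id INTEGER NOT NULL REFERENCES entities(id),
            content TEXT NOT NULL,
            embedding BLOB NOT NULL
        );
        """
    )
    return db


def record_observation(db, name, status, content, embedding_blob):
    """Update an entity's status and store the related memory atomically."""
    with db:  # commits on success, rolls back if any statement raises
        cur = db.execute(
            "UPDATE entities SET status = ? WHERE name = ?", (status, name)
        )
        if cur.rowcount == 0:
            db.execute(
                "INSERT INTO entities (name, status) VALUES (?, ?)", (name, status)
            )
        (entity_id,) = db.execute(
            "SELECT id FROM entities WHERE name = ?", (name,)
        ).fetchone()
        db.execute(
            "INSERT INTO memories (entity_id, content, embedding) VALUES (?, ?, ?)",
            (entity_id, content, embedding_blob),
        )


db = open_agent_db()
record_observation(db, "web-01", "healthy", "web-01 passed health check", b"\x00" * 16)

try:
    # A missing embedding violates NOT NULL, so the status change is undone too.
    record_observation(db, "web-01", "degraded", "web-01 latency spike", None)
except sqlite3.IntegrityError:
    pass

status = db.execute("SELECT status FROM entities WHERE name = 'web-01'").fetchone()[0]
print(status)  # prints "healthy"
```

Because the relational row and the memory row live in the same file, the failed write leaves the agent's view of `web-01` exactly as it was. Keeping state in one store and embeddings in another would require extra coordination to get the same guarantee.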
Conclusion
For the large majority of enterprise agent deployments, planning for millions of vectors is premature optimization. Speed, security, and simplicity should come first, and an embedded database delivers all three with fewer moving parts to fail.