Why Not a Graph Database?
When developers realize their AI agent needs to remember relational data (“Who reviewed the PR that broke production last week?”), the first instinct is to spin up Neo4j, Memgraph, or Amazon Neptune.
Graph databases solve real problems. They also require JVM tuning, custom query languages (Cypher), schema management, and background daemon processes. For a personal AI assistant running on a laptop or a single container, deploying a full-blown graph database is massive over-engineering.
An agent’s memory bank is essentially a single-user dataset. The “scale” problem is the breadth of the user’s personal history, not thousands of concurrent read-write transactions per second. In membox, we skipped the heavy graph database entirely. Instead, we map the graph into flat SQLite tables and perform graph traversals directly in Python memory.
Representing the Graph in SQLite
If you strip away the abstraction, a knowledge graph is just nodes and edges. In SQLite, this maps trivially to three tables:
entitiesholds the nodes (e.g.,Alice,Acme Corp).relationsstores the edges connecting them (e.g.,(Alice, works_at, Acme Corp)).relation_evidencelinks edges back to the source text that proved them.
With this structure, finding the “neighbors” of an entity is just a SELECT statement joining the relations table where the entity is either the subject or the object.
The BFS Algorithm Implementation
Once you have the starting nodes (the “seed entities” extracted from the user’s prompt), multi-hop retrieval is just a classic Breadth-First Search (BFS).
The implementation in src/membox/core/store/retrieval.py:
def bfs_query(self, seed_ids: list[int], max_hops: int) -> HopResult:
visited: set[int] = set(seed_ids)
frontier: set[int] = set(seed_ids)
collected: dict[int, tuple[int, int, int, str]] = {}
for _ in range(max_hops):
if not frontier:
break
# Fetch all edges where subject or object is in the frontier
edges = self.get_neighbors(frontier)
new_frontier: set[int] = set()
for rid, src, tgt, pred in edges:
collected[rid] = (rid, src, tgt, pred)
for neighbor in (src, tgt):
if neighbor not in visited:
visited.add(neighbor)
new_frontier.add(neighbor)
frontier = new_frontier
# ... name resolution and evidence fetching follows ...
It maintains a frontier of entity IDs to explore next, fetches all connected relations in a single batch query (self.get_neighbors(frontier)), and aggregates the newly discovered entities for the next hop.
Because SQLite runs in-process via the standard library, 2 or 3 SELECT queries per loop take single-digit milliseconds.
Limiting Hops and Tracking Provenance
Unbounded graph traversal is a recipe for context window explosion. By strictly enforcing a max_hops parameter (typically 2 or 3), the agent only retrieves immediate context.
But triples alone (Alice works_at Acme) don’t give the model enough context. It needs to see why it believes that triple, which is what provenance tracking provides.
Because we decoupled graph extraction from the source documents, we can fetch the exact text snippets that generated the relations discovered during the BFS pass:
# Gather evidence documents, dedup by doc_id, preserve insertion order
evidence = self.get_evidence_docs(list(collected.keys()))
seen_docs: set[int] = set()
docs: list[str] = []
for _, did, content in evidence:
if did not in seen_docs:
seen_docs.add(did)
docs.append(content)
We also tag these snippets with structured metadata so the LLM can trace each memory back to its source. In membox, we build a provenance tag for every piece of evidence:
def _provenance_tag(project, source_path, section, doc_date) -> str:
# Builds tags like "myproject src/main.py ## setup 2026-06-10"
...
When the LLM answers, it can point to the exact file and section the memory came from.
Performance and Tradeoffs
The main advantages: zero ops overhead (the graph is a .db file, no containers or daemons), fast queries (a 3-hop query over tens of thousands of edges runs in milliseconds thanks to B-Tree indexes on source_id and target_id), and easy debugging (inspect with plain SQL).
The tradeoff: this doesn’t scale horizontally. If your graph hits billions of nodes, loading neighbor frontiers into Python sets will OOM your machine. But as we said at the start, this is a local-first memory layer for a personal AI agent. At that scale, a pure-Python BFS over SQLite isn’t just good enough, it’s the right choice.
Up Next
Now we have a clean graph (thanks to deduplication) and a fast way to query it (via Python BFS). But what happens when multiple agents try to update this SQLite database at the exact same time?
The next post covers how we made SQLite safe for concurrent agents without giving up the local-first architecture.
