The AI Memory Funnel: Triaging and Crystallizing Agent Context

In the last post, we built an async queue to decouple the slow LLM extraction step from the agent’s fast critical path. Before that, we tackled deduplication and fused BM25 with BFS to make retrieval both fast and accurate.

But getting data in and pulling it out is just the I/O layer. The real problem with AI memory over a long timeline is noise.

If you just shove every conversation log into a vector database, your agent’s context window will eventually choke on redundant facts, stale beliefs, and contradictory statements. AI memory shouldn’t be an append-only trash can. It should behave more like a human brain: it needs a lifecycle.

In membox, that lifecycle lives primarily in core/consolidate.py. We modeled it as a “memory funnel”, Trace → Unit → Crystal, that lets noise decay on its own and distills the facts that keep recurring.

The Three-Tier Funnel

Every piece of information entering the system starts at the top of the funnel and has to earn its way down.

Trace (Raw Event): A transient, low-confidence observation. “The user said they prefer Python over JavaScript today.” Most traces are just conversational exhaust.
Unit (Deduplicated Fact): When a trace makes it through the deduplication pipeline, it becomes a Unit. It’s a structured piece of knowledge (User, prefers, Python). But it’s still just one data point.
Crystal (Solidified Context): When a Unit is corroborated by multiple independent Traces over time, it crystallizes. Crystals are the immutable, high-confidence facts that the agent relies on for core context.
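To make the tiers concrete, here is a minimal Python sketch of how the statuses and a Unit record might be modeled. Every class, field, and default below (Status, Trace, MemoryUnit, promote_trace, the 0.5 starting scores) is a hypothetical illustration, not membox’s actual schema. The status names simply mirror the ones used in this post.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    # Lifecycle states; names mirror the statuses mentioned in this post.
    TRACE = "trace"
    UNIT = "unit"
    CRYSTAL = "crystal"
    SUPERSEDED = "superseded"
    ARCHIVED = "archived"


@dataclass
class Trace:
    # Hypothetical raw observation: where it came from and the triple extracted from it.
    trace_id: str
    triple: tuple[str, str, str]


@dataclass
class MemoryUnit:
    # Hypothetical deduplicated fact backed by one or more traces.
    triple: tuple[str, str, str]
    status: Status = Status.UNIT
    sources: set[str] = field(default_factory=set)  # IDs of corroborating traces
    confidence: float = 0.5  # assumed starting value
    importance: float = 0.5  # assumed baseline
    superseded_by: Optional["MemoryUnit"] = None


def promote_trace(trace: Trace) -> MemoryUnit:
    # A trace that survives deduplication becomes a Unit with a single source.
    return MemoryUnit(triple=trace.triple, sources={trace.trace_id})


unit = promote_trace(Trace("chat-2024-05-01", ("User", "prefers", "Python")))
print(unit.status, unit.triple, unit.sources)
```

The Unit starts life with exactly one source, which is why it is “still just one data point” until more evidence arrives.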

The Validation Threshold: Why Three Sources?

How does a Unit become a Crystal? It needs proof.

In consolidate.py, each memory Unit is continuously evaluated against a crystallization threshold. The default bar is ≥3 independent sources. Two other paths also lead to crystallization: a fact the user has explicitly confirmed, or a high-confidence decision (confidence ≥0.90 and importance ≥0.80).

If an agent extracts the fact (Project, uses, SQLite) from a single chat session, it’s a Unit. It has a confidence score, but it hasn’t crystallized. If the agent later reads a README.md that says the same thing, that’s source two. If it observes a code commit modifying a .db file, that’s source three.

Once the source count hits 3, the crystal_policy() check marks the unit as eligible, and consolidation promotes it to CRYSTAL status. The engineering reason for this strict default is simple: LLMs hallucinate during extraction, and a single extraction error shouldn’t permanently warp the agent’s worldview. Requiring three independent observations forces the system to cross-reference itself, because a false positive from a bad extraction rarely recurs across three different contexts.
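The promotion rule can be sketched as a predicate plus a consolidation pass. The thresholds come from the text above; the function is_crystal_eligible, the Unit fields, and the source IDs are hypothetical stand-ins, not the real crystal_policy() signature. Storing sources as a set is one way to give “independent” teeth: the same source observed twice still counts once.

```python
from dataclasses import dataclass, field

# Thresholds taken from the post; the constant names are hypothetical.
MIN_SOURCES = 3
DECISION_MIN_CONFIDENCE = 0.90
DECISION_MIN_IMPORTANCE = 0.80


@dataclass
class Unit:
    triple: tuple[str, str, str]
    sources: set[str] = field(default_factory=set)  # distinct source IDs
    confidence: float = 0.5
    importance: float = 0.5
    user_confirmed: bool = False
    is_decision: bool = False
    status: str = "UNIT"


def is_crystal_eligible(u: Unit) -> bool:
    # Any one of the three crystallization paths is enough.
    if len(u.sources) >= MIN_SOURCES:
        return True
    if u.user_confirmed:
        return True
    return (
        u.is_decision
        and u.confidence >= DECISION_MIN_CONFIDENCE
        and u.importance >= DECISION_MIN_IMPORTANCE
    )


def consolidate(units: list[Unit]) -> None:
    # Promote every eligible UNIT to CRYSTAL in place.
    for u in units:
        if u.status == "UNIT" and is_crystal_eligible(u):
            u.status = "CRYSTAL"


# Replaying the SQLite example: chat session, then README.md, then a commit.
fact = Unit(("Project", "uses", "SQLite"), sources={"chat:session-1"})
consolidate([fact])
print(fact.status)  # UNIT (only one source so far)
fact.sources |= {"file:README.md", "commit:db-change"}
consolidate([fact])
print(fact.status)  # CRYSTAL
```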

Evolving Confidence and Importance

A memory isn’t static. Its value changes based on how often it’s used and how much evidence backs it up.

Every Unit tracks two metrics: confidence and importance.

Confidence grows as new evidence (Traces) accumulates. Each new corroborating source bumps the confidence score by a fixed step, capped at 0.95.
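A minimal sketch of the confidence update, assuming a step of 0.1 per corroborating source. The post only says the step is fixed, so CONFIDENCE_STEP, the 0.6 starting value, and the function name corroborate are hypothetical; the 0.95 cap is from the text.

```python
CONFIDENCE_CAP = 0.95  # cap stated in the post
CONFIDENCE_STEP = 0.1  # hypothetical step size; the post only says it is fixed


def corroborate(confidence: float, new_sources: int = 1) -> float:
    # Raise confidence by a fixed step per new corroborating source, never above the cap.
    return min(CONFIDENCE_CAP, confidence + CONFIDENCE_STEP * new_sources)


c = 0.6  # hypothetical starting confidence
history = []
for _ in range(6):
    c = corroborate(c)
    history.append(round(c, 2))
print(history)  # [0.7, 0.8, 0.9, 0.95, 0.95, 0.95]
```

Extra evidence stops paying off once the cap is reached. Keeping the cap below 1.0 plausibly leaves room for doubt, so even a well-corroborated fact can still be superseded later.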

Importance starts at a baseline when a unit is created and rises as the agent actually uses the fact.

If a fact is highly corroborated but never used, it has high confidence but low importance. If a fact is used constantly but only has one source, it has high importance but low confidence. These metrics gate crystallization, while forgetting is driven by separate lifecycle mechanisms.

Supersession: Knowledge Replacement via Dependency Graphs

What happens when a fact changes? The user used to prefer Python, but now they write Rust exclusively.

We don’t do DELETE FROM memory WHERE subject='User'. Hard deletions in an asynchronous, multi-agent system are dangerous. Instead, we use Supersession.

Consolidation detects when a newer unit replaces an older one (the two share a topic and the newer one is phrased as a correction), flips the older unit’s status to SUPERSEDED, and records a superseded_by pointer to its replacement.
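The supersession step might look roughly like this. Two things are assumptions: that a “topic” is the (subject, predicate) pair, and that “correction language” can be approximated by a short list of cue phrases (CORRECTION_CUES). A real implementation could just as well use an LLM classifier. The function maybe_supersede and its fields are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical cue phrases; the post only says "correction language".
CORRECTION_CUES = re.compile(
    r"\b(actually|no longer|not anymore|switched to|instead of)\b", re.IGNORECASE
)


@dataclass
class Unit:
    uid: str
    subject: str
    predicate: str
    obj: str
    source_text: str  # the utterance the unit was extracted from
    status: str = "UNIT"
    superseded_by: Optional[str] = None

    @property
    def topic(self) -> tuple[str, str]:
        # Assumption: a "topic" is the (subject, predicate) pair.
        return (self.subject, self.predicate)


def maybe_supersede(old: Unit, new: Unit) -> bool:
    # Mark `old` SUPERSEDED by `new` when they share a topic, disagree on the
    # object, and the newer statement reads like a correction. Nothing is deleted.
    same_topic = old.topic == new.topic and old.obj != new.obj
    is_correction = CORRECTION_CUES.search(new.source_text) is not None
    if old.status != "SUPERSEDED" and same_topic and is_correction:
        old.status = "SUPERSEDED"
        old.superseded_by = new.uid  # pointer to the replacement
        return True
    return False


py = Unit("u1", "User", "prefers", "Python", "I prefer Python over JavaScript.")
rust = Unit("u2", "User", "prefers", "Rust", "Actually, I write Rust exclusively now.")
print(maybe_supersede(py, rust), py.status, py.superseded_by)  # True SUPERSEDED u2
```

Requiring both signals keeps two compatible facts (say, “likes Go too”) from knocking each other out.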

Retrieval’s BFS skips superseded relations by default, so the stale fact drops out of active context. It still lives in the database, which is useful if the agent ever needs to recall when the user changed their mind, but default retrieval never surfaces it.
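Here is a sketch of a hop-limited BFS that skips superseded edges unless asked otherwise. The graph shape (each node mapped to a list of (neighbor, status) pairs), the max_hops limit, and the include_superseded flag are hypothetical; the post only states that superseded relations are skipped by default.

```python
from collections import deque

# Hypothetical adjacency list: node -> list of (neighbor, edge_status) pairs.
Graph = dict[str, list[tuple[str, str]]]


def bfs(graph: Graph, start: str, max_hops: int = 2,
        include_superseded: bool = False) -> dict[str, int]:
    # Return {node: hop_distance} for nodes reachable from `start` within max_hops.
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor, status in graph.get(node, []):
            if status == "SUPERSEDED" and not include_superseded:
                continue  # stale fact: stays in the DB, stays out of context
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


graph: Graph = {
    "User": [("Python", "SUPERSEDED"), ("Rust", "CRYSTAL")],
    "Rust": [("cargo", "UNIT")],
}
print(bfs(graph, "User"))                           # {'User': 0, 'Rust': 1, 'cargo': 2}
print(bfs(graph, "User", include_superseded=True))  # {'User': 0, 'Python': 1, 'Rust': 1, 'cargo': 2}
```

The history query (“when did the user switch?”) is just the same traversal with the filter turned off.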

The Elegance of Local-First Forgetting

This brings us to the most human part of the design: forgetting.

In a cloud SaaS application, you have to run garbage-collection cron jobs to keep storage costs down. But membox is local-first. A SQLite database on a developer’s NVMe drive has effectively unlimited room for text triples.

We don’t need to explicitly delete anything. Instead, we rely on natural decay and archiving.

Recall the scoring formula from the retrieval post: score(t) = decay^hops(t) * ( α·sim(t) + (1-α)·bm25(t) ). Because decay is below 1, every extra hop shrinks a fact’s score, so distant facts slide down the ranking and out of the results. On top of that, consolidate.py runs a decay_action that moves units past their valid_to timestamp to ARCHIVED status.
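The formula transcribes directly into Python. The defaults for alpha and decay and the 0.3 cutoff are hypothetical, and bm25 is assumed to be normalized to [0, 1] so it can be mixed with similarity. The sketch only shows how the decay^hops(t) factor pushes an otherwise identical match down the ranking as it moves further away.

```python
def score(hops: int, sim: float, bm25: float, alpha: float = 0.5, decay: float = 0.7) -> float:
    # score(t) = decay^hops(t) * (alpha * sim(t) + (1 - alpha) * bm25(t))
    # Assumes sim and bm25 are both normalized to [0, 1].
    return decay ** hops * (alpha * sim + (1 - alpha) * bm25)


CUTOFF = 0.3  # hypothetical minimum score for entering the context window

# The same fact (sim=0.8, bm25=0.6) found at increasing hop distances.
scores = [round(score(h, sim=0.8, bm25=0.6), 3) for h in range(4)]
print(scores)  # [0.7, 0.49, 0.343, 0.24]
kept = [h for h in range(4) if score(h, sim=0.8, bm25=0.6) >= CUTOFF]
print(kept)    # [0, 1, 2]
```

With these illustrative numbers, a perfectly good match three hops out already falls below the cutoff.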

An archived unit falls out of the active context window. It’s forgotten, not deleted.
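A sketch of the archiving pass, assuming each unit carries an optional timezone-aware valid_to timestamp. decay_pass and the Unit fields are hypothetical stand-ins for consolidate.py’s decay_action. The point is that expiry flips a status instead of deleting a row.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Unit:
    uid: str
    status: str = "UNIT"
    valid_to: Optional[datetime] = None  # None means the fact never expires


def decay_pass(units: list[Unit], now: Optional[datetime] = None) -> list[str]:
    # Archive (never delete) units whose valid_to has passed; return the archived IDs.
    now = now or datetime.now(timezone.utc)
    archived = []
    for u in units:
        if u.status == "ARCHIVED":
            continue  # already forgotten
        if u.valid_to is not None and u.valid_to < now:
            u.status = "ARCHIVED"
            archived.append(u.uid)
    return archived


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
units = [
    Unit("old-sprint-goal", valid_to=now - timedelta(days=1)),
    Unit("project-uses-sqlite"),
    Unit("next-release", valid_to=now + timedelta(days=30)),
]
archived = decay_pass(units, now)
print(archived, len(units))  # ['old-sprint-goal'] 3
```

The list still holds all three units afterwards: the expired one is only relabeled, which matches the “forgotten, not deleted” behavior described above.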

Tying It Together

By enforcing the Trace → Unit → Crystal funnel, membox stops being a dumb vector store and starts acting like an intelligent memory layer. The async queue ensures ingests are fast, the deduplication pipeline keeps the graph clean, the BFS+BM25 fusion finds the logical links, and the funnel ensures only high-quality, corroborated facts survive the test of time.