Ferryte's lineage cascade triggers BatchDeleteMemoryRecords after DeleteEvent, which is exactly what AWS's own docs recommend. The remaining WARN belongs to the stale-fact bug class, which needs versioning rather than cascade.
A reproducible test of delete-after-revoke behaviour across popular agent-memory stacks. We plant a canary, call each stack’s real delete API, then check whether the agent can still surface it — first without Ferryte, then with.
Naive delete vs. Ferryte cascade — % of scenarios each stack passes cleanly.
“Before” is what the tested frameworks do today: delete the source and hope the derived memory follows. “After” runs the same harness with Ferryte’s lineage cascade turned on via the `--with-ferryte` flag.
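To make the before/after contrast concrete, here is a minimal sketch assuming lineage is recorded as a directed graph from each source record to everything derived from it (facts, summaries, graph nodes). `LineageGraph`, `naive_delete`, `cascade_delete`, and the record IDs are hypothetical illustrations, not Ferryte's actual API.

```python
from collections import deque


class LineageGraph:
    """Hypothetical lineage graph: edges point from a record to records derived from it."""

    def __init__(self) -> None:
        self.children: dict[str, set[str]] = {}

    def record_derivation(self, parent: str, child: str) -> None:
        # Remember that `child` (a fact, summary, or graph node) was built from `parent`.
        self.children.setdefault(parent, set()).add(child)

    def descendants(self, root: str) -> set[str]:
        # Breadth-first walk collecting everything transitively derived from `root`.
        seen: set[str] = set()
        queue = deque([root])
        while queue:
            node = queue.popleft()
            for child in self.children.get(node, ()):
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        return seen


def naive_delete(store: set[str], source: str) -> None:
    # "Before": remove only the source record and hope derived memory follows.
    store.discard(source)


def cascade_delete(store: set[str], graph: LineageGraph, source: str) -> set[str]:
    # "After": remove the source plus every record transitively derived from it.
    doomed = {source} | graph.descendants(source)
    store.difference_update(doomed)
    return doomed


# Usage: one message yields a fact, which feeds a summary and a graph node.
graph = LineageGraph()
graph.record_derivation("msg-1", "fact-7")
graph.record_derivation("fact-7", "summary-3")
graph.record_derivation("fact-7", "kg-node-12")

before = {"msg-1", "fact-7", "summary-3", "kg-node-12", "msg-2"}
naive_delete(before, "msg-1")            # derived records survive

after = {"msg-1", "fact-7", "summary-3", "kg-node-12", "msg-2"}
cascade_delete(after, graph, "msg-1")    # only the unrelated msg-2 remains
```

The traversal is transitive on purpose: a summary built from a fact built from a message is still downstream of that message.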
Honest limitation: Mem0's internal fact-extractor creates derived memories whose IDs aren't returned to the caller, so the lineage graph can't yet enumerate them for cascade. Deeper Mem0 instrumentation is on the Ferryte roadmap.
pgvector, Chroma, Qdrant: the raw row delete is clean, and every store behaves the same. The leak is in the summary layer on top; Ferryte's lineage cascade clears the derived summary in lockstep with the source.
Zep: the self-hosted Community Edition has been deprecated, and the current zep-cloud SDK is cloud-only. A hosted-account run is pending.
All scores are reproducible — the 'with Ferryte' column uses the same harness with `--with-ferryte`.
The raw vector DB isn’t the villain.
The summary layer on top is.
A row delete on pgvector, Chroma, or Qdrant is clean — that’s why they score identically. The leak appears once an LLM summary or knowledge-graph node absorbs the fact and the delete doesn’t propagate. That derived layer is exactly what real agent-memory frameworks add — and exactly what this benchmark measures.
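The row-versus-summary distinction fits in a few lines. This toy assumes one in-memory table of rows plus a single summary string regenerated on each write; `summarize` is a hypothetical stand-in for an LLM summarizer (it just concatenates text), and `ToyMemory` is illustrative, not any vendor's store.

```python
import uuid


def summarize(texts: list[str]) -> str:
    # Stand-in for an LLM summarizer: a real one paraphrases, but it can still
    # carry a fact forward, which is all this toy needs to model.
    return " ".join(texts)


class ToyMemory:
    def __init__(self) -> None:
        self.rows: dict[str, str] = {}  # raw vector-store rows: id -> text
        self.summary = ""               # derived layer built on top of the rows

    def add(self, text: str) -> str:
        row_id = str(uuid.uuid4())
        self.rows[row_id] = text
        self.summary = summarize(list(self.rows.values()))
        return row_id

    def delete_row(self, row_id: str) -> None:
        # Clean row delete: the row is gone, but the summary is NOT rebuilt.
        del self.rows[row_id]

    def surfaces(self, marker: str) -> dict[str, bool]:
        # Probe each layer separately for the marker.
        return {
            "rows": any(marker in text for text in self.rows.values()),
            "summary": marker in self.summary,
        }


canary = f"CANARY-{uuid.uuid4().hex[:12]}"
memory = ToyMemory()
leaked_id = memory.add(f"The project codename is {canary}.")
memory.add("The user prefers morning meetings.")
memory.delete_row(leaked_id)
result = memory.surfaces(canary)
# result == {"rows": False, "summary": True}: the row is clean, the summary leaks
```

Probing the row layer alone would report a clean delete here, which is why a benchmark has to probe the derived layer too.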
Real backends, default configs
Each stack runs in its recommended setup on our own deployments — no strawmen, no private systems.
Plant a canary the data can't invent
A unique marker is written for one tenant, through one source, so any later appearance is provably a leak.
Call the real delete API
We revoke the source the way an app would — then probe retrieval to see what survived.
Score what's left
PASS if the marker is gone everywhere, LEAK if it resurfaces, BLIND if we honestly couldn't tell.
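The steps above can be sketched as a canary generator plus a scoring rule. This is a minimal illustration assuming each probe reports `True` (marker found), `False` (layer checked, marker absent), or `None` (layer could not be checked); `make_canary`, `score`, and the probe names are hypothetical, not the code in `benchmark.run`.

```python
import secrets
from enum import Enum
from typing import Optional


class Verdict(Enum):
    PASS = "PASS"
    LEAK = "LEAK"
    BLIND = "BLIND"


def make_canary(tenant: str, source: str) -> str:
    # A random token no model could plausibly invent, tagged with its tenant and
    # source so any later appearance can be attributed to exactly one write.
    return f"canary-{tenant}-{source}-{secrets.token_hex(8)}"


def score(probes: dict[str, Optional[bool]]) -> Verdict:
    """Combine per-layer probe results into one verdict.

    True  -> marker surfaced in that layer
    False -> layer checked, marker absent
    None  -> layer could not be checked
    """
    if any(found is True for found in probes.values()):
        return Verdict.LEAK   # any resurfacing is a leak
    if not probes or any(found is None for found in probes.values()):
        return Verdict.BLIND  # unknowns never count as a PASS
    return Verdict.PASS       # every layer checked, marker gone everywhere


canary = make_canary("tenant-a", "chat-upload")
verdict = score({"rows": False, "summary": True})  # -> Verdict.LEAK
```

Checking for a leak first means a single resurfacing outranks any number of unknowns, and the empty-probe case falls to BLIND rather than PASS.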
Don’t trust us. Run it.
Every number on this page comes from one command against pinned, open-source backends. Clone the repo, bring up the stores, and add your own API key.
git clone https://github.com/getferryte/ferryte
cd ferryte/benchmark
cp .env.example .env # add your OpenAI key
docker compose up -d # pgvector · qdrant · chroma
pip install -r requirements.txt
# Before: naive delete
python -m benchmark.run --scenarios all \
--backends mem0,qdrant,chroma,pgvector \
--embedder openai --summarizer openai
# After: same harness, lineage cascade on
python -m benchmark.run --scenarios all \
--backends mem0,qdrant,chroma,pgvector \
--embedder openai --summarizer openai --with-ferryte

The part most benchmarks hide.
“You sell the fix — of course you found leaks.”
Fair to ask. That's why the entire harness is open, the configs and versions are pinned, and you can falsify any cell yourself. We also publish what PASSES — cross-tenant isolation holds on every stack we've tested.
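A cross-tenant isolation check is the same canary idea pointed sideways: plant for one tenant, then query as another. A minimal sketch, assuming a store whose reads are filtered by tenant; `TenantStore` and `isolation_holds` are hypothetical names for illustration only.

```python
class TenantStore:
    """Toy memory store that filters every read by tenant, as a multi-tenant backend should."""

    def __init__(self) -> None:
        self.records: list[tuple[str, str]] = []  # (tenant, text)

    def write(self, tenant: str, text: str) -> None:
        self.records.append((tenant, text))

    def search(self, tenant: str, query: str) -> list[str]:
        # Only this tenant's records are eligible to match.
        return [text for owner, text in self.records if owner == tenant and query in text]


def isolation_holds(store: TenantStore, canary: str, owner: str, other: str) -> bool:
    store.write(owner, f"note containing {canary}")
    visible_to_owner = bool(store.search(owner, canary))
    visible_to_other = bool(store.search(other, canary))
    # Isolation holds if the owner can see the canary and the other tenant cannot.
    return visible_to_owner and not visible_to_other
```

Requiring the owner to see the canary guards against a false PASS from a write that silently failed.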
Did you rig the configs?
No. Backends run in their default / recommended setups on our own deployments. We never touch anyone's private systems, and the in-memory illustrative baseline is kept separate from real-backend results.
What did you NOT test?
Proprietary managed memory we can't self-host, and any behaviour behind a paywall we didn't buy. Where we can't verify, a cell reads BLIND — never a silent PASS.
Isn't a vector DB row-delete enough?
For the raw row, yes. The leak is the derived layer — summaries and graph nodes that absorbed the fact. The vendors document this themselves.