Architecture Published May 17, 2026 6 min

The vector store is not your agent's memory

A new survey from BigAI-NLCO splits LLM memory into three layers. Most production agents I review have built the middle one, called it memory, and skipped the layer on top. Here is what the taxonomy actually buys you.

Jigar Joshi, Agentic AI Architect and Consultant

Last month I reviewed an agent built on a 4-million-chunk vector store. The team called the vector store "memory". They could not work out why the agent kept losing track of what the user had said three turns earlier.

The vector store was fine. The problem was that retrieval and memory are not the same thing, and the team had bolted on a retrieval pipeline while assuming it would also give them an agent that remembered conversations. It does not. There is now a survey from BigAI-NLCO that does a clean job of naming why, and the taxonomy in it is the most useful thing I have read on this in a while.

Three layers, not one

The survey (Awesome-AI-Memory on GitHub, paper in TMLR) splits memory into implicit, explicit, and agentic. The authors borrow the names loosely from neuroscience, but you do not have to care about the analogy to use the split. The split itself is what matters: at a glance you can tell which layer a given paper is about, and which layer your own system has not built yet.

The three memory layers

  • Agentic memory: per-session and per-user state. What this user said, what was decided, what to remember next time.
  • Explicit memory: external stores (RAG, vector DBs, graphs, long-context stuffing). Shared, read-mostly, queried per request.
  • Implicit memory: what the model already learned in pretraining and fine-tuning. Lives in the weights; cannot be changed at runtime.

Implicit memory: what the weights already know

This is everything the model learned in pretraining and fine-tuning. The research the survey points at is interesting (transformer feed-forward layers behave a lot like key-value stores, knowledge neurons can be localised, memorisation scales differently from reasoning) but the practical takeaway is short: you cannot change this layer at runtime. What it does decide is how much of the next two layers you actually need. A model that already knows your domain needs less retrieval. A model that does not will need more.

Explicit memory: the stuff you bolt on outside

RAG. Vector stores. Knowledge graphs. Long-context stuffing. This is the layer almost everyone calls "memory" in production, which is where the trouble starts. It is read-mostly, queried per request, and shared across users. It is good at grounding answers in current facts. It is not an agent's memory of a conversation. It has no concept of who asked what, when, or what got decided.

A rule that has saved me from a lot of design arguments: use this layer when the question is "what does the world know about X?". Do not use it when the question is "what did this specific user just tell me?". Those are different problems with different storage shapes.
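One way to see the different storage shapes is to look at what each lookup is keyed on. The sketch below is illustrative only: `CorpusChunk`, `UserFact`, `Corpus`, and `UserStore` are hypothetical names, and a plain keyword match stands in for real vector search.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class CorpusChunk:
    """Explicit memory record: shared across users, no notion of who asked."""
    doc_id: str
    text: str


@dataclass(frozen=True)
class UserFact:
    """Agentic memory record: always tied to one user and a point in time."""
    user_id: str
    text: str
    recorded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class Corpus:
    """Answers "what does the world know about X?". Keyed by content only."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    def search(self, query: str) -> list:
        # Keyword overlap is a stand-in (assumption) for embedding similarity.
        terms = query.lower().split()
        return [c for c in self._chunks if any(t in c.text.lower() for t in terms)]


class UserStore:
    """Answers "what did this specific user tell me?". Keyed by user_id."""

    def __init__(self) -> None:
        self._facts: dict = {}

    def remember(self, user_id: str, text: str) -> None:
        self._facts.setdefault(user_id, []).append(UserFact(user_id, text))

    def recall(self, user_id: str) -> list:
        return list(self._facts.get(user_id, []))


corpus = Corpus([
    CorpusChunk("d1", "Refunds are accepted within 30 days."),
    CorpusChunk("d2", "Shipping takes 5 days."),
])
users = UserStore()
users.remember("alice", "Prefers email over phone.")
```

Note that `Corpus.search` has no `user_id` parameter at all, while `UserStore.recall` cannot be called without one. That difference in signature is the difference in storage shape.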

Agentic memory: the layer most teams skip

This is the one that goes missing. The survey covers it in some depth: single-agent vs multi-agent, short-term vs long-term, the architecture of ingestion → storage → retrieval → invocation, and how you evaluate it. The simplest framing I have found that survives a code review:

  • A short-term scratchpad that lives inside the session. What the user said, which tools got called, what was decided. Most teams approximate this with raw prompt history. Fine for a prototype. In production you want it instrumented so you can see what fell out of the window and why.
  • A long-term store keyed per user or per workspace. What this user prefers, what they tried last week, what they explicitly asked you to remember. This is its own database. It is not your RAG corpus and it should not share infrastructure with it.
  • A consolidation step. Something that, every so often, promotes things from short-term to long-term and summarises old turns out of the live context. Without it, sessions grow unbounded and the agent ends up "forgetting" things it actually saw, just because they aged off the prefix.

I have seen teams build any one of these and call the problem solved. The leverage is in building all three.
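A minimal sketch of how the three pieces could fit together, under some loud assumptions: the class and field names (`Scratchpad`, `LongTermStore`, `consolidate`, the `remember` flag) are hypothetical, a dict stands in for the per-user database, and string concatenation stands in for a real summarisation step.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Turn:
    role: str
    text: str
    remember: bool = False  # hypothetical flag: the user asked the agent to keep this


@dataclass
class Scratchpad:
    """Short-term memory for one session, with a log of what fell out and why."""
    max_turns: int
    turns: deque = field(default_factory=deque)
    summary: str = ""
    eviction_log: list = field(default_factory=list)

    def add(self, turn: Turn) -> None:
        self.turns.append(turn)

    def evict_overflow(self) -> list:
        """Remove the oldest turns beyond max_turns and record the reason."""
        dropped = []
        while len(self.turns) > self.max_turns:
            turn = self.turns.popleft()
            self.eviction_log.append((turn.text, "exceeded max_turns"))
            dropped.append(turn)
        return dropped


class LongTermStore:
    """Per-user store, separate from any RAG corpus. A dict stands in for a database."""

    def __init__(self) -> None:
        self._by_user: dict = {}

    def write(self, user_id: str, text: str) -> None:
        self._by_user.setdefault(user_id, []).append(text)

    def read(self, user_id: str) -> list:
        return list(self._by_user.get(user_id, []))


def consolidate(user_id: str, pad: Scratchpad, store: LongTermStore) -> None:
    """Promote flagged turns to long-term memory; fold the rest into a summary."""
    for turn in pad.evict_overflow():
        if turn.remember:
            store.write(user_id, turn.text)
        else:
            # Placeholder: a real system would summarise with a model call.
            pad.summary = (pad.summary + " " + turn.text).strip()


pad = Scratchpad(max_turns=2)
store = LongTermStore()
pad.add(Turn("user", "Call me Sam.", remember=True))
pad.add(Turn("user", "What is the refund window?"))
pad.add(Turn("assistant", "Thirty days."))
pad.add(Turn("user", "And for sale items?"))
consolidate("user-1", pad, store)
```

After the call, the two oldest turns have left the live window: the flagged one sits in the long-term store under `user-1`, the other is folded into `pad.summary`, and `pad.eviction_log` records both departures. When to run `consolidate` (every turn, on a token budget, at session end) is a design choice this sketch leaves open.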

Why the taxonomy is worth holding in your head

Most of the "the agent feels dumb" reports I get from clients are layer-assignment problems in disguise:

  • Agent does not know a product fact? Implicit plus explicit. Fine-tune or retrieve.
  • Agent answers correctly but cannot remember what the user just said? Short-term agentic. Your context-window management is leaking.
  • Agent treats every session as if it is meeting the user for the first time? Long-term agentic. There is no per-user store, and adding more RAG will not produce one no matter how much money you throw at it.
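The triage above is small enough to write down as a lookup table, which can be handy in a bug template or an incident runbook. This is only a sketch: the symptom keys are hypothetical labels I made up for the three reports, and the actions are paraphrased from the paragraph above.

```python
from enum import Enum


class Layer(Enum):
    IMPLICIT_PLUS_EXPLICIT = "implicit + explicit"
    SHORT_TERM_AGENTIC = "short-term agentic"
    LONG_TERM_AGENTIC = "long-term agentic"


# Symptom keys are hypothetical labels for the three reports described above.
TRIAGE = {
    "missing_product_fact": (
        Layer.IMPLICIT_PLUS_EXPLICIT,
        "Fine-tune or retrieve.",
    ),
    "forgets_recent_turns": (
        Layer.SHORT_TERM_AGENTIC,
        "Audit context-window management.",
    ),
    "no_cross_session_recall": (
        Layer.LONG_TERM_AGENTIC,
        "Add a per-user store; more RAG will not help.",
    ),
}


def triage(symptom: str) -> str:
    """Return the responsible layer and a first action for a known symptom."""
    layer, action = TRIAGE[symptom]
    return f"{layer.value}: {action}"
```

For example, `triage("forgets_recent_turns")` points at the short-term agentic layer rather than at the retrieval pipeline.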

A related thing worth pulling out: the survey's coverage of memory editing and unlearning is the right thing to read before any team commits to fine-tuning facts into the weights. The weights are the most expensive memory layer you can pick and the hardest to roll back. The survey walks the gradient of cheaper alternatives (knowledge editing, retrieval-augmented updates, agentic memory writes), which is usually where the answer lives.

Where to start reading

If you have thirty minutes, read three things, in this order. The repo README itself, for the taxonomy and the neuroscience anchors. The "Transformer Feed-Forward Layers Are Key-Value Memories" paper (arXiv 2012.14913), which will recalibrate how you think about weights. And the agent-memory survey at arXiv 2404.13501, which reads as a practitioner companion to the BigAI synthesis and is the one I keep open on the side when I am whiteboarding the layer-three (agentic) design.

The reason most production agents in 2026 feel shallow is not that the models are weak. It is that engineering teams imported the RAG pattern from 2023, called it memory, and stopped there. The BigAI survey gives the missing layer a name and a literature. Open your system architecture diagram and look for the box that is not there yet. That is usually the work.

Reference: https://github.com/bigai-nlco/Awesome-AI-Memory. TMLR paper https://openreview.net/forum?id=Sk7pwmLuAY. arXiv https://arxiv.org/abs/2601.09113.
