The vector store is not your agent's memory
A new survey from BigAI-NLCO splits LLM memory into three layers. Most production agents I review have built the middle one, called it memory, and skipped the layer on top. Here is what the taxonomy actually buys you.
Last month I reviewed an agent built on a 4 million chunk vector store. The team called the vector store "memory". They could not work out why the agent kept losing track of what the user had said three turns earlier.
The vector store was fine. The problem was that retrieval and memory are not the same thing, and the team had bolted on a retrieval pipeline while assuming it would also give them an agent that remembered conversations. It does not. There is now a survey from BigAI-NLCO that does a clean job of naming why, and the taxonomy in it is the most useful thing I have read on this in a while.
Three layers, not one
The survey (Awesome-AI-Memory on GitHub, paper in TMLR) splits memory into implicit, explicit, and agentic. The authors borrow the names loosely from neuroscience, but you do not have to care about the analogy to use the split. The split itself is what matters: at a glance you can tell which layer a given paper is about, and which layer your own system has not built yet.
Implicit memory: what the weights already know
This is everything the model learned in pretraining and fine-tuning. The research the survey points at is interesting (transformer feed-forward layers behave a lot like key-value stores, knowledge neurons can be localised, memorisation scales differently from reasoning) but the practical takeaway is short: you cannot change this layer at runtime. What it does decide is how much of the next two layers you actually need. A model that already knows your domain needs less retrieval. A model that does not will need more.
Explicit memory: the stuff you bolt on outside
RAG. Vector stores. Knowledge graphs. Long-context stuffing. This is the layer almost everyone calls "memory" in production, which is where the trouble starts. It is read-mostly, queried per request, and shared across users. It is good at grounding answers in current facts. It is not an agent's memory of a conversation. It has no concept of who asked what, when, or what got decided.
A rule that has saved me from a lot of design arguments: use this layer when the question is "what does the world know about X?". Do not use it when the question is "what did this specific user just tell me?". Those are different problems with different storage shapes.
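To make "different storage shapes" concrete, here is a minimal sketch of the two side by side. Everything in it is hypothetical: `SharedCorpus` and `UserMemoryStore` are names I made up for illustration, and the keyword match is only a stand-in for whatever vector search you actually run. The point is the shape: one store has no notion of a user, the other cannot be read without one.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class SharedCorpus:
    """Explicit memory: read-mostly facts, identical for every user."""
    chunks: list = field(default_factory=list)

    def search(self, query: str) -> list:
        # Naive keyword match, standing in for real vector search (assumption).
        terms = query.lower().split()
        return [c for c in self.chunks if any(t in c.lower() for t in terms)]


@dataclass
class UserMemoryStore:
    """Agentic memory: what one specific user told the agent."""
    by_user: dict = field(default_factory=lambda: defaultdict(list))

    def remember(self, user_id: str, fact: str) -> None:
        self.by_user[user_id].append(fact)

    def recall(self, user_id: str) -> list:
        # .get avoids creating an empty entry for users we have never seen.
        return list(self.by_user.get(user_id, []))


corpus = SharedCorpus(["The Pro plan includes SSO.", "Refunds take 5 days."])
memory = UserMemoryStore()
memory.remember("alice", "Prefers answers as bullet points.")

world_facts = corpus.search("refunds")   # same answer for every user
alice_facts = memory.recall("alice")     # only Alice's history
bob_facts = memory.recall("bob")         # empty: Bob never told us anything
```

Note that `search` takes no user and `recall` takes nothing but a user. If your "memory" API looks like the first one, it is the explicit layer.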
Agentic memory: the layer most teams skip
This is the one that goes missing. The survey covers it in some depth: single-agent vs multi-agent, short-term vs long-term, the architecture of ingestion → storage → retrieval → invocation, and how you evaluate it. The simplest framing I have found that survives a code review:
- A short-term scratchpad that lives inside the session. What the user said, which tools got called, what was decided. Most teams approximate this with raw prompt history. Fine for a prototype. In production you want it instrumented so you can see what fell out of the window and why.
- A long-term store keyed per user or per workspace. What this user prefers, what they tried last week, what they explicitly asked you to remember. This is its own database. It is not your RAG corpus and it should not share infrastructure with it.
- A consolidation step. Something that, every so often, promotes things from short-term to long-term and summarises old turns out of the live context. Without it, sessions grow unbounded and the agent ends up "forgetting" things it actually saw, just because they aged off the prefix.
I have seen teams build any one of these and call the problem solved. The leverage is in building all three.
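A rough sketch of how the three pieces above can fit together, under stated assumptions: `Scratchpad`, `LongTermStore`, and `consolidate` are hypothetical names, the stores are in-memory, and the "summary" is a placeholder string where a real system would call a summariser. It is not the survey's architecture, just the smallest version I would accept in a review.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Scratchpad:
    """Short-term memory for one session, with an eviction log."""
    max_turns: int = 6
    turns: deque = field(default_factory=deque)
    evicted: list = field(default_factory=list)  # instrumentation: what fell out

    def add(self, turn: str) -> None:
        self.turns.append(turn)
        while len(self.turns) > self.max_turns:
            self.evicted.append(self.turns.popleft())


@dataclass
class LongTermStore:
    """Per-user long-term memory; deliberately not the RAG corpus."""
    facts: dict = field(default_factory=dict)

    def write(self, user_id: str, fact: str) -> None:
        self.facts.setdefault(user_id, []).append(fact)

    def read(self, user_id: str) -> list:
        return list(self.facts.get(user_id, []))


def consolidate(user_id: str, pad: Scratchpad, store: LongTermStore,
                should_promote: Callable[[str], bool], keep_recent: int = 2) -> int:
    """Promote old turns worth keeping, then replace them with a summary stub."""
    n_old = max(len(pad.turns) - keep_recent, 0)
    old = [pad.turns.popleft() for _ in range(n_old)]
    for turn in old:
        if should_promote(turn):
            store.write(user_id, turn)
    if old:
        # Placeholder: a real system would summarise the old turns here.
        pad.turns.appendleft(f"[summary of {len(old)} earlier turns]")
    return len(old)
```

In use, you would call `consolidate` on a schedule or when the scratchpad nears `max_turns`, with `should_promote` encoding whatever "worth remembering" means for your product (an explicit "remember this", a stated preference, a decision). The `evicted` list is there so that when a turn does fall off the window unconsolidated, you can see it rather than guess.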
Why the taxonomy is worth holding in your head
Most of the "the agent feels dumb" reports I get from clients are layer-assignment problems in disguise.
- Agent does not know a product fact? Implicit plus explicit. Fine-tune or retrieve.
- Agent answers correctly but cannot remember what the user just said? Short-term agentic. Your context-window management is leaking.
- Agent treats every session as if it is meeting the user for the first time? Long-term agentic. There is no per-user store, and adding more RAG will not produce one no matter how much money you throw at it.
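The same triage can be written down as a lookup table, which is roughly how I use it when a bug report comes in. This is a sketch, not a tool: the symptom keys and the `triage` function are hypothetical names of my own, and the table only holds the three cases described above.

```python
from enum import Enum


class Layer(Enum):
    IMPLICIT = "implicit"
    EXPLICIT = "explicit"
    SHORT_TERM_AGENTIC = "short-term agentic"
    LONG_TERM_AGENTIC = "long-term agentic"


# Symptom -> (layers to look at, first thing to try). Keys are illustrative.
TRIAGE = {
    "missing_product_fact": (
        [Layer.IMPLICIT, Layer.EXPLICIT], "fine-tune or retrieve"),
    "forgets_recent_turn": (
        [Layer.SHORT_TERM_AGENTIC], "fix context-window management"),
    "no_cross_session_recall": (
        [Layer.LONG_TERM_AGENTIC], "add a per-user store"),
}


def triage(symptom: str):
    """Return (layers, suggested fix) for a known symptom."""
    if symptom not in TRIAGE:
        raise KeyError(f"unknown symptom: {symptom!r}")
    return TRIAGE[symptom]


layers, fix = triage("no_cross_session_recall")
```

Notice that no row maps a cross-session complaint to the explicit layer. That is the whole argument in one table.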
A related thing worth pulling out: the survey's coverage of memory editing and unlearning is the right thing to read before any team commits to fine-tuning facts into the weights. That is the most expensive memory layer you can pick and the hardest to roll back. The survey walks the gradient of cheaper alternatives (knowledge editing, retrieval-augmented updates, agentic memory writes), which is usually where the answer lives.
Where to start reading
If you have thirty minutes, read three things, in this order. The repo README itself, for the taxonomy and the neuroscience anchors. The "Transformer Feed-Forward Layers Are Key-Value Memories" paper (arXiv 2012.14913), which will recalibrate how you think about weights. And the agent-memory survey at arXiv 2404.13501, which reads as a practitioner companion to the BigAI synthesis and is the one I keep open on the side when I am whiteboarding the layer-three design.
The reason most production agents in 2026 feel shallow is not that the models are weak. It is that engineering teams imported the RAG pattern from 2023, called it memory, and stopped there. The BigAI survey gives the missing layer a name and a literature. Open your system architecture diagram and look for the box that is not there yet. That is usually the work.
Reference: https://github.com/bigai-nlco/Awesome-AI-Memory. TMLR paper https://openreview.net/forum?id=Sk7pwmLuAY. arXiv https://arxiv.org/abs/2601.09113.