Architecture | Published May 29, 2026 | 11 min read

Your coding agent has amnesia. Persistent memory is the fix.

Claude Code forgets your architecture, your decisions, and why you ruled things out the moment a session ends. The reliability tax is not tokens; it is re-establishing context every morning. Here is what persistent agent memory actually is, how an open-source engine like Cortex implements it, and how to evaluate a memory layer for your own agents.

Jigar Joshi, Agentic AI Architect and Consultant

Every Claude Code session starts the same way for me. I open the project, and the agent knows nothing. Not the architecture we settled on last week, not the three approaches we tried and threw away, not the reason we are on Postgres instead of the thing the agent will, helpfully, suggest again in about four minutes. So I re-explain it. Again. That re-explaining is the actual cost of working with a coding agent day to day, and almost nobody puts it on the invoice.

We spend a lot of words on context windows and token budgets. The 1M-token context era was supposed to make this go away. It did not, because the problem was never that the window is too small. The problem is that the window is empty at the start of every session. A bigger empty room is still empty. What you want is for the agent to walk in already knowing what it knew yesterday.

The real tax is re-establishing context, not tokens

Think about what a senior engineer carries between days on a project. Not the full source. They carry a compressed model: the shape of the system, the decisions that are load-bearing, the dead ends, the one file everyone is scared to touch and why. That carried state is what makes the second week faster than the first. A coding agent resets to week one every single morning.

You can paper over it. A long-lived CLAUDE.md, a hand-maintained notes file, pasting yesterday's decisions back in. I have done all of these and they all rot. The notes file goes stale because updating it is manual and boring. The pasted context is partial because you only paste what you remember to. The honest fix is to make memory a system the agent maintains, not a document you maintain for the agent. That is the line between a prompt trick and an architecture.

What persistent agent memory actually means

I drew the broader map of this in the three paradigms of LLM memory: implicit (in the weights), explicit (in the prompt), and agentic (the model decides what to store and recall through tools). Persistent memory for a coding agent is the agentic paradigm made durable. A real memory layer has to do three jobs, and most "memory" features only do the middle one.

  • Capture. Record what happened in a session as structured memory, not a transcript dump. What was decided, what was tried, what broke, what the files mean.
  • Consolidate. Merge new memories with old ones, resolve contradictions, decay what stopped mattering, and promote what keeps coming up. This is the step bolt-on vector stores skip, and skipping it is why naive RAG-over-history degrades into noise.
  • Retrieve. Surface the right slice at the right moment, scoped to the project and the task, without flooding the context with everything it ever saw.

If a tool only does retrieval, it is a search box over your history. Useful, but it will drift. The capture and consolidation steps are what turn a pile of logs into something that behaves like carried knowledge. If you want the one-glance version of which memory type fits which job, I sketched it in a visual note on the four memory types.
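To make the three jobs concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how Cortex or any particular engine works: the `Memory` fields, the `MemoryStore` class, the word-overlap ranking, and the `max_age` rule are all hypothetical, and a real engine would likely use embeddings and a database rather than a dict.

```python
from dataclasses import dataclass, replace


@dataclass
class Memory:
    project: str
    topic: str      # what the memory is about, e.g. "database choice"
    kind: str       # "decision", "attempt", "failure", or "file_note"
    text: str
    session: int    # session number in which this was recorded
    hits: int = 1   # how many times this topic has come up


class MemoryStore:
    """Hypothetical in-memory store; every name here is illustrative."""

    def __init__(self) -> None:
        self.inbox: list[Memory] = []
        self.items: dict[tuple[str, str], Memory] = {}

    def capture(self, memory: Memory) -> None:
        # Capture: record a structured fact, not a transcript dump.
        self.inbox.append(memory)

    def consolidate(self, current_session: int, max_age: int = 30) -> None:
        # Merge: keep one record per (project, topic). On a contradiction
        # the most recent statement wins; repeats are promoted via `hits`.
        for new in self.inbox:
            key = (new.project, new.topic)
            old = self.items.get(key)
            if old is None:
                self.items[key] = new
            else:
                latest = new if new.session >= old.session else old
                self.items[key] = replace(latest, hits=old.hits + new.hits)
        self.inbox.clear()
        # Decay: drop what has not come up recently. Each hit buys a
        # memory another `max_age` sessions of life.
        self.items = {
            key: m for key, m in self.items.items()
            if current_session - m.session <= max_age * m.hits
        }

    def retrieve(self, project: str, task: str, limit: int = 3) -> list[Memory]:
        # Retrieve: scope to the project, then rank by word overlap with
        # the task so only a small, relevant slice reaches the context.
        words = set(task.lower().split())
        scored = []
        for m in self.items.values():
            if m.project != project:
                continue
            overlap = len(words & set(f"{m.topic} {m.text}".lower().split()))
            if overlap:
                scored.append((overlap, m.hits, m.session, m))
        scored.sort(key=lambda s: s[:3], reverse=True)
        return [s[3] for s in scored[:limit]]
```

The point of the split is visible in `consolidate`: if you delete that method and retrieve straight from `inbox`, both the rejected and the accepted database decision come back, and the agent has to guess which one is current.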

Cortex: a working implementation worth studying

The cleanest open-source example I have looked at recently is Cortex by Clement Deust. It calls itself a persistent memory engine for Claude Code built on computational neuroscience, which sounds like a stretch until you see what it is actually borrowing: consolidation cycles that run in the background like sleep, decay so stale memories fade, pattern separation so similar-but-distinct events do not collapse into one. It is MIT licensed, and it runs entirely on your machine over MCP stdio, with PostgreSQL plus the pgvector extension as the store. Nothing leaves the box.
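Decay and pattern separation are easier to reason about with numbers attached. The sketch below is an assumption about what such rules could look like, not Cortex's actual formulas: `retention`, `should_forget`, `same_event`, the 14-day half-life, and the 0.95 merge threshold are all hypothetical choices made for illustration.

```python
import math


def retention(age_days: float, recalls: int, half_life_days: float = 14.0) -> float:
    """Score in (0, 1] that halves every half-life.

    Each recall stretches the half-life, so memories that keep coming
    up fade more slowly than ones nobody asks about.
    """
    effective_half_life = half_life_days * (1 + recalls)
    return 0.5 ** (age_days / effective_half_life)


def should_forget(age_days: float, recalls: int, threshold: float = 0.1) -> bool:
    # Decay: a memory is dropped once its retention falls below the threshold.
    return retention(age_days, recalls) < threshold


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def same_event(a: list[float], b: list[float], merge_threshold: float = 0.95) -> bool:
    # Pattern separation: only near-identical embeddings are merged, so
    # similar-but-distinct events stay as separate memories.
    return cosine(a, b) >= merge_threshold
```

With these defaults, a memory that was never recalled is forgotten after 60 days, while one recalled three times in that period is kept. Two embeddings with cosine similarity 0.9 stay separate.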

The part that makes it concrete for builders is the interface. Cortex exposes 49 MCP tools, so the agent reads and writes memory the same way it calls any other tool. That is the right shape. Memory is not a magic context injection, it is a set of capabilities the model invokes: store this decision, recall what we know about this module, link these two discussions. The v3.17 line added autonomous wiki curation, a headless Claude agent that rewrites per-project documentation every few hours from the memory graph, so the docs track reality instead of decaying the moment they are written.

I am not telling you to adopt it in production tomorrow. It is a young project with one primary author, and a neuroscience-flavored memory engine is a lot of moving parts to take on trust. But as a reference for how to think about agent memory, the design choices are unusually well argued, and the local-first, MCP-native, Postgres-backed shape is exactly what I would reach for if I were building this for a client who cannot send code off-premise.

How to evaluate a memory layer for your own agents

If you are weighing a memory layer, do not start from the feature list. Start from where your context actually lives and how much of it can leave your network. Then pressure-test the three jobs above. This is the comparison I run.

Choosing a memory approach for a coding agent
| Dimension | No memory (CLAUDE.md) | Vector store over history | Persistent memory engine |
|---|---|---|---|
| Capture | Manual, rots fast | Auto, raw transcript | Auto, structured |
| Consolidation | None | None, just appends | Background cycles, decay |
| Retrieval scoping | Whole file every time | Top-k similarity | Project and task scoped |
| Data residency | Local | Often a hosted vector DB | Local (pgvector) if you choose it |
| Interface | Static file | One search call | Tools the agent composes |
| Failure mode | Goes stale | Drifts into noise | Complexity to operate |

The right answer is not always the rightmost column. For a solo side project, a disciplined CLAUDE.md is fine and a memory engine is overkill. For a team shipping a long-lived system where the cost of the agent re-proposing a rejected design is real, the structured engine pays for its operational weight. Measure it the way you would measure RAG: run a recall benchmark on your own decisions and see how often the agent surfaces the right prior context unprompted.
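A recall benchmark of that kind can be very small. The sketch below assumes you have a list of (task, expected memory id) pairs drawn from your own past decisions, and some `retrieve` function that returns ranked memory ids for a task. Both are placeholders for whatever your memory layer actually exposes, and `recall_at_k` is a name chosen for this example.

```python
from typing import Callable


def recall_at_k(
    cases: list[tuple[str, str]],
    retrieve: Callable[[str], list[str]],
    k: int = 3,
) -> float:
    """Fraction of tasks whose expected memory id appears in the top k results."""
    if not cases:
        return 0.0
    hits = sum(1 for task, expected in cases if expected in retrieve(task)[:k])
    return hits / len(cases)
```

Write the task descriptions the way you would phrase them to the agent, without naming the decision you expect back. Otherwise you are measuring keyword search, not whether the right prior context surfaces unprompted.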

The security caveat nobody puts in the README headline

A memory engine is one more thing running with your privileges, reading your project, and holding a Postgres connection. That is another layer of the agent supply chain, which is exactly the surface that got hit this month.

I wrote the full version of this argument in your agent's supply chain is the attack surface now. Memory engines, skills, MCP servers, and extensions are all things that execute with privileges you granted. A memory layer is worth adopting. It is not worth adopting blind.

Where I would and would not use this today

  • Would: a multi-week build with a stable team, where re-establishing context daily is a measurable drag and the code must stay on-premise.
  • Would: as a study reference for designing your own memory layer, even if you do not adopt the tool. The capture, consolidate, retrieve split is the part to steal.
  • Would not: a quick one-off script or a throwaway prototype. The memory has nothing to consolidate and the operational cost is pure overhead.
  • Would not: anywhere you cannot run and patch a local Postgres responsibly, given the security note above.

The thing I keep coming back to is that this is the same lesson as tools. Agents are not broken because the model is weak, they are broken because the scaffolding around the model is thin. Memory is scaffolding. Get it right and the agent stops feeling like a brilliant intern with no long-term memory, and starts feeling like someone who has been on the project as long as you have.
