Does a 1M token context window remove the need for agent memory?

No. A large window solves how much the agent can read in one session, not what it remembers between sessions. The window starts empty every time. Persistent memory is about carrying structured state across sessions so the agent does not reset to day one each morning.

What is Cortex and how does it relate to MCP?

Cortex is an open-source (MIT) persistent memory engine for Claude Code. It runs locally over MCP stdio on PostgreSQL with pgvector and exposes 49 MCP tools, so the agent stores and recalls memory the same way it calls any other tool. Reference: https://github.com/cdeust/Cortex.

Is a vector store over chat history the same as a memory engine?

Not quite. A vector store gives you retrieval, but without consolidation it just appends raw history and drifts into noise over time. A memory engine adds capture as structured memory and consolidation (merging, decay, promotion), which is what keeps recall useful as the project grows.

What are the security risks of adding a memory layer to a coding agent?

It is another component running with your privileges, reading your code, and holding a database connection. Treat it as part of your agent supply chain: install from trusted sources, pin and patch versions (Cortex shipped an RCE fix on May 27, 2026, so stay past v3.17.2), and scope the database and file access you grant.

Persistent Memory for Coding Agents (Claude Code)

Every Claude Code session starts the same way for me. I open the project, and the agent knows nothing. Not the architecture we settled on last week, not the three approaches we tried and threw away, not the reason we are on Postgres instead of the thing the agent will, helpfully, suggest again in about four minutes. So I re-explain it. Again. That re-explaining is the actual cost of working with a coding agent day to day, and almost nobody puts it on the invoice.

We spend a lot of words on context windows and token budgets. The 1M context era was supposed to make this go away. It did not, because the problem was never that the window is too small. The problem is that the window is empty at the start of every session. A bigger empty room is still empty. What you want is for the agent to walk in already knowing what it knew yesterday.

The real tax is re-establishing context, not tokens

Think about what a senior engineer carries between days on a project. Not the full source. They carry a compressed model: the shape of the system, the decisions that are load-bearing, the dead ends, the one file everyone is scared to touch and why. That carried state is what makes the second week faster than the first. A coding agent resets to week one every single morning.

You can paper over it. A long-lived CLAUDE.md, a hand-maintained notes file, pasting yesterday's decisions back in. I have done all of these and they all rot. The notes file goes stale because updating it is manual and boring. The pasted context is partial because you only paste what you remember to. The honest fix is to make memory a system the agent maintains, not a document you maintain for the agent. That is the line between a prompt trick and an architecture.

What persistent agent memory actually means

I drew the broader map of this in my post on the three paradigms of LLM memory: implicit (in the weights), explicit (in the prompt), and agentic (the model decides what to store and recall through tools). Persistent memory for a coding agent is the agentic paradigm made durable. A real memory layer has to do three jobs, and most "memory" features only do the middle one.

Capture. Record what happened in a session as structured memory, not a transcript dump. What was decided, what was tried, what broke, what the files mean.
Consolidate. Merge new memories with old ones, resolve contradictions, decay what stopped mattering, and promote what keeps coming up. This is the step bolt-on vector stores skip, and skipping it is why naive RAG-over-history degrades into noise.
Retrieve. Surface the right slice at the right moment, scoped to the project and the task, without flooding the context with everything it ever saw.
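To make the three jobs concrete, here is a minimal in-memory sketch. It is not how Cortex or any particular product works. Word-overlap (Jaccard) similarity stands in for embeddings, and the `Memory`/`MemoryStore` names, the `kind` labels and the 0.6 merge threshold are illustrative assumptions. Consolidation here covers only the merge step; decay and promotion are left out.

```python
import re
from dataclasses import dataclass


def tokens(text: str) -> set[str]:
    """Lowercased word set: a crude stand-in for an embedding (assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


@dataclass
class Memory:
    project: str
    kind: str   # hypothetical labels, e.g. "decision", "dead_end", "file_note"
    text: str
    hits: int = 1  # how many times this memory has been captured


class MemoryStore:
    def __init__(self, merge_threshold: float = 0.6):
        self.items: list[Memory] = []
        self.merge_threshold = merge_threshold

    def capture(self, project: str, kind: str, text: str) -> Memory:
        """Capture: store a structured record, not a transcript dump."""
        new_tokens = tokens(text)
        # Consolidate: fold a near-duplicate of the same kind into the existing record
        for m in self.items:
            if (m.project == project and m.kind == kind
                    and jaccard(tokens(m.text), new_tokens) >= self.merge_threshold):
                m.hits += 1
                m.text = text  # keep the most recent wording
                return m
        m = Memory(project, kind, text)
        self.items.append(m)
        return m

    def retrieve(self, project: str, query: str, k: int = 3) -> list[Memory]:
        """Retrieve: scope to the project first, then rank by overlap and return the top k."""
        q = tokens(query)
        scored = [(jaccard(tokens(m.text), q), m.hits, m)
                  for m in self.items if m.project == project]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: (s[0], s[1]), reverse=True)
        return [m for _, _, m in scored[:k]]


store = MemoryStore()
store.capture("billing", "decision", "Use Postgres, not MongoDB, for the ledger")
store.capture("billing", "decision", "Use Postgres not MongoDB for the ledger tables")  # merges
store.capture("billing", "dead_end", "Tried event sourcing for invoices; too slow to rebuild")
store.capture("search", "decision", "Use Postgres full text search")
results = store.retrieve("billing", "why postgres for the ledger?")
```

Notice that retrieval filters by project before it ranks, so memories from the search project never compete with billing's. The merge step is also why the two phrasings of the Postgres decision end up as a single record.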

If a tool only does retrieval, it is a search box over your history. Useful, but it will drift. The capture and consolidation steps are what turn a pile of logs into something that behaves like carried knowledge. If you want the one-glance version of which memory type fits which job, I sketched it in a visual note on the four memory types.

Cortex: a working implementation worth studying

The cleanest open-source example I have looked at recently is Cortex by Clement Deust. It calls itself a persistent memory engine for Claude Code built on computational neuroscience, which sounds like a stretch until you see what it is actually borrowing: consolidation cycles that run in the background like sleep, decay so stale memories fade, pattern separation so similar-but-distinct events do not collapse into one. It is MIT licensed, and it runs entirely on your machine over MCP stdio, with PostgreSQL plus the pgvector extension as the store. Nothing leaves the box.
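The "consolidation cycles with decay" idea fits in a few lines. This sketch uses a plain exponential half-life with a bonus for memories that were recalled since the last cycle. It is a generic model chosen for illustration, not Cortex's actual algorithm, and the `Trace` type, the 72-hour half-life, the promotion bonus and the prune floor are all made-up parameters.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    text: str
    strength: float = 1.0
    recalls_since_last_cycle: int = 0


def consolidation_cycle(traces: list[Trace], hours_elapsed: float,
                        half_life_hours: float = 72.0,
                        promote_per_recall: float = 0.25,
                        prune_below: float = 0.1) -> list[Trace]:
    """One background pass: decay everything, promote what was recalled, prune what faded."""
    decay = 0.5 ** (hours_elapsed / half_life_hours)  # exponential half-life decay
    survivors = []
    for t in traces:
        t.strength *= decay
        t.strength += promote_per_recall * t.recalls_since_last_cycle  # promotion
        t.recalls_since_last_cycle = 0
        if t.strength >= prune_below:
            survivors.append(t)
    return survivors


traces = [
    Trace("We chose Postgres over MongoDB", recalls_since_last_cycle=3),
    Trace("CI test flaked once on Tuesday"),
]
for _ in range(4):  # four background passes, 72 hours apart
    traces = consolidation_cycle(traces, hours_elapsed=72)
# Only the Postgres decision survives; the one-off flake fades and is pruned.
```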

The part that makes it concrete for builders is the interface. Cortex exposes 49 MCP tools, so the agent reads and writes memory the same way it calls any other tool. That is the right shape. Memory is not a magic context injection; it is a set of capabilities the model invokes: store this decision, recall what we know about this module, link these two discussions. The v3.17 line added autonomous wiki curation: a headless Claude agent that rewrites per-project documentation every few hours from the memory graph, so the docs track reality instead of decaying the moment they are written.

I am not telling you to adopt it in production tomorrow. It is a young project with one primary author, and a neuroscience-flavored memory engine is a lot of moving parts to take on trust. But as a reference for how to think about agent memory, the design choices are unusually well argued, and the local-first, MCP-native, Postgres-backed shape is exactly what I would reach for if I were building this for a client who cannot send code off-premise.

How to evaluate a memory layer for your own agents

If you are weighing a memory layer, do not start from the feature list. Start from where your context actually lives and how much of it can leave your network. Then pressure-test the three jobs above. This is the comparison I run.

Choosing a memory approach for a coding agent

Dimension	No memory (CLAUDE.md)	Vector store over history	Persistent memory engine
Capture	Manual, rots fast	Auto, raw transcript	Auto, structured
Consolidation	None	None, just appends	Background cycles, decay
Retrieval scoping	Whole file every time	Top-k similarity	Project and task scoped
Data residency	Local	Often a hosted vector DB	Local (pgvector) if you choose it
Interface	Static file	One search call	Tools the agent composes
Failure mode	Goes stale	Drifts into noise	Complexity to operate

The right answer is not always the rightmost column. For a solo side project, a disciplined CLAUDE.md is fine and a memory engine is overkill. For a team shipping a long-lived system where the cost of the agent re-proposing a rejected design is real, the structured engine pays for its operational weight. Measure it the way you would measure RAG: run a recall benchmark on your own decisions and see how often the agent surfaces the right prior context unprompted.
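A recall benchmark can be as small as this: a handful of questions, each paired with the decision it should surface, scored as recall@k. The decision IDs, questions and keyword retriever are hypothetical stand-ins. To use it for real, replace `keyword_retrieve` with whatever recall call your memory layer exposes.

```python
import re
from typing import Callable


def recall_at_k(cases: list[tuple[str, str]],
                retrieve: Callable[[str, int], list[str]],
                k: int = 3) -> float:
    """Fraction of questions whose expected decision ID appears in the top-k results."""
    if not cases:
        raise ValueError("need at least one test case")
    hits = sum(1 for question, expected_id in cases if expected_id in retrieve(question, k))
    return hits / len(cases)


# Hypothetical decision log, for illustration only.
DECISIONS = {
    "D1": "chose postgres over mongodb for the ledger",
    "D2": "rejected graphql gateway, keep rest endpoints",
    "D3": "do not touch legacy_tax.py without a migration plan",
}


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def keyword_retrieve(question: str, k: int) -> list[str]:
    """Toy retriever: rank decision IDs by words shared with the question."""
    q = words(question)
    ranked = sorted(DECISIONS, key=lambda d: len(q & words(DECISIONS[d])), reverse=True)
    return ranked[:k]


cases = [
    ("should we move the ledger to mongodb?", "D1"),
    ("can we add a graphql gateway?", "D2"),
    ("refactor legacy_tax.py today", "D3"),
    ("why are we not on mongo?", "D1"),  # paraphrase: keyword overlap misses it
]
print(recall_at_k(cases, keyword_retrieve, k=1))  # 0.75
```

The paraphrased fourth question is the case to watch. It is exactly the "the agent suggests the rejected option again" failure, and a good memory layer should close that gap.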

The security caveat nobody puts in the README headline

A memory engine is one more thing running with your privileges, reading your project, and holding a Postgres connection. That is another layer of the agent supply chain, which is exactly the surface that got hit this month. Cortex itself shipped an RCE fix on May 27, 2026.

I wrote the full version of this argument in "Your agent's supply chain is the attack surface now." Memory engines, skills, MCP servers, and extensions all execute with privileges you granted. A memory layer is worth adopting. It is not worth adopting blind.
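One cheap guard, if you pin Cortex, is a CI check that refuses a version from before the RCE fix. The sketch reads the advice to "stay past v3.17.2" as "strictly newer than 3.17.2". That reading is an assumption, so confirm the fixed release in the project's own release notes. How you get the installed version string depends on how you installed it, so the function takes it as input. Pre-release suffixes such as `-rc1` are rejected rather than guessed at.

```python
import re

# Assumption: the RCE fix is in every release strictly newer than 3.17.2.
LAST_UNPATCHED = (3, 17, 2)


def parse_version(v: str) -> tuple[int, int, int]:
    """Parse 'X.Y.Z' or 'vX.Y.Z'; anything else is rejected instead of guessed."""
    m = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", v.strip())
    if not m:
        raise ValueError(f"unrecognised version string: {v!r}")
    major, minor, patch = (int(x) for x in m.groups())
    return (major, minor, patch)


def is_past_rce_fix(v: str) -> bool:
    # Tuple comparison is numeric per component, so 3.9.0 < 3.17.2 as intended.
    return parse_version(v) > LAST_UNPATCHED
```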

Where I would and would not use this today

Would: a multi-week build with a stable team, where re-establishing context daily is a measurable drag and the code must stay on-premise.
Would: as a study reference for designing your own memory layer, even if you do not adopt the tool. The capture, consolidate, retrieve split is the part to steal.
Would not: a quick one-off script or a throwaway prototype. The memory has nothing to consolidate and the operational cost is pure overhead.
Would not: anywhere you cannot run and patch a local Postgres responsibly, given the security note above.

The thing I keep coming back to is that this is the same lesson as with tools. Agents are not broken because the model is weak; they are broken because the scaffolding around the model is thin. Memory is scaffolding. Get it right and the agent stops feeling like a brilliant intern with no long-term memory, and starts feeling like someone who has been on the project as long as you have.

Your coding agent has amnesia. Persistent memory is the fix.

Read next

Code agents vs skill agents: when to give an agent the keyboard and when to give it the toolbox

AI agent vs agentic AI: what the distinction actually means when you ship one

Gemini 3.5 Flash vs Sonnet 4.6: should you re-route your agent stack?