Architecture | May 29, 2026 | 2 min read
Part of: Agentic AI, Claude API, AI Observability, AI Engineering

Wrong memory. Dead agent.

Four memory types. Four use cases. Pick wrong and your agent forgets, hallucinates, or costs 10x.

Wrong memory choice kills more agents in production than wrong model choice. Context window, RAG, CAG, and a database each solve a different job. Use one for everything and you get slow retrieval, hallucinated answers, runaway token costs, and agents that forget the moment a session ends. This note gives a one-glance decision rule for matching the memory type to the problem.

Tags: #Memory #RAG #CAG #AgenticAI #ContextEngineering #VectorDatabase #PromptCaching #AIEngineering

Key takeaways

  1. Wrong memory choice kills more agents in production than wrong model choice. There are four memory types and four use cases. Pick wrong and the agent forgets, hallucinates, or costs 10x what it should.
  2. Short-term memory is the context window: session memory that is free, disposable, and gone the moment the session ends. Use it for the current conversation, recent tool results, and the active task. If it only matters for this turn, keep it in context.
  3. Long-term vector memory (RAG) retrieves only what is relevant from a large, changing corpus like company docs or support history. Use it when the corpus is large and changes often. Skip it for a small, stable policy that fits in context, where it only adds latency.
  4. Cached context (CAG) keeps stable knowledge in the prompt cache, up to 90 percent cheaper than uncached input tokens. Use it when the agent needs all of it every time and it fits in context. Avoid it when the data changes frequently or is too large to cache.
  5. Structured memory is a database, for facts the agent reads and writes: order status, ticket state, user preferences. If the agent needs to write, use Postgres, MongoDB, or Redis, not a vector store. Semantic search on a primary key is a bug, not a feature.
  6. The decision rule is one question per piece of state. Retrieve from a large corpus means RAG. Need everything every time means CAG. Need to write a fact means database. Only matters this session means context.
  7. The common failure is using one memory type for everything: RAG everywhere, context-window-only, or a vector DB for rows that belong in a table. That is what produces slow retrieval, hallucinations, and runaway token costs.
  8. Audit your agent today. For every piece of state, decide which of the four it belongs to and move the misplaced ones. Latency and cost both improve.
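
The decision rule and the audit can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: `StateItem`, its boolean traits, `choose_memory`, and `audit` are hypothetical names invented here. The order in which the questions are asked, and the fallback to RAG when nothing else matches, are assumptions too, since the rule itself only gives one question per memory type.

```python
from dataclasses import dataclass
from enum import Enum


class Memory(Enum):
    CONTEXT = "context window"
    RAG = "vector store (RAG)"
    CAG = "cached context (CAG)"
    DATABASE = "database"


@dataclass
class StateItem:
    """One piece of agent state, described by hypothetical yes/no traits."""
    name: str
    session_only: bool = False       # only matters this session
    agent_writes: bool = False       # agent must write or update the fact
    needed_every_call: bool = False  # agent needs all of it every time
    fits_in_context: bool = False    # small enough to sit in the prompt
    changes_often: bool = False      # content is updated frequently


def choose_memory(item: StateItem) -> Memory:
    """Apply the one-question-per-state rule (question order is an assumption)."""
    if item.session_only:
        return Memory.CONTEXT
    if item.agent_writes:
        return Memory.DATABASE
    if item.needed_every_call and item.fits_in_context and not item.changes_often:
        return Memory.CAG
    # Assumed fallback: anything left is retrieved on demand from a corpus.
    return Memory.RAG


def audit(inventory: list[tuple[StateItem, Memory]]) -> list[str]:
    """List every item whose current store differs from what the rule picks."""
    moves = []
    for item, current in inventory:
        target = choose_memory(item)
        if target is not current:
            moves.append(f"{item.name}: move from {current.value} to {target.value}")
    return moves


if __name__ == "__main__":
    # Illustrative inventory of an agent that put everything in a vector store.
    inventory = [
        (StateItem("order status", agent_writes=True), Memory.RAG),
        (StateItem("refund policy", needed_every_call=True, fits_in_context=True), Memory.RAG),
        (StateItem("support history", changes_often=True), Memory.RAG),
        (StateItem("recent tool results", session_only=True), Memory.CONTEXT),
    ]
    for line in audit(inventory):
        print(line)
```

Run against that sample inventory, the audit flags the two misplaced items (order status and the refund policy) and leaves support history and recent tool results where they are. In a real agent the traits would come from how each piece of state is actually used, not from hand-set flags.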
