Agents need memory to maintain coherence across interactions and to accumulate knowledge over time. How you manage memory directly affects an agent’s ability to handle complex, multi-step tasks without losing track of earlier reasoning or user intent.
Short-Term Memory: Conversation Context
The most immediate form of memory is the conversation history that gets passed to the model with each request. This includes the user’s messages, the agent’s responses, tool calls, and tool results. Because it lives entirely within the context window, it is fast and reliable --- but it is also ephemeral and bounded by the model’s context length.
For multi-turn interactions, keep the conversation history clean. Avoid letting tool outputs with large payloads accumulate unchecked, as they consume context budget that could be better used for reasoning. Summarize or truncate verbose results when the full detail is no longer needed.
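One way to keep history clean is to truncate old tool results once their full detail is no longer needed, while leaving the most recent turns intact. The sketch below assumes a generic message format (dicts with `role` and `content`) and uses a rough characters-per-token estimate; both are illustrative, not any particular provider’s API.

```python
# Sketch: truncate verbose tool results in older turns to reclaim
# context budget. The message schema and the 500-character cap are
# illustrative assumptions.

MAX_TOOL_RESULT_CHARS = 500

def compact_history(messages: list[dict], keep_last: int = 4) -> list[dict]:
    """Truncate large tool outputs, except in the most recent turns."""
    compacted = []
    cutoff = len(messages) - keep_last
    for i, msg in enumerate(messages):
        if (msg["role"] == "tool" and i < cutoff
                and len(msg["content"]) > MAX_TOOL_RESULT_CHARS):
            # Copy rather than mutate, so the full record stays available
            # elsewhere (e.g. in a log or long-term store).
            msg = {**msg, "content": msg["content"][:MAX_TOOL_RESULT_CHARS]
                   + "\n[truncated: full output no longer needed]"}
        compacted.append(msg)
    return compacted
```

Running this before each model call keeps recent tool detail available for reasoning while older payloads shrink to a stub.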
Long-Term Memory: Persistent Storage
For knowledge that must survive beyond a single conversation --- user preferences, project context, previously completed research --- you need external persistence. Common approaches include:
- Key-value stores for structured facts (user settings, configuration, entity metadata).
- Vector databases for semantic retrieval of past interactions, documents, or accumulated knowledge. This is the foundation of retrieval-augmented generation (RAG).
- File-based memory where the agent reads and writes to files that persist between sessions, often used in coding agents that maintain project context.
The choice depends on how the memory will be accessed. If the agent needs to look up specific facts, structured storage works well. If it needs to find contextually relevant information from a large corpus, vector search is more appropriate.
Working Memory Patterns
Working memory refers to the information an agent actively holds and manipulates while processing a task. Several patterns help manage it:
- Scratchpads. Give the agent a dedicated space to write intermediate reasoning, plans, or partial results. This offloads cognitive work from the model’s implicit reasoning into explicit, inspectable text.
- State objects. Maintain a structured object (often JSON) that tracks the current state of a workflow --- what has been done, what remains, and any intermediate results. Pass this state into each step.
- Context summarization. Periodically summarize the conversation so far and replace the full history with the summary, freeing up context space for new reasoning while retaining the essential thread.
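The state-object pattern above can be sketched as a small structured record that each step updates and renders into the prompt. The field names and prompt format here are illustrative assumptions.

```python
from dataclasses import dataclass, field

# Sketch of the state-object pattern: a structured record of workflow
# progress, passed into each step. The schema is illustrative.

@dataclass
class WorkflowState:
    goal: str
    completed: list[str] = field(default_factory=list)
    pending: list[str] = field(default_factory=list)
    results: dict = field(default_factory=dict)

    def finish_step(self, step: str, result) -> None:
        self.pending.remove(step)
        self.completed.append(step)
        self.results[step] = result

    def to_prompt(self) -> str:
        # Rendered into the context at each step, so the model sees
        # exactly what has been done and what remains.
        return (f"Goal: {self.goal}\n"
                f"Completed: {', '.join(self.completed) or 'none'}\n"
                f"Pending: {', '.join(self.pending) or 'none'}")
```

Because the state is explicit rather than implied by the conversation, it survives summarization and can even be persisted to long-term storage between sessions.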
Managing the Context Window
Context window management is a practical engineering concern. When the window fills up, the agent either loses access to earlier information or fails outright. Strategies to manage this include truncating old messages, compressing tool outputs, prioritizing the most relevant context via retrieval, and splitting long tasks into sub-tasks that each operate within a fresh context. The goal is to ensure the agent always has the information it needs for its current reasoning step, even if it cannot hold the entire history at once.
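A simple combination of these strategies is to always keep the system message and then retain turns from newest to oldest until the budget is spent. The sketch below uses a rough ~4-characters-per-token estimate and a generic message format; both are assumptions for illustration.

```python
# Sketch: fit a message history into a token budget by dropping the
# oldest turns first while always keeping the system message. Token
# counts are estimated at ~4 characters per token, an approximation;
# a real implementation would use the model's tokenizer.

def estimate_tokens(msg: dict) -> int:
    return len(msg["content"]) // 4 + 4  # small per-message overhead

def fit_to_budget(messages: list[dict], budget: int) -> list[dict]:
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(estimate_tokens(m) for m in system)
    kept = []
    # Walk from newest to oldest, keeping messages while they fit.
    for msg in reversed(rest):
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return system + list(reversed(kept))
```

Dropped turns need not be lost: they can be summarized or pushed into long-term storage before removal, so retrieval can bring the relevant parts back when needed.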