Stop Agent Amnesia: Persistent Operational Memory for AI Agents with Brain OS, Memory Notes

Your Agent Forgot Everything Again

You are running a multi-step code-review agent. It has already fetched the PR diff, identified three files with coverage gaps, and started drafting inline comments. Then the process crashes. Rate-limit error, container OOM, it doesn't matter. You restart it.

The agent starts over. It fetches the diff again. It re-identifies the same three files. It drafts the same comments. Forty seconds of LLM time wasted, and if it was touching external state (posting draft comments, updating a ticket), you now have duplicates.

If you have read the recent wave of "AI agent memory" posts, you have seen one fix on repeat: capture the conversation, then retrieve it later with a vector search. That solves a real problem, but a different one. It helps an agent remember what was said. It does nothing for an agent that needs to remember where the work stands.

This is agent amnesia. It is not a model problem. The model would remember if you gave it something to remember. The problem is architectural: the agent was never told to persist its operational state between invocations.

Operational state is the short-term working memory of a running agent: what it decided, what it planned next, what it already completed, and what is currently blocking it. It is categorically different from the chat transcript that most memory discussions focus on. Transcripts tell you what was said. Operational memory tells you where the work stands.

Why Replaying the Chat Transcript Does Not Fix This

The most common patch is to replay the full message history into the context window on restart. Developers reach for this because it is easy: the transcript is already there. It does not work reliably, for four reasons.

Context bloat. A long-running agent accumulates thousands of tokens of conversation. Replaying all of it on every restart pushes costs up and, as the context grows, dilutes the model's attention on what actually matters right now.
Implicit decisions don't survive replay. The agent may have silently decided to skip a file because it was auto-generated. That decision never appeared as an explicit message; it was a mid-step inference. It disappears on restart, and the agent re-examines the file.
Ordering artifacts. Transcript replay re-introduces the full deliberation path, including dead ends the agent already abandoned. The model may second-guess conclusions it already reached correctly.
No structured query surface. You cannot efficiently ask a raw transcript: what tasks are still pending? You can ask that of a structured operational memory store instantly.

Operational memory must be written separately, explicitly, and in a structure you can read back without the model.
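To make the "structured query surface" point concrete, here is a minimal sketch of the pending-tasks question answered without a model call. The `PlanStep` and `CompletedStep` shapes and the `pendingSteps` helper are hypothetical names chosen for illustration, not part of any particular framework.

```typescript
// Hypothetical minimal shapes; the field names are illustrative assumptions.
interface PlanStep {
  id: string;
  title: string;
}

interface CompletedStep {
  stepId: string;
  summary: string;
}

// "What tasks are still pending?" becomes a set difference over structured
// records, with no transcript parsing and no model involved.
function pendingSteps(plan: PlanStep[], completed: CompletedStep[]): PlanStep[] {
  const done = new Set(completed.map((entry) => entry.stepId));
  return plan.filter((step) => !done.has(step.id));
}

const plan: PlanStep[] = [
  { id: "fetch-diff", title: "Fetch the PR diff" },
  { id: "find-gaps", title: "Identify files with coverage gaps" },
  { id: "draft-comments", title: "Draft inline comments" },
];

const completed: CompletedStep[] = [
  { stepId: "fetch-diff", summary: "Fetched the diff" },
  { stepId: "find-gaps", summary: "Found three files with gaps" },
];

console.log(pendingSteps(plan, completed).map((step) => step.id));
// [ 'draft-comments' ]
```

The same question asked of a raw transcript would need either a model call or brittle text matching.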

What Operational Memory Actually Contains

Operational memory for an agent run has a predictable shape. You do not need to invent a schema; you need to commit to one and write to it consistently.

Goal. The top-level objective this agent was invoked to accomplish. Immutable once set.
Plan. The current ordered list of steps the agent intends to take. Mutable: the agent revises it as it learns more.
Completed steps. A log of steps that have been executed successfully, with timestamps and output summaries. Append-only.
Current focus. The single step the agent is actively working on. Updated atomically when the agent moves forward.
Blockers. Conditions the agent detected that prevent forward progress. Human-readable, not just error codes.
Decisions. Explicit records of non-obvious choices, for example: skipping file X because it matches the generated-code pattern. This is the field most teams omit and most regret missing later.
Momentum score. Optional, but useful: a simple indicator of whether the agent is making progress or spinning. A counter that increments on completion and resets on blocker is enough.

None of this is radical. It is essentially what a senior engineer writes in a work log. The difference is that for an agent, it must be machine-writable and machine-readable, not just human-readable.
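One way to pin the fields above down is a single typed record that round-trips through JSON. This is a sketch under assumptions: every name here (`OperationalMemory`, `newRun`, and the individual field names) is invented for illustration and is not a published schema.

```typescript
// All names below are illustrative assumptions, not a published schema.
interface PlanStep {
  id: string;
  title: string;
}

interface CompletedStep {
  stepId: string;
  completedAt: string; // ISO 8601 timestamp
  summary: string;
}

interface Decision {
  at: string; // ISO 8601 timestamp
  reasoning: string;
}

interface OperationalMemory {
  readonly goal: string; // immutable once set
  plan: PlanStep[]; // mutable, ordered
  completed: CompletedStep[]; // append-only log
  currentFocus: string | null; // id of the single active step
  blockers: string[]; // human-readable descriptions
  decisions: Decision[]; // non-obvious choices and why
  momentum: number; // increments on completion, resets on blocker
}

// Start a run with the first plan step as the current focus.
function newRun(goal: string, plan: PlanStep[]): OperationalMemory {
  return {
    goal,
    plan,
    completed: [],
    currentFocus: plan[0]?.id ?? null,
    blockers: [],
    decisions: [],
    momentum: 0,
  };
}

const run = newRun("Review the PR for coverage gaps", [
  { id: "fetch-diff", title: "Fetch the PR diff" },
  { id: "find-gaps", title: "Identify files with coverage gaps" },
]);

// Plain data only, so the record survives a JSON round trip unchanged.
const restored: OperationalMemory = JSON.parse(JSON.stringify(run));
console.log(restored.currentFocus);
// fetch-diff
```

Keeping the record as plain data (strings, numbers, arrays) is what lets a script, a dashboard, or another agent read it without going through the model.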

Writing Operational Memory Without Slowing the Agent Down

The write path has to be cheap or agents will skip it. Two patterns work in practice.

Atomic step transitions. Every time the agent moves from one step to the next, it writes a single record: mark the previous step complete, update current focus to the next step. This is one write per transition, not one write per token. The cost is negligible.

// pseudocode, framework-agnostic
await memory.completeStep(runId, stepId, { summary: result.summary });
await memory.setFocus(runId, nextStepId);

Decision logging on branch points. When the agent evaluates a condition and chooses a path, it logs the decision before executing the branch. This is the only way to recover silent decisions on restart.

if (isAutoGenerated(file)) {
  await memory.logDecision(runId, `Skipping ${file.path}: matched auto-generated pattern`);
  continue;
}

Both writes read like blocking calls, but they can be fire-and-forget if your store is local. If you are writing to a remote store, batch the writes at transition points, not mid-step, so a write failure does not interrupt execution.
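For a local store, the `memory` object used in the snippets above could be as small as a JSON file per run. The following is a minimal sketch, assuming Node.js and a hypothetical `FileMemory` class; the method names mirror the snippets, while the record shape and the temp-file-then-rename write are assumptions of this example. The methods are synchronous for brevity, and `await`-ing them as in the snippets still works.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical record shape; field names are assumptions for illustration.
interface RunState {
  completed: { stepId: string; completedAt: string; summary: string }[];
  currentFocus: string | null;
  decisions: { at: string; reasoning: string }[];
  momentum: number;
}

class FileMemory {
  private readonly dir: string;

  constructor(dir: string) {
    this.dir = dir;
    fs.mkdirSync(dir, { recursive: true });
  }

  private file(runId: string): string {
    return path.join(this.dir, `${runId}.json`);
  }

  load(runId: string): RunState {
    try {
      return JSON.parse(fs.readFileSync(this.file(runId), "utf8")) as RunState;
    } catch {
      // No readable record yet: start from an empty state.
      return { completed: [], currentFocus: null, decisions: [], momentum: 0 };
    }
  }

  // Write to a temp file, then rename it over the target, so a crash
  // mid-write leaves the previous record in place instead of a torn file.
  private save(runId: string, state: RunState): void {
    const tmp = `${this.file(runId)}.tmp`;
    fs.writeFileSync(tmp, JSON.stringify(state, null, 2));
    fs.renameSync(tmp, this.file(runId));
  }

  completeStep(runId: string, stepId: string, result: { summary: string }): void {
    const state = this.load(runId);
    // Idempotent: completing the same step twice (e.g. on retry) is a no-op.
    if (state.completed.some((entry) => entry.stepId === stepId)) return;
    state.completed.push({ stepId, completedAt: new Date().toISOString(), summary: result.summary });
    state.momentum += 1;
    this.save(runId, state);
  }

  setFocus(runId: string, stepId: string): void {
    const state = this.load(runId);
    state.currentFocus = stepId;
    this.save(runId, state);
  }

  logDecision(runId: string, reasoning: string): void {
    const state = this.load(runId);
    state.decisions.push({ at: new Date().toISOString(), reasoning });
    this.save(runId, state);
  }
}

// Usage: a throwaway directory stands in for the project folder.
const memory = new FileMemory(fs.mkdtempSync(path.join(os.tmpdir(), "agent-memory-")));
memory.completeStep("run-1", "fetch-diff", { summary: "Fetched the PR diff" });
memory.setFocus("run-1", "find-gaps");
memory.logDecision("run-1", "Skipping src/gen/api.ts: matched auto-generated pattern");
console.log(memory.load("run-1").currentFocus);
// find-gaps
```

Each call rewrites the whole file, which is fine for records of this size; a run with thousands of steps would likely want an append-only log instead.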

The read path is equally simple: on startup, the agent loads its operational memory record and injects a compact summary (not the raw JSON) into the system prompt. Fifty tokens describing current state beats five thousand tokens of transcript replay.
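A deterministic template is enough for that summary. The sketch below assumes a hypothetical `OperationalSnapshot` shape and a `summarize` helper; both names, and the exact line format, are illustrative choices rather than a fixed convention.

```typescript
// Hypothetical snapshot of the fields worth injecting into the prompt.
interface OperationalSnapshot {
  goal: string;
  completed: string[]; // ids or short titles of finished steps
  currentFocus: string | null;
  blockers: string[];
  recentDecisions: string[];
}

// Deterministic template: same state in, same string out, no model call.
function summarize(state: OperationalSnapshot, maxDecisions = 2): string {
  const lines = [
    `Goal: ${state.goal}`,
    `Done: ${state.completed.join(", ") || "nothing yet"}`,
    `Now: ${state.currentFocus ?? "no active step"}`,
  ];
  if (state.blockers.length > 0) {
    lines.push(`Blocked by: ${state.blockers.join("; ")}`);
  }
  // Only the most recent decisions, to keep the summary short.
  const recent = maxDecisions > 0 ? state.recentDecisions.slice(-maxDecisions) : [];
  if (recent.length > 0) {
    lines.push(`Decisions: ${recent.join("; ")}`);
  }
  return lines.join("\n");
}

const summary = summarize({
  goal: "Review the PR for coverage gaps",
  completed: ["fetch-diff", "find-gaps"],
  currentFocus: "draft-comments",
  blockers: [],
  recentDecisions: ["skip src/gen/api.ts (auto-generated)"],
});

console.log(summary);
// Goal: Review the PR for coverage gaps
// Done: fetch-diff, find-gaps
// Now: draft-comments
// Decisions: skip src/gen/api.ts (auto-generated)
```

Capping the number of decisions keeps the summary bounded no matter how long the run has been going.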

How Brain OS Structures This as a First-Class Layer

Most agent frameworks treat memory as an afterthought: a memory parameter backed by a vector store optimised for semantic search. That design retrieves past conversations; it does not track the live operational state of a running process.

Brain OS treats operational memory as a distinct layer. It is a local-first MCP server: state lives in a .brain/ folder of plain JSON inside your project, and your agent reads and writes it through typed MCP tools rather than appending freeform text. The core tools map directly onto the operational schema above:

decision_log: record a decision with its reasoning and the alternatives you rejected. decision_check scans prior decisions first and returns conflict, caution, or clear, so the agent can push back instead of silently contradicting an earlier call.
plan_set and plan_advance: set an ordered plan, then move the active step forward as work ships.
focus_get: ask "what matters now?" and get an operational judgment from urgency, blockers, and momentum, not a guess from chat history.
entity_update: update momentum, blockers, and the next move on whatever you’re working on.
pattern_detect: surface recurring blockers and avoidance loops across sessions.

Because Brain OS reads fresh from disk on every call, with no in-memory cache, the same .brain/ folder is shared across every MCP client. A decision logged in Claude Code on Monday is there in Cursor on Tuesday. Nothing is keyed to a single process or a single tool.

The Restart Recovery Pattern, Step by Step

Here is the concrete loop, whether you wire it by hand or let Brain OS handle it.

On session start, load the state. Brain OS’s /focus command (or the focus_get tool) reads the .brain/ folder and surfaces the active plan step, open blockers, and what matters next as a compact summary, not a raw transcript replay.
Resume from the active step. plan_read returns the active step and overall progress. The agent re-validates that step rather than restarting from step one.
Treat a half-finished step as a retry. If the previous session crashed mid-step, the active step is still open. Log the retry with decision_log so the reasoning survives.
Advance explicitly. When a step ships, plan_advance marks it complete and surfaces the next one. Completed records stay, useful for debugging and for the next agent that opens the project.
Close with /wrap. It captures what changed (decisions logged, patterns observed, plan steps advanced) as a restart-ready handoff. The next session opens with /focus already knowing where things stands.

This loop eliminates duplicated work on restart and gives you an audit trail of every run without parsing conversation logs.
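If you wire the loop by hand, the core of it fits in one function. The sketch below is framework-agnostic and does not use Brain OS; `Step`, `RunRecord`, and `resume` are hypothetical names, and the record is kept in memory here where a real agent would persist it after every mutation.

```typescript
// Hypothetical shapes for a hand-wired resume loop.
interface Step {
  id: string;
  run: () => string; // performs the work and returns an output summary
}

interface RunRecord {
  completed: { stepId: string; summary: string }[];
  currentFocus: string | null;
  decisions: string[];
}

function resume(steps: Step[], record: RunRecord): void {
  const isDone = (id: string) => record.completed.some((entry) => entry.stepId === id);

  // An open focus that was never completed means the last session died mid-step.
  if (record.currentFocus !== null && !isDone(record.currentFocus)) {
    record.decisions.push(`Retrying ${record.currentFocus}: previous session ended mid-step`);
  }

  for (const step of steps) {
    if (isDone(step.id)) continue; // never redo finished work
    record.currentFocus = step.id; // persist here in a real agent
    const summary = step.run();
    record.completed.push({ stepId: step.id, summary }); // and persist here
  }
  record.currentFocus = null;
}

// Simulated restart: two steps finished, the third was in flight at the crash.
const executed: string[] = [];
const makeStep = (id: string): Step => ({
  id,
  run: () => {
    executed.push(id);
    return `${id} ok`;
  },
});

const record: RunRecord = {
  completed: [
    { stepId: "fetch-diff", summary: "fetch-diff ok" },
    { stepId: "find-gaps", summary: "find-gaps ok" },
  ],
  currentFocus: "draft-comments",
  decisions: [],
};

resume([makeStep("fetch-diff"), makeStep("find-gaps"), makeStep("draft-comments")], record);
console.log(executed);
// [ 'draft-comments' ]
```

The retried step is the one place duplicates can still slip in, so any step that touches external state (posting comments, updating a ticket) should be written to tolerate being run twice.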

Three Things to Add to Your Agent This Week

If your agents are currently stateless between runs, here is a practical, prioritised list.

Priority 1: Add a run ID and a step completion log. Even a JSON file on disk is better than nothing. The goal is to stop re-executing completed work on restart. This is one day of work and eliminates the most painful class of agent bug immediately.
Priority 2: Add explicit decision logging at every branch point. This is mostly a discipline change: you already have the branch points, you just need to add a write call before each one. The payoff is that future you can read the log and understand why the agent did what it did without replaying the entire conversation.
Priority 3: Build a compact context serialiser for system prompt injection. This is a function that takes your operational context object and returns a fifty-token summary string. It is the piece that makes restarts seamless. It does not need to be smart: a deterministic template is better than an LLM-generated summary for this use case because it is fast, cheap, and predictable.

None of these require a new framework. They require treating operational memory as a real engineering concern, not an edge case.

If you want this out of the box, Brain OS implements the whole loop as a local-first MCP server. Install it with npx brain-os init, connect it to Claude Code or any MCP client, then run /focus to open a session and /wrap to close one.

I'm building Brain OS around this idea. It's an open-source MCP server for operational state (decisions, plans, blockers, focus, momentum) that lives in a .brain/ folder in your project and works across every MCP client.

If you use Claude Code, Cursor, Zed, Copilot, or any MCP workflow, I'm looking for people to try it for a week and tell me honestly whether it helped.

