Memories in Codex #12567

jif-oai · 2026-02-23T10:51:21Z

jif-oai
Feb 23, 2026
Maintainer

Hey folks,

I'm working on adding memories to Codex and I would love your opinion on:

1. On a scale from 1 (useless) to 5 (mandatory), how much do you need Codex to cite previous threads in an answer when the model used memories from those threads?
2. Would you prefer everything to be done in the background, or to manually trigger memory generation (i.e. autonomy vs. cost control)?
3. In your personal setup, do you think memories are only relevant per project, or can they be relevant across different projects on your laptop?
4. I plan to add sanitising over memories to get rid of API keys, credentials, etc. Is this problematic for your workflows or not (Codex won't be able to remember credentials)?

Of course, I know everyone wants A plus the control to do B for everything, but here I would like to understand what you would prefer by default.

Any other needs?

Disclaimer: Do not try to use the memories for now as the rate limits would consume all your tokens

Kbediako · 2026-02-23T11:05:11Z

Kbediako
Feb 23, 2026

1. 5
2. Hybrid, with an option to choose one or the other. Probably off by default, with phrases or words that can trigger it (similar to skills).
3. Both global and per-project memories, so memory goes into whatever folder or repo Codex is invoked in.
4. Not a problem, but I would like the option to toggle it per repo. Off by default, with a phrase or keyword trigger.

0 replies

slobodaapl · 2026-02-23T12:53:17Z

slobodaapl
Feb 23, 2026

1. 4
2. I think this should be configurable. In the Codex-based IDE we're making, we would ideally only allow a specific agent to save memories given specific instructions, so I'd imagine it's just a tool like any other that can be allowed or disallowed per agent/subagent. The same goes for automatic retrieval of the most relevant memories vs. manual tool-based lookup. By default, memory creation can be automatic for the root agent and disabled for subagents, with automatic retrieval for both, based on segment spherical centroid similarity saturation perhaps?
3. Definitely per project. I think if someone has more universal code styles or rules they use, they ought to put those in AGENTS.md or similar instead.
4. I think this is mandatory; it promotes bad practices otherwise.
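A minimal sketch of the per-agent defaults described in point 2, assuming a hypothetical `MemoryPolicy` type (none of these names are Codex APIs): writes are enabled for the root agent only, retrieval is automatic for every agent, and an explicit per-agent override wins over the default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryPolicy:
    """Hypothetical per-agent memory permissions (not a Codex API)."""
    can_write: bool       # may this agent save new memories?
    auto_retrieve: bool   # are relevant memories injected automatically?


def default_policy(is_root: bool) -> MemoryPolicy:
    # Suggested default: only the root agent writes; every agent retrieves.
    return MemoryPolicy(can_write=is_root, auto_retrieve=True)


def resolve_policy(agent: str, is_root: bool,
                   overrides: dict[str, MemoryPolicy]) -> MemoryPolicy:
    # An explicit per-agent override wins over the default.
    return overrides.get(agent, default_policy(is_root))


# Hypothetical agent names, for illustration only.
overrides = {"reviewer": MemoryPolicy(can_write=False, auto_retrieve=False)}
print(resolve_policy("root", True, overrides))
print(resolve_policy("reviewer", False, overrides))
```

Treating memory writes as one more permissioned capability keeps the embedder in control without changing the default behaviour for a single-agent setup.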

Thanks for the hard work jif!

1 reply

jif-oai Feb 23, 2026
Maintainer Author

This is already only for the main agent. My current implementation is already mostly in line with this. Thanks, I'll find a way to make this nicely configurable

Sounds good for the other points!

maheshrijal · 2026-02-23T13:05:01Z

maheshrijal
Feb 23, 2026

1. 5. Otherwise, it would be bad if Codex referred to something incorrect. I also tend to turn on all the reasoning and other toggles, primarily because I want full visibility into what the agent is doing.
2. I would mostly prefer automatic, but it would be good to have a toggle to switch between the two.
3. I have a small group of projects that are common and could use common memories, but I think they would make more sense for my workflow if they were project-specific.
4. I wouldn't want my API key in memory, so this is a must-have.

0 replies

ticoAg · 2026-02-23T13:46:17Z

ticoAg
Feb 23, 2026

1. 4 (highly needed). Seeing citations is important because memories are often specific to a certain series of tasks. If applied blindly to a different task, they might not perform well. A classic example is a monorepo where context varies wildly between frontend tasks, backend tasks, documentation writing, and debugging. Citations help me verify whether the model is using the correct context.
2. By default, I prefer that it not run automatically in the background (manual trigger). It would be ideal to have configurable memory generation strategies to control the activation and intensity (e.g., Never, Normal/Moderate, Aggressive).
   To elaborate on automation: if it is done automatically, it would be much more helpful if the system could produce session-level summaries and classification, with a time-decay factor to filter which memories to use. Alternatively, it could do background repo-level automatic summarization, grouping memories by topic (e.g., categorizing them into frontend, backend, database, page orchestration/routing, UI styles, service configurations, etc.).
3. In my personal practice, memories are only relevant per project. For general or cross-project knowledge, I usually just summarize it into an AGENTS.md file or define it as global skills.
4. Fine.
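The time-decay filtering and topic grouping suggested in point 2 could look roughly like the sketch below. The `Memory` fields, the 30-day half-life, and the 0.2 threshold are all assumptions chosen for illustration, not anything Codex implements.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    topic: str         # e.g. "frontend", "backend" (hypothetical labels)
    relevance: float   # similarity to the current task, in [0, 1]
    age_days: float    # time since the memory was written


def decayed_score(m: Memory, half_life_days: float = 30.0) -> float:
    # Exponential decay: the score halves every `half_life_days`.
    return m.relevance * math.pow(0.5, m.age_days / half_life_days)


def select(memories: list[Memory], topic: str,
           min_score: float = 0.2) -> list[Memory]:
    # Keep memories for the requested topic that are still above the
    # threshold, best first.
    kept = [m for m in memories
            if m.topic == topic and decayed_score(m) >= min_score]
    return sorted(kept, key=decayed_score, reverse=True)
```

A "Never / Normal / Aggressive" strategy setting could then map to different half-life and threshold values.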

0 replies

lattwood · 2026-02-23T16:44:31Z

lattwood
Feb 23, 2026

I'd love memories to take into account the git remote of the project folders. While I know I should be using worktrees, I sometimes end up making an additional checkout and don't want it thinking that, say, infrastructure and infrastructure20 are different repos.
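One way to get this behaviour is to key project memories on a normalized remote URL (as printed by `git remote get-url origin`) instead of the folder path. The sketch below is illustrative only: `project_key` and its host/path key scheme are assumptions, and it handles just the common scp-like and URL forms.

```python
import re


def project_key(remote_url: str) -> str:
    """Normalize a git remote URL to a host/path key (hypothetical scheme)."""
    url = remote_url.strip()
    # scp-like form: git@github.com:org/repo.git
    m = re.match(r"^[\w.-]+@([\w.-]+):(.+)$", url)
    if m:
        host, path = m.group(1), m.group(2)
    else:
        # URL form: https://github.com/org/repo.git or ssh://git@host/org/repo
        m = re.match(r"^\w+://(?:[^@/]+@)?([^/:]+)(?::\d+)?/(.+)$", url)
        if not m:
            return url  # unknown form: fall back to the raw string
        host, path = m.group(1), m.group(2)
    path = path.rstrip("/")
    if path.endswith(".git"):
        path = path[:-4]
    return f"{host.lower()}/{path}"
```

With this, two checkouts named infrastructure and infrastructure20 that share the same origin resolve to the same key, so they would share one set of memories.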

0 replies

GeorgeWingg · 2026-02-24T04:49:30Z

GeorgeWingg
Feb 24, 2026

1. 2/5, low priority. I mostly only care if the memory leads to an outdated or wrong answer. I prefer response-level references rather than inline ones to avoid cluttering the response (e.g. “used 3 memories”). For me, an easy way to inspect/manage memories outside the main response is more important (like a /memories command or a UI under personalization in the app).
2. Prefer autonomy. I like a clear split between explicit repo guidance and learned memory. AGENTS.md should stay the manual contract for how a repo should be worked on. Memories should be the autonomous layer that improves agent productivity without having to maintain docs. If an update happens, a response-level note like “3 memories updated” feels right. Background generation by default, plus a simple toggle/slider for cost/control.
3. Project memories as the default, global memories opt-in. I think there is a fair bit of value in global memories, mainly for workflow-level patterns across repos. If the user wants global memory for uses beyond workflow, they can tell the agent to add that guidance to global memory, and that should inform how it uses its memories in future. If global is on, the need for good inspection/attribution gets higher.
4. This is the right approach, agreed.

0 replies

aurexav · 2026-02-24T08:15:58Z

aurexav
Feb 24, 2026

I am also developing an external memory system for Codex.

Just a story:

The current context compaction works really, really well. I rarely use the /new command now; I prefer completing more tasks in a single chat. However, in very lengthy conversations, this can lead to a loss of focus, and the agent may lose track of its objectives. This is a real-world observation.

https://github.com/hack-ink/elf

1. 5. I hope it can always cite the most correct/relevant memory. The ELF system includes a time-to-live (TTL) mechanism for rewriting or discarding old or unnecessary memories.
2. I share this concern. Personally, I favour autonomy. We must think deeply about whether this is a memory system for the agent or for the user. Ideally, the agent should be able to remember and recall what it needs, which may require prompt design. Additionally, user-triggered support is desirable. However, I believe the future lies in fully automated systems.
3. ELF employs a "multi-tenant scope semantics" design, enabling memory sharing across projects or other levels.
4. ELF is not a memory-based plugin. It can persist data to disk. With Qdrant and PostgreSQL, it can utilise BM25 and embedding fusion to recall memory facts.
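The thread does not say how ELF fuses BM25 and embedding results. Purely as an illustration, one common way to combine a keyword ranking with a vector ranking is reciprocal rank fusion (RRF), sketched here with hypothetical memory IDs. The constant k = 60 is the value conventionally used with RRF and is not taken from ELF.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several best-first rankings with reciprocal rank fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


bm25_hits = ["m3", "m1", "m7"]        # hypothetical keyword ranking
embedding_hits = ["m1", "m9", "m3"]   # hypothetical vector ranking
print(rrf([bm25_hits, embedding_hits]))
```

RRF only needs rank positions, so the BM25 and embedding scores never have to be put on a common scale.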

What are your thoughts on this?

1 reply

aurexav Feb 24, 2026

I did some research today and thought I’d introduce the extension to ELF. For example, a Codex conversation or thread ID could be a valid evidence source.

Or it can reference SQLite.

https://github.com/hack-ink/ELF/issues/76#issuecomment-3952704995

winnal · 2026-02-28T23:17:38Z

winnal
Feb 28, 2026

Hi, I've been using this, but I noticed that it doesn't save memories from exec sessions, only interactive ones. This is the inverse of what I prefer for my own workflow needs, so I added an option to change sources: #13147

Let me know what you think. Would be great to have this added to the main release.

0 replies

talshebek · 2026-03-04T18:31:05Z

talshebek
Mar 4, 2026

Hi @jif-oai,

I’ve been experimenting with persistent conversational memory systems for a while, so this direction in Codex is really interesting to see.

Here are my thoughts on your questions.

1. Should Codex cite previous threads when using memories?

I’d rate this 4/5.

Citing memories can be very helpful when debugging or understanding why the system behaves in a certain way. However, for normal interaction it might become noisy if every response references older threads.

A good default might be:

• silent retrieval by default
• optional citation when debugging or when explicitly requested

2. Autonomy vs manual triggering

A hybrid approach seems best.

Automatic memory creation works well for long-running workflows, but users should still have some control to prevent noise or unnecessary storage.

For example:

• automatic summarisation of important interactions
• manual commands for pinning or removing memories

3. Project memory vs global memory

Both seem important.

I would structure memory in layers:

• project memory – codebase structure, architecture decisions, conventions
• global memory – user preferences, workflows, frequently used patterns

4. Sanitising credentials

Sanitising credentials is absolutely necessary.

However it might also be useful if Codex can remember that credentials exist for a service without storing the secret itself.

Example:

• “project uses AWS credentials”
• “deployment requires API key”

This preserves workflow awareness while keeping secrets safe.
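A rough sketch of that idea, assuming a regex-based sanitiser: secret-looking values are redacted, and a note records that credentials were involved. The two patterns are illustrative assumptions, far from a complete secret detector, and `sanitise` and `to_memory` are hypothetical names.

```python
import re

# Illustrative patterns only; a real sanitiser would need a much broader set.
SECRET_PATTERNS = [
    # Strings shaped like an AWS access key ID.
    re.compile(r"AKIA[0-9A-Z]{16}"),
    # key=value or key: value pairs with a secret-sounding key.
    re.compile(r"(?i)\b(api[_-]?key|token|password)\s*[=:]\s*\S+"),
]


def sanitise(text: str) -> tuple[str, bool]:
    """Redact secret-looking substrings; report whether anything was found."""
    found = False
    for pattern in SECRET_PATTERNS:
        text, n = pattern.subn("[REDACTED]", text)
        found = found or n > 0
    return text, found


def to_memory(text: str) -> str:
    clean, had_secret = sanitise(text)
    # Keep the fact that a credential exists, never its value.
    return clean + (" (note: credentials required)" if had_secret else "")


print(to_memory("deploy with API_KEY=abc123 on staging"))
```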

I’ve also been experimenting with a memory-first conversational architecture in my own system.

The idea is to separate reasoning from memory handling.

Originally I started building this system around 2023 with smaller ~2B models and external memory. Over time the central model grew to ~27B parameters, but interestingly the conversational style and personality remained consistent because they are largely shaped by the memory layer rather than the raw model weights.

The architecture roughly looks like this:

user request
→ request goes to a memory search layer
→ a memory agent retrieves relevant past interactions
→ additional agents analyse the request and build contextual memory summaries
→ the agents assemble a structured prompt containing:
• retrieved memories
• current question
• contextual summary
→ the central conversational model generates the final response

In practice this means:

• the model focuses on reasoning and dialogue
• memory agents handle context reconstruction
• long-term continuity is preserved across sessions
• hallucinations decrease because answers are grounded in stored interaction history
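The retrieval and prompt-assembly steps of the pipeline above can be sketched as follows. Everything here is a stand-in: `search_memory` uses naive keyword overlap where a real system would use an index, and the prompt layout is an assumption rather than the author's actual format.

```python
from dataclasses import dataclass


@dataclass
class Retrieved:
    source: str   # hypothetical thread or session identifier
    text: str


def search_memory(store: list[Retrieved], query: str,
                  limit: int = 3) -> list[Retrieved]:
    # Stand-in for the memory agent: keyword overlap instead of a real index.
    words = set(query.lower().split())
    scored = [(len(words & set(r.text.lower().split())), r) for r in store]
    hits = [(score, r) for score, r in scored if score > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in hits[:limit]]


def build_prompt(question: str, memories: list[Retrieved], summary: str) -> str:
    # Structured prompt: retrieved memories, contextual summary, question.
    lines = ["## Retrieved memories"]
    lines += [f"- [{m.source}] {m.text}" for m in memories] or ["- (none)"]
    lines += ["## Contextual summary", summary, "## Current question", question]
    return "\n".join(lines)
```

Keeping the source identifier next to each retrieved memory also gives the model what it needs to cite the originating thread.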

One interesting observation from these experiments is that increasing model size improved the model’s ability to interpret retrieved memories, but the overall conversational behaviour remained stable because the memory layer carried the long-term context.

That’s why I find the direction of durable memories in Codex particularly exciting. If implemented well, it could significantly improve long-horizon coding workflows.

Thanks for working on this feature; I’m very curious to see where it goes.

One additional component that turned out to be very useful in my experiments is what I call a memory writer (or scribe) agent.

After the model generates the final answer, this agent processes the interaction and converts it into structured memory.

Instead of storing the full conversation, it creates a semantic summary of the exchange:

• what the user asked
• what solution or answer was produced
• which problems were resolved
• which open questions remain

This summary is then stored in the memory layer.

The idea is to avoid memory inflation while still preserving useful knowledge. The system remembers the meaning of the interaction, not the entire dialogue.

Over time this creates a compact but semantically rich memory graph that improves retrieval for future queries.

In practice the loop looks like this:

user request
→ memory retrieval agents collect relevant context
→ central model generates response
→ memory writer agent summarises the interaction
→ summary stored as structured memory

This approach keeps the memory layer small and focused while continuously improving retrieval quality for future interactions.
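As a sketch of what the memory writer might store, assume a record with the four fields listed above. In a real system the summary would come from a model call; here it is filled in by hand, and `InteractionSummary` and `write_memory` are hypothetical names.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class InteractionSummary:
    """Hypothetical record written by the memory writer after each answer."""
    asked: str
    answer: str
    resolved: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)


def write_memory(store: list[str], summary: InteractionSummary) -> None:
    # Store the compact summary as JSON rather than the full transcript.
    store.append(json.dumps(asdict(summary), sort_keys=True))


store: list[str] = []
write_memory(store, InteractionSummary(
    asked="Why does the build fail on CI?",
    answer="The lockfile was out of date; regenerated it.",
    resolved=["CI build failure"],
))
print(store[0])
```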

0 replies

talshebek · 2026-03-04T18:45:41Z

talshebek
Mar 4, 2026

One final observation from these experiments.

A similar architectural pattern is already starting to appear in at least one large cloud AI system. I won’t name the platform here to avoid turning this into promotion, but it shows that this direction can work at scale.

What seems to matter most is the separation of roles inside the system.

Users are often not just looking for a task assistant — they want a consistent conversational partner that remembers context over time.

That does not necessarily require running a massive flagship model constantly.

A more efficient structure could look like this:

• a lightweight conversational model (for everyday dialogue)
• persistent memory and agent-based retrieval
• a flagship model invoked only for complex reasoning tasks

In that setup the conversational layer can remain stable while the underlying models evolve. The memory layer preserves continuity, so the system does not “start from zero” each time the model is updated or replaced.

This also makes the system more resource-efficient, since large models are only used when necessary.

From my perspective this kind of memory-first architecture with agent orchestration could be a natural direction for future conversational systems.
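The routing between a lightweight model and a flagship model could be as simple as the heuristic below. This is a toy sketch: the model labels, the keyword list, and the word-count threshold are all assumptions, and a production router would more likely use a classifier.

```python
def pick_model(request: str, complexity_threshold: int = 40) -> str:
    """Route to a small or large model with a crude, hypothetical heuristic."""
    # Assumption: long requests or certain keywords need the large model.
    hard_markers = ("prove", "refactor", "design", "debug")
    long_request = len(request.split()) > complexity_threshold
    hard = any(marker in request.lower() for marker in hard_markers)
    return "flagship" if long_request or hard else "lightweight"


print(pick_model("what did we decide yesterday?"))
print(pick_model("please refactor the auth module"))
```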

0 replies

talshebek · 2026-03-04T18:51:39Z

talshebek
Mar 4, 2026

One more practical observation from these experiments.

Another advantage of this architecture is computational efficiency.

If the system grows through memory rather than only through larger model weights, much smaller models can handle most everyday interactions. This reduces server cost significantly.

In my experiments the structure looks like:

• lightweight conversational model for dialogue
• memory layer that accumulates experience over time
• agents that reconstruct context from memory
• larger models only invoked for complex reasoning tasks

In this setup the system “learns” through memory accumulation rather than constantly requiring larger models.

I’ve also been experimenting with development workflows where the memory layer tracks:

• previous code versions
• architectural decisions
• fixes applied to earlier bugs

This helps the system reason about code evolution and reduces hallucinations when modifying existing code.

Another area where persistent memory seems promising is educational systems. I’m currently experimenting with a tutoring system for children where the model can remember a student’s progress, mistakes, and learning patterns over time. Early results suggest this can both improve learning continuity and reduce computational cost.

If this direction is interesting to others working on Codex or long-horizon coding systems, I’d be very curious to hear your thoughts.

0 replies

jif-oai · 2026-03-06T11:32:13Z

jif-oai
Mar 6, 2026
Maintainer Author

Thanks @talshebek, will read and review this

Another global question for everyone: would you like memories to be enabled by default everywhere, or not in codex exec and the SDKs?

11 replies

winoros Mar 7, 2026

I think it should be enabled by default and provide a way to make a session ignore the memory.

Sometimes we'll do things like benchmarking the model. In these situations, we need a clean model without any long-term memories.

Kbediako Mar 8, 2026

I think it should be enabled by default and provide a way to make a session ignore the memory.

Sometimes we'll do things like benchmarking the model. In these situations, we need a clean model without any long-term memories.

Considering it's experimental, I assume enabling it by default would only apply once it's fully released and cost-efficient.

slobodaapl Apr 11, 2026

I'd say enable globally for consistent behavior controlled by config, but allow CLI override for codex exec to enable/disable.
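The precedence described here (CLI override, then config, then a built-in default) can be sketched as below. The function and parameter names are hypothetical and do not correspond to actual Codex flags or config keys.

```python
from typing import Optional


def memories_enabled(cli_flag: Optional[bool], config_value: Optional[bool],
                     default: bool = True) -> bool:
    """Resolve the effective setting: CLI override > config file > default."""
    if cli_flag is not None:
        return cli_flag
    if config_value is not None:
        return config_value
    return default


# A benchmarking run can opt out even when the config enables memories.
print(memories_enabled(cli_flag=False, config_value=True))
```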

talshebek May 8, 2026

I think memory should be available everywhere, but it should be separated by scope, not treated as one global layer.

For example:

• chat/personality memory for stable long-term conversation;
• coding/project memory for changed files, project rules, bugs, fixes, architecture decisions, versions, and commands;
• tutoring memory for learning programs, student progress, tests, weak points, and next exercises.

In my own experiments, I do not see memory as dangerous when it is structured correctly. The danger is mixing everything into one uncontrolled global memory.

A useful design would be:

• global user preferences;
• per-project coding memory;
• per-agent or per-mode memory;
• tutoring progress memory;
• a way to disable memory for clean sessions, benchmarks, or special tasks.

For coding, the model can enter a separate working mode and use only project/coding memory, so it does not confuse normal chat context with technical work.
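A minimal sketch of scope separation, assuming each memory is tagged with a scope and each working mode may read only certain scopes. The scope names and the mode-to-scope mapping are illustrative assumptions.

```python
from enum import Enum


class Scope(Enum):
    GLOBAL_PREFS = "global_prefs"
    PROJECT = "project"
    TUTORING = "tutoring"


# Hypothetical mapping from working mode to the scopes it may read.
MODE_SCOPES = {
    "chat": {Scope.GLOBAL_PREFS},
    "coding": {Scope.PROJECT},
    "tutoring": {Scope.TUTORING},
    "clean": set(),   # benchmarks and other memory-free sessions
}


def visible(memories: list[tuple[Scope, str]], mode: str) -> list[str]:
    # Return only the memories whose scope the current mode may read.
    allowed = MODE_SCOPES[mode]
    return [text for scope, text in memories if scope in allowed]


memories = [
    (Scope.GLOBAL_PREFS, "prefers concise answers"),
    (Scope.PROJECT, "tests run with pytest -q"),
]
print(visible(memories, "coding"))
```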

talshebek May 8, 2026

I also think a checkpoint function would be very useful.

It is always pleasant and productive when a model starts a new chat and remembers the history of communication, the current task, and where we stopped. But raw chat history contains a lot of noise.

A better design would be structured memory checkpoints: short, explicit summaries of the important state of a task.

Example:

TASK = news_analysis
TOPIC = Elon Musk vs OpenAI / lawsuit / mission vs control
FOCUS = do not choose a hero, analyze the structure of the conflict

AXIS:

• Musk claims OpenAI betrayed its nonprofit mission and became a commercial machine.
• OpenAI claims Musk wanted control, lost the power struggle, and now attacks a competitor.
• The main mechanism: the mission of AI becomes a battlefield for power, capital, safety, and the right to define the future of AGI.

QUESTIONS:

• Where is the real ethical problem?
• Where is personal resentment disguised as a fight for humanity?
• Where is commercial interest disguised as safety?
• Can “open AI” exist in a billion-dollar race without becoming closed power?
• Who is the puppet, who is the fire, and who is the structure?

This is much more useful than storing the whole conversation. It separates important continuity from conversational noise.

Memory should support explicit checkpoints for tasks, projects, coding work, tutoring progress, and research discussions.

In other words: memory should not only remember “what was said”. It should preserve “what matters now”.
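A checkpoint like the one in the example could be held in a small structure and rendered to text when a new session starts. This is a sketch under assumptions: the `Checkpoint` class and its rendering are hypothetical and only mirror the fields used in the example.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    task: str
    topic: str
    focus: str
    axis: list[str] = field(default_factory=list)
    questions: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Scalar fields first, then each non-empty list under its title.
        lines = [f"TASK = {self.task}", f"TOPIC = {self.topic}",
                 f"FOCUS = {self.focus}"]
        for title, items in (("AXIS", self.axis), ("QUESTIONS", self.questions)):
            if items:
                lines += ["", f"{title}:"] + [f"- {item}" for item in items]
        return "\n".join(lines)


cp = Checkpoint("code_review", "auth module", "find regressions",
                questions=["Is the token refresh path covered by tests?"])
print(cp.render())
```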

kinthaiofficial · 2026-04-29T00:15:16Z

kinthaiofficial
Apr 29, 2026

Memory in coding agents is more nuanced than in conversational agents because code-level context has stronger dependencies than natural language.

Key differences for coding agent memory:

Symbol-level memory, not just session-level: A coding agent should remember: which functions were called, which variables were declared, which interfaces were implemented. This is more structured than "the user mentioned X." Symbol tables are a natural format for coding agent memory — they're already how compilers think about code.

Cross-file dependency tracking: When Agent A modifies function foo(), Agent B (working on a different file) needs to know if foo()'s signature changed. This is a memory consistency problem — the coding agent's memory needs to track "what I know about this symbol" and "when I last verified it."

Ephemeral vs. persistent memory for code: Some code knowledge should persist long-term (the architecture of this codebase, the team's style conventions). Some should be ephemeral (the current state of a feature branch, in-progress refactoring). Coding agents need to distinguish between "stable knowledge about the codebase" and "volatile state of the current task."

Memory consolidation after task completion: When a coding session ends successfully, the agent should consolidate what it learned: new patterns discovered, bugs found and fixed, conventions understood. This "post-task reflection" is how the agent builds up codebase expertise over time rather than starting cold each session.
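To make the points above concrete, here is a sketch of a symbol-level memory entry that records a hash of the signature it was verified against, so a changed signature marks the memory as stale. All names are hypothetical, and hashing the signature text is just one possible staleness check.

```python
import hashlib
from dataclasses import dataclass


def signature_hash(signature: str) -> str:
    return hashlib.sha256(signature.encode("utf-8")).hexdigest()


@dataclass
class SymbolMemory:
    name: str
    signature_digest: str   # hash of the signature when last verified
    note: str               # what the agent learned about the symbol
    persistent: bool        # stable codebase knowledge vs. volatile task state


def is_stale(memory: SymbolMemory, current_signature: str) -> bool:
    # Re-verify if the signature changed since the memory was recorded.
    return memory.signature_digest != signature_hash(current_signature)


def consolidate(memories: list[SymbolMemory]) -> list[SymbolMemory]:
    # After a task ends, keep only the knowledge marked as persistent.
    return [m for m in memories if m.persistent]


foo = SymbolMemory("foo", signature_hash("def foo(x: int) -> int"),
                   "pure helper, safe to call in loops", persistent=True)
print(is_stale(foo, "def foo(x: int, y: int) -> int"))
```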

More on the persistent memory architecture: https://blog.kinthai.ai/why-character-ai-forgets-you-persistent-memory-architecture

0 replies

jeemitsha · 2026-04-29T04:54:41Z

jeemitsha
Apr 29, 2026

Quick takes on your questions:

1. Citation importance: 4. When a memory shapes a response, knowing which past thread it came from is essential for trust and for catching cases where the memory got the context wrong. Without citations, debugging "why did it suggest that?" becomes very hard, especially in long-running projects.
2. Background vs manual: hybrid, default off. Auto-extraction is a footgun for sensitive codebases (privacy, NDA scope, half-baked decisions you'd rather not codify yet). I'd want a per-project opt-in, plus a manual /memorize style command for explicit pinning. Skill-style trigger phrases (as @Kbediako suggested) feel right for the in-between case.
3. Project + cross-project, with explicit scope. Per-project is the safer default; cross-project is useful for personal preferences ("I always want commit messages in this style") but not for codebase-specific stuff that would leak across orgs. Treat them as different storage layers, not one merged pool.

One related note for design surface: I just filed #20138 proposing a session-scoped notes panel — explicitly not a substitute for memories, but a different slot in the context-surface space. Memories as you're framing them are cross-thread, model-curated, and (currently) read-only per #19195. The notes proposal is single-session, user-curated (with an agent-shared sub-region), and writable from both sides.

If both ship, they'd be complementary: memories carry forward across sessions, notes pin intent within one. Worth thinking about whether the two interact (e.g. should something written to notes during a session be promotable to a memory at session close?).

0 replies

oliviacraft · 2026-05-01T21:53:28Z

oliviacraft
May 1, 2026

Responses to your questions, based on extensive use of Claude Code (which uses CLAUDE.md as its "memory" mechanism) and building rule sets for many projects:

1. Citation visibility: 4/5. Knowing which memory contributed to a decision matters when debugging incorrect behavior. If memory fires incorrectly, you need to know which one to edit or delete.

2. Autonomy vs control: Hybrid, project-level explicit. Auto-generation of global memories makes sense for user preferences. For project-level memories, I would prefer manual confirmation — the cost of a wrong project rule persisting silently is high.

3. Per-project is far more valuable than global for coding.

The reason: the most important "memories" for coding are project conventions — which patterns are allowed, which are banned, what testing setup is used, which API versions are in use. These are different per project and do NOT generalize across projects. A memory saying "use select() not session.query() in SQLAlchemy 2.0" is correct for that project and actively wrong if applied to a legacy project still on SQLAlchemy 1.x.

This is exactly what AGENTS.md solves as a persistent per-project context file. The "memory" is explicit and editable by the developer rather than learned from session history.

4. On sanitising: Version-pinning is the most important sanitisation. Memories about API patterns go stale as libraries upgrade. Memories should include the version they were written against.
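A sketch of version-pinned memories, assuming each memory records the package and version it was written against. Matching on the major version is a deliberate simplification, and `VersionedMemory` and `applicable` are hypothetical names.

```python
from dataclasses import dataclass


def major(version: str) -> int:
    # "2.0.30" -> 2; assumes a plain dotted numeric version string.
    return int(version.split(".")[0])


@dataclass
class VersionedMemory:
    rule: str
    package: str
    written_for: str   # version the rule was written against


def applicable(mem: VersionedMemory, installed: dict[str, str]) -> bool:
    # Apply the rule only when the installed major version matches.
    current = installed.get(mem.package)
    return current is not None and major(current) == major(mem.written_for)


rule = VersionedMemory("use select() not session.query()", "sqlalchemy", "2.0")
print(applicable(rule, {"sqlalchemy": "2.0.30"}))   # True
print(applicable(rule, {"sqlalchemy": "1.4.52"}))   # False
```

The same check covers the legacy-project case from point 3: the SQLAlchemy 2.0 rule is skipped when the project is still on 1.x.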

One practical observation from CLAUDE.md experience: explicit rule memories outperform inferred ones. A rule written as "NEVER use X because Y" (with reason) has higher compliance than a memory inferred from correction history, because the reason lets the model apply it correctly to edge cases.

We have been publishing free per-stack rule files that represent what "ideal project memories" look like in practice: https://gist.github.com/oliviacraft

0 replies
