Replies: 15 comments 13 replies
-
Thanks for the hard work, jif!
-
I'd love memories to take into account the git remote of the project folders. While I know I should be using worktrees, I sometimes end up making an additional checkout and don't want it thinking, say
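A minimal sketch of the idea above, assuming memories are keyed by a normalised form of the URL printed by `git remote get-url origin`, so that two checkouts of the same repository share one key. The function name `project_key` and the normalisation rules are hypothetical illustrations, not anything Codex is known to do.

```python
import re


def project_key(remote_url: str) -> str:
    """Reduce a git remote URL to a host/path key shared by every checkout."""
    url = remote_url.strip()
    # scp-like syntax, e.g. git@github.com:owner/repo.git
    match = re.match(r"^(?:[^@/]+@)?([^:/]+):(?!//)(.+)$", url)
    if match is None:
        # URL syntax, e.g. https://github.com/owner/repo.git or ssh://git@host:22/owner/repo
        match = re.match(
            r"^[a-z+]+://(?:[^@/]+@)?([^/:]+)(?::\d+)?/(.+)$", url, re.IGNORECASE
        )
    if match is None:
        raise ValueError(f"unrecognised remote: {remote_url!r}")
    host, path = match.group(1), match.group(2).rstrip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    # Host names are case-insensitive; the repository path is left untouched.
    return f"{host.lower()}/{path}"
```

With this, an SSH checkout and an HTTPS checkout of the same repository map to the same key, regardless of which folder they live in.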
-
I am also developing an external memory system for Codex. Just a story:
https://github.com/hack-ink/elf
What are your thoughts on this?
-
Hi, I've been using this, but I noticed that it doesn't save memories from exec sessions, only interactive ones. This is the inverse of what I prefer for my own workflow, so I added an option to change sources: #13147. Let me know what you think. It would be great to have this added to the main release.
-
Hi @jif-oai,

I’ve been experimenting with persistent conversational memory systems for a while, so this direction in Codex is really interesting to see. Here are my thoughts on your questions.

1. Should Codex cite previous threads when using memories?

I’d rate this 4/5. Citing memories can be very helpful when debugging or understanding why the system behaves in a certain way. However, for normal interaction it might become noisy if every response references older threads. A good default might be:
• silent retrieval by default

2. Autonomy vs manual triggering

A hybrid approach seems best. Automatic memory creation works well for long-running workflows, but users should still have some control to prevent noise or unnecessary storage. For example:
• automatic summarisation of important interactions

3. Project memory vs global memory

Both seem important. I would structure memory in layers:
• project memory: codebase structure, architecture decisions, conventions

4. Sanitising credentials

Sanitising credentials is absolutely necessary. However, it might also be useful if Codex can remember that credentials exist for a service without storing the secret itself. Example:
• “project uses AWS credentials”

This preserves workflow awareness while keeping secrets safe.

I’ve also been experimenting with a memory-first conversational architecture in my own system. The idea is to separate reasoning from memory handling. Originally I started building this system around 2023 with smaller ~2B models and external memory. Over time the central model grew to ~27B parameters, but interestingly the conversational style and personality remained consistent because they are largely shaped by the memory layer rather than the raw model weights.

The architecture roughly looks like this: user request

In practice this means:
• the model focuses on reasoning and dialogue

One interesting observation from these experiments is that increasing model size improved the model’s ability to interpret retrieved memories, but the overall conversational behaviour remained stable because the memory layer carried the long-term context.

That’s why I find the direction of durable memories in Codex particularly exciting. If implemented well, it could significantly improve long-horizon coding workflows. Thanks for working on this feature. I’m very curious to see where it goes.

After the model generates the final answer, this agent processes the interaction and converts it into structured memory. Instead of storing the full conversation, it creates a semantic summary of the exchange:
• what the user asked

This summary is then stored in the memory layer. The idea is to avoid memory inflation while still preserving useful knowledge. The system remembers the meaning of the interaction, not the entire dialogue. Over time this creates a compact but semantically rich memory graph that improves retrieval for future queries.

In practice the loop looks like this: user request

This approach keeps the memory layer small and focused while continuously improving retrieval quality for future interactions.
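The summarise-then-store loop described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the commenter's actual system: `MemoryRecord`, `MemoryLayer`, and `summarise` are hypothetical names, the `outcome` field is an invented second summary field, the summariser is a truncating stand-in for what would be a model call, and retrieval is plain keyword overlap rather than a memory graph.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MemoryRecord:
    asked: str    # what the user asked
    outcome: str  # hypothetical field: what the exchange resolved to


def summarise(user_request: str, final_answer: str, max_chars: int = 80) -> MemoryRecord:
    """Stand-in for the summarising agent; a real system would call a model here."""
    return MemoryRecord(
        asked=user_request.strip()[:max_chars],
        outcome=final_answer.strip()[:max_chars],
    )


@dataclass
class MemoryLayer:
    records: list[MemoryRecord] = field(default_factory=list)

    def store(self, record: MemoryRecord) -> bool:
        """Store a summary unless an identical one exists; returns True if stored."""
        if record in self.records:
            return False
        self.records.append(record)
        return True

    def retrieve(self, query: str, limit: int = 3) -> list[MemoryRecord]:
        """Rank records by how many query words appear in what the user asked."""
        words = set(query.lower().split())
        scored = [
            (len(words & set(record.asked.lower().split())), record)
            for record in self.records
        ]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [record for score, record in ranked if score > 0][:limit]
```

Only the compact record is kept, never the full transcript, which is the property the comment is arguing for. Deduplicating on store is one simple way to limit memory inflation.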
-
One final observation from these experiments.

A similar architectural pattern is already starting to appear in at least one large cloud AI system. I won’t name the platform here to avoid turning this into promotion, but it shows that this direction can work at scale.

What seems to matter most is the separation of roles inside the system. Users are often not just looking for a task assistant; they want a consistent conversational partner that remembers context over time. That does not necessarily require running a massive flagship model constantly. A more efficient structure could look like this:
• a lightweight conversational model (for everyday dialogue)

In that setup the conversational layer can remain stable while the underlying models evolve. The memory layer preserves continuity, so the system does not “start from zero” each time the model is updated or replaced. This also makes the system more resource-efficient, since large models are only used when necessary.

From my perspective this kind of memory-first architecture with agent orchestration could be a natural direction for future conversational systems.
-
One more practical observation from these experiments.

Another advantage of this architecture is computational efficiency. If the system grows through memory rather than only through larger model weights, much smaller models can handle most everyday interactions. This reduces server cost significantly. In my experiments the structure looks like:
• lightweight conversational model for dialogue

In this setup the system “learns” through memory accumulation rather than constantly requiring larger models.

I’ve also been experimenting with development workflows where the memory layer tracks:
• previous code versions

This helps the system reason about code evolution and reduces hallucinations when modifying existing code.

Another area where persistent memory seems promising is educational systems. I’m currently experimenting with a tutoring system for children where the model can remember a student’s progress, mistakes, and learning patterns over time. Early results suggest this can both improve learning continuity and reduce computational cost.

If this direction is interesting to others working on Codex or long-horizon coding systems, I’d be very curious to hear your thoughts.
-
Thanks @talshebek, will read and review this.

Another global question for everyone: would you like memories to be enabled by default everywhere or not in
-
Memory in coding agents is more nuanced than in conversational agents because code-level context has stronger dependencies than natural language.

Key differences for coding agent memory:

Symbol-level memory, not just session-level: A coding agent should remember which functions were called, which variables were declared, and which interfaces were implemented. This is more structured than "the user mentioned X." Symbol tables are a natural format for coding agent memory; they're already how compilers think about code.

Cross-file dependency tracking: When Agent A modifies function

Ephemeral vs. persistent memory for code: Some code knowledge should persist long-term (the architecture of this codebase, the team's style conventions). Some should be ephemeral (the current state of a feature branch, in-progress refactoring). Coding agents need to distinguish between "stable knowledge about the codebase" and "volatile state of the current task."

Memory consolidation after task completion: When a coding session ends successfully, the agent should consolidate what it learned: new patterns discovered, bugs found and fixed, conventions understood. This "post-task reflection" is how the agent builds up codebase expertise over time rather than starting cold each session.

More on the persistent memory architecture: https://blog.kinthai.ai/why-character-ai-forgets-you-persistent-memory-architecture
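One way to picture the stable/volatile split and the post-task consolidation step is the sketch below. Everything in it is an assumption made for illustration: the class `AgentMemory`, the `worth_keeping` flag, and the rule that volatile notes are always discarded at the end of a task, with flagged ones promoted only when the task succeeded.

```python
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    # Stable knowledge about the codebase, kept across sessions.
    stable: list[str] = field(default_factory=list)
    # Volatile state of the current task: (note, worth_keeping) pairs.
    volatile: list[tuple[str, bool]] = field(default_factory=list)

    def note(self, text: str, worth_keeping: bool = False) -> None:
        """Record something learned during the current task."""
        self.volatile.append((text, worth_keeping))

    def consolidate(self, task_succeeded: bool) -> list[str]:
        """Post-task reflection: promote flagged notes, then drop task state."""
        promoted: list[str] = []
        if task_succeeded:
            for text, worth_keeping in self.volatile:
                already_known = text in self.stable or text in promoted
                if worth_keeping and not already_known:
                    promoted.append(text)
        self.stable.extend(promoted)
        self.volatile.clear()
        return promoted
```

Promoting only after a successful task is a deliberate design choice in this sketch: notes taken during a failed or abandoned session are more likely to encode a wrong turn than a convention.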
-
Quick takes on your questions:
One related note for design surface: I just filed #20138 proposing a session-scoped notes panel, explicitly not a substitute for memories but a different slot in the context-surface space. Memories as you're framing them are cross-thread, model-curated, and (currently) read-only per #19195. The notes proposal is single-session, user-curated (with an agent-shared sub-region), and writable from both sides. If both ship, they'd be complementary: memories carry forward across sessions, notes pin intent within one. Worth thinking about whether the two interact (e.g. should something written to notes during a session be promotable to a memory at session close?).
-
Responses to your questions, based on extensive use of Claude Code (which uses CLAUDE.md as its "memory" mechanism) and building rule sets for many projects:

1. Citation visibility: 4/5. Knowing which memory contributed to a decision matters when debugging incorrect behavior. If a memory fires incorrectly, you need to know which one to edit or delete.

2. Autonomy vs control: Hybrid, project-level explicit. Auto-generation of global memories makes sense for user preferences. For project-level memories, I would prefer manual confirmation: the cost of a wrong project rule persisting silently is high.

3. Per-project is far more valuable than global for coding. The reason: the most important "memories" for coding are project conventions, such as which patterns are allowed, which are banned, what testing setup is used, and which API versions are in use. These are different per project and do NOT generalize across projects. A memory saying "use

This is exactly what AGENTS.md solves as a persistent per-project context file. The "memory" is explicit and editable by the developer rather than learned from session history.

4. On sanitising: Version-pinning is the most important sanitisation. Memories about API patterns go stale as libraries upgrade. Memories should include the version they were written against.

One practical observation from CLAUDE.md experience: explicit rule memories outperform inferred ones. A rule written as "NEVER use X because Y" (with reason) has higher compliance than a memory inferred from correction history, because the reason lets the model apply it correctly to edge cases.

We have been publishing free per-stack rule files that represent what "ideal project memories" look like in practice: https://gist.github.com/oliviacraft
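To make the version-pinning point concrete, here is a minimal sketch of a rule memory that carries its reason and the version it was written against. The names `RuleMemory` and `is_stale`, the package name `examplelib`, and the choice to treat a major-version change as stale are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleMemory:
    rule: str             # e.g. "NEVER use X"
    reason: str           # the "because Y" part
    package: str          # hypothetical: the library the rule is about
    written_against: str  # version the rule was written against, e.g. "2.3.1"

    def render(self) -> str:
        """Format the rule the way it would be shown to the model."""
        return (
            f"{self.rule} because {self.reason} "
            f"(written against {self.package} {self.written_against})"
        )


def major(version: str) -> int:
    """Leading numeric component of a dotted version string."""
    return int(version.split(".")[0])


def is_stale(memory: RuleMemory, installed: dict[str, str]) -> bool:
    """Assumed policy: stale if the package is gone or its major version changed."""
    current = installed.get(memory.package)
    if current is None:
        return True
    return major(current) != major(memory.written_against)
```

A stale result would be a prompt to re-confirm or delete the rule rather than to apply it silently, which fits the preference above for manual confirmation of project-level memories.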
-
Hey folks,
I'm working on adding memories into Codex and I would love your opinion on:
Of course, I know everyone wants A and the control to do B for everything, but here I would like to understand what you would prefer by default.
Any other needs?
Disclaimer: Do not try to use the memories for now as the rate limits would consume all your tokens