A fundamental problem with AI coding assistants is that every session starts from scratch. You tell it "our project uses pnpm, not npm," and in the next session it uses npm again.
Claude Code's Memory system addresses this problem: it lets the model remember user preferences, project context, and working patterns across sessions.
~/.claude/
└─ projects/
   └─ {project-hash}/
      └─ memory/
         ├─ MEMORY.md             # Entry file (index)
         ├─ user_role.md          # User role memory
         ├─ feedback_testing.md   # Feedback memory
         ├─ project_auth.md       # Project memory
         └─ ...                   # Up to 200 files
Each memory file uses frontmatter format:
---
name: User Preference - Testing
description: User prefers using a real database instead of mocks for integration tests
type: feedback
---
Integration tests must use a real database, not mocks.
**Why:** An incident last quarter where mock tests passed but production migration failed.
**How to apply:** When writing tests, configure a connection to the test database rather than mocks.

MEMORY.md is an index file, not the memory itself:
- [User Role](user_role.md) - Senior backend engineer, proficient in Go and React
- [Testing Preferences](feedback_testing.md) - Integration tests use real database
- [Project Auth](project_auth.md) - Auth rewrite driven by compliance requirements

Key constraints:
- Maximum 200 lines (MAX_ENTRYPOINT_LINES)
- Maximum 25KB (MAX_ENTRYPOINT_BYTES)
- Truncated with a warning when limits are exceeded
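
The constraints above can be sketched as a small truncation helper. This is an illustration only, not the actual implementation: the constant names come from the text, while `truncateEntrypoint`, `utf8ByteLength`, the wording of the warning, and the reading of 25KB as 25 * 1024 bytes are assumptions.

```typescript
// Constant names are from the text; the values assume 25KB means 25 * 1024 bytes.
const MAX_ENTRYPOINT_LINES = 200;
const MAX_ENTRYPOINT_BYTES = 25 * 1024;

interface TruncatedEntrypoint {
  text: string;
  truncated: boolean;
}

// UTF-8 byte length of a string, computed without platform APIs.
function utf8ByteLength(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c < 0x80) {
      bytes += 1;
    } else if (c < 0x800) {
      bytes += 2;
    } else if (c >= 0xd800 && c <= 0xdbff && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        bytes += 4; // surrogate pair encodes one 4-byte code point
        i++;
      } else {
        bytes += 3;
      }
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// Hypothetical helper: keep whole lines until either limit would be exceeded,
// then append a warning line so the reader knows content was dropped.
function truncateEntrypoint(content: string): TruncatedEntrypoint {
  const lines = content.split("\n");
  const kept: string[] = [];
  let bytes = 0;
  for (const line of lines) {
    const cost = utf8ByteLength(line) + 1; // +1 for the newline
    if (kept.length >= MAX_ENTRYPOINT_LINES || bytes + cost > MAX_ENTRYPOINT_BYTES) {
      break;
    }
    kept.push(line);
    bytes += cost;
  }
  const truncated = kept.length < lines.length;
  const text = kept.join("\n");
  return {
    text: truncated ? text + "\n[MEMORY.md truncated: entry file limit exceeded]" : text,
    truncated,
  };
}
```

Cutting at line boundaries keeps every surviving index entry intact, at the cost of dropping a single oversized line entirely.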
User input: "Help me write tests"
│
├─ 1. Scan memory directory (memoryScan)
│ └─ Read frontmatter of each file (first 30 lines)
│ └─ Sort by modification time
│ └─ Up to 200 files
│
├─ 2. Build memory manifest
│ └─ "[feedback] feedback_testing.md (3 days ago): Integration tests use real database"
│ └─ "[user] user_role.md (7 days ago): Senior backend engineer"
│
├─ 3. Call Claude Sonnet to select relevant memories
│ └─ Input: user message + memory manifest + recently used tools
│ └─ Output: JSON { selected_memories: ["feedback_testing.md", ...] }
│ └─ Up to 5 selected
│
└─ 4. Read the full content of selected memory files
   └─ Injected into context as AttachmentMessage
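
Steps 1 to 3 of the flow above can be sketched as three pure functions. This is a minimal sketch under stated assumptions, not the real code: `parseFrontmatterFields`, `buildManifest`, `parseSelection`, and the `MemoryHeader` shape are hypothetical names, the newest-first sort order is a guess at what "sort by modification time" means, and the call to Sonnet itself is left out.

```typescript
interface MemoryHeader {
  filename: string;
  type: string;
  description: string;
  mtimeMs: number;
}

const MAX_MEMORY_FILES = 200; // "Up to 200 files"
const MAX_SELECTED = 5;       // "Up to 5 selected"
const FRONTMATTER_SCAN_LINES = 30;
const DAY_MS = 24 * 60 * 60 * 1000;

// Step 1: read `key: value` pairs between the leading `---` delimiters,
// looking only at the first 30 lines of the file.
function parseFrontmatterFields(text: string): Record<string, string> {
  const fields: Record<string, string> = {};
  let inside = false;
  for (const line of text.split("\n").slice(0, FRONTMATTER_SCAN_LINES)) {
    if (line.trim() === "---") {
      if (inside) break; // closing delimiter
      inside = true;
      continue;
    }
    if (!inside) break; // file does not start with frontmatter
    const colon = line.indexOf(":");
    if (colon > 0) {
      fields[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
    }
  }
  return fields;
}

// Step 2: one manifest line per memory, newest first (assumed order).
function buildManifest(headers: MemoryHeader[], nowMs: number): string[] {
  return headers
    .slice()
    .sort((a, b) => b.mtimeMs - a.mtimeMs)
    .slice(0, MAX_MEMORY_FILES)
    .map((h) => {
      const days = Math.floor((nowMs - h.mtimeMs) / DAY_MS);
      return `[${h.type}] ${h.filename} (${days} days ago): ${h.description}`;
    });
}

// Step 3 (after the model call): validate the JSON reply, drop unknown
// filenames, and cap the result at five.
function parseSelection(raw: string, knownFiles: string[]): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return [];
  }
  if (typeof parsed !== "object" || parsed === null) return [];
  const list = (parsed as { selected_memories?: unknown }).selected_memories;
  if (!Array.isArray(list)) return [];
  return list
    .filter((name): name is string => typeof name === "string" && knownFiles.indexOf(name) !== -1)
    .slice(0, MAX_SELECTED);
}
```

Validating the model's reply against the known filenames means a hallucinated filename never reaches the file-reading step.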
// findRelevantMemories.ts:39
async function findRelevantMemories({
query,
memoryDir,
recentlyUsedTools, // Recently used tools (suppresses API-doc-type memories)
alreadySurfacedPaths, // Memories already surfaced (avoids duplicates)
}) {
// Exclude MEMORY.md (already in system prompt)
// Exclude memories already read via FileRead
// Exclude memories already surfaced in this session
}

Tool-aware filtering is a clever design: if the user has just read a memory file via FileRead, there is no need to inject it again through the Memory system.
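
The three exclusions listed in the comments can be sketched as a single filter. The function name `filterCandidates` and the `ExclusionSets` shape are hypothetical; only the three rules themselves come from the snippet above.

```typescript
interface ExclusionSets {
  alreadySurfacedPaths: ReadonlySet<string>; // surfaced earlier in this session
  readViaFileRead: ReadonlySet<string>;      // the model already read these itself
}

// Hypothetical helper: drop candidates that would duplicate context
// the model already has.
function filterCandidates(paths: string[], exclude: ExclusionSets): string[] {
  return paths.filter((path) => {
    const basename = path.slice(path.lastIndexOf("/") + 1);
    if (basename === "MEMORY.md") return false;            // already in the system prompt
    if (exclude.readViaFileRead.has(path)) return false;   // already read via FileRead
    if (exclude.alreadySurfacedPaths.has(path)) return false;
    return true;
  });
}
```

Filtering before the relevance call, rather than after it, also keeps duplicates from using up any of the five selection slots.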
const MAX_SESSION_BYTES = 60 * 1024 // 60KB cumulative limit

The total bytes injected via Memory throughout a session must not exceed 60KB. This prevents accumulated memories from filling up the context window during long sessions.
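
A cumulative budget like this can be tracked with a small counter. The sketch below is an assumption about the bookkeeping, not the actual code: `SessionMemoryBudget` and its methods are hypothetical, and the source does not say whether an oversized memory is skipped (as here) or whether injection stops entirely once the limit is hit.

```typescript
const MAX_SESSION_BYTES = 60 * 1024; // 60KB cumulative limit, as in the text

// Hypothetical tracker for bytes injected via Memory during one session.
class SessionMemoryBudget {
  private usedBytes = 0;

  // Records the bytes and returns true only if the memory still fits.
  tryConsume(bytes: number): boolean {
    if (this.usedBytes + bytes > MAX_SESSION_BYTES) {
      return false;
    }
    this.usedBytes += bytes;
    return true;
  }

  remainingBytes(): number {
    return MAX_SESSION_BYTES - this.usedBytes;
  }
}
```

For example, after a 40KB injection a 30KB memory is rejected, while a 20KB memory still fits and exhausts the budget.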
// query.ts:301-304
using pendingMemoryPrefetch = startRelevantMemoryPrefetch(
state.messages,
state.toolUseContext,
)

The using keyword ensures lifecycle management for the prefetch:
- Start: on the first loop iteration
- Consume: after tool execution, before the next API call
- Cleanup: automatically disposed when the generator exits (logging telemetry data)
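
TypeScript 5.2 added `using` declarations (explicit resource management), which call the value's `[Symbol.dispose]()` method when the enclosing scope exits, including on an early return or an exception. To stay runnable on older toolchains, the sketch below emulates that behavior with `try`/`finally` and a plain `dispose()` method. All names here (`PrefetchHandleSketch`, `startPrefetchSketch`, `runLoopSketch`, `telemetryLog`) are hypothetical, and a plain function stands in for the generator mentioned above.

```typescript
interface PrefetchHandleSketch {
  consumedOnIteration: number; // -1 = never consumed
  dispose(): void;
}

// Stand-in for the telemetry sink; the real code calls logEvent().
const telemetryLog: string[] = [];

function startPrefetchSketch(): PrefetchHandleSketch {
  const handle: PrefetchHandleSketch = {
    consumedOnIteration: -1,
    dispose() {
      // Cleanup: record what happened to the prefetch.
      telemetryLog.push(`collected:${handle.consumedOnIteration}`);
    },
  };
  return handle;
}

// Runs `iterations` loop turns. The prefetch is treated as ready from
// iteration `readyOn`; `failOn` simulates an exception inside the loop.
function runLoopSketch(iterations: number, readyOn: number, failOn?: number): void {
  // With TS 5.2+ this would be: using prefetch = startPrefetchSketch()
  const prefetch = startPrefetchSketch();
  try {
    for (let i = 1; i <= iterations; i++) {
      if (i === failOn) throw new Error("tool failed");
      if (prefetch.consumedOnIteration === -1 && i >= readyOn) {
        prefetch.consumedOnIteration = i; // Consume: first turn where it is ready
      }
    }
  } finally {
    prefetch.dispose(); // `using` would invoke [Symbol.dispose]() at this point
  }
}
```

The point of the pattern is that telemetry is logged on every exit path, so a prefetch that was never consumed still shows up (as -1) instead of silently disappearing.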
Loop iteration starts
│
├─ startRelevantMemoryPrefetch() → starts asynchronously
│ └─ calls findRelevantMemories()
│ └─ records settledAt timestamp
│
├─ Four-layer context compression
├─ API call
├─ Tool execution
│
└─ getAttachmentMessages()
   └─ Checks if pendingMemoryPrefetch has completed
   └─ If complete → create AttachmentMessage
   └─ If not complete → skip (non-blocking)
Non-blocking behavior is key: memory retrieval typically takes 1 to 3 seconds (it calls Sonnet), and blocking the main loop on it would slow responses. With async prefetch, memory retrieval overlaps with model calls and tool execution.
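
The "check, but never wait" behavior can be sketched with a handle that records when its promise settles. This is an illustration under assumptions: `MemoryPrefetchSketch`, `startPrefetchNonBlocking`, and `takeIfSettled` are hypothetical names, and only the `settledAt` field mirrors the text.

```typescript
interface MemoryPrefetchSketch<T> {
  settledAt: number | null; // timestamp once the promise settles, else null
  result: T | null;
}

// Start the work immediately and return a handle without awaiting it.
function startPrefetchNonBlocking<T>(task: () => Promise<T>): MemoryPrefetchSketch<T> {
  const handle: MemoryPrefetchSketch<T> = { settledAt: null, result: null };
  task().then(
    (value) => {
      handle.result = value;
      handle.settledAt = Date.now();
    },
    () => {
      // A failed retrieval settles with no result; the loop just carries on.
      handle.settledAt = Date.now();
    },
  );
  return handle;
}

// Called after tool execution: returns the result if it is ready, and
// null otherwise. It never awaits, so the main loop is not delayed.
function takeIfSettled<T>(handle: MemoryPrefetchSketch<T>): T | null {
  return handle.settledAt === null ? null : handle.result;
}
```

A caller that gets `null` simply checks again on the next iteration, which is why a slow retrieval costs latency only for the memory itself and not for the response.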
// Logged on dispose
logEvent('tengu_memdir_prefetch_collected', {
settledAt: prefetch.settledAt,
consumedOnIteration: prefetch.consumedOnIteration,
// -1=never consumed, 0=hidden (filtered out), N=visible on iteration N
})

| Type | Purpose | Example |
|---|---|---|
| user | User role and preferences | "Senior backend engineer, proficient in Go" |
| feedback | Behavioral feedback (do/don't) | "Don't mock the database" |
| project | Project context and progress | "Auth rewrite driven by compliance requirements" |
| reference | Pointers to external resources | "Bug tracking in Linear INGEST project" |

What should not be stored as memory:
- Code patterns and architecture (read from the code)
- Git history (read from git log)
- Debugging solutions (the fix is already in the code)
- Content already present in CLAUDE.md
- Temporary information from the current session
Session in progress...
│
├─ Token/tool call threshold reached
│
├─ shouldExtractMemory() → true
│
├─ Launch background sub-agent (non-blocking)
│ └─ buildSessionMemoryUpdatePrompt()
│ └─ Agent analyzes conversation, extracts key information
│ └─ Writes/updates memory files
│
└─ markSessionMemoryInitialized()
   └─ Periodic updates thereafter
Session Memory is not extracted all at once; it is incremental:
- Initialization: extracted after the first threshold is reached
- Incremental updates: tracks lastSummarizedMessageId, processes only new messages
- Periodic triggering: based on token delta and number of tool calls
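
The incremental behavior described above can be sketched as follows. Everything except the idea of a `lastSummarizedMessageId` cursor is an assumption: the threshold values, the choice to trigger when either threshold is met, and the names `SessionMessage`, `ExtractionState`, `messagesSince`, `shouldExtractMemorySketch`, and `advanceExtraction` are all hypothetical.

```typescript
interface SessionMessage {
  id: string;
  tokens: number;
  toolCalls: number;
}

interface ExtractionState {
  initialized: boolean;
  lastSummarizedMessageId: string | null;
}

// Hypothetical thresholds; the source does not give the real values.
const TOKEN_DELTA_THRESHOLD = 10000;
const TOOL_CALL_THRESHOLD = 5;

// Only messages after the cursor count as new.
function messagesSince(messages: SessionMessage[], lastId: string | null): SessionMessage[] {
  if (lastId === null) return messages;
  const index = messages.map((m) => m.id).lastIndexOf(lastId);
  return index === -1 ? messages : messages.slice(index + 1);
}

// Trigger when the new messages cross either threshold (assumed to be an OR).
function shouldExtractMemorySketch(messages: SessionMessage[], state: ExtractionState): boolean {
  let tokens = 0;
  let toolCalls = 0;
  for (const m of messagesSince(messages, state.lastSummarizedMessageId)) {
    tokens += m.tokens;
    toolCalls += m.toolCalls;
  }
  return tokens >= TOKEN_DELTA_THRESHOLD || toolCalls >= TOOL_CALL_THRESHOLD;
}

// Hand the new messages to the extractor and move the cursor forward.
function advanceExtraction(
  messages: SessionMessage[],
  state: ExtractionState,
): { toSummarize: SessionMessage[]; next: ExtractionState } {
  const toSummarize = messagesSince(messages, state.lastSummarizedMessageId);
  const last = toSummarize.length > 0
    ? toSummarize[toSummarize.length - 1].id
    : state.lastSummarizedMessageId;
  return { toSummarize, next: { initialized: true, lastSummarizedMessageId: last } };
}
```

Because the cursor only moves forward, each message is summarized at most once, and the cost of an update depends on what is new rather than on the length of the whole session.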
~/.claude/memories/team/ # Team memory directory
├─ onboarding.md # Onboarding guide
├─ code_style.md # Code style
└─ deployment.md # Deployment process
Team Memory lives in a separate team/ directory and is shared among team members via a sync mechanism.
When Team Memory is enabled, the system prompt includes two layers of memory:
Personal memory (~/.claude/memories/)
+ Team memory (~/.claude/memories/team/)
→ Merged and injected into system prompt
The content of MEMORY.md is injected as part of the system prompt at the start of each turn:
// constants/prompts.ts:495
systemPromptSection('memory', () => loadMemoryPrompt())

This is a cache-friendly operation: because MEMORY.md typically does not change during a session, this section can be reused by the prompt cache.
Additional memories found by findRelevantMemories() are dynamically injected via AttachmentMessage:
System Prompt (containing MEMORY.md index)
+ AttachmentMessage(memory: "full content of feedback_testing.md")
→ The model sees both the index and the specific memory
Claude Code's Memory system is a multi-layer, asynchronous memory architecture:
- Storage layer: File system + frontmatter format, simple and reliable
- Retrieval layer: Sonnet-driven relevance matching instead of keyword search
- Injection layer: Static (MEMORY.md) + dynamic (AttachmentMessage), tiered injection
- Extraction layer: Session Memory automatic extraction, no manual user intervention needed
- Sharing layer: Team Memory supports team-wide knowledge propagation
This system transforms Claude Code from a tool that "starts from scratch every time" into a coding partner that understands you better the more you use it.