A persistent memory system for Claude Code that extracts learnings from past sessions and injects relevant context on every prompt.
Claude Code sessions are stateless by default. Every time context compacts or you start a new session, Claude forgets:
- Solutions you already discovered together
- Gotchas and traps you identified
- Your infrastructure details and preferences
- Decisions you made and why
This leads to repeated mistakes, redundant conversations, and lost productivity.
This system gives Claude persistent memory across sessions:
- Convert your
.jsonltranscripts to readable markdown - Extract learnings using Claude sub-agents that process transcripts
- Embed learnings with a local embedding model (nomic-embed-text)
- Inject relevant memories via Claude Code hooks that fire on every prompt
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Your Prompt │────►│ Hook Fires │────►│ Query Daemon │
│ │ │ (mechanical) │ │ (cosine sim) │
└─────────────────┘ └─────────────────┘ └────────┬────────┘
│
▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Claude sees │◄────│ Inject as XML │◄────│ Top 3 memories │
│ context + mem │ │ in context │ │ (≥0.45 sim) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
- Node.js (for transcript conversion)
- Ollama (for local embeddings)
- Python 3.8+ (for the memory daemon)
- Claude Code CLI
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Pull the embedding model
ollama pull nomic-embed-text
# Clone this repo
git clone https://github.com/zacdcook/claude-code-semantic-memory.git
cd claude-code-semantic-memoryClaude Code stores session transcripts as .jsonl files in ~/.claude/projects/. Convert them to readable markdown:
node scripts/jsonl-to-markdown.js ~/.claude/projects/ ./converted-transcripts/This extracts user messages, assistant messages (including thinking blocks), and system prompts. Tool calls and results are stripped for cleaner extraction.
Start a new Claude Code session and use the extraction prompt:
claudeThen paste the contents of prompts/extract-learnings.md. Claude will:
- List all
.mdfiles in your converted transcripts folder - Dispatch sub-agents in parallel to process batches
- Each sub-agent extracts structured learnings
- Store learnings via the daemon's
/storeendpoint - Output to
~/extracted-learnings.jsonl
cd daemon
pip install -r requirements.txt
python server.pyThe daemon runs on port 8741 and provides:
POST /store- Embed and store a learningPOST /recall- Query for relevant memoriesGET /health- Health check
python scripts/import-learnings.py ~/extracted-learnings.jsonlCopy all hooks to your Claude Code hooks directory:
# Session initialization
cp hooks/session-start.sh ~/.claude/hooks/SessionStart.sh
# Memory injection on prompts
cp hooks/user-prompt-submit.sh ~/.claude/hooks/UserPromptSubmit.sh
# Memory injection during iteration
cp hooks/pre-tool-use.sh ~/.claude/hooks/PreToolUse.sh
# Auto-export on compaction
cp hooks/pre-compact.sh ~/.claude/hooks/PreCompact.sh
# Make executable
chmod +x ~/.claude/hooks/*.shNow:
- Every prompt automatically queries memory and injects relevant learnings
- During iteration, Claude's thinking is analyzed for additional relevant memories
- When context compacts, the transcript is exported and a sub-agent extracts learnings
SESSION START
════════════
┌─────────────────┐
│ SessionStart │ → Check daemon health
│ │ → Warn about orphaned transcripts
└────────┬────────┘
│
ACTIVE WORK (repeats for each user message)
════════════
▼
┌─────────────────┐
│UserPromptSubmit │ → Embed user's prompt
│ │ → Query daemon /recall
│ │ → Inject top 3 memories
└────────┬────────┘
│
▼ (fires before EACH tool)
┌─────────────────┐
│ PreToolUse │ → Extract Claude's thinking
│ │ → Query for new relevant memories
│ │ → Inject if thinking has drifted
└────────┬────────┘
│
CONTEXT COMPACTION (when context window fills)
════════════════════
▼
┌─────────────────┐
│ PreCompact │ → Export transcript to disk
│ │ → Convert JSONL to markdown
│ │ → Output sub-agent dispatch instructions
└────────┬────────┘
│
▼
┌─────────────────┐
│ Sub-Agent │ → Read exported transcript
│ (Task tool) │ → Extract learnings
│ │ → Store via daemon /store endpoint
└─────────────────┘
When context compacts, the PreCompact hook exports the transcript and outputs instructions for Claude to dispatch a sub-agent. The sub-agent:
- Reads the exported markdown transcript
- Extracts learnings (solutions, gotchas, patterns, etc.)
- Stores each learning via the daemon's
/storeendpoint
This keeps everything within Claude Code - no external API calls needed. The sub-agent uses the Task tool to run without blocking.
| Type | Description | Example |
|---|---|---|
WORKING_SOLUTION |
Confirmed working commands/patterns | "Use Import-Clixml for PowerShell credentials over Tailscale" |
GOTCHA |
Traps and counterintuitive behaviors | "Git Bash strips $ variables before PowerShell sees them" |
PATTERN |
Recurring architectural decisions | "Check HOSTNAME in hooks to determine daemon URL" |
DECISION |
Explicit design choices with reasoning | "Using nomic-embed-text for 8K token context" |
FAILURE |
What didn't work and why | "SSH-based PS remoting fails due to TTY requirements" |
PREFERENCE |
User's stated preferences | "Query memory before asking clarifying questions" |
| Decision | Why |
|---|---|
| Hooks over CLAUDE.md | Hooks fire deterministically; CLAUDE.md instructions are suggestions Claude may skip under cognitive load |
| nomic-embed-text over MiniLM | 8K token context vs 256 tokens — MiniLM truncates 75% of longer conversation turns |
| 0.45 similarity threshold | Permissive enough to catch semantically related content, not so low it floods with noise |
| Sub-agent extraction | Uses Claude Code's own capabilities; no external API keys or local LLMs needed |
CREATE TABLE learnings (
id INTEGER PRIMARY KEY,
type TEXT NOT NULL,
content TEXT NOT NULL,
context TEXT,
embedding BLOB NOT NULL,
confidence REAL DEFAULT 0.9,
session_source TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_learnings_type ON learnings(type);claude-code-semantic-memory/
├── README.md
├── scripts/
│ ├── jsonl-to-markdown.js # Convert transcripts
│ └── import-learnings.py # Import JSONL to database
├── prompts/
│ └── extract-learnings.md # Prompt for sub-agent extraction
├── daemon/
│ ├── server.py # Flask API server
│ ├── requirements.txt
│ └── config.json # Similarity thresholds, model config
├── hooks/
│ ├── session-start.sh # Check daemon, warn orphans
│ ├── user-prompt-submit.sh # Memory injection on prompts
│ ├── pre-tool-use.sh # Memory injection during iteration
│ └── pre-compact.sh # Auto-export and dispatch sub-agent
└── examples/
└── sample-learnings.jsonl # Example output format
Edit daemon/config.json:
{
"embeddingModel": "nomic-embed-text",
"minSimilarity": 0.45,
"maxResults": 3,
"duplicateThreshold": 0.92,
"timeoutMs": 2500,
"port": 8741
}| Parameter | Default | Description |
|---|---|---|
embeddingModel |
nomic-embed-text |
Ollama model for embeddings (768 dimensions) |
minSimilarity |
0.45 |
Minimum cosine similarity to return a memory |
maxResults |
3 |
Maximum memories to inject per query |
duplicateThreshold |
0.92 |
Similarity threshold for deduplication |
timeoutMs |
2500 |
Max time to wait for embedding |
port |
8741 |
Daemon port |
If you run Claude Code on a laptop but want embeddings on a GPU desktop:
- Run the daemon on your desktop
- Connect both machines via Tailscale (or any VPN)
- Set
CLAUDE_DAEMON_HOSTenvironment variable on your laptop:
export CLAUDE_DAEMON_HOST=100.95.72.101 # Desktop's Tailscale IPThe hook will query the remote daemon instead of localhost.
- Check hook is executable:
chmod +x ~/.claude/hooks/*.sh - Verify daemon is running:
curl http://localhost:8741/health
- Check similarity threshold isn't too high
- Verify learnings were imported:
curl http://localhost:8741/stats - Try a more specific query
- Ensure Ollama is using GPU:
ollama psshould show CUDA - Reduce batch sizes if running out of memory
- Ollama — Local embedding infrastructure
- nomic-embed-text — Embedding model with 8K context
MIT