| layout | default | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| title | Chapter 4: Memory, Skills, and the Learning Loop | ||||||||
| nav_order | 4 | ||||||||
| parent | Hermes Agent Tutorial | ||||||||
| format_version | v2 | ||||||||
| why | Memory and skills are what separate Hermes from a stateless chatbot wrapper. This chapter explains the three-layer memory architecture, how the agent autonomously decides when to write and improve memories, and how the closed learning loop turns real conversations into self-improving procedural knowledge. | ||||||||
| mental_model | Think of Hermes's memory as three filing systems working in parallel: a searchable archive of past sessions (episodic), a set of declarative fact sheets about you and your world (semantic), and a library of how-to guides the agent has written for itself (procedural). The agent reads all three on every turn and writes to them whenever it judges it worthwhile. | ||||||||
| learning_outcomes |
|
||||||||
| snapshot |
|
||||||||
| chapter_map |
|
||||||||
| sources |
A personal AI agent faces a fundamental tension: it needs to remember enormous amounts of context to be genuinely useful, but LLM context windows are finite and expensive. Naively keeping everything in the prompt doesn't scale. Throwing everything away between sessions makes the agent forget-prone and frustrating.
Hermes resolves this tension with a three-layer memory architecture that stores different types of knowledge in the form best suited to their retrieval patterns:
- Episodic memory is searchable by content — you retrieve it when you need to recall what happened
- Semantic memory is always-available facts — injected into every prompt because they're always relevant
- Procedural memory is skills — injected selectively when the agent recognizes a task it has proceduralized
The result is an agent that uses its context window efficiently: only the most relevant episodic fragments, only the applicable skills, but always the core user model and preferences.
graph TD
subgraph Episodic["Episodic Memory (sessions.db)"]
E1[Session summaries — FTS5 SQLite]
E2[LLM-generated 200-500 word summaries]
E3[Written at session end by memory_manager.py]
E4[Retrieved by context_engine.py via FTS5 search]
end
subgraph Semantic["Semantic Memory (MEMORY.md / USER.md)"]
S1[MEMORY.md — long-term facts about work/projects]
S2[USER.md — user model: style, expertise, goals]
S3[Written autonomously by memory_manager.py nudges]
S4[Injected into EVERY prompt via prompt_builder.py]
S5[Honcho dialectic updates USER.md continuously]
end
subgraph Procedural["Procedural Memory (SKILL.md files)"]
P1[SKILL.md files in ~/.hermes/skills/]
P2[Created by agent when complex task is completed]
P3[Self-improved by agent when used and found lacking]
P4[Injected selectively by skill_utils.select_relevant]
P5[Shareable via agentskills.io hub]
end
Episodic --> PromptBuilder[prompt_builder.py]
Semantic --> PromptBuilder
Procedural --> PromptBuilder
PromptBuilder --> LLM[LLM Call]
At the end of every conversation (triggered by /new, graceful exit, or a configurable inactivity timeout), memory_manager.py calls the LLM to produce a compact summary of the session:
# hermes_cli/agent/memory_manager.py (session summary flow)
async def summarize_and_store(self, session: Session):
"""
Called when a session ends. Generates an LLM summary and
stores it in sessions.db for future FTS5 retrieval.
"""
summary_prompt = f"""
Summarize this conversation in 200-400 words.
Focus on: decisions made, facts established, tasks completed,
and any important context that would help recall this session.
Conversation:
{session.format_for_summary()}
"""
summary = await self.llm.complete(summary_prompt, model=self.fast_model)
# Extract topic tags for additional FTS5 signal
tags = await self.extract_tags(summary)
self.db.execute(
"""
INSERT INTO sessions_fts(session_id, summary, tags, created_at, relevance_score)
VALUES (?, ?, ?, ?, 1.0)
""",
(session.id, summary, ",".join(tags), session.created_at)
)You can query the sessions database directly:
# List recent sessions
hermes session list --limit 10
# Search sessions by content
hermes session search "ETL pipeline"
# Show a specific session summary
hermes session show session-2026-04-11-debugging
# Export all session summaries
hermes session export --format json > sessions_backup.jsonOr query the SQLite database directly:
sqlite3 ~/.hermes/sessions/sessions.db \
"SELECT session_id, substr(summary, 1, 100) FROM sessions_fts
WHERE sessions_fts MATCH 'ETL pipeline'
ORDER BY rank LIMIT 5;"CREATE VIRTUAL TABLE sessions_fts USING fts5(
session_id UNINDEXED, -- not searched, just stored
summary, -- main search field
tags, -- topic tags for boosting
created_at UNINDEXED,
relevance_score UNINDEXED,
-- FTS5 options
tokenize = "unicode61", -- handles unicode, punctuation
content = sessions_raw, -- content table for snippet generation
content_rowid = id
);
-- The raw sessions table (content store)
CREATE TABLE sessions_raw (
id INTEGER PRIMARY KEY,
session_id TEXT UNIQUE,
summary TEXT,
tags TEXT,
created_at REAL,
relevance_score REAL DEFAULT 1.0,
message_count INTEGER,
token_count INTEGER
);The most distinctive aspect of Hermes's memory system is that the agent decides autonomously when to update its own semantic memory. It doesn't ask you to save facts — it infers when a fact is worth saving and writes it without interrupting the conversation.
This is implemented via "nudge evaluation" — a lightweight classifier that runs after each assistant response:
# hermes_cli/agent/memory_manager.py (nudge evaluation)
NUDGE_TRIGGERS = [
# Pattern: agent should write to MEMORY.md
"user mentioned a new project",
"user stated a preference",
"important date or deadline mentioned",
"technology choice made",
"architecture decision recorded",
"new person mentioned with context",
]
USER_NUDGE_TRIGGERS = [
# Pattern: agent should update USER.md
"user corrected the agent's assumption about expertise",
"user communication style shift detected",
"user expressed frustration with response style",
"user demonstrated knowledge in new domain",
]
async def evaluate_nudge(
self,
user_message: str,
assistant_response: str,
recent_history: list
) -> NudgeDecision:
"""
Lightweight LLM call to determine if a memory write is warranted.
Uses a fast model (e.g., gpt-4o-mini) to keep latency low.
"""
prompt = f"""
Based on this exchange, should the agent update its long-term memory?
User: {user_message}
Assistant: {assistant_response[:500]}
Respond with JSON:
{{
"memory_write": boolean,
"user_write": boolean,
"memory_entry": "string or null",
"user_entry": "string or null",
"reasoning": "brief explanation"
}}
"""
result = await self.llm.complete_json(prompt, model=self.fast_model)
return NudgeDecision(**result)MEMORY.md is a human-readable Markdown file that the agent appends to and reorganizes:
# Long-Term Memory
## Projects
- **data-pipeline-v2**: Python ETL project, started 2026-04-11. Using Apache Airflow 2.8, PostgreSQL 15, dbt. Location: ~/projects/data-pipeline/
- **hermes-agent-fork**: Contributing to NousResearch/hermes-agent. Focus: improving FTS5 search ranking.
## Preferences
- Prefers concise explanations with working code examples over theoretical exposition
- Uses VS Code as primary editor, Neovim for quick edits
- Prefer Python type hints in all new code
- Dark mode terminals only
## Technology Context
- Primary language: Python 3.12
- Database of choice: PostgreSQL for production, SQLite for local tooling
- Infrastructure: Docker + Kubernetes, deployed to AWS EKS
## Important Dates
- Project deadline: data-pipeline-v2 demo on 2026-05-01USER.md is maintained by the Honcho dialectic system and tracks the user model:
# User Model
## Communication Style
- Technical, direct, no filler phrases
- Prefers bullet points over prose for lists
- Appreciates when agent admits uncertainty
- Dislikes over-explanation of things they clearly know
## Expertise Level
- Senior software engineer (8+ years)
- Expert: Python, data engineering, SQL
- Proficient: DevOps, Kubernetes, Terraform
- Learning: ML infrastructure, RL training pipelines
## Interaction Patterns
- Usually works in evening sessions (18:00–23:00 UTC)
- Asks focused questions; rarely needs prompting for context
- Prefers to see reasoning process for complex decisions
- Occasionally frustration with overly cautious responses
## Goals
- Build a production-grade data platform
- Contribute to open-source AI tooling
- Learn RL training pipeline designHoncho is an external service integrated with Hermes for dialectic user modeling — a technique where the user model is refined through an ongoing implicit dialogue between what the agent infers and what the user's behavior confirms or contradicts.
sequenceDiagram
participant User
participant Hermes as Hermes (hermes_cli)
participant MemMgr as memory_manager.py
participant Honcho as Honcho Service
User->>Hermes: message (showing technical expertise)
Hermes->>MemMgr: evaluate_nudge(message, response)
MemMgr->>Honcho: update_user_model(observation)
Honcho-->>MemMgr: updated_user_state
MemMgr->>MemMgr: write to USER.md if significant change
Note over Honcho: Honcho maintains a probabilistic user model
Note over Honcho: Updated after every interaction
Note over MemMgr: Only writes USER.md when change exceeds threshold
Honcho can be configured in config.yaml:
honcho:
enabled: true
endpoint: "https://api.honcho.dev/v1"
api_key: "hk-..."
sync_interval: 10 # sync USER.md every N interactions
local_fallback: true # use local heuristics if Honcho is unavailableEach skill file is a structured Markdown document with a defined schema:
---
skill_id: python_etl_patterns
created: 2026-04-11
last_improved: 2026-04-12
improvement_count: 3
trigger_phrases: ["etl", "data pipeline", "extract transform load", "airflow"]
confidence: 0.87
---
# Skill: Python ETL Patterns
## When to Use This Skill
Use this skill when the user is working on data extraction, transformation,
or loading tasks in Python.
## Pattern: Idempotent Airflow DAG
```python
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime, timedelta
def idempotent_etl(**context):
execution_date = context['execution_date']
# Use execution_date to ensure idempotency
process_for_date(execution_date)
dag = DAG(
'etl_pipeline',
start_date=datetime(2026, 1, 1),
schedule_interval='@daily',
catchup=False, # Important: prevent historical backfill
max_active_runs=1 # Prevent concurrent runs
)- Not handling idempotency — always parameterize by execution date
- Missing error handling on external API calls — use try/except + retries
- Missing logging — use Airflow's built-in task logging
- Not testing DAGs locally — use
airflow dags testbefore deployment
- v1: Basic DAG template
- v2: Added idempotency pattern after user hit duplicate data bug
- v3: Added local testing guidance after user asked about testing
### Autonomous Skill Creation
The agent creates a new skill file when it detects it has provided a complex, reusable workflow. This is triggered by `skill_utils.py`'s post-response analysis:
```python
# hermes_cli/agent/skill_utils.py (skill creation flow)
async def evaluate_skill_creation(
self,
user_message: str,
assistant_response: str,
response_length: int,
tool_calls_made: list
) -> SkillCreationDecision:
"""
Decide if this interaction warrants creating a new SKILL.md.
Heuristics:
- Response contained a reusable code pattern
- Multiple tool calls were orchestrated in a non-obvious way
- User asked about a general pattern (not a one-off)
- Response is long (>500 tokens) and structured
"""
if response_length < 200:
return SkillCreationDecision(create=False)
# Check against existing skills to avoid duplication
similar = self.find_similar_skill(user_message, threshold=0.8)
if similar:
# Improve existing skill instead of creating new one
return SkillCreationDecision(
create=False,
improve=similar,
improvement_type="extend"
)
# Ask LLM to evaluate and optionally generate the skill
decision = await self.llm.complete_json(
self._skill_creation_prompt(user_message, assistant_response),
model=self.fast_model
)
if decision["create"]:
await self._write_skill(decision["skill_content"])
return SkillCreationDecision(**decision)
When an existing skill is used and the agent finds it lacking or outdated, it can improve the skill in-place:
# hermes_cli/agent/skill_utils.py (skill improvement)
async def improve_skill(
self,
skill: Skill,
improvement_context: str
) -> bool:
"""
Called when agent used a skill but found it needed updating.
Rewrites the skill with improvements and bumps improvement_count.
"""
improved_content = await self.llm.complete(
f"""
Improve this skill based on the following context:
Context: {improvement_context}
Current skill:
{skill.content}
Produce an improved version that:
1. Fixes any issues discovered in context
2. Adds the new pattern/insight from context
3. Updates the improvement notes section
4. Increments improvement_count in frontmatter
""",
model=self.capable_model
)
# Write atomically
tmp_path = skill.path.with_suffix('.tmp')
tmp_path.write_text(improved_content)
tmp_path.rename(skill.path)
skill.improvement_count += 1
return TrueHermes integrates with agentskills.io, a community platform for sharing SKILL.md files between users.
# Publish a skill to agentskills.io
hermes skills publish python_etl_patterns
# Output:
# Skill published: https://agentskills.io/skills/python_etl_patterns
# Stars: 0 Downloads: 0# Search for skills
hermes skills search "kubernetes deployment"
# Install a skill
hermes skills install kubernetes-deployment-patterns
# List installed skills
hermes skills list# SKILL.md frontmatter for agentskills.io submission
---
skill_id: python_etl_patterns
author: johnxie
version: "1.3.0"
tags: [python, etl, airflow, data-engineering]
description: "Battle-tested patterns for Python ETL pipelines with Airflow"
requires_tools: [file_read, shell_exec]
tested_with: [gpt-4o, claude-3-5-sonnet, hermes-3-llama-3.1-70b]
license: MIT
---flowchart LR
subgraph Input
UM[User Message]
end
subgraph Retrieval
FTS[FTS5 Search\nsessions.db]
SK[Skill Selection\nskill_utils.py]
MM[MEMORY.md read]
UM2[USER.md read]
end
subgraph Assembly
PB[prompt_builder.py]
end
subgraph LLM
API[LLM Call]
RESP[Response]
end
subgraph PostProcess["Post-Processing"]
NE[Nudge Evaluator\nmemory_manager.py]
SE[Session End\nSummarizer]
SCR[Skill Creator\nskill_utils.py]
end
subgraph Storage
SDB[(sessions.db\nFTS5)]
MEMD[MEMORY.md]
USERD[USER.md]
SKILLF[skills/*.md]
end
UM --> FTS
UM --> SK
MM --> PB
UM2 --> PB
FTS --> PB
SK --> PB
PB --> API
API --> RESP
RESP --> NE
RESP --> SCR
RESP --> SE
NE -->|fact worth saving| MEMD
NE -->|user model update| USERD
SE -->|session end| SDB
SCR -->|new reusable pattern| SKILLF
# ~/.hermes/config.yaml
memory:
episodic:
max_results: 5 # Max session fragments per prompt
recency_weight:
7_days: 2.0
30_days: 1.5
90_days: 1.0
older: 0.7
summary_model: "gpt-4o-mini" # Fast model for session summarization
semantic:
max_memory_entries: 100 # Trim MEMORY.md when it exceeds this
max_user_entries: 50 # Trim USER.md when it exceeds this
honcho_enabled: true
procedural:
skills_dir: "~/.hermes/skills"
max_skills_injected: 3
relevance_threshold: 0.15
auto_create: true # Allow autonomous skill creation
auto_improve: true # Allow autonomous skill improvement
agentskills_sync: false # Auto-sync with agentskills.io| Concept | Key Takeaway |
|---|---|
| Episodic memory | FTS5 SQLite session summaries; searched by content + recency on every prompt |
| Semantic memory | MEMORY.md (facts) + USER.md (user model); always injected, agent-written |
| Procedural memory | SKILL.md files; selectively injected by TF-IDF; autonomously created and improved |
| memory_manager.py | Evaluates after every response whether a memory nudge is warranted |
| Nudge system | Lightweight LLM classifier using fast model; runs asynchronously |
| Session summarization | LLM generates 200-500 word summary at session end; stored in FTS5 |
| Honcho dialectic | External service for probabilistic user modeling; falls back to local heuristics |
| Skill creation | Triggered when agent provides a complex reusable pattern |
| Skill improvement | Agent rewrites skill file when it finds the current version lacking |
| agentskills.io | Community skill hub; install/publish via hermes skills commands |