System architecture for hosting hermes-agent as a managed AgentCore runtime.
This project deploys hermes-agent (a general-purpose self-improving AI agent) on Amazon Bedrock AgentCore, following the same pattern as sample-host-openclaw-on-amazon-bedrock-agentcore.
AgentCore provides per-user Firecracker microVMs that run a containerized agent runtime. Each user gets an isolated container with its own filesystem, credentials, and session lifecycle.
┌─────────────────────────────────────────────────────────────────┐
│                          User Channels                          │
│  Telegram  Discord  Slack  WhatsApp  Matrix  Web UI  DingTalk   │
└─────┬─────────┬───────┬───────┬────────┬───────┬────────┬───────┘
      │         │       │       │        │       │        │
      └─────────┴───────┴───────┼────────┴───────┴────────┘
                                │
                   ┌────────────▼────────────┐
                   │       API Gateway       │
                   │ (HTTP API, per-channel  │
                   │  webhook routes)        │
                   └────────────┬────────────┘
                                │
                   ┌────────────▼────────────┐
                   │      Router Lambda      │  ← Webhook validation
                   │      (Python 3.13)      │    Identity resolution
                   │                         │    Session management
                   │       + DynamoDB        │    Async dispatch
                   │    (identity table)     │
                   └────────────┬────────────┘
                                │
                       InvokeAgentRuntime(
                         runtimeArn, qualifier,
                         runtimeSessionId,
                         runtimeUserId, payload)
                                │
┌───────────────────────────────▼─────────────────────────────────┐
│                    Amazon Bedrock AgentCore                     │
│                                                                 │
│    ┌───────────────────────────────────────────────────────┐    │
│    │             Per-User Firecracker MicroVM              │    │
│    │                                                       │    │
│    │   ┌───────────────────────────────────────────────┐   │    │
│    │   │  Contract Server (port 8080)                  │   │    │
│    │   │  GET  /ping        → health check             │   │    │
│    │   │  POST /invocations → message dispatch         │   │    │
│    │   └────────┬──────────────────────────────────────┘   │    │
│    │            │                                          │    │
│    │   ┌────────▼────────┐    ┌──────────────────┐         │    │
│    │   │  Warm-up Agent  │    │   hermes-agent   │         │    │
│    │   │  (lightweight,  │───►│   (full agent,   │         │    │
│    │   │   fast startup) │    │    40+ tools)    │         │    │
│    │   └────────┬────────┘    └────────┬─────────┘         │    │
│    │            │                      │                   │    │
│    │   ┌────────▼──────────────────────▼───────────────┐   │    │
│    │   │  LLM Provider Layer                           │   │    │
│    │   │  litellm (Bedrock ConverseStream +            │   │    │
│    │   │   external API fallback)                      │   │    │
│    │   └───────────────────────┬───────────────────────┘   │    │
│    │                           │                           │    │
│    │   ┌───────────────────────▼───────────────────────┐   │    │
│    │   │  State Layer                                  │   │    │
│    │   │  /mnt/workspace/.hermes/                      │   │    │
│    │   │  ├── state.db  (SQLite + FTS5)                │   │    │
│    │   │  ├── memories/ (MEMORY.md, USER.md)           │   │    │
│    │   │  ├── skills/   (learned skills)               │   │    │
│    │   │  └── config.yaml                              │   │    │
│    │   │                                               │   │    │
│    │   │  Workspace Sync (↔ S3 every 5 min)            │   │    │
│    │   └───────────────────────────────────────────────┘   │    │
│    │                                                       │    │
│    └───────────────────────────────────────────────────────┘    │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
                 │                                 │
    ┌────────────▼──────────┐         ┌────────────▼──────────┐
    │  Amazon Bedrock       │         │  S3 User Files        │
    │  ConverseStream API   │         │  {user_ns}/           │
    │  + Guardrails         │         │    .hermes/state.db   │
    └───────────────────────┘         │    .hermes/memories/  │
                                      │    .hermes/skills/    │
    ┌───────────────────────┐         └───────────────────────┘
    │  EventBridge Scheduler│
    │  + Cron Lambda        │         ┌───────────────────────┐
    │  (replaces hermes     │         │  CloudWatch           │
    │   cron daemon)        │         │  Dashboards + Alarms  │
    └───────────────────────┘         └───────────────────────┘
The only interface AgentCore requires from the container:
| Endpoint | Method | Request | Response |
|---|---|---|---|
| `/ping` | GET | (none) | `{"status": "Healthy"}` or `{"status": "HealthyBusy"}` |
| `/invocations` | POST | `{action, userId, actorId, channel, message, images?}` | `{response, metadata?}` |
Actions:
| Action | Purpose | Handler |
|---|---|---|
| `chat` | User message → agent response | `handle_chat()` → warm-up or full agent |
| `warmup` | Pre-warm container (trigger lazy init) | `ensure_agent()` |
| `cron` | Scheduled task execution | `handle_cron()` |
| `status` | Container diagnostics | Return health + uptime + memory |
Health States:
- `Healthy`: idle, ready for requests
- `HealthyBusy`: processing a request (prevents premature idle termination)
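The contract above can be sketched as a small dispatcher that maps `action` to a handler and reports `HealthyBusy` while any request is in flight. This is a minimal illustration, not the project's actual server: the `ContractServer` class and the stub handler bodies are hypothetical, and the HTTP layer on port 8080 is left out.

```python
import threading
import time


class ContractServer:
    """Hypothetical sketch of the /ping and /invocations logic (no HTTP layer)."""

    def __init__(self):
        self._in_flight = 0
        self._lock = threading.Lock()
        self._started = time.monotonic()
        self._handlers = {
            "chat": self._handle_chat,
            "warmup": self._handle_warmup,
            "cron": self._handle_cron,
            "status": self._handle_status,
        }

    def ping(self):
        """Body for GET /ping."""
        with self._lock:
            busy = self._in_flight > 0
        return {"status": "HealthyBusy" if busy else "Healthy"}

    def invoke(self, body):
        """Body for POST /invocations; `body` is the parsed JSON request."""
        handler = self._handlers.get(body.get("action"))
        if handler is None:
            return {"response": "", "metadata": {"error": "unknown action"}}
        with self._lock:
            self._in_flight += 1
        try:
            return handler(body)
        finally:
            # Always drop back to Healthy, even if the handler raised.
            with self._lock:
                self._in_flight -= 1

    def _handle_chat(self, body):
        # Placeholder: a real handler would call the warm-up or full agent.
        return {"response": f"echo: {body.get('message', '')}"}

    def _handle_warmup(self, body):
        return {"response": "warming"}

    def _handle_cron(self, body):
        return {"response": "cron accepted"}

    def _handle_status(self, body):
        uptime = time.monotonic() - self._started
        return {"response": "ok", "metadata": {"uptime_s": uptime, **self.ping()}}
```

Counting in-flight requests rather than storing a single boolean keeps the health state correct if two invocations overlap.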
Python + hermes-agent dependency loading takes 10-30 seconds. During this time, a lightweight agent handles messages:
Timeline:
0s Contract server starts (fast, minimal deps)
0-2s Lightweight agent ready (boto3 + basic tools)
2-30s Messages handled by lightweight agent
10-30s Full hermes-agent loaded (AIAgent class ready)
30s+ Messages handled by full hermes-agent
Lightweight agent capabilities:
- Web search, web fetch
- Memory read (from pre-loaded files)
- Simple Q&A via Bedrock direct call
- "I'm still warming up" transparency for complex tool requests
Transition:
- Contract server tracks `agent_ready: bool`
- Once `AIAgent` is initialized, all subsequent requests go to the full agent
- In-flight lightweight responses are completed before switching
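One way to implement this transition is a background loader that sets a flag once the full agent exists. The sketch below is an assumption about how it could be wired: `AgentSwitch`, `load_full_agent`, and `lightweight_agent` are hypothetical names, and both agents are modelled as plain callables.

```python
import threading


class AgentSwitch:
    """Routes requests to a lightweight agent until the full agent is loaded."""

    def __init__(self, load_full_agent, lightweight_agent):
        self._load_full_agent = load_full_agent  # slow callable (hypothetical)
        self._lightweight = lightweight_agent    # fast callable (hypothetical)
        self._full = None
        self._ready = threading.Event()          # plays the role of agent_ready

    @property
    def agent_ready(self):
        return self._ready.is_set()

    def start_loading(self):
        """Begin loading the full agent in a background thread."""
        thread = threading.Thread(target=self._load, daemon=True)
        thread.start()
        return thread

    def _load(self):
        self._full = self._load_full_agent()
        self._ready.set()  # publish only after the agent is fully constructed

    def handle(self, message):
        # The agent is chosen once per request, so a lightweight response
        # already in flight finishes on the lightweight agent.
        agent = self._full if self.agent_ready else self._lightweight
        return agent(message)
```

Because the flag is set only after the full agent is assigned, a request never observes `agent_ready` as true while the agent is still missing.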
The Router Lambda replaces hermes-agent's in-process channel adapters (gateway/platforms/):
def handler(event, context):
    # 1. Parse webhook (Telegram/Slack/Discord/etc.)
    # 2. Validate signature/token
    # 3. Return 200 immediately (async processing)
    # 4. Resolve user identity (DynamoDB lookup)
    # 5. Get or create AgentCore session
    # 6. InvokeAgentRuntime(payload)
    # 7. Send response back to channel API
    return {"statusCode": 200}  # acknowledgement described in step 3
Why Lambda instead of in-container channels:
- Webhooks need fast response (Telegram: <3s, Slack: <3s)
- Container may be cold (30s startup)
- Per-user isolation means container doesn't have all users' tokens
- Lambda handles fan-out naturally
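As an illustration of the validate-then-acknowledge steps, here is a minimal sketch for the Telegram case. It assumes the webhook was registered with Telegram's `secret_token` option, which Telegram sends back in the `X-Telegram-Bot-Api-Secret-Token` header. The `enqueue` callback is a hypothetical stand-in for whatever async dispatch the router uses, and in practice the secret would come from Secrets Manager rather than a parameter.

```python
import hmac
import json


def validate_telegram_webhook(event, expected_secret):
    """Check the secret token Telegram sends with each webhook call."""
    # Header names are normalised because their casing varies by integration.
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    received = headers.get("x-telegram-bot-api-secret-token", "")
    # Constant-time comparison avoids leaking the secret through timing.
    return hmac.compare_digest(received.encode(), expected_secret.encode())


def handler(event, context, expected_secret="change-me", enqueue=None):
    """Acknowledge quickly; hand the update to `enqueue` for later processing."""
    if not validate_telegram_webhook(event, expected_secret):
        return {"statusCode": 401, "body": "invalid token"}
    update = json.loads(event.get("body") or "{}")
    if enqueue is not None:
        enqueue(update)  # hypothetical async hand-off (queue, async invoke, ...)
    return {"statusCode": 200, "body": "ok"}
```

Other channels use different schemes (Slack, for example, signs the request body), so each webhook route would need its own validator.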
Container filesystem (/mnt/workspace)
↕ symlink
~/.hermes/
↕ periodic sync (5 min)
S3 bucket (s3://{bucket}/{user_namespace}/)
↕ restore on container init
Container filesystem (next session)
Critical writes trigger immediate S3 sync:
- Skill creation/update
- Memory file writes
- Session database after conversation end
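The combination of a periodic sync and immediate syncs for critical writes could look roughly like this. It is a sketch under assumptions: `WorkspaceSync` and the injected `upload` callable are hypothetical, and the real S3 transfer is deliberately left outside the class.

```python
import threading


class WorkspaceSync:
    """Tracks changed paths; flushes them on a timer or immediately if critical."""

    def __init__(self, upload, interval_s=300.0):
        self._upload = upload            # hypothetical callable: upload(list_of_paths)
        self._interval_s = interval_s    # 300 s matches the 5-minute sync interval
        self._dirty = set()
        self._lock = threading.Lock()
        self._stop = threading.Event()

    def mark_dirty(self, path, critical=False):
        with self._lock:
            self._dirty.add(path)
        if critical:
            self.flush()  # skills, memory files, finished sessions

    def flush(self):
        """Upload and clear the pending paths; returns what was uploaded."""
        with self._lock:
            paths, self._dirty = sorted(self._dirty), set()
        if paths:
            self._upload(paths)
        return paths

    def run_forever(self):
        # Event.wait() returns True once stop() is called, which ends the loop.
        while not self._stop.wait(self._interval_s):
            self.flush()

    def stop(self):
        self._stop.set()
```

A production version would also re-queue paths when an upload fails; this sketch drops them for brevity.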
SIGTERM handler:
- AgentCore sends SIGTERM before container termination
- Handler triggers final S3 backup of all state
- Graceful shutdown of SQLite connections
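A shutdown handler along these lines might be sketched as follows. `make_shutdown_handler` and the injected `backup` callable are hypothetical. Committing open transactions before the backup is an added precaution that the list above does not mention.

```python
import signal
import sqlite3


def make_shutdown_handler(backup, connections):
    """Build a SIGTERM handler: commit, run the final backup, close SQLite."""

    def on_sigterm(signum, frame):
        for conn in connections:
            conn.commit()      # assumption: flush pending writes before backup
        try:
            backup()           # hypothetical final S3 backup of all state
        finally:
            for conn in connections:
                conn.close()   # close even if the backup raised

    return on_sigterm


def install(handler):
    """Register the handler; must be called from the main thread."""
    signal.signal(signal.SIGTERM, handler)
```

Typical use would be `install(make_shutdown_handler(final_backup, [db_conn]))` during container startup, where `final_backup` and `db_conn` are whatever the contract server already holds.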
hermes-agent supports 200+ models. On AgentCore, model requests are routed as follows:
| Model Request | Route | Protocol |
|---|---|---|
| `claude-*` (Anthropic) | Bedrock ConverseStream | IAM auth, no API key |
| `gpt-*` (OpenAI) | NAT → OpenAI API | API key from Secrets Manager |
| `gemini-*` (Google) | NAT → Google API | API key from Secrets Manager |
| Local / custom | NAT → configured endpoint | API key from Secrets Manager |
Implementation: litellm as the unified proxy layer. hermes-agent already uses OpenAI-compatible format; litellm translates to Bedrock ConverseStream natively.
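The routing table can be expressed as a prefix lookup. The sketch below only mirrors the table; `Route`, `ROUTES`, and `route_for` are hypothetical names, and the actual litellm call and Secrets Manager lookup are not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    target: str  # where the request is sent
    auth: str    # how the request is authenticated


# Prefix rules taken from the routing table; order matters only if prefixes overlap.
ROUTES = (
    ("claude-", Route("Bedrock ConverseStream", "IAM")),
    ("gpt-", Route("OpenAI API via NAT", "Secrets Manager API key")),
    ("gemini-", Route("Google API via NAT", "Secrets Manager API key")),
)
DEFAULT_ROUTE = Route("configured endpoint via NAT", "Secrets Manager API key")


def route_for(model):
    """Return the route for a model name, falling back to the custom endpoint."""
    name = model.lower()
    for prefix, route in ROUTES:
        if name.startswith(prefix):
            return route
    return DEFAULT_ROUTE
```

For example, `route_for("claude-sonnet")` resolves to the Bedrock route with IAM auth, while an unrecognised name falls through to the configured endpoint.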
import json
import boto3

sts = boto3.client("sts")
# execution_role_arn, bucket and user_namespace come from the container's configuration
# Per-user STS scoped credentials
sts.assume_role(
    RoleArn=execution_role_arn,
    RoleSessionName=f"hermes-{user_namespace[:32]}",
    Policy=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:*"],
            "Resource": [
                f"arn:aws:s3:::{bucket}/{user_namespace}/*"
            ]
        }]
    }),
    DurationSeconds=3600
)
# Refresh every 45 minutes
Each container can only access its own user's S3 namespace. Combined with Firecracker VM isolation, this provides defense-in-depth.
1. User sends "research quantum computing" on Telegram
2. Telegram → webhook → API Gateway → Router Lambda
3. Lambda validates bot token signature
4. Lambda queries DynamoDB: CHANNEL#telegram:12345 → USER#user_abc
5. Lambda calls InvokeAgentRuntime(sessionId="user_abc:tg:uuid-xxx", payload={...})
6. AgentCore routes to existing microVM or spins up new one
7. Contract server receives POST /invocations {action:"chat", message:"research quantum computing"}
8. Contract server sets health → HealthyBusy
9. If agent not ready → lightweight agent responds with basic search
   If agent ready → AIAgent.run_conversation() executes:
   a. Builds system prompt (SOUL.md + context)
   b. Calls Bedrock Claude via litellm
   c. Model returns tool_calls: [web_search, web_fetch]
   d. Tools execute, results appended
   e. Model synthesizes final response
10. Response returned to contract server
11. Contract server sets health → Healthy
12. Response returned to Lambda via InvokeAgentRuntime response
13. Lambda calls Telegram sendMessage API
14. User sees response in Telegram
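Steps 4, 5 and 7 rely on three small pieces of data: the DynamoDB channel key, the session ID, and the invocation payload. The helpers below sketch how they might be built. The function names, the `CHANNEL_CODES` map, and the `actorId` format are assumptions. Only the `CHANNEL#telegram:12345` key shape, the `user_abc:tg:uuid` session shape, and the payload field names come from this document.

```python
import uuid

# Only the "tg" abbreviation appears in the walkthrough; others are not assumed.
CHANNEL_CODES = {"telegram": "tg"}


def channel_key(channel, channel_user_id):
    """DynamoDB key used to look up the internal user for a channel identity."""
    return f"CHANNEL#{channel}:{channel_user_id}"


def new_session_id(user_id, channel):
    """Session ID in the user:channel:uuid shape shown in step 5."""
    code = CHANNEL_CODES.get(channel, channel)
    return f"{user_id}:{code}:{uuid.uuid4()}"


def build_payload(user_id, channel, channel_user_id, message):
    """Request body for POST /invocations with action=chat."""
    return {
        "action": "chat",
        "userId": user_id,
        "actorId": f"{channel}:{channel_user_id}",  # assumed format
        "channel": channel,
        "message": message,
    }
```

Including a full UUID keeps the session ID comfortably long. AgentCore enforces a minimum length on `runtimeSessionId`, so very short IDs are worth checking against the current API reference.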
| Service | Purpose | Estimated Cost (10 users/mo) |
|---|---|---|
| Bedrock AgentCore | Per-user Firecracker microVMs | $50-150 |
| Bedrock (Claude) | LLM inference via ConverseStream | $100-500 |
| Bedrock Guardrails | Content filtering, PII redaction | $5-20 |
| ECR | Container image registry (ARM64) | $1-3 |
| VPC | 2-AZ, private/public subnets, NAT | $30-45 |
| API Gateway | HTTP API for channel webhooks | $3-5 |
| Lambda | Router + Cron executor + Token monitor | $5-10 |
| DynamoDB | Identity table, session state | $5-10 |
| S3 | User workspace persistence | $1-5 |
| Secrets Manager | API keys, bot tokens (7+ secrets) | $2-5 |
| KMS | Customer-managed encryption key | $1 |
| EventBridge Scheduler | Cron job scheduling | $1-3 |
| CloudWatch | Logs, metrics, dashboards, alarms | $5-10 |
| SNS | Alarm notifications | $0-1 |
| Cognito | Web UI authentication (optional) | $0-5 |
| Total | | ~$210-770/month |
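The total can be cross-checked by summing the low and high ends of each row. The figures below are copied from the table (KMS is a flat $1), and the dictionary is only an illustration.

```python
# (low, high) monthly estimates in USD, copied from the cost table
COSTS = {
    "Bedrock AgentCore": (50, 150),
    "Bedrock (Claude)": (100, 500),
    "Bedrock Guardrails": (5, 20),
    "ECR": (1, 3),
    "VPC": (30, 45),
    "API Gateway": (3, 5),
    "Lambda": (5, 10),
    "DynamoDB": (5, 10),
    "S3": (1, 5),
    "Secrets Manager": (2, 5),
    "KMS": (1, 1),
    "EventBridge Scheduler": (1, 3),
    "CloudWatch": (5, 10),
    "SNS": (0, 1),
    "Cognito": (0, 5),
}

low = sum(lo for lo, _ in COSTS.values())
high = sum(hi for _, hi in COSTS.values())
print(low, high)  # 209 773
```

The exact sums, $209 and $773, round to the ~$210-770 range quoted in the table.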
| Aspect | Standalone (current) | AgentCore (target) |
|---|---|---|
| Isolation | Single process, all users | Per-user Firecracker microVM |
| Scaling | Manual (add servers) | Automatic (containers spin up/down) |
| State | Local SQLite + files | S3-backed SQLite + workspace sync |
| Channels | In-process adapters | Router Lambda + API Gateway |
| LLM | 200+ models via API keys | Bedrock primary + external via NAT |
| Cron | In-process daemon | EventBridge Scheduler + Lambda |
| Security | Config-based allowlist | IAM + STS + VPC + KMS |
| Cost model | Fixed (server running 24/7) | Pay-per-use (idle → terminated) |
| Cold start | None (always running) | 10-30s (mitigated by warm-up agent) |
| Browser | Local Playwright | AgentCore Browser API (managed Chromium) |
| Deployment | Docker / systemd / SSH | CDK + AgentCore Toolkit |
The sample uses a Node.js proxy (agentcore-proxy.js) to translate OpenAI-compatible → Bedrock ConverseStream. For hermes-agent (Python), litellm provides the same translation natively and supports fallback to non-Bedrock models. This avoids maintaining a separate proxy process.
hermes-agent's session storage uses SQLite with FTS5 full-text search. Replacing this with DynamoDB would require rewriting the entire session/search layer. Instead, we keep SQLite locally and sync to S3 — same pattern as the sample's workspace-sync.js.
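Copying a live SQLite file while it is being written can produce a corrupt copy, so one option is to take a consistent snapshot first and upload that. The sketch below uses the standard library's `Connection.backup`. The `snapshot_sqlite` helper and the file paths are hypothetical, and the S3 upload itself is omitted.

```python
import os
import sqlite3
import tempfile


def snapshot_sqlite(db_path, snapshot_path):
    """Write a consistent copy of db_path to snapshot_path and return its path."""
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(snapshot_path)
    try:
        src.backup(dst)  # SQLite online backup; safe while other writers exist
    finally:
        dst.close()
        src.close()
    return snapshot_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        db = os.path.join(workdir, "state.db")
        conn = sqlite3.connect(db)
        conn.execute("CREATE TABLE messages (body TEXT)")
        conn.execute("INSERT INTO messages VALUES ('hello')")
        conn.commit()
        conn.close()
        snap = snapshot_sqlite(db, os.path.join(workdir, "state.snapshot.db"))
        print(os.path.getsize(snap) > 0)
```

The snapshot file, not the live database, would then be the object handed to the workspace sync.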
hermes-agent's platform adapters (Telegram, Discord, Slack, etc.) use long-running connections (polling, WebSocket). These don't fit the per-user container model. Lambda webhook handlers are stateless and respond within Telegram's 3-second requirement.
AgentCore runs on Graviton (ARM64). This is ~20% cheaper than x86 and the sample mandates ARM64 builds. hermes-agent's Python code is architecture-agnostic; only native dependencies (SQLite, Playwright) need ARM64 builds.
Python cold start + dependency loading + S3 restore = 10-30 seconds. Without a warm-up agent, the first message in a new session would time out. The lightweight agent provides immediate responsiveness while the full agent loads in the background.