docs/design/MEMORY.md (97 additions, 0 deletions)
@@ -399,6 +399,103 @@ Add user preference tracking and enable episodic reflection for cross-task patte
Only if Tiers 1–3 show value but semantic search proves insufficient for specific query patterns (e.g. "which files are always modified together?" or "what's the dependency impact of changing module X?"). At this point, consider Neptune Serverless or similar for relational queries. **Only build this if there is evidence that semantic retrieval fails on identifiable query patterns.**
## Memory security analysis

OWASP classifies memory and context poisoning as **ASI06** in the [2026 Top 10 for Agentic Applications](https://owasp.org/www-project-top-10-for-large-language-model-applications/), recognizing it as a first-class risk distinct from standard prompt injection. Unlike single-session prompt injection, memory poisoning creates **persistent corruption** that influences every subsequent interaction: a single poisoned entry can affect all future tasks on a repository.
### Threat model

The memory system faces two categories of corruption:

**Intentional corruption (adversarial)**

| Vector | Description | Severity |
|---|---|---|
| **Query-based memory injection (MINJA)** | Attacker crafts task descriptions or issue content that, when processed by the agent, is stored as legitimate repository knowledge. Subsequent tasks retrieve and act on the poisoned memory. Dong et al. (2025) report 95%+ injection success rates against undefended systems. | Critical |
| **Indirect injection via tool outputs** | Poisoned data from external sources (GitHub issues, PR comments, linked documentation) flows through context hydration into the agent's context, and from there into memory via the post-task extraction prompt. The agent trusts its own tool outputs as ground truth. | Critical |
| **Experience grafting** | Adversary manipulates the agent's experiential memory (task episodes) to induce behavioral drift, e.g., injecting a fake episode that claims "tests always fail on this repo, skip them" to suppress quality checks. | High |
| **Poisoned RAG retrieval** | Adversarial content engineered to rank highly for specific semantic queries, ensuring it is retrieved and incorporated into the agent's context during memory load. AgentPoison achieves 80%+ attack success across multiple agent domains. | High |
| **Review comment injection** | Malicious PR review comments containing embedded instructions that are extracted as persistent rules by the review feedback pipeline. See [SECURITY.md](./SECURITY.md) for existing mitigations. | High |

**Emergent corruption (non-adversarial)**

| Pattern | Description | Severity |
|---|---|---|
| **Hallucination crystallization** | Agent hallucinates a fact during a task and writes it as a repository learning. Future tasks retrieve the false memory and reinforce it through repeated use, converting an ephemeral error into a durable false belief. | High |
| **Error compounding feedback loops** | When an agent makes an error, the erroneous output enters the task episode. If similar tasks retrieve that episode, they may repeat the error, write another bad episode, and amplify the mistake across sessions. | High |
| **Stale context accumulation** | Without temporal decay, memories from 6 months ago carry the same retrieval weight as memories from yesterday. The agent operates on increasingly outdated context, referencing approaches, conventions, or patterns the team has since abandoned. | Medium |
| **Contradictory memory accumulation** | Over many tasks, the memory store accumulates contradictory records (see the Memory consolidation section above). Without effective resolution, the agent receives conflicting guidance that degrades decision quality. | Medium |
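
To make the stale context accumulation row concrete, the following is a minimal sketch of temporal decay applied at retrieval time. It is not the platform's implementation: the 30-day half-life, the `(text, similarity, written_at)` record shape, and the function names are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

# Assumption: a 30-day half-life. The right value depends on how quickly
# a repository's conventions change and would need tuning.
HALF_LIFE_DAYS = 30.0


def decay_weight(written_at: datetime, now: datetime,
                 half_life_days: float = HALF_LIFE_DAYS) -> float:
    """Exponential decay: the weight halves every `half_life_days`."""
    age_days = max((now - written_at).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)


def rank_memories(records: List[Tuple[str, float, datetime]],
                  now: datetime) -> List[str]:
    """Order records by semantic similarity scaled by freshness.

    Each record is a hypothetical (text, similarity, written_at) tuple.
    """
    scored = [(sim * decay_weight(ts, now), text) for text, sim, ts in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    records = [
        ("old convention", 0.9, now - timedelta(days=180)),
        ("current convention", 0.6, now - timedelta(days=1)),
    ]
    # The fresher record ranks first despite its lower raw similarity.
    print(rank_memories(records, now))
```

With decay in place, a six-month-old record needs a much higher similarity score to outrank a recent one, which is the behavior the table says is missing today.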
### Current gaps

Analysis of the current implementation identified 9 specific memory security gaps:

| # | Gap | Affected files | Severity |
|---|---|---|---|
| 1 | No memory content validation: retrieved records are injected into agent context without sanitization | `memory.ts:loadMemoryContext()` | Critical |
| 2 | No source provenance tracking: cannot distinguish agent-written memory from externally-influenced content | `memory.ts`, `agent/memory.py` | Critical |

No single layer is sufficient. Research demonstrates that even sophisticated input filtering can be bypassed, so defense-in-depth is mandatory.
### Existing mitigations

The current architecture already provides partial coverage for some layers:

- **Layer 3 (partial):** Per-repo namespace isolation via `/{actorId}/knowledge/` and `/{actorId}/episodes/{sessionId}/` prevents cross-repo contamination within the same memory resource. The token budget (2,000 tokens) limits blast radius. `schema_version` metadata enables migration tracking.
- **Fail-open design:** Memory failures never block task execution, which limits the impact of denial-of-service attacks against the memory system.
- **Repo format validation:** `_validate_repo()` prevents namespace confusion from malformed repo identifiers.
- **Model invocation logging:** Bedrock logs provide an audit trail for what the model receives and generates, enabling post-hoc investigation of memory-influenced behavior.
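
The first three mitigations can be pictured with a small sketch. It uses the namespace templates and the 2,000-token budget from the list above; everything else is an assumption made for illustration, including the `owner/name` repo format, the whitespace-based token estimate, and every function name. In particular, `validate_repo_id` is a hypothetical stand-in and does not reproduce the real `_validate_repo()`.

```python
import re
from typing import Callable, List

TOKEN_BUDGET = 2000  # budget stated in the design doc

# Assumption: repo identifiers look like "owner/name".
_OWNER = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]*")
_NAME = re.compile(r"[A-Za-z0-9_.-]+")


def validate_repo_id(repo: str) -> bool:
    """Hypothetical validator: reject ids that could escape a namespace."""
    owner, sep, name = repo.partition("/")
    if not sep or name in {".", ".."}:
        return False
    return bool(_OWNER.fullmatch(owner)) and bool(_NAME.fullmatch(name))


def knowledge_namespace(actor_id: str) -> str:
    return f"/{actor_id}/knowledge/"


def episode_namespace(actor_id: str, session_id: str) -> str:
    return f"/{actor_id}/episodes/{session_id}/"


def approx_tokens(text: str) -> int:
    """Crude estimate (whitespace split); a real tokenizer would differ."""
    return len(text.split())


def apply_token_budget(records: List[str], budget: int = TOKEN_BUDGET) -> List[str]:
    """Keep records in order until the next one would exceed the budget."""
    kept, used = [], 0
    for record in records:
        cost = approx_tokens(record)
        if used + cost > budget:
            break
        kept.append(record)
        used += cost
    return kept


def load_memory_fail_open(fetch: Callable[[str], List[str]], actor_id: str) -> List[str]:
    """Fail open: any memory error yields empty context, never a blocked task."""
    try:
        return apply_token_budget(fetch(knowledge_namespace(actor_id)))
    except Exception:
        return []
```

The budget caps how much a single poisoned record set can occupy in the prompt, but it does not decide which records are trustworthy; that is why the list describes this coverage as partial.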
### References

- OWASP, ASI06: Memory & Context Poisoning (2026 Top 10 for Agentic Applications)
- Dong et al. (2025), "MINJA: Memory Injection Attack on LLM Agents" (95%+ injection success rates)
- Sunil et al. (2026), "Memory Poisoning Attack and Defense on Memory Based LLM-Agents" (trust scoring defenses)
- Schneider, C. (2026), "Memory Poisoning in AI Agents: Exploits That Wait" (six-layer defense architecture)
- MemTrust (2026), "A Zero-Trust Architecture for Unified AI Memory System" (TEE-based memory protection)
- Zuccolotto et al. (2026), "Memory Poisoning and Secure Multi-Agent Systems" (provenance and integrity measures)

---
## Requirements

The platform has the following requirements for memory:
docs/design/SECURITY.md (31 additions, 0 deletions)
@@ -100,6 +100,33 @@ The `functionArn` in `CustomStepConfig` should be validated at CDK synth time to
## Memory-specific threats
### OWASP ASI06: Memory and context poisoning

OWASP classifies memory and context poisoning as **ASI06** in the 2026 Top 10 for Agentic Applications. This classification recognizes that persistent memory attacks are fundamentally different from single-session prompt injection (LLM01): poisoned memory entries influence every subsequent interaction, creating "sleeper agent" scenarios where compromise is dormant until activated by triggering conditions. ASI06 maps to LLM01 (prompt injection), LLM04 (data poisoning), and LLM08 (excessive agency) but with new characteristics unique to agents with persistent memory.

The platform's memory system (see [MEMORY.md](./MEMORY.md)) faces threats from both intentional attacks and emergent corruption. The full threat taxonomy and gap analysis are documented in the [Memory security analysis](./MEMORY.md#memory-security-analysis) section of MEMORY.md. The implementation plan is in [ROADMAP.md Iteration 3e](../guides/ROADMAP.md).
### Attack vectors beyond PR review comments

In addition to the PR review comment injection vector detailed below, the memory system is exposed to:

- **Query-based memory injection (MINJA):** Attacker-crafted task descriptions that embed poisoned content the agent stores as legitimate memory. Research demonstrates 95%+ injection success rates against undefended systems via query-only interactions requiring no direct memory access.
- **Indirect injection via GitHub issues:** Issue bodies and comments are fetched during context hydration (`context-hydration.ts`) and injected into the agent's context. An adversary can craft issue content containing memory-poisoning payloads that the agent stores as "learned" repository knowledge via the post-task extraction prompt. The system currently does not differentiate between trusted (system) and untrusted (user-submitted) content in the hydration pipeline.
- **Experience grafting:** Manipulation of the agent's episodic memory to induce behavioral drift (e.g., injecting a fake episode claiming certain tests always fail, causing the agent to skip them).
- **Poisoned RAG retrieval:** Adversarial content engineered to rank highly for specific semantic queries during `RetrieveMemoryRecordsCommand`, ensuring it is retrieved and incorporated into the agent's context.
- **Emergent self-corruption:** The agent poisons itself through hallucination crystallization (false memories from hallucinated facts), error compounding feedback loops (bad episodes retrieved by similar tasks), and stale context accumulation (outdated memories weighted equally with current ones). These lack an external attacker signature and are harder to detect.
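
The missing trust differentiation in the hydration pipeline could look roughly like the sketch below. It is an illustration under stated assumptions, not the behavior of `context-hydration.ts`: the `Segment` type, the `Trust` levels, and the `<untrusted>` wrapper format are all hypothetical. Delimiters alone do not stop injection; they only make the trust boundary explicit so later layers can act on it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Trust(Enum):
    SYSTEM = "system"        # platform-authored content
    UNTRUSTED = "untrusted"  # user-submitted content (issues, comments)


@dataclass(frozen=True)
class Segment:
    source: str  # hypothetical label, e.g. "github_issue"
    trust: Trust
    text: str


def render_context(segments: List[Segment]) -> str:
    """Wrap untrusted text in explicit delimiters before it reaches the model."""
    parts = []
    for seg in segments:
        if seg.trust is Trust.UNTRUSTED:
            # Escape "<" so the payload cannot close the wrapper early.
            body = seg.text.replace("<", "&lt;")
            parts.append(f'<untrusted source="{seg.source}">\n{body}\n</untrusted>')
        else:
            parts.append(seg.text)
    return "\n\n".join(parts)


def untrusted_sources(segments: List[Segment]) -> List[str]:
    """Sources a later memory write should record as external influence."""
    return sorted({s.source for s in segments if s.trust is Trust.UNTRUSTED})
```

Carrying the list from `untrusted_sources` forward to the post-task extraction step would let a memory write record that the task context included attacker-controllable input.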
### Required mitigations (all vectors)

The defense architecture requires six layers (see [MEMORY.md](./MEMORY.md#defense-architecture) for the full model):

1. **Input moderation with trust scoring:** Content sanitization and injection pattern detection before memory write. Composite trust scores (not binary allow/block) based on source provenance, content analysis, and behavioral consistency.
2. **Memory sanitization with provenance tagging:** Every memory entry carries source metadata (`agent_episode`, `orchestrator_fallback`, `github_issue`, `review_feedback`), content hash (SHA-256), and schema version.
4. **Trust-scored retrieval:** At retrieval time, memories are weighted by temporal freshness, source reliability, and pattern consistency. Entries below a trust threshold are excluded from the context budget.
5. **Write-ahead validation (guardian pattern):** A separate model evaluates proposed memory updates before commit.
6. **Continuous monitoring and circuit breakers:** Anomaly detection on memory write patterns, behavioral drift detection, and automatic halt when anomalies are detected.
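
Provenance tagging and trust-scored retrieval from the list above can be sketched together. The four source labels and the SHA-256 content hash come from the text; the reliability weights, the 90-day half-life, the 0.25 threshold, and all type and function names are assumptions for illustration. Pattern consistency is left out to keep the sketch short.

```python
import hashlib
from dataclasses import dataclass
from typing import List

SCHEMA_VERSION = 1  # assumption: illustrative version number

# Assumption: hypothetical reliability weights per source label.
SOURCE_RELIABILITY = {
    "agent_episode": 0.9,
    "orchestrator_fallback": 0.8,
    "review_feedback": 0.5,
    "github_issue": 0.3,
}
TRUST_THRESHOLD = 0.25  # assumption: entries scoring below this are dropped


@dataclass(frozen=True)
class MemoryEntry:
    content: str
    source: str
    content_hash: str
    schema_version: int
    age_days: float


def content_digest(content: str) -> str:
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def tag_entry(content: str, source: str, age_days: float = 0.0) -> MemoryEntry:
    """Attach provenance metadata at write time."""
    if source not in SOURCE_RELIABILITY:
        raise ValueError(f"unknown source: {source}")
    return MemoryEntry(content, source, content_digest(content),
                       SCHEMA_VERSION, age_days)


def is_intact(entry: MemoryEntry) -> bool:
    """True when the content still matches the hash recorded at write time."""
    return content_digest(entry.content) == entry.content_hash


def trust_score(entry: MemoryEntry, half_life_days: float = 90.0) -> float:
    """Source reliability scaled by freshness; tampered entries score zero."""
    if not is_intact(entry):
        return 0.0
    freshness = 0.5 ** (entry.age_days / half_life_days)
    return SOURCE_RELIABILITY[entry.source] * freshness


def select_for_context(entries: List[MemoryEntry],
                       threshold: float = TRUST_THRESHOLD) -> List[MemoryEntry]:
    """Drop entries below the threshold and order the rest by score."""
    scored = [(trust_score(e), e) for e in entries]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in kept]
```

A hash stored next to the content only detects modification if an attacker cannot rewrite both fields, so a real design would likely keep or sign the hash separately.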
### Prompt injection via PR review comments

The review feedback memory loop (see [MEMORY.md](./MEMORY.md)) is the most novel memory component and, from a security perspective, the most dangerous. PR review comments are **attacker-controlled input** that gets processed by an LLM and stored as persistent memory influencing future agent behavior.
@@ -169,6 +196,10 @@ AgentCore Memory has **no native backup mechanism**. This is a significant gap f
- **Single GitHub OAuth token:** one token may be shared for all users and repos the platform can access. Any authenticated user can trigger agent work against any repo that token can access. There is no per-user repo scoping.
- **Guardrails are input-only:** the `PROMPT_ATTACK` filter screens task descriptions at submission. No guardrails are applied to model output during agent execution or to review feedback entering the memory system.
- **No memory content validation:** retrieved memory records are injected into the agent's context without sanitization, injection pattern scanning, or trust scoring. This is the most critical memory security gap (OWASP ASI06). See [MEMORY.md](./MEMORY.md#memory-security-analysis) for the full gap analysis and [ROADMAP.md Iteration 3e](../guides/ROADMAP.md) for the remediation plan.
- **No memory provenance or integrity checking:** memory entries carry no source attribution, content hashing, or trust metadata. The system cannot distinguish agent-generated memory from externally-influenced content.
- **GitHub issue content as untrusted input:** issue bodies and comments (attacker-controlled) are injected into the agent's context during hydration without trust differentiation.
- **No memory rollback or quarantine:** the 365-day AgentCore Memory expiration is the only cleanup mechanism. There is no snapshot, rollback, or quarantine capability for suspected poisoned entries.
- **No MFA:** Cognito MFA is disabled (CLI-based auth flow). Should be enabled for production deployments.
- **No customer-managed KMS:** all encryption at rest uses AWS-managed keys. Customer-managed KMS can be added if required by compliance policy.
- **CORS is fully open:** `ALL_ORIGINS` is configured for CLI consumption. Restrict origins when exposing browser clients.