Merge pull request #5 from aws-samples/feature/memory-security-roadmap

krokoko · web-flow · commit af6dafea268d · 2026-04-02T15:06:56.000-05:00
feat: add Iteration 3e for memory security and integrity (OWASP ASI06)
diff --git a/docs/design/MEMORY.md b/docs/design/MEMORY.md
@@ -399,6 +399,103 @@ Add user preference tracking and enable episodic reflection for cross-task patte
 
 Only if Tiers 1–3 show value but semantic search proves insufficient for specific query patterns (e.g. "which files are always modified together?" or "what's the dependency impact of changing module X?"). At this point, consider Neptune Serverless or similar for relational queries. **Only build this if there is evidence that semantic retrieval fails on identifiable query patterns.**
 
+## Memory security analysis
+
+OWASP classifies memory and context poisoning as **ASI06** in the [2026 Top 10 for Agentic Applications](https://owasp.org/www-project-top-10-for-large-language-model-applications/), recognizing it as a first-class risk distinct from standard prompt injection. Unlike single-session prompt injection, memory poisoning creates **persistent corruption** that influences every subsequent interaction — a single poisoned entry can affect all future tasks on a repository.
+
+### Threat model
+
+The memory system faces two categories of corruption:
+
+**Intentional corruption (adversarial)**
+
+| Vector | Description | Severity |
+|---|---|---|
+| **Query-based memory injection (MINJA)** | Attacker crafts task descriptions or issue content that, when processed by the agent, is stored as legitimate repository knowledge. Subsequent tasks retrieve and act on the poisoned memory. Dong et al. (2025) report 95%+ injection success rates against undefended systems. | Critical |
+| **Indirect injection via tool outputs** | Poisoned data from external sources (GitHub issues, PR comments, linked documentation) flows through context hydration into the agent's context, and from there into memory via the post-task extraction prompt. The agent trusts its own tool outputs as ground truth. | Critical |
+| **Experience grafting** | Adversary manipulates the agent's experiential memory (task episodes) to induce behavioral drift — e.g., injecting a fake episode that claims "tests always fail on this repo, skip them" to suppress quality checks. | High |
+| **Poisoned RAG retrieval** | Adversarial content engineered to rank highly for specific semantic queries, ensuring it is retrieved and incorporated into the agent's context during memory load. AgentPoison achieves 80%+ attack success across multiple agent domains. | High |
+| **Review comment injection** | Malicious PR review comments containing embedded instructions that get extracted as persistent rules by the review feedback pipeline. See [SECURITY.md](./SECURITY.md) for existing mitigations. | High |
+
+**Emergent corruption (non-adversarial)**
+
+| Pattern | Description | Severity |
+|---|---|---|
+| **Hallucination crystallization** | Agent hallucinates a fact during a task and writes it as a repository learning. Future tasks retrieve the false memory and reinforce it through repeated use, converting an ephemeral error into a durable false belief. | High |
+| **Error compounding feedback loops** | When an agent makes an error, the erroneous output enters the task episode. If similar tasks retrieve that episode, they may repeat the error, write another bad episode, and amplify the mistake across sessions. | High |
+| **Stale context accumulation** | Without temporal decay, memories from 6 months ago carry the same retrieval weight as memories from yesterday. The agent operates on increasingly outdated context — referencing approaches, conventions, or patterns the team has since abandoned. | Medium |
+| **Contradictory memory accumulation** | Over many tasks, the memory store accumulates contradictory records (see Memory consolidation section above). Without effective resolution, the agent receives conflicting guidance that degrades decision quality. | Medium |
+
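The stale-context row above describes retrieval without temporal decay. A minimal sketch of one way to add it, using an exponential half-life, follows. The names `decay_weight` and `rank_by_freshness` and the 30-day half-life are illustrative assumptions, not part of the platform's code.

```python
from datetime import datetime, timedelta, timezone

HALF_LIFE_DAYS = 30.0  # assumed tuning value, not taken from the platform


def decay_weight(created_at, now=None, half_life_days=HALF_LIFE_DAYS):
    """Return a weight in (0, 1] that halves every `half_life_days`."""
    now = now or datetime.now(timezone.utc)
    # Dividing two timedeltas yields a float number of days.
    age_days = max((now - created_at) / timedelta(days=1), 0.0)
    return 0.5 ** (age_days / half_life_days)


def rank_by_freshness(records, now=None):
    """Sort (text, similarity, created_at) tuples by similarity * decay."""
    return sorted(
        records,
        key=lambda r: r[1] * decay_weight(r[2], now),
        reverse=True,
    )
```

With these assumed values, a six-month-old memory needs a much higher semantic similarity than a day-old one to rank first. The half-life would need tuning per repository.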
+### Current gaps
+
+Analysis of the current implementation identified 9 specific memory security gaps:
+
+| # | Gap | Affected files | Severity |
+|---|---|---|---|
+| 1 | No memory content validation — retrieved records are injected into agent context without sanitization | `memory.ts:loadMemoryContext()` | Critical |
+| 2 | No source provenance tracking — cannot distinguish agent-written memory from externally-influenced content | `memory.ts`, `agent/memory.py` | Critical |
+| 3 | GitHub issue content (attacker-controlled) injected without trust differentiation | `context-hydration.ts` | Critical |
+| 4 | No trust scoring at retrieval — all memories treated equally regardless of age, source, or consistency | `memory.ts:loadMemoryContext()` | High |
+| 5 | No memory integrity checking — no hashing or signatures to detect modification | `memory.ts`, `agent/memory.py` | High |
+| 6 | No anomaly detection on memory write/retrieval patterns | (no implementation) | High |
+| 7 | No memory rollback — 365-day expiration is the only cleanup mechanism | (no implementation) | High |
+| 8 | No write-ahead validation (guardian pattern) for memory commits | (no implementation) | Medium |
+| 9 | No circuit breaker for memory-influenced behavioral anomalies | `orchestrator.ts` | Medium |
+
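Gaps 2 and 5 (no provenance, no integrity checking) can be illustrated together. The sketch below attaches a source label and a SHA-256 content hash to each record at write time and re-checks the hash at read time. `MemoryRecord`, `make_record`, and `is_intact` are hypothetical names; the source labels are the ones listed in SECURITY.md.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone

# Provenance labels as named in SECURITY.md; the set itself is an assumption.
ALLOWED_SOURCES = {
    "agent_episode",
    "orchestrator_fallback",
    "github_issue",
    "review_feedback",
}


@dataclass(frozen=True)
class MemoryRecord:
    content: str
    source: str
    created_at: str
    content_sha256: str
    schema_version: int = 1


def content_hash(content: str) -> str:
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def make_record(content: str, source: str) -> MemoryRecord:
    """Build a record with provenance and a hash of its content."""
    if source not in ALLOWED_SOURCES:
        raise ValueError(f"unknown memory source: {source!r}")
    return MemoryRecord(
        content=content,
        source=source,
        created_at=datetime.now(timezone.utc).isoformat(),
        content_sha256=content_hash(content),
    )


def is_intact(record: MemoryRecord) -> bool:
    """True if the stored hash still matches the stored content."""
    return content_hash(record.content) == record.content_sha256
```

A bare hash only detects modification by a party that cannot also rewrite the hash. Guarding against an attacker with write access to the store would need a keyed construction such as an HMAC or a signature.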
+### Defense architecture
+
+The target defense architecture follows a six-layer model (see [ROADMAP.md Iteration 3e](../guides/ROADMAP.md) for the implementation plan):
+
+```
+┌─────────────────────────────────────────────────────────┐
+│  Layer 1: Input Moderation + Trust Scoring               │
+│  Content sanitization, injection pattern detection,      │
+│  source classification (trusted/untrusted)               │
+├─────────────────────────────────────────────────────────┤
+│  Layer 2: Memory Sanitization + Provenance Tagging       │
+│  Source metadata on every write, content hashing,        │
+│  schema versioning                                       │
+├─────────────────────────────────────────────────────────┤
+│  Layer 3: Storage Isolation + Access Controls            │
+│  Per-repo namespace isolation, expiration limits,        │
+│  size caps per memory store                              │
+├─────────────────────────────────────────────────────────┤
+│  Layer 4: Trust-Scored Retrieval                         │
+│  Temporal decay, source reliability weighting,           │
+│  pattern consistency checking, threshold filtering       │
+├─────────────────────────────────────────────────────────┤
+│  Layer 5: Write-Ahead Validation (Guardian Pattern)      │
+│  Separate model evaluates proposed memory updates        │
+│  before commit                                           │
+├─────────────────────────────────────────────────────────┤
+│  Layer 6: Continuous Monitoring + Circuit Breakers       │
+│  Anomaly detection, behavioral drift detection,          │
+│  automatic halt on suspicious patterns                   │
+└─────────────────────────────────────────────────────────┘
+```
+
+No single layer is sufficient. Research demonstrates that even sophisticated input filtering can be bypassed — defense-in-depth is mandatory.
+
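As an illustration of the Layer 6 "automatic halt" idea, the sketch below trips a breaker once too many suspicious memory events land inside a sliding time window. The class name, thresholds, and the notion of a "suspicious event" are assumptions made for the example, not the planned implementation.

```python
from collections import deque


class MemoryCircuitBreaker:
    """Trips when too many suspicious memory events occur in a time window."""

    def __init__(self, max_events: int = 3, window_seconds: float = 600.0):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._events = deque()
        self.open = False

    def record_suspicious(self, timestamp: float) -> None:
        self._events.append(timestamp)
        # Drop events that have fallen out of the sliding window.
        while self._events and timestamp - self._events[0] > self.window_seconds:
            self._events.popleft()
        if len(self._events) >= self.max_events:
            self.open = True  # stays open until explicitly reset

    def allow_memory_writes(self) -> bool:
        return not self.open

    def reset(self) -> None:
        self._events.clear()
        self.open = False
```

The breaker here only gates memory writes. Whether it should also halt task execution is a design decision, given the fail-open behavior described for memory failures.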
+### Existing mitigations
+
+The current architecture already provides partial coverage for some layers:
+
+- **Layer 2 (partial):** `schema_version` metadata enables migration tracking.
+- **Layer 3 (partial):** Per-repo namespace isolation via `/{actorId}/knowledge/` and `/{actorId}/episodes/{sessionId}/` prevents cross-repo contamination within the same memory resource. The token budget (2,000 tokens) limits blast radius.
+- **Fail-open design:** Memory failures never block task execution — this limits the impact of denial-of-service attacks against the memory system.
+- **Repo format validation:** `_validate_repo()` prevents namespace confusion from malformed repo identifiers.
+- **Model invocation logging:** Bedrock logs provide audit trail for what the model receives and generates, enabling post-hoc investigation of memory-influenced behavior.
+
+### References
+
+- OWASP ASI06 — Memory & Context Poisoning (2026 Top 10 for Agentic Applications)
+- Dong et al. (2025), "MINJA: Memory Injection Attack on LLM Agents" — 95%+ injection success rates
+- Sunil et al. (2026), "Memory Poisoning Attack and Defense on Memory Based LLM-Agents" — trust scoring defenses
+- Schneider, C. (2026), "Memory Poisoning in AI Agents: Exploits That Wait" — six-layer defense architecture
+- MemTrust (2026), "A Zero-Trust Architecture for Unified AI Memory System" — TEE-based memory protection
+- Zuccolotto et al. (2026), "Memory Poisoning and Secure Multi-Agent Systems" — provenance and integrity measures
+
+---
+
 ## Requirements
 
 The platform has the following requirements for memory:
diff --git a/docs/design/SECURITY.md b/docs/design/SECURITY.md
@@ -100,6 +100,33 @@ The `functionArn` in `CustomStepConfig` should be validated at CDK synth time to
 
 ## Memory-specific threats
 
+### OWASP ASI06 — Memory and context poisoning
+
+OWASP classifies memory and context poisoning as **ASI06** in the 2026 Top 10 for Agentic Applications. This classification recognizes that persistent memory attacks are fundamentally different from single-session prompt injection (LLM01): poisoned memory entries influence every subsequent interaction, creating "sleeper agent" scenarios where compromise is dormant until activated by triggering conditions. ASI06 maps to LLM01 (prompt injection), LLM04 (data and model poisoning), and LLM08 (vector and embedding weaknesses) in the 2025 OWASP Top 10 for LLM Applications, but with new characteristics unique to agents with persistent memory.
+
+The platform's memory system (see [MEMORY.md](./MEMORY.md)) faces threats from both intentional attacks and emergent corruption. The full threat taxonomy and gap analysis are documented in the [Memory security analysis](./MEMORY.md#memory-security-analysis) section of MEMORY.md. The implementation plan is in [ROADMAP.md Iteration 3e](../guides/ROADMAP.md).
+
+### Attack vectors beyond PR review comments
+
+In addition to the PR review comment injection vector detailed below, the memory system is exposed to:
+
+- **Query-based memory injection (MINJA)** — Attacker-crafted task descriptions that embed poisoned content the agent stores as legitimate memory. Research demonstrates 95%+ injection success rates against undefended systems via query-only interactions requiring no direct memory access.
+- **Indirect injection via GitHub issues** — Issue bodies and comments are fetched during context hydration (`context-hydration.ts`) and injected into the agent's context. An adversary can craft issue content containing memory-poisoning payloads that the agent stores as "learned" repository knowledge via the post-task extraction prompt. The system currently does not differentiate between trusted (system) and untrusted (user-submitted) content in the hydration pipeline.
+- **Experience grafting** — Manipulation of the agent's episodic memory to induce behavioral drift (e.g., injecting a fake episode claiming certain tests always fail, causing the agent to skip them).
+- **Poisoned RAG retrieval** — Adversarial content engineered to rank highly for specific semantic queries during `RetrieveMemoryRecordsCommand`, ensuring it is retrieved and incorporated into the agent's context.
+- **Emergent self-corruption** — The agent poisons itself through hallucination crystallization (false memories from hallucinated facts), error compounding feedback loops (bad episodes retrieved by similar tasks), and stale context accumulation (outdated memories weighted equally with current ones). These lack an external attacker signature and are harder to detect.
+
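The missing trust differentiation in the hydration pipeline can be sketched as a classification step applied to each piece of context before it reaches the agent. The origin names, the `ContextSegment` type, and the strict policy of keeping untrusted text out of memory extraction are all assumptions for illustration; the actual structure of `context-hydration.ts` is not shown in this change.

```python
from dataclasses import dataclass

TRUSTED = "trusted"
UNTRUSTED = "untrusted"

# Hypothetical origin names; the real pipeline's identifiers may differ.
ORIGIN_TRUST = {
    "system_prompt": TRUSTED,
    "task_description": UNTRUSTED,
    "github_issue_body": UNTRUSTED,
    "github_issue_comment": UNTRUSTED,
}


@dataclass(frozen=True)
class ContextSegment:
    origin: str
    trust: str
    text: str


def classify(origin: str, text: str) -> ContextSegment:
    """Tag a piece of hydrated context; unknown origins default to untrusted."""
    return ContextSegment(origin, ORIGIN_TRUST.get(origin, UNTRUSTED), text)


def memory_eligible(segments):
    """Keep only segments allowed to feed the post-task extraction prompt."""
    return [s for s in segments if s.trust == TRUSTED]
```

Excluding untrusted segments outright is the strictest option. A softer variant would pass them through with a lower trust score attached to any memory derived from them.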
+### Required mitigations (all vectors)
+
+The defense architecture requires six layers (see [MEMORY.md](./MEMORY.md#defense-architecture) for the full model):
+
+1. **Input moderation with trust scoring** — Content sanitization and injection pattern detection before memory write. Composite trust scores (not binary allow/block) based on source provenance, content analysis, and behavioral consistency.
+2. **Memory sanitization with provenance tagging** — Every memory entry carries source metadata (`agent_episode`, `orchestrator_fallback`, `github_issue`, `review_feedback`), content hash (SHA-256), and schema version.
+3. **Storage isolation** — Per-repo namespace isolation (already partially implemented), expiration limits, and size caps.
+4. **Trust-scored retrieval** — At retrieval time, memories are weighted by temporal freshness, source reliability, and pattern consistency. Entries below a trust threshold are excluded from the context budget.
+5. **Write-ahead validation (guardian pattern)** — A separate model evaluates proposed memory updates before commit.
+6. **Continuous monitoring and circuit breakers** — Anomaly detection on memory write patterns, behavioral drift detection, and automatic halt when anomalies are detected.
+
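Items 1 and 4 above describe composite trust scores and threshold filtering at retrieval time. A minimal sketch of that combination follows, assuming freshness and consistency are already available as values between 0 and 1. The weights, reliability values, and threshold are invented for the example; only the source labels and the 2,000-token budget come from the design docs.

```python
from dataclasses import dataclass

# Assumed reliability per provenance label; values are illustrative only.
SOURCE_RELIABILITY = {
    "agent_episode": 0.8,
    "orchestrator_fallback": 0.7,
    "review_feedback": 0.5,
    "github_issue": 0.3,
}
WEIGHTS = {"freshness": 0.3, "source": 0.4, "consistency": 0.3}
TRUST_THRESHOLD = 0.5


@dataclass
class Candidate:
    text: str
    source: str
    freshness: float    # 0..1, e.g. from a temporal decay function
    consistency: float  # 0..1, agreement with other retrieved records
    tokens: int


def trust_score(c: Candidate) -> float:
    """Weighted sum of freshness, source reliability, and consistency."""
    source = SOURCE_RELIABILITY.get(c.source, 0.0)  # unknown source scores 0
    return (WEIGHTS["freshness"] * c.freshness
            + WEIGHTS["source"] * source
            + WEIGHTS["consistency"] * c.consistency)


def select_for_context(candidates, budget_tokens=2000):
    """Keep the highest-trust candidates that clear the threshold and fit."""
    kept, used = [], 0
    for c in sorted(candidates, key=trust_score, reverse=True):
        if trust_score(c) < TRUST_THRESHOLD:
            break  # sorted descending, so the rest are below threshold too
        if used + c.tokens <= budget_tokens:
            kept.append(c)
            used += c.tokens
    return kept
```

Because the score is a weighted sum rather than a binary gate, a fresh and consistent record from a low-reliability source can still be admitted, which matches the "not binary allow/block" wording in item 1.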
 ### Prompt injection via PR review comments
 
 The review feedback memory loop (see [MEMORY.md](./MEMORY.md)) is the most novel memory component — and the most dangerous from a security perspective. PR review comments are **attacker-controlled input** that gets processed by an LLM and stored as persistent memory influencing future agent behavior.
@@ -169,6 +196,10 @@ AgentCore Memory has **no native backup mechanism**. This is a significant gap f
 
 - **Single GitHub OAuth token** — one token may be shared for all users and repos the platform can access. Any authenticated user can trigger agent work against any repo that token can access. There is no per-user repo scoping.
 - **Guardrails are input-only** — the `PROMPT_ATTACK` filter screens task descriptions at submission. No guardrails are applied to model output during agent execution or to review feedback entering the memory system.
+- **No memory content validation** — retrieved memory records are injected into the agent's context without sanitization, injection pattern scanning, or trust scoring. This is the most critical memory security gap (OWASP ASI06). See [MEMORY.md](./MEMORY.md#memory-security-analysis) for the full gap analysis and [ROADMAP.md Iteration 3e](../guides/ROADMAP.md) for the remediation plan.
+- **No memory provenance or integrity checking** — memory entries carry no source attribution, content hashing, or trust metadata. The system cannot distinguish agent-generated memory from externally-influenced content.
+- **GitHub issue content as untrusted input** — issue bodies and comments (attacker-controlled) are injected into the agent's context during hydration without trust differentiation.
+- **No memory rollback or quarantine** — the 365-day AgentCore Memory expiration is the only cleanup mechanism. There is no snapshot, rollback, or quarantine capability for suspected poisoned entries.
 - **No MFA** — Cognito MFA is disabled (CLI-based auth flow). Should be enabled for production deployments.
 - **No customer-managed KMS** — all encryption at rest uses AWS-managed keys. Customer-managed KMS can be added if required by compliance policy.
 - **CORS is fully open** — `ALL_ORIGINS` is configured for CLI consumption. Restrict origins when exposing browser clients.
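
The rollback and quarantine limitation listed above could be narrowed once records carry provenance. The sketch below splits a set of records into kept and quarantined groups by source label and write time. The dict keys `source` and `created_at` are hypothetical, and the example presupposes provenance tagging that the current system does not have.

```python
def quarantine(records, source=None, written_after=None):
    """Split records into (kept, quarantined) by provenance and write time.

    Each record is a dict with hypothetical keys "source" and "created_at"
    (any comparable timestamp). A record is quarantined only if it matches
    every filter that was supplied; with no filters, nothing is quarantined.
    """
    if source is None and written_after is None:
        return list(records), []
    kept, held = [], []
    for r in records:
        match = ((source is None or r["source"] == source)
                 and (written_after is None
                      or r["created_at"] >= written_after))
        (held if match else kept).append(r)
    return kept, held
```

Quarantined records would be held back from retrieval pending review rather than deleted, so a false positive can be restored.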
diff --git a/docs/guides/ROADMAP.md b/docs/guides/ROADMAP.md