feat: add configurable content size limits for DoS mitigation (RFC-014) (#123)

rolandpg · web-flow · commit 8f1a06265e4e · 2026-04-25T13:31:31.000-05:00
Mitigates threat D-01 from THREAT_MODEL.md: a configurable
max_content_length on remember() rejects oversized content before
it can exhaust memory or block the enrichment queue.

Changes:
- Add LimitsConfig dataclass with max_content_length (50 MB default)
  and recall_timeout_seconds (30s default; stored but not yet enforced)
- Add limits field to GovernanceConfig with nested YAML support
- Add content length check in GovernanceValidator.validate_remember()
- Wire limits_config from config into GovernanceValidator in
  MemoryManager
- Add ZETTELFORGE_LIMITS_MAX_CONTENT_LENGTH env override
- Add 6 unit tests: within bounds, exceeded, zero=disabled,
  None=no check, LimitsConfig defaults, error message content
- Update config.default.yaml with governance.limits section
- Update THREAT_MODEL.md: D-01 downgraded from Medium to Low risk,
  added to existing controls, removed from recommended additions
- Create RFC-014 document

RFC: docs/rfcs/RFC-014-content-limits.md
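
A minimal, standalone sketch of the check this commit adds, for readers who want the semantics without reading the diff. `check_content_length` and `utf8_size` are hypothetical helpers (the real check lives inside `GovernanceValidator.validate_remember()`), and the exception class here is a stand-in for the project's own. Note that the check uses `len(content)`, which counts characters rather than encoded bytes, even though the error message says "bytes".

```python
from dataclasses import dataclass
from typing import Optional


class GovernanceViolationError(Exception):
    """Stand-in for the project's exception of the same name."""


@dataclass
class LimitsConfig:
    max_content_length: int = 52_428_800  # 50 MB default; 0 disables the check


def check_content_length(content: str, limits: Optional[LimitsConfig]) -> str:
    """Hypothetical helper: return content unchanged, or raise if too long."""
    if limits is None or limits.max_content_length <= 0:
        return content  # no limits configured, or the limit is disabled
    size = len(content)  # counts characters (code points), as the diff does
    if size > limits.max_content_length:
        raise GovernanceViolationError(
            f"Content exceeds max_content_length "
            f"({size} > {limits.max_content_length} bytes)."
        )
    return content


def utf8_size(content: str) -> int:
    """Size in bytes once encoded as UTF-8; can exceed len(content)."""
    return len(content.encode("utf-8"))
```

For non-ASCII input the two measures diverge: `"é" * 10` has length 10 but encodes to 20 bytes, so a character-based limit can admit more than the configured number of bytes. Whether that matters depends on what the limit is meant to bound; this sketch does not change the committed behavior.
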
diff --git a/config.default.yaml b/config.default.yaml
@@ -374,6 +374,8 @@ synthesis:
 # Env overrides:
 #   ZETTELFORGE_PII_ENABLED=true
 #   ZETTELFORGE_PII_ACTION=redact
+#   ZETTELFORGE_LIMITS_MAX_CONTENT_LENGTH=104857600
+#   ZETTELFORGE_LIMITS_RECALL_TIMEOUT=60
 #
 governance:
   enabled: true
@@ -385,6 +387,9 @@ governance:
     entities: []
     language: en
     nlp_model: en_core_web_sm
+  limits:
+    max_content_length: 52428800  # 50 MB, 0 = unlimited
+    recall_timeout_seconds: 30.0  # seconds, 0 = unlimited
 
 
 # ── LanceDB Maintenance (RFC-009 Phase 1.5) ─────────────────────────────────
diff --git a/docs/THREAT_MODEL.md b/docs/THREAT_MODEL.md
@@ -137,7 +137,7 @@ TB-1 ─────────────────────────
 
 | ID | Threat | Component | Risk | Mitigation |
 |----|--------|-----------|------|------------|
-| D-01 | Large content in `remember()` exhausts memory or blocks the enrichment queue | MemoryManager (P1) | **Medium** — degraded performance | `remember_report()` chunks long documents. No explicit size limit on `remember()` content. Enrichment queue has `maxsize=500` backpressure. |
+| D-01 | Large content in `remember()` exhausts memory or blocks the enrichment queue | MemoryManager (P1) | **Low** — gracefully rejected | `governance.limits.max_content_length` (RFC-014, default 50 MB) blocks oversized content with a clear error. `remember_report()` chunks long documents. Enrichment queue has `maxsize=500` backpressure. |
 | D-02 | LLM provider (ollama, litellm) hangs and blocks `remember()` | LLM Provider (TB-4) | **High** — operation blocks | OllamaProvider has timeout (RFC-010, default 60s). LitellmProvider has timeout + num_retries. `generate()` returns empty string on recoverable failure. Fallback provider (e.g., local -> ollama) gives alternative path. |
 | D-03 | Malicious query triggers deep graph traversal exhausting time/resources | BlendedRetriever | **Medium** — slow recall | `max_graph_depth` config (default 2) limits BFS hops. `default_k` (default 10) limits results. No timeout on recall queries. |
 | D-04 | spaCy model download blocks first `remember()` when PII is enabled | PIIValidator (lazy load) | **Low** — delayed first call (~2-3 seconds) | One-time download cost. Matching fastembed pattern. Can be pre-downloaded for air-gapped deployments. |
@@ -158,7 +158,7 @@ TB-1 ─────────────────────────
 |------------|-------|--------------|
 | **Critical** | 2 | T-01 (storage tampering), I-01 (unencrypted data at rest), E-02 (governance bypass via filesystem) |
 | **High** | 7 | S-01 (spoofed MCP client), S-03 (config tampering), T-02 (config security downgrade), R-01 (repudiation without audit), I-02 (PII in stored notes), D-02 (LLM provider hang), E-01 (cross-tenant data access) |
-| **Medium** | 9 | S-02 (fake LLM provider), T-04 (retrieval poisoning), R-02, R-03, I-04 (error message leakage), D-01, D-03, E-03 |
+| **Medium** | 8 | S-02 (fake LLM provider), T-04 (retrieval poisoning), R-02, R-03, I-04 (error message leakage), D-03, E-03 |
 | **Low** | 1 | D-04 (PII model download delay) |
 
 ### Top 5 Mitigations (Priority Order)
@@ -181,6 +181,7 @@ TB-1 ─────────────────────────
 | API key redaction | I-03 | `LLMConfig.__repr__` redacts api_key and sensitive extra keys | Unit tests in `test_llm_providers.py` |
 | PII detection + redaction | I-02 | PIIValidator (RFC-013): log/redact/block | Unit tests in `test_pii_validator.py` |
 | LLM provider timeout | D-02 | `OllamaProvider` timeout=60s, `LiteLLMProvider` timeout + num_retries | Unit tests (RFC-010, RFC-012) |
+| Content size limit | D-01 | `governance.limits.max_content_length` (RFC-014, default 50 MB) blocks oversized content | Unit tests in `test_governance.py` |
 | Config env-var resolution | I-03 | `${ENV_VAR}` syntax prevents raw secrets in YAML | Unit tests |
 | Configurable model provider | S-02, E-03 | `provider` key selects backend; no implicit unauthenticated outbound calls | Config validation |
 | Enrichment queue backpressure | D-01 | `maxsize=500` bounded queue | Code review |
@@ -189,7 +190,6 @@ TB-1 ─────────────────────────
 
 | Recommendation | Threat(s) | Effort | Priority |
 |---------------|-----------|--------|----------|
-| Add content size limit to `remember()` | D-01 | Small | P3 |
 | Add global exception handler that sanitizes error output | I-04 | Medium | P2 |
 | Add TLS verification option for self-hosted LLM endpoints | S-02 | Small | P2 |
 | Add config file integrity check (SHA-256 of default vs. loaded) | T-02, S-03 | Medium | P3 |
@@ -231,6 +231,7 @@ Per GOV-021, the following data types exist in the system:
 
 | Change | RFC/PR | Date | Threat Model Impact |
 |--------|--------|------|---------------------|
+| Content size limits + recall timeout | RFC-014 | 2026-04-25 | Mitigation for D-01 (content size limit, default 50 MB); recall timeout for D-03 is configurable but not yet enforced |
 | PII detection and redaction | RFC-013 (PR #118) | 2026-04-25 | New control for I-02; new attack surface (D-04); PII text logging fixed |
 | LiteLLM unified provider | RFC-012 (PR #108) | 2026-04-25 | New provider for I-03 (API keys); new outbound traffic pattern (TB-4) |
 | Local LLM backend selection | RFC-011 (PR #104) | 2026-04-25 | No new threat surface — extends existing local provider |
diff --git a/docs/rfcs/RFC-014-content-limits.md b/docs/rfcs/RFC-014-content-limits.md
@@ -0,0 +1,137 @@
+# RFC-014: Content Size Limits and Recall Timeout for Denial of Service Mitigation
+
+## Metadata
+
+- **Author**: Patrick Roland
+- **Status**: Draft
+- **Created**: 2026-04-25
+- **Last Updated**: 2026-04-25
+- **Reviewers**: TBD
+- **Related Tickets**: ZF-014
+- **Related RFCs**: RFC-013 (PII Detection), RFD-001 (threat model)
+
+## Summary
+
+Add configurable content size limits to `remember()` and a configurable timeout to `recall()` to mitigate denial-of-service threats identified in THREAT_MODEL.md: D-01 (large content exhausting memory or blocking the enrichment queue) and D-03 (malicious queries triggering deep graph traversal). Introduce a new `limits` config section under `governance` with `max_content_length` and `recall_timeout_seconds` fields. Defaults preserve backward compatibility.
+
+## Motivation
+
+The THREAT_MODEL.md (THREAT-001, Section 2.5) identifies two denial-of-service vectors with no current mitigation:
+
+**D-01 (Large Content, MEDIUM):** `remember()` accepts content of arbitrary length. A 100 MB input would:
+- Block the enrichment queue (maxsize=500 with no per-item size limit)
+- Exhaust memory during embedding (fastembed loads the entire input)
+- Block the `remember()` call for minutes during embedding
+- Potentially crash the process on low-memory systems
+
+**D-03 (Deep Graph Traversal, MEDIUM):** `recall()` with a crafted query could trigger deep BFS traversal in the knowledge graph, taking seconds to minutes to resolve. With `max_graph_depth: 2` this risk is limited, but no hard timeout exists.
+
+Both are standard "defense in depth" controls per FedRAMP SI-10 (Information Input Validation) and SA-8 (Security Engineering Principles). The default values are generous enough to never affect legitimate use but provide a hard stop for abuse or accidents.
+
+## Proposed Design
+
+### Config Schema
+
+New `limits` subsection under `governance`:
+
+```yaml
+governance:
+  enabled: true
+  min_content_length: 1
+  limits:
+    max_content_length: 52428800    # 50 MB default, 0 = unlimited
+    recall_timeout_seconds: 30.0    # 30 seconds default, 0 = unlimited
+  pii:
+    enabled: false
+```
+
+The 50 MB default is chosen because:
+- Largest real-world CTI report ingested is ~10 MB (NIST NVD feed, MITRE ATT&CK)
+- Embedding 50 MB of text at ~7ms/768-dim chunk takes ~5 seconds
+- Any legitimate use case below 50 MB is unaffected
+- 0 = unlimited preserves backward compatibility for any edge case
+
+### Dataclass Changes
+
+```python
+@dataclass
+class LimitsConfig:
+    """Operation limits for DoS mitigation (RFC-014).
+
+    Values of 0 disable the limit (unlimited).
+    """
+    max_content_length: int = 52428800   # bytes, 50 MB
+    recall_timeout_seconds: float = 30.0
+
+
+@dataclass
+class GovernanceConfig:
+    enabled: bool = True
+    min_content_length: int = 1
+    limits: LimitsConfig = field(default_factory=LimitsConfig)
+    pii: PIIConfig = field(default_factory=PIIConfig)
+```
+
+### GovernanceValidator Changes
+
+```python
+# In validate_remember()
+if self._limits.max_content_length > 0:
+    if len(content) > self._limits.max_content_length:
+        raise GovernanceViolationError(
+            f"Content exceeds max_content_length "
+            f"({len(content)} > {self._limits.max_content_length} bytes). "
+            f"Increase governance.limits.max_content_length or reduce "
+            f"input size."
+        )
+```
+
+### recall() Timeout Integration
+
+```python
+# In BlendedRetriever.retrieve() or MemoryManager.recall()
+# Wrap the retrieval call with a timeout
+timeout = get_config().governance.limits.recall_timeout_seconds
+if timeout > 0:
+    result = future_with_timeout(self._blended_retrieve, query, timeout=timeout)
+```
+
+### Environment Variables
+
+```python
+if v := os.environ.get("ZETTELFORGE_LIMITS_MAX_CONTENT_LENGTH"):
+    cfg.governance.limits.max_content_length = int(v)
+if v := os.environ.get("ZETTELFORGE_LIMITS_RECALL_TIMEOUT"):
+    cfg.governance.limits.recall_timeout_seconds = float(v)
+```
+
+### File Changes
+
+| File | Change |
+|------|--------|
+| `src/zettelforge/config.py` | Add `LimitsConfig` dataclass; add `limits` to `GovernanceConfig`; env overrides |
+| `src/zettelforge/governance_validator.py` | Add content length check in `validate_remember()` |
+| `src/zettelforge/memory_manager.py` | Wire recall timeout into `recall()` / `BlendedRetriever` calls |
+| `config.default.yaml` | Document `governance.limits` section |
+| `tests/test_governance.py` | Add tests for content size limit |
+| `docs/THREAT_MODEL.md` | Update D-01/D-03 to "mitigated" |
+| `docs/rfcs/RFC-014-content-limits.md` | New RFC |
+
+## Migration
+
+**Existing users:** Zero config changes. `limits.max_content_length` defaults to 50 MB. `limits.recall_timeout_seconds` defaults to 30 seconds. Existing data is never re-validated — limits apply only to new `remember()` / `recall()` calls.
+
+**Users who hit the limit:** Set `limits.max_content_length: 0` or `limits.recall_timeout_seconds: 0` in config to disable the limit.
+
+## Alternatives Considered
+
+**Alternative 1: Separate section instead of nested under governance.** A top-level `limits:` section was considered. Rejected because: the content size limit is conceptually a governance validation (input validation per GOV-011 / SI-10) and belongs with other governance controls. The recall timeout is a performance protection but benefits from colocation.
+
+**Alternative 2: No limit, rely on OS-level ulimit.** Rejected because: embedded systems and containerized deployments may have high ulimits. A process-level crash is worse than a graceful GovernanceViolationError.
+
+## Decision
+
+**Decision**: [Pending review]
+**Date**: [Pending]
+**Decision Maker**: [Pending]
+**Rationale**: [Pending]
diff --git a/src/zettelforge/config.py b/src/zettelforge/config.py
@@ -190,11 +190,23 @@ class PIIConfig:
     nlp_model: str = "en_core_web_sm"
 
 
+@dataclass
+class LimitsConfig:
+    """Operation limits for DoS mitigation (RFC-014).
+
+    Values of 0 disable the limit (unlimited).
+    """
+
+    max_content_length: int = 52428800  # bytes, 50 MB default
+    recall_timeout_seconds: float = 30.0
+
+
 @dataclass
 class GovernanceConfig:
     enabled: bool = True
     min_content_length: int = 1
     pii: PIIConfig = field(default_factory=PIIConfig)
+    limits: LimitsConfig = field(default_factory=LimitsConfig)
 
 
 @dataclass
@@ -396,6 +408,11 @@ def _apply_yaml(cfg: ZettelForgeConfig, data: dict):
                 for pk, pv in v.items():
                     if hasattr(cfg.governance.pii, pk):
                         setattr(cfg.governance.pii, pk, pv)
+            # RFC-014: limits is a nested dataclass (DoS mitigations)
+            elif k == "limits" and isinstance(v, dict):
+                for lk, lv in v.items():
+                    if hasattr(cfg.governance.limits, lk):
+                        setattr(cfg.governance.limits, lk, lv)
             else:
                 setattr(cfg.governance, k, v)
 
@@ -496,6 +513,12 @@ def _apply_env(cfg: ZettelForgeConfig):
     if v := os.environ.get("ZETTELFORGE_PII_ACTION"):
         cfg.governance.pii.action = v
 
+    # RFC-014: Operation limits (DoS mitigation)
+    if v := os.environ.get("ZETTELFORGE_LIMITS_MAX_CONTENT_LENGTH"):
+        cfg.governance.limits.max_content_length = int(v)
+    if v := os.environ.get("ZETTELFORGE_LIMITS_RECALL_TIMEOUT"):
+        cfg.governance.limits.recall_timeout_seconds = float(v)
+
     # Extensions license key (used by zettelforge-enterprise fallback path)
     if v := os.environ.get("THREATENGRAM_LICENSE_KEY"):
         cfg.enterprise.license_key = v
diff --git a/src/zettelforge/governance_validator.py b/src/zettelforge/governance_validator.py
@@ -17,7 +17,7 @@
 from zettelforge.log import get_logger
 
 if TYPE_CHECKING:
-    from zettelforge.config import PIIConfig
+    from zettelforge.config import LimitsConfig, PIIConfig
 
 _logger = get_logger("zettelforge.governance")
 
@@ -35,10 +35,13 @@ def __init__(
         self,
         governance_dir: Path | None = None,
         pii_config: PIIConfig | None = None,
+        limits_config: LimitsConfig | None = None,
     ):
         self.governance_dir = governance_dir
         self.rules = self._load_governance_rules()
         self._pii = None
+        # RFC-014: operation limits (DoS mitigation)
+        self._limits = limits_config
 
         # RFC-013: Optional PII validator. If the config says enabled but
         # presidio-analyzer is not installed, log a warning and continue --
@@ -108,6 +111,19 @@ def validate_remember(self, content: str) -> str:
         if not is_valid:
             raise GovernanceViolationError(f"Governance violation in remember: {violations}")
 
+        # RFC-014: Content size limit (DoS mitigation)
+        if (
+            self._limits is not None
+            and self._limits.max_content_length > 0
+            and len(content) > self._limits.max_content_length
+        ):
+            raise GovernanceViolationError(
+                f"Content exceeds max_content_length "
+                f"({len(content)} > {self._limits.max_content_length} bytes). "
+                f"Increase governance.limits.max_content_length or "
+                f"reduce input size."
+            )
+
         # RFC-013: Optional PII validation
         if self._pii is not None:
             try:
diff --git a/src/zettelforge/memory_manager.py b/src/zettelforge/memory_manager.py
@@ -115,6 +115,7 @@ def __init__(self, jsonl_path: str | None = None, lance_path: str | None = None)
         )
         self.governance = GovernanceValidator(
             pii_config=get_config().governance.pii,
+            limits_config=get_config().governance.limits,
         )
         self.resolver = AliasResolver()
         self.consolidation = ConsolidationMiddleware(self)
diff --git a/tests/test_governance.py b/tests/test_governance.py
@@ -30,3 +30,62 @@ def test_governance_in_memory_manager():
     mm = MemoryManager()
     assert hasattr(mm, "governance")
     assert isinstance(mm.governance, GovernanceValidator)
+
+
+# ── Content size limit (RFC-014) ──────────────────────────────────────────────
+
+
+def test_content_size_limit_within_bounds():
+    """Content under the configured limit must pass through."""
+    from zettelforge.config import LimitsConfig
+
+    gv = GovernanceValidator(limits_config=LimitsConfig(max_content_length=1024))
+    result = gv.enforce("remember", "a" * 100)
+    assert result == "a" * 100
+
+
+def test_content_size_limit_exceeded():
+    """Content over the limit must raise."""
+    from zettelforge.config import LimitsConfig
+
+    gv = GovernanceValidator(limits_config=LimitsConfig(max_content_length=50))
+    with pytest.raises(GovernanceViolationError, match="max_content_length"):
+        gv.enforce("remember", "a" * 100)
+
+
+def test_content_size_limit_zero_disabled():
+    """limit=0 disables the check."""
+    from zettelforge.config import LimitsConfig
+
+    gv = GovernanceValidator(limits_config=LimitsConfig(max_content_length=0))
+    result = gv.enforce("remember", "a" * 10000)
+    assert result == "a" * 10000
+
+
+def test_content_size_limit_none_config():
+    """No limits_config means no limit check."""
+    gv = GovernanceValidator()
+    result = gv.enforce("remember", "a" * 100000)
+    assert result == "a" * 100000
+
+
+def test_limits_config_defaults():
+    """LimitsConfig has sane defaults."""
+    from zettelforge.config import LimitsConfig
+
+    lc = LimitsConfig()
+    assert lc.max_content_length == 52428800  # 50 MB
+    assert lc.recall_timeout_seconds == 30.0
+
+
+def test_content_limit_message_contains_value():
+    """Error message must include actual and max sizes for debugging."""
+    from zettelforge.config import LimitsConfig
+
+    gv = GovernanceValidator(limits_config=LimitsConfig(max_content_length=10))
+    # pytest.raises fails the test if no exception is raised at all
+    with pytest.raises(GovernanceViolationError) as excinfo:
+        gv.enforce("remember", "x" * 100)
+    msg = str(excinfo.value)
+    assert "100" in msg
+    assert "> 10 bytes" in msg
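
This commit stores `recall_timeout_seconds` without enforcing it, and the `future_with_timeout` helper named in the RFC is not defined anywhere in the diff. One possible shape for such a helper, using only the standard library, is sketched below. `call_with_timeout` and `slow_retrieve` are hypothetical names, not part of ZettelForge.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_timeout(fn, *args, timeout: float):
    """Run fn(*args) in a worker thread and give up after `timeout` seconds.

    A timeout of 0 (or less) means unlimited, matching the RFC's convention.
    """
    if timeout <= 0:
        return fn(*args)
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        raise TimeoutError(f"call exceeded {timeout}s") from None
    finally:
        # Do not block the caller; the worker thread is left to finish on its own.
        pool.shutdown(wait=False)


def slow_retrieve(query: str, delay: float) -> str:
    """Hypothetical stand-in for a slow retrieval call."""
    time.sleep(delay)
    return f"results for {query}"
```

The main limitation is that Python cannot cancel a running thread, so a timed-out traversal keeps consuming CPU until it returns; the timeout bounds how long the caller waits, not the work done. A stricter bound would need cooperative cancellation inside the traversal or a separate process.
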
