# RFC-014: Content Size Limits and Recall Timeout for Denial of Service Mitigation

## Metadata

- **Author**: Patrick Roland
- **Status**: Draft
- **Created**: 2026-04-25
- **Last Updated**: 2026-04-25
- **Reviewers**: TBD
- **Related Tickets**: ZF-014
- **Related RFCs**: RFC-013 (PII Detection), RFD-001 (threat model)

## Summary

Add a configurable content size limit to `remember()` and a configurable timeout to `recall()` to mitigate two denial-of-service threats identified in THREAT_MODEL.md: D-01 (large content exhausting memory or blocking the enrichment queue) and D-03 (malicious queries triggering deep graph traversal). Introduce a new `limits` config section under `governance` with `max_content_length` and `recall_timeout_seconds` fields. The defaults (50 MB, 30 seconds) sit well above expected legitimate use, and setting either field to `0` restores today's unlimited behavior.

## Motivation

THREAT_MODEL.md (THREAT-001, Section 2.5) identifies two denial-of-service vectors that currently have no mitigation:

**D-01 (Large Content, HIGH):** `remember()` accepts content of arbitrary length. A 100 MB input would:
- Block the enrichment queue (`maxsize=500`, with no per-item size limit)
- Exhaust memory during embedding (fastembed loads the entire input)
- Block the `remember()` call for minutes while embedding runs
- Potentially crash the process on low-memory systems

**D-03 (Deep Graph Traversal, MEDIUM):** A crafted `recall()` query could trigger a deep BFS traversal of the knowledge graph that takes seconds to minutes to resolve. `max_graph_depth: 2` limits this risk, but no hard timeout exists.

Both are standard defense-in-depth controls under FedRAMP SI-10 (Information Input Validation) and SA-8 (Security Engineering Principles). The defaults are generous enough not to affect legitimate use while still providing a hard stop for abuse or accidents.

## Proposed Design

### Config Schema

New `limits` subsection under `governance`:

```yaml
governance:
  enabled: true
  min_content_length: 1
  limits:
    max_content_length: 52428800   # 50 MB default, 0 = unlimited
    recall_timeout_seconds: 30.0   # 30 seconds default, 0 = unlimited
  pii:
    enabled: false
```
|
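The 50 MB default (52,428,800 bytes, i.e. 50 MiB) is chosen because:
- The largest real-world CTI report ingested so far is ~10 MB (NIST NVD feed, MITRE ATT&CK), leaving 5x headroom
- Embedding time scales with chunk count: at ~7 ms per 768-dim chunk, ~5 seconds for 50 MB implies roughly 715 chunks of ~73 KB each; with ~2 KB chunks the same 50 MB would take about 3 minutes, so the limit caps `remember()` latency in either case
- Any legitimate input below 50 MB is unaffected
- Setting `0` (unlimited) restores the previous behavior for any edge case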
|
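The embedding-time bullet above can be checked with a few lines of arithmetic. This is an illustrative sketch only: `estimate_embedding_seconds` and `chunk_bytes_for_budget` are hypothetical helpers, the ~7 ms per-chunk cost is the estimate quoted above, and chunk size is an assumed parameter rather than a documented fastembed setting.

```python
import math

MAX_CONTENT_LENGTH = 52_428_800  # 50 MiB, the proposed default


def estimate_embedding_seconds(content_bytes: int, chunk_bytes: int,
                               ms_per_chunk: float = 7.0) -> float:
    """Estimate embedding wall time, assuming fixed-size chunks and a
    constant per-chunk cost (a simplified model, not a measurement)."""
    chunks = math.ceil(content_bytes / chunk_bytes)
    return chunks * ms_per_chunk / 1000.0


def chunk_bytes_for_budget(content_bytes: int, budget_seconds: float,
                           ms_per_chunk: float = 7.0) -> float:
    """Average chunk size needed to embed content_bytes within budget_seconds."""
    max_chunks = budget_seconds * 1000.0 / ms_per_chunk
    return content_bytes / max_chunks


if __name__ == "__main__":
    # ~2 KB chunks: 25,600 chunks * 7 ms = 179.2 s (about 3 minutes)
    print(estimate_embedding_seconds(MAX_CONTENT_LENGTH, 2048))
    # Average chunk size implied by the "~5 seconds" estimate (roughly 73 KB)
    print(round(chunk_bytes_for_budget(MAX_CONTENT_LENGTH, 5.0)))
```

The model ignores batching and tokenizer overhead, so treat it as an order-of-magnitude check rather than a benchmark.
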
### Dataclass Changes

```python
from dataclasses import dataclass, field

# PIIConfig comes from RFC-013 (PII Detection).


@dataclass
class LimitsConfig:
    """Operation limits for DoS mitigation (RFC-014).

    Values of 0 disable the limit (unlimited).
    """
    max_content_length: int = 52428800  # bytes, 50 MiB
    recall_timeout_seconds: float = 30.0


@dataclass
class GovernanceConfig:
    enabled: bool = True
    min_content_length: int = 1
    limits: LimitsConfig = field(default_factory=LimitsConfig)
    pii: PIIConfig = field(default_factory=PIIConfig)
```
|
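The dataclass as proposed accepts any value, including negatives, which the `> 0` checks would silently treat as "unlimited". One possible hardening, sketched here as an assumption rather than part of the proposal, is to reject negative values in `__post_init__` and to build the section from the parsed YAML mapping through a hypothetical `from_mapping` helper, so a typo in config fails at load time instead of quietly disabling a limit.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass
class LimitsConfig:
    """Operation limits for DoS mitigation (RFC-014). 0 = unlimited."""
    max_content_length: int = 52428800  # bytes, 50 MiB
    recall_timeout_seconds: float = 30.0

    def __post_init__(self) -> None:
        # A negative value would pass the "> 0" guards as "unlimited",
        # so reject it explicitly.
        if self.max_content_length < 0:
            raise ValueError("max_content_length must be >= 0 (0 = unlimited)")
        if self.recall_timeout_seconds < 0:
            raise ValueError("recall_timeout_seconds must be >= 0 (0 = unlimited)")

    @classmethod
    def from_mapping(cls, raw: Mapping[str, Any]) -> "LimitsConfig":
        """Build from the parsed governance.limits mapping (hypothetical helper).

        Missing keys keep their defaults; unknown keys are rejected so that
        misspelled settings do not go unnoticed.
        """
        known = {f.name for f in fields(cls)}
        unknown = set(raw) - known
        if unknown:
            raise ValueError(f"unknown governance.limits keys: {sorted(unknown)}")
        return cls(
            max_content_length=int(
                raw.get("max_content_length", cls.max_content_length)),
            recall_timeout_seconds=float(
                raw.get("recall_timeout_seconds", cls.recall_timeout_seconds)),
        )
```
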
### GovernanceValidator Changes

```python
# In GovernanceValidator.validate_remember().
# max_content_length is in bytes, so measure the UTF-8 encoding: len(content)
# counts code points and would undercount non-ASCII text.
max_bytes = self._limits.max_content_length
if max_bytes > 0:
    content_bytes = len(content.encode("utf-8"))
    if content_bytes > max_bytes:
        raise GovernanceViolationError(
            f"Content exceeds max_content_length "
            f"({content_bytes} > {max_bytes} bytes). "
            f"Increase governance.limits.max_content_length or reduce "
            f"input size."
        )
```
|
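Because `max_content_length` is specified in bytes while `len()` on a Python `str` counts code points, the two differ for non-ASCII text. The self-contained sketch below (`check_content_length` is a hypothetical name, and `GovernanceViolationError` is a local stand-in) performs the byte-based check and adds two shortcuts that avoid encoding large inputs in common cases: every code point encodes to between 1 and 4 bytes in UTF-8, so `len(content)` is a lower bound and `4 * len(content)` an upper bound on the encoded size.

```python
class GovernanceViolationError(Exception):
    """Local stand-in for the project's exception so the sketch runs alone."""


def check_content_length(content: str, max_bytes: int) -> None:
    """Reject content whose UTF-8 encoding exceeds max_bytes (hypothetical helper)."""
    if max_bytes <= 0:
        return  # 0 disables the limit; negatives behave as with a "> 0" guard
    n_chars = len(content)
    if n_chars > max_bytes:
        # Each code point is at least 1 byte, so this is already too large.
        raise GovernanceViolationError(
            f"Content exceeds max_content_length "
            f"(at least {n_chars} > {max_bytes} bytes)."
        )
    if 4 * n_chars <= max_bytes:
        return  # Each code point is at most 4 bytes, so this always fits.
    size = len(content.encode("utf-8"))  # Only ambiguous sizes pay for a copy.
    if size > max_bytes:
        raise GovernanceViolationError(
            f"Content exceeds max_content_length ({size} > {max_bytes} bytes)."
        )
```

With the 50 MiB default, any input of at most 13,107,200 code points skips the encode, anything over 52,428,800 code points is rejected without it, and only inputs in between allocate an encoded copy.
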
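### recall() Timeout Integration

```python
# In BlendedRetriever.retrieve() or MemoryManager.recall(); assumes a
# long-lived ThreadPoolExecutor at self._recall_executor (hypothetical).
from concurrent.futures import TimeoutError as FutureTimeoutError
timeout = get_config().governance.limits.recall_timeout_seconds
if timeout > 0:
    future = self._recall_executor.submit(self._blended_retrieve, query)
    try:
        result = future.result(timeout=timeout)
    except FutureTimeoutError:
        raise TimeoutError(f"recall() exceeded {timeout}s") from None
else:
    result = self._blended_retrieve(query)
```

A thread-based timeout returns control to the caller on time, but Python cannot forcibly stop the worker thread: the traversal keeps running in the background until it finishes. The timeout therefore bounds `recall()` latency rather than CPU use, and `max_graph_depth` remains the primary bound on traversal cost.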
|
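The timeout pattern can be exercised in isolation with only the standard library's `concurrent.futures`. In this sketch `call_with_timeout` and the module-level executor are hypothetical names, the pool size is arbitrary, and raising the built-in `TimeoutError` is an assumption, since this RFC does not name an exception type for an expired recall.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeoutError
from typing import Callable, TypeVar

T = TypeVar("T")

# Long-lived, bounded pool shared by all recall() calls (hypothetical).
_RECALL_EXECUTOR = ThreadPoolExecutor(max_workers=4, thread_name_prefix="recall")


def call_with_timeout(fn: Callable[..., T], *args, timeout: float) -> T:
    """Run fn(*args), waiting at most `timeout` seconds (0 = no limit).

    On timeout the caller gets TimeoutError, but a running worker thread is
    not killed: fn keeps executing in the background until it returns.
    """
    if timeout <= 0:
        return fn(*args)
    future = _RECALL_EXECUTOR.submit(fn, *args)
    try:
        return future.result(timeout=timeout)
    except FutureTimeoutError:
        future.cancel()  # Only succeeds if fn is still queued, not running.
        raise TimeoutError(f"call exceeded {timeout} seconds") from None


if __name__ == "__main__":
    print(call_with_timeout(lambda q: f"results for {q}", "APT28", timeout=1.0))
    try:
        call_with_timeout(time.sleep, 0.5, timeout=0.05)
    except TimeoutError as exc:
        print("timed out:", exc)
```

A bounded pool also means a flood of slow queries cannot spawn unbounded threads: once every worker is busy, further calls wait in the pool's queue, time out from the caller's point of view, and are cancelled before they start.
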
### Environment Variables

```python
import os

if v := os.environ.get("ZETTELFORGE_LIMITS_MAX_CONTENT_LENGTH"):
    cfg.governance.limits.max_content_length = int(v)
if v := os.environ.get("ZETTELFORGE_LIMITS_RECALL_TIMEOUT"):
    cfg.governance.limits.recall_timeout_seconds = float(v)
```
|
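As written, the override snippet surfaces a malformed value as a bare `ValueError` from `int()` or `float()`, and it accepts negative numbers, which the `> 0` checks would treat as "unlimited". A stricter variant is sketched below, assuming overrides are applied after YAML loading; `apply_limit_env_overrides` and `_parse_non_negative` are hypothetical names, `LimitsConfig` is redeclared so the sketch runs on its own, and the environment mapping is injectable so the logic can be tested without modifying `os.environ`. It also rejects `inf` and `nan`, which `float()` would otherwise accept.

```python
import math
import os
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass
class LimitsConfig:
    """Redeclared from the proposal so this sketch runs on its own."""
    max_content_length: int = 52428800  # bytes, 0 = unlimited
    recall_timeout_seconds: float = 30.0  # seconds, 0 = unlimited


def _parse_non_negative(name: str, raw: str, cast: Callable[[str], float]) -> float:
    """Convert raw with cast, rejecting malformed, negative, or non-finite values."""
    try:
        value = cast(raw)
    except ValueError:
        raise ValueError(f"{name}={raw!r} is not a valid number") from None
    if not math.isfinite(value) or value < 0:
        raise ValueError(
            f"{name} must be a finite number >= 0 (0 = unlimited), got {raw!r}")
    return value


def apply_limit_env_overrides(limits: LimitsConfig,
                              environ: Optional[Mapping[str, str]] = None) -> None:
    """Apply ZETTELFORGE_LIMITS_* overrides to limits in place (hypothetical helper)."""
    env = os.environ if environ is None else environ
    if v := env.get("ZETTELFORGE_LIMITS_MAX_CONTENT_LENGTH"):
        limits.max_content_length = int(
            _parse_non_negative("ZETTELFORGE_LIMITS_MAX_CONTENT_LENGTH", v, int))
    if v := env.get("ZETTELFORGE_LIMITS_RECALL_TIMEOUT"):
        limits.recall_timeout_seconds = float(
            _parse_non_negative("ZETTELFORGE_LIMITS_RECALL_TIMEOUT", v, float))
```
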
### File Changes

| File | Change |
|------|--------|
| `src/zettelforge/config.py` | Add `LimitsConfig` dataclass; add `limits` to `GovernanceConfig`; env overrides |
| `src/zettelforge/governance_validator.py` | Add content length check in `validate_remember()` |
| `src/zettelforge/memory_manager.py` | Wire recall timeout into `recall()` / `BlendedRetriever` calls |
| `config.default.yaml` | Document `governance.limits` section |
| `tests/test_governance.py` | Add tests for content size limit |
| `docs/THREAT_MODEL.md` | Update D-01/D-03 to "mitigated" |
| `docs/rfcs/RFC-014-content-limits.md` | New RFC |

## Migration

**Existing users:** No config changes are required. `limits.max_content_length` defaults to 50 MB and `limits.recall_timeout_seconds` defaults to 30 seconds. Inputs above 50 MB and recalls running longer than 30 seconds, previously allowed, are now rejected. Existing data is never re-validated; the limits apply only to new `remember()` and `recall()` calls.

**Users who hit a limit:** Set `limits.max_content_length: 0` or `limits.recall_timeout_seconds: 0` in config to disable that limit.

## Alternatives Considered

**Alternative 1: Top-level section instead of nesting under governance.** A top-level `limits:` section was considered and rejected. The content size limit is conceptually a governance validation (input validation per GOV-011 / SI-10) and belongs with the other governance controls. The recall timeout is a performance safeguard rather than input validation, but it benefits from living alongside the size limit.

**Alternative 2: No limit; rely on OS-level ulimit.** Rejected because embedded systems and containerized deployments may run with high ulimits, and a process-level crash is worse than a graceful `GovernanceViolationError` that rejects a single call.

## Decision

**Decision**: [Pending review]
**Date**: [Pending]
**Decision Maker**: [Pending]
**Rationale**: [Pending]