diff --git a/docs/awf-config-spec.md b/docs/awf-config-spec.md
index bf39cf62..70d249bb 100644
--- a/docs/awf-config-spec.md
+++ b/docs/awf-config-spec.md
@@ -111,6 +111,7 @@ AWF settings MAY be supplied via config files, including stdin (`--config -`).
 - `apiProxy.maxRuns` → *(deprecated alias for `maxTurns`; maps to `AWF_MAX_RUNS`)*
 - `apiProxy.maxModelMultiplierCap` → `--max-model-multiplier-cap `
 - `apiProxy.maxPermissionDenied` → `--max-permission-denied `
+- `apiProxy.maxCacheMisses` → `--max-cache-misses `
 - `apiProxy.requestedModel` → *(config-only; maps to `AWF_REQUESTED_MODEL` for pre-startup validation)*
 - `apiProxy.modelFallback` → *(config-only; model fallback strategy)*
 - `apiProxy.modelRouter.providerType` → *(config-only; maps to `COPILOT_PROVIDER_TYPE`)*
@@ -954,6 +955,97 @@ apiProxy:
   maxPermissionDenied: 3 # stop run after 3 upstream 401/403 responses
 ```

## 11b. Cache-Miss Guard

*This section is normative.*

When `apiProxy.maxCacheMisses` is configured, the API proxy MUST stop
forwarding LLM requests to the upstream provider once it has observed the
configured number of consecutive cache misses (as defined in §11b.1). This
prevents runaway token spend when the prompt cache is broken or expired (e.g.,
mismatched cache keys, context-window overflow, or prompt drift).

### 11b.1 Counting Cache Misses

A response counts as a cache miss when **all** of the following are true:

- The response is a successful upstream completion (not a proxy-level error).
- `input_tokens > 0` (zero-input responses, such as empty tool calls, are
  excluded so they do not inflate the streak counter).
- `cache_read_tokens === 0` (no prompt-cache hit occurred).

A cache *hit* (`cache_read_tokens > 0`) resets the consecutive miss streak to
zero.

### 11b.2 Enforcement Behavior

The API proxy MUST enforce the cache-miss limit as follows:

1. **Post-response counting**: After receiving each successful upstream
   response, the proxy inspects the normalized token usage and increments or
   resets the miss streak counter according to §11b.1.

2. **Pre-request check**: Before forwarding each subsequent request to the
   upstream provider, the proxy checks whether the miss streak has reached or
   exceeded `maxCacheMisses`.

3. **Rejection**: When the limit has been reached or exceeded, the proxy MUST
   reject the request with:
   - **HTTP status**: `403 Forbidden`
   - **Content-Type**: `application/json`
   - **Response body**:
     ```json
     {
       "error": {
         "type": "max_cache_misses_exceeded",
         "message": "Maximum consecutive cache misses exceeded (3 / 3).",
         "consecutive_cache_misses": 3,
         "max_cache_misses": 3
       }
     }
     ```

4. **WebSocket rejection**: For WebSocket upgrade requests, the proxy MUST
   respond with `HTTP/1.1 403 Forbidden`, write the same JSON error body, and
   then destroy the socket.

5. **Finality**: Once the streak limit is reached, all subsequent requests in
   the same run MUST be rejected. Because rejected requests never reach the
   upstream provider, no cache hit can occur to reset the streak. Changing the
   value of `AWF_MAX_CACHE_MISSES` resets the streak counter.

### 11b.3 Introspection

The `/reflect` endpoint (available on all provider ports 10000–10003; see
§10.6) MUST report the current cache-miss guard state in a `cache_misses`
object:

```json
{
  "cache_misses": {
    "enabled": true,
    "max_cache_misses": 3,
    "consecutive_cache_misses": 1,
    "remaining_cache_misses": 2
  }
}
```

When `maxCacheMisses` is not configured, `enabled` MUST be `false`,
`max_cache_misses` MUST be `null`, `consecutive_cache_misses` MUST be `0`, and
`remaining_cache_misses` MUST be `null`.

### 11b.4 Configuration

`maxCacheMisses` is a positive integer. It is supplied via an AWF config file
(including stdin, `--config -`) or the `--max-cache-misses` CLI flag, and maps
to the `AWF_MAX_CACHE_MISSES` environment variable injected into the api-proxy
container.
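The counting, enforcement, introspection, and configuration rules of §11b can be condensed into a small state machine. The TypeScript below is a minimal sketch, assuming one in-process counter per api-proxy instance; the names `CacheMissGuard`, `parseMaxCacheMisses`, `recordResponse`, `checkRequest`, `setMax`, and `reflect` are hypothetical and not the api-proxy's actual API. Two behaviors are also assumptions the spec does not state: an invalid `AWF_MAX_CACHE_MISSES` value is treated as "not configured", and `remaining_cache_misses` is clamped at zero.

```typescript
// Hypothetical sketch of the §11b cache-miss guard; all names are
// illustrative and not taken from the api-proxy source.

interface TokenUsage {
  input_tokens: number;
  cache_read_tokens: number;
}

interface Rejection {
  status: 403;
  body: {
    error: {
      type: "max_cache_misses_exceeded";
      message: string;
      consecutive_cache_misses: number;
      max_cache_misses: number;
    };
  };
}

// Assumption: a missing or non-positive-integer value disables the guard.
function parseMaxCacheMisses(raw: string | undefined): number | null {
  if (raw === undefined) return null;
  const trimmed = raw.trim();
  return /^[1-9]\d*$/.test(trimmed) ? Number(trimmed) : null;
}

class CacheMissGuard {
  private max: number | null;
  private streak = 0;

  constructor(max: number | null) {
    this.max = max;
  }

  // §11b.1: call only for successful upstream completions.
  recordResponse(usage: TokenUsage): void {
    if (this.max === null) return; // guard disabled: streak stays at 0
    if (usage.cache_read_tokens > 0) {
      this.streak = 0; // a cache hit resets the streak
    } else if (usage.input_tokens > 0) {
      this.streak += 1; // non-empty input with no cache read is a miss
    }
    // Zero-input responses without a cache hit leave the streak unchanged.
  }

  // §11b.2 steps 2 and 3: null means "forward the request upstream".
  checkRequest(): Rejection | null {
    const max = this.max;
    if (max === null || this.streak < max) return null;
    return {
      status: 403,
      body: {
        error: {
          type: "max_cache_misses_exceeded",
          message: `Maximum consecutive cache misses exceeded (${this.streak} / ${max}).`,
          consecutive_cache_misses: this.streak,
          max_cache_misses: max,
        },
      },
    };
  }

  // §11b.2 step 5: a changed limit resets the streak.
  setMax(max: number | null): void {
    if (max !== this.max) {
      this.max = max;
      this.streak = 0;
    }
  }

  // §11b.3: the `cache_misses` object reported by /reflect.
  reflect() {
    return {
      enabled: this.max !== null,
      max_cache_misses: this.max,
      consecutive_cache_misses: this.streak,
      // Assumption: clamped at 0 in case concurrent requests overshoot.
      remaining_cache_misses:
        this.max === null ? null : Math.max(0, this.max - this.streak),
    };
  }
}

const guard = new CacheMissGuard(parseMaxCacheMisses("3"));
for (let i = 0; i < 3; i++) {
  guard.recordResponse({ input_tokens: 1200, cache_read_tokens: 0 });
}
const rejection = guard.checkRequest();
console.log(rejection === null ? "forward" : rejection.body.error.message);
// prints "Maximum consecutive cache misses exceeded (3 / 3)."
```

Because the check runs before forwarding while the count is updated only after a response arrives, concurrent in-flight requests can leave the streak above the limit. The sketch rejects whenever `streak >= max`, matching the "reached or exceeded" wording in §11b.2.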
**Example**:

```yaml
apiProxy:
  maxCacheMisses: 3 # stop run after 3 consecutive cache misses
```

## 12. Model Multiplier Cap

*This section is normative.*