Commit 9116e8b

github-actions[bot], Copilot, and Copilot authored
fix: propagate config fields to all layers (#5279)
* fix(spec): add maxCacheMisses to Section 5 CLI mapping and add §11b spec

  PR #5202 added the maxCacheMisses guardrail with full implementation:
  - JSON schemas (src + docs)
  - TypeScript type (RateLimitOptions)
  - config-file.ts mapping
  - CLI option --max-cache-misses
  - AWF_MAX_CACHE_MISSES env var wiring

  But docs/awf-config-spec.md was not updated with:
  1. Section 5 CLI Mapping entry for apiProxy.maxCacheMisses → --max-cache-misses
  2. Behavioral spec section (§11b) describing counting rules, enforcement,
     introspection format, and configuration

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(spec): align cache-miss guard wording and reflect key

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
1 parent 733dad1 commit 9116e8b

1 file changed

Lines changed: 91 additions & 0 deletions

docs/awf-config-spec.md

@@ -955,6 +955,97 @@ apiProxy:
  maxPermissionDenied: 3 # stop run after 3 upstream 401/403 responses
```

## 11b. Cache-Miss Guard

*This section is normative.*

When `apiProxy.maxCacheMisses` is configured, the API proxy MUST halt further
LLM requests after the configured number of consecutive responses that had no
prompt-cache hits, preventing runaway token spend caused by a broken or expired
cache (e.g., mismatched cache keys, context window overflow, or prompt drift).

### 11b.1 Counting Cache Misses

A cache miss is counted for a response when **all** of the following are true:

- The response is a successful upstream completion (not a proxy-level error).
- `input_tokens > 0` (zero-input responses such as empty tool calls are
  excluded so they do not inflate the streak counter).
- `cache_read_tokens === 0` (no prompt-cache hit occurred).

A cache *hit* (`cache_read_tokens > 0`) resets the consecutive miss streak to
zero.

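The counting rules above can be sketched as a small pure function. This is an illustrative sketch, not the proxy's actual code: `TokenUsage`, `nextMissStreak`, and the `upstreamSucceeded` flag are hypothetical names, and it assumes that a response that is neither a miss nor a hit (a proxy-level error or a zero-input response) leaves the streak unchanged.

```typescript
// Normalized token usage; field names follow the spec text,
// but the interface itself is a hypothetical shape.
interface TokenUsage {
  input_tokens: number;
  cache_read_tokens: number;
}

// Hypothetical helper: returns the consecutive-miss streak after one response.
function nextMissStreak(
  streak: number,
  upstreamSucceeded: boolean,
  usage: TokenUsage,
): number {
  // Proxy-level errors are not successful upstream completions: not counted.
  if (!upstreamSucceeded) {
    return streak;
  }
  // A cache hit resets the consecutive miss streak to zero.
  if (usage.cache_read_tokens > 0) {
    return 0;
  }
  // Zero-input responses are excluded so they do not inflate the streak.
  // Assumption: they leave the streak unchanged.
  if (usage.input_tokens <= 0) {
    return streak;
  }
  // Successful, input_tokens > 0, cache_read_tokens === 0: a cache miss.
  return streak + 1;
}
```

Keeping the function free of side effects makes each of the three conditions easy to unit-test in isolation.
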
### 11b.2 Enforcement Behavior

The API proxy MUST enforce the cache-miss limit as follows:

1. **Post-response counting**: After receiving each successful upstream
   response, the proxy inspects the normalized token usage and increments or
   resets the miss streak counter.

2. **Pre-request check**: Before forwarding each subsequent request to the
   upstream provider, the proxy checks whether the miss streak has reached or
   exceeded `maxCacheMisses`.

3. **Rejection**: When the limit is reached or exceeded, the proxy MUST reject
   the request with:
   - **HTTP status**: `403 Forbidden`
   - **Content-Type**: `application/json`
   - **Response body**:
     ```json
     {
       "error": {
         "type": "max_cache_misses_exceeded",
         "message": "Maximum consecutive cache misses exceeded (3 / 3).",
         "consecutive_cache_misses": 3,
         "max_cache_misses": 3
       }
     }
     ```

4. **WebSocket rejection**: For WebSocket upgrade requests, the proxy MUST
   reject with `HTTP/1.1 403 Forbidden` and include the same JSON error body
   before destroying the socket.

5. **Finality**: Once the streak limit is reached, all subsequent requests in
   the same run MUST be rejected. The streak counter is not reset for the
   remainder of the run.

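The pre-request check and rejection payload (steps 2 and 3) might look like the following sketch. The function name `checkCacheMissLimit` and the `CacheMissRejection` type are hypothetical; only the status code, content type, error `type`, and body field names come from the spec text above, and the message format is inferred from its single example.

```typescript
// Hypothetical shape of a rejection produced by the guard.
interface CacheMissRejection {
  status: number;
  contentType: string;
  body: {
    error: {
      type: string;
      message: string;
      consecutive_cache_misses: number;
      max_cache_misses: number;
    };
  };
}

// Hypothetical helper: returns null when the request may be forwarded,
// or the rejection to send when the streak has reached or exceeded the limit.
// A null limit models "maxCacheMisses is not configured".
function checkCacheMissLimit(
  streak: number,
  maxCacheMisses: number | null,
): CacheMissRejection | null {
  if (maxCacheMisses === null || streak < maxCacheMisses) {
    return null;
  }
  return {
    status: 403,
    contentType: "application/json",
    body: {
      error: {
        type: "max_cache_misses_exceeded",
        // Message format inferred from the example in the spec.
        message: `Maximum consecutive cache misses exceeded (${streak} / ${maxCacheMisses}).`,
        consecutive_cache_misses: streak,
        max_cache_misses: maxCacheMisses,
      },
    },
  };
}
```

Because no request is forwarded once the check fails, no later response can reset the streak, which is what makes the rejection final for the rest of the run.
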
### 11b.3 Introspection

The `/reflect` endpoint (available on all provider ports 10000–10003; see
§10.6) MUST include the current cache-miss guard state:

```json
{
  "cache_misses": {
    "enabled": true,
    "max_cache_misses": 3,
    "consecutive_cache_misses": 1,
    "remaining_cache_misses": 2
  }
}
```

When `maxCacheMisses` is not configured, the `enabled` field MUST be `false`,
`max_cache_misses` MUST be `null`, `consecutive_cache_misses` MUST be `0`, and
`remaining_cache_misses` MUST be `null`.

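As a sketch of how the `/reflect` fragment could be assembled, the helper below builds both the enabled and the disabled form. `cacheMissState` and `CacheMissState` are hypothetical names. Computing `remaining_cache_misses` as the limit minus the streak matches the example above (3 − 1 = 2); clamping it at zero is an assumption, since the spec does not show a value for an exhausted guard.

```typescript
// Hypothetical type for the guard state reported by /reflect.
interface CacheMissState {
  enabled: boolean;
  max_cache_misses: number | null;
  consecutive_cache_misses: number;
  remaining_cache_misses: number | null;
}

// Hypothetical helper: builds the "cache_misses" fragment of the /reflect payload.
// A null limit models "maxCacheMisses is not configured".
function cacheMissState(
  maxCacheMisses: number | null,
  streak: number,
): { cache_misses: CacheMissState } {
  if (maxCacheMisses === null) {
    // Disabled form required by the spec.
    return {
      cache_misses: {
        enabled: false,
        max_cache_misses: null,
        consecutive_cache_misses: 0,
        remaining_cache_misses: null,
      },
    };
  }
  return {
    cache_misses: {
      enabled: true,
      max_cache_misses: maxCacheMisses,
      consecutive_cache_misses: streak,
      // Assumption: never report a negative remainder.
      remaining_cache_misses: Math.max(0, maxCacheMisses - streak),
    },
  };
}
```
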
### 11b.4 Configuration

`maxCacheMisses` is a positive integer. It is supplied via the AWF config file
(stdin config) or the `--max-cache-misses` CLI flag, and maps to the
`AWF_MAX_CACHE_MISSES` environment variable injected into the api-proxy
container.

**Example**:

```yaml
apiProxy:
  maxCacheMisses: 3 # stop run after 3 consecutive cache misses
```

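On the container side, the value of `AWF_MAX_CACHE_MISSES` has to be validated as a positive integer. The sketch below shows one way to do that. `parseMaxCacheMisses` is a hypothetical helper, and both treating an unset or empty variable as "guard disabled" and throwing on invalid input are assumptions rather than documented behavior.

```typescript
// Hypothetical helper: parses the raw AWF_MAX_CACHE_MISSES value.
// Returns null when the variable is unset or empty (guard disabled),
// and throws when the value is not a positive integer.
function parseMaxCacheMisses(raw: string | undefined): number | null {
  const text = (raw === undefined ? "" : raw).trim();
  if (text === "") {
    return null;
  }
  // Accept plain decimal digits only (no sign, fraction, or exponent).
  if (!/^[0-9]+$/.test(text)) {
    throw new Error(`AWF_MAX_CACHE_MISSES must be a positive integer, got "${text}"`);
  }
  const value = Number(text);
  // Reject zero and values too large to represent exactly.
  if (value < 1 || value > 9007199254740991) {
    throw new Error(`AWF_MAX_CACHE_MISSES must be a positive integer, got "${text}"`);
  }
  return value;
}
```

A caller would pass the environment value in, for example `parseMaxCacheMisses(process.env.AWF_MAX_CACHE_MISSES)` under Node.js, and feed the result to the guard.
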
## 12. Model Multiplier Cap

*This section is normative.*
