92 changes: 92 additions & 0 deletions docs/awf-config-spec.md
@@ -111,6 +111,7 @@ AWF settings MAY be supplied via config files, including stdin (`--config -`).
- `apiProxy.maxRuns` → *(deprecated alias for `maxTurns`; maps to `AWF_MAX_RUNS`)*
- `apiProxy.maxModelMultiplierCap` → `--max-model-multiplier-cap <number>`
- `apiProxy.maxPermissionDenied` → `--max-permission-denied <number>`
- `apiProxy.maxCacheMisses` → `--max-cache-misses <number>`
- `apiProxy.requestedModel` → *(config-only; maps to `AWF_REQUESTED_MODEL` for pre-startup validation)*
- `apiProxy.modelFallback` → *(config-only; model fallback strategy)*
- `apiProxy.modelRouter.providerType` → *(config-only; maps to `COPILOT_PROVIDER_TYPE`)*
@@ -954,6 +955,97 @@ apiProxy:
  maxPermissionDenied: 3 # stop run after 3 upstream 401/403 responses
```

## 11b. Cache-Miss Guard

*This section is normative.*

When `apiProxy.maxCacheMisses` is configured, the API proxy MUST halt further
LLM requests after the configured number of consecutive responses that had no
prompt-cache hits, preventing runaway token spend caused by a broken or expired
cache (e.g., mismatched cache keys, context window overflow, or prompt drift).

### 11b.1 Counting Cache Misses

A cache miss is counted for a response when **all** of the following are true:

- The response is a successful upstream completion (not a proxy-level error).
- `input_tokens > 0` (zero-input responses such as empty tool calls are
excluded so they do not inflate the streak counter).
- `cache_read_tokens === 0` (no prompt-cache hit occurred).

A cache *hit* (`cache_read_tokens > 0`) resets the consecutive miss streak to
zero.
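
The counting rule can be sketched as a small state update. This is a minimal illustration rather than the proxy's actual code: the function name `update_miss_streak` and its parameters are hypothetical, and it assumes the caller invokes it only for successful upstream completions (proxy-level errors are skipped before this point).

```python
def update_miss_streak(streak: int, input_tokens: int, cache_read_tokens: int) -> int:
    """Return the new consecutive-miss streak after one successful upstream response.

    Hypothetical helper; the caller is assumed to have filtered out proxy-level errors.
    """
    if cache_read_tokens > 0:
        return 0            # cache hit: reset the streak
    if input_tokens > 0:
        return streak + 1   # non-empty request with no cache read: count a miss
    return streak           # zero-input response: neither a hit nor a miss


# Two misses, one zero-input response (ignored), then a hit that resets the streak.
streak = 0
for input_tokens, cache_read_tokens in [(1200, 0), (900, 0), (0, 0), (1500, 800)]:
    streak = update_miss_streak(streak, input_tokens, cache_read_tokens)
```

Keeping the update a pure function of the previous streak and the normalized usage makes the three conditions easy to test in isolation.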

### 11b.2 Enforcement Behavior

The API proxy MUST enforce the cache-miss limit as follows:

1. **Post-response counting**: After receiving each successful upstream
response, the proxy inspects the normalized token usage and increments or
resets the miss streak counter.

2. **Pre-request check**: Before forwarding each subsequent request to the
upstream provider, the proxy checks whether the miss streak has reached or
exceeded `maxCacheMisses`.

3. **Rejection**: When the limit is reached or exceeded, the proxy MUST reject
   the request with:
   - **HTTP status**: `403 Forbidden`
   - **Content-Type**: `application/json`
   - **Response body**:
     ```json
     {
       "error": {
         "type": "max_cache_misses_exceeded",
         "message": "Maximum consecutive cache misses exceeded (3 / 3).",
         "consecutive_cache_misses": 3,
         "max_cache_misses": 3
       }
     }
     ```

4. **WebSocket rejection**: For WebSocket upgrade requests, the proxy MUST
reject with `HTTP/1.1 403 Forbidden` and include the same JSON error body
before destroying the socket.

5. **Finality**: Once the streak limit is reached, all subsequent requests in
the same run MUST be rejected. Changing `AWF_MAX_CACHE_MISSES` resets the
streak counter.
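
The pre-request check and rejection payload might look like the following sketch. The function name `check_cache_miss_limit` and the tuple return shape are hypothetical, and the `(streak / max)` layout of the message is an assumption inferred from the example body in step 3.

```python
import json
from typing import Optional, Tuple


def check_cache_miss_limit(
    streak: int, max_cache_misses: Optional[int]
) -> Optional[Tuple[int, dict, str]]:
    """Return (status, headers, body) for a rejection, or None to forward the request.

    Hypothetical helper; max_cache_misses is None when the guard is not configured.
    """
    if max_cache_misses is None or streak < max_cache_misses:
        return None  # guard disabled or limit not yet reached
    body = {
        "error": {
            "type": "max_cache_misses_exceeded",
            # Message layout assumed from the spec's example: "(<streak> / <max>)".
            "message": (
                "Maximum consecutive cache misses exceeded "
                f"({streak} / {max_cache_misses})."
            ),
            "consecutive_cache_misses": streak,
            "max_cache_misses": max_cache_misses,
        }
    }
    return 403, {"Content-Type": "application/json"}, json.dumps(body)
```

Because a rejected request is never forwarded upstream, no later response can reset the streak, which is what makes the limit final for the run.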

### 11b.3 Introspection

The `/reflect` endpoint (available on all provider ports 10000–10003; see
§10.6) MUST include the current cache-miss guard state:

```json
{
  "cache_misses": {
    "enabled": true,
    "max_cache_misses": 3,
    "consecutive_cache_misses": 1,
    "remaining_cache_misses": 2
  }
}
```

When `maxCacheMisses` is not configured, the `enabled` field MUST be `false`,
`max_cache_misses` MUST be `null`, `consecutive_cache_misses` MUST be `0`, and
`remaining_cache_misses` MUST be `null`.
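
As an illustration, the `/reflect` fragment could be assembled as below. The function name `cache_miss_reflect_state` is hypothetical. Computing `remaining_cache_misses` as `max_cache_misses - consecutive_cache_misses` matches the example above (3 - 1 = 2); clamping it at zero is an assumption, since the spec does not say what is reported once the limit is passed.

```python
from typing import Optional


def cache_miss_reflect_state(max_cache_misses: Optional[int], streak: int) -> dict:
    """Build the cache-miss fragment of the /reflect response (hypothetical helper)."""
    if max_cache_misses is None:
        # Guard not configured: fixed values required by the spec.
        return {
            "cache_misses": {
                "enabled": False,
                "max_cache_misses": None,
                "consecutive_cache_misses": 0,
                "remaining_cache_misses": None,
            }
        }
    return {
        "cache_misses": {
            "enabled": True,
            "max_cache_misses": max_cache_misses,
            "consecutive_cache_misses": streak,
            # Assumption: never report a negative remainder.
            "remaining_cache_misses": max(max_cache_misses - streak, 0),
        }
    }
```

Serializing this dictionary with `json.dumps` turns `None` into `null` and `False` into `false`, as the unconfigured case requires.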

### 11b.4 Configuration

`maxCacheMisses` is a positive integer. It is supplied either as
`apiProxy.maxCacheMisses` in the AWF config file (including config read from
stdin) or via the `--max-cache-misses` CLI flag, and maps to the
`AWF_MAX_CACHE_MISSES` environment variable injected into the api-proxy
container.

**Example**:

```yaml
apiProxy:
  maxCacheMisses: 3 # stop run after 3 consecutive cache misses
```
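
On the proxy side, reading the injected variable might be sketched as follows. The function name `read_max_cache_misses` is hypothetical, and raising an error on a non-positive or non-integer value is an assumption; the spec states only that the value is a positive integer and does not define how invalid input is handled.

```python
import os
from typing import Mapping, Optional


def read_max_cache_misses(env: Mapping[str, str] = os.environ) -> Optional[int]:
    """Parse AWF_MAX_CACHE_MISSES; return None when the guard is not configured."""
    raw = env.get("AWF_MAX_CACHE_MISSES")
    if raw is None or raw.strip() == "":
        return None  # unset or empty: guard disabled
    try:
        value = int(raw)
    except ValueError:
        # Assumed behavior: reject non-integer input instead of ignoring it.
        raise ValueError(
            f"AWF_MAX_CACHE_MISSES must be a positive integer, got {raw!r}"
        ) from None
    if value <= 0:
        raise ValueError(
            f"AWF_MAX_CACHE_MISSES must be a positive integer, got {raw!r}"
        )
    return value
```

Passing the environment as a mapping keeps the parser testable without modifying the real process environment.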

## 12. Model Multiplier Cap

*This section is normative.*