Commit 8fae953

Merge pull request #64 from AlphaBitCore/chore/sync-8-internal-main
chore(repo): sync internal main → develop (sync 8)
2 parents a07dd99 + f62e845 commit 8fae953

194 files changed

Lines changed: 10883 additions & 5152 deletions

CHANGELOG.md

Lines changed: 148 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,92 @@ All notable changes to this project are documented here. The format follows
66

77
## [Unreleased]
88

9-
_Nothing yet._
9+
### Changed (BREAKING — major version bump)
10+
- **Hook `onMatch` collapses to a single `action` (approve | redact | block).**
11+
The orthogonal `onMatch.inflightAction` (approve / block-hard / block-soft /
12+
redact) × `onMatch.storageAction` (keep / redact / drop-content) pair is
13+
replaced by one `action` field across the AI Gateway, Compliance Proxy, and
14+
Agent. `redact` rewrites the payload (the same masked body is forwarded,
15+
returned, and stored); `block` rejects and stores the policy attribution
16+
(matched rule, reason, compliance tags) — not a content body, since a blocked
17+
request never produces a masked wire copy; `approve` forwards and stores as-is.
18+
A redact whose adapter cannot reverse-encode the masked content onto the wire
19+
(`ErrRewriteUnsupported`) fails **closed** (the request/response is rejected,
20+
not forwarded unredacted). Soft-block (HTTP 246) is removed — block-soft folds
21+
into block (HTTP 403). The canonical normalized projection is **no longer
22+
persisted** for audit; the control plane recomputes it at view time from the
23+
(already-redacted) raw body, so `request_normalized` / `response_normalized`
24+
and `request_redaction_spans` / `response_redaction_spans` are no longer
25+
emitted.
26+
**Migration:** the config reader maps the legacy keys for a deprecation window
27+
(one-shot warning); a one-off data migration
28+
(`tools/db-migrate/manual-scripts/migrate_hook_onmatch_action_2026_06_22.sql`)
29+
rewrites stored `HookConfig.config.onMatch` rows:
30+
`block-hard|block-soft → block`, `redact → redact`,
31+
`approve + keep → approve`, `approve + redact|drop-content → redact`.
32+
Runtime enforcement is unchanged by the mapping: `block-soft` already **rejected**
33+
the request — it returned an error response (previously with the non-standard
34+
status 246, now 403) and never forwarded the traffic, so this is a status-code
35+
change, not an allow→deny change. The only data-level behavior change is
36+
`approve + redact|drop-content → redact`, which upgrades a storage-only redact to
37+
a full redact (the compliance-safe direction, never less masked than before) and
38+
occurs in no current row, so the live migration is lossless. Client note: any SDK
39+
that branched on the soft-block status 246 must now treat such a rule's response
40+
as a 403 reject. The Agent signals a block by dropping the
41+
connection (no rich error body); the proxies return an attributed 403 whose
42+
response-stage reason carries rule-ID labels only, never the upstream value.
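The legacy-to-new mapping listed in the migration note can be sketched as a small pure function. This is only an illustration of the documented table, not the shipped config reader or the SQL script: the function name `migrateOnMatch` and the error returned for an unlisted pair are assumptions.

```go
package main

import "fmt"

// migrateOnMatch maps a legacy (inflightAction, storageAction) pair onto the
// single action field, following the mapping in the changelog entry.
// Hypothetical helper: the name and the error for unlisted pairs are assumptions.
func migrateOnMatch(inflight, storage string) (string, error) {
	switch inflight {
	case "block-hard", "block-soft":
		// Both legacy block variants fold into block, whatever the storage action.
		return "block", nil
	case "redact":
		return "redact", nil
	case "approve":
		switch storage {
		case "keep":
			return "approve", nil
		case "redact", "drop-content":
			// A storage-only redact is upgraded to a full redact.
			return "redact", nil
		}
	}
	return "", fmt.Errorf("unmapped legacy onMatch pair: %q + %q", inflight, storage)
}

func main() {
	pairs := [][2]string{
		{"block-soft", "keep"},
		{"redact", "redact"},
		{"approve", "keep"},
		{"approve", "drop-content"},
	}
	for _, p := range pairs {
		action, err := migrateOnMatch(p[0], p[1])
		fmt.Println(p[0], "+", p[1], "->", action, err)
	}
}
```

Because the mapping never produces a less restrictive action than the legacy pair, a reader built this way stays on the compliance-safe side during the deprecation window.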
43+
### Changed — normalized projection is now fully view-time (no migration required)
44+
45+
- **The normalized traffic projection is no longer written on the hot path; it is
46+
recomputed at view time.** Building on 1.1.0 (where the producers stopped
47+
stamping it), this completes the move end-to-end: the Hub no longer
48+
self-derives the projection from agent uploads, and the periodic
49+
**normalize-backfill job is retired**. The Control Plane (and the Agent
50+
dashboard) recompute the normalized request/response on demand — when an
51+
operator opens a Traffic detail drawer — from the stored, already-redacted
52+
body, so the rendered projection always reflects the current decoder version
53+
with no scheduled job and no stored copy to drift.
54+
- **`traffic_event_normalized` and `traffic_event_normalize_skip` are retained,
55+
write-frozen.** No schema change and **no migration is required.** The
56+
`traffic_event_normalized` sidecar still receives a row only when an older
57+
shipped agent uploads its own governed normalized copy — for a block/redact
58+
row whose raw body was dropped, that uploaded copy is the sole forensic
59+
record. The `traffic_event_normalize_skip` ledger is now inert (the job that
60+
wrote it is gone). Dropping both tables is a planned deprecation-window
61+
follow-up, not part of this change.
62+
- **`GET /api/admin/traffic/{id}/normalized`** now returns the recompute and no
63+
longer includes redaction spans (the recompute reads an already-redacted
64+
body). It returns `404` when the projection is unavailable — no stored body
65+
to recompute from (payload capture was off, or a spilled body has aged out of
66+
retention) and no stored sidecar fallback.
67+
- **Operators:** the `nexus_normalize_backfill_*` counters are no longer
68+
emitted. A missing/NULL `traffic_event_normalized` sidecar is now the normal
69+
state for current traffic, not a gap to heal.
70+
71+
### Changed — streaming-compliance enforcement (config-compatible, no migration)
72+
73+
- **Streaming response compliance is scope-routed, and the real-time path is
74+
audit-only.** A response hook's enforcement scope decides how a streamed (SSE)
75+
response is handled, overriding the admin streaming-mode default wherever that
76+
default cannot enforce:
77+
- A **block** scope buffers the full response before any byte is delivered
78+
(zero-leak hard block).
79+
- A **redact** scope under `chunked_async` streams in real time behind a prescan
80+
gate that holds a bounded trailing window and escalates to buffered redaction on
81+
a confirmed match — best-effort on the wire: a complete sensitive value is never
82+
delivered, but a leading fragment of a value longer than the window may reach the
83+
client before redaction engages, while the persisted audit copy stays fully
84+
masked within that window. A redact scope under `passthrough` falls back to
85+
buffering rather than forwarding raw.
86+
- A **non-enforcing** pipeline streams in real time, audit-only: it scans and tags
87+
every checkpoint but never blocks or rewrites the wire.
88+
- An **unbuildable fail-closed** response hook forces buffering, which fails closed
89+
with an in-band error frame — never a silent fail-open on the real-time path.
90+
- **The streamed `finish_reason` is preserved** across the canonical re-encode
91+
instead of collapsing to `stop`.
92+
- The `streaming_compliance.config` mode enum (`passthrough` / `buffer_full_block` /
93+
`chunked_async`) is unchanged; no migration. The Control Plane UI shows an
94+
always-visible per-mode disclosure of exactly what each mode enforces.
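The scope routing described in the bullets above can be read as one decision function. The sketch below is a hedged illustration, not the gateway's code: the type names, the `route` function, and the handling labels are invented for the example. Two cases the changelog does not spell out are filled in as labeled assumptions: a redact scope under `buffer_full_block` is assumed to buffer, and a non-enforcing pipeline is assumed to stream audit-only under every mode.

```go
package main

import "fmt"

// Scope is the enforcement scope of the response hooks (hypothetical type).
type Scope string

// Mode mirrors the streaming_compliance.config mode enum.
type Mode string

// Handling is the resulting treatment of a streamed response (hypothetical labels).
type Handling string

const (
	ScopeBlock  Scope = "block"
	ScopeRedact Scope = "redact"
	ScopeNone   Scope = "none" // non-enforcing pipeline

	ModePassthrough     Mode = "passthrough"
	ModeBufferFullBlock Mode = "buffer_full_block"
	ModeChunkedAsync    Mode = "chunked_async"

	HandleBuffer      Handling = "buffer"
	HandlePrescanGate Handling = "realtime-prescan-gate"
	HandleAuditOnly   Handling = "realtime-audit-only"
)

// route picks the handling for a streamed response from the hook scope,
// the admin mode, and whether a fail-closed hook could not be built.
func route(scope Scope, mode Mode, unbuildableFailClosed bool) Handling {
	if unbuildableFailClosed {
		// Forces buffering so the failure surfaces as an in-band error frame.
		return HandleBuffer
	}
	switch scope {
	case ScopeBlock:
		return HandleBuffer // zero-leak hard block
	case ScopeRedact:
		if mode == ModeChunkedAsync {
			return HandlePrescanGate
		}
		// passthrough falls back to buffering; buffer_full_block is
		// assumed to buffer as well (not stated in the changelog).
		return HandleBuffer
	}
	// Assumption: non-enforcing pipelines stream audit-only under every mode.
	return HandleAuditOnly
}

func main() {
	fmt.Println(route(ScopeBlock, ModePassthrough, false))
	fmt.Println(route(ScopeRedact, ModeChunkedAsync, false))
	fmt.Println(route(ScopeRedact, ModePassthrough, false))
	fmt.Println(route(ScopeNone, ModeChunkedAsync, false))
	fmt.Println(route(ScopeNone, ModeChunkedAsync, true))
}
```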
1095

1196
## [1.1.0] — 2026-06-28
1297

@@ -74,6 +159,45 @@ target.
74159

75160
### Changed — defaults (overridable, no migration required)
76161

162+
### Changed (defaults — overridable, no migration required)
163+
These flip shipped behavior toward higher throughput; each is overridable by env
164+
or yaml and an upgrade silently inherits the new default. Operators relying on the
165+
prior strictness should set the opt-out shown.
166+
- **Quota enforcement is soft by default (`NEXUS_QUOTA_WRITE_BEHIND` ON).** Per-
167+
request quota cost is accumulated in-process and flushed to Redis on a 250ms
168+
interval behind a 1s read cache, instead of a synchronous per-request Redis
169+
round-trip. Overshoot per instance ≤ ~1.25s of spend; across an N-instance fleet
170+
the blind-spend window is that × N, and a hard kill loses un-flushed increments
171+
(graceful shutdown drains). Opt out: `NEXUS_QUOTA_WRITE_BEHIND=0` (strict
172+
synchronous per-request accounting).
173+
- **Credential-stats write-behind ON by default (`NEXUS_CREDSTATS_WRITE_BEHIND`).**
174+
Credential usage counters defer off the request path; circuit-breaker
175+
transitions stay synchronous. Opt out: `NEXUS_CREDSTATS_WRITE_BEHIND=0`.
176+
- **Audit overflow default `AI_GATEWAY_AUDIT_LOSS_MODE=spill`.** The request path no
177+
longer back-pressures on a full audit pipeline; overflow spills to a durable
178+
on-disk spool replayed to Postgres. No loss until the spill channel + disk
179+
saturate; sustained overload past that drops records, counted on `dropped_total`.
180+
Opt out for strict no-drop back-pressure: `AI_GATEWAY_AUDIT_LOSS_MODE=block`.
181+
- **`NEXUS_EVENTS` audit stream is in-memory by default (`NEXUS_EVENTS_STORAGE=memory`,
182+
`DiscardNew`, cap `NEXUS_EVENTS_MAX_BYTES=auto` = 15% RAM).** Keeps the
183+
delay-tolerant burst buffer off the data disk. A NATS broker restart/crash drops
184+
published-but-undrained events (the overflow→disk no-loss path covers only the
185+
stream-full case). Opt out for a durable file-backed stream:
186+
`NEXUS_EVENTS_STORAGE=file`.
187+
- **`GOMEMLIMIT` auto-set from the cgroup limit when unset.** Each service, if
188+
`GOMEMLIMIT` is not provided, reads the cgroup memory limit at boot and sets the
189+
Go soft limit to ~70% of it (logging a WARN with the value), leaving it unset
190+
when no cgroup limit is detectable. Pin explicitly to override.
191+
- **Cache freshness protection defaults ON (`extract_cache_config.apply_freshness_rules`
192+
default `false → true`).** Freshness protection is intrinsic to caching: enabling a
193+
cache tier should not silently replay a stale time-sensitive answer (today's date,
194+
"latest" prices, live status). The freshness detector only runs when a cache tier is
195+
active, so a cache-off gateway still pays nothing and stays a lean passthrough. The
196+
flip applies to fresh installs and the no-row default; an existing deployment that
197+
already saved an `extract_cache_config` row keeps its stored value, so **no migration
198+
runs and no admin choice is overwritten**. Operators who already enabled L1/L2 and
199+
want freshness should re-save the extract-cache config (or toggle the Freshness rules
200+
card) once; operators who want maximum hit-rate can leave it off explicitly.
77201
Each default below flips shipped behavior toward higher throughput. An upgrade
78202
silently inherits the new value; the opt-out to restore prior behavior is shown.
79203

@@ -171,6 +295,29 @@ silently inherits the new value; the opt-out to restore prior behavior is shown.
171295
- The in-tree load generator (`tools/loadtest`) was extracted to the standalone
172296
`nexus-loadtest` repository.
173297

298+
### Fixed (gateway response cache correctness)
299+
- **Emergency cache master kill switch is now wired into the data plane.**
300+
`cache_master_kill_switch` (the Tier-1 global cache config) was parsed but never
301+
consulted by the AI Gateway, so flipping it did nothing. It now gates both gateway
302+
response cache tiers — L1 exact-match and L2 semantic — at the cache stage
303+
(`cacheEnabled = (l1||l2) && !cache_master_kill_switch`). It does not disable
304+
provider-side prompt caching (Anthropic markers / Gemini context cache), which only
305+
makes the upstream provider cache the prompt and never serves a stored gateway response.
306+
- **L1 exact-match cache fills regardless of the `cache.broker` flag.** With
307+
`cache.broker=false` (the default) the broker registry was never constructed and the
308+
broker pump is the cache's sole writer, so an admin-enabled L1 tier silently never
309+
filled (0% hit rate). The registry is now always constructed; `cache.broker` controls
310+
only same-key in-flight dedup (coalesce concurrent same-key MISSes onto one upstream
311+
call vs. independent calls) — either way the cache fills.
312+
- **L1 cache no longer serves cross-VK entries during the boot window or on
313+
Sentinel/Cluster Redis.** L1 folds the fleet `vary_by` isolation scope into its cache
314+
key, but that scope arrives on the semantic-cache config push. Before the first push
315+
the scope was unset (fleet-wide), so an entry written in that window could be read by
316+
a different virtual key; and on Sentinel/Cluster Redis the semantic config was never
317+
delivered to the gateway at all. L1 now fails closed (no lookup/store) until the fleet
318+
config has loaded, and the config snapshot (including `vary_by`) is delivered on every
319+
Redis topology — decoupled from the `*redis.Client`-only index lifecycle.
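The two gating rules from these fixes (the kill switch over both tiers, and L1 failing closed until the fleet config arrives) can be sketched together. This is an assumption-heavy illustration: `cacheGate`, `fleetConfig`, and `l1Usable` are invented names, and only the `cacheEnabled` expression comes from the changelog entry itself.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// fleetConfig stands in for the pushed semantic-cache config (hypothetical type).
type fleetConfig struct {
	VaryBy string
}

// cacheGate holds the inputs to the cache-stage decision (hypothetical type).
type cacheGate struct {
	l1, l2     bool
	killSwitch bool                        // cache_master_kill_switch
	fleet      atomic.Pointer[fleetConfig] // nil until the first config push
}

// cacheEnabled mirrors the expression quoted in the changelog:
// cacheEnabled = (l1||l2) && !cache_master_kill_switch.
func (g *cacheGate) cacheEnabled() bool {
	return (g.l1 || g.l2) && !g.killSwitch
}

// l1Usable fails closed: no lookup or store until the fleet config,
// which carries the vary_by isolation scope, has loaded.
func (g *cacheGate) l1Usable() bool {
	return g.l1 && !g.killSwitch && g.fleet.Load() != nil
}

func main() {
	g := &cacheGate{l1: true}
	fmt.Println(g.cacheEnabled(), g.l1Usable()) // tier on, but L1 closed before the push

	g.fleet.Store(&fleetConfig{VaryBy: "virtual_key"})
	fmt.Println(g.cacheEnabled(), g.l1Usable()) // config loaded, L1 open

	g.killSwitch = true
	fmt.Println(g.cacheEnabled(), g.l1Usable()) // kill switch closes everything
}
```

Keeping the scope behind an `atomic.Pointer` lets a nil value double as the "not yet loaded" signal, so the boot window cannot be confused with a deliberately fleet-wide scope.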
320+
174321
## [1.0.0] — 2026-06-14
175322

176323
First general-availability release. All three intercept planes (AI Gateway,

docs/developers/architecture/cross-cutting/foundation/configuration-architecture.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -166,7 +166,7 @@ Type A = `state` is the config payload (callback applies directly). Type B = `st
166166
| `payload_capture` | A (agent) / B (ai-gateway, compliance-proxy) | ai-gateway, compliance-proxy, agent | agent: `{enabled: bool}`; server: null (receivers re-read from `system_metadata['payload_capture.config']`) |
167167
| `observability` | B (everywhere) | nexus-hub, control-plane, ai-gateway, compliance-proxy | null (receivers re-read from `system_metadata['observability.config']`) |
168168
| `response_cache.time_sensitive_patterns` | A | ai-gateway | Cluster-wide freshness rule list |
169-
| `semantic_cache.config` | A | ai-gateway | Fleet-wide L1 embedding singleton config |
169+
| `semantic_cache.config` | A | ai-gateway | Fleet-wide L1 embedding singleton config (`vary_by` / `enabled` / `threshold`), hot-swapped into the in-process `SemanticConfigCache` independently of semantic-index lifecycle so the fleet config applies even when the index is not yet ready |
170170
| `response_cache.extract_config` | A | ai-gateway | L1 extract cache fleet config (atomic.Pointer hot-swap) |
171171
| `providers` | B | ai-gateway | null — receiver reloads provider snapshot |
172172
| `models` | B | ai-gateway | null — receiver reloads model catalog |

docs/developers/architecture/cross-cutting/foundation/jobs-architecture.md

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -207,7 +207,6 @@ Integrity + observability of the admin audit pipeline.
207207
|---|---|---|---|
208208
| `audit-chain-verify` | `defs/audit/audit_chain_verify.go` | (cfg, RunOnStart) | Walks the `AdminAuditLog` hash chain (`previousHash` / `integrityHash`) and reports tamper detection at ERROR level. |
209209
| `audit-freshness-check` | `defs/audit/audit_freshness_check.go` | 60 sec | Alarms when the most recent admin audit row is older than 5 min — catches the silent-stall failure class where the MQ consumer pulled the message but the INSERT failed. |
210-
| `normalize-backfill` | `defs/audit/normalize_backfill.go` | 5 min | Re-runs normalize against the raw request/response bytes for `traffic_event_normalized` rows whose sidecar is missing, all-NULL, **or stamped with a `normalize_version` other than the current schema version** — bumping `normcore.SchemaVersion` heals every historical row through this one mechanism (≈200 rows / 5-min tick, newest-first). Inline bodies are read directly; ref-only spilled bodies are fetched from the hub `SpillStore` (64 MiB read cap). Rows that cannot be filled are recorded in the `traffic_event_normalize_skip` ledger (`reason` ∈ `spill_ref_only` (no spill store wired) / `spill_fetch_failed` / `no_payload_produced`) with the schema version of the attempt; the scan excludes a skip-marked row only while its stamped version matches the current one, so every previously unfillable row re-admits exactly once per version bump — the newest-first `LIMIT` batch always advances, and the "bump heals everything" invariant covers skip-marked rows too. The marker is backfill-internal (no CP store / Traffic drawer reads it). `nexus_normalize_backfill_skipped_total{reason}` is a one-time-per-row-per-version tally, not a recurring rate. |
211210
| `siem-bridge` | `defs/audit/siem_bridge.go` | `bridge.PollInterval()` | Polls `traffic_event` and `AdminAuditLog` for new rows, classifies them, and forwards them to the configured SIEM sink. Checkpoints persisted in `system_metadata`. Registered unconditionally whenever the scheduler is enabled; the bridge self-activates when `siem.config.enabled` is set in `system_metadata`. |
212211

213212
### 5.8 Quota (1 job)
