Commit aa364ad

jamby77 and KIvanow authored
feature(api): cache proposal data model + service + MCP propose tools (#134)
* feature(api): add cache proposal data model Postgres + SQLite migrations for cache_proposals and cache_proposal_audit, scoped by connection_id. Includes the two indexes required for tenant-status-proposed ordering and the partial pending-lookup index, plus an expires_at index for the expiry cron. Shared types are derived from Zod schemas (utils/cache-proposals) with preprocessors for BIGINT and JSON columns, so adapter row mappers parse rows directly with no dialect handling at the call site. Storage port methods cover create, get, list, status update, expiry, and audit append/read; memory adapter mirrors them for unit tests. SQLite expiry runs in a transaction with status re-check on UPDATE to avoid races against concurrent approvals. * fix(api): apply bugbot findings on cache proposal data model - Memory adapter now structuredClones the input on create, update, and audit append, so caller mutations after the call don't leak into stored state. Read paths also deep-clone via structuredClone instead of a shallow spread, matching the deep behavior the rest of the codebase assumes. - Agent invalidate discriminated-union test now throws on narrowing failure to match the semantic-invalidate test (was silently green if the narrow ever broke). * fix(api): apply round-2 bugbot findings on cache proposal data model - Storage updateCacheProposalStatus now validates any proposal_payload override against the existing row's (cache_type, proposal_type) via the new variantPayloadSchemaFor helper in @betterdb/shared. Prevents rows from being poisoned with a payload shape that doesn't match the discriminator. Throws ZodError on mismatch (mapped to HTTP 400 by the controller). - Empty options.status array no longer produces "status IN ()" invalid SQL; short-circuits to an empty result and skips the filter when status is undefined. 
- SQLite expireCacheProposalsBefore uses RETURNING * in a single UPDATE, eliminating the candidate+update dance that could return rows already expired before the call. * feature(api): add cache proposal service and 3 MCP propose tools CacheProposalService with type-specific validation, duplicate-pending rejection, and per-connection 30/hour rate limit. Resolves caches via HGETALL __betterdb:caches (discovery markers from PRs #127/#128). Typed errors (validation, invalid cache type, not found, duplicate, rate limited) are mapped to HTTP status codes at the MCP controller. Three MCP tools registered: - cache_propose_threshold_adjust (semantic_cache only, 0..2 range) - cache_propose_tool_ttl_adjust (agent_cache only, 10..86400 range) - cache_propose_invalidate (filter_kind discriminates by type; warns when estimated_affected > 10000) All proposals start as 'pending' with a 24h expiry, awaiting human approval (Day 5 spec). The module imports StorageModule for proposal CRUD and ConnectionsModule for the Valkey client used to read markers. Stacks on feature/cache-proposal-data-model (PR #134). * fix(api): apply bugbot findings on cache proposal service - CacheResolverService no longer takes ttlMs/now as constructor parameters; NestJS was attempting to resolve them as Number/Function injection tokens and crashing on module init. ttlMs and now are now fields with defaults, settable via configureForTesting() in tests. - Rate limiter now exposes a single atomic reserve() that combines check+record, and CacheProposalService.persist() calls it before the storage write. Previously check() and record() were separated by an awaited DB call, allowing concurrent requests to overshoot the limit by 1-2. - ZodError thrown from service-layer schema parses now maps to HTTP 400 with structured `issues` instead of falling through to a 500. 
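The atomic reserve() described above can be sketched roughly as follows. This is a minimal illustration, not the service's actual code: only the reserve() name and the 30/hour per-connection limit come from the commit message, while the class name, constructor parameters, and result shape are assumptions.

```typescript
// Hypothetical sketch: only reserve() and the 30/hour limit are taken from the
// commit message; everything else here is illustrative.
interface Reservation {
  allowed: boolean;
  remaining: number; // slots left after this call has been recorded
}

class ProposalRateLimiter {
  private readonly events = new Map<string, number[]>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(limit = 30, windowMs = 60 * 60 * 1000, now: () => number = () => Date.now()) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Check and record in one synchronous step. No await sits between the two,
  // so callers sharing an event loop cannot both pass the check for the last slot.
  reserve(key: string): Reservation {
    const current = this.now();
    const cutoff = current - this.windowMs;
    const recent = (this.events.get(key) ?? []).filter((t) => t > cutoff);
    this.events.set(key, recent);
    if (recent.length >= this.limit) {
      return { allowed: false, remaining: 0 };
    }
    recent.push(current);
    return { allowed: true, remaining: this.limit - recent.length };
  }
}
```

In this sketch the caller would invoke reserve(connectionId) before the awaited storage write, which is the ordering that closes the check-then-record gap.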
* fix(api): apply round-2 bugbot findings on cache proposal service - Rate limiter check() returned remaining = limit - events - 1, understating available slots by one. Corrected to limit - events; reserve() still records exactly one event per allowed call. - "Does not count proposals against other connections" test was misnamed: it just retested the limit on the same connection_id. Now exhausts the limit on CONNECTION_ID, then proposes against OTHER_CONNECTION_ID and asserts the proposal succeeds, verifying per-connection isolation. * fix(api): rate limiter reserve() returns post-record remaining reserve() previously returned the remaining count from check(), which is the pre-record count. For an allowed reservation that records one event, the returned remaining was off by one. Now decrements remaining by one (clamped at zero) when allowed. * fix(api): release rate-limit slot when storage write fails Previously a transient storage failure (DB hiccup, connection loss) would consume a rate-limit slot without producing a proposal, prematurely exhausting the per-connection 30/hour budget. The service now wraps the storage call in try/catch and calls rateLimiter.release() on failure to free the slot. Concurrency safety is preserved because the reservation is still made before the awaited write, so concurrent callers can't overshoot the limit. * fix(api): apply round-3 bugbot findings on combined cache proposals PR - Rate limiter reservations now carry a unique releaseToken; the service passes the token to release(key, token) on storage failure, freeing the exact reservation rather than whichever event was newest. Two concurrent reservations with one failing no longer corrupts the bucket. - Storage updateCacheProposalStatus accepts an optional expected_status filter, applied as WHERE status IN (...). If the row is no longer in an allowed state the update is a no-op and returns null. Prevents a stale approve from resurrecting an expired/applied/rejected proposal. 
- Unique partial indexes on cache_proposals enforce per-pending uniqueness for threshold_adjust (by category, NULL = global) and tool_ttl_adjust (by tool_name) at the DB layer, closing the TOCTOU between rejectIfDuplicatePending and the insert. The service catches the resulting unique-constraint violation (Postgres 23505 / SQLite SQLITE_CONSTRAINT*) and surfaces it as the same DuplicatePendingProposalError. - CacheResolverService caches negative lookups for 2s instead of 30s so a propose call right after marker registration recovers quickly instead of waiting out the full positive TTL. * chore(api): consolidate discovery-marker key constants in shared Adds packages/shared/src/utils/discovery-protocol.ts exporting REGISTRY_KEY, PROTOCOL_KEY, HEARTBEAT_KEY_PREFIX, the protocol version, and a heartbeatKeyFor() helper. CacheResolverService now imports these instead of redefining the literals locally. The cache packages (@betterdb/semantic-cache, @betterdb/agent-cache) intentionally keep their own copies — they ship to npm and don't depend on @betterdb/shared. Names match (REGISTRY_KEY, HEARTBEAT_KEY_PREFIX) so the duplication is reconcile-by-grep rather than reconcile-by-rename. * refactor(api): introduce SEMANTIC_CACHE / AGENT_CACHE constants Adds two as-const string constants to @betterdb/shared, exported next to CacheTypeSchema: export const SEMANTIC_CACHE = 'semantic_cache' as const; export const AGENT_CACHE = 'agent_cache' as const; Replaces runtime string literals in CacheResolverService and CacheProposalService with these constants. Type-position usages (interface fields, type unions) switch to the existing CacheType alias instead of repeating the literal union inline. The cache packages keep their own per-package CACHE_TYPE constant because they don't depend on @betterdb/shared. 
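As a rough illustration of how the two as-const constants can stand in for repeated string literals, here is a small sketch. The constants and the tool names are quoted from this commit message; the CacheType definition, isCacheType, and proposeToolsFor are hypothetical helpers written for this example (the real CacheType is derived from CacheTypeSchema).

```typescript
// Constants as given in the commit message (export omitted so the snippet runs standalone).
const SEMANTIC_CACHE = 'semantic_cache' as const;
const AGENT_CACHE = 'agent_cache' as const;

// Assumption: a plain union stands in for the CacheType alias here.
type CacheType = typeof SEMANTIC_CACHE | typeof AGENT_CACHE;

// Hypothetical runtime guard for values read from untyped input.
function isCacheType(value: unknown): value is CacheType {
  return value === SEMANTIC_CACHE || value === AGENT_CACHE;
}

// Hypothetical helper: which propose tools apply to a cache type, following the
// tool list above (threshold is semantic-only, TTL is agent-only, invalidate is shared).
function proposeToolsFor(type: CacheType): string[] {
  const typeSpecific =
    type === SEMANTIC_CACHE ? 'cache_propose_threshold_adjust' : 'cache_propose_tool_ttl_adjust';
  return [typeSpecific, 'cache_propose_invalidate'];
}
```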
* fix(api): apply round-4 bugbot findings on cache proposal service - updateCacheProposalStatus now treats expected_status: [] the same as listCacheProposals does — short-circuits to null instead of generating "status IN ()" invalid SQL on Postgres + SQLite or silently returning null via [].includes() on the memory adapter. - rejectIfDuplicatePending no longer caps at the 100-row default page size of listCacheProposals. It now pages through pending proposals (200 per page, up to 10 pages = 2000) so a connection with many pending proposals across distinct categories can't hide a real duplicate from the pre-check. The DB unique partial indexes still backstop the pre-check on insert. - isUniqueConstraintViolation no longer matches every SQLite error code starting with SQLITE_CONSTRAINT (which includes CHECK, NOTNULL, FOREIGNKEY, etc.). Only SQLITE_CONSTRAINT_UNIQUE and SQLITE_CONSTRAINT_PRIMARYKEY count as duplicate-pending now; other constraint failures bubble up as the original error. * chore(api): disable explicit return type rule in ESLint config * fix(api): make pending unique indexes treat NULL category/tool_name as duplicate bugbot flagged that the partial unique indexes on cache_proposals used JSON-extracted category / tool_name directly. Both Postgres and SQLite treat NULL as distinct in unique indexes, so two pending threshold-adjust proposals with category=null on the same (connection, cache) would both land — bypassing the DB-level dedup. Wrap the extracts in COALESCE to a sentinel string so NULLs collide. Drop the old indexes first so existing dev DBs pick up the new shape. Add a regression test confirming a second pending threshold_adjust with category=null is rejected by the DB. * fix(api): align proposal_payload validation order with expected_status across adapters bugbot LOW: Postgres + SQLite parsed proposal_payload via Zod before checking expected_status, while the memory adapter checked expected_status first. 
A stale write that updated both fields and failed the status precondition would surface as a ZodError on SQL but return null on memory — inconsistent client-visible behavior. Read the existing row once when either field is provided, run the expected_status guard before the variant payload parse. * fix(api): rename pending unique indexes to _v2 + keep rate-limit slot on dup-pending bugbot MEDIUM (1): the previous fix unconditionally ran DROP INDEX + CREATE UNIQUE INDEX on every API startup, taking ACCESS EXCLUSIVE on cache_proposals each time and rebuilding the index — a recurring write outage proportional to row count. Rename the COALESCE-fixed indexes to _v2 and keep CREATE IF NOT EXISTS, so first-startup migrates dev DBs once and subsequent startups skip both DROP and CREATE. bugbot MEDIUM (2): persist() refunded the rate-limit slot on every storage error, including unique-constraint violations turned into DuplicatePendingProposalError. A client guessing an existing pending (cache, scope) could spam the endpoint with 409s at zero rate-limit cost. Refund only on non-unique storage failures. * feature(api): add 6 read-only cache MCP tools CacheReadonlyService exposes: - listCaches: enumerate __betterdb:caches with hit rate + live status - cacheHealth: discriminated union by cache_type; semantic returns category breakdown + uncertain_hit_rate, agent returns per-tool breakdown - thresholdRecommendation: replicates SemanticCache.thresholdEffectiveness by reading the rolling similarity window directly - toolEffectiveness: replicates AgentCache.toolEffectiveness from the __stats hash - similarityDistribution: 20 fixed buckets of width 0.1 across the 0..2 cosine distance range, with optional category and window-hours filters - recentChanges: thin wrapper over listCacheProposals filtered by cache_name (any status, newest first) Six matching MCP tools registered, plus six GET endpoints on the existing McpController. 
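The bucketing behind similarityDistribution (20 fixed buckets of width 0.1 over the 0..2 distance range) could look something like the sketch below. The bucket shape, the inclusive-lower-edge convention, placing a score of exactly 2 in the last bucket, and dropping out-of-range scores are all assumptions; the category and window-hours filters are left out.

```typescript
const BUCKET_COUNT = 20; // 20 buckets of width 0.1 cover 0..2
const MAX_DISTANCE = 2;

// Hypothetical response shape for one bucket.
interface Bucket {
  lower: number;
  upper: number;
  count: number;
}

function similarityDistribution(scores: number[]): Bucket[] {
  const buckets: Bucket[] = Array.from({ length: BUCKET_COUNT }, (_, i) => ({
    lower: i / 10,
    upper: (i + 1) / 10,
    count: 0,
  }));
  for (const score of scores) {
    // Assumption: NaN and scores outside 0..2 are ignored rather than clamped.
    if (!Number.isFinite(score) || score < 0 || score > MAX_DISTANCE) continue;
    // A score of exactly 2 would index bucket 20, so cap it at the last bucket.
    const index = Math.min(BUCKET_COUNT - 1, Math.floor(score * 10));
    const bucket = buckets[index];
    if (bucket) bucket.count += 1;
  }
  return buckets;
}
```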
Per-type dispatch raises InvalidCacheTypeError (HTTP 400 / code INVALID_CACHE_TYPE) when a semantic-only or agent-only tool is called against the wrong cache type. The library methods (thresholdEffectiveness, toolEffectiveness) require SemanticCache.initialize(), which would side-effect-create the FT index, so we read the underlying Valkey state directly instead of spawning a transient cache instance per request. * refactor(api): use shared discovery-protocol keys + extract readonly types CacheReadonlyService now imports REGISTRY_KEY and heartbeatKeyFor() from @betterdb/shared (added in the parent commit on this branch's base) instead of redefining the literals locally. Ensures the API reader and the cache-package writers reference the same constants. The 12 exported types describing the read-only response shapes (CacheHealth, CacheListEntry, ThresholdRecommendation, etc.) move to a sibling cache-readonly.types.ts file. The service file re-exports them so existing import paths continue to work, but the types now live separately and the service file shrinks back to behavior-only. * refactor(api): extract readHashInt helper and simplify CacheReadonlyService The local readInt() in CacheReadonlyService duplicated a pattern already used inside AgentCache.stats() and was a minor utility worth sharing. Extracted to apps/api/src/common/utils/valkey-fields.ts as readHashInt(record, field) and the service now imports it. Drive-by cleanups while in the file: - Drop two `const health: ... = { ... }; return health;` patterns in favor of returning the literal directly. - Collapse the two-step recordedAt + category filter in similarityDistribution into a single boolean expression. 
* refactor(api): use SEMANTIC_CACHE / AGENT_CACHE in CacheReadonlyService Picks up the constants added on the parent branch and replaces the remaining 'semantic_cache' / 'agent_cache' string literals in cache-readonly.service.ts and the discriminator types in cache-readonly.types.ts (via typeof SEMANTIC_CACHE / typeof AGENT_CACHE). The unknown-narrowing on parsed.type after the two literal-inequality checks doesn't propagate cleanly when the literals come via const imports; an explicit `as CacheType` at the assignment site keeps control-flow narrowing where it matters and avoids cluttering the checks. * refactor(api): pull recommendation strings + reasonings into named constants cache-readonly.types.ts now exports two const-object enums: THRESHOLD_RECOMMENDATIONS = { TIGHTEN, LOOSEN, OPTIMAL, INSUFFICIENT_DATA } TOOL_EFFECTIVENESS_RECOMMENDATIONS = { INCREASE_TTL, OPTIMAL, DECREASE_TTL_OR_DISABLE } with the existing union types (ThresholdRecommendationKind, ToolEffectivenessRecommendation) derived from them via typeof[keyof typeof]. The four threshold-recommendation reasoning templates also move out of the service, into a THRESHOLD_REASONINGS map of small functions that take only the relevant context. The "% of foo" formatting is now expressed once via a local formatPct helper. CacheReadonlyService imports the consts and the reasoning map and references them by name, so adding or renaming a recommendation no longer requires touching the inline strings in two places. * refactor(api): rename valkey-fields → record-fields The helper takes a generic Record<string, string>; nothing about it is Valkey-specific. The first caller happens to feed it HGETALL output, but a future caller could feed it process.env or any other string-valued record without surprise. 
apps/api/src/common/utils/valkey-fields.ts → record-fields.ts readHashInt(raw, field) → readIntField(record, field) * fix(api): listCaches reads __stats from prefix, not name readBaseStats(client, key) constructs ${key}:__stats. listCaches was passing marker.name; for caches whose discovery-marker name happens to equal their prefix this is fine, but the protocol allows the two to differ (e.g. an alias-style registration). Stats would then read from the wrong key and report zeros even for an active cache. Adds a regression test where ALIAS != PREFIX and asserts stats are sourced from PREFIX. * fix(api): drop redundant avgNearMissDelta guard in loosen-threshold branch bugbot LOW: nearMisses is filtered to s.score <= threshold + 0.03 so avgNearMissDelta is constrained to (0, 0.03] by construction. The extra '< 0.03' guard never excluded anything useful but did suppress the loosen recommendation in the boundary case where every near miss sits at exactly threshold + 0.03 (delta avg == 0.03), falling through to 'optimal' despite a high near-miss rate. 
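To make the name-versus-prefix fix in listCaches concrete, here is a small sketch that reads stats from `${prefix}:__stats` through a readIntField-style helper. The marker shape, the "hits" and "misses" field names, the default of 0 for missing or non-numeric fields, and the in-memory hash store standing in for Valkey are all assumptions made for illustration.

```typescript
// Hypothetical marker shape: the protocol allows name and prefix to differ.
interface DiscoveryMarker {
  name: string;
  prefix: string;
}

// In-memory stand-in for HGETALL results, keyed by hash name.
type HashStore = Map<string, Record<string, string>>;

// Assumed behaviour: parse an integer field from a string-valued record,
// returning 0 when the field is missing or not numeric.
function readIntField(record: Record<string, string>, field: string): number {
  const parsed = Number.parseInt(record[field] ?? '', 10);
  return Number.isNaN(parsed) ? 0 : parsed;
}

// The stats hash lives under the prefix, never under the display name.
function statsKeyFor(marker: DiscoveryMarker): string {
  return `${marker.prefix}:__stats`;
}

function readBaseStats(store: HashStore, marker: DiscoveryMarker): { hits: number; misses: number } {
  const raw = store.get(statsKeyFor(marker)) ?? {};
  return { hits: readIntField(raw, 'hits'), misses: readIntField(raw, 'misses') };
}
```

With an alias-style marker such as `{ name: 'alias', prefix: 'app:sc' }`, building the key from name would look up `alias:__stats` and report zeros, which is the failure mode the regression test guards against.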
* feature(api): add cache proposal approve/reject/edit + apply dispatcher - Add approve/reject/editAndApprove/expireProposals on CacheProposalService sharing logic between HTTP controller (actor_source='ui') and MCP tools (actor_source='mcp') - Add CacheApplyService that runs Valkey work and transitions approved -> applied|failed with idempotency on re-entry - Add CacheApplyDispatcher with 4 handlers: - semantic threshold_adjust: HSET {cache_name}:__config (gated on threshold_adjust capability advertised in marker) - agent tool_ttl_adjust: HSET {cache_name}:__tool_policies - semantic invalidate: FT.SEARCH + DEL - agent invalidate: SCAN + DEL on tool/key_prefix/session pattern - Add CacheExpirationCron (5 min cadence, setInterval pattern) marking past-due pending proposals as expired with system audit - Add /cache-proposals HTTP controller with 6 endpoints - Add 5 MCP approval endpoints in mcp.controller.ts - Extract shared mapCacheProposalErrorToHttp helper used by both controllers and add typed errors for proposal lifecycle - Tests: 18 new specs covering approve/reject/edit/expire flows and dispatcher behaviour * fix(api): address bugbot findings on apply dispatcher PR - HIGH: parseFtSearchKeys was iterating with i+=2, missing every other key when FT.SEARCH was called with RETURN 0 (which behaves like NOCONTENT and returns just [count, key1, key2, ...]). Step by 1. - LOW: extract optionalString, optionalFiniteNumber, formatApprovalResult to controller-helpers.ts, share between cache-proposal.controller and mcp.controller - Add regression test exercising RETURN 0 response shape * fix(api): scope agent_cache key_prefix invalidate to cache namespace bugbot HIGH: applyAgentInvalidate's key_prefix branch built a SCAN pattern as `<filter_value>*` without prepending `<cache.name>:`, unlike the tool/session branches. This let a key_prefix invalidation match keys belonging to other caches or unrelated application data on the same Valkey instance. 
Scope the pattern to the cache namespace and add a regression test. * fix(api): plumb proposalId into all apply-dispatcher errors + escapeGlob cache name bugbot MEDIUM: applySemanticThresholdAdjust, applyAgentToolTtlAdjust, applySemanticInvalidate were passing cache.name as the first arg to ApplyFailedError, which the constructor labels as proposalId — that wrong value was landing in applied_result.details. Plus, the constructor spread '{ proposalId, ...details }' allowed a stale proposalId in details to overwrite the explicit one. Pass proposalId through to all dispatcher methods, attach cacheName separately in details, and put the explicit proposalId last in the spread so it always wins. bugbot LOW: tool and session SCAN patterns interpolated cache.name without escaping while key_prefix did. Apply escapeGlob uniformly to cache.name across all three branches. * feature(web): cache proposals UI with pending/history views Add /cache-proposals route with Pending and History tabs. Pending cards render four (cache_type, proposal_type) variants with approve, reject, and edit-and-approve flows. Edit hidden on invalidate cards. History table filters by status/cache_name and opens a detail panel with full reasoning, payload, and audit trail. Sidebar shows an unread badge for new pending proposals. * fix(web): share cache-proposals unread state across hook instances Use a module-level subscription store with useSyncExternalStore so the sidebar badge clears when CacheProposals.markAllRead() runs. Wraps markAllRead in useCallback to avoid firing the page-level effect every render. Drops the unused cacheProposalQueryKeys export. * fix(web): correct cache-proposals source column and unread count - proposalSource now reads proposed_by only (was falling through to reviewed_by, mislabelling UI-proposed/MCP-reviewed entries as 'mcp'). 
- Track unread by last-seen timestamp instead of id, so when the marker proposal is approved/rejected/expired it doesn't inflate the unread count to all remaining (already-seen) pending proposals. * fix(web): enforce monotonic lastSeenAt to prevent unread badge regression Math.max guards setLastSeenAt so a markAllRead call where the newest pending proposal has a smaller timestamp (e.g. the previous newest was approved by another user) cannot move the marker backward and resurrect already-seen proposals as unread. * fix(api): apply dispatcher uses cache.prefix, not cache.name, for Valkey keys The discovery marker exposes name and prefix separately; they can differ. CacheReadonlyService already reads from cache.prefix, but CacheApplyDispatcher was writing config / tool-policy / SCAN pattern keys using cache.name, so threshold and TTL writes silently went to the wrong key when the two diverged. Adds regression tests with distinct name/prefix values. * fix(api): semantic invalidate FT.SEARCH index uses cache.prefix * fix(api): wrap history endpoint in mapCacheProposalErrorToHttp ProposalStatusSchema.parse on bogus status values threw an unhandled ZodError, returning a generic 500 instead of the structured 400 the mapper produces (matches every other endpoint in the controller). * fix(api): reject unsupported cache_type/proposal_type combos in dispatch Replaces the implicit fallthrough to applyAgentInvalidate with an explicit guard. Previously, an unexpected combination (e.g. agent_cache/threshold_adjust from a future schema change) would have been silently routed to applyAgentInvalidate with an incompatible payload shape. * fix(api): approve idempotency on failed proposal returns prior result runApply now throws ApplyFailedError only for fresh failures (input was 'approved'). 
When the input proposal is already terminal ('applied' or 'failed'), applyService.apply short-circuits and returns the prior applied_result; runApply now passes that through without throwing — mirroring the existing 'applied' behaviour and fixing the asymmetry where approving a 'failed' proposal threw. * fix(api): address bugbot findings on cache-proposals - mcp.controller: move filter_kind validation BadRequestException out of the try-block so it isn't double-handled by the error mapper, matching the threshold/ttl endpoints. - cache-approval.service.spec: pass explicit 'system' actorSource to expireProposals to match production cron call shape. - memory.adapter: enforce the (connection_id, cache_name, proposal_type) unique-on-pending constraint that SQL adapters get from the partial unique index — closes the gap for invalidate proposals (which the service has no extra duplicate guard for) on Memory storage. * fix(api): memory adapter dup check matches SQL partial-index sub-key The SQL partial unique indexes scope on (connection_id, cache_name, COALESCE(category|tool_name, '__betterdb_null__')) per proposal_type; my earlier memory-adapter check ignored the sub-discriminator and would have wrongly blocked threshold proposals for different categories or TTL proposals for different tools. Mirror the SQL behaviour exactly. Invalidate has no SQL constraint, so memory imposes none either. * fix(api): read live current threshold/TTL from Valkey when proposing Replace the always-zero stubs in readCurrentThreshold / readCurrentToolTtl with reads from the same Valkey hashes that the readonly service and apply dispatcher use ({prefix}:__config and {prefix}:__tool_policies). Falls back to 0 with a logged warning when ConnectionRegistry isn't injected (e.g. in unit-test harnesses) or the read fails. ConnectionRegistry is wired in via @optional so existing test wiring keeps working. 
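The memory adapter's duplicate check, mirroring the SQL partial unique indexes, might be sketched like this. The sentinel string, the category and tool_name sub-keys, and the absence of a constraint for invalidate come from the commit messages above; the type names, function names, and the JSON-encoded key are hypothetical.

```typescript
const NULL_SENTINEL = '__betterdb_null__'; // same sentinel the SQL indexes COALESCE to

type ProposalType = 'threshold_adjust' | 'tool_ttl_adjust' | 'invalidate';

// Hypothetical, trimmed-down proposal shape for this example.
interface PendingProposal {
  connection_id: string;
  cache_name: string;
  proposal_type: ProposalType;
  proposal_payload: { category?: string | null; tool_name?: string | null };
}

// Returns the uniqueness key for a pending proposal, or null when the
// proposal type carries no uniqueness constraint (invalidate).
function pendingUniqueKey(p: PendingProposal): string | null {
  let subKey: string;
  if (p.proposal_type === 'threshold_adjust') {
    subKey = p.proposal_payload.category ?? NULL_SENTINEL; // null category = global
  } else if (p.proposal_type === 'tool_ttl_adjust') {
    subKey = p.proposal_payload.tool_name ?? NULL_SENTINEL;
  } else {
    return null;
  }
  // JSON-encode the tuple so values containing separators cannot collide.
  return JSON.stringify([p.connection_id, p.cache_name, p.proposal_type, subKey]);
}

function wouldDuplicate(existingPending: PendingProposal[], candidate: PendingProposal): boolean {
  const key = pendingUniqueKey(candidate);
  return key !== null && existingPending.some((p) => pendingUniqueKey(p) === key);
}
```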
* fix(api): cache-proposals storage tests use distinct categories The tests created multiple pending proposals with the same (connection_id, cache_name, category), which the SQL adapters' partial unique index would also reject. The memory adapter now matches that constraint, so use distinct categories where the test needs multiple pending entries. * fix(api): simplify dead-code branch in readCurrentThreshold * fix(api): read dispatcher-written threshold override before SDK baseline The apply dispatcher writes runtime threshold overrides to fields 'threshold' and 'threshold:<category>' on {prefix}:__config. The reader was only checking the SDK-published baseline fields (default_threshold and category_thresholds JSON), so after any applied threshold proposal, subsequent proposals reported the original baseline as 'current_threshold' instead of the actually effective value. Read overrides first, fall back to baseline. * fix(api): deep-clone expired proposal in memory adapter Match the structuredClone used in createCacheProposal / updateCacheProposalStatus so expireCacheProposalsBefore doesn't share mutable proposal_payload / applied_result references with the caller's previously-returned copies. * fix(api): editAndApprove fails fast on unsupported proposal_type Replace the fall-through after the threshold_adjust / tool_ttl_adjust branches with a thrown ProposalEditNotAllowedError. Previously, an unhandled proposal_type would have left newPayload undefined and the storage call would silently skip the payload update — approving the proposal without applying the requested edit. * fix(api): unify expiry boundary to <= across service and storage Memory/SQLite/Postgres expireCacheProposalsBefore all use expires_at <= now (inclusive), so a proposal at exactly now is expired by the cron. 
The service-layer guards in transitionToApproved and requireFreshPending used strict <, leaving a one-tick window where the cron treats the proposal as expired but approve/reject treats it as still valid. Switch the service to <= to match. * feature(semantic-cache): runtime threshold overrides via {prefix}:__config - check()/checkBatch() now read HGETALL {prefix}:__config (5s in-process cache) and honor 'threshold' and 'threshold:{category}' fields as runtime overrides on top of constructor categoryThresholds - Resolution order: options.threshold > runtime category > runtime global > constructor categoryThresholds > defaultThreshold - Read failures fall back to constructor (warn-logged); out-of-range values (<0, >2, NaN) are dropped - Advertise 'threshold_adjust' in the discovery marker's capabilities so Monitor's apply dispatcher can write the config hash - Bump to 0.4.0; CHANGELOG entry flags the {prefix}:__config behavior change * feature(mcp): wire 5 cache-proposal approval tools - cache_list_pending_proposals, cache_get_proposal, cache_approve_proposal, cache_reject_proposal, cache_edit_and_approve_proposal - Each wraps the corresponding pre-existing endpoint on McpController (apps/api/src/mcp/mcp.controller.ts:688-781), which sets actorSource='mcp' on every approval/reject/edit call - Bumps @betterdb/mcp to 1.2.0 * docs(mcp): document cache intelligence tools + sync server.json version - README adds a 'Cache Intelligence Tools' section covering all 14 cache tools (6 read-only, 3 propose, 5 approval) plus two example prompts - server.json version bumped 1.0.0 -> 1.2.0 to match package.json (the registry manifest doesn't enumerate tools; tools/list is exposed via the MCP protocol from server.tool registrations) * refactor(cache-proposals): move entire feature to proprietary as Pro tier Per review feedback, cache intelligence becomes a Pro feature gated by a new CACHE_INTELLIGENCE entitlement. 
Backend: - Move apps/api/src/cache-proposals/ -> proprietary/cache-proposals/ - Module becomes @global and is conditionally loaded via try/catch in app.module.ts, matching the inference-latency-pro pattern - Extract MCP cache routes from mcp.controller.ts into a new proprietary CacheProposalMcpController so the routes simply don't exist when the proprietary module isn't loaded; community-tier deployments return 404 on every cache endpoint and the MCP tools surface that to the agent - HTTP and MCP controllers both gate on @UseGuards(LicenseGuard) + @RequiresFeature(Feature.CACHE_INTELLIGENCE), returning 402 when not entitled - Extract shared MCP helpers (ValidateInstanceIdPipe, safeLimit, etc.) into apps/api/src/mcp/mcp-helpers.ts so both controllers share them - Update internal imports in moved files to use @app/* aliases Shared: - Add Feature.CACHE_INTELLIGENCE under Tier.pro Frontend: - NavItem for /cache-proposals adds requiredFeature so the link locks for non-entitled users (matches Anomaly Detection pattern) - usePendingProposals accepts an enabled flag; useCacheProposalsUnread short-circuits on non-entitled licenses so we don't poll every 15s and get 402s Tests pass: api 1228/1235, web 175/175. No regressions vs. pre-move. * test(web): cover HistoryTable, DetailPanel, useCacheProposalsUnread + mcp CHANGELOG Closes the test gaps surfaced in the C5 audit: - HistoryTable: Source column derivation from proposed_by prefix, empty state, cache_name filter wiring through to useHistoryProposals - DetailPanel: full data render (cache header, reasoning, payload, apply result, audit trail), empty audit, loading and error branches - useCacheProposalsUnread: entitlement gate skips polling, count when no lastSeenAt, markAllRead persists newest proposed_at and zeroes count Also: add CHANGELOG.md to packages/mcp documenting 1.2.0 release with the 5 new cache-intelligence approval tools and their Pro tier requirement. Web suite now: 187/187 (was 175). 
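The unread-badge behaviour that the useCacheProposalsUnread tests cover (count everything when there is no lastSeenAt, persist the newest proposed_at on markAllRead, never move the marker backward) can be expressed as two pure functions. This is a sketch only: the function names are hypothetical, and proposed_at is assumed to be a numeric epoch timestamp.

```typescript
// Hypothetical minimal shape; proposed_at is assumed to be epoch milliseconds.
interface PendingItem {
  id: string;
  proposed_at: number;
}

// With no marker yet, every pending proposal counts as unread.
function unreadCount(pending: PendingItem[], lastSeenAt: number | null): number {
  if (lastSeenAt === null) return pending.length;
  return pending.filter((p) => p.proposed_at > lastSeenAt).length;
}

// What markAllRead would store: the newest proposed_at, guarded by Math.max
// so the marker never regresses when the previous newest proposal is gone.
function nextLastSeenAt(pending: PendingItem[], lastSeenAt: number | null): number | null {
  if (pending.length === 0) return lastSeenAt;
  const newest = Math.max(...pending.map((p) => p.proposed_at));
  return lastSeenAt === null ? newest : Math.max(lastSeenAt, newest);
}
```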
* feat(cache-proposals): runtime config refresh for agent-cache and sem… (#148) * feat(cache-proposals): runtime config refresh for agent-cache and semantic-cache (TS + Python) Implements the full propose→approve→apply→pickup loop so BetterDB Monitor cache proposals take effect in running processes without a restart. - Periodic refresh of `{name}:__tool_policies` (default 30 s, opt-out via `configRefresh: { enabled: false }`). First refresh fires synchronously on construction; subsequent ticks run on a `setInterval`. - `ToolCache.refreshPolicies()` — atomic swap (clear + repopulate), returns bool. `loadPolicies()` now delegates to it; stale entries are evicted. - New Prometheus counter `{prefix}_config_refresh_failed_total`. - New `ConfigRefreshOptions` type exported from the package root. - Periodic refresh of `{name}:__config` (same interval/opt-out pattern). Fields: `threshold` → `defaultThreshold`; `threshold:{cat}` → `categoryThresholds[cat]`. Constructor values are fallbacks when absent. - `refreshConfig()` public method with per-field range validation (0–2). - Adds `threshold_adjust` to the discovery capabilities array, unblocking `cache_propose_threshold_adjust` in Monitor. - New `{prefix}_config_refresh_failed_total` counter. - New `ConfigRefreshOptions` type exported from the package root. - `escapeTag` exported from the package root (both TS and Python). - Discovery marker protocol (0.5.0): registers `__betterdb:caches` entry and 30 s heartbeat on construction; `shutdown()` removes the heartbeat. New `DiscoveryOptions`, `{prefix}_discovery_write_failed_total` counter. - Config refresh (0.6.0): `asyncio` task loop mirrors TS behaviour — first refresh before first sleep. `ToolCache.refresh_policies()` atomic swap. New `ConfigRefreshOptions`. `{prefix}_config_refresh_failed_total`. - New `examples/monitor_proposals/main.py` demonstrating the full loop. 
- Missing test coverage added: `refresh_policies()` (6 tests), `AgentCache` config refresh (6 tests + counter), `SessionStore.get_all()`, `destroy_thread()`, `scan_fields_by_prefix()` (13 tests). - `aiohttp` declared as `[normalizer]` optional extra in `pyproject.toml`. - Discovery marker protocol: registers on `initialize()`; capabilities include `['invalidate', 'similarity_distribution', 'threshold_adjust']`. Cross-type collision raises `SemanticCacheUsageError`. `flush()` stops the old manager before dropping the index (matches TS concurrency semantics). New `DiscoveryOptions`, `{prefix}_discovery_write_failed_total` counter. - Config refresh: `asyncio` task loop, `refresh_config()` with field-level validation, constructor fallbacks, per-category support. New `ConfigRefreshOptions`. `{prefix}_config_refresh_failed_total`. - New `examples/monitor_proposals/main.py` with deterministic content-word mock embedder (stopwords stripped, DJB2 hash, dim=64). Output is bit-for-bit identical to the TypeScript equivalent. - `escape_tag` exported from the package root. - New `test_config_refresh.py` (14 tests) and `test_discovery.py` (21 tests). - `CacheApplyDispatcher.applySemanticInvalidate`: corrected FT index name from `{prefix}:__index` to `{prefix}:idx` (all semantic invalidation proposals were silently deleting 0 entries against a non-existent index). - Dispatcher test `FakeClient.call()` now captures arguments so index name and filter expression can be asserted. - New dispatcher contract tests: index name, filter forwarding, field format agreement between dispatcher writes and library reads. - `cache-proposal.service.spec.ts`: `readCurrentThreshold` and `readCurrentTtl` tested with a fake registry, verifying the apply→re-propose cycle reads the dispatcher-written value. 
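The field mapping used by the semantic-cache config refresh (`threshold` to defaultThreshold, `threshold:{cat}` to categoryThresholds[cat], values validated against the 0 to 2 range, constructor values as fallbacks) can be sketched as a pure function over the HGETALL result. Names such as applyConfigHash and parseThreshold are hypothetical, and resetting to the constructor fallback whenever a field is absent or invalid is one reading of the description, not confirmed behaviour.

```typescript
interface ThresholdConfig {
  defaultThreshold: number;
  categoryThresholds: Record<string, number>;
}

const CATEGORY_FIELD_PREFIX = 'threshold:';

// Accept only finite numbers in the 0..2 range; empty strings are rejected
// explicitly because Number('') would otherwise parse as 0.
function parseThreshold(raw: string): number | null {
  if (raw.trim() === '') return null;
  const value = Number(raw);
  return Number.isFinite(value) && value >= 0 && value <= 2 ? value : null;
}

// Build the effective config from the raw hash, starting from constructor fallbacks.
function applyConfigHash(raw: Record<string, string>, fallback: ThresholdConfig): ThresholdConfig {
  const next: ThresholdConfig = {
    defaultThreshold: fallback.defaultThreshold,
    categoryThresholds: { ...fallback.categoryThresholds },
  };
  for (const [field, rawValue] of Object.entries(raw)) {
    const value = parseThreshold(rawValue);
    if (value === null) continue; // invalid values are dropped field by field
    if (field === 'threshold') {
      next.defaultThreshold = value;
    } else if (field.startsWith(CATEGORY_FIELD_PREFIX)) {
      const category = field.slice(CATEGORY_FIELD_PREFIX.length);
      if (category !== '') next.categoryThresholds[category] = value;
    }
  }
  return next;
}
```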
* fix: address roborev findings (High + Medium + Low)

High: ensure_discovery_ready() hung indefinitely
agent_cache.py: track the discovery registration in a dedicated _discovery_task field and await only that task in ensure_discovery_ready(), not all _background_tasks. The config-refresh loop is an infinite task that never completes on its own; gathering it blocked the caller permanently.

Medium: cache_edit_and_approve_proposal accepted both edit fields at once
mcp/src/index.ts: add a mutual-exclusion guard that returns an error when both new_threshold and new_ttl_seconds are provided. The tool description says 'provide exactly one'; now the contract is enforced in code.

Low: DiscoveryOptions defined in two places (types.py and discovery.py)
discovery.py: remove the duplicate @dataclass definition and import DiscoveryOptions from types.py, the single canonical location already re-exported by __init__.py.

Low: dead code in mock_embed()
semantic-cache-py examples/monitor_proposals/main.py: the first words = list({...}) set-comprehension was immediately overwritten by the cleaned loop below it. Remove the dead first pass; keep only the strip-then-filter loop that produces the correct deduplicated word list.

* refactor(semantic-cache): drop redundant B3 read-time threshold layer (#151)

PR #134's earlier B3 commit added a 5s-TTL read-time override (HGETALL on each check()) and PR #148's commit added a 30s background refresh that mutates defaultThreshold/categoryThresholds in-place. Both read the same {prefix}:__config hash; running both is duplicated work and the file even ended up with a duplicate `private readonly configKey: string` field declaration.

Keep the 30s background-refresh approach (cleaner lifecycle, opt-out flag, prometheus counter, no per-call overhead) and delete the B3 machinery:

- Removes private fields thresholdOverrides, thresholdOverridesCachedAt, thresholdOverridesRefresh and the THRESHOLD_OVERRIDES_TTL_MS constant.
- Removes private helpers resolveThreshold, getThresholdOverrides, refreshThresholdOverrides.
- Restores check()/checkBatch() threshold resolution to the simple options.threshold > categoryThresholds[category] > defaultThreshold chain; refreshConfig() updates those mutable fields.
- Deletes runtime-threshold-overrides.test.ts (covered the deleted helpers).
- Removes the duplicate configKey field declaration and constructor assignment.
- CHANGELOG: drop the read-time-overrides bullet, expand the periodic-refresh bullet to spell out hash field semantics and the synchronous-first-tick guarantee, and reword the Behavior change note.

Tests: 128/128 pass.

Trade-off: propagation goes from ~5s to ~30s worst-case, which is acceptable given the human-approval flow upstream.

* fix(cache-proposals): use @app alias for ConnectionRegistry import in spec

PR #148 added a ConnectionRegistry import to cache-proposal.service.spec.ts using a relative path that doesn't resolve from proprietary/. Switch to the @app alias to match every other import in the file. CI api-tests run was failing TS2307 on this line; nothing else changes.

---------

Co-authored-by: Kristiyan Ivanov <kristiyan@betterdb.com>
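The restored `options.threshold > categoryThresholds[category] > defaultThreshold` precedence can be illustrated as follows. This is a sketch under assumptions: `CheckOptions` and `pickThreshold` are hypothetical names chosen for the example, and the actual `check()`/`checkBatch()` code is not shown on this page.

```typescript
// Hypothetical per-call options; only `threshold` and `category` matter here.
interface CheckOptions {
  threshold?: number;
  category?: string;
}

// Resolves the effective threshold using the precedence chain from the
// commit message: per-call option, then per-category value, then default.
function pickThreshold(
  options: CheckOptions,
  categoryThresholds: Record<string, number>,
  defaultThreshold: number,
): number {
  if (options.threshold !== undefined) {
    return options.threshold;
  }
  if (options.category !== undefined) {
    const perCategory = categoryThresholds[options.category];
    if (perCategory !== undefined) {
      return perCategory;
    }
  }
  return defaultThreshold;
}

const categories: Record<string, number> = { faq: 0.3 };

console.log(pickThreshold({ threshold: 0.05, category: 'faq' }, categories, 0.1)); // 0.05
console.log(pickThreshold({ category: 'faq' }, categories, 0.1)); // 0.3
console.log(pickThreshold({ category: 'unknown' }, categories, 0.1)); // 0.1
```

Because the chain reads plain mutable fields, a background refresh that replaces `categoryThresholds` or `defaultThreshold` is picked up by the next call without any per-call lookup.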
1 parent c8bf609 commit aa364ad

113 files changed

Lines changed: 14523 additions & 162 deletions

apps/api/eslint.config.mjs

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ export default tseslint.config(
       },
     },
     rules: {
-      '@typescript-eslint/explicit-function-return-type': 'error',
+      '@typescript-eslint/explicit-function-return-type': 'off',
       '@typescript-eslint/explicit-module-boundary-types': 'off',
       '@typescript-eslint/no-explicit-any': 'error',
       '@typescript-eslint/no-unused-vars': [

apps/api/src/app.module.ts

Lines changed: 10 additions & 0 deletions
@@ -29,6 +29,7 @@ let KeyAnalyticsModule: any = null;
 let AnomalyModule: any = null;
 let WebhookProModule: any = null;
 let InferenceLatencyProModule: any = null;
+let CacheProposalsModule: any = null;
 let AgentModule: any = null;
 let DataRetentionModule: any = null;

@@ -81,6 +82,14 @@ try {
   // Proprietary module not available
 }

+try {
+  const cacheProposalsModule = require('../../../proprietary/cache-proposals/cache-proposals.module');
+  CacheProposalsModule = cacheProposalsModule.CacheProposalsModule;
+  console.log('[CacheProposals] Proprietary module loaded');
+} catch {
+  // Proprietary module not available
+}
+
 if (process.env.CLOUD_MODE) {
   try {
     const agentModule = require('../../../proprietary/agent/agent.module');
@@ -144,6 +153,7 @@ const proprietaryImports = [
   AnomalyModule,
   WebhookProModule,
   InferenceLatencyProModule,
+  CacheProposalsModule,
   AiModule,
   AgentModule,
   DataRetentionModule,

apps/api/src/common/interfaces/storage-port.interface.ts

Lines changed: 30 additions & 0 deletions
@@ -30,6 +30,21 @@ export type {
   VectorIndexSnapshotQueryOptions,
 } from '@betterdb/shared';
 export type { MetricForecastSettings, MetricKind } from '@betterdb/shared';
+export type {
+  CacheType,
+  ProposalType,
+  ProposalStatus,
+  ProposalAuditEvent,
+  ActorSource,
+  ProposalPayload,
+  StoredCacheProposal,
+  StoredCacheProposalAudit,
+  CreateCacheProposalInput,
+  ListCacheProposalsOptions,
+  UpdateProposalStatusInput,
+  AppendProposalAuditInput,
+  AppliedResult,
+} from '@betterdb/shared';
 import type {
   AppSettings,
   AuditQueryOptions,
@@ -53,6 +68,12 @@ import type {
   Webhook,
   WebhookDelivery,
   WebhookEventType,
+  StoredCacheProposal,
+  StoredCacheProposalAudit,
+  CreateCacheProposalInput,
+  ListCacheProposalsOptions,
+  UpdateProposalStatusInput,
+  AppendProposalAuditInput,
 } from '@betterdb/shared';

 // Anomaly Event Types
@@ -480,4 +501,13 @@ export interface StoragePort {
   saveMetricForecastSettings(settings: MetricForecastSettings): Promise<MetricForecastSettings>;
   deleteMetricForecastSettings(connectionId: string, metricKind: MetricKind): Promise<boolean>;
   getActiveMetricForecastSettings(): Promise<MetricForecastSettings[]>;
+
+  // Cache Proposal Methods
+  createCacheProposal(input: CreateCacheProposalInput): Promise<StoredCacheProposal>;
+  getCacheProposal(id: string): Promise<StoredCacheProposal | null>;
+  listCacheProposals(options: ListCacheProposalsOptions): Promise<StoredCacheProposal[]>;
+  updateCacheProposalStatus(input: UpdateProposalStatusInput): Promise<StoredCacheProposal | null>;
+  expireCacheProposalsBefore(now: number): Promise<StoredCacheProposal[]>;
+  appendCacheProposalAudit(input: AppendProposalAuditInput): Promise<StoredCacheProposalAudit>;
+  getCacheProposalAudit(proposalId: string): Promise<StoredCacheProposalAudit[]>;
 }
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+export function readIntField(record: Record<string, string>, field: string): number {
+  const value = record[field];
+  if (value === undefined || value === '') {
+    return 0;
+  }
+  const parsed = parseInt(value, 10);
+  return Number.isNaN(parsed) ? 0 : parsed;
+}
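To make the fallback behaviour of `readIntField` concrete, here is a small usage sketch. The function body is copied from the diff above; the `sample` record and its field names are invented for illustration and are not taken from the commit.

```typescript
// Copied from the diff above so the example is self-contained.
function readIntField(record: Record<string, string>, field: string): number {
  const value = record[field];
  if (value === undefined || value === '') {
    return 0;
  }
  const parsed = parseInt(value, 10);
  return Number.isNaN(parsed) ? 0 : parsed;
}

// Hypothetical string-valued record, e.g. the shape of a hash read result.
const sample: Record<string, string> = {
  hits: '42',
  misses: '',
  ttl: 'abc',
  size: '12px',
};

console.log(readIntField(sample, 'hits')); // 42
console.log(readIntField(sample, 'misses')); // 0 (empty string)
console.log(readIntField(sample, 'ttl')); // 0 (not a number)
console.log(readIntField(sample, 'absent')); // 0 (missing field)
console.log(readIntField(sample, 'size')); // 12 (parseInt stops at the first non-digit)
```

Note the last case: because `parseInt` accepts a numeric prefix, a value such as `'12px'` yields `12` rather than the `0` fallback.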

apps/api/src/mcp/mcp-helpers.ts

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+import { BadRequestException, Injectable, PipeTransform } from '@nestjs/common';
+
+export const INSTANCE_ID_RE = /^[a-zA-Z0-9_-]+$/;
+export const MAX_LIMIT = 10000;
+
+@Injectable()
+export class ValidateInstanceIdPipe implements PipeTransform<string, string> {
+  transform(value: string): string {
+    if (!INSTANCE_ID_RE.test(value)) {
+      throw new BadRequestException('Invalid instance ID');
+    }
+    return value;
+  }
+}
+
+export function safeParseInt(value: string | undefined, defaultValue: number): number;
+export function safeParseInt(value: string | undefined, defaultValue?: undefined): number | undefined;
+export function safeParseInt(value: string | undefined, defaultValue?: number): number | undefined {
+  if (value === undefined) {
+    return defaultValue;
+  }
+  const parsed = parseInt(value, 10);
+  if (isNaN(parsed)) {
+    return defaultValue;
+  }
+  return parsed;
+}
+
+/** Parse and cap a limit/count query param */
+export function safeLimit(value: string | undefined, defaultValue: number): number {
+  return Math.max(1, Math.min(safeParseInt(value, defaultValue), MAX_LIMIT));
+}
+
+/** Convert ms timestamp query param to seconds. */
+export function msToSeconds(value: string | undefined): number | undefined {
+  const ms = safeParseInt(value);
+  if (ms === undefined || ms < 0) {
+    return undefined;
+  }
+  return Math.floor(ms / 1000);
+}
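The clamping and conversion behaviour of the extracted helpers can be checked with a short standalone sketch. The three functions and `MAX_LIMIT` are copied from the diff above, with the NestJS pipe omitted so the snippet runs without dependencies; the sample inputs are invented for illustration.

```typescript
const MAX_LIMIT = 10000;

function safeParseInt(value: string | undefined, defaultValue: number): number;
function safeParseInt(value: string | undefined, defaultValue?: undefined): number | undefined;
function safeParseInt(value: string | undefined, defaultValue?: number): number | undefined {
  if (value === undefined) {
    return defaultValue;
  }
  const parsed = parseInt(value, 10);
  if (isNaN(parsed)) {
    return defaultValue;
  }
  return parsed;
}

/** Parse and cap a limit/count query param */
function safeLimit(value: string | undefined, defaultValue: number): number {
  return Math.max(1, Math.min(safeParseInt(value, defaultValue), MAX_LIMIT));
}

/** Convert ms timestamp query param to seconds. */
function msToSeconds(value: string | undefined): number | undefined {
  const ms = safeParseInt(value);
  if (ms === undefined || ms < 0) {
    return undefined;
  }
  return Math.floor(ms / 1000);
}

console.log(safeLimit(undefined, 100)); // 100 (default used)
console.log(safeLimit('abc', 100)); // 100 (unparseable, default used)
console.log(safeLimit('0', 100)); // 1 (raised to the lower bound)
console.log(safeLimit('50000', 100)); // 10000 (capped at MAX_LIMIT)
console.log(msToSeconds('1500')); // 1 (floored)
console.log(msToSeconds('-5')); // undefined (negative input rejected)
```

The lower bound of 1 applies to the default as well, so a caller passing a default of 0 would still get 1 back.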

apps/api/src/mcp/mcp.controller.ts

Lines changed: 5 additions & 34 deletions
@@ -1,4 +1,4 @@
-import { Controller, Get, Post, Body, Param, Query, HttpException, HttpStatus, UseGuards, Optional, Inject, BadRequestException, PipeTransform, Injectable, Logger } from '@nestjs/common';
+import { Controller, Get, Post, Body, Param, Query, HttpException, HttpStatus, UseGuards, Optional, Inject, BadRequestException, Logger } from '@nestjs/common';
 import { ANOMALY_SERVICE } from '@betterdb/shared';
 import { UsageTelemetryService } from '../telemetry/usage-telemetry.service';
 import { ConnectionRegistry } from '../connections/connection-registry.service';
@@ -9,47 +9,16 @@ import { ClientAnalyticsAnalysisService } from '../client-analytics/client-analy
 import { ClusterDiscoveryService } from '../cluster/cluster-discovery.service';
 import { ClusterMetricsService } from '../cluster/cluster-metrics.service';
 import { StoragePort } from '../common/interfaces/storage-port.interface';
+import { MAX_LIMIT, ValidateInstanceIdPipe, msToSeconds, safeLimit, safeParseInt } from './mcp-helpers';

-const INSTANCE_ID_RE = /^[a-zA-Z0-9_-]+$/;
 const EVENT_NAME_RE = /^[a-zA-Z0-9_.-]+$/;
 const VALID_ORDER_BY = new Set(['key-count', 'cpu-usec']);
-const MAX_LIMIT = 10000;
-
-@Injectable()
-class ValidateInstanceIdPipe implements PipeTransform<string, string> {
-  transform(value: string): string {
-    if (!INSTANCE_ID_RE.test(value)) {
-      throw new BadRequestException('Invalid instance ID');
-    }
-    return value;
-  }
-}
-
-function safeParseInt(value: string | undefined, defaultValue: number): number;
-function safeParseInt(value: string | undefined, defaultValue?: undefined): number | undefined;
-function safeParseInt(value: string | undefined, defaultValue?: number): number | undefined {
-  if (value === undefined) return defaultValue;
-  const parsed = parseInt(value, 10);
-  if (isNaN(parsed)) return defaultValue;
-  return parsed;
-}
-
-/** Parse and cap a limit/count query param */
-function safeLimit(value: string | undefined, defaultValue: number): number {
-  return Math.max(1, Math.min(safeParseInt(value, defaultValue), MAX_LIMIT));
-}
-
-/** Convert ms timestamp query param to seconds for commandlog service */
-function msToSeconds(value: string | undefined): number | undefined {
-  const ms = safeParseInt(value);
-  if (ms === undefined || ms < 0) return undefined;
-  return Math.floor(ms / 1000);
-}

 @Controller('mcp')
 @UseGuards(AgentTokenGuard)
 export class McpController {
   private readonly logger = new Logger(McpController.name);
+  // eslint-disable-next-line @typescript-eslint/no-explicit-any
   private readonly anomalyService: any;

   private readonly telemetryService: UsageTelemetryService | null;
@@ -62,6 +31,7 @@ export class McpController {
     private readonly clusterDiscoveryService: ClusterDiscoveryService,
     private readonly clusterMetricsService: ClusterMetricsService,
     @Inject('STORAGE_CLIENT') private readonly storageClient: StoragePort,
+    // eslint-disable-next-line @typescript-eslint/no-explicit-any
     @Optional() @Inject(ANOMALY_SERVICE) anomalyService?: any,
     @Optional() telemetryService?: UsageTelemetryService,
   ) {
@@ -449,4 +419,5 @@ export class McpController {
     ));
     return { ok: true };
   }
+
 }
