|
| 1 | +# Zengram Full Audit Report — 2026-04-02 |
| 2 | + |
| 3 | +**Audit scope**: Full codebase review + live database quality check |
| 4 | +**Models**: Claude Opus 4.6 (primary), Codex GPT-5.4 xhigh (adversarial), Gemini 3.1 Pro (adversarial) |
| 5 | +**Live instance**: 192.168.18.40:8084 — 521 memories, 787 entities, all retrieval paths active |
| 6 | +**Tests**: 114 pass / 0 fail / 32 suites |
| 7 | +**Codebase**: ~11K LOC across API, MCP server, 3 storage backends, 6 test suites |
| 8 | + |
| 9 | +--- |
| 10 | + |
| 11 | +## P1 — High Severity (fix before next deploy) |
| 12 | + |
| 13 | +### P1-1: Entity reclassification broken on Postgres backend |
| 14 | +**File**: `api/src/services/entities.js:370-398` |
| 15 | +**Also**: `api/src/routes/entities.js:115-117` (dry-run preview) |
| 16 | + |
| 17 | +`reclassifyEntity()` uses SQLite-specific `store.db.prepare()` syntax. The Beelink production deployment uses Postgres (`store.pool`), so `store.db` is `undefined`. The function silently returns `{ error: 'No writable store' }`. |
| 18 | + |
| 19 | +**Verified live**: `POST /entities/reclassify` dry-run returns `memories_affected: 0` for "Expert Local" (which has 13 mentions and many linked memories) — the Postgres path can't count. |
| 20 | + |
| 21 | +**Fix**: Add a Postgres code path using `store.pool.query()` for both the UPDATE and the COUNT: |
| 22 | +```javascript |
| 23 | +if (store.pool) { |
| 24 | + await store.pool.query('UPDATE entities SET entity_type = $1 WHERE id = $2', [newType, entity.id]); |
| 25 | + const linkCount = await store.pool.query('SELECT COUNT(*) as count FROM entity_memory_links WHERE entity_id = $1', [entity.id]); |
| 26 | + // ... |
| 27 | +} else if (store.db) { |
| 28 | + // existing SQLite code |
| 29 | +} |
| 30 | +``` |
| 31 | + |
| 32 | +### P1-2: Batch endpoint skips critical write-time logic |
| 33 | +**File**: `api/src/routes/memory.js:847-971` |
| 34 | + |
| 35 | +`POST /memory/batch` is missing: |
| 36 | +1. **Supersedes logic** — batch fact/status ingestion with same key creates duplicates instead of superseding old versions |
| 37 | +2. **Structured DB writes** — memories are in Qdrant but NOT in events/facts/statuses tables (divergence) |
| 38 | +3. **Relevance scoring** — batch memories have no `relevance_score`, `auto_flagged`, or `auto_deprioritized` fields |
| 39 | +4. **Entity linking** — entities are extracted (payload only) but never linked in the entity graph |
| 40 | + |
| 41 | +If agents use `brain_batch` for bootstrap/migration, the structured DB will be missing those memories entirely. |
| 42 | + |
| 43 | +--- |
| 44 | + |
| 45 | +## P2 — Medium Severity (fix in next sprint) |
| 46 | + |
| 47 | +### P2-1: Consolidated memories invisible to keyword search and structured queries |
| 48 | +**File**: `api/src/services/consolidation.js:334-372` |
| 49 | + |
| 50 | +When consolidation merges facts or creates contradiction events: |
| 51 | +- Stored in Qdrant ✓ |
| 52 | +- **NOT indexed** in `memory_search` table → invisible to BM25 keyword search |
| 53 | +- **NOT written** to structured DB (events/facts/statuses) → invisible to `GET /memory/query` |
| 54 | + |
| 55 | +Consolidated memories can only be found via vector search, never via keyword or structured queries. |
| 56 | + |
| 57 | +### P2-2: cleanupOldEvents loads ALL events into memory |
| 58 | +**File**: `api/src/services/consolidation.js:533-561` |
| 59 | + |
| 60 | +Scrolls ALL event-type memories into an in-memory array before filtering for old/unused ones. With thousands of events, this could exhaust container memory. |
| 61 | + |
| 62 | +**Fix**: Use Qdrant compound filter to scroll only matching points: |
| 63 | +```javascript |
| 64 | +scrollPoints({ type: 'event', active: true }, 200, offset) |
| 65 | +// then filter access_count === 0 && created_at < cutoff per page |
| 66 | +``` |
| 67 | + |
| 68 | +### P2-3: Race condition in fact/status supersedes |
| 69 | +**File**: `api/src/routes/memory.js:128-156` |
| 70 | + |
| 71 | +Two concurrent `POST /memory` with the same fact key: |
| 72 | +1. Both find the old fact via `findByPayload` |
| 73 | +2. Both try to supersede it (mark old as inactive, set `superseded_by`) |
| 74 | +3. Both create new points — one "wins" in Qdrant, but the old fact's `superseded_by` field may point to the wrong new memory |
| 75 | + |
| 76 | +Low probability in practice (agents rarely race on the same key) but possible during bulk operations. |
| 77 | + |
| 78 | +### P2-4: Internal fields leak in search API response |
| 79 | +**File**: `api/src/routes/memory.js:539-540` |
| 80 | + |
| 81 | +`_sessionKey` and `_sessionRank` are added to result objects during session diversity re-ranking and flow through to the JSON response when results > 3. |
| 82 | + |
| 83 | +**Verified live**: `GET /memory/search?q=...&limit=8` returns `_sessionKey: "ti-claude::2026-04-02"` and `_sessionRank: 0` in every result. |
| 84 | + |
| 85 | +**Fix**: Delete internal fields before sending response: |
| 86 | +```javascript |
| 87 | +for (const r of results) { |
| 88 | + delete r._sessionKey; |
| 89 | + delete r._sessionRank; |
| 90 | +} |
| 91 | +``` |
| 92 | + |
| 93 | +### P2-5: Duplicate entities in production database |
| 94 | +Confirmed pairs that should be merged: |
| 95 | +| Entity A | Type | Mentions | Entity B | Type | Mentions | |
| 96 | +|----------|------|----------|----------|------|----------| |
| 97 | +| Expert Local | system | 13 | expert-local | client | 9 | |
| 98 | +| Claude Code | agent | 11 | claude-code | agent | 5 | |
| 99 | +| Dermka Clinik | system | 6 | dermka-clinik | client | 4 | |
| 100 | +| Mini-Claude | workflow | 7 | Mini Claude | system | 1 | |
| 101 | + |
| 102 | +Root cause: Entity extraction treats hyphenated client_id slugs and capitalized display names as separate entities. The `extractEntities()` function adds both `clientId` (line 298) and capitalized phrases (line 340) without cross-checking. |
| 103 | + |
| 104 | +### P2-6: Massive noise in workflow entity type (192 entities) |
| 105 | +Most workflow entities are garbage captured by the quoted name regex (`QUOTED_NAME_REGEX`). Examples: |
| 106 | +- `const viewportDefs` — code variable |
| 107 | +- `openclaw gateway restart` — shell command |
| 108 | +- `drop les emojis` — instruction |
| 109 | +- `Satisfaction Guaranteed or Money Back` — marketing copy |
| 110 | +- `NEVER build HTML yourself` — instruction |
| 111 | + |
| 112 | +The `isJunkQuotedName()` filter is good but doesn't catch: |
| 113 | +- Multi-word shell commands without recognized command prefixes |
| 114 | +- French instructions (`drop les emojis`) |
| 115 | +- Marketing taglines that happen to be quoted |
| 116 | + |
| 117 | +--- |
| 118 | + |
| 119 | +## P3 — Low Severity / Polish |
| 120 | + |
| 121 | +### P3-1: 62 "system" entities with 1 mention (noise) |
| 122 | +Many are misclassified: "Maxime Fillion" (person), "Centre Morrin" (place), "Rebranded Prism" (action phrase), "Apple Vision Pro" (unrelated tech). |
| 123 | + |
| 124 | +### P3-2: No `knowledge_category` validation on store endpoint |
| 125 | +**File**: `api/src/middleware/validate.js` |
| 126 | +`validateMemoryInput` doesn't validate `knowledge_category`. Any string passes through to Qdrant. MCP tool lists valid values but server doesn't enforce. |
| 127 | + |
| 128 | +### P3-3: Feedback loop scrolls ALL active memories twice |
| 129 | +**File**: `api/src/services/feedback-loop.js:27-48, 70-105` |
| 130 | +Phase 1 (trust scores) and Phase 2 (stale detection) each scroll all active memories. Could be combined into one pass. |
| 131 | + |
| 132 | +### P3-4: CI doesn't test Postgres backend |
| 133 | +**File**: `.github/workflows/ci.yml` |
| 134 | +All tests run SQLite only. Postgres-specific bugs (like P1-1) are never caught in CI. |
| 135 | + |
| 136 | +### P3-5: `npm audit || true` always passes CI |
| 137 | +**File**: `.github/workflows/ci.yml:30` |
| 138 | +Security audit failures never fail the build. |
| 139 | + |
| 140 | +### P3-6: `EMBEDDING_DIMS` export always null |
| 141 | +**File**: `api/src/services/embedders/interface.js:63` |
| 142 | +Legacy backwards-compat export that's never set. Any caller using it gets null. |
| 143 | + |
| 144 | +### P3-7: Collection name validation gap |
| 145 | +**File**: `api/src/services/collection-registry.js:18-27` |
| 146 | +`resolveCollection()` doesn't call `validateCollectionSlug()`. Pre-prefixed names pass through to Qdrant URLs unvalidated. Qdrant likely rejects bad names, but defense-in-depth is missing. |
| 147 | + |
| 148 | +### P3-8: SQLite FTS5 keyword search doesn't support `source_agent` filter |
| 149 | +**File**: `api/src/services/keyword-search.js:142` |
| 150 | +Postgres keyword search supports `client_id`, `type`, AND `source_agent` filters. SQLite only supports `client_id` and `type`. Agent-filtered keyword searches return different results on each backend. |
| 151 | + |
| 152 | +--- |
| 153 | + |
| 154 | +## Architecture Highlights (Strengths) |
| 155 | + |
| 156 | +- **Auth**: Agent key identity binding with timing-safe comparison. Rate limiting on failed attempts. Identity enforcement on store/update/delete. |
| 157 | +- **Credential scrubbing**: Comprehensive regex patterns (API keys, JWTs, AWS keys, SSH keys, connection strings, Anthropic/OpenAI keys). |
| 158 | +- **Input validation**: Content max 10K chars, metadata max 10KB depth 3, agent names regex-validated, type/importance enum-checked. |
| 159 | +- **Multi-path retrieval**: Vector (semantic) + BM25 (keyword) + graph (entity BFS) → RRF fusion. Well-implemented with proper fallbacks. |
| 160 | +- **Consolidation defense-in-depth**: XML escaping, batch ID validation, hallucination stripping, enum whitelists for types. |
| 161 | +- **Relevance scoring**: Zero-latency (no extra embed call), near-duplicate detection, source trust, entity density. |
| 162 | +- **Session diversity re-ranking**: Smart round-robin across agent/date groups prevents search result clustering. |
| 163 | +- **Graceful degradation**: Every service (entities, keyword, graph, consolidation) is optional and degrades gracefully. |
| 164 | +- **LRU entity cache**: Bounded with pinned built-in entries, prevents unbounded memory growth. |
| 165 | +- **Test suite**: 114 tests / 32 suites / 0 failures. Good coverage of validation, scrubbing, RRF, entity extraction. |
| 166 | + |
| 167 | +--- |
| 168 | + |
| 169 | +--- |
| 170 | + |
| 171 | +## External Model Adversarial Review — Cross-Verification |
| 172 | + |
| 173 | +### Codex GPT-5.4 (xhigh) — FAILED |
| 174 | +Codex sandbox hit a `bwrap` loopback failure and could not read any source files. No findings produced. |
| 175 | + |
| 176 | +### Gemini 3.1 Pro (adversarial) — 20 findings, 10 verified |
| 177 | + |
| 178 | +The Gemini agent produced 20 findings (5 P0, 5 P1, 5 P2, 5 P3). After line-by-line verification against source code: |
| 179 | + |
| 180 | +| Finding | Gemini Severity | Verified | Actual Severity | Notes | |
| 181 | +|---------|----------------|----------|-----------------|-------| |
| 182 | +| P0-1: Race in dedup | P0 | REAL | **P2** | Already in report as P2-3. Low probability in practice | |
| 183 | +| P0-2: Timing side-channel on agent keys | P0 | **FALSE** | N/A | Map.get() on 256-bit keys not vulnerable | |
| 184 | +| P0-3: SQL wildcard injection via ILIKE | P0 | **FALSE** | N/A | Parameterized queries. Wildcards are intended search behavior | |
| 185 | +| P0-4: LLM output injection to DB | P0 | **FALSE** | N/A | Code has extensive validation (lines 252-300) | |
| 186 | +| P0-5: Collection name injection | P0 | REAL | **P3** | Already in report as P3-7. Qdrant rejects bad names | |
| 187 | +| P1-6: No timeout on findByPayload | P1 | **FALSE** | N/A | All Qdrant calls use `qdrantRequest()` with 10s timeout | |
| 188 | +| P1-7: Hardcoded LIMIT 50 | P1 | REAL | **P3** | Design choice, not a bug | |
| 189 | +| P1-8: Unbounded consolidation memory | P1 | REAL | **P2** | Already in report as P2-2 | |
| 190 | +| P1-9: LLM batch token overflow | P1 | **FALSE** | N/A | 50×10K chars = 125K tokens, Gemini Flash context = 1M | |
| 191 | +| P1-10: Exponential timeout backoff | P1 | **FALSE** | N/A | It's linear (not exponential), capped at 1 retry = 30s max | |
| 192 | +| P2-11: UUID collision | P2 | **FALSE** | N/A | Mathematically negligible (2^128 space) | |
| 193 | +| P2-12: Rate limiting not cluster-safe | P2 | REAL | **P3** | Single-instance deployment, not currently exploitable | |
| 194 | +| P2-13: Client resolver silent failure | P2 | **FALSE** | N/A | Code falls back to 'global' correctly (line 165) | |
| 195 | +| P2-14: No error recovery in batch consolidation | P2 | REAL | OK | Failed batches stay `consolidated=false`, retried next run | |
| 196 | +| P2-15: Embedding fallback | P2 | REAL | **P3** | Architectural limitation, not a bug | |
| 197 | +| P3-16: Incomplete input validation | P3 | **FALSE** | N/A | Didn't read validate.js. Validation IS thorough | |
| 198 | +| P3-17: Client_id in consolidation prompt | P3 | **FALSE** | N/A | Code preserves client_id correctly (line 338) | |
| 199 | +| P3-18: No logging | P3 | **FALSE** | N/A | Logging exists at lines 778, 832, 958 | |
| 200 | +| P3-19: Qdrant errors leak details | P3 | **FALSE** | N/A | All error handlers return generic messages | |
| 201 | +| P3-20: IP spoofing | P3 | REAL | **P3** | Internal API, not internet-facing | |
| 202 | + |
| 203 | +**Summary**: 10/20 findings were false positives (50% FP rate). Of the 10 real findings, 5 were already in the primary audit, 3 were P3 design choices, and 2 were correctly handled by existing code. **Zero confirmed P0 vulnerabilities.** |
| 204 | + |
| 205 | +### Claude Explore Agent — Architecture Analysis (47 tool calls, 537KB output) |
| 206 | + |
| 207 | +Comprehensive architecture map with 15 anomalies. Cross-verification: |
| 208 | + |
| 209 | +| Explore Claim | Verified | Notes | |
| 210 | +|--------------|----------|-------| |
| 211 | +| "No test files found" | **FALSE** | 6 test files in `api/tests/`, 114 tests pass | |
| 212 | +| "No batch size limit" | **FALSE** | `BATCH_MAX_SIZE=100` at `memory.js:849` | |
| 213 | +| "MCP consolidation async-only" | **FALSE** | `sync: boolean` param exists in MCP tool definition | |
| 214 | +| Confidence decay no lower bound | REAL (P3) | After 100 days: 0.98^100 = 13%. By design but could floor at 0.1 | |
| 215 | +| Temporal buffer inconsistency | REAL (P3) | ±2d for "days ago", ±5d for "last month", not documented | |
| 216 | +| Fire-and-forget lacks monitoring | REAL (P3) | Entity linking, keyword index, notifications — all silent on failure | |
| 217 | +| Client resolver MIN_THRESHOLD=2 | REAL (P3) | Permissive matching, could misattribute memories to wrong client | |
| 218 | + |
| 219 | +**Summary**: 3/7 key claims were false. 4 valid observations, all P3 severity. The architecture analysis was thorough but over-reported issues it hadn't fully verified. |
| 220 | + |
| 221 | +--- |
| 222 | + |
| 223 | +## Recommended Fix Priority |
| 224 | + |
| 225 | +1. **P1-1**: Add Postgres path to `reclassifyEntity()` + entities route dry-run |
| 226 | +2. **P1-2**: Add supersedes, structured DB writes, and entity linking to batch endpoint |
| 227 | +3. **P2-1**: Add keyword indexing + structured DB writes to consolidation merged facts |
| 228 | +4. **P2-4**: Clean `_sessionKey`/`_sessionRank` from search response |
| 229 | +5. **P2-5**: Run entity merge for the 4 confirmed duplicate pairs |
| 230 | +6. **P2-6**: Tighten `isJunkQuotedName()` to reject more noise patterns |
| 231 | +7. **P2-2**: Refactor cleanupOldEvents to filter per-page instead of loading all |
| 232 | +8. **P3-2**: Add `knowledge_category` to `validateMemoryInput` |
| 233 | +9. **P3-4**: Add Postgres service to CI workflow |
0 commit comments