Commit 0cad62f

cdeust and claude committed
release: v3.14.8 — ingest_codebase full-chain extraction + audit fixes
User report: "ast is failing, no methods, no call, no files, it simply fail to use the pipeline, when the rust pipeline is working fine."

Diagnosis: the upstream Rust indexer (ai-architect-mcp) was producing a correct Kuzu graph (2 312 Functions, 3 023 Methods, 780 Structs, 554 Files, 2 833 Imports). Cortex's ``ingest_codebase`` consumer was the broken half: it used BM25 keyword search with the project name as query, got 2 hits, gated the Cypher fallback on emptiness so it never fired, and even when it did fire it never extracted file paths or any edges.

Fix:
- Replace BM25 with a Cypher-driven projection: every Function / Method / Struct, every File, every call edge, every File→symbol containment edge. Default caps removed (top_symbols=null, top_processes=null): pull the full chain.
- Refuse to memoise a synthesised graph_path on persistent upstream error (cache-poisoning bug surfaced by genius cross-verification).
- Narrow broad ``except Exception`` to expected transport classes; surface per-query diagnostics in the handler response.
- Detect qualified_name overload collisions; report them rather than silently coalescing.
- File attribution derived from authoritative (:File)-[]->(:symbol) edges, not qn-split (language-agnostic). qn-split demoted to fallback validated against the known-files set.
- Server-side label-OR filter pushdown to remove Function→Process/Community noise from the wire.
- Stable ordering for unbounded queries.
- Split into six modules to fit the 300-line cap: _cypher, _writers, _graph, _pages, _schema, composition root.
- Lock-guard the module-level _store singleton for thread-pool callers.

Live measurement on Cortex itself (cached graph): 50 150 symbols, 4 072 files, 30 818 calls, 19 297 contains. Diagnostics array reports fallback usage honestly.

Tests: 2549 passing (3 new audit-driven tests + 1 cross-language attribution test). Mock routing rewritten with regex to remove the substring-ordering footgun.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
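
The gating bug in the diagnosis can be illustrated with a minimal sketch. This is not the Cortex code: `ingest_gated`, `ingest_cypher_first`, and the two fake fetchers are hypothetical stand-ins, and the 6 000-row figure only mirrors the codebase size quoted in the changelog. The sketch shows why a fallback guarded by `if not symbols_raw` never runs once BM25 returns any hit at all.

```python
from typing import Callable

Rows = list[dict]


def ingest_gated(
    bm25_search: Callable[[str], Rows],
    cypher_fetch: Callable[[], Rows],
    project_name: str,
) -> Rows:
    """Old shape: the structural pull runs only when BM25 returns nothing."""
    symbols_raw = bm25_search(project_name)
    if not symbols_raw:  # a 2-hit list is truthy, so the fallback is skipped
        symbols_raw = cypher_fetch()
    return symbols_raw


def ingest_cypher_first(cypher_fetch: Callable[[], Rows]) -> Rows:
    """New shape: the structural pull is the primary path, with no gate."""
    return cypher_fetch()


# Hypothetical stand-ins for the upstream calls (assumptions, not real APIs).
def fake_bm25(query: str) -> Rows:
    return [{"qualified_name": "cortex::a"}, {"qualified_name": "cortex::b"}]


def fake_cypher() -> Rows:
    return [{"qualified_name": f"cortex::sym{i}"} for i in range(6000)]


gated = ingest_gated(fake_bm25, fake_cypher, "cortex")
full = ingest_cypher_first(fake_cypher)
print(len(gated), len(full))  # prints: 2 6000
```

Making the graph query the primary path removes the dependency on keyword relevance, which is a poor proxy for "every symbol in the project".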
1 parent 4c9a57b commit 0cad62f

10 files changed

Lines changed: 1263 additions & 490 deletions

.claude-plugin/plugin.json

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 {
   "name": "cortex",
   "description": "Persistent memory for Claude Code — remembers across sessions automatically. Install and forget. Scientific retrieval backed by 41 published papers.",
-  "version": "3.14.7",
+  "version": "3.14.8",
   "author": {
     "name": "Clement Deust",
     "email": "admin@ai-architect.tools"

CHANGELOG.md

Lines changed: 72 additions & 7 deletions
@@ -6,17 +6,82 @@ adheres to [Semantic Versioning](https://semver.org/).
 
 ## [Unreleased]
 
+## [3.14.8] — ingest_codebase full-chain extraction + audit fixes
+
+### Fixed
+
+- **`ingest_codebase` extracted only the tip of the iceberg.** BM25
+  keyword search (`search_codebase`) was the primary symbol-extraction
+  path, returning 2 hits when invoked with the project name as query.
+  The Cypher fallback was gated on empty results (`if not symbols_raw`),
+  so a 2-hit BM25 response prevented the structural pull. Even when
+  the fallback ran it didn't extract `file_path` (Function nodes carry
+  no such property — it's encoded in `qualified_name`) or any edges
+  (BM25 result rows have no `calls` / `imports` keys). User-visible
+  result on a 6 000-symbol codebase: 2 symbols, 0 edges, 0 files.
+  Replaced with a Cypher-driven projection that pulls every
+  Function / Method / Struct, every File node, every
+  (`Function`/`Method`/`Struct`)→(`Function`/`Method`/`Struct`) call
+  edge, and every File→symbol containment edge. Live measurement on
+  the Cortex codebase: 50 150 symbols, 4 072 files, 30 818 calls,
+  19 297 contains.
+- **Cache poisoning in `ensure_graph`.** When `analyze_codebase`
+  returned `status=error` after the self-heal retry, the handler
+  synthesised `<output_dir>/graph` and memoised it as success. Future
+  ingests reused the bogus path and silently projected an empty graph,
+  indistinguishable from "empty codebase". Now raises
+  `McpConnectionError` and refuses to memoise on persistent error.
+- **Broad `except Exception → return []`** swallowed every transport,
+  parse, and schema error in cypher fetchers as an empty result —
+  indistinguishable from "graph genuinely has zero rows". Narrowed to
+  `(McpConnectionError, ValueError, KeyError, TypeError)`. Per-query
+  failures now surface as a `diagnostics` array in the handler
+  response.
+- **qualified_name overload collisions** silently dropped legitimate
+  cross-overload call edges via the `src_id == dst_id` self-loop
+  guard. `write_symbol_entities` now detects collisions and surfaces
+  them as diagnostics (the upstream graph itself is the dedupe
+  boundary, so downstream disambiguation requires signature data the
+  upstream does not emit).
+- **Hardcoded `top_symbols=50` / `top_processes=10` caps.** Defaults
+  are now `null` ⇒ pull every symbol / every process. Callers can
+  still cap explicitly.
+
+### Changed
+
+- **File attribution is now language-agnostic.** Symbol → file mapping
+  is derived from authoritative `(:File)-[]->(:symbol)` containment
+  edges; the `qn.split("::")[0]` heuristic is demoted to a fallback
+  validated against the known-files set, so Rust qualified_names
+  (`crate::module::Type::method`) cannot fabricate fake "crate" file
+  paths.
+- **Server-side filter pushdown** in cypher fetchers: label-OR pattern
+  `(b:Function|Method|Struct)` removes Function→Process /
+  Function→Community noise from the wire. Single label-OR query for
+  containment instead of three round-trips.
+- **Stable ordering** for unbounded fetches (`ORDER BY qualified_name`)
+  and bounded fetches (`ORDER BY (end-start) DESC`).
+- `ingest_codebase.py` split into six modules to fit the project's
+  300-line cap: `_cypher` (Kuzu fetchers), `_writers` (MemoryStore
+  writers), `_graph` (analyze + cache resolution), `_pages` (process
+  wiki rendering), `_schema` (MCP tool schema), and the composition
+  root.
+
 ### Added
 
-- Public-readiness baseline: CONTRIBUTING.md, CODE_OF_CONDUCT.md,
-  SECURITY.md.
-- GitHub issue templates (bug / feature / audit-finding) and PR template
-  with audit-cycle checklist.
-- LICENSE expanded with ecosystem-context preamble + explicit
-  independent-authorship statement (no employer affiliation).
+- `_store` singleton lock-guarded for thread-pool callers.
+- New tests: `test_persistent_upstream_error_does_not_poison_cache`,
+  `test_cypher_error_surfaces_as_diagnostic`,
+  `test_file_attribution_uses_containment_not_qn_split`. Mock routing
+  rewritten to use regex patterns instead of substring keys
+  (substring-prefix collisions silently routed wrong replies).
+- Public-readiness baseline (carried from Unreleased): CONTRIBUTING.md,
+  CODE_OF_CONDUCT.md, SECURITY.md, GitHub issue/PR templates, expanded
+  LICENSE with ecosystem-context preamble + explicit
+  independent-authorship statement.
 - `prd-spec-generator` cross-link in companion-projects section.
 
-### Fixed
+### Fixed (carried)
 
 - `.mcp.json` + `plugin.json` hooks resilient to project-scoped launch.
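
The containment-first file attribution described in the changelog's "Changed" entry can be sketched roughly as follows. The function name `attribute_files`, the `(file_path, qualified_name)` tuple shape for containment edges, and the sample data are all assumptions made for illustration; the real `_writers` / `_cypher` modules are not shown in this commit view and may differ.

```python
from typing import Optional


def attribute_files(
    symbols: list[str],
    contains_edges: list[tuple[str, str]],
    known_files: set[str],
) -> tuple[dict[str, Optional[str]], list[str]]:
    """Map each qualified name to a file path, preferring containment edges."""
    # Authoritative source: (:File)-[]->(:symbol) edges as (file_path, qn).
    by_symbol: dict[str, str] = {}
    for file_path, qn in contains_edges:
        by_symbol.setdefault(qn, file_path)

    mapping: dict[str, Optional[str]] = {}
    diagnostics: list[str] = []
    for qn in symbols:
        if qn in by_symbol:
            mapping[qn] = by_symbol[qn]
            continue
        # Fallback: the qn-split heuristic, accepted only when the prefix
        # is a file the graph actually knows about.
        candidate = qn.split("::")[0]
        if candidate in known_files:
            mapping[qn] = candidate
            diagnostics.append(f"qn-split fallback used for {qn}")
        else:
            mapping[qn] = None  # never fabricate a path such as "crate"
    return mapping, diagnostics


# Hypothetical sample data.
mapping, diagnostics = attribute_files(
    symbols=[
        "crate::module::Type::method",
        "src/b.py::bar",
        "crate::orphan::f",
    ],
    contains_edges=[("src/lib.rs", "crate::module::Type::method")],
    known_files={"src/lib.rs", "src/b.py"},
)
print(mapping)
print(diagnostics)
```

Validating the fallback against the known-files set is what keeps a Rust-style qualified name from being attributed to a non-existent "crate" file, while recording each fallback keeps the diagnostics array honest about how a path was obtained.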
