Project: codeagent-index-engine

Overview

Rust workspace at codeagent-engine/. Three crates: codeagent-core (lib), codeagent-cli (bin), codeagent-mcp (bin). All 6 phases complete. Hooks, dead code detection, init command, ASP.NET Core API boundary support, and IPC retry mechanism implemented. 412 tests passing (343 core + 41 fixture + 31 MCP + 5 CLI), 2 ignored (symlink tests on Windows), 0 failures. Additionally 28 OSS integration tests pass (gated behind --features oss-tests).

Key conventions

  • In-memory test DB: Connection::open_in_memory() + PRAGMA foreign_keys = ON + run_migrations()
  • Phase 3+ test DB: also call load_sqlite_vec() BEFORE Connection::open_in_memory(), then ensure_vec_nodes_table(&conn) after migrations
  • make_node_for_test is pub(crate) #[cfg(test)] in graph/nodes.rs
  • ensure_root_project(&conn, Language::X) → NodeId, usable as ProjectId(root_id.0)
  • DbReaderPool::read(|conn| {...}).await — no .get() method
  • writer.submit() returns Result<()> only; smuggle data out via Arc<Mutex<T>> or write-then-read
  • sync_fts_insert signature: (conn, node_id, name, qualified_name, parameter_signature, return_type) — no summary arg; fts_nodes has no summary column at schema v3
  • NodeUpsert has no reference_count field — that field is only on NodeRow (read-back); the SQL column defaults to 0
  • LanguageAdapter::index_file() writes nodes/edges/spans/FTS directly to &Connection — no return struct. Callers manage transactions.
  • Adapter constructors: CSharpAdapter::new(500) and TypeScriptAdapter::new(500) both take max_signature_length: usize
  • strip_bom() must be applied before parsing/hashing (in adapters/mod.rs)
  • test_support.rs: #[cfg(test)] pub(crate) mod test_support in lib.rs — provides TestRepoBuilder (temp dir + optional git init) and TestDb (file-backed SQLite with writer thread + reader pool for #[tokio::test] pipeline tests)
  • IPC from_streams() test helper: ipc::process::testing::from_streams(writer, reader, deadline_ms) — builds LanguageServiceProcess from DuplexStream pairs instead of spawning child processes. Tests use tokio::io::duplex() pairs and a mock server task.
  • insert_span() takes &SpanUpsert struct — not individual positional arguments
  • Symlink tests: gated with #[cfg_attr(windows, ignore)] — symlink creation requires elevated privileges on Windows
  • detect_symbol_renames, apply_symbol_rename, apply_file_rename, SymbolRenameMatch are pub(crate) in ingest/rename.rs for test access
  • IPC retry mechanism: IpcManager::run_analysis() wraps run_analysis_once() in an exponential backoff loop. Retryable: IpcChildExited, IpcTimeout, OOM restart (-32104). Non-retryable: IpcVersionMismatch, Cancelled, deterministic semantic errors (-32100/-32101/-32102). Config: ipc_max_retries (default 2), ipc_retry_base_delay_ms (default 500ms). CoreError::is_retryable_ipc() classifies errors.
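
A minimal sketch of the retry classification and backoff loop described in the last bullet. It is not the real IpcManager code: the IpcError enum is a hypothetical stand-in for the IPC variants of CoreError, the doubling factor is an assumption (the notes only say "exponential"), and the sleep function is injected so the loop can be exercised without waiting.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the IPC-related variants of `CoreError`.
#[derive(Debug, Clone, PartialEq)]
enum IpcError {
    ChildExited,
    Timeout,
    VersionMismatch,
    Cancelled,
    /// Error code reported by the language service (e.g. -32104).
    Service(i32),
}

impl IpcError {
    /// Mirrors the retryable / non-retryable split listed in the notes.
    fn is_retryable(&self) -> bool {
        match self {
            IpcError::ChildExited | IpcError::Timeout => true,
            IpcError::Service(code) => *code == -32104, // OOM restart
            IpcError::VersionMismatch | IpcError::Cancelled => false,
        }
    }
}

/// Delay before retry number `attempt` (0-based): base * 2^attempt.
/// The factor of 2 is an assumption, not taken from the project.
fn backoff_delay(base_delay_ms: u64, attempt: u32) -> Duration {
    Duration::from_millis(base_delay_ms.saturating_mul(2u64.saturating_pow(attempt)))
}

/// Runs `op` once, then retries up to `max_retries` times on retryable errors.
fn run_with_retry<T>(
    max_retries: u32,
    base_delay_ms: u64,
    mut op: impl FnMut() -> Result<T, IpcError>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, IpcError> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if err.is_retryable() && attempt < max_retries => {
                sleep(backoff_delay(base_delay_ms, attempt));
                attempt += 1;
            }
            // Non-retryable error, or retries exhausted: surface it unchanged.
            Err(err) => return Err(err),
        }
    }
}
```

With the documented defaults (ipc_max_retries = 2, ipc_retry_base_delay_ms = 500), this sketch would wait 500 ms and then 1000 ms before giving up.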

Symbol key formats

  • C#: qualified_name:node_type[:param_count(types)] — e.g. MyApp.Auth.Login:method:1(string)
  • TypeScript: all function declarations stored as NodeType::Method (not NodeType::Function)
  • Shared: derive_file_id(path) in adapters/mod.rs — deterministic SHA-256 FileId, used by both adapters
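
For illustration, the C# key format above could be assembled roughly like this. The function name csharp_symbol_key is hypothetical, and the comma separator between parameter types is an assumption; the notes only show a single-parameter example.

```rust
/// Builds a key of the form `qualified_name:node_type[:param_count(types)]`.
/// `param_types` is `None` for nodes that have no parameter list.
fn csharp_symbol_key(
    qualified_name: &str,
    node_type: &str,
    param_types: Option<&[&str]>,
) -> String {
    match param_types {
        Some(types) => format!(
            "{}:{}:{}({})",
            qualified_name,
            node_type,
            types.len(),
            types.join(",") // separator is an assumption
        ),
        None => format!("{}:{}", qualified_name, node_type),
    }
}
```

Calling csharp_symbol_key("MyApp.Auth.Login", "method", Some(&["string"][..])) yields the example key from the first bullet.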

Known bugs fixed (historical)

  1. C# handle_namespace double-naming — Fixed: compute qname before push.
  2. deletion_log missing parameter_count — Fixed: use None directly.
  3. DeletionJournalEntry missing chunk_fingerprint — Fixed in graph/deletion.rs.
  4. upsert_edge INSERT OR IGNORE no-op — Fixed: replaced with INSERT … WHERE NOT EXISTS.
  5. replace_semantic_edges_in_tx approximate→exact upgrade — Fixed: DELETE targets confidence IN ('exact', 'approximate').
  6. filter_nodes_fts MATCH alias — Fixed: WHERE fts_nodes MATCH ?1 (table name, not alias).
  7. upsert_node does not auto-populate fts_nodes — Callers must call sync_fts_insert separately.
  8. sqlite-vec v0.1.6 has no load() function — Fixed: load_sqlite_vec() uses sqlite3_auto_extension.
  9. vec0 rejects BLOB(16) PRIMARY KEY — Decision: regular vec_nodes table with brute-force cosine.
  10. C# FileId::new() non-deterministic — Fixed: derive_file_id(path) (SHA-256 based).
  11. C# namespace span ownership — Fixed: fallback to non-primary span.
  12. TS interface body not walked — Fixed: extended walk_class_body.
  13. FTS5 dot-qualified names cause syntax error — Fixed: wrap in double quotes in build_fts_query().
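
Sketch of the quoting idea behind fix 13. The helper name quote_fts_term is hypothetical and this is not the project's build_fts_query(); it only shows the wrapping step. Doubling embedded double quotes follows the FTS5 rule for escaping a quote inside a quoted string.

```rust
/// Wraps a term in double quotes so FTS5 treats it as one string instead of
/// parsing `.` as query syntax. Embedded double quotes are doubled.
fn quote_fts_term(term: &str) -> String {
    format!("\"{}\"", term.replace('"', "\"\""))
}
```

For example, MyApp.Auth.Login becomes "MyApp.Auth.Login" (with the quotes) before it is bound to the MATCH parameter.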

Schema versions

  • Migration 001: nodes, edges, node_spans, deletion_log, fts_nodes
  • Migration 002: node_identity_map, idx_node_identity_map_old
  • Migration 003: rebuild fts_nodes without summary column; remove summary_prompt_version metadata
  • Migration 004: api_endpoints table with indexes on node_id, controller_id, http_method, route_template
  • CURRENT_SCHEMA_VERSION = 4
  • vec_nodes table: NOT in migration SQL; created by ensure_vec_nodes_table(conn) separately
  • No schema migration for Phase 4, 5, or 6

Phase 3 architecture

  • embedding/provider.rs: EmbeddingProvider trait; HashEmbeddingProvider for tests
  • graph/vectors.rs: ensure_vec_nodes_table, insert_or_replace_embedding, delete_embedding, handle_model_change, vector_search_brute_force
  • query/mod.rs: All SELECT queries select 29 columns (0–28); edge columns at 29/30/31; FTS rank at 29
  • db/connection.rs: load_sqlite_vec() (no-arg) registers global auto-extension

Phase 4 architecture

  • retrieval/: intent classification, multi-channel search (vector/BM25/qname), RRF merge+rerank, context assembly
  • eval/: NDCG, MRR, precision, recall metrics; dataset loader; 100 curated eval queries
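
The RRF merge step can be sketched as below. This is generic reciprocal rank fusion, not the code in retrieval/: the function name, the string ids, and the k constant are assumptions (k = 60 is a common default in the literature; the project's value is not stated here).

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion: each channel contributes 1 / (k + rank) per item,
/// with ranks starting at 1. Returns ids sorted by fused score, best first.
fn rrf_merge(channels: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<&str, f64> = HashMap::new();
    for channel in channels {
        for (index, id) in channel.iter().enumerate() {
            *scores.entry(*id).or_insert(0.0) += 1.0 / (k + (index + 1) as f64);
        }
    }
    let mut merged: Vec<(String, f64)> = scores
        .into_iter()
        .map(|(id, score)| (id.to_string(), score))
        .collect();
    // Sort by score descending; break ties by id so the output is deterministic.
    merged.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    merged
}
```

An item that appears near the top of several channels (vector, BM25, qname) outranks one that is first in a single channel only, which is the point of fusing by rank rather than by raw score.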

Phase 5 architecture — MCP server (COMPLETE)

  • Crate: crates/codeagent-mcp — binary crate, rmcp v0.16 SDK
  • state.rs: CodeagentService struct with #[tool_router] (19 tools) + #[tool_handler] for ServerHandler
  • error.rs: CoreError → MCP error codes; ErrorCode wraps i32 (not i64)
  • sandbox.rs: resolve_sandboxed(repo_root, relative_path) — rejects .., canonicalizes, checks starts_with
  • serialization.rs: node_to_json, filter_result_to_json, neighbor_to_json, span_to_json, outline_node_to_json, similar_node_to_json, api_endpoint_to_json, api_endpoint_with_node_to_json
  • tools/filesystem.rs: list_directory, read_file, get_directory_tree
  • tools/search.rs: search_symbols (FTS), lookup_symbol (qname), find_similar (embedding+brute-force cosine), search_api_endpoints (HTTP method/route/controller filter)
  • tools/navigation.rs: 10 tools: get_symbol, get_source_spans, get_file_outline, get_callers, get_callees, get_implementations, get_references, get_dependencies, get_dependents, find_dead_code
    • get_symbol attaches api_endpoint field when the node has endpoint data
    • get_file_outline attaches api_endpoint to action methods using batch get_api_endpoints_for_file + HashMap lookup
  • tools/management.rs: index_files (pipeline batch), get_status (DB stats)
  • Key rmcp patterns: Parameters<T> for tool params (T: Deserialize + schemars::JsonSchema), schemars = "1.0"
  • ChangeBatch has no from_created_paths — push FileChange::Created manually
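
A rough sketch of the three checks listed for sandbox.rs (reject .., canonicalize, starts_with). The return type, error kinds, and messages are assumptions; the real resolve_sandboxed presumably maps failures to the project's own error type.

```rust
use std::io;
use std::path::{Component, Path, PathBuf};

/// Resolves `relative_path` under `repo_root`, refusing anything that could
/// escape the root. Error kinds and messages are illustrative only.
fn resolve_sandboxed(repo_root: &Path, relative_path: &str) -> io::Result<PathBuf> {
    let relative = Path::new(relative_path);

    // Step 1: reject `..` components before touching the filesystem.
    if relative.components().any(|c| matches!(c, Component::ParentDir)) {
        return Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            "path contains `..`",
        ));
    }

    // Step 2: canonicalize both sides so symlinks are resolved.
    let root = repo_root.canonicalize()?;
    let candidate = root.join(relative).canonicalize()?;

    // Step 3: the resolved path must still live under the resolved root.
    // This also catches absolute inputs, because `join` replaces the base
    // when its argument is absolute.
    if !candidate.starts_with(&root) {
        return Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            "path escapes repo root",
        ));
    }
    Ok(candidate)
}
```

Note that canonicalize() fails for paths that do not exist, so a sketch like this only suits read-style tools such as read_file and list_directory.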

Phase 6 architecture — Hardening & Observability (COMPLETE)

  • adapters/mod.rs: strip_bom() — strips UTF-8 BOM (EF BB BF) before parsing/hashing
  • query/mod.rs: get_transitive_neighbors(): WITH RECURSIVE CTE; uses UNION (not UNION ALL) on node_id only
  • config.rs: LoggingConfig { debug_content_logging: bool } — default false
  • Key lesson: WITH RECURSIVE + UNION deduplicates on ALL CTE columns, so the recursive CTE must carry node_id only; with extra columns, revisited nodes look distinct and cycles never terminate
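
BOM stripping as described in the first bullet can be sketched in a few lines. The byte-slice signature is an assumption; the actual strip_bom() in adapters/mod.rs may take a &str or return an owned value.

```rust
/// UTF-8 byte order mark.
const UTF8_BOM: [u8; 3] = [0xEF, 0xBB, 0xBF];

/// Returns `bytes` without a leading UTF-8 BOM, or unchanged if none is present.
fn strip_bom(bytes: &[u8]) -> &[u8] {
    bytes.strip_prefix(&UTF8_BOM[..]).unwrap_or(bytes)
}
```

Applying this before both parsing and hashing keeps spans and content hashes identical for files that differ only by a BOM.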

Hooks, Dead Code Detection & Init (COMPLETE)

PageRank (graph/ranking.rs)

  • compute_pagerank(conn, opts) -> Result<Vec<(NodeId, f64)>> — loads edges into memory, iterative power method
  • get_pagerank_summary(conn, opts) -> Result<Vec<(Node, f64)>> — full Node metadata for ranked results
  • PageRankOptions { edge_types, damping, iterations, top_n } with sensible defaults
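
For reference, the iterative power method can be sketched over plain integer node indices. This is not compute_pagerank itself: it skips the DB load and edge_types filter, and the even redistribution of rank from nodes without outgoing edges is an assumption about how dangling nodes are treated.

```rust
/// Power-iteration PageRank over nodes `0..node_count`.
/// `edges` are (source, target) pairs; rank flows from source to target.
fn pagerank(
    node_count: usize,
    edges: &[(usize, usize)],
    damping: f64,
    iterations: usize,
) -> Vec<f64> {
    if node_count == 0 {
        return Vec::new();
    }
    let n = node_count as f64;
    let mut out_degree = vec![0usize; node_count];
    for &(source, _) in edges {
        out_degree[source] += 1;
    }

    let mut ranks = vec![1.0 / n; node_count];
    for _ in 0..iterations {
        // Rank held by nodes without outgoing edges is spread evenly (assumption).
        let dangling: f64 = (0..node_count)
            .filter(|&i| out_degree[i] == 0)
            .map(|i| ranks[i])
            .sum();
        let mut next = vec![(1.0 - damping) / n + damping * dangling / n; node_count];
        for &(source, target) in edges {
            next[target] += damping * ranks[source] / out_degree[source] as f64;
        }
        ranks = next;
    }
    ranks
}
```

The ranks always sum to 1, so sorting by rank and truncating gives a result in the spirit of top_n.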

Dead code detection (query/dead_code.rs)

  • find_dead_code(conn, opts) -> Result<Vec<DeadCodeEntry>> — single SQL query, excludes constructors/overrides/public API by default
  • count_dead_code(conn, opts) -> Result<usize> — lightweight count variant
  • DeadCodeOptions { project_id, language, include_public_api, limit } — public API excluded by default
  • DeadCodeEntry { node: Node } — wraps full Node metadata

find_dead_code MCP tool (tools/navigation.rs)

  • FindDeadCodeParams { project_id, language, include_public_api, limit } — default limit 50
  • Wired into CodeagentService tool_router in state.rs

Claude Code hooks (codeagent-cli/src/hooks.rs)

  • PreCompact: PageRank top-30 symbols as Markdown table + graph stats → additionalContext
  • PostToolUse: Extracts file_path from tool_input, re-indexes single file via adapter index_file() (syntactic-only, WAL + busy_timeout)
  • SubagentStart: Project overview with top-15 symbols + available MCP tool guidance → additionalContext
  • TaskCompleted: Dead code report + unresolved reference count → additionalContext
  • Shared: read_hook_input() (stdin JSON), write_hook_output() (stdout JSON), open_db_for_hook() (WAL + busy_timeout=5000 + migrations)
  • Repo root from $CLAUDE_PROJECT_DIR env var → cwd field → current directory fallback
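
The repo-root precedence in the last bullet, sketched as a pure function. The name resolve_repo_root and the choice to treat empty strings as absent are assumptions; inputs are passed in explicitly so the precedence is easy to test.

```rust
use std::path::PathBuf;

/// Picks the repo root: env var first, then the hook's `cwd` field,
/// then the process's current directory.
fn resolve_repo_root(
    env_project_dir: Option<&str>,
    hook_cwd: Option<&str>,
    current_dir: PathBuf,
) -> PathBuf {
    env_project_dir
        .filter(|s| !s.is_empty()) // empty value counts as unset (assumption)
        .or(hook_cwd.filter(|s| !s.is_empty()))
        .map(PathBuf::from)
        .unwrap_or(current_dir)
}
```

A caller would typically pass std::env::var("CLAUDE_PROJECT_DIR").ok().as_deref() as the first argument and the cwd value parsed from the hook's stdin JSON as the second.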

Init command (codeagent-cli/src/init.rs)

  • codeagent init [--repo-root <path>] — one-command project setup
  • Creates .codeagent/ dir, config.json, index.db (with migrations), .gitignore entry
  • Merges hook registrations into .claude/settings.json (preserves existing settings)
  • 4 hooks registered: PreCompact, PostToolUse (matcher: Edit|Write|NotebookEdit), SubagentStart, TaskCompleted

CLI structure (codeagent-cli/src/main.rs)

  • Subcommands: init, hook <event>, get-node, get-neighbors, get-source, get-outline, filter, lookup, health
  • hook events: pre-compact, post-tool-use, subagent-start, task-completed
  • Init and Hook commands bypass the global --db flag (they resolve paths independently)

OSS Integration Tests (COMPLETE)

  • File: crates/codeagent-core/tests/oss_tests.rs — gated behind cargo test --features oss-tests
  • Repos: tRPC (TS, v11.10.0, MIT) and Hot Chocolate graphql-platform (C#, 15.1.12, MIT)
  • 28 tests across 4 tiers (baseline, navigation, complex queries, benchmarks)

ASP.NET Core API Boundary Support (COMPLETE)

  • graph/api_endpoints.rs: ApiStyle enum (Controller/MinimalApi/Grpc), ApiEndpointUpsert, ApiEndpoint structs; CRUD: upsert_api_endpoint, delete_api_endpoints_for_file, get_api_endpoint, get_api_endpoints_for_file, search_api_endpoints
  • api_endpoints table: separate from nodes (no 29-column bloat); ON DELETE CASCADE on node_id; indexes on node_id, controller_id, http_method, route_template
  • Tree-sitter extraction: AspNetControllerContext tracks controller state while walking class body; extract_aspnet_attributes() parses [ApiController], [HttpGet], [Route], [Authorize], [AllowAnonymous], [ProducesResponseType], [Consumes], [FromBody]
  • Route resolution: resolve_route_template(class_route, method_route, controller_name) — replaces [controller] token, combines class + method routes with /; absolute method routes (starting with /) override class route
  • Tree-sitter C# field name: method return types use child_by_field_name("returns") NOT "type"; "type" is for properties/events
  • Controller detection: [ApiController] attribute OR ControllerBase/Controller in base list — both trigger endpoint extraction
  • IPC extension: ApiEndpointInfo struct on SemanticNode with skip_serializing_if = "Option::is_none" for backward compat
  • Roslyn extractor: ExtractApiEndpoint(IMethodSymbol) uses GetAttributes() for authoritative data; overwrites tree-sitter approximation with extractor_version = "roslyn"
  • Semantic enrichment: process_enrichment_result() processes api_endpoint from SemanticNode; lookup_controller_id() finds parent class via Contains edge
  • MCP tool: search_api_endpoints — params: http_method, route_pattern, controller_id, limit; returns endpoint metadata + method node metadata
  • MCP enrichment: get_symbol attaches api_endpoint when present; get_file_outline batch-loads endpoints via get_api_endpoints_for_file + HashMap
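
The route resolution rules above can be sketched as follows. The Option<&str> parameters, the handling of empty routes, and stripping a trailing "Controller" from the class name for the [controller] token are assumptions; the real resolve_route_template may differ in signature and normalization.

```rust
/// Combines a class-level and a method-level route template.
fn resolve_route_template(
    class_route: Option<&str>,
    method_route: Option<&str>,
    controller_name: &str,
) -> String {
    // Assumption: `[controller]` expands to the class name minus "Controller".
    let token = controller_name
        .strip_suffix("Controller")
        .unwrap_or(controller_name);

    // Treat empty templates as absent (assumption).
    let class_route = class_route.filter(|s| !s.is_empty());
    let method_route = method_route.filter(|s| !s.is_empty());

    let combined = match (class_route, method_route) {
        // An absolute method route overrides the class route entirely.
        (_, Some(method)) if method.starts_with('/') => method.to_string(),
        (Some(class), Some(method)) => {
            format!("{}/{}", class.trim_end_matches('/'), method)
        }
        (Some(class), None) => class.to_string(),
        (None, Some(method)) => method.to_string(),
        (None, None) => String::new(),
    };
    combined.replace("[controller]", token)
}
```

For a UsersController with [Route("api/[controller]")] and [HttpGet("{id}")], this sketch produces api/Users/{id}. It matches the token case-sensitively, which is a simplification.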

Deferred

  • Pipeline integration tests (ingest/pipeline.rs) — requires tokio, IpcManager stubs
  • T-275: deferred test requiring file-based DB + async writer for WAL isolation