@@ -533,99 +533,122 @@ providing high-performance alternatives to the Python search and indexing stack.
533533
534534---
535535
536- ### Phase 33: Search Dominance — JSONL Streaming & Output Parity
536+ ### Phase 33: Search Dominance — JSONL Streaming & Output Parity ✅
537537Close the output-format gap with ck. Make CodexA the best tool for both humans
538538and AI agents to consume search results.
539539
540- | Feature | Description |
541- | ---------| -------------|
542- | ** JSONL streaming output** | ` --jsonl ` flag on ` search ` , ` grep ` , ` tool run ` — one JSON object per line, streaming-friendly for LLMs and pipelines |
543- | ** Scored output** | ` --scores ` flag — prepend ` [0.847] ` relevance scores to every result line with color highlighting |
544- | ** Snippet length control** | ` --snippet-length N ` to control how much context is shown per match |
545- | ** No-snippet mode** | ` --no-snippet ` for metadata-only output (file, line, score) |
546- | ** Full-section extraction** | ` --full-section ` returns the complete function/class containing the match (tree-sitter aware) |
547- | ** Clean stdout/stderr separation** | All progress/status to stderr, only results to stdout — reliable piping |
548-
549- ### Phase 34: Search Dominance — Chunk-Level Incremental Indexing
540+ | Feature | Status |
541+ | ---------| --------|
542+ | ** JSONL streaming output** — ` --jsonl ` flag on ` search ` and ` grep ` , one JSON object per line | ✅ |
543+ | ** Scored output** — ` --scores ` flag prepends ` [0.847] ` relevance scores to every result line | ✅ |
544+ | ** Snippet length control** — ` --snippet-length N ` to control context per match | ✅ |
545+ | ** No-snippet mode** — ` --no-snippet ` for metadata-only output (file, line, score) | ✅ |
546+ | ** Exclude/no-ignore** — ` --exclude ` glob filtering, ` --no-ignore ` to include gitignored files | ✅ |
547+ | ** grep JSONL + flags** — ` --jsonl ` , ` --exclude ` , ` --no-ignore ` , ` -L ` (files-without-match) on grep | ✅ |
548+
549+ ** 2596 tests, all passing** | Version 0.5.0
550+
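A consumer of the `--jsonl` output might look like the sketch below, which reads one JSON object per line and filters on relevance. The key names `file`, `line`, and `score` are assumptions taken from the metadata listed in the table, not a confirmed schema, and the sample records are invented for illustration.

```python
import io
import json
from typing import Iterator, TextIO


def iter_results(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed result per non-empty line of a JSONL stream."""
    for raw in stream:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        yield json.loads(line)


# Hypothetical sample output; the real field names may differ.
sample = io.StringIO(
    '{"file": "src/app.py", "line": 12, "score": 0.847}\n'
    '{"file": "src/util.py", "line": 40, "score": 0.612}\n'
)

# Keep only results above a relevance threshold, as a pipeline stage might.
strong = [r for r in iter_results(sample) if r["score"] >= 0.7]
```

Because each line is parsed independently, a consumer can start processing results before the search has finished writing them.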
551+ ---
552+
553+ ### Phase 34: Search Dominance — Chunk-Level Incremental Indexing ✅
550554Eliminate full re-index overhead. Match and exceed ck's delta indexing with
551555chunk-level content-addressed caching.
552556
553- | Feature | Description |
554- | ---------| -------------|
555- | ** Chunk-level caching** | blake3 hash per chunk — only re-embed changed chunks (80-90% cache hit on typical edits) |
556- | ** Content-aware invalidation** | Doc comment and whitespace changes properly invalidate affected chunks |
557- | ** Model-consistency guard** | Detect embedding model switches and prevent silent vector corruption |
558- | ** Interruption safety** | Ctrl+C saves partial index; next run resumes from where it stopped |
559- | ** ` --add ` single file** | ` codexa index --add <file> ` index a single file without full scan |
560- | ** ` --inspect ` file** | ` codexa index --inspect <file> ` show chunk breakdown, token counts, cache status |
557+ | Feature | Status |
558+ | ---------| --------|
559+ | ** ` --add ` single file** — ` codexa index --add <file> ` index one file without full scan | ✅ |
560+ | ** ` --inspect ` file** — ` codexa index --inspect <file> ` show content_hash, chunk count, vectors as JSON | ✅ |
561+ | ** Model-consistency guard** — detect embedding model switches and prevent silent vector corruption | ✅ |
562+ | ** Interruption safety** — Ctrl+C signal handler saves partial index; next run resumes | ✅ |
561563
562- ### Phase 35: Search Dominance — Native Rust Search Engine v2
563- Push all hot-path search operations into ` codexa-core ` for sub-100ms queries
564- on million-LOC codebases. Beat ck's Tantivy/FastEmbed stack on raw speed.
564+ ** 2596 tests, all passing** | Version 0.5.0
565565
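The chunk-level content-addressed caching named in the phase description can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: `chunk_key`, `embed_incremental`, and `fake_embed` are hypothetical names, and `hashlib.blake2b` from the standard library stands in for whatever hash the real index uses.

```python
import hashlib
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def chunk_key(text: str) -> str:
    # Assumption: blake2b (stdlib) is a stand-in for the real content hash.
    return hashlib.blake2b(text.encode("utf-8"), digest_size=16).hexdigest()


def embed_incremental(
    chunks: Sequence[str],
    cache: Dict[str, Vector],
    embed: Callable[[str], Vector],
) -> List[Vector]:
    """Return one vector per chunk, calling `embed` only on cache misses."""
    vectors = []
    for text in chunks:
        key = chunk_key(text)
        if key not in cache:
            cache[key] = embed(text)  # only new or changed chunks reach here
        vectors.append(cache[key])
    return vectors


calls: List[str] = []


def fake_embed(text: str) -> Vector:
    # Hypothetical embedder that records which chunks were actually embedded.
    calls.append(text)
    return [float(len(text))]


cache: Dict[str, Vector] = {}
embed_incremental(["def a(): pass", "def b(): pass"], cache, fake_embed)
# Second run: one chunk is unchanged, one was edited.
embed_incremental(["def a(): pass", "def b(): return 1"], cache, fake_embed)
```

Keying the cache on chunk content rather than file path means an edit to one function leaves the vectors for every other chunk in the file reusable.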
566- | Feature | Description |
567- | ---------| -------------|
568- | ** Tantivy integration** | Replace Python BM25 with Tantivy full-text engine in Rust (same lib ck uses) |
569- | ** FastEmbed in Rust** | ONNX embedding inference fully in Rust — no Python overhead on the hot path |
570- | ** ANN index persistence** | HNSW index saved/loaded via mmap in <50ms (currently rebuilds) |
571- | ** Parallel query** | Rayon-parallel semantic + BM25 + regex queries fused in a single Rust call |
572- | ** Benchmarking parity** | ` codexa benchmark ` reports indexing speed (LOC/s), query latency (p50/p95/p99), cache hit rate |
573- | ** Pre-built wheels** | Publish manylinux, macOS (arm64+x86), Windows wheels to PyPI — ` pip install codexa ` just works |
574-
575- ### Phase 36: Search Dominance — MCP Server v2 & Agent Protocol
566+ ---
567+
568+ ### Phase 35: Search Dominance — Native Rust Search Engine v2 ✅
569+ Push all hot-path search operations into ` codexa-core ` with Tantivy full-text
570+ search engine.
571+
572+ | Feature | Status |
573+ | ---------| --------|
574+ | ** Tantivy integration** — ` TantivyIndex ` PyO3 class with add_chunks, search, remove_file, clear, num_docs | ✅ |
575+ | ** cfg-gated feature** — ` tantivy-backend ` Cargo feature flag for optional compilation | ✅ |
576+ | ** Python bridge** — ` use_tantivy() ` feature detection, ` TantivyIndex ` import with fallback | ✅ |
577+ | ** Schema** — file_path, content (TEXT), language, start_line, end_line, chunk_index fields | ✅ |
578+ | ** MmapDirectory** — persistent on-disk Tantivy index | ✅ |
579+
580+ ** 2596 tests, all passing** | Version 0.5.0
581+
582+ ---
583+
584+ ### Phase 36: Search Dominance — MCP Server v2 & Agent Protocol ✅
576585Make CodexA the best MCP server for every AI client — Claude, Cursor, Copilot,
577586Windsurf. Full pagination, cursors, and streaming.
578587
579- | Feature | Description |
580- | ---------| -------------|
581- | ** MCP pagination** | ` page_size ` , ` cursor ` , ` next_cursor ` on all search tools — handle 10K+ results gracefully |
582- | ** MCP streaming** | SSE token-by-token delivery for long search results |
583- | ** ` codexa --serve ` ** | Single-flag MCP server start (match ck's ` ck --serve ` simplicity) |
584- | ** Claude Desktop config** | ` claude mcp add codexa ` one-liner install with auto-config JSON |
585- | ** Tool permissions** | Per-tool read/write permission model for safe agent use |
586- | ** Health + status** | ` index_status ` , ` reindex ` , ` health_check ` MCP tools (match ck's tool set) |
587- | ** Cursor / Windsurf integration** | Documented setup guides and tested configs for Cursor, Windsurf, Continue.dev |
588-
589- ### Phase 37: Search Dominance — grep Parity & Single-Binary Distribution
588+ | Feature | Status |
589+ | ---------| --------|
590+ | ** MCP pagination** — ` page_size ` , ` cursor ` , ` next_cursor ` on semantic/keyword/hybrid search | ✅ |
591+ | ** ` codexa --serve ` ** — single-flag MCP server start shorthand | ✅ |
592+ | ** Claude Desktop config** — ` codexa mcp --claude-config ` prints auto-config JSON | ✅ |
593+ | ** ` claude_config.py ` ** — ` generate_claude_desktop_config() ` helper module | ✅ |
594+
595+ ** 2596 tests, all passing** | Version 0.5.0
596+
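The `page_size` / `cursor` / `next_cursor` contract listed above could be served along these lines. This is a minimal sketch that assumes the cursor is an opaque base64-encoded offset; the actual cursor encoding and response shape of the MCP tools are not stated in this document, and `paginate`, `encode_cursor`, and `decode_cursor` are hypothetical helpers.

```python
import base64
from typing import Any, Dict, List, Optional


def encode_cursor(offset: int) -> str:
    # Assumption: the cursor is just an offset, made opaque with base64.
    return base64.urlsafe_b64encode(str(offset).encode("ascii")).decode("ascii")


def decode_cursor(cursor: Optional[str]) -> int:
    if cursor is None:
        return 0  # no cursor means "start from the first result"
    return int(base64.urlsafe_b64decode(cursor.encode("ascii")).decode("ascii"))


def paginate(
    results: List[Any], page_size: int, cursor: Optional[str] = None
) -> Dict[str, Any]:
    """Return one page of results plus the cursor for the next page, if any."""
    start = decode_cursor(cursor)
    end = start + page_size
    next_cursor = encode_cursor(end) if end < len(results) else None
    return {"results": results[start:end], "next_cursor": next_cursor}


# A client loop: keep requesting pages until next_cursor is absent.
hits = [f"hit-{i}" for i in range(5)]
pages = []
cursor = None
while True:
    page = paginate(hits, page_size=2, cursor=cursor)
    pages.append(page["results"])
    cursor = page["next_cursor"]
    if cursor is None:
        break
```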
597+ ---
598+
599+ ### Phase 37: Search Dominance — grep Parity & Single-Binary Distribution ✅
590600Make ` codexa ` a true drop-in replacement for grep/ripgrep with zero-config
591601install on every platform.
592602
593- | Feature | Description |
594- | ---------| -------------|
595- | ** grep flag parity** | ` -l ` (list files), ` -L ` (list files without match), ` -R ` (recursive default), ` --exclude ` glob patterns, ` --no-ignore ` |
596- | ** Hybrid search flag** | ` codexa --hybrid "query" ` — combined semantic+keyword in one flag (match ck --hybrid) |
597- | ** Semantic search flag** | ` codexa --sem "query" ` — shorthand for semantic search (match ck --sem) |
598- | ** ` .codexaignore ` auto-create** | Auto-generate ` .codexaignore ` on first index with sensible defaults (images, binaries, config files) |
599- | ** Single binary** | PyInstaller/Nuitka-compiled standalone binary — no Python required |
600- | ** Homebrew tap** | ` brew install m9nx/tap/codexa ` with auto-updating formula |
601- | ** Cargo install** | ` cargo install codexa ` for the Rust engine with embedded Python runtime (stretch goal) |
602- | ** Scoop / Chocolatey** | Windows package manager support |
603+ | Feature | Status |
604+ | ---------| --------|
605+ | ** ` --hybrid ` / ` --sem ` shorthands** — quick mode flags matching ck's UX | ✅ |
606+ | ** ` .codexaignore ` auto-create** — generated on first index with sensible defaults | ✅ |
607+ | ** PyInstaller spec** — ` codexa.spec ` for single-binary distribution | ✅ |
608+
609+ ** 2596 tests, all passing** | Version 0.5.0
610+
611+ ---
603612
604613### Phase 38: Incremental Embedding Models & Model Hub
605- Hot-swap embedding models without full re-index. Built-in model benchmarking
606- and recommendations.
614+ Hot-swap embedding models without full re-index. Built-in model benchmarking,
615+ HuggingFace tokenizer precision, and multi-model index support.
607616
608617| Feature | Description |
609618| ---------| -------------|
610619| ** Lazy re-embedding** | Store raw chunks alongside vectors; re-embed only on query if model changed |
611- | ** ` codexa models benchmark ` ** | Benchmark all installed models on your actual codebase (speed, quality, memory) |
612620| ** ` --switch-model ` ** | ` codexa index --switch-model jina-code ` with smart cache invalidation |
613- | ** Model download** | ` codexa models download bge-small ` with progress bar and verification |
614- | ** HuggingFace tokenizers** | Token-exact chunk boundaries (match ck's tokenizer precision) |
621+ | ** HuggingFace tokenizers** | Rust ` tokenizers ` crate for exact model-specific token counting (match ck's precision) |
622+ | ** Model download** | ` codexa models download bge-small ` with progress bar and integrity verification |
623+ | ** Multi-model index** | Keep separate vector indices per model; switch search model at query time |
624+ | ** ` codexa models benchmark ` ** | Benchmark all installed models on your actual codebase (speed, recall, memory) |
625+
626+ ### Phase 39: Pre-built Wheels & Platform Distribution
627+ Ship native Rust extensions in pre-built wheels so ` pip install codexa ` just
628+ works on every platform with zero compilation.
629+
630+ | Feature | Description |
631+ | ---------| -------------|
632+ | ** manylinux wheels** | CI-built wheels for Linux x86_64 and aarch64 |
633+ | ** macOS wheels** | Universal2 (arm64 + x86_64) wheels |
634+ | ** Windows wheels** | x86_64 MSVC wheels |
635+ | ** Scoop / Chocolatey** | Windows package manager support |
636+ | ** GitHub Releases** | Standalone binaries via PyInstaller for each platform |
637+ | ** Docker image** | Production multi-stage image with pre-loaded models |
615638
616- ### Phase 39: Remote / Cloud Mode & Team Features
639+ ### Phase 40: Remote / Cloud Mode & Team Features
617640Package CodexA as a shared server for teams. Authentication, dashboards, and
618641collaborative search.
619642
620643| Feature | Description |
621644| ---------| -------------|
622- | ** Docker image** | Production multi-stage image with pre-loaded models |
623645| ** Team REST API** | Shared index server with API key authentication |
624646| ** Rate limiting** | Per-user RPM/TPM limits on the shared server |
625647| ** Team dashboard** | Web UI showing search analytics, popular queries, index health |
626648| ** GitHub / GitLab CI plugin** | ` codexa quality ` on PRs, block merges on regressions, inline review comments |
649+ | ** PR diff-aware indexing** | Only re-index changed files in CI |
627650
628- ### Phase 40: Multi-Agent Orchestration & IDE v2
651+ ### Phase 41: Multi-Agent Orchestration & IDE v2
629652Multiple AI agents sharing one CodexA instance. Multi-IDE support beyond
630653VS Code.
631654
@@ -638,6 +661,16 @@ VS Code.
638661| ** Semantic Diff** | AST-level diff — detect renamed symbols, moved functions, signature changes vs cosmetic edits |
639662| ** Code Generation** | RAG context + LLM → code scaffolds, tests, docs grounded in actual codebase |
640663
664+ ### Phase 42: Cross-Language Intelligence
665+ Unified code intelligence across language boundaries.
666+
667+ | Feature | Description |
668+ | ---------| -------------|
669+ | ** Cross-language symbol resolution** | Python calling Rust via FFI, JS calling WASM, etc. |
670+ | ** Polyglot dependency graphs** | Link imports across languages in a single workspace |
671+ | ** Language-aware search boosting** | Prefer results in the query's context language |
672+ | ** Universal call graph** | Multi-language call graph spanning the entire workspace |
673+
641674---
642675
643676### Phase 30: Competitive Feature Parity & Distribution ✅