@@ -533,38 +533,110 @@ providing high-performance alternatives to the Python search and indexing stack.
533 533
534 534 ---
535 535
536- ### Phase 33: Remote / Cloud Mode
537- Package CodexA as a Docker container with a shared REST API so teams can share
538- one index server. Add authentication, rate limiting, and team dashboards.
539-
540- ### Phase 34: GitHub / GitLab CI Plugin
541- GitHub Actions / GitLab CI plugin that runs `codexa quality` on PRs, blocks
542- merges on regressions, and posts inline review comments.
543-
544- ### Phase 35: Multi-Agent Orchestration
545- Allow multiple AI agents to share a single CodexA instance with isolated
546- sessions, concurrent tool invocations, and coordinated context windows.
547-
548- ### Phase 36: Incremental Embedding Models
549- Hot-swap embedding models without full re-index — store raw chunks alongside
550- vectors and re-embed lazily on model change.
551-
552- ### Phase 37: Code Generation Pipeline
553- Use RAG context + LLM to generate code scaffolds, tests, and documentation
554- from natural language descriptions grounded in the actual codebase.
555-
556- ### Phase 38: Distributed Workspace Federation
557- Federate multiple CodexA instances across machines/orgs — search across remote
558- indexes without copying data, with result merging and access control.
559-
560- ### Phase 39: IDE Extension v2 — Multi-IDE Support
561- Extend the VS Code extension to JetBrains (IntelliJ plugin) and Neovim (Lua),
562- sharing the same bridge server backend.
563-
564- ### Phase 40: Semantic Diff & Code Review AI
565- AST-level diff analysis — detect semantic changes (renamed symbols, moved
566- functions, signature changes) vs. cosmetic changes (formatting, comments).
567- Power AI code review with structural understanding.
536+ ### Phase 33: Search Dominance — JSONL Streaming & Output Parity
537+ Close the output-format gap with ck. Make CodexA the best tool for both humans
538+ and AI agents to consume search results.
539+
540+ | Feature | Description |
541+ |---------|-------------|
542+ | **JSONL streaming output** | `--jsonl` flag on `search`, `grep`, `tool run`: one JSON object per line, streaming-friendly for LLMs and pipelines |
543+ | **Scored output** | `--scores` flag: prepend `[0.847]` relevance scores to every result line with color highlighting |
544+ | **Snippet length control** | `--snippet-length N` to control how much context is shown per match |
545+ | **No-snippet mode** | `--no-snippet` for metadata-only output (file, line, score) |
546+ | **Full-section extraction** | `--full-section` returns the complete function/class containing the match (tree-sitter aware) |
547+ | **Clean stdout/stderr separation** | All progress/status to stderr, only results to stdout, for reliable piping |
548+
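The JSONL and stdout/stderr rows above could be sketched roughly as follows. This is only an illustration of the idea, not CodexA's implementation: the `SearchHit` record, its field names, and the `emit` helper are all hypothetical.

```python
import json
import sys
from dataclasses import asdict, dataclass
from typing import Iterable, Iterator, Optional, TextIO


@dataclass
class SearchHit:
    """Hypothetical result record; the field names are assumptions."""
    file: str
    line: int
    score: float
    snippet: Optional[str] = None


def to_jsonl(hits: Iterable[SearchHit], include_snippet: bool = True) -> Iterator[str]:
    """Yield one compact JSON document per hit without buffering the whole result set."""
    for hit in hits:
        record = asdict(hit)
        if not include_snippet:
            # Metadata-only output, in the spirit of a no-snippet mode.
            record.pop("snippet")
        yield json.dumps(record, ensure_ascii=False)


def emit(
    hits: Iterable[SearchHit],
    out: TextIO = sys.stdout,
    log: TextIO = sys.stderr,
    include_snippet: bool = True,
) -> int:
    """Write results to `out` and status to `log`, so pipes only ever see results."""
    count = 0
    for line in to_jsonl(hits, include_snippet):
        out.write(line + "\n")
        out.flush()  # flush per line so a downstream consumer can start immediately
        count += 1
    log.write(f"{count} result(s)\n")
    return count
```

Because each line is a complete JSON document, a consumer can parse results as they arrive instead of waiting for a closing bracket.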
549+ ### Phase 34: Search Dominance — Chunk-Level Incremental Indexing
550+ Eliminate full re-index overhead. Match and exceed ck's delta indexing with
551+ chunk-level content-addressed caching.
552+
553+ | Feature | Description |
554+ |---------|-------------|
555+ | **Chunk-level caching** | blake3 hash per chunk; only changed chunks are re-embedded (80-90% cache hit rate on typical edits) |
556+ | **Content-aware invalidation** | Doc comment and whitespace changes properly invalidate affected chunks |
557+ | **Model-consistency guard** | Detect embedding model switches and prevent silent vector corruption |
558+ | **Interruption safety** | Ctrl+C saves partial index; next run resumes from where it stopped |
559+ | **`--add` single file** | `codexa index --add <file>` indexes a single file without a full scan |
560+ | **`--inspect` file** | `codexa index --inspect <file>` shows chunk breakdown, token counts, and cache status |
561+
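A minimal sketch of content-addressed chunk caching, under stated assumptions: the roadmap names blake3, but the standard-library `hashlib.blake2b` stands in for it here, and `chunk_key`, `embed_incremental`, and the `embed_batch` callback are hypothetical names. Mixing the model identifier into the key is one possible way to keep vectors from different models apart.

```python
import hashlib
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def chunk_key(text: str, model_id: str) -> str:
    """Content address for a chunk. blake2b is a stand-in for blake3 (an assumption)."""
    h = hashlib.blake2b(digest_size=16)
    h.update(model_id.encode("utf-8"))
    h.update(b"\x00")  # separator so model id and text cannot run together
    h.update(text.encode("utf-8"))
    return h.hexdigest()


def embed_incremental(
    chunks: Sequence[str],
    model_id: str,
    cache: Dict[str, Vector],
    embed_batch: Callable[[List[str]], List[Vector]],
) -> Tuple[List[Vector], int]:
    """Return one vector per chunk, embedding only chunks whose key is not cached.

    The second return value is the number of chunks that had to be embedded.
    """
    keys = [chunk_key(c, model_id) for c in chunks]
    missing: Dict[str, str] = {}
    for key, text in zip(keys, chunks):
        if key not in cache and key not in missing:
            missing[key] = text
    if missing:
        vectors = embed_batch(list(missing.values()))
        for key, vec in zip(missing.keys(), vectors):
            cache[key] = vec
    return [cache[k] for k in keys], len(missing)
```

With this shape, editing one function in a file changes one chunk key, so only that chunk is passed to `embed_batch`, while switching `model_id` misses the cache for every chunk.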
562+ ### Phase 35: Search Dominance — Native Rust Search Engine v2
563+ Push all hot-path search operations into `codexa-core` for sub-100ms queries
564+ on million-LOC codebases. Beat ck's Tantivy/FastEmbed stack on raw speed.
565+
566+ | Feature | Description |
567+ |---------|-------------|
568+ | **Tantivy integration** | Replace Python BM25 with the Tantivy full-text engine in Rust (the same library ck uses) |
569+ | **FastEmbed in Rust** | ONNX embedding inference fully in Rust, with no Python overhead on the hot path |
570+ | **ANN index persistence** | HNSW index saved/loaded via mmap in <50ms (currently rebuilt) |
571+ | **Parallel query** | Rayon-parallel semantic + BM25 + regex queries fused in a single Rust call |
572+ | **Benchmarking parity** | `codexa benchmark` reports indexing speed (LOC/s), query latency (p50/p95/p99), cache hit rate |
573+ | **Pre-built wheels** | Publish manylinux, macOS (arm64+x86), Windows wheels to PyPI so that `pip install codexa` just works |
574+
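For the latency figures in the benchmarking row, one simple way to derive p50/p95/p99 from timed queries is the nearest-rank percentile. The sketch below is illustrative only: `percentile` and `measure_latency` are hypothetical helpers, and nothing here reflects how `codexa benchmark` is actually implemented.

```python
import time
from typing import Callable, Dict, Iterable, List, Sequence


def percentile(samples: Sequence[float], p: int) -> float:
    """Nearest-rank percentile for an integer p in 1..100."""
    if not samples:
        raise ValueError("no samples")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(samples)
    # Integer ceiling of p * n / 100, avoiding floating-point rounding surprises.
    rank = max(1, -(-p * len(ordered) // 100))
    return ordered[rank - 1]


def measure_latency(run_query: Callable[[str], object], queries: Iterable[str]) -> Dict[str, float]:
    """Time each query once and summarise wall-clock latency in milliseconds."""
    timings_ms: List[float] = []
    for query in queries:
        start = time.perf_counter()
        run_query(query)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return {f"p{p}": percentile(timings_ms, p) for p in (50, 95, 99)}
```

Tail percentiles only become meaningful with enough samples; with fewer than about a hundred queries, p99 simply reports the slowest one.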
575+ ### Phase 36: Search Dominance — MCP Server v2 & Agent Protocol
576+ Make CodexA the best MCP server for every AI client — Claude, Cursor, Copilot,
577+ Windsurf. Full pagination, cursors, and streaming.
578+
579+ | Feature | Description |
580+ |---------|-------------|
581+ | **MCP pagination** | `page_size`, `cursor`, `next_cursor` on all search tools to handle 10K+ results gracefully |
582+ | **MCP streaming** | SSE token-by-token delivery for long search results |
583+ | **`codexa --serve`** | Single-flag MCP server start (match ck's `ck --serve` simplicity) |
584+ | **Claude Desktop config** | `claude mcp add codexa` one-liner install with auto-config JSON |
585+ | **Tool permissions** | Per-tool read/write permission model for safe agent use |
586+ | **Health + status** | `index_status`, `reindex`, `health_check` MCP tools (match ck's tool set) |
587+ | **Cursor / Windsurf integration** | Documented setup guides and tested configs for Cursor, Windsurf, Continue.dev |
588+
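The `page_size` / `cursor` / `next_cursor` parameters in the pagination row could work along these lines. This is a sketch under assumptions: the cursor is treated as an opaque base64-encoded offset, and `encode_cursor`, `decode_cursor`, and `paginate` are hypothetical helpers rather than part of any MCP SDK.

```python
import base64
import json
from typing import Any, Dict, Optional, Sequence


def encode_cursor(offset: int) -> str:
    """Pack the next offset into an opaque, URL-safe token."""
    raw = json.dumps({"offset": offset}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: Optional[str]) -> int:
    """Return the offset stored in a cursor, or 0 when no cursor was supplied."""
    if cursor is None:
        return 0
    try:
        offset = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))["offset"]
    except (ValueError, KeyError, TypeError) as exc:
        raise ValueError("invalid cursor") from exc
    if not isinstance(offset, int) or offset < 0:
        raise ValueError("invalid cursor")
    return offset


def paginate(results: Sequence[Any], page_size: int, cursor: Optional[str] = None) -> Dict[str, Any]:
    """Return one page of results plus the cursor for the next page, if any."""
    if page_size < 1:
        raise ValueError("page_size must be positive")
    start = decode_cursor(cursor)
    end = start + page_size
    next_cursor = encode_cursor(end) if end < len(results) else None
    return {"results": list(results[start:end]), "next_cursor": next_cursor}
```

A client keeps calling with the returned `next_cursor` until it comes back as `None`. Keeping the cursor opaque leaves room to change its contents later, for example to a ranking checkpoint instead of an offset, without breaking clients.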
589+ ### Phase 37: Search Dominance — grep Parity & Single-Binary Distribution
590+ Make `codexa` a true drop-in replacement for grep/ripgrep with zero-config
591+ install on every platform.
592+
593+ | Feature | Description |
594+ |---------|-------------|
595+ | **grep flag parity** | `-l` (list files), `-L` (list files without match), `-R` (recursive default), `--exclude` glob patterns, `--no-ignore` |
596+ | **Hybrid search flag** | `codexa --hybrid "query"`: combined semantic+keyword in one flag (match `ck --hybrid`) |
597+ | **Semantic search flag** | `codexa --sem "query"`: shorthand for semantic search (match `ck --sem`) |
598+ | **`.codexaignore` auto-create** | Auto-generate `.codexaignore` on first index with sensible defaults (images, binaries, config files) |
599+ | **Single binary** | PyInstaller/Nuitka-compiled standalone binary; no Python required |
600+ | **Homebrew tap** | `brew install m9nx/tap/codexa` with auto-updating formula |
601+ | **Cargo install** | `cargo install codexa` for the Rust engine with embedded Python runtime (stretch goal) |
602+ | **Scoop / Chocolatey** | Windows package manager support |
603+
604+ ### Phase 38: Incremental Embedding Models & Model Hub
605+ Hot-swap embedding models without full re-index. Built-in model benchmarking
606+ and recommendations.
607+
608+ | Feature | Description |
609+ |---------|-------------|
610+ | **Lazy re-embedding** | Store raw chunks alongside vectors; re-embed lazily at query time if the model has changed |
611+ | **`codexa models benchmark`** | Benchmark all installed models on your actual codebase (speed, quality, memory) |
612+ | **`--switch-model`** | `codexa index --switch-model jina-code` with smart cache invalidation |
613+ | **Model download** | `codexa models download bge-small` with progress bar and verification |
614+ | **HuggingFace tokenizers** | Token-exact chunk boundaries (match ck's tokenizer precision) |
615+
616+ ### Phase 39: Remote / Cloud Mode & Team Features
617+ Package CodexA as a shared server for teams. Authentication, dashboards, and
618+ collaborative search.
619+
620+ | Feature | Description |
621+ |---------|-------------|
622+ | **Docker image** | Production multi-stage image with pre-loaded models |
623+ | **Team REST API** | Shared index server with API key authentication |
624+ | **Rate limiting** | Per-user RPM/TPM limits on the shared server |
625+ | **Team dashboard** | Web UI showing search analytics, popular queries, index health |
626+ | **GitHub / GitLab CI plugin** | `codexa quality` on PRs, block merges on regressions, inline review comments |
627+
628+ ### Phase 40: Multi-Agent Orchestration & IDE v2
629+ Multiple AI agents sharing one CodexA instance. Multi-IDE support beyond
630+ VS Code.
631+
632+ | Feature | Description |
633+ |---------|-------------|
634+ | **Concurrent sessions** | Isolated agent sessions with independent context windows |
635+ | **Coordinated context** | Agents share discovered context to avoid redundant searches |
636+ | **JetBrains plugin** | IntelliJ/PyCharm plugin sharing the same bridge server |
637+ | **Neovim integration** | Lua plugin with telescope.nvim integration |
638+ | **Semantic Diff** | AST-level diff: detect renamed symbols, moved functions, signature changes vs. cosmetic edits |
639+ | **Code Generation** | RAG context + LLM → code scaffolds, tests, docs grounded in actual codebase |
568 640
569 641 ---
570 642