Commit 70ada27

roadmap: rewrite Phases 33-40 — Search Dominance strategy to beat ck
Phases 33-37 focus on closing every search gap with BeaconBay/ck:
- Phase 33: JSONL streaming, scored output, snippet control, full-section
- Phase 34: Chunk-level incremental indexing, cache hits, interrupt safety
- Phase 35: Tantivy + FastEmbed in Rust, sub-100ms queries, pre-built wheels
- Phase 36: MCP v2 with pagination/cursors, Claude/Cursor/Windsurf configs
- Phase 37: grep flag parity, hybrid/sem shorthands, single binary distro

Phases 38-40 build on the search lead:
- Phase 38: Hot-swap embedding models, model benchmarking, HF tokenizers
- Phase 39: Cloud mode, team features, CI plugin
- Phase 40: Multi-agent orchestration, multi-IDE, semantic diff, codegen
1 parent 5820fc5 commit 70ada27

1 file changed

Lines changed: 104 additions & 32 deletions

ROADMAP.md

@@ -533,38 +533,110 @@ providing high-performance alternatives to the Python search and indexing stack.

 ---

-### Phase 33: Remote / Cloud Mode
-Package CodexA as a Docker container with a shared REST API so teams can share
-one index server. Add authentication, rate limiting, and team dashboards.
-
-### Phase 34: GitHub / GitLab CI Plugin
-GitHub Actions / GitLab CI plugin that runs `codexa quality` on PRs, blocks
-merges on regressions, and posts inline review comments.
-
-### Phase 35: Multi-Agent Orchestration
-Allow multiple AI agents to share a single CodexA instance with isolated
-sessions, concurrent tool invocations, and coordinated context windows.
-
-### Phase 36: Incremental Embedding Models
-Hot-swap embedding models without full re-index — store raw chunks alongside
-vectors and re-embed lazily on model change.
-
-### Phase 37: Code Generation Pipeline
-Use RAG context + LLM to generate code scaffolds, tests, and documentation
-from natural language descriptions grounded in the actual codebase.
-
-### Phase 38: Distributed Workspace Federation
-Federate multiple CodexA instances across machines/orgs — search across remote
-indexes without copying data, with result merging and access control.
-
-### Phase 39: IDE Extension v2 — Multi-IDE Support
-Extend the VS Code extension to JetBrains (IntelliJ plugin) and Neovim (Lua),
-sharing the same bridge server backend.
-
-### Phase 40: Semantic Diff & Code Review AI
-AST-level diff analysis — detect semantic changes (renamed symbols, moved
-functions, signature changes) vs. cosmetic changes (formatting, comments).
-Power AI code review with structural understanding.
+### Phase 33: Search Dominance — JSONL Streaming & Output Parity
+Close the output-format gap with ck. Make CodexA the best tool for both humans
+and AI agents to consume search results.
+
+| Feature | Description |
+|---------|-------------|
+| **JSONL streaming output** | `--jsonl` flag on `search`, `grep`, `tool run` — one JSON object per line, streaming-friendly for LLMs and pipelines |
+| **Scored output** | `--scores` flag — prepend `[0.847]` relevance scores to every result line with color highlighting |
+| **Snippet length control** | `--snippet-length N` to control how much context is shown per match |
+| **No-snippet mode** | `--no-snippet` for metadata-only output (file, line, score) |
+| **Full-section extraction** | `--full-section` returns the complete function/class containing the match (tree-sitter aware) |
+| **Clean stdout/stderr separation** | All progress/status to stderr, only results to stdout — reliable piping |
+
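Review note on the Phase 33 table: the `--jsonl` and stdout/stderr rows together describe an output contract that is easy to get subtly wrong. The sketch below is one possible shape for it, not CodexA's implementation. The record fields (`file`, `line`, `score`, `snippet`) and the helper names `to_jsonl` and `emit` are assumptions made for illustration.

```python
import json
import sys
from typing import Iterable, Iterator, TextIO


def to_jsonl(results: Iterable[dict]) -> Iterator[str]:
    """Yield one compact JSON document per result (no trailing newline)."""
    for result in results:
        yield json.dumps(result, ensure_ascii=False, separators=(",", ":"))


def emit(results: Iterable[dict],
         out: TextIO = sys.stdout,
         log: TextIO = sys.stderr) -> int:
    """Write results to `out` as JSONL and status messages to `log` only."""
    count = 0
    for line in to_jsonl(results):
        out.write(line + "\n")
        out.flush()  # flush per record so a downstream consumer can start early
        count += 1
    print(f"{count} result(s)", file=log)  # status never touches `out`
    return count


if __name__ == "__main__":
    # Hypothetical records; the field names are assumptions, not CodexA's schema.
    emit([
        {"file": "src/app.py", "line": 42, "score": 0.847, "snippet": "def main():"},
        {"file": "src/util.py", "line": 7, "score": 0.612, "snippet": None},
    ])
```

Because every record is a complete JSON document on its own line, a consumer can parse results as they arrive and does not need to wait for a closing bracket.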
+### Phase 34: Search Dominance — Chunk-Level Incremental Indexing
+Eliminate full re-index overhead. Match and exceed ck's delta indexing with
+chunk-level content-addressed caching.
+
+| Feature | Description |
+|---------|-------------|
+| **Chunk-level caching** | blake3 hash per chunk — only re-embed changed chunks (80-90% cache hit on typical edits) |
+| **Content-aware invalidation** | Doc comment and whitespace changes properly invalidate affected chunks |
+| **Model-consistency guard** | Detect embedding model switches and prevent silent vector corruption |
+| **Interruption safety** | Ctrl+C saves partial index; next run resumes from where it stopped |
+| **`--add` single file** | `codexa index --add <file>` index a single file without full scan |
+| **`--inspect` file** | `codexa index --inspect <file>` show chunk breakdown, token counts, cache status |
+
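Review note on the Phase 34 table: the caching, invalidation, and model-guard rows can all fall out of a single key design. The sketch below is a minimal illustration under stated assumptions, not the planned implementation. It uses `hashlib.blake2b` from the standard library as a stand-in for blake3 (which needs a third-party package), an in-memory dict as the cache, and hypothetical names `chunk_key` and `embed_incremental`.

```python
import hashlib
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def chunk_key(model: str, chunk: str) -> str:
    """Content address for a chunk.

    The model name is mixed into the hash so that switching models can never
    reuse vectors produced in a different embedding space.
    """
    h = hashlib.blake2b(digest_size=16)  # stand-in for blake3 (assumption)
    h.update(model.encode("utf-8"))
    h.update(b"\x00")  # separator so model and chunk bytes cannot run together
    h.update(chunk.encode("utf-8"))
    return h.hexdigest()


def embed_incremental(chunks: Sequence[str],
                      model: str,
                      embed: Callable[[str], Vector],
                      cache: Dict[str, Vector]) -> Tuple[List[Vector], int]:
    """Return one vector per chunk plus the number of cache hits."""
    vectors: List[Vector] = []
    hits = 0
    for chunk in chunks:
        key = chunk_key(model, chunk)
        if key in cache:
            hits += 1
        else:
            cache[key] = embed(chunk)  # only new or changed chunks reach the model
        vectors.append(cache[key])
    return vectors, hits
```

Hashing the raw chunk text means any byte-level change, including whitespace or doc-comment edits, produces a new key and therefore a fresh embedding, while untouched chunks are served from the cache.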
+### Phase 35: Search Dominance — Native Rust Search Engine v2
+Push all hot-path search operations into `codexa-core` for sub-100ms queries
+on million-LOC codebases. Beat ck's Tantivy/FastEmbed stack on raw speed.
+
+| Feature | Description |
+|---------|-------------|
+| **Tantivy integration** | Replace Python BM25 with Tantivy full-text engine in Rust (same lib ck uses) |
+| **FastEmbed in Rust** | ONNX embedding inference fully in Rust — no Python overhead on the hot path |
+| **ANN index persistence** | HNSW index saved/loaded via mmap in <50ms (currently rebuilds) |
+| **Parallel query** | Rayon-parallel semantic + BM25 + regex queries fused in a single Rust call |
+| **Benchmarking parity** | `codexa benchmark` reports indexing speed (LOC/s), query latency (p50/p95/p99), cache hit rate |
+| **Pre-built wheels** | Publish manylinux, macOS (arm64+x86), Windows wheels to PyPI — `pip install codexa` just works |
+
+### Phase 36: Search Dominance — MCP Server v2 & Agent Protocol
+Make CodexA the best MCP server for every AI client — Claude, Cursor, Copilot,
+Windsurf. Full pagination, cursors, and streaming.
+
+| Feature | Description |
+|---------|-------------|
+| **MCP pagination** | `page_size`, `cursor`, `next_cursor` on all search tools — handle 10K+ results gracefully |
+| **MCP streaming** | SSE token-by-token delivery for long search results |
+| **`codexa --serve`** | Single-flag MCP server start (match ck's `ck --serve` simplicity) |
+| **Claude Desktop config** | `claude mcp add codexa` one-liner install with auto-config JSON |
+| **Tool permissions** | Per-tool read/write permission model for safe agent use |
+| **Health + status** | `index_status`, `reindex`, `health_check` MCP tools (match ck's tool set) |
+| **Cursor / Windsurf integration** | Documented setup guides and tested configs for Cursor, Windsurf, Continue.dev |
+
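Review note on the Phase 36 table: the pagination row names `page_size`, `cursor`, and `next_cursor` but not how the cursor is formed. One common approach, sketched here as an assumption and not as the planned design, is an opaque cursor that encodes an offset into a stable result list. The helper names and the `{"o": offset}` payload are hypothetical.

```python
import base64
import json
from typing import Any, Dict, Optional, Sequence


def encode_cursor(offset: int) -> str:
    """Pack an offset into an opaque, URL-safe token."""
    payload = json.dumps({"o": offset}).encode("utf-8")
    return base64.urlsafe_b64encode(payload).decode("ascii")


def decode_cursor(cursor: Optional[str]) -> int:
    """Recover the offset; a missing cursor means 'start from the beginning'."""
    if cursor is None:
        return 0
    offset = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))["o"]
    if not isinstance(offset, int) or offset < 0:
        raise ValueError("invalid cursor")
    return offset


def paginate(results: Sequence[Any],
             page_size: int,
             cursor: Optional[str] = None) -> Dict[str, Any]:
    """Return one page of results and the cursor for the next page, if any."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    start = decode_cursor(cursor)
    end = start + page_size
    next_cursor = encode_cursor(end) if end < len(results) else None
    return {"results": list(results[start:end]), "next_cursor": next_cursor}
```

A client keeps calling with the returned `next_cursor` until it comes back as `None`. Offset-based cursors assume the ranking does not change between calls; if the index can be rebuilt mid-session, the cursor would also need to carry something like an index version.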
+### Phase 37: Search Dominance — grep Parity & Single-Binary Distribution
+Make `codexa` a true drop-in replacement for grep/ripgrep with zero-config
+install on every platform.
+
+| Feature | Description |
+|---------|-------------|
+| **grep flag parity** | `-l` (list files), `-L` (list files without match), `-R` (recursive default), `--exclude` glob patterns, `--no-ignore` |
+| **Hybrid search flag** | `codexa --hybrid "query"` — combined semantic+keyword in one flag (match ck --hybrid) |
+| **Semantic search flag** | `codexa --sem "query"` — shorthand for semantic search (match ck --sem) |
+| **`.codexaignore` auto-create** | Auto-generate `.codexaignore` on first index with sensible defaults (images, binaries, config files) |
+| **Single binary** | PyInstaller/Nuitka-compiled standalone binary — no Python required |
+| **Homebrew tap** | `brew install m9nx/tap/codexa` with auto-updating formula |
+| **Cargo install** | `cargo install codexa` for the Rust engine with embedded Python runtime (stretch goal) |
+| **Scoop / Chocolatey** | Windows package manager support |
+
+### Phase 38: Incremental Embedding Models & Model Hub
+Hot-swap embedding models without full re-index. Built-in model benchmarking
+and recommendations.
+
+| Feature | Description |
+|---------|-------------|
+| **Lazy re-embedding** | Store raw chunks alongside vectors; re-embed only on query if model changed |
+| **`codexa models benchmark`** | Benchmark all installed models on your actual codebase (speed, quality, memory) |
+| **`--switch-model`** | `codexa index --switch-model jina-code` with smart cache invalidation |
+| **Model download** | `codexa models download bge-small` with progress bar and verification |
+| **HuggingFace tokenizers** | Token-exact chunk boundaries (match ck's tokenizer precision) |
+
+### Phase 39: Remote / Cloud Mode & Team Features
+Package CodexA as a shared server for teams. Authentication, dashboards, and
+collaborative search.
+
+| Feature | Description |
+|---------|-------------|
+| **Docker image** | Production multi-stage image with pre-loaded models |
+| **Team REST API** | Shared index server with API key authentication |
+| **Rate limiting** | Per-user RPM/TPM limits on the shared server |
+| **Team dashboard** | Web UI showing search analytics, popular queries, index health |
+| **GitHub / GitLab CI plugin** | `codexa quality` on PRs, block merges on regressions, inline review comments |
+
+### Phase 40: Multi-Agent Orchestration & IDE v2
+Multiple AI agents sharing one CodexA instance. Multi-IDE support beyond
+VS Code.
+
+| Feature | Description |
+|---------|-------------|
+| **Concurrent sessions** | Isolated agent sessions with independent context windows |
+| **Coordinated context** | Agents share discovered context to avoid redundant searches |
+| **JetBrains plugin** | IntelliJ/PyCharm plugin sharing the same bridge server |
+| **Neovim integration** | Lua plugin with telescope.nvim integration |
+| **Semantic Diff** | AST-level diff — detect renamed symbols, moved functions, signature changes vs cosmetic edits |
+| **Code Generation** | RAG context + LLM → code scaffolds, tests, docs grounded in actual codebase |

 ---
