M9nx
diff --git a/README.md b/README.md
Lines changed: 6 additions & 4 deletions
diff --git a/codexa-core/Cargo.toml b/codexa-core/Cargo.toml
Lines changed: 2 additions & 0 deletions
diff --git a/codexa-core/src/lib.rs b/codexa-core/src/lib.rs
Lines changed: 6 additions & 0 deletions
diff --git a/codexa-core/src/tantivy_search.rs b/codexa-core/src/tantivy_search.rs
Lines changed: 195 additions & 0 deletions
diff --git a/codexa.spec b/codexa.spec
Lines changed: 55 additions & 0 deletions
diff --git a/docs/index.md b/docs/index.md
Lines changed: 7 additions & 5 deletions
@@ -5,6 +5,8 @@
 
 <p align="center">
   <a href="https://github.com/M9nx/CodexA/actions"><img src="https://github.com/M9nx/CodexA/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
+  <a href="https://pypi.org/project/codexa/"><img src="https://img.shields.io/pypi/v/codexa?color=blue&label=PyPI" alt="PyPI"></a>
+  <a href="https://pepy.tech/project/codexa"><img src="https://pepy.tech/badge/codexa" alt="Downloads"></a>
   <img src="https://img.shields.io/badge/python-3.11%2B-blue" alt="Python 3.11+">
   <img src="https://img.shields.io/badge/version-0.5.0-green" alt="Version">
   <img src="https://img.shields.io/badge/tests-2596-brightgreen" alt="Tests">
@@ -24,13 +26,13 @@ structured tool protocol that any AI agent can call over HTTP or CLI.
 
 | Area | What you get |
 |------|-------------|
-| **Code Indexing** | Scan repos, extract functions/classes, generate vector embeddings (sentence-transformers + FAISS), ONNX runtime option, parallel indexing, `--watch` live re-indexing, `.codexaignore` support |
-| **Rust Search Engine** | Native `codexa-core` Rust crate via PyO3 — HNSW approximate nearest-neighbour search, BM25 keyword index, tree-sitter AST chunker (10 languages), memory-mapped vector persistence, parallel file scanner, optional ONNX embedding inference |
-| **Multi-Mode Search** | Semantic, keyword (BM25), regex, hybrid (RRF), and raw filesystem grep (ripgrep backend) with full `-A/-B/-C/-w/-v/-c` flags |
+| **Code Indexing** | Scan repos, extract functions/classes, generate vector embeddings (sentence-transformers + FAISS), ONNX runtime option, parallel indexing, `--watch` live re-indexing, `.codexaignore` support, `--add`/`--inspect` per-file control, model-consistency guard, Ctrl+C partial-save |
+| **Rust Search Engine** | Native `codexa-core` Rust crate via PyO3 — HNSW approximate nearest-neighbour search, BM25 keyword index, tree-sitter AST chunker (10 languages), memory-mapped vector persistence, parallel file scanner, optional ONNX embedding inference, optional Tantivy full-text search |
+| **Multi-Mode Search** | Semantic, keyword (BM25), regex, hybrid (RRF), and raw filesystem grep (ripgrep backend) with full `-A/-B/-C/-w/-v/-c/-l/-L/--exclude/--no-ignore` flags, `--hybrid`/`--sem` shorthands, `--scores`, `--snippet-length`, `--no-snippet`, JSONL streaming |
 | **RAG Pipeline** | 4-stage Retrieval-Augmented Generation — Retrieve → Deduplicate → Re-rank → Assemble with token budget, cross-encoder re-ranking, source citations |
 | **Code Context** | Rich context windows — imports, dependencies, AST-based call graphs, surrounding code |
 | **Repository Analysis** | Language breakdown (`codexa languages`), module summaries, component detection |
-| **AI Agent Protocol** | 13 built-in tools exposed via HTTP bridge, MCP server (13 tools), MCP-over-SSE (`--mcp`), or CLI for any AI agent to invoke |
+| **AI Agent Protocol** | 13 built-in tools exposed via HTTP bridge, MCP server (13 tools with pagination/cursors), MCP-over-SSE (`--mcp`), `codexa --serve` shorthand, Claude Desktop auto-config (`--claude-config`), or CLI for any AI agent to invoke |
 | **Quality & Metrics** | Complexity analysis, maintainability scoring, quality gates for CI |
 | **Multi-Repo Workspaces** | Link multiple repos under one workspace for cross-repo search & refactoring |
 | **Interactive TUI** | Terminal REPL with mode switching for interactive exploration |
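
Review note: the table above lists hybrid search as "RRF" (Reciprocal Rank Fusion) without showing what the fusion step does. The sketch below is a minimal, standalone illustration of the general technique, not the code in the crate's `hybrid` module. The function name `rrf_fuse`, the file names, and the constant `k = 60.0` are assumptions chosen for the example.

```rust
use std::collections::HashMap;

/// Fuse several ranked lists of document ids with Reciprocal Rank Fusion.
/// Each document scores the sum over lists of 1 / (k + rank), with rank starting at 1.
/// NOTE: hypothetical helper for illustration, not part of codexa-core.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            let rank = i as f64 + 1.0;
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; ties are broken by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}

fn main() {
    // Hypothetical result lists from a semantic and a keyword search.
    let semantic = vec!["auth.py", "session.py", "db.py"];
    let keyword = vec!["auth.py", "session.py", "utils.py"];
    for (id, score) in rrf_fuse(&[semantic, keyword], 60.0) {
        println!("{id}: {score:.5}");
    }
}
```

Because RRF only looks at ranks, it can combine BM25 scores and vector similarities without normalising them onto a common scale.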
 
@@ -34,6 +34,7 @@ tree-sitter-cpp = "0.23.4"
 tree-sitter-ruby = "0.23.1"
 ort = { version = "2.0.0-rc.12", features = ["download-binaries"], optional = true }
 ndarray = { version = "0.17.2", optional = true }
+tantivy = { version = "0.22", optional = true }
 
 [profile.release]
 opt-level = 3
@@ -43,3 +44,4 @@ codegen-units = 1
 [features]
 default = []
 onnx = ["dep:ort", "dep:ndarray"]
+tantivy-backend = ["dep:tantivy"]
@@ -14,6 +14,8 @@ mod embed;
 mod hnsw;
 mod hybrid;
 mod scan;
+#[cfg(feature = "tantivy-backend")]
+mod tantivy_search;
 
 /// The top-level Python module exposed via PyO3.
 #[pymodule]
@@ -45,5 +47,9 @@ fn codexa_core(m: &Bound<'_, PyModule>) -> PyResult<()> {
     #[cfg(feature = "onnx")]
     m.add_class::<embed::OnnxEmbedder>()?;
 
+    // Tantivy full-text search (only when compiled with --features tantivy-backend)
+    #[cfg(feature = "tantivy-backend")]
+    m.add_class::<tantivy_search::TantivyIndex>()?;
+
     Ok(())
 }
@@ -0,0 +1,195 @@
+//! Tantivy full-text search backend — optional feature.
+//!
+//! Provides a PyO3-exposed `TantivyIndex` class wrapping a Tantivy
+//! index for BM25-ranked full-text search, targeting sub-100ms query latency.
+//! Documents are code chunks (file_path, content, language, line range).
+
+#[cfg(feature = "tantivy-backend")]
+use pyo3::prelude::*;
+
+#[cfg(feature = "tantivy-backend")]
+use tantivy::{
+    collector::TopDocs,
+    doc,
+    query::QueryParser,
+    // `Value` is the trait that provides `as_str()` on stored field values.
+    schema::{Field, Schema, Value, STORED, TEXT},
+    Index, IndexReader, IndexWriter, ReloadPolicy,
+};
+
+#[cfg(feature = "tantivy-backend")]
+use std::path::PathBuf;
+
+/// A Tantivy-backed full-text search index for code chunks.
+///
+/// Wraps Tantivy's inverted index for BM25-ranked full-text search.
+/// Created via `TantivyIndex(directory)`; the index persists on disk in that directory.
+#[cfg(feature = "tantivy-backend")]
+#[pyclass]
+pub struct TantivyIndex {
+    index: Index,
+    reader: IndexReader,
+    f_file_path: Field,
+    f_content: Field,
+    f_language: Field,
+    f_start_line: Field,
+    f_end_line: Field,
+    f_chunk_index: Field,
+    schema: Schema,
+    index_dir: PathBuf,
+}
+
+#[cfg(feature = "tantivy-backend")]
+#[pymethods]
+impl TantivyIndex {
+    /// Create or open a Tantivy index at the given directory.
+    #[new]
+    fn new(directory: String) -> PyResult<Self> {
+        let dir = PathBuf::from(&directory);
+        std::fs::create_dir_all(&dir).map_err(|e| {
+            pyo3::exceptions::PyIOError::new_err(format!("Cannot create index dir: {e}"))
+        })?;
+
+        let mut schema_builder = Schema::builder();
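+        // `file_path` must be indexed as a raw, untokenized term (STRING) so that
+        // `remove_file` can match it with `delete_term`; a STORED-only field is
+        // not searchable, so deletions against it would match nothing.
+        let f_file_path =
+            schema_builder.add_text_field("file_path", tantivy::schema::STRING | STORED);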
+        let f_content = schema_builder.add_text_field("content", TEXT | STORED);
+        let f_language = schema_builder.add_text_field("language", STORED);
+        let f_start_line = schema_builder.add_text_field("start_line", STORED);
+        let f_end_line = schema_builder.add_text_field("end_line", STORED);
+        let f_chunk_index = schema_builder.add_text_field("chunk_index", STORED);
+        let schema = schema_builder.build();
+
+        let mmap_dir =
+            tantivy::directory::MmapDirectory::open(&dir).map_err(|e| {
+                pyo3::exceptions::PyIOError::new_err(format!("Tantivy dir error: {e}"))
+            })?;
+
+        let index = Index::open_or_create(mmap_dir, schema.clone()).map_err(|e| {
+            pyo3::exceptions::PyRuntimeError::new_err(format!("Tantivy index error: {e}"))
+        })?;
+
+        let reader = index
+            .reader_builder()
+            .reload_policy(ReloadPolicy::OnCommitWithDelay)
+            .try_into()
+            .map_err(|e| {
+                pyo3::exceptions::PyRuntimeError::new_err(format!("Reader error: {e}"))
+            })?;
+
+        Ok(Self {
+            index,
+            reader,
+            f_file_path,
+            f_content,
+            f_language,
+            f_start_line,
+            f_end_line,
+            f_chunk_index,
+            schema,
+            index_dir: dir,
+        })
+    }
+
+    /// Add a batch of code chunks to the index.
+    ///
+    /// Each chunk is a tuple: (file_path, content, language, start_line, end_line, chunk_index)
+    fn add_chunks(&self, chunks: Vec<(String, String, String, usize, usize, usize)>) -> PyResult<u64> {
+        let mut writer: IndexWriter = self.index.writer(50_000_000).map_err(|e| {
+            pyo3::exceptions::PyRuntimeError::new_err(format!("Writer error: {e}"))
+        })?;
+
+        let mut count = 0u64;
+        for (fp, content, lang, sl, el, ci) in chunks {
+            writer.add_document(doc!(
+                self.f_file_path => fp,
+                self.f_content => content,
+                self.f_language => lang,
+                self.f_start_line => sl.to_string(),
+                self.f_end_line => el.to_string(),
+                self.f_chunk_index => ci.to_string(),
+            )).map_err(|e| {
+                pyo3::exceptions::PyRuntimeError::new_err(format!("Add doc error: {e}"))
+            })?;
+            count += 1;
+        }
+
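+        writer.commit().map_err(|e| {
+            pyo3::exceptions::PyRuntimeError::new_err(format!("Commit error: {e}"))
+        })?;
+        // Reload explicitly so `search`/`num_docs` see the commit right away
+        // instead of waiting for the OnCommitWithDelay background reload.
+        self.reader.reload().map_err(|e| {
+            pyo3::exceptions::PyRuntimeError::new_err(format!("Reader reload error: {e}"))
+        })?;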
+
+        Ok(count)
+    }
+
+    /// Search the index for a query string, returning up to `top_k` results.
+    ///
+    /// Returns a list of (file_path, content, language, start_line, end_line, chunk_index, score).
+    fn search(&self, query: &str, top_k: usize) -> PyResult<Vec<(String, String, String, usize, usize, usize, f32)>> {
+        let searcher = self.reader.searcher();
+        let query_parser = QueryParser::for_index(&self.index, vec![self.f_content]);
+        let parsed = query_parser.parse_query(query).map_err(|e| {
+            pyo3::exceptions::PyValueError::new_err(format!("Query parse error: {e}"))
+        })?;
+
+        let top_docs = searcher.search(&parsed, &TopDocs::with_limit(top_k)).map_err(|e| {
+            pyo3::exceptions::PyRuntimeError::new_err(format!("Search error: {e}"))
+        })?;
+
+        let mut results = Vec::with_capacity(top_docs.len());
+        for (score, doc_address) in top_docs {
+            let doc = searcher.doc::<tantivy::TantivyDocument>(doc_address).map_err(|e| {
+                pyo3::exceptions::PyRuntimeError::new_err(format!("Doc fetch error: {e}"))
+            })?;
+
+            let get_text = |field: Field| -> String {
+                doc.get_first(field)
+                    .and_then(|v| v.as_str())
+                    .unwrap_or("")
+                    .to_string()
+            };
+
+            let file_path = get_text(self.f_file_path);
+            let content = get_text(self.f_content);
+            let language = get_text(self.f_language);
+            let start_line: usize = get_text(self.f_start_line).parse().unwrap_or(0);
+            let end_line: usize = get_text(self.f_end_line).parse().unwrap_or(0);
+            let chunk_index: usize = get_text(self.f_chunk_index).parse().unwrap_or(0);
+
+            results.push((file_path, content, language, start_line, end_line, chunk_index, score));
+        }
+
+        Ok(results)
+    }
+
+    /// Remove all documents for a given file path.
+    fn remove_file(&self, file_path: &str) -> PyResult<u64> {
+        let mut writer: IndexWriter = self.index.writer(50_000_000).map_err(|e| {
+            pyo3::exceptions::PyRuntimeError::new_err(format!("Writer error: {e}"))
+        })?;
+
+        let term = tantivy::Term::from_field_text(self.f_file_path, file_path);
+        writer.delete_term(term);
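+        writer.commit().map_err(|e| {
+            pyo3::exceptions::PyRuntimeError::new_err(format!("Commit error: {e}"))
+        })?;
+        // Reload explicitly so the deletion is visible to `search`/`num_docs` right away.
+        self.reader.reload().map_err(|e| {
+            pyo3::exceptions::PyRuntimeError::new_err(format!("Reader reload error: {e}"))
+        })?;
+
+        // `delete_term` does not report how many documents it matched,
+        // so this always returns 0 rather than a real count.
+        Ok(0)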
+    }
+
+    /// Clear the entire index.
+    fn clear(&self) -> PyResult<()> {
+        let mut writer: IndexWriter = self.index.writer(50_000_000).map_err(|e| {
+            pyo3::exceptions::PyRuntimeError::new_err(format!("Writer error: {e}"))
+        })?;
+        writer.delete_all_documents().map_err(|e| {
+            pyo3::exceptions::PyRuntimeError::new_err(format!("Clear error: {e}"))
+        })?;
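+        writer.commit().map_err(|e| {
+            pyo3::exceptions::PyRuntimeError::new_err(format!("Commit error: {e}"))
+        })?;
+        // Reload explicitly so `num_docs` reports the emptied index right away.
+        self.reader.reload().map_err(|e| {
+            pyo3::exceptions::PyRuntimeError::new_err(format!("Reader reload error: {e}"))
+        })?;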
+        Ok(())
+    }
+
+    /// Return the number of documents in the index.
+    fn num_docs(&self) -> u64 {
+        self.reader.searcher().num_docs()
+    }
+}
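
Review note: the module docs above describe the results as BM25-ranked, and `search` returns the `f32` score as the last tuple element. For readers unfamiliar with what that number measures, here is a minimal, dependency-free sketch of the textbook BM25 formula for a single query term. It is an illustration under stated assumptions, not Tantivy's implementation: the function name `bm25_term_score`, the constants `K1 = 1.2` and `B = 0.75`, and the corpus statistics are all chosen for the example.

```rust
/// BM25 score contribution of one query term in one document.
/// `tf`: occurrences of the term in the document;
/// `doc_len` / `avg_doc_len`: document length and corpus average, in tokens;
/// `n_docs`: number of documents in the corpus;
/// `doc_freq`: number of documents containing the term.
/// NOTE: hypothetical helper for illustration, not part of codexa-core or Tantivy.
fn bm25_term_score(tf: f64, doc_len: f64, avg_doc_len: f64, n_docs: f64, doc_freq: f64) -> f64 {
    const K1: f64 = 1.2; // term-frequency saturation (assumed common default)
    const B: f64 = 0.75; // strength of length normalisation (assumed common default)
    // Rare terms get a larger inverse document frequency.
    let idf = (1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5)).ln();
    // Longer-than-average documents are penalised through this term.
    let norm = K1 * (1.0 - B + B * doc_len / avg_doc_len);
    idf * tf * (K1 + 1.0) / (tf + norm)
}

fn main() {
    // Hypothetical corpus: 1000 chunks, the term occurs in 10 of them.
    let short = bm25_term_score(2.0, 50.0, 100.0, 1000.0, 10.0);
    let long = bm25_term_score(2.0, 400.0, 100.0, 1000.0, 10.0);
    println!("short chunk: {short:.3}, long chunk: {long:.3}");
}
```

With the same term frequency, the shorter chunk scores higher, which is why chunk size affects keyword ranking in an index like this one.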
@@ -0,0 +1,55 @@
+# -*- mode: python ; coding: utf-8 -*-
+"""PyInstaller spec for building a single-binary CodexA distribution."""
+
+import sys
+from pathlib import Path
+
+block_cipher = None
+
+a = Analysis(
+    ['semantic_code_intelligence/cli/main.py'],
+    pathex=['.'],
+    binaries=[],
+    datas=[
+        ('semantic_code_intelligence', 'semantic_code_intelligence'),
+    ],
+    hiddenimports=[
+        'click',
+        'rich',
+        'pydantic',
+        'semantic_code_intelligence.cli.router',
+        'semantic_code_intelligence.cli.main',
+    ],
+    hookspath=[],
+    hooksconfig={},
+    runtime_hooks=[],
+    excludes=[],
+    win_no_prefer_redirects=False,
+    win_private_assemblies=False,
+    cipher=block_cipher,
+    noarchive=False,
+)
+
+pyz = PYZ(a.pure, a.zipped_data, cipher=block_cipher)
+
+exe = EXE(
+    pyz,
+    a.scripts,
+    a.binaries,
+    a.zipfiles,
+    a.datas,
+    [],
+    name='codexa',
+    debug=False,
+    bootloader_ignore_signals=False,
+    strip=False,
+    upx=True,
+    upx_exclude=[],
+    runtime_tmpdir=None,
+    console=True,
+    disable_windowed_traceback=False,
+    argv_emulation=False,
+    target_arch=None,
+    codesign_identity=None,
+    entitlements_file=None,
+)
@@ -17,11 +17,11 @@ features:
   - icon:
       svg: '<svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><circle cx="11" cy="11" r="8"/><path d="m21 21-4.3-4.3"/></svg>'
     title: Semantic Search
-    details: Natural-language code search powered by sentence-transformers and FAISS. Find code by meaning, not just keywords — queries are embedded into vectors and matched against your entire codebase.
+    details: Natural-language code search powered by sentence-transformers, FAISS, and an optional Tantivy full-text engine. Search modes include semantic, BM25, regex, hybrid (RRF), and grep. Also supports JSONL streaming, --scores, --snippet-length, --no-snippet, --hybrid/--sem shorthands, and pagination cursors.
   - icon:
       svg: '<svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M12 8V4H8"/><rect width="16" height="12" x="4" y="8" rx="2"/><path d="M2 14h2"/><path d="M20 14h2"/><path d="M15 13v2"/><path d="M9 13v2"/></svg>'
     title: AI Agent Protocol
-    details: 13 structured tools invocable via CLI, HTTP bridge, MCP, or Python API. Designed for autonomous AI coding agents with structured JSON input/output.
+    details: 13 structured tools invocable via CLI, HTTP bridge, or MCP server with cursor-based pagination. codexa --serve shorthand, Claude Desktop auto-config (--claude-config), SSE streaming, and full Cursor/Windsurf compatibility.
   - icon:
       svg: '<svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><path d="M17 10.5V7c0-1.38-1.12-2.5-2.5-2.5S12 5.62 12 7v3.5"/><path d="M7 10.5V7c0-1.38 1.12-2.5 2.5-2.5"/><path d="m2 19 5-5"/><path d="m7 19 5-5"/><path d="m12 19 5-5"/><path d="m17 19 5-5"/></svg>'
     title: Multi-Language Parsing
@@ -41,11 +41,11 @@ features:
   - icon:
       svg: '<svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><polyline points="4 17 10 11 4 5"/><line x1="12" x2="20" y1="19" y2="19"/></svg>'
     title: 39 CLI Commands
-    details: Comprehensive Click-based CLI with --json, --pipe, and --verbose flags. Every command returns structured output suitable for scripting and automation. Includes grep, benchmark, languages, and raw filesystem search.
+    details: Comprehensive Click-based CLI with --json, --pipe, --jsonl, and --verbose flags. Every command returns structured output suitable for scripting and automation. Includes grep with --exclude/--no-ignore/-L, benchmark, languages, and raw filesystem search. Single-binary distribution via PyInstaller.
   - icon:
       svg: '<svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round"><rect width="18" height="18" x="3" y="3" rx="2"/><path d="M3 9h18"/><path d="M9 21V9"/></svg>'
     title: Multiple Interfaces
-    details: CLI, Web UI, REST API, Bridge Server, MCP Server, LSP Server, and interactive TUI — all built on the same tool protocol for consistent behavior everywhere.
+    details: CLI, Web UI, REST API, Bridge Server, MCP Server (with cursor-based pagination), LSP Server, and interactive TUI — all built on the same tool protocol for consistent behavior everywhere. Incremental indexing with --add/--inspect and model-consistency guards.
 ---
 
 ## Quick Start
@@ -64,6 +64,8 @@ codexa doctor
 
 # Search your code
 codexa search "authentication middleware"
+codexa search "auth flow" --hybrid --scores
+codexa grep "TODO|FIXME" --jsonl -L
 
 # AI-powered analysis
 codexa ask "How does the auth flow work?"
@@ -75,7 +77,7 @@ codexa hotspots
 
 | Metric | Value |
 |--------|-------|
-| **Version** | 0.4.3 |
+| **Version** | 0.5.0 |
 | **CLI Commands** | 39 |
 | **AI Agent Tools** | 13 (+ plugin-registered) |
 | **Plugin Hooks** | 22 |