| 1 | +# cocoindex-code (Rust) — AST-based semantic code search
| 2 | + |
| 3 | +A lightweight, effective **(AST-based)** semantic code search tool for your |
| 4 | +codebase — the native-Rust build of [`ccc`](https://github.com/cocoindex-io/cocoindex-code). |
| 5 | +Built on [CocoIndex](https://github.com/cocoindex-io/cocoindex), the Rust data |
| 6 | +transformation engine. Use it from the CLI, or wire it into Claude Code, Codex, |
| 7 | +Cursor — any coding agent — via [Skill](#coding-agent-integration) or |
| 8 | +[MCP](#mcp-server). |
| 9 | + |
| 10 | +- **Instant token savings** — let the agent find code by meaning, not grep.
| 11 | +- **Local embeddings, zero setup** — runs fully offline, no API key required. |
| 12 | +- **Incremental** — only re-indexes changed files. |
| 13 | + |
| 14 | +## Features |
| 15 | + |
| 16 | +- **Semantic code search** — find relevant code with natural-language queries |
| 17 | + when grep falls short. |
| 18 | +- **Fast** — a single static binary on top of the Rust
| 19 | + [CocoIndex](https://github.com/cocoindex-io/cocoindex) engine; only changed |
| 20 | + files are re-indexed. |
| 21 | +- **Multi-language** — Python, JavaScript/TypeScript, Rust, Go, Java, C/C++, C#, |
| 22 | + SQL, Shell, and more (tree-sitter). |
| 23 | +- **Embedded** — a sqlite-vec index file; no database to run. |
| 24 | +- **Local embeddings** — sentence-transformers via [fastembed](https://github.com/Anush008/fastembed-rs) |
| 25 | + (ONNX), no API key, no Python. |
| 26 | + |
| 27 | +## Install |
| 28 | + |
| 29 | +The Rust build is compiled from source. It depends on the CocoIndex SDK as a |
| 30 | +sibling checkout, so clone both repos side by side: |
| 31 | + |
| 32 | +```bash |
| 33 | +git clone https://github.com/cocoindex-io/cocoindex |
| 34 | +git clone -b rust https://github.com/cocoindex-io/cocoindex-code |
| 35 | + |
| 36 | +cd cocoindex-code/rust |
| 37 | +cargo build --release |
| 38 | + |
| 39 | +# put the binary on your PATH (or use `cargo install --path .`) |
| 40 | +install -m 0755 target/release/ccc ~/.local/bin/ccc |
| 41 | +ccc --help |
| 42 | +``` |
| 43 | + |
| 44 | +Embeddings are **local-only** (fastembed/ONNX) — no cloud provider or API key is |
| 45 | +required or supported in this build. The default model is |
| 46 | +[`BAAI/bge-small-en-v1.5`](https://huggingface.co/BAAI/bge-small-en-v1.5); any |
| 47 | +model in fastembed's registry can be selected (see [Configuration](#configuration)). |
| 48 | + |
| 49 | +## Quick start |
| 50 | + |
| 51 | +```bash |
| 52 | +ccc init # initialize project (creates settings) |
| 53 | +ccc index # build the index |
| 54 | +ccc search "authentication logic" # search! |
| 55 | +``` |
| 56 | + |
| 57 | +The background daemon starts automatically on first use and keeps the embedding |
| 58 | +model warm. |
| 59 | + |
| 60 | +> **Tip:** `ccc index` auto-initializes if you haven't run `ccc init` yet, so you |
| 61 | +> can skip straight to indexing. |
| 62 | +
|
| 63 | +## Coding Agent Integration |
| 64 | + |
| 65 | +### Skill |
| 66 | + |
| 67 | +Install the `ccc` skill so your coding agent automatically uses semantic search |
| 68 | +when it helps: |
| 69 | + |
| 70 | +```bash |
| 71 | +npx skills add cocoindex-io/cocoindex-code |
| 72 | +``` |
| 73 | + |
| 74 | +The skill teaches the agent to initialize, index, and search on its own, and to |
| 75 | +keep the index fresh as you work. Ask it to search the codebase — e.g. *"find how |
| 76 | +user sessions are managed"* — or invoke it directly with `/ccc`. Requires the |
| 77 | +`ccc` binary on your `PATH` (see [Install](#install)). |
| 78 | + |
| 79 | +### MCP Server |
| 80 | + |
| 81 | +Alternatively, run `ccc` as an MCP server over stdio: |
| 82 | + |
| 83 | +```bash |
| 84 | +# Claude Code |
| 85 | +claude mcp add cocoindex-code -- ccc mcp |
| 86 | + |
| 87 | +# Codex |
| 88 | +codex mcp add cocoindex-code -- ccc mcp |
| 89 | +``` |
| 90 | + |
| 91 | +Once configured, the agent decides when semantic search is helpful — finding code |
| 92 | +by description, exploring unfamiliar code, or locating implementations without |
| 93 | +knowing exact names. |
| 94 | + |
| 95 | +<details> |
| 96 | +<summary>MCP Tool Reference</summary> |
| 97 | + |
| 98 | +Running as an MCP server (`ccc mcp`) exposes one tool: |
| 99 | + |
| 100 | +**`search`** — search the codebase by semantic similarity. |
| 101 | + |
| 102 | +``` |
| 103 | +search( |
| 104 | + query: str, # natural-language query or code snippet |
| 105 | + limit: int = 5, # max results (1–100) |
| 106 | + offset: int = 0, # pagination offset |
| 107 | + refresh_index: bool = True, # refresh the index before querying |
| 108 | + languages: list[str] | None = None, # filter by language, e.g. ["python","rust"] |
| 109 | + paths: list[str] | None = None, # filter by path glob, e.g. ["src/utils/*"] |
| 110 | +) |
| 111 | +``` |
| 112 | + |
| 113 | +Returns matching chunks with file path, language, code, line numbers, and a |
| 114 | +similarity score. |
| 115 | +</details> |
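
As a rough illustration of how a client might assemble arguments for the `search` tool documented above, here is a minimal Python sketch. The helpers `build_search_args` and `page_args` are hypothetical and not part of `ccc`. Only the 1–100 bound on `limit` comes from the reference; rejecting empty queries and negative offsets, and refreshing only on the first page, are assumptions made for the example.

```python
from typing import Any, Dict, List, Optional


def build_search_args(
    query: str,
    limit: int = 5,
    offset: int = 0,
    refresh_index: bool = True,
    languages: Optional[List[str]] = None,
    paths: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Validate and assemble arguments for the `search` tool (hypothetical helper)."""
    if not query.strip():
        # Assumption: an empty query is never useful, so reject it client-side.
        raise ValueError("query must not be empty")
    if not 1 <= limit <= 100:
        # Documented bound: limit accepts 1 to 100.
        raise ValueError("limit must be between 1 and 100")
    if offset < 0:
        # Assumption: a pagination offset cannot be negative.
        raise ValueError("offset must be non-negative")

    args: Dict[str, Any] = {
        "query": query,
        "limit": limit,
        "offset": offset,
        "refresh_index": refresh_index,
    }
    # Optional filters are omitted entirely when unset.
    if languages is not None:
        args["languages"] = list(languages)
    if paths is not None:
        args["paths"] = list(paths)
    return args


def page_args(query: str, page: int, page_size: int = 5) -> Dict[str, Any]:
    """Arguments for zero-based page `page`; only the first page refreshes the index."""
    return build_search_args(
        query,
        limit=page_size,
        offset=page * page_size,
        refresh_index=(page == 0),
    )
```

Refreshing only on the first page is a design choice for the sketch: it keeps later pages from paying the refresh cost again while paging through one result set.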
| 116 | + |
| 117 | +## CLI Reference |
| 118 | + |
| 119 | +| Command | Description | |
| 120 | +|---------|-------------| |
| 121 | +| `ccc init` | Initialize a project — creates settings files, adds `.cocoindex_code/` to `.gitignore` | |
| 122 | +| `ccc index` | Build or update the index (auto-inits if needed) | |
| 123 | +| `ccc search <query>` | Semantic search across the codebase | |
| 124 | +| `ccc status` | Show index stats (chunk count, file count, language breakdown) | |
| 125 | +| `ccc mcp` | Run as an MCP server in stdio mode | |
| 126 | +| `ccc doctor` | Run diagnostics — settings, daemon, model, file matching, index health (`-v` for detail) | |
| 127 | +| `ccc reset` | Delete index databases. `--all` also removes settings. `-f` skips confirmation. | |
| 128 | +| `ccc daemon status` | Show daemon version, uptime, and loaded projects | |
| 129 | +| `ccc daemon restart` | Restart the background daemon | |
| 130 | +| `ccc daemon stop` | Stop the daemon | |
| 131 | + |
| 132 | +### Search options |
| 133 | + |
| 134 | +```bash |
| 135 | +ccc search database schema # basic search |
| 136 | +ccc search --lang python --lang markdown schema # filter by language |
| 137 | +ccc search --path 'src/utils/*' query handler # filter by path glob |
| 138 | +ccc search --offset 10 --limit 5 database schema # pagination |
| 139 | +ccc search --refresh database schema # update index first, then search |
| 140 | +``` |
| 141 | + |
| 142 | +By default `ccc search` scopes results to your current working directory |
| 143 | +(relative to the project root). Use `--path` to override. |
| 144 | + |
| 145 | +## Configuration |
| 146 | + |
| 147 | +Configuration lives in two YAML files, both created by `ccc init`. |
| 148 | + |
| 149 | +### User settings (`~/.cocoindex_code/global_settings.yml`) |
| 150 | + |
| 151 | +Shared across all projects — controls the embedding model. |
| 152 | + |
| 153 | +```yaml |
| 154 | +embedding: |
| 155 | + provider: sentence-transformers # local fastembed (the only supported provider) |
| 156 | + model: BAAI/bge-small-en-v1.5 # any model in fastembed's registry |
| 157 | + |
| 158 | + # Optional asymmetric-retrieval knobs, applied separately to indexing vs query. |
| 159 | + # Accepted key: prompt_name (sentence-transformers). |
| 160 | + # indexing_params: |
| 161 | + # prompt_name: passage |
| 162 | + # query_params: |
| 163 | + # prompt_name: query |
| 164 | +``` |
| 165 | + |
| 166 | +> Set `COCOINDEX_CODE_DIR` to place `global_settings.yml` somewhere other than |
| 167 | +> `~/.cocoindex_code/`. |
| 168 | +
|
| 169 | +Models are resolved against fastembed's registry by name, then by suffix — so |
| 170 | +`sentence-transformers/all-MiniLM-L6-v2` resolves. Cloud / LiteLLM providers are |
| 171 | +not part of this build; a `provider: litellm` config loads but fails with a clear |
| 172 | +message pointing at the local provider. |
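
One plausible reading of "by name, then by suffix" can be sketched in Python as below. This is an illustration, not the build's actual lookup: `resolve_model` is a hypothetical function, the `REGISTRY` entries are made up for the example (apart from the documented default model), and treating "suffix" as the final path segment after the last `/` is an assumption.

```python
from typing import Iterable, Optional

# Hypothetical registry contents, for illustration only.
REGISTRY = [
    "BAAI/bge-small-en-v1.5",
    "example-org/all-MiniLM-L6-v2",
]


def resolve_model(requested: str, registry: Iterable[str]) -> Optional[str]:
    """Resolve a model name: exact match first, then match on the last path segment."""
    names = list(registry)

    # Pass 1: exact name match.
    if requested in names:
        return requested

    # Pass 2 (assumed meaning of "suffix"): compare the segment after the last "/".
    suffix = requested.rsplit("/", 1)[-1]
    matches = [name for name in names if name.rsplit("/", 1)[-1] == suffix]

    # Only accept an unambiguous suffix match.
    return matches[0] if len(matches) == 1 else None
```

Under these assumptions, `resolve_model("sentence-transformers/all-MiniLM-L6-v2", REGISTRY)` falls through to the second pass and resolves to the registry entry with the same final segment.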
| 173 | + |
| 174 | +### Project settings (`<project>/.cocoindex_code/settings.yml`) |
| 175 | + |
| 176 | +Per-project — controls which files are indexed. |
| 177 | + |
| 178 | +```yaml |
| 179 | +include_patterns: |
| 180 | + - "**/*.py" |
| 181 | + - "**/*.ts" |
| 182 | + - "**/*.rs" |
| 183 | + - "**/*.go" |
| 184 | + # ... sensible defaults for 28+ file types |
| 185 | + |
| 186 | +exclude_patterns: |
| 187 | + - "**/.*" # hidden directories |
| 188 | + - "**/node_modules" |
| 189 | + - "**/dist" |
| 190 | + # ... |
| 191 | + |
| 192 | +language_overrides: |
| 193 | + - ext: inc # treat .inc files as PHP |
| 194 | + lang: php |
| 195 | +``` |
| 196 | +
|
| 197 | +In addition to the include/exclude globs, file matching honors `.gitignore` files, including nested ones.
| 198 | +`.cocoindex_code/` is added to `.gitignore` during `init`. |
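
To make the `language_overrides` entry above concrete, here is a small Python sketch of how an override could take precedence over built-in extension defaults. `detect_language` and `DEFAULT_LANGS` are hypothetical names, and the default table is a tiny stand-in for the real one; the actual lookup in `ccc` may differ.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Optional

# Hypothetical built-in defaults; the real table covers many more extensions.
DEFAULT_LANGS = {"py": "python", "rs": "rust", "go": "go", "ts": "typescript"}


def detect_language(path: str, overrides: List[Dict[str, str]]) -> Optional[str]:
    """Return the language for `path`, checking overrides before the defaults."""
    ext = PurePosixPath(path).suffix.lstrip(".")

    # Overrides win over built-in defaults; the first matching rule is used.
    for rule in overrides:
        if rule["ext"].lstrip(".") == ext:
            return rule["lang"]

    # None means "no known language"; the caller would fall back to a generic splitter.
    return DEFAULT_LANGS.get(ext)
```

With the override from the settings example, `detect_language("legacy/header.inc", [{"ext": "inc", "lang": "php"}])` yields `"php"`, while files with other extensions still go through the defaults.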
| 199 | + |
| 200 | +## Supported languages |
| 201 | + |
| 202 | +Tree-sitter–based chunking for Python, JavaScript/TypeScript, Rust, Go, Java, |
| 203 | +C/C++, C#, Ruby, PHP, Swift, Kotlin, Scala, SQL, Shell, Markdown, and more. |
| 204 | +Unrecognized text files are indexed with a generic recursive splitter. |
| 205 | + |
| 206 | +## Differences from the Python build |
| 207 | + |
| 208 | +This native build targets feature parity with the Python `ccc` for day-to-day |
| 209 | +use; two things differ today: |
| 210 | + |
| 211 | +- **Embeddings are local-only** (fastembed). There is no LiteLLM / cloud-provider |
| 212 | + option, and the default model is `BAAI/bge-small-en-v1.5`. |
| 213 | +- **Custom Python chunkers** (`chunkers:` in project settings) are not supported — |
| 214 | + the config still parses, but the built-in tree-sitter splitter is used. |
| 215 | + |
| 216 | +Index databases are interchangeable: `ccc search` works against an index built by |
| 217 | +the Python tool, and vice versa. |