Commit 6529a56

badmonster0 and claude authored
docs(rust): user-facing README for the Rust ccc build (#197)
* docs(rust): document how the port uses the CocoIndex Rust SDK

  Add a "How it uses the CocoIndex Rust SDK" section to rust/PORTING.md — a code-grounded walkthrough of the SDK API the port exercises (Environment/App/run, ContextKey DI + change detection, #[cocoindex::function] memoization, walk_dir + mount_each!, the sqlite/vec0 table target + declare_row, and the sqlite-vec from_pool gotcha). Snippets cite live file:line anchors so the doc stays verifiable against the source.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(rust): make it a standalone Rust README (drop Python framing)

  Rename rust/PORTING.md -> rust/README.md and rewrite as a Rust-only usage doc: drop the Python->Rust module map, the parity audit, the Python backward-compat section, and the "vs Python" deltas. Keep build/run, architecture, the "How it uses the CocoIndex Rust SDK" walkthrough, CLI commands, configuration, testing, and a plain limitations/follow-ups list. Update the e2e fixture that copies the doc as a sample markdown file.

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(rust): rewrite README as a user guide (install / CLI / MCP / config)

  Drop the SDK-internals walkthrough. The Rust README now mirrors the main cocoindex-code README's user-facing structure — Install (build from source), Quick start, Coding Agent Integration (Skill + MCP), CLI Reference, Search options, MCP tool reference, Configuration (user/project settings), Supported languages, and a short "Differences from the Python build" note (local-only embeddings; no custom Python chunkers).

  Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
1 parent 1649b5e commit 6529a56

3 files changed

Lines changed: 218 additions & 121 deletions

rust/PORTING.md

Lines changed: 0 additions & 120 deletions
This file was deleted.

rust/README.md

Lines changed: 217 additions & 0 deletions
@@ -0,0 +1,217 @@
# cocoindex-code (Rust) — AST-based semantic code search

A lightweight, effective **(AST-based)** semantic code search tool for your
codebase — the native-Rust build of [`ccc`](https://github.com/cocoindex-io/cocoindex-code).
Built on [CocoIndex](https://github.com/cocoindex-io/cocoindex), the Rust data
transformation engine. Use it from the CLI, or wire it into Claude Code, Codex,
Cursor — any coding agent — via [Skill](#coding-agent-integration) or
[MCP](#mcp-server).

- Instant token savings — let the agent find code by meaning, not grep.
- **Local embeddings, zero setup** — runs fully offline, no API key required.
- **Incremental** — only re-indexes changed files.

## Features

- **Semantic code search** — find relevant code with natural-language queries
  when grep falls short.
- **Ultra performant** — a single static binary on top of the Rust
  [CocoIndex](https://github.com/cocoindex-io/cocoindex) engine; only changed
  files are re-indexed.
- **Multi-language** — Python, JavaScript/TypeScript, Rust, Go, Java, C/C++, C#,
  SQL, Shell, and more (tree-sitter).
- **Embedded** — a sqlite-vec index file; no database to run.
- **Local embeddings** — sentence-transformers via [fastembed](https://github.com/Anush008/fastembed-rs)
  (ONNX), no API key, no Python.

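The "only changed files are re-indexed" behaviour listed above can be pictured with content fingerprints. The sketch below is illustrative only and is not taken from the `ccc` or CocoIndex source: it assumes a SHA-256 digest per file is kept from the previous run and compared on the next one. The names `fingerprint` and `plan_reindex` are hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a stable digest of a file's bytes."""
    return hashlib.sha256(content).hexdigest()


def plan_reindex(previous: dict[str, str], current_files: dict[str, bytes]):
    """Compare stored fingerprints with the files seen on this run.

    `previous` maps path -> digest from the last run (hypothetical storage).
    Returns (paths to re-embed, paths to drop from the index).
    """
    changed = sorted(
        path
        for path, content in current_files.items()
        if previous.get(path) != fingerprint(content)
    )
    removed = sorted(path for path in previous if path not in current_files)
    return changed, removed


# a.py is untouched, b.py was edited, c.py is new.
old = {"a.py": fingerprint(b"print(1)"), "b.py": fingerprint(b"print(2)")}
new = {"a.py": b"print(1)", "b.py": b"print(3)", "c.py": b"print(4)"}
to_index, to_delete = plan_reindex(old, new)
print(to_index, to_delete)
```

Under this assumption, unchanged files cost one hash each and never reach the embedding model, which is where most indexing time would otherwise go.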
## Install

The Rust build is compiled from source. It depends on the CocoIndex SDK as a
sibling checkout, so clone both repos side by side:

```bash
git clone https://github.com/cocoindex-io/cocoindex
git clone -b rust https://github.com/cocoindex-io/cocoindex-code

cd cocoindex-code/rust
cargo build --release

# put the binary on your PATH (or use `cargo install --path .`)
install -m 0755 target/release/ccc ~/.local/bin/ccc
ccc --help
```

Embeddings are **local-only** (fastembed/ONNX) — no cloud provider or API key is
required or supported in this build. The default model is
[`BAAI/bge-small-en-v1.5`](https://huggingface.co/BAAI/bge-small-en-v1.5); any
model in fastembed's registry can be selected (see [Configuration](#configuration)).

## Quick start

```bash
ccc init                           # initialize project (creates settings)
ccc index                          # build the index
ccc search "authentication logic"  # search!
```

The background daemon starts automatically on first use and keeps the embedding
model warm.

> **Tip:** `ccc index` auto-initializes if you haven't run `ccc init` yet, so you
> can skip straight to indexing.

## Coding Agent Integration

### Skill

Install the `ccc` skill so your coding agent automatically uses semantic search
when it helps:

```bash
npx skills add cocoindex-io/cocoindex-code
```

The skill teaches the agent to initialize, index, and search on its own, and to
keep the index fresh as you work. Ask it to search the codebase — e.g. *"find how
user sessions are managed"* — or invoke it directly with `/ccc`. Requires the
`ccc` binary on your `PATH` (see [Install](#install)).

### MCP Server

Alternatively, run `ccc` as an MCP server over stdio:

```bash
# Claude Code
claude mcp add cocoindex-code -- ccc mcp

# Codex
codex mcp add cocoindex-code -- ccc mcp
```

Once configured, the agent decides when semantic search is helpful — finding code
by description, exploring unfamiliar code, or locating implementations without
knowing exact names.

<details>
<summary>MCP Tool Reference</summary>

Running as an MCP server (`ccc mcp`) exposes one tool:

**`search`** — search the codebase by semantic similarity.

```
search(
    query: str,                          # natural-language query or code snippet
    limit: int = 5,                      # max results (1–100)
    offset: int = 0,                     # pagination offset
    refresh_index: bool = True,          # refresh the index before querying
    languages: list[str] | None = None,  # filter by language, e.g. ["python","rust"]
    paths: list[str] | None = None,      # filter by path glob, e.g. ["src/utils/*"]
)
```

Returns matching chunks with file path, language, code, line numbers, and a
similarity score.
</details>

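For readers wiring up their own MCP client, the sketch below builds the JSON-RPC `tools/call` request that would invoke the `search` tool described above. The envelope (`jsonrpc`, `method`, `params.name`, `params.arguments`) follows the MCP specification; the helper name `build_search_call` and the client-side checks (including rejecting negative offsets) are assumptions, not part of `ccc`. A real client also has to complete the MCP `initialize` handshake before sending this.

```python
import json


def build_search_call(query, limit=5, offset=0, refresh_index=True,
                      languages=None, paths=None, request_id=1):
    """Build a JSON-RPC `tools/call` request for the `search` tool."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if offset < 0:
        # Assumption: the README only says "pagination offset".
        raise ValueError("offset must be non-negative")
    arguments = {
        "query": query,
        "limit": limit,
        "offset": offset,
        "refresh_index": refresh_index,
    }
    # Optional filters are omitted when unset so the server defaults apply.
    if languages is not None:
        arguments["languages"] = list(languages)
    if paths is not None:
        arguments["paths"] = list(paths)
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "search", "arguments": arguments},
    }


request = build_search_call("where are user sessions stored",
                            languages=["rust"], paths=["src/*"])
print(json.dumps(request, indent=2))
```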
## CLI Reference

| Command | Description |
|---------|-------------|
| `ccc init` | Initialize a project — creates settings files, adds `.cocoindex_code/` to `.gitignore` |
| `ccc index` | Build or update the index (auto-inits if needed) |
| `ccc search <query>` | Semantic search across the codebase |
| `ccc status` | Show index stats (chunk count, file count, language breakdown) |
| `ccc mcp` | Run as an MCP server in stdio mode |
| `ccc doctor` | Run diagnostics — settings, daemon, model, file matching, index health (`-v` for detail) |
| `ccc reset` | Delete index databases. `--all` also removes settings. `-f` skips confirmation. |
| `ccc daemon status` | Show daemon version, uptime, and loaded projects |
| `ccc daemon restart` | Restart the background daemon |
| `ccc daemon stop` | Stop the daemon |

### Search options

```bash
ccc search database schema                        # basic search
ccc search --lang python --lang markdown schema   # filter by language
ccc search --path 'src/utils/*' query handler     # filter by path glob
ccc search --offset 10 --limit 5 database schema  # pagination
ccc search --refresh database schema              # update index first, then search
```

By default `ccc search` scopes results to your current working directory
(relative to the project root). Use `--path` to override.

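The default scoping and the `--offset`/`--limit` pair can be read as two small rules. The sketch below is one way to express them, not the tool's actual code: `default_path_filter` and `page` are hypothetical names, and the exact glob that `ccc` derives from the working directory is an assumption modelled on the `'src/utils/*'` example above.

```python
from pathlib import PurePosixPath


def default_path_filter(project_root: str, cwd: str):
    """Return a glob limiting results to cwd, or None at the project root.

    Raises ValueError if cwd is not inside project_root.
    """
    rel = PurePosixPath(cwd).relative_to(PurePosixPath(project_root))
    if rel == PurePosixPath("."):
        return None  # at the root there is nothing to narrow
    return f"{rel}/*"


def page(results: list, offset: int, limit: int) -> list:
    """Skip `offset` ranked results, then return at most `limit` of them."""
    return results[offset:offset + limit]


print(default_path_filter("/repo", "/repo/src/utils"))
print(page(list(range(30)), offset=10, limit=5))
```

Read this way, `--offset 10 --limit 5` returns the eleventh through fifteenth ranked results, and passing `--path` replaces the derived filter.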
## Configuration

Configuration lives in two YAML files, both created by `ccc init`.

### User settings (`~/.cocoindex_code/global_settings.yml`)

Shared across all projects — controls the embedding model.

```yaml
embedding:
  provider: sentence-transformers   # local fastembed (the only supported provider)
  model: BAAI/bge-small-en-v1.5     # any model in fastembed's registry

  # Optional asymmetric-retrieval knobs, applied separately to indexing vs query.
  # Accepted key: prompt_name (sentence-transformers).
  # indexing_params:
  #   prompt_name: passage
  # query_params:
  #   prompt_name: query
```

> Set `COCOINDEX_CODE_DIR` to place `global_settings.yml` somewhere other than
> `~/.cocoindex_code/`.

Models are resolved against fastembed's registry by name, then by suffix — so
`sentence-transformers/all-MiniLM-L6-v2` resolves. Cloud / LiteLLM providers are
not part of this build; a `provider: litellm` config loads but fails with a clear
message pointing at the local provider.

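The lookup rules in this section can be sketched as follows. This is an interpretation, not the build's implementation: "by suffix" is read here as comparing the final path segment of the model name, the registry entries are placeholders rather than fastembed's real list, and the error wording is invented. `global_settings_path` and `resolve_model` are hypothetical names.

```python
import os
from pathlib import Path


def global_settings_path(env=None) -> Path:
    """Locate global_settings.yml, honouring COCOINDEX_CODE_DIR when set."""
    env = os.environ if env is None else env
    base = env.get("COCOINDEX_CODE_DIR")
    directory = Path(base) if base else Path.home() / ".cocoindex_code"
    return directory / "global_settings.yml"


def resolve_model(provider: str, name: str, registry: list[str]) -> str:
    """Resolve a configured model: exact name first, then final segment."""
    if provider != "sentence-transformers":
        # The config parses, but only the local provider can be used.
        raise ValueError(
            f"provider {provider!r} is not available in this build; "
            "use provider: sentence-transformers"
        )
    if name in registry:
        return name
    suffix = name.rsplit("/", 1)[-1]
    for candidate in registry:
        if candidate.rsplit("/", 1)[-1] == suffix:
            return candidate
    raise ValueError(f"unknown embedding model: {name}")


# Placeholder registry entries, for illustration only.
REGISTRY = ["BAAI/bge-small-en-v1.5", "example-org/all-MiniLM-L6-v2"]

print(global_settings_path({"COCOINDEX_CODE_DIR": "/tmp/ccc"}))
print(resolve_model("sentence-transformers",
                    "sentence-transformers/all-MiniLM-L6-v2", REGISTRY))
```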
### Project settings (`<project>/.cocoindex_code/settings.yml`)

Per-project — controls which files are indexed.

```yaml
include_patterns:
  - "**/*.py"
  - "**/*.ts"
  - "**/*.rs"
  - "**/*.go"
  # ... sensible defaults for 28+ file types

exclude_patterns:
  - "**/.*"            # hidden directories
  - "**/node_modules"
  - "**/dist"
  # ...

language_overrides:
  - ext: inc           # treat .inc files as PHP
    lang: php
```

Include/exclude globs additionally honor nested `.gitignore` files.
`.cocoindex_code/` is added to `.gitignore` during `init`.

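To make the interaction between `include_patterns` and `exclude_patterns` concrete, here is a minimal matcher. It is a sketch under stated assumptions, not the matcher `ccc` uses: `**/` is taken to span zero or more directories, `*` to stay within one path segment, and an excluded directory to hide everything beneath it. `.gitignore` handling is left out, and `glob_to_regex` and `is_indexed` are hypothetical names.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob where `**/` spans directories and `*` stays in one segment."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def is_indexed(path: str, include: list[str], exclude: list[str]) -> bool:
    """Decide whether a project-relative POSIX path would be indexed."""
    parts = path.split("/")
    # An excluded directory hides its contents, so test every path prefix.
    prefixes = ["/".join(parts[: n + 1]) for n in range(len(parts))]
    for pattern in exclude:
        rx = glob_to_regex(pattern)
        if any(rx.fullmatch(prefix) for prefix in prefixes):
            return False
    return any(glob_to_regex(p).fullmatch(path) for p in include)


INCLUDE = ["**/*.py", "**/*.rs"]
EXCLUDE = ["**/.*", "**/node_modules", "**/dist"]

for candidate in ["src/main.rs", "web/node_modules/pkg/index.py", ".venv/site.py"]:
    print(candidate, is_indexed(candidate, INCLUDE, EXCLUDE))
```

Checking exclusions first mirrors the usual expectation that an exclude rule wins over an include rule for the same path.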
## Supported languages

Tree-sitter–based chunking for Python, JavaScript/TypeScript, Rust, Go, Java,
C/C++, C#, Ruby, PHP, Swift, Kotlin, Scala, SQL, Shell, Markdown, and more.
Unrecognized text files are indexed with a generic recursive splitter.

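A "generic recursive splitter" usually means trying coarse separators first and falling back to finer ones only for pieces that are still too large. The sketch below shows that general technique; it is not the splitter shipped in this build. The separator list and the character-based size limit are assumptions, and real splitters often count tokens and add overlap between chunks.

```python
def recursive_split(text, max_chars, separators=("\n\n", "\n", " ")):
    """Split text into pieces of at most max_chars, preferring coarse separators."""
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators:
        # No separator left: fall back to fixed-width slices.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= max_chars:
            current = candidate  # keep packing pieces into the open chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= max_chars:
            current = piece
        else:
            # The piece alone is too big: retry it with finer separators.
            chunks.extend(recursive_split(piece, max_chars, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample = "alpha beta\n\ngamma delta epsilon\nzeta"
pieces = recursive_split(sample, max_chars=12)
print(pieces)
```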
## Differences from the Python build

This native build targets feature parity with the Python `ccc` for day-to-day
use; two things differ today:

- **Embeddings are local-only** (fastembed). There is no LiteLLM / cloud-provider
  option, and the default model is `BAAI/bge-small-en-v1.5`.
- **Custom Python chunkers** (`chunkers:` in project settings) are not supported —
  the config still parses, but the built-in tree-sitter splitter is used.

Index databases are interchangeable: `ccc search` works against an index built by
the Python tool, and vice versa.

rust/tests/e2e_advanced.sh

Lines changed: 1 addition & 1 deletion
@@ -103,7 +103,7 @@ echo "### F. Real Rust codebase (the port's own src) — multi-language"
 P="$ROOT/realrust"; mkdir -p "$P/src"
 cp "$REPO"/rust/src/*.rs "$P/src/"
 cp "$REPO"/rust/Cargo.toml "$P/"
-cp "$REPO"/rust/PORTING.md "$P/"
+cp "$REPO"/rust/README.md "$P/"
 cd "$P"; $BIN init >/dev/null 2>&1; rr=$($BIN index 2>&1 | grep -E "rust:|toml:|markdown:")
 has "indexed rust files" "rust:" "$rr"
 has "indexed toml" "toml:" "$rr"
