Commit 651343e

docs: Update semble docs (#25)
1 parent 3b0342d commit 651343e

7 files changed

Lines changed: 344 additions & 189 deletions

astro.config.mjs

Lines changed: 10 additions & 10 deletions
Original file line number | Diff line number | Diff line change
@@ -73,6 +73,16 @@ gtag('config', 'G-LQWDNXKF2X');`,
7373
{ label: 'Integrations', link: '/packages/model2vec/integrations/' },
7474
],
7575
},
76+
{
77+
label: 'Semble',
78+
items: [
79+
{ label: 'Introduction', link: '/packages/semble/introduction/' },
80+
{ label: 'Installation', link: '/packages/semble/installation/' },
81+
{ label: 'MCP Server', link: '/packages/semble/mcp-server/' },
82+
{ label: 'CLI / AGENTS.md', link: '/packages/semble/usage/' },
83+
{ label: 'Benchmarks', link: '/packages/semble/benchmarks/' },
84+
],
85+
},
7686
{
7787
label: 'SemHash',
7888
items: [
@@ -94,16 +104,6 @@ gtag('config', 'G-LQWDNXKF2X');`,
94104
{ label: 'Supported Backends', link: '/packages/vicinity/supported-backends/' },
95105
],
96106
},
97-
{
98-
label: 'Semble',
99-
items: [
100-
{ label: 'Introduction', link: '/packages/semble/introduction/' },
101-
{ label: 'Installation', link: '/packages/semble/installation/' },
102-
{ label: 'Usage', link: '/packages/semble/usage/' },
103-
{ label: 'MCP Server', link: '/packages/semble/mcp-server/' },
104-
{ label: 'Benchmarks', link: '/packages/semble/benchmarks/' },
105-
],
106-
},
107107
{
108108
label: 'Tokenlearn',
109109
items: [

src/content/docs/packages/overview/index.mdx

Lines changed: 22 additions & 22 deletions
@@ -28,6 +28,28 @@ tableOfContents: false
2828
</div>
2929
</article>
3030

31+
<article class="overview-item">
32+
<div class="overview-item-top">
33+
<img class="overview-icon" src="/images/logos/semble_logo.webp" alt="Semble" loading="lazy" />
34+
<div class="overview-copy">
35+
<h2><a href="/packages/semble/introduction/">Semble</a></h2>
36+
<p>Fast and accurate code search for agents.</p>
37+
</div>
38+
</div>
39+
<div class="overview-item-bottom">
40+
<div class="overview-tags">
41+
<span class="overview-tag">Code Search</span>
42+
<span class="overview-tag">MCP Server</span>
43+
<span class="overview-tag">Agents</span>
44+
<span class="overview-tag">Python</span>
45+
</div>
46+
<div class="overview-actions">
47+
<a class="overview-link overview-link-primary" href="/packages/semble/introduction/">Docs</a>
48+
<a class="overview-link" href="https://github.com/minishlab/semble">Repo</a>
49+
</div>
50+
</div>
51+
</article>
52+
3153
<article class="overview-item">
3254
<div class="overview-item-top">
3355
<img class="overview-icon" src="/images/logos/semhash_logo.webp" alt="SemHash" loading="lazy" />
@@ -71,28 +93,6 @@ tableOfContents: false
7193
</div>
7294
</article>
7395

74-
<article class="overview-item">
75-
<div class="overview-item-top">
76-
<img class="overview-icon" src="/images/logos/semble_logo.webp" alt="Semble" loading="lazy" />
77-
<div class="overview-copy">
78-
<h2><a href="/packages/semble/introduction/">Semble</a></h2>
79-
<p>Fast and accurate code search for agents.</p>
80-
</div>
81-
</div>
82-
<div class="overview-item-bottom">
83-
<div class="overview-tags">
84-
<span class="overview-tag">Code Search</span>
85-
<span class="overview-tag">MCP Server</span>
86-
<span class="overview-tag">Agents</span>
87-
<span class="overview-tag">Python</span>
88-
</div>
89-
<div class="overview-actions">
90-
<a class="overview-link overview-link-primary" href="/packages/semble/introduction/">Docs</a>
91-
<a class="overview-link" href="https://github.com/minishlab/semble">Repo</a>
92-
</div>
93-
</div>
94-
</article>
95-
9696
<article class="overview-item">
9797
<div class="overview-item-top">
9898
<img class="overview-icon" src="/images/logos/tokenlearn_logo.webp" alt="Tokenlearn" loading="lazy" />
Lines changed: 40 additions & 11 deletions
@@ -1,37 +1,66 @@
11
---
22
title: Installation
3-
description: How to install Semble
3+
description: Install Semble, set up the MCP server, and scaffold a sub-agent
44
sidebar:
55
icon: seti:config
66
---
77

8+
There are three ways to set up Semble, and they are independent of each other. We recommend doing all three, but you can pick and choose based on your needs:
9+
10+
1. [Install Semble](#1-install-semble) (for the CLI and AGENTS.md flow).
11+
2. [Set up the MCP server](#2-mcp-server) (so your top-level agent can call Semble as a tool).
12+
3. [Install the sub-agent](#3-sub-agent) (so sub-agents, which can't call MCP tools, can still search).
13+
814
## Requirements
915

10-
- Python 3.10 or higher
16+
- Python 3.10 or higher.
17+
- [uv](https://docs.astral.sh/uv/getting-started/installation/) (recommended for all three flows).
1118
- No GPU, API keys, or external services required. Runs fully on CPU.
1219

13-
## Install
20+
## 1. Install Semble
21+
22+
Install Semble with [`uv`](https://docs.astral.sh/uv/) (recommended) or `pip`:
1423

1524
```bash
16-
pip install semble
25+
uv tool install semble # Recommended
26+
pip install semble # Or with pip
1727
```
1828

19-
Or with [uv](https://docs.astral.sh/uv/):
29+
This gives you the [`semble` CLI](/packages/semble/usage/).
30+
31+
### Optional: wire it into AGENTS.md
32+
33+
Once installed, drop the [AGENTS.md snippet](/packages/semble/usage/#agentsmd-snippet) into your `AGENTS.md`, `CLAUDE.md`, `GEMINI.md`, or equivalent. This teaches any agent (including sub-agents) when to reach for `semble` instead of grep, and is the only setup needed for harnesses without MCP support.
34+
35+
## 2. MCP Server
36+
37+
Install Semble as an [MCP server](/packages/semble/mcp-server/) for Claude Code:
2038

2139
```bash
22-
uv add semble
40+
claude mcp add semble -s user -- uvx --from "semble[mcp]" semble
2341
```
2442

25-
## MCP Server Extra
43+
For other agents (Cursor, Codex, OpenCode, VS Code, Copilot CLI, Windsurf, Gemini, Kiro, Zed), see [MCP Server](/packages/semble/mcp-server/) for the per-harness config snippet.
2644

27-
To use Semble as an [MCP server](/packages/semble/mcp-server/) with agents like Claude Code, Cursor, or OpenCode, install the `mcp` extra:
45+
## 3. Sub-agent
46+
47+
Sub-agents typically cannot call MCP tools directly. To give a sub-agent access to Semble, run `semble init` once in your project root to scaffold a dedicated search sub-agent for your harness:
2848

2949
```bash
30-
pip install "semble[mcp]"
50+
semble init # Claude Code → .claude/agents/semble-search.md
51+
semble init --agent gemini # Gemini CLI → .gemini/agents/semble-search.md
52+
semble init --agent cursor # Cursor → .cursor/agents/semble-search.md
53+
semble init --agent opencode # OpenCode → .opencode/agents/semble-search.md
54+
semble init --agent copilot # Copilot CLI → .github/agents/semble-search.md
55+
semble init --agent kiro # Kiro → .kiro/agents/semble-search.md
3156
```
3257

33-
Or, use [uvx](https://docs.astral.sh/uv/guides/tools/) to run it without a permanent install:
58+
If `semble` is not on `$PATH`, prefix the command with `uvx --from "semble[mcp]"`.
59+
60+
## Updating Semble
3461

3562
```bash
36-
uvx --from "semble[mcp]" semble
63+
uv tool upgrade semble # with uv
64+
pip install --upgrade semble # with pip
65+
uv cache clean semble # for MCP users (restart your MCP client after)
3766
```

src/content/docs/packages/semble/introduction.mdx

Lines changed: 43 additions & 40 deletions
@@ -5,65 +5,68 @@ sidebar:
55
icon: open-book
66
---
77

8-
[Semble](https://github.com/MinishLab/semble) is a code search library built for agents. It returns the exact code snippets they need instantly, using ~98% fewer tokens than grep+read and cutting latency on every step. Indexing and searching a full codebase end-to-end takes under a second, with ~200x faster indexing and ~10x faster queries than a code-specialized transformer, at 99% of its retrieval quality (see [benchmarks](/packages/semble/benchmarks/)). Everything runs on CPU with no API keys, GPU, or external services.
8+
[Semble](https://github.com/MinishLab/semble) is a code search library built for agents. It returns the exact code snippets they need instantly, using ~98% fewer tokens than grep+read. Indexing and searching a full codebase end-to-end takes under a second, with ~200x faster indexing and ~10x faster queries than a code-specialized transformer, at 99% of its retrieval quality (see [benchmarks](/packages/semble/benchmarks/)). Everything runs on CPU with no API keys, GPU, or external services. Run it as an [MCP server](/packages/semble/mcp-server/) or call it from the shell via [AGENTS.md](/packages/semble/usage/), and any agent (Claude Code, Cursor, Codex, OpenCode, etc.) gets instant access to any repo.
99

10-
Run it as an [MCP server](/packages/semble/mcp-server/) and any agent (Claude Code, Cursor, Codex, OpenCode, etc.) gets instant access to any repo, cloned and indexed on demand.
10+
## Quickstart
1111

12-
## Quick Start
12+
Your agent queries Semble in natural language (e.g. `"How is authentication handled?"`) and gets back only the relevant code snippets, without grepping or reading full files. You can set it up as an MCP server or via AGENTS.md. First, install [uv](https://docs.astral.sh/uv/getting-started/installation/) if you don't have it yet.
1313

14-
Install Semble:
1514

16-
```bash
17-
pip install semble # Install with pip
18-
uv add semble # Install with uv
19-
```
20-
21-
Index a repo and search it:
15+
### MCP (Claude Code)
2216

23-
```python
24-
from semble import SembleIndex
17+
Add Semble to Claude Code (requires [uv](https://docs.astral.sh/uv/getting-started/installation/)):
2518

26-
# Index a local directory
27-
index = SembleIndex.from_path("./my-project")
19+
```bash
20+
claude mcp add semble -s user -- uvx --from "semble[mcp]" semble
21+
```
2822

29-
# Index a remote git repository
30-
index = SembleIndex.from_git("https://github.com/MinishLab/model2vec")
23+
Using another agent harness? See [MCP Server](/packages/semble/mcp-server/) for per-agent setup.
3124

32-
# Search with a natural-language or code query
33-
results = index.search("save model to disk", top_k=3)
25+
### Bash / AGENTS.md
3426

35-
# Find code similar to a specific result
36-
related = index.find_related(results[0], top_k=3)
27+
[Install Semble](/packages/semble/installation/), then add the [AGENTS.md snippet](/packages/semble/usage/#agentsmd-snippet) to your `AGENTS.md`, `CLAUDE.md`, or equivalent. This works for any agent and is the only option for sub-agents, which typically cannot call MCP tools directly.
3728

38-
# Each result exposes the matched chunk
39-
result = results[0]
40-
result.chunk.file_path # "model2vec/model.py"
41-
result.chunk.start_line # 127
42-
result.chunk.end_line # 150
43-
result.chunk.content # "def save_pretrained(self, path: PathLike, ..."
29+
```bash
30+
uv tool install semble # Install with uv (recommended)
31+
pip install semble # Or install with pip
4432
```
4533

34+
4635
## Main Features
4736

48-
- **Fast**: indexes a repo in ~250 ms and answers queries in ~1.5 ms, all on CPU.
37+
- **Fast**: indexes an average repo in ~250 ms and answers queries in ~1.5 ms, all on CPU.
4938
- **Accurate**: NDCG@10 of 0.854 on the [benchmarks](/packages/semble/benchmarks/), on par with code-specialized transformer models at a fraction of the size and cost.
50-
- **Local and remote**: pass a local path or a git URL; indexes are cached for the session.
51-
- **MCP server**: drop-in tool for Claude Code, Cursor, Codex, OpenCode, and any other MCP-compatible agent.
39+
- **Token-efficient**: returns only the relevant chunks, using [~98% fewer tokens than grep+read](/packages/semble/benchmarks/#token-efficiency).
5240
- **Zero setup**: runs on CPU with no API keys, GPU, or external services required.
41+
- **MCP server**: works with Claude Code, Cursor, Codex, OpenCode, VS Code, and any other MCP-compatible agent.
42+
- **Local and remote**: pass a local path or a git URL.
5343

54-
## How It Works
55-
56-
Semble splits each file into code-aware chunks using [Chonkie](https://github.com/chonkie-inc/chonkie), then scores every query with two complementary retrievers:
44+
## How it works
5745

58-
- **Semantic**: static [Model2Vec](https://github.com/MinishLab/model2vec) embeddings from the code-specialized [potion-code-16M](https://huggingface.co/minishlab/potion-code-16M) model.
59-
- **Lexical**: [BM25](https://github.com/xhluca/bm25s) for exact matches on identifiers and API names.
46+
Semble splits each file into code-aware chunks using [tree-sitter](https://github.com/tree-sitter/py-tree-sitter), then scores every query against the chunks with two complementary retrievers: static [Model2Vec](https://github.com/MinishLab/model2vec) embeddings using the code-specialized [potion-code-16M](https://huggingface.co/minishlab/potion-code-16M) model for semantic similarity, and [BM25](https://github.com/xhluca/bm25s) for lexical matches on identifiers and API names. The two score lists are fused with Reciprocal Rank Fusion (RRF).
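As a rough illustration of the fusion step described above, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. It is not Semble's implementation: the function name, the chunk ids, and the constant `k = 60` (a conventional default for RRF) are assumptions made for this example only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk ids into a single list, best first.

    Each list contributes 1 / (k + rank) for every chunk it contains,
    where rank starts at 1. k = 60 is an assumed, conventional constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by chunk id for determinism.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


# Hypothetical chunk ids returned by the two retrievers for one query.
semantic_ranking = ["auth.py:10", "session.py:40", "utils.py:5"]
lexical_ranking = ["auth.py:10", "tokens.py:88", "session.py:40"]

fused = reciprocal_rank_fusion([semantic_ranking, lexical_ranking])
print(fused)
```

Because RRF only looks at ranks, the raw embedding similarities and BM25 scores never need to be put on a common scale before they are combined.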
6047

61-
The two score lists are fused with Reciprocal Rank Fusion (RRF) and then reranked with a set of code-aware signals:
48+
After fusing, results are reranked with a set of code-aware signals:
6249

63-
- **Adaptive weighting**: symbol-like queries (`Foo::bar`, `getUserById`) get more lexical weight; natural-language queries stay balanced.
64-
- **Definition boosts**: a chunk that defines the queried symbol (`class`, `def`, `func`) ranks above chunks that merely reference it.
65-
- **Identifier stems**: query tokens are stemmed and matched against identifier stems, so `parse config` boosts chunks containing `parseConfig`, `ConfigParser`, or `config_parser`.
66-
- **File coherence**: when multiple chunks from the same file match, the file is boosted so the top result reflects broad file-level relevance.
67-
- **Noise penalties**: test files, `compat`/`legacy` shims, example code, and `.d.ts` stubs are down-ranked so canonical implementations surface first.
50+
- **Adaptive weighting.** Symbol-like queries (`Foo::bar`, `_private`, `getUserById`) get more lexical weight, while natural-language queries stay balanced between semantic and lexical retrievers.
51+
- **Definition boosts.** A chunk that defines the queried symbol (a `class`, `def`, `func`, etc.) is ranked above chunks that merely reference it.
52+
- **Identifier stems.** Query tokens are stemmed and matched against identifier stems in a chunk, giving an additional weight to chunks that contain them. For example, querying `parse config` boosts chunks containing `parseConfig`, `ConfigParser`, or `config_parser`.
53+
- **File coherence.** When multiple chunks from the same file match the query, the file is boosted so the top result reflects broad file-level relevance rather than a single out-of-context chunk.
54+
- **Noise penalties.** Test files, `compat/`/`legacy/` shims, example code, and `.d.ts` declaration stubs are down-ranked so canonical implementations surface first.
6855

6956
Because the embedding model is static with no transformer forward pass at query time, all of this runs in milliseconds on CPU.
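To make the identifier-stem signal from the list above more concrete, the sketch below splits identifiers into word parts, applies a deliberately crude suffix-stripping stemmer, and measures how many query stems appear in a chunk. All names here (`split_identifier`, `crude_stem`, `stem_overlap`) are hypothetical, and the stemmer is a stand-in; Semble's actual stemming and boost weights are not shown in this commit.

```python
import re


def split_identifier(identifier):
    """Split snake_case, camelCase, and PascalCase into lowercase parts."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [part.lower() for part in parts]


def crude_stem(word):
    """Strip one common suffix; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "ers", "er", "ed", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_overlap(query, chunk_text):
    """Return the fraction of query stems found among the chunk's identifier stems."""
    query_stems = {crude_stem(token) for token in query.lower().split()}
    if not query_stems:
        return 0.0
    chunk_stems = set()
    for identifier in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", chunk_text):
        chunk_stems.update(crude_stem(part) for part in split_identifier(identifier))
    return len(query_stems & chunk_stems) / len(query_stems)


# The three identifier styles from the example above all match "parse config".
for name in ("parseConfig", "ConfigParser", "config_parser"):
    print(name, stem_overlap("parse config", f"value = {name}"))
```

A reranker could add a small bonus proportional to this overlap on top of the fused score; how large that bonus should be is a tuning decision this sketch does not attempt to answer.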
57+
58+
## Citing
59+
60+
If you use Semble in your research, please cite the following:
61+
62+
```bibtex
63+
@software{minishlab2026semble,
64+
author = {{van Dongen}, Thomas and Stephan Tulkens},
65+
title = {Semble: Fast and Accurate Code Search for Agents},
66+
year = {2026},
67+
publisher = {Zenodo},
68+
doi = {10.5281/zenodo.19785932},
69+
url = {https://github.com/MinishLab/semble},
70+
license = {MIT}
71+
}
72+
```
