This is built on top of [CocoIndex v1](https://cocoindex.io/docs-v1/llms.txt).

## Build and Test Commands

This project uses [uv](https://docs.astral.sh/uv/) for project management.

```bash
uv run mypy .         # Type check Python code
uv run pytest tests/  # Run Python tests
```

## Code Conventions

### Internal vs External Modules

We distinguish between **internal modules** (under packages with a `_` prefix, e.g. `_internal.*` or `connectors.*._source`) and **external modules** (which users can import directly).

**External modules** (user-facing, e.g. `cocoindex/ops/sentence_transformers.py`):

* Be strict about not leaking implementation details
* Use `__all__` to explicitly list public exports
* Prefix ALL non-public symbols with `_`, including:
  * Standard library imports: `import threading as _threading`, `import typing as _typing`
  * Third-party imports: `import numpy as _np`, `from numpy.typing import NDArray as _NDArray`
  * Internal package imports: `from cocoindex.resources import schema as _schema`
  * Exception: `TYPE_CHECKING` imports for type hints don't need prefixing

**Internal modules**:

* Less strict, since users shouldn't import these directly
* Standard library and internal imports don't need an underscore prefix
* Only prefix symbols that are truly private to the module itself (e.g. `_context_var` for a module-private ContextVar)

### General principles (also covered by `/review-changes`)

- **Top-level imports.** Defer to an in-function import only for a real circular dependency or for a heavy import that isn't always needed.
- **Specific types over `Any`.** When a value enters in a weaker form (`str`, `Any`), convert it to the strong type at the earliest point. Don't propagate the weak form.
- **`NamedTuple`/small dataclass for multi-value returns.** Access fields by name at call sites.
- **Single source of truth.** When the same value or logic appears in multiple places, consolidate it.
- **Delete dead code and dead config.** When a change makes something unreachable, the code, the tests, and the knobs all go.
- **Honest names.** The name describes what the code does today.
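
The type-related principles can be illustrated with a short sketch, assuming a hypothetical case where a layer name arrives as a raw string: it is converted to an enum at the boundary, and a `NamedTuple` carries the multi-value result. `LayerKind`, `LayerStats`, `parse_layer_kind`, and `summarize` are illustrative names, not part of the codebase.

```python
import enum
from typing import NamedTuple


class LayerKind(enum.Enum):
    """Strong type for a layer name that may arrive as a plain string (hypothetical)."""

    BASE = "base"
    BRANCH = "branch"
    DIRTY = "dirty"


class LayerStats(NamedTuple):
    """Multi-value result: callers access fields by name."""

    kind: LayerKind
    file_count: int
    chunk_count: int


def parse_layer_kind(raw: str) -> LayerKind:
    # Convert the weak form to the strong type at the boundary; unknown
    # names raise ValueError here instead of travelling further as strings.
    return LayerKind(raw.strip().lower())


def summarize(raw_kind: str, chunks_per_file: list[int]) -> LayerStats:
    kind = parse_layer_kind(raw_kind)  # from here on, only the enum is used
    return LayerStats(
        kind=kind,
        file_count=len(chunks_per_file),
        chunk_count=sum(chunks_per_file),
    )


stats = summarize("Branch", [3, 5, 2])
print(stats.kind, stats.file_count, stats.chunk_count)
```

Call sites then read `stats.chunk_count` rather than `stats[2]`, and an unknown name such as `"main"` fails fast with `ValueError` in `parse_layer_kind` instead of flowing further as a string.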
43
+
44
+
### Testing Guidelines
45
+
46
+
We prefer end-to-end tests on user-facing APIs, over unit tests on smaller internal functions. With this said, there're cases where unit tests are necessary, e.g. for internal logic with various situations and edge cases, in which case it's usually easier to cover various scenarios with unit tests.
47
+
48
+
When tests fail, fix the underlying issue. Don't skip, ignore, or exclude to get a green result.

**README.md**

Next, set up your [coding agent integration](#coding-agent-integration), or jump to [Manual CLI Usage](#manual-cli-usage) if you prefer direct control.

Docs:
- [Git Layered Indexing](./docs/layered-indexing.md): configure reusable `base > branch > dirty` Git layers for root clones and linked worktrees.
- [Docker Layered Indexing](./docs/docker-layered-indexing.md): run the layered daemon in Docker with persistent native state.

## Coding Agent Integration

### Skill (Recommended)

> **Tip:** `ccc index` auto-initializes if you haven't run `ccc init` yet, so you can skip straight to indexing.

For Git repositories, you can configure layered indexing once from the root clone:

```bash
ccc init --base main   # share a base layer across linked worktrees
ccc index              # builds base + branch + dirty layers as needed
ccc overlay status     # inspect the current layer stack
```

Linked worktrees reuse the same daemon-owned base layer and index only the branch and dirty deltas. See [Git Layered Indexing](./docs/layered-indexing.md) for the full configuration model.
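
Conceptually, the `base > branch > dirty` stack can be read as an overlay lookup in which a higher layer shadows the layers below it. The sketch below assumes each layer maps file paths to indexed entries and records deletions explicitly; `Layer` and `resolve` are hypothetical and do not describe cocoindex-code's actual storage format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Layer:
    """Hypothetical index layer: entries keyed by file path.

    `deleted` lists paths removed relative to the layers below this one.
    """

    name: str
    entries: dict[str, str] = field(default_factory=dict)
    deleted: set[str] = field(default_factory=set)


def resolve(stack: list[Layer], path: str) -> Optional[str]:
    """Return the entry for `path` from the topmost layer that mentions it.

    `stack` is ordered bottom to top, e.g. [base, branch, dirty].
    """
    for layer in reversed(stack):
        if path in layer.deleted:
            return None  # a higher layer deleted the file; hide lower copies
        if path in layer.entries:
            return layer.entries[path]
    return None  # not indexed in any layer


base = Layer("base", {"a.py": "a@main", "b.py": "b@main", "c.py": "c@main"})
branch = Layer("branch", {"b.py": "b@feature"}, deleted={"c.py"})
dirty = Layer("dirty", {"a.py": "a@working-tree"})
stack = [base, branch, dirty]

print(resolve(stack, "a.py"), resolve(stack, "b.py"), resolve(stack, "c.py"))
# a@working-tree b@feature None
```

Because the stack holds `base` by reference, a second worktree could use `[base, other_branch, other_dirty]`, and only its own branch and dirty layers would need indexing, in line with the reuse described for linked worktrees.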

### CLI Reference

| Command | Description |
|---|---|
|`ccc index`| Build or update the index (auto-inits if needed). Shows streaming progress. |
|`ccc search <query>`| Semantic search across the codebase |
|`ccc status`| Show index stats (chunk count, file count, language breakdown) |
|`ccc overlay status`| Inspect Git layered indexing state for the current worktree |
|`ccc overlay prune`| Prune expired branch and dirty layers |
|`ccc mcp`| Run as MCP server in stdio mode |
|`ccc doctor`| Run diagnostics: checks settings, daemon, model, file matching, and index health |
|`ccc reset`| Delete index databases. `--all` also removes settings. `-f` skips confirmation. |

```bash
ccc search --lang python --lang markdown schema  # filter by language
ccc search --path 'src/utils/*' query handler    # filter by path
```

Or grab [`docker/docker-compose.yml`](./docker/docker-compose.yml) and run `docker compose up -d` next to it (works on any shell, including Windows cmd / PowerShell).

By default your home directory is mounted into the container. For team setups,
prefer a narrower mount such as `COCOINDEX_HOST_WORKSPACE=$HOME/src` or one
repo path. Index data, daemon Git-layer state, and the embedding model cache
persist in the `cocoindex-data` Docker volume under `/var/cocoindex`. Your
global settings file at `$HOME/.cocoindex_code/global_settings.yml` is visible
and editable on the host; edits take effect on your next `ccc` command.

> **Pick a different image:** set `COCOINDEX_CODE_IMAGE` to override the
> default. For example, the `:full` variant or GHCR:

- **Semantic Code Search**: Find relevant code using natural language queries when grep doesn't work well, and save tokens immediately.
- **Git Layered Indexing**: Reuse a shared base index across root clones and linked worktrees, then index only branch and dirty deltas. Configure it with `ccc init --base main`; see [Git Layered Indexing](./docs/layered-indexing.md).
- **Ultra Performant**: ⚡ Built on top of an ultra-performant [Rust indexing engine](https://github.com/cocoindex-io/cocoindex). Only re-indexes changed files for fast updates.
- **Multi-Language Support**: Python, JavaScript/TypeScript, Rust, Go, Java, C/C++, C#, SQL, Shell, and more.
- **Embedded**: Portable and just works, no database setup required!

See [`src/cocoindex_code/chunking.py`](./src/cocoindex_code/chunking.py) for the public types and [`tests/example_toml_chunker.py`](./tests/example_toml_chunker.py) for a complete example.

### Git Layered Indexing Configuration

For Git repositories, `ccc init --base <ref>` stores a repository-level overlay
policy in daemon state. The checkout-local `settings.yml` still controls file
matching and chunking, while daemon state controls the shared base ref used by
root clones and linked worktrees.

```bash
ccc init --base main
ccc index
ccc overlay status
```

The daemon stores durable layer metadata under `COCOINDEX_CODE_STATE_DIR` and
uses stable hash IDs, so moving a repository or linked worktree does not
invalidate reusable base and branch layers. See [Git Layered Indexing](./docs/layered-indexing.md) for details.
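
To see why stable hash IDs survive a move, compare an ID derived from location-independent inputs with one derived from the checkout path. This is a minimal sketch: the inputs to the hypothetical `layer_id` function (a repository identity string, the layer kind, and a ref or commit) are assumptions, since the exact fields cocoindex-code hashes are not stated here.

```python
import hashlib


def layer_id(repo_identity: str, kind: str, ref_or_commit: str) -> str:
    """Hypothetical stable layer ID built only from location-independent inputs.

    The checkout's filesystem path is deliberately not an input, so moving the
    repository or a linked worktree leaves the ID unchanged.
    """
    key = "\0".join([repo_identity, kind, ref_or_commit])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def path_based_id(checkout_path: str, kind: str) -> str:
    """Counter-example: keying on the path changes the ID whenever the checkout moves."""
    key = "\0".join([checkout_path, kind])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


# The same repository before and after a move (hypothetical identity and paths).
stable_before = layer_id("example-repo-identity", "base", "main")
stable_after = layer_id("example-repo-identity", "base", "main")
fragile_before = path_based_id("/home/me/src/repo", "base")
fragile_after = path_based_id("/home/me/work/repo", "base")

print(stable_before == stable_after)    # True
print(fragile_before == fragile_after)  # False
```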

## Embedding Models

With the `[full]` extra installed, `ccc init` defaults to a local SentenceTransformers model ([Snowflake/snowflake-arctic-embed-xs](https://huggingface.co/Snowflake/snowflake-arctic-embed-xs)), so no API key is required. To use a different model, edit `~/.cocoindex_code/global_settings.yml`.