Commit 6fd5eb4

Document Git layered indexing
1 parent 542e1a0 commit 6fd5eb4

5 files changed

Lines changed: 414 additions & 0 deletions

README.md

Lines changed: 65 additions & 0 deletions
@@ -61,6 +61,10 @@ Two install styles — they mirror the Docker image variants of the same names:
Next, set up your [coding agent integration](#coding-agent-integration) — or jump to [Manual CLI Usage](#manual-cli-usage) if you prefer direct control.

Docs:
- [Git Layered Indexing](./docs/layered-indexing.md): configure reusable `base > branch > dirty` Git layers for root clones and linked worktrees.
- [Docker Layered Indexing](./docs/docker-layered-indexing.md): run the layered daemon in Docker with persistent native state.

## Coding Agent Integration

### Skill (Recommended)
@@ -162,6 +166,16 @@ The background daemon starts automatically on first use.

> **Tip:** `ccc index` auto-initializes if you haven't run `ccc init` yet, so you can skip straight to indexing.

For Git repositories, you can configure layered indexing once from the root clone:

```bash
ccc init --base main   # share a base layer across linked worktrees
ccc index              # builds base + branch + dirty layers as needed
ccc overlay status     # inspect the current layer stack
```

Linked worktrees reuse the same daemon-owned base layer and only index branch and dirty deltas. See [Git Layered Indexing](./docs/layered-indexing.md) for the full configuration model.
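
To get a feel for what the branch and dirty deltas might contain, the sketch below lists candidate files with plain Git commands. It is an illustration only: `branch_delta_files` and `dirty_delta_files` are hypothetical helper names, and the daemon's actual change detection is not described here and may differ.

```bash
# Hypothetical helpers; the daemon's real change detection may differ.

# Files that differ between the merge base with the base ref ($1) and the
# current commit, i.e. what was committed on this branch.
branch_delta_files() {
  git diff --name-only "$(git merge-base "$1" HEAD)" HEAD
}

# Tracked files with uncommitted changes, plus untracked, non-ignored files.
dirty_delta_files() {
  git diff --name-only HEAD
  git ls-files --others --exclude-standard
}
```

From a feature worktree, `branch_delta_files main` prints the files committed on the branch since it diverged from `main`, and `dirty_delta_files` prints uncommitted and untracked files.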

### CLI Reference

| Command | Description |
@@ -170,6 +184,8 @@ The background daemon starts automatically on first use.
| `ccc index` | Build or update the index (auto-inits if needed). Shows streaming progress. |
| `ccc search <query>` | Semantic search across the codebase |
| `ccc status` | Show index stats (chunk count, file count, language breakdown) |
| `ccc overlay status` | Inspect Git layered indexing state for the current worktree |
| `ccc overlay prune` | Prune expired branch and dirty layers |
| `ccc mcp` | Run as MCP server in stdio mode |
| `ccc doctor` | Run diagnostics — checks settings, daemon, model, file matching, and index health |
| `ccc reset` | Delete index databases. `--all` also removes settings. `-f` skips confirmation. |
@@ -185,6 +201,7 @@ ccc search --lang python --lang markdown schema # filter by language
ccc search --path 'src/utils/*' query handler      # filter by path
ccc search --offset 10 --limit 5 database schema   # pagination
ccc search --refresh database schema               # update index first, then search
ccc index --base release/1.2                       # override Git overlay base ref once
```

By default, `ccc search` scopes results to your current working directory (relative to the project root). Use `--path` to override.
@@ -422,6 +439,31 @@ daemon state in `/var/cocoindex/state`. That lets the local daemon Git-layer
feature reuse daemon-owned layer metadata and materialized layer sources across
projects while keeping transient sockets under `/var/run/cocoindex_code`.

For layered indexing in Docker, initialize the base ref from the root clone and
then use linked worktrees through the same wrapper/container:

```bash
cd $HOME/src/github/cocoindex-io/cocoindex-code
ccc init --base main
ccc index

git worktree add ../cocoindex-code.worktrees/feature-1 -b feature-1 main
cd ../cocoindex-code.worktrees/feature-1
ccc index
ccc overlay status
```

Mount a workspace parent that contains both the root clone and linked
worktrees. For example:

```bash
COCOINDEX_HOST_WORKSPACE=$HOME/src/github/cocoindex-io \
  docker compose -f docker/docker-compose.yml up -d
```

See [Docker Layered Indexing](./docs/docker-layered-indexing.md) for the full
Docker setup and troubleshooting guide.

### Configuration via environment variables

Pass configuration to `docker run` / compose with `-e`:
@@ -455,6 +497,11 @@ Supported Docker environment variables:
| `COCOINDEX_CODE_DB_PATH_MAPPING` | DB/index storage remapping, default `/workspace=/var/cocoindex/db`. |
| `PUID`, `PGID` | Linux UID/GID used to chown Docker-managed paths and write host-owned workspace files. |

`COCOINDEX_CODE_STATE_DIR` is where repository/worktree metadata, overlay
policy, layer manifests, and materialized layer sources are stored. Keep it on
the persistent Docker volume if you want base layers to survive container
recreation.

### Build the image locally

```bash
@@ -463,6 +510,7 @@ docker build -t cocoindex-code:local -f docker/Dockerfile .

## Features
- **Semantic Code Search**: Find relevant code using natural language queries when grep doesn't work well, and save tokens immediately.
- **Git Layered Indexing**: Reuse a shared base index across root clones and linked worktrees, then index only branch and dirty deltas. Configure it with `ccc init --base main`; see [Git Layered Indexing](./docs/layered-indexing.md).
- **Ultra Performant**: ⚡ Built on top of ultra performant [Rust indexing engine](https://github.com/cocoindex-io/cocoindex). Only re-indexes changed files for fast updates.
- **Multi-Language Support**: Python, JavaScript/TypeScript, Rust, Go, Java, C/C++, C#, SQL, Shell, and more.
- **Embedded**: Portable and just works, no database setup required!
@@ -583,6 +631,23 @@ def my_chunker(path: Path, content: str) -> tuple[str | None, list[Chunk]]:

See [`src/cocoindex_code/chunking.py`](./src/cocoindex_code/chunking.py) for the public types and [`tests/example_toml_chunker.py`](./tests/example_toml_chunker.py) for a complete example.

### Git Layered Indexing Configuration

For Git repositories, `ccc init --base <ref>` stores a repository-level overlay
policy in daemon state. The checkout-local `settings.yml` still controls file
matching and chunking, while daemon state controls the shared base ref used by
root clones and linked worktrees.

```bash
ccc init --base main
ccc index
ccc overlay status
```

The daemon stores durable layer metadata under `COCOINDEX_CODE_STATE_DIR` and
uses stable hash IDs, so moving a repository or linked worktree does not
invalidate reusable base and branch layers. See [Git Layered Indexing](./docs/layered-indexing.md) for details.
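
Why a path-independent ID survives a move can be shown with a small sketch. It assumes, purely for illustration, that a layer ID is derived from a repository identity string and a commit SHA; the source does not say what the real IDs are built from, and `layer_id` is a hypothetical helper. `git hash-object --stdin` is used here only as a convenient hash function.

```bash
# Hypothetical helper: derive an ID from what a layer contains, not from where
# the checkout lives on disk.
#   $1 = repository identity (assumed here, e.g. a remote URL)
#   $2 = commit SHA the layer was built from
layer_id() {
  printf '%s\n%s\n' "$1" "$2" | git hash-object --stdin
}
```

Because the checkout path never enters the hash, calling `layer_id` with the same two arguments before and after moving a worktree yields the same ID.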

## Embedding Models

With the `[full]` extra installed, `ccc init` defaults to a local SentenceTransformers model ([Snowflake/snowflake-arctic-embed-xs](https://huggingface.co/Snowflake/snowflake-arctic-embed-xs)) — no API key required. To use a different model, edit `~/.cocoindex_code/global_settings.yml`.

docker/docker-compose.yml

Lines changed: 5 additions & 0 deletions
@@ -19,6 +19,11 @@
# COCOINDEX_CODE_STATE_DIR=/var/cocoindex/state
# COCOINDEX_CODE_RUNTIME_DIR=/var/run/cocoindex_code
# COCOINDEX_CODE_DB_PATH_MAPPING=/workspace=/var/cocoindex/db
#
# For Git layered indexing, mount a workspace parent that contains both the
# root clone and linked worktrees. Keep COCOINDEX_CODE_STATE_DIR on the
# persistent cocoindex-data volume so base/branch layers survive container
# recreation. See docs/docker-layered-indexing.md.

services:
  cocoindex-code:

docs/README.md

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
# CocoIndex Code Docs

- [Git Layered Indexing](./layered-indexing.md): configuration model, stable IDs, layer stack behavior, and commands.
- [Docker Layered Indexing](./docker-layered-indexing.md): Docker-specific state layout, wrapper, and linked worktree setup.

docs/docker-layered-indexing.md

Lines changed: 143 additions & 0 deletions
@@ -0,0 +1,143 @@
# Docker Layered Indexing

This guide covers the Docker-specific configuration for Git layered indexing. For the core model, see [Git Layered Indexing](./layered-indexing.md).

## Recommended Compose Setup

Use the repository compose file:

```bash
docker compose -f docker/docker-compose.yml up -d
```

The compose defaults are designed for layered indexing:

```yaml
COCOINDEX_CODE_STATE_DIR: /var/cocoindex/state
COCOINDEX_CODE_RUNTIME_DIR: /var/run/cocoindex_code
COCOINDEX_CODE_DB_PATH_MAPPING: /workspace=/var/cocoindex/db
COCOINDEX_CODE_HOST_PATH_MAPPING: /workspace=$HOME
```

The important split is:

- source code and settings live on the bind mount under `/workspace`
- durable daemon layer metadata lives under `/var/cocoindex/state`
- per-project non-layer DB paths are remapped to `/var/cocoindex/db`
- sockets, PID files, and logs stay under `/var/run/cocoindex_code`

## Mount the Right Workspace

The default compose file mounts your home directory:

```bash
COCOINDEX_HOST_WORKSPACE=$HOME docker compose -f docker/docker-compose.yml up -d
```

For a narrower mount, point it at the parent containing both the root clone and linked worktrees:

```bash
COCOINDEX_HOST_WORKSPACE=$HOME/src/github/cocoindex-io \
  docker compose -f docker/docker-compose.yml up -d
```

Example host layout:

```text
$HOME/src/github/cocoindex-io/
  cocoindex-code/
  cocoindex-code.worktrees/
    feature-1/
```

Both paths must be visible inside the same container mount for the daemon to reuse repository and layer state across them.
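
One way to check this before starting the container is sketched below. It reads worktree paths from standard input and prints any that fall outside the directory you plan to mount. `paths_outside_workspace` is a hypothetical helper, not part of `ccc`; the `git worktree list --porcelain` command in the comment is standard Git.

```bash
# Hypothetical helper: read absolute paths on stdin and print each one that
# is NOT inside the workspace directory given as $1.
paths_outside_workspace() {
  local workspace="${1%/}"   # drop one trailing slash, if present
  local wt_path
  while IFS= read -r wt_path; do
    case "$wt_path" in
      "$workspace"|"$workspace"/*) ;;        # visible through the mount
      *) printf '%s\n' "$wt_path" ;;         # would not be visible in the container
    esac
  done
}

# Example, run from the root clone or any linked worktree:
#   git worktree list --porcelain | sed -n 's/^worktree //p' \
#     | paths_outside_workspace "$HOME/src/github/cocoindex-io"
```

Empty output means every worktree sits under the chosen mount; any printed path would need a wider `COCOINDEX_HOST_WORKSPACE`.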

## Host Wrapper

Use this wrapper so Docker commands resolve the host current directory correctly:

```bash
ccc() {
  local container="${COCOINDEX_CODE_CONTAINER_NAME:-cocoindex-code}"
  if [ "$(docker inspect -f '{{.State.Running}}' "$container" 2>/dev/null)" != "true" ]; then
    echo "cocoindex-code container is not running. Start it with: docker compose -f docker/docker-compose.yml up -d" >&2
    return 1
  fi

  local flags=(-i)
  if [ "${1:-}" != "mcp" ] && [ -t 0 ] && [ -t 1 ]; then
    flags=(-it)
  fi

  docker exec "${flags[@]}" \
    -e COCOINDEX_CODE_HOST_CWD="$PWD" \
    "$container" ccc "$@"
}
```

`COCOINDEX_CODE_HOST_CWD` is required for linked worktrees. It tells the container-side CLI which host directory you are actually in, then the path mapping translates it to `/workspace/...`.
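
A minimal sketch of that translation follows, assuming a single mapping entry in the `container_prefix=host_prefix` form shown for `COCOINDEX_CODE_HOST_PATH_MAPPING`. `host_to_container_path` is a hypothetical name; the real CLI logic is not shown in this guide and may handle more cases.

```bash
# Hypothetical helper: translate a host path to its container path.
#   $1 = mapping in "container_prefix=host_prefix" form, e.g. /workspace=/home/alice
#   $2 = absolute host path, e.g. the value of COCOINDEX_CODE_HOST_CWD
host_to_container_path() {
  local mapping="$1"
  local host_path="$2"
  local container_prefix="${mapping%%=*}"   # text before the first "="
  local host_prefix="${mapping#*=}"         # text after the first "="

  case "$host_path" in
    "$host_prefix")
      printf '%s\n' "$container_prefix" ;;
    "$host_prefix"/*)
      # Swap the host prefix for the container prefix, keep the rest.
      printf '%s\n' "$container_prefix${host_path#"$host_prefix"}" ;;
    *)
      echo "path is outside the mounted workspace: $host_path" >&2
      return 1 ;;
  esac
}
```

For example, `host_to_container_path "/workspace=/home/alice" "/home/alice/src/github/cocoindex-io/cocoindex-code"` prints `/workspace/src/github/cocoindex-io/cocoindex-code`, while a path outside `/home/alice` is rejected with a non-zero exit status.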

## Layered Workflow in Docker

Root clone:

```bash
cd $HOME/src/github/cocoindex-io/cocoindex-code
ccc init --base main
ccc index
```

Linked worktree:

```bash
git worktree add ../cocoindex-code.worktrees/feature-1 -b feature-1 main
cd ../cocoindex-code.worktrees/feature-1
ccc index
ccc search "query planner"
ccc overlay status
```

The base layer is stored once under `/var/cocoindex/state` and reused by the linked worktree.

## Environment Variables

| Variable | Purpose |
|---|---|
| `COCOINDEX_CODE_IMAGE` | Image used by compose, e.g. `cocoindex/cocoindex-code:full`. |
| `COCOINDEX_CODE_CONTAINER_NAME` | Container name used by compose and the wrapper. |
| `COCOINDEX_HOST_WORKSPACE` | Host directory mounted at `/workspace`. Mount a parent that contains all worktrees you want to share. |
| `COCOINDEX_CODE_HOST_PATH_MAPPING` | Container-to-host path mapping for display and host CWD translation. |
| `COCOINDEX_CODE_HOST_CWD` | Host current directory passed per `docker exec` invocation. |
| `COCOINDEX_CODE_STATE_DIR` | Durable daemon layer state. Default: `/var/cocoindex/state`. |
| `COCOINDEX_CODE_RUNTIME_DIR` | Runtime socket/PID/log directory. Default: `/var/run/cocoindex_code`. |
| `COCOINDEX_CODE_DB_PATH_MAPPING` | Non-layer project DB remapping. Default: `/workspace=/var/cocoindex/db`. |
| `PUID`, `PGID` | Linux-only ownership mapping for bind-mounted files and Docker-managed state. |

## Debugging

Check daemon status:

```bash
docker exec cocoindex-code ccc daemon status
```

Inspect overlay status from the current host directory:

```bash
ccc overlay status
```

Inspect state in the container:

```bash
docker exec -it cocoindex-code sh
ls -R /var/cocoindex/state
```

Reset all Docker-managed index, layer, and cache state:

```bash
docker compose -f docker/docker-compose.yml down -v
```

This preserves your source tree because it is bind-mounted from the host.
