
Commit 92d6119

feat: unified Docker workspace mount with supervised daemon (#135)
* feat: unified Docker workspace mount with supervised daemon

Reshape the Docker experience around a single bind mount and a single named volume. Global settings live on the host under $HOME/.cocoindex_code/ (visible and editable); index data and the model cache persist in one cocoindex-data volume; daemon runtime state stays on the container's native filesystem.

CLI and MCP output now show host-side paths via a bidirectional COCOINDEX_CODE_HOST_PATH_MAPPING translator. A shell wrapper that forwards $PWD (COCOINDEX_CODE_HOST_CWD) lets ccc work from any project subdirectory on the host.

The daemon tolerates a missing global_settings.yml (starts in no-settings mode) so ccc init's interactive picker works in Docker on first run.

A supervisor restart loop in the entrypoint, driven by a new COCOINDEX_CODE_DAEMON_SUPERVISED contract, makes settings-change auto-restart safe: editing global_settings.yml triggers an in-place daemon respawn without taking the container down.

Linux ownership alignment via PUID/PGID, gosu privilege drop, and a coco user baked into the image.

Release workflow now publishes to both Docker Hub (cocoindex/cocoindex-code) and GHCR (ghcr.io/cocoindex-io/cocoindex-code).

Also:

- Merge cocoindex-db and cocoindex-model-cache into a single volume
- find_parent_with_marker requires .cocoindex_code/settings.yml, so a workspace-root global-only dir doesn't trigger nested-init warnings
- New pytest marker `docker_e2e` gates the Docker-backed E2E suite (excluded from default pytest runs)

* fix: mypy on Windows for POSIX-only os.getuid/getgid calls
1 parent: 0a8fb50

26 files changed

Lines changed: 1435 additions & 155 deletions
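The commit message notes that `find_parent_with_marker` now requires `.cocoindex_code/settings.yml`, so the workspace-root `.cocoindex_code/` directory (which in Docker holds only `global_settings.yml`) is no longer mistaken for a project root. The helper itself is not among the files shown on this page, so the following is only a minimal Python sketch of that lookup; the `(start, marker)` signature and the `PROJECT_MARKER` constant are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical constant: a project root is marked by .cocoindex_code/settings.yml,
# not by the bare .cocoindex_code/ directory (which may hold only global settings).
PROJECT_MARKER = Path(".cocoindex_code") / "settings.yml"


def find_parent_with_marker(start: Path, marker: Path = PROJECT_MARKER) -> Path | None:
    """Return the nearest directory at or above `start` that contains `marker`.

    Illustrative sketch only; the real cocoindex-code helper may differ.
    """
    current = start.resolve()
    # Walk from `start` up to the filesystem root, nearest directory first.
    for candidate in (current, *current.parents):
        # Require the settings *file*; a global-only .cocoindex_code/ dir is skipped.
        if (candidate / marker).is_file():
            return candidate
    return None
```

With `$HOME/.cocoindex_code/global_settings.yml` present but no `settings.yml` beside it, a lookup from a directory under `$HOME` that is not inside an initialized project returns `None` instead of `$HOME`.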

.dockerignore

Lines changed: 31 additions & 0 deletions
```diff
@@ -0,0 +1,31 @@
+# Git + VCS
+.git/
+.gitignore
+
+# Python
+__pycache__/
+*.pyc
+*.pyo
+.mypy_cache/
+.pytest_cache/
+.ruff_cache/
+.venv/
+venv/
+dist/
+build/
+*.egg-info/
+
+# IDE / editor
+.idea/
+.vscode/
+*.swp
+.DS_Store
+
+# Project artifacts
+.cocoindex_code/
+specs/
+tests/e2e_docker_fixtures/
+
+# Docs (the image ships no docs)
+*.md
+!README.md
```

.github/workflows/release.yml

Lines changed: 44 additions & 0 deletions
```diff
@@ -105,3 +105,47 @@ jobs:
           gh release upload
           '${{ github.ref_name }}' dist/*
           --repo '${{ github.repository }}'
+
+  publish-docker:
+    name: Build & push Docker image to Docker Hub and GHCR
+    if: github.event_name == 'release'
+    needs:
+      - publish-to-pypi
+    runs-on: ubuntu-latest
+    environment:
+      name: docker-hub
+      url: https://hub.docker.com/r/cocoindex/cocoindex-code
+    permissions:
+      contents: read
+      packages: write
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+
+      - name: Log in to Docker Hub
+        uses: docker/login-action@v3
+        with:
+          registry: docker.io
+          username: ${{ secrets.DOCKERHUB_USERNAME }}
+          password: ${{ secrets.DOCKERHUB_TOKEN }}
+
+      - name: Log in to GHCR
+        uses: docker/login-action@v3
+        with:
+          registry: ghcr.io
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Build and push to both registries
+        uses: docker/build-push-action@v5
+        with:
+          context: .
+          file: docker/Dockerfile
+          push: true
+          tags: |
+            cocoindex/cocoindex-code:latest
+            cocoindex/cocoindex-code:${{ github.ref_name }}
+            ghcr.io/cocoindex-io/cocoindex-code:latest
+            ghcr.io/cocoindex-io/cocoindex-code:${{ github.ref_name }}
```

.gitignore

Lines changed: 4 additions & 0 deletions
```diff
@@ -46,3 +46,7 @@ src/cocoindex_code/_version.py
 
 # CocoIndex Code (ccc)
 /.cocoindex_code/
+
+# Docker E2E fixtures contain a `lib/` dir that the generic Python rule above
+# would ignore — keep it tracked.
+!tests/e2e_docker_fixtures/**
```

README.md

Lines changed: 88 additions & 26 deletions
````diff
@@ -198,33 +198,79 @@ The recommended approach is a **persistent container**: start it once, and use
 `docker exec` to run CLI commands or connect MCP sessions to it. The daemon
 inside stays warm across sessions, so the embedding model is loaded only once.
 
-### Step 1 — Start the container
+### Quick start — `docker compose up -d`
+
+Grab [`docker/docker-compose.yml`](./docker/docker-compose.yml) from this repo and run:
+
+```bash
+# macOS / Windows
+docker compose up -d
+
+# Linux (aligns file ownership on bind-mounted paths with your host user)
+PUID=$(id -u) PGID=$(id -g) docker compose up -d
+```
+
+By default your home directory is mounted into the container (set
+`COCOINDEX_HOST_WORKSPACE` to narrow this to a specific code folder). Index
+data and the embedding model cache persist in a Docker volume across
+restarts. Your global settings file at `$HOME/.cocoindex_code/global_settings.yml`
+is visible and editable on the host; edits take effect on your next `ccc` command.
+
+> **GHCR:** to pull from GitHub Container Registry instead of Docker Hub,
+> change the `image:` line in your copy of `docker-compose.yml` to
+> `ghcr.io/cocoindex-io/cocoindex-code:latest`.
+
+### Or: `docker run`
+
+<details>
+<summary>Docker Desktop (macOS / Windows)</summary>
 
 ```bash
 docker run -d --name cocoindex-code \
-  --volume "$(pwd):/workspace" \
-  --volume cocoindex-db:/db \
-  --volume cocoindex-model-cache:/root/.cache \
-  ghcr.io/cocoindex-io/cocoindex-code:latest
+  --volume "$HOME:/workspace" \
+  --volume cocoindex-data:/var/cocoindex \
+  -e COCOINDEX_CODE_HOST_PATH_MAPPING="/workspace=$HOME" \
+  cocoindex/cocoindex-code:latest
 ```
+</details>
 
-- `/workspace` — mount your project root here
-- `cocoindex-db` — index databases live inside the container (fast native I/O, no cross-OS volume issues)
-- `cocoindex-model-cache` — persists the embedding model across image upgrades
+<details>
+<summary>Linux (with <code>PUID</code>/<code>PGID</code>)</summary>
 
-### Step 2 — Index your codebase
+```bash
+docker run -d --name cocoindex-code \
+  -e PUID=$(id -u) -e PGID=$(id -g) \
+  --volume "$HOME:/workspace" \
+  --volume cocoindex-data:/var/cocoindex \
+  -e COCOINDEX_CODE_HOST_PATH_MAPPING="/workspace=$HOME" \
+  cocoindex/cocoindex-code:latest
+```
+</details>
+
+### Shell wrapper for `ccc` commands
+
+Paste this into `~/.bashrc` / `~/.zshrc` so `ccc` feels native on the host
+and picks up the right project based on your current directory:
 
 ```bash
-docker exec -it cocoindex-code ccc index
+ccc() {
+  docker exec -it -e COCOINDEX_CODE_HOST_CWD="$PWD" cocoindex-code ccc "$@"
+}
 ```
 
-### Step 3 — Connect your coding agent
+Now `cd` into any project under your workspace and run `ccc init`, `ccc index`,
+`ccc search ...`, `ccc status`, etc. — it just works.
+
+### Connect your coding agent
 
 <details>
 <summary>Claude Code</summary>
 
+Register MCP from inside the target project so `$PWD` points there:
+
 ```bash
-claude mcp add cocoindex-code -- docker exec -i cocoindex-code ccc mcp
+claude mcp add cocoindex-code -- docker exec -i \
+  -e COCOINDEX_CODE_HOST_CWD="$PWD" cocoindex-code ccc mcp
 ```
 
 Or via `.mcp.json`:
@@ -235,40 +281,50 @@ Or via `.mcp.json`:
     "cocoindex-code": {
       "type": "stdio",
       "command": "docker",
-      "args": ["exec", "-i", "cocoindex-code", "ccc", "mcp"]
+      "args": [
+        "exec",
+        "-i",
+        "-e",
+        "COCOINDEX_CODE_HOST_CWD=${PWD}",
+        "cocoindex-code",
+        "ccc",
+        "mcp"
+      ]
     }
   }
 }
 ```
+
+> Note: use `-i` (not `-it`). The `-t` flag allocates a terminal, which
+> interferes with MCP's JSON messaging over stdin/stdout — only add it for
+> interactive `ccc` commands like `ccc init`.
 </details>
 
 <details>
 <summary>Codex</summary>
 
 ```bash
-codex mcp add cocoindex-code -- docker exec -i cocoindex-code ccc mcp
+codex mcp add cocoindex-code -- docker exec -i \
+  -e COCOINDEX_CODE_HOST_CWD="$PWD" cocoindex-code ccc mcp
 ```
 </details>
 
-### CLI usage inside the container
+### Upgrading from an older image
 
-All `ccc` commands work via `docker exec`:
+Earlier images used separate `cocoindex-db` and `cocoindex-model-cache`
+volumes; the current image consolidates them into a single `cocoindex-data`
+volume. Before pulling the new image, drop the old container and volumes —
+indexes rebuild on your next `ccc index`, and the embedding model is
+re-populated automatically on first start:
 
 ```bash
-docker exec -it cocoindex-code ccc index
-docker exec -it cocoindex-code ccc search "authentication logic"
-docker exec -it cocoindex-code ccc status
-```
-
-Or set an alias on your host so it feels native:
-
-```bash
-alias ccc='docker exec -it cocoindex-code ccc'
+docker rm -f cocoindex-code
+docker volume rm cocoindex-db cocoindex-model-cache
 ```
 
 ### Configuration via environment variables
 
-Pass configuration to `docker run` with `-e`:
+Pass configuration to `docker run` / compose with `-e`:
 
 ```bash
 # Extra extensions (e.g. Typesafe Config, SBT build files)
@@ -281,6 +337,10 @@ Pass configuration to `docker run` with `-e`:
 -e VOYAGE_API_KEY=your-key
 ```
 
+> **Security note:** mounting `$HOME` gives the container read/write access
+> to everything under it. If that's too broad, bind-mount a narrower
+> directory instead (`COCOINDEX_HOST_WORKSPACE=/path/to/code`).
+
 ### Build the image locally
 
 ```bash
@@ -315,6 +375,8 @@ envs: # extra environment variabl
 
 > **Note:** The daemon inherits your shell environment. If an API key (e.g. `OPENAI_API_KEY`) is already set as an environment variable, you don't need to duplicate it in `envs`. The `envs` field is only for values that aren't in your environment.
 
+> **Custom location:** set `COCOINDEX_CODE_DIR` to place `global_settings.yml` somewhere other than `~/.cocoindex_code/` — useful if you want the file to live alongside your projects (e.g. on a synced folder).
+
 ### Project Settings (`<project>/.cocoindex_code/settings.yml`)
 
 Per-project. Controls which files to index.
````
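The README examples pass `COCOINDEX_CODE_HOST_PATH_MAPPING="/workspace=$HOME"` (container path on the left, host path on the right), and the `ccc` wrapper forwards the host `$PWD` as `COCOINDEX_CODE_HOST_CWD`. The sketch below shows one way such a bidirectional translator could work. It is not the project's implementation: the function names (`parse_mapping`, `to_host`, `to_container`, `effective_cwd`) are hypothetical, and it assumes a single `CONTAINER=HOST` pair and POSIX-style host paths (a Windows `C:\...` host path would need different handling).

```python
from __future__ import annotations

from collections.abc import Mapping
from pathlib import PurePosixPath


def parse_mapping(spec: str) -> tuple[PurePosixPath, PurePosixPath]:
    """Split a "CONTAINER=HOST" spec, e.g. "/workspace=/home/alice" (assumed format)."""
    container, sep, host = spec.partition("=")
    if not sep or not container or not host:
        raise ValueError(f"expected CONTAINER=HOST, got {spec!r}")
    return PurePosixPath(container), PurePosixPath(host)


def _rebase(path: PurePosixPath, src: PurePosixPath, dst: PurePosixPath) -> PurePosixPath | None:
    """Swap the `src` prefix of `path` for `dst`; None if `path` is outside `src`."""
    try:
        # relative_to compares whole components, so /home/alicex is not under /home/alice.
        return dst / path.relative_to(src)
    except ValueError:
        return None


def to_host(container_path: str, spec: str) -> str:
    """Container -> host, e.g. for CLI/MCP output; unmapped paths pass through unchanged."""
    container, host = parse_mapping(spec)
    mapped = _rebase(PurePosixPath(container_path), container, host)
    return str(mapped) if mapped is not None else container_path


def to_container(host_path: str, spec: str) -> str | None:
    """Host -> container; None if the host path lies outside the bind mount."""
    container, host = parse_mapping(spec)
    mapped = _rebase(PurePosixPath(host_path), host, container)
    return str(mapped) if mapped is not None else None


def effective_cwd(env: Mapping[str, str], default: str = "/workspace") -> str:
    """Pick the container-side working directory for one `ccc` invocation."""
    host_cwd = env.get("COCOINDEX_CODE_HOST_CWD")
    spec = env.get("COCOINDEX_CODE_HOST_PATH_MAPPING")
    if host_cwd and spec:
        mapped = to_container(host_cwd, spec)
        if mapped is not None:
            return mapped
    # No forwarded cwd, or it lies outside the mount: fall back to the default.
    return default
```

For example, with a host `$HOME` of `/home/alice`, `to_host("/workspace/proj/main.py", "/workspace=/home/alice")` returns `"/home/alice/proj/main.py"`. `PurePosixPath` is used deliberately so host paths are manipulated lexically and never resolved against the container's own filesystem.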

docker/Dockerfile

Lines changed: 50 additions & 30 deletions
```diff
@@ -3,59 +3,79 @@
 # Alpine / musl-libc would require building from source.
 FROM python:3.12-slim AS builder
 
-# Install uv via pip — avoids a ghcr.io pull during build
 RUN pip install --quiet uv
 
 WORKDIR /build
 
+# Default: install the released cocoindex-code from PyPI (release flow).
+# Tests/local dev override with:
+# --build-arg CCC_INSTALL_SPEC=/ccc-src[default]
+# which installs from the copied-in source tree instead. The COPY always runs;
+# with .dockerignore trimming build artifacts it adds ~nothing.
+ARG CCC_INSTALL_SPEC="cocoindex-code[default]"
+COPY . /ccc-src
+
 RUN uv pip install --system --prerelease=allow \
     "cocoindex>=1.0.0a33" \
-    "cocoindex-code[default]"
+    "${CCC_INSTALL_SPEC}"
 
 # ─── Stage 2: pre-bake the default embedding model ────────────────────────────
-# Bakes Snowflake/snowflake-arctic-embed-xs into the image so cold container
-# starts don't trigger a download. Skip this stage (--target builder) if
-# you always supply a global_settings.yml pointing at a non-local model.
+# Bakes Snowflake/snowflake-arctic-embed-xs into the merged data directory at
+# /var/cocoindex/cache/..., so on first run Docker's volume copy-up populates
+# the cocoindex-data volume with the model — no network fetch needed.
 FROM builder AS model_cache
 
-RUN python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('Snowflake/snowflake-arctic-embed-xs'); print('Model cached.')"
+ENV HF_HOME=/var/cocoindex/cache/huggingface \
+    SENTENCE_TRANSFORMERS_HOME=/var/cocoindex/cache/sentence-transformers
+
+RUN mkdir -p /var/cocoindex/cache/huggingface /var/cocoindex/cache/sentence-transformers \
+    && python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('Snowflake/snowflake-arctic-embed-xs'); print('Model cached.')"
 
 # ─── Stage 3: runtime ─────────────────────────────────────────────────────────
 FROM python:3.12-slim AS runtime
 
-# Copy installed packages and cached model from previous stages
+# gosu for privilege-drop (PUID/PGID pattern); create non-root coco user.
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends gosu \
+    && rm -rf /var/lib/apt/lists/* \
+    && groupadd -g 1000 coco \
+    && useradd -u 1000 -g 1000 -m coco
+
+# Copy installed packages + pre-baked model from previous stages.
 COPY --from=model_cache /usr/local/lib/python3.12 /usr/local/lib/python3.12
 COPY --from=model_cache /usr/local/bin/cocoindex-code /usr/local/bin/cocoindex-code
 COPY --from=model_cache /usr/local/bin/ccc /usr/local/bin/ccc
-COPY --from=model_cache /root/.cache /root/.cache
+COPY --from=model_cache /var/cocoindex/cache /var/cocoindex/cache
+
+# Pre-create writable paths so the entrypoint's chown (under PUID) works even on
+# a fresh container, and so the default root-uid path has them in place.
+RUN mkdir -p /var/cocoindex/db /var/run/cocoindex_code \
+    && chown -R coco:coco /var/cocoindex /var/run/cocoindex_code
 
-# The codebase is mounted at runtime — nothing project-specific lives here.
 WORKDIR /workspace
 
 # ── Runtime defaults (all overridable via -e / --env) ─────────────────────────
+#
+# COCOINDEX_CODE_DIR — holds global_settings.yml on the bind mount so users can
+# edit it directly on the host.
+# COCOINDEX_CODE_RUNTIME_DIR — keeps daemon.sock/pid/log on the container's
+# native filesystem (AF_UNIX sockets on bind mounts are unreliable on
+# Docker Desktop, and /var/run is the standard spot for ephemeral runtime
+# state — wiped on container recreate, no stale-socket risk).
+# COCOINDEX_CODE_DB_PATH_MAPPING — keeps the indexer's LMDB + SQLite databases
+# on the native filesystem for speed and correctness.
+# HF_HOME / SENTENCE_TRANSFORMERS_HOME — direct the model cache at the path
+# the cocoindex-data volume mounts over.
+ENV COCOINDEX_CODE_DIR=/workspace/.cocoindex_code \
+    COCOINDEX_CODE_RUNTIME_DIR=/var/run/cocoindex_code \
+    COCOINDEX_CODE_DB_PATH_MAPPING=/workspace=/var/cocoindex/db \
+    COCOINDEX_CODE_DAEMON_SUPERVISED=1 \
+    HF_HOME=/var/cocoindex/cache/huggingface \
+    SENTENCE_TRANSFORMERS_HOME=/var/cocoindex/cache/sentence-transformers
 
-# Map index databases into the container's native filesystem so SQLite avoids
-# the slow/unreliable cross-OS volume layer (especially on macOS / Windows).
-# Format: /source=/target — databases for projects under /workspace go to /db.
-ENV COCOINDEX_CODE_DB_PATH_MAPPING=/workspace=/db
-
-# Additional extensions: add project-specific extras at runtime, e.g.:
-# -e COCOINDEX_CODE_EXTRA_EXTENSIONS="conf,sbt"
-# See README for ext:lang syntax to map extensions to existing parsers.
-
-# Exclude patterns: override at runtime for your build system, e.g.:
-# -e COCOINDEX_CODE_EXCLUDE_PATTERNS='["**/target/**","**/node_modules/**"]'
-
-# Embedding model: defaults to local sentence-transformers
-# (Snowflake/snowflake-arctic-embed-xs, pre-baked above).
-# To use a different model, pre-mount a global_settings.yml into
-# ~/.cocoindex_code/ before the container starts, e.g.
-# -v /path/to/global_settings.yml:/root/.cocoindex_code/global_settings.yml
+# Set COCOINDEX_CODE_HOST_PATH_MAPPING at run time — it depends on the host path
+# the user bind-mounts to /workspace and can't be baked into the image.
 
-# ── Persistent daemon entrypoint ──────────────────────────────────────────────
-# Initializes user settings on first start, then runs the daemon in the
-# foreground so the container stays alive.
-# Use `docker exec` to invoke the CLI or start an MCP session — see README.
 COPY docker/entrypoint.sh /entrypoint.sh
 RUN chmod +x /entrypoint.sh
 ENTRYPOINT ["/entrypoint.sh"]
```
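The Dockerfile sets `COCOINDEX_CODE_DAEMON_SUPERVISED=1`, and the commit message describes a supervisor restart loop in the entrypoint that respawns the daemon after `global_settings.yml` changes without stopping the container. `docker/entrypoint.sh` is not shown in this diff, so the following is only a Python sketch of that restart pattern, not the shell script itself. It assumes the daemon simply exits when it wants to be restarted and that `SIGTERM` from `docker stop` should end the loop; `supervise`, `max_restarts`, and `backoff_s` are hypothetical names added so the loop can be exercised in isolation.

```python
from __future__ import annotations

import os
import signal
import subprocess
import time
from collections.abc import Sequence
from types import FrameType


def supervise(
    cmd: Sequence[str],
    *,
    max_restarts: int | None = None,
    backoff_s: float = 1.0,
) -> int:
    """Run `cmd` and restart it whenever it exits; return the last exit code.

    The loop ends when SIGTERM arrives (e.g. from `docker stop`) or, if
    `max_restarts` is set, once that many restarts have happened.
    """
    stopping = False
    child: subprocess.Popen[bytes] | None = None
    code = 0

    def on_sigterm(signum: int, frame: FrameType | None) -> None:
        # Record the stop request and forward it to the running daemon.
        nonlocal stopping
        stopping = True
        if child is not None and child.poll() is None:
            child.terminate()

    previous = signal.signal(signal.SIGTERM, on_sigterm)
    # Mark the child as supervised, mirroring the Dockerfile's ENV default.
    env = {**os.environ, "COCOINDEX_CODE_DAEMON_SUPERVISED": "1"}
    restarts = 0
    try:
        while not stopping:
            child = subprocess.Popen(list(cmd), env=env)
            code = child.wait()
            if stopping or (max_restarts is not None and restarts >= max_restarts):
                break
            restarts += 1
            time.sleep(backoff_s)  # brief pause so a crashing daemon can't spin hot
    finally:
        # Restore whatever SIGTERM handler was installed before.
        if previous is not None:
            signal.signal(signal.SIGTERM, previous)
    return code
```

Keeping the restart policy in the supervisor means the daemon only has to exit cleanly when settings change; it never needs to re-exec itself, and the container's main process stays alive throughout.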
