Commit 0569a4d

chapterjason and claude committed
Adapt workspace image to DooD (host Docker socket)
The template now bind-mounts /var/run/docker.sock instead of running an inner dockerd under sysbox. Strip the engine + containerd from the image, align the in-image docker group GID with the host socket at boot, drop the docker.service systemd dependency from coder-agent, and refresh docs/comments to describe the DooD + privileged setup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 25594b8 commit 0569a4d

6 files changed

Lines changed: 88 additions & 94 deletions

README.md

Lines changed: 48 additions & 68 deletions
Original file line number | Diff line number | Diff line change
@@ -1,9 +1,9 @@
11
# SoureCode Coder Workspace Template
22

33
A Coder workspace template plus a family of workspace images with pre-installed
4-
dev tooling. Each workspace runs its own `dockerd` under
5-
[sysbox](https://github.com/nestybox/sysbox), so you can build and test your
6-
own `Dockerfile`s, run `docker compose` stacks, etc. inside the workspace.
4+
dev tooling. Each workspace gets the host's Docker socket bind-mounted in
5+
(Docker-out-of-Docker), so you can `docker build`, `docker compose up`, etc.
6+
from inside the workspace using the host daemon.
77

88
One container per workspace — no nested devcontainer layer. IDEs (VS Code
99
Desktop, JetBrains, code-server, web-shell) all attach to the single
@@ -15,7 +15,7 @@ Images are published to GHCR under `ghcr.io/sourecode/coder-workspace`.
1515

1616
| Tag | Base | Adds |
1717
|---|---|---|
18-
| `base` | `debian:trixie-slim` | systemd + dockerd + `nvm` + `claude-code` + `rtk` + `web-shell` + `home-persist` + `jetbrains` |
18+
| `base` | `debian:trixie-slim` | systemd + docker CLI (DooD) + `nvm` + `claude-code` + `rtk` + `web-shell` + `home-persist` + `jetbrains` |
1919
| `node` | `:base` | named variant for future Node-specific tooling — currently identical to `base` (Node comes from nvm) |
2020
| `cpp` | `:base` | `llvm` (clang + toolchain), `cmake`, `sccache`, `/etc/profile.d/llvm-env.sh` exporting `CC`/`CXX` |
2121

@@ -43,21 +43,26 @@ goes through `home-persist`'s manifest system.
4343
## Architecture
4444

4545
```
46-
host docker daemon (sysbox-runc runtime registered)
46+
host docker daemon
47+
├── /var/run/docker.sock ─────────┐ (bind-mounted into every workspace)
4748
└── workspace container (ghcr.io/sourecode/coder-workspace:<tag>)
48-
├── systemd (PID 1)
49-
├── dockerd (for in-workspace docker build / docker compose)
49+
├── systemd (PID 1)
50+
├── docker CLI ────────────────┘ (Docker-out-of-Docker via host socket)
5051
└── coder-agent.service (runs /etc/coder/agent-init.sh as `coder`)
5152
```
5253

54+
The container runs `--privileged` and shares the host Docker socket. This is
55+
fine for a single-tenant box but means workspace root effectively equals host
56+
root — don't onboard untrusted users.
57+
5358
## Template files
5459

55-
- `main.tf` — Coder template. Launches the workspace container under
56-
`runtime = "sysbox-runc"`, injects `CODER_AGENT_TOKEN` via env, and uploads
57-
the agent init script to `/etc/coder/agent-init.sh`. The
58-
`coder-agent.service` systemd unit (baked into the image) runs that script
59-
on boot.
60-
- `src/base/Dockerfile` — shared base: Debian trixie + systemd + dockerd +
60+
- `main.tf` — Coder template. Launches the workspace container with
61+
`privileged = true`, bind-mounts `/var/run/docker.sock` from the host,
62+
injects `CODER_AGENT_TOKEN` via env, and uploads the agent init script to
63+
`/etc/coder/agent-init.sh`. The `coder-agent.service` systemd unit (baked
64+
into the image) runs that script on boot.
65+
- `src/base/Dockerfile` — shared base: Debian trixie + systemd + docker CLI +
6166
`coder` user + dev-kit scripts.
6267
- `src/node/Dockerfile`, `src/cpp/Dockerfile` — stack variants (`FROM :base`).
6368
- `scripts/<name>/install.sh` — bound into each Dockerfile at build time via
@@ -66,51 +71,13 @@ goes through `home-persist`'s manifest system.
6671

6772
## Prerequisites (on the Docker host)
6873

69-
1. Linux kernel >= 5.12 (>= 6.3 ideal, avoids shiftfs entirely)
70-
2. Native Docker (not the snap) at `/usr/bin/docker`
71-
3. Sysbox installed (see below)
72-
4. An existing Coder server (this template was developed against a
74+
1. Native Docker (not the snap) at `/usr/bin/docker`
75+
2. An existing Coder server (this template was developed against a
7376
docker-compose-deployed Coder)
7477

75-
## Install sysbox
76-
77-
Zero-container-deletion install, tolerates a single `dockerd` restart.
78-
79-
```bash
80-
# 1. pre-populate /etc/docker/daemon.json so sysbox's post-install step
81-
# doesn't need to touch the network config itself
82-
sudo tee /etc/docker/daemon.json >/dev/null <<'JSON'
83-
{
84-
"bip": "172.24.0.1/16",
85-
"default-address-pools": [
86-
{ "base": "172.31.0.0/16", "size": 24 }
87-
]
88-
}
89-
JSON
90-
91-
# Pick CIDRs free of your existing networks:
92-
# docker network inspect $(docker network ls -q) | grep -i subnet
93-
94-
# 2. one controlled restart so dockerd loads the keys
95-
sudo systemctl restart docker
96-
97-
# 3. install sysbox (Ubuntu/Debian amd64)
98-
wget https://downloads.nestybox.com/sysbox/releases/v0.7.0/sysbox-ce_0.7.0-0.linux_amd64.deb
99-
sudo apt-get install -y jq fuse3 ./sysbox-ce_0.7.0-0.linux_amd64.deb
100-
101-
# 4. verify
102-
docker info | grep -i runtime # should list sysbox-runc
103-
systemctl status sysbox --no-pager
104-
```
105-
106-
Smoke test that nested Docker works under sysbox:
107-
108-
```bash
109-
CID=$(docker run -d --rm --runtime=sysbox-runc nestybox/ubuntu-noble-systemd-docker)
110-
sleep 15
111-
docker exec "$CID" docker run --rm hello-world # should print the hello-world greeting
112-
docker stop "$CID"
113-
```
78+
No sysbox or special runtime is required — workspaces run under the default
79+
runc with `--privileged` and the host's `/var/run/docker.sock` bind-mounted
80+
in.
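
As a rough illustration of what this setup amounts to outside Terraform, the sketch below assembles the `docker run` arguments implied by the paragraph above: privileged mode plus the host socket bind-mount. The helper name `dood_run_args` is hypothetical, and the real `main.tf` sets more (agent token, volumes, the uploaded init script), so treat this as an approximation rather than the template's actual invocation.

```bash
# Hypothetical helper: print the `docker run` arguments for a DooD workspace,
# one per line. Not part of the template; main.tf remains the source of truth.
dood_run_args() {
  local image="$1"
  printf '%s\n' \
    run --detach --privileged \
    --volume /var/run/docker.sock:/var/run/docker.sock \
    "$image"
}

# Usage on a Docker host (bash): read the lines into an array, then run it.
#   mapfile -t args < <(dood_run_args ghcr.io/sourecode/coder-workspace:base)
#   docker "${args[@]}"
```

Emitting one argument per line keeps the helper free of `eval` and lets the caller inspect the arguments before anything touches the daemon.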
11481

11582
## Build the workspace images
11683

@@ -162,29 +129,42 @@ pushing a new version, either:
162129
container exited instead of running systemd. Check:
163130
```bash
164131
CID=$(docker ps -a --filter "name=coder-" -q | head -1)
165-
docker inspect "$CID" --format '{{.HostConfig.Runtime}} {{.Config.Image}} {{.State.Status}}'
132+
docker inspect "$CID" --format '{{.Config.Image}} {{.State.Status}}'
166133
docker logs "$CID" | tail -50
167134
```
168-
Runtime must be `sysbox-runc`. Image should match whatever the template's
169-
`workspace_image` parameter resolved to.
135+
Image should match whatever the template's `workspace_image` parameter
136+
resolved to.
170137

171138
- **Agent up but nothing connects** — inspect systemd and the agent unit:
172139
```bash
173140
docker exec "$CID" systemctl is-system-running
174-
docker exec "$CID" systemctl status docker coder-agent --no-pager
141+
docker exec "$CID" systemctl status coder-agent --no-pager
175142
docker exec "$CID" journalctl -u coder-agent --no-pager -n 100
176143
docker exec "$CID" ls -la /etc/coder/ # expect agent-init.sh present + executable
177144
docker exec "$CID" bash -lc "tr '\0' '\n' < /proc/1/environ | grep CODER_AGENT_TOKEN"
178145
```
179146

180-
## Why sysbox
147+
- **`docker` from inside the workspace says "permission denied"** — the
148+
bind-mounted host socket has the host's `docker` group GID, which may not
149+
match the in-image `docker` group. The entrypoint aligns them at boot;
150+
if it didn't run (or you exec'd a fresh shell before it finished), check:
151+
```bash
152+
docker exec "$CID" stat -c '%g' /var/run/docker.sock
153+
docker exec "$CID" getent group docker
154+
```
155+
The two GIDs must match for `coder` to use the socket without sudo.
156+
157+
## Why DooD (and not DinD or sysbox)
181158

182-
The workspace bakes `dockerd` in so you can `docker build`, `docker compose up`,
183-
or run a project's own Dockerfile straight from inside your workspace without
184-
going through the host daemon. Running an inner dockerd safely inside a
185-
container is exactly what sysbox provides — plain `runc` would require
186-
`--privileged` and you'd still fight shared-kernel artefacts. Sysbox handles
187-
it with proper namespace isolation.
159+
Bind-mounting the host socket lets `docker build`, `docker compose up`, and
160+
project Dockerfiles run from inside the workspace without nesting a second
161+
Docker engine. The previous setup ran an inner `dockerd` under sysbox for
162+
isolation; we dropped it because the maintenance cost (sysbox install,
163+
persistent `/var/lib/docker` volume per workspace, dockerd shutdown
164+
ordering) outweighed the benefit on a single-tenant deployment. The trade
165+
is that workspaces share image cache and network namespace with the host
166+
daemon, and `--privileged` means workspace root can reach host root — only
167+
do this where you trust every workspace owner.
188168

189169
## Developing on this repo
190170

@@ -205,7 +185,7 @@ scripts/
205185
sccache/install.sh
206186
web-shell/install.sh
207187
src/
208-
base/Dockerfile # debian-trixie + systemd + dockerd + dev-kit
188+
base/Dockerfile # debian-trixie + systemd + docker CLI + dev-kit
209189
cpp/Dockerfile # FROM :base + llvm/cmake/sccache
210190
node/Dockerfile # FROM :base
211191
main.tf # Coder template

docs/persistence.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,7 @@ opt in.
2020
## The three moving parts
2121

2222
1. **The volume + mount.** `main.tf` declares `docker_volume.home_persist`
23-
(per-owner, lives in the host dockerd) and mounts it into every workspace
23+
(per-owner, lives in the host Docker daemon) and mounts it into every workspace
2424
container at `/mnt/home-persist`:
2525

2626
```hcl
@@ -92,7 +92,7 @@ opt in.
9292

9393
```
9494
┌─────────────────────────────────────────────────────────────────┐
95-
│ host dockerd
95+
│ host Docker daemon
9696
│ │
9797
│ docker volume: coder-<owner>-home-persist ◄── one per owner │
9898
│ │ │
@@ -110,7 +110,7 @@ opt in.
110110
└─────────────────────────────────────────────────────────────────┘
111111
```
112112

113-
The volume lives in the **host** dockerd, above any individual workspace.
113+
The volume lives in the **host** Docker daemon, above any individual workspace.
114114
Declared in `main.tf` as `docker_volume "home_persist"`, scoped by
115115
`coder_workspace_owner.me.name`, and bind-mounted into the workspace
116116
container at `/mnt/home-persist`.

main.tf

Lines changed: 8 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -440,8 +440,11 @@ removed {
440440
}
441441
}
442442

443-
# docker_data volume is no longer managed — workspace DinD now uses the host
444-
# Docker socket. Volume is retained to avoid data loss.
443+
# docker_data volume is no longer managed — the workspace now runs
444+
# Docker-out-of-Docker against the host daemon (host /var/run/docker.sock is
445+
# bind-mounted in), so there is no in-container /var/lib/docker to persist.
446+
# Volume is retained to avoid data loss for any workspace that still has
447+
# images/cache in it from the sysbox era.
445448
removed {
446449
from = docker_volume.docker_data
447450
lifecycle {
@@ -499,10 +502,9 @@ resource "docker_container" "workspace" {
499502
name = "coder-${data.coder_workspace_owner.me.name}-${lower(data.coder_workspace.me.name)}"
500503
hostname = data.coder_workspace.me.name
501504

502-
# Give systemd enough time to shut dockerd down cleanly on stop. The
503-
# default (10s) causes SIGKILL mid-flush, which corrupts the persistent
504-
# /var/lib/docker volume on the next start.
505-
stop_timeout = 120
505+
# PID 1 is systemd; give it a few seconds beyond the default to drain
506+
# units cleanly on stop.
507+
stop_timeout = 30
506508
privileged = true
507509

508510
# PID 1 is /sbin/init (systemd). coder-agent.service (baked into the image)

src/base/Dockerfile

Lines changed: 7 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -28,7 +28,7 @@ RUN install -m 0755 -d /etc/apt/keyrings && \
2828
# Baseline packages + Docker engine.
2929
RUN apt-get update && \
3030
apt-get install -y --no-install-recommends --no-install-suggests \
31-
bash bash-completion build-essential ca-certificates containerd.io curl docker-ce docker-ce-cli \
31+
bash bash-completion build-essential ca-certificates curl docker-ce-cli \
3232
docker-buildx-plugin docker-compose-plugin dtach file git gnupg \
3333
htop iproute2 iputils-ping jq less locales lsb-release lsof \
3434
man-db openssh-client pipx pkg-config procps python3 python3-pip rsync strace \
@@ -42,14 +42,9 @@ RUN apt-get update && \
4242
RUN ssh-keyscan -t rsa,ecdsa,ed25519 github.com gitlab.com bitbucket.org \
4343
>> /etc/ssh/ssh_known_hosts 2>/dev/null
4444

45-
# Disable containerd-snapshotter so image/layer data stays under
46-
# /var/lib/docker (classic overlay2) instead of /var/lib/containerd — the
47-
# persistent devcontainer cache volume only covers /var/lib/docker.
48-
RUN mkdir -p /etc/docker && \
49-
printf '{\n "features": { "containerd-snapshotter": false }\n}\n' \
50-
> /etc/docker/daemon.json
51-
52-
RUN systemctl enable docker
45+
# Workspace runs Docker-out-of-Docker: the host's /var/run/docker.sock is
46+
# bind-mounted in. Only the docker CLI + plugins are installed (no engine,
47+
# no containerd), so there's no in-image daemon to mask.
5348

5449
# Classic `docker-compose` name → the compose plugin binary.
5550
RUN ln -s /usr/libexec/docker/cli-plugins/docker-compose /usr/bin/docker-compose
@@ -126,8 +121,9 @@ RUN printf '\nif [ -d /etc/profile.d ]; then\n for i in /etc/profile.d/*.sh; do
126121
>> /etc/bash.bashrc
127122

128123
# Coder agent: systemd unit runs /etc/coder/agent-init.sh as the workspace
129-
# user after dockerd is up. The script itself is uploaded at container-create
130-
# time by the Terraform template (kreuzwerker/docker `upload` block).
124+
# user after the network is up. The script itself is uploaded at
125+
# container-create time by the Terraform template (kreuzwerker/docker
126+
# `upload` block).
131127
RUN mkdir -p /etc/coder
132128
COPY src/base/coder-agent.service /etc/systemd/system/coder-agent.service
133129
COPY src/base/web-shell.service /etc/systemd/system/web-shell.service

src/base/coder-agent.service

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
[Unit]
22
Description=Coder Agent
3-
After=docker.service network-online.target
4-
Wants=network-online.target docker.service
3+
After=network-online.target
4+
Wants=network-online.target
55
ConditionPathExists=/etc/coder/agent-init.sh
66

77
[Service]

src/base/entrypoint.sh

Lines changed: 20 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -3,10 +3,9 @@
33
#
44
# Fresh Docker named volumes mount as root:root the first time they're
55
# attached, even though the underlying image path is owned by the workspace
6-
# user (docker's "copy-up" doesn't reliably apply under sysbox-runc). Claim
7-
# the mountpoints here so coder-agent.service — which starts later under
8-
# systemd — sees correctly-owned mounts. Idempotent: chown/chmod are no-ops
9-
# when values already match.
6+
# user. Claim the mountpoints here so coder-agent.service — which starts
7+
# later under systemd — sees correctly-owned mounts. Idempotent: chown/chmod
8+
# are no-ops when values already match.
109
set -eo pipefail
1110

1211
USERNAME="${USERNAME:-coder}"
@@ -24,4 +23,21 @@ if [ -d /mnt/shared ]; then
2423
chmod 0777 /mnt/shared
2524
fi
2625

26+
# DooD: align the in-image `docker` group GID with the host socket's GID so
27+
# the workspace user (member of `docker` in the image) can talk to the
28+
# bind-mounted host daemon without a chmod 666 on the socket.
29+
if [ -S /var/run/docker.sock ]; then
30+
sock_gid=$(stat -c '%g' /var/run/docker.sock)
31+
cur_gid=$(getent group docker | cut -d: -f3 || true)
32+
if [ -n "$sock_gid" ] && [ "$sock_gid" != "$cur_gid" ]; then
33+
# If another group already owns the target GID, rename it to mark it as the
34+
# host's; groupmod -g rejects an in-use GID unless -o (non-unique) is given.
35+
conflict=$(getent group "$sock_gid" | cut -d: -f1 || true)
36+
if [ -n "$conflict" ] && [ "$conflict" != "docker" ]; then
37+
groupmod -n "${conflict}-host" "$conflict"
38+
fi
39+
groupmod -o -g "$sock_gid" docker
40+
fi
41+
fi
42+
2743
exec "$@"
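
The GID alignment performed by the entrypoint can be sanity-checked with a small comparison like the one below. This is a minimal sketch, not part of the commit: the function names `group_gid` and `path_gid_matches_group` are hypothetical, and it assumes GNU `stat` (as the entrypoint itself does) and a group(5)-format file, defaulting to `/etc/group`.

```bash
# Hypothetical helper: look up a group's GID in a group(5)-format file.
group_gid() {
  local group="$1" db="${2:-/etc/group}"
  awk -F: -v g="$group" '$1 == g { print $3; exit }' "$db"
}

# Hypothetical helper: succeed when the path's owning GID equals the named
# group's GID. Returns 2 if the path cannot be stat'ed, 1 on mismatch or
# when the group is missing.
path_gid_matches_group() {
  local path="$1" group="$2" db="${3:-/etc/group}"
  local path_gid want_gid
  path_gid=$(stat -c '%g' "$path") || return 2
  want_gid=$(group_gid "$group" "$db")
  [ -n "$want_gid" ] && [ "$path_gid" = "$want_gid" ]
}

# Usage inside a workspace:
#   path_gid_matches_group /var/run/docker.sock docker && echo "GIDs aligned"
```

Reading the group file directly keeps the check independent of NSS configuration; swap in `getent group` if the groups in question come from another source.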
