Problem
--separate-weights has three open bugs that share a root cause: the current implementation mutates the user's .dockerignore in-place and relies on a separate -weights Docker image that can go stale.
We'd like to make --separate-weights robust enough to enable by default.
Proposed approach
Replace the two-image build + .dockerignore mutation with a single Docker build using two BuildKit features that the codebase already has plumbing for:
Named build contexts
The generated Dockerfile becomes:
```dockerfile
FROM weights
COPY checkpoints /src/checkpoints
FROM nvidia/cuda:11.8.0-...
# ... setup, pip install, etc ...
COPY --from=0 --link /src/checkpoints /src/checkpoints
COPY . /src
```
weights is a named build context (BuildContexts: {"weights": projectDir}) pointing at the project root. BuildKit resolves FROM weights to that directory instead of trying to pull a Docker image. The COPY commands in that stage selectively copy only the detected weight files. COPY --from=0 --link in the main stage puts them in independent layers, so code changes don't invalidate the weight cache.
The plumbing for named build contexts already exists at pkg/docker/buildkit.go:87-96 — it just isn't used by the standard generator today.
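A minimal sketch of how a generator could emit this Dockerfile and the matching build-context map. `generateDockerfile` and `buildContexts` are hypothetical names used only for illustration, as is the base image in `main`; this is not cog's actual generator code.

```go
package main

import (
	"fmt"
	"strings"
)

// generateDockerfile is a hypothetical helper (not cog's real generator).
// It emits a weights stage that copies each detected weight path, then a
// main stage that pulls those paths in with COPY --link before copying code.
func generateDockerfile(baseImage string, weightPaths []string) string {
	var b strings.Builder

	// Stage 0: "weights" is expected to be supplied as a named build context.
	b.WriteString("FROM weights\n")
	for _, p := range weightPaths {
		fmt.Fprintf(&b, "COPY %s /src/%s\n", p, p)
	}

	// Main stage: setup steps would go where the comment is.
	fmt.Fprintf(&b, "FROM %s\n", baseImage)
	b.WriteString("# ... setup, pip install, etc ...\n")
	for _, p := range weightPaths {
		fmt.Fprintf(&b, "COPY --from=0 --link /src/%s /src/%s\n", p, p)
	}
	b.WriteString("COPY . /src\n")
	return b.String()
}

// buildContexts mirrors the BuildContexts: {"weights": projectDir} mapping
// described above.
func buildContexts(projectDir string) map[string]string {
	return map[string]string{"weights": projectDir}
}

func main() {
	// "example/base:latest" is a placeholder base image for this sketch.
	fmt.Print(generateDockerfile("example/base:latest", []string{"checkpoints"}))
	fmt.Println(buildContexts("/path/to/project"))
}
```

Emitting one `COPY --from=0 --link` per weight path keeps each weight directory in its own layer, which is what lets code-only changes leave the weight layers cached.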
Dockerfile.dockerignore
BuildKit looks for <Dockerfile>.dockerignore in the Dockerfile directory. If found, it uses that instead of .dockerignore from the context root. Since cog already writes the Dockerfile to a temp directory, we write the generated ignore rules (user's original rules + weight exclusions) alongside it as Dockerfile.dockerignore. The user's .dockerignore is never touched.
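As a rough illustration, the temp-dir write could look like the sketch below. `buildOptions` is a pared-down, hypothetical stand-in for `ImageBuildOptions`, and `mergeIgnoreRules` and `writeBuildFiles` are invented names; the real change in pkg/docker/buildkit.go may be shaped differently.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// buildOptions is a hypothetical, reduced stand-in for ImageBuildOptions.
type buildOptions struct {
	DockerfileContents   string
	DockerignoreContents string
}

// mergeIgnoreRules returns the user's rules followed by one exclusion per
// weight path. It works on strings only, so the user's file is never written.
func mergeIgnoreRules(userRules string, weightPaths []string) string {
	var b strings.Builder
	if userRules != "" {
		b.WriteString(strings.TrimRight(userRules, "\n"))
		b.WriteString("\n")
	}
	for _, p := range weightPaths {
		b.WriteString(p)
		b.WriteString("\n")
	}
	return b.String()
}

// writeBuildFiles writes Dockerfile and, if ignore rules are set,
// Dockerfile.dockerignore next to it inside dir.
func writeBuildFiles(dir string, opts buildOptions) (string, error) {
	dockerfilePath := filepath.Join(dir, "Dockerfile")
	if err := os.WriteFile(dockerfilePath, []byte(opts.DockerfileContents), 0o644); err != nil {
		return "", err
	}
	if opts.DockerignoreContents != "" {
		ignorePath := dockerfilePath + ".dockerignore"
		if err := os.WriteFile(ignorePath, []byte(opts.DockerignoreContents), 0o644); err != nil {
			return "", err
		}
	}
	return dockerfilePath, nil
}

func main() {
	dir, err := os.MkdirTemp("", "cog-build-")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	opts := buildOptions{
		DockerfileContents:   "FROM scratch\n",
		DockerignoreContents: mergeIgnoreRules(".git\n", []string{"checkpoints"}),
	}
	path, err := writeBuildFiles(dir, opts)
	if err != nil {
		panic(err)
	}
	fmt.Println("wrote", path, "and", path+".dockerignore")
}
```

Because the merged rules live only in the temp directory, a failed build leaves nothing to restore in the project.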
Why this is robust enough to enable by default
No file mutation. The user's .dockerignore is read to pick up the original rules, but it is never modified, backed up, or restored. There's no window where a failure can corrupt the project.
No phantom image dependency. There's no -weights image that can go missing after a docker prune, and no .cog/cache/weights_manifest.json that can go stale. BuildKit handles all caching internally via content-addressable layer hashes.
Idempotent. Running cog build with or without separate weights produces the same image contents. The only difference is layer structure. Switching between modes doesn't leave any state behind.
Graceful degradation of weight detection. If FindWeights() guesses wrong about which files are weights, the build still succeeds — files just end up in the wrong layer. With the old approach, a bad guess could corrupt .dockerignore and break the build entirely.
Single build is faster. BuildKit parallelises stages internally, so the weights stage and setup stages (apt, pip) run concurrently. The old approach serialised two full Docker builds with two context uploads.
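The graceful-degradation point can be shown with a toy model. The extension-based `isWeight` heuristic below is an assumption made for this sketch and is not how `FindWeights()` is known to work; `partition` and `imageContents` are likewise hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
)

// isWeight is a hypothetical stand-in heuristic based on file extension.
func isWeight(path string) bool {
	switch filepath.Ext(path) {
	case ".safetensors", ".ckpt", ".pt", ".pth", ".bin":
		return true
	}
	return false
}

// partition assigns every file to exactly one group: weights or code.
func partition(files []string) (weights, code []string) {
	for _, f := range files {
		if isWeight(f) {
			weights = append(weights, f)
		} else {
			code = append(code, f)
		}
	}
	return weights, code
}

// imageContents returns the sorted union of both groups, i.e. the files
// that end up in the image regardless of which layer holds them.
func imageContents(weights, code []string) []string {
	all := append(append([]string{}, weights...), code...)
	sort.Strings(all)
	return all
}

func main() {
	// vocab.bin is a small data file that this heuristic misclassifies.
	files := []string{"predict.py", "checkpoints/model.safetensors", "vocab.bin"}
	w, c := partition(files)
	fmt.Println("weights layer:", w)
	fmt.Println("code layer:   ", c)
	fmt.Println("image:        ", imageContents(w, c))
}
```

A misclassified file only changes which group it lands in; the union is unchanged, so in this model a wrong guess costs cache efficiency, not correctness.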
Key changes
| File | Change |
| --- | --- |
| pkg/docker/command/command.go | Add DockerignoreContents field to ImageBuildOptions |
| pkg/docker/buildkit.go | Write Dockerfile.dockerignore in temp dir alongside Dockerfile |
| pkg/dockerfile/generator.go | Simplify GenerateModelBaseWithSeparateWeights: returns single Dockerfile + dockerignore |
| pkg/dockerfile/standard_generator.go | Generate single multi-stage Dockerfile using named build context; update BuildContexts() to include "weights" |
| pkg/image/build.go | Replace two-build flow with single build; delete .dockerignore backup/restore functions; delete manifest cache logic |
| pkg/cli/debug.go | Update to new return signature |
| pkg/dockerfile/standard_generator_test.go | Update ~20 tests |
| pkg/weights/manifest.go | Remove (CRC32 cache no longer needed) |
Fixes #1323. Fixes #1917. Fixes #2548.
The three open bugs in detail:
cog push --separate-weights fails with 404 because the local -weights image was pruned but the manifest cache (.cog/cache/weights_manifest.json) still thinks it exists. Docker then tries to pull r8.im/…-weights from the registry, where no such image exists.
--separate-weights permanently modifies .dockerignore. If the build fails between backup and restore, the user's file is left corrupted with cog-generated weight exclusions. Subsequent builds without the flag then silently exclude weight files.
With the .dockerignore mutation, a bad guess by weight detection breaks the build rather than just affecting layer efficiency.