fix: verify blob digests and sandbox Python backends#848
Conversation
Addresses two security vulnerabilities reported against Model Runner:

- Blob digest verification: after downloading each layer, hash the complete file and reject it if the SHA256 does not match the digest declared in the manifest. A malicious registry could previously serve arbitrary bytes under any digest name and they would be stored without error. The full file is hashed (not just the streamed bytes) so that resumed downloads are also verified correctly.
- Python backend sandboxing: add `ConfigurationPython` to the sandbox package and apply it to all Python-based inference backends (vLLM, vllm-metal, SGLang, MLX, Diffusers). On Darwin this uses sandbox-exec with a deny-by-default seatbelt profile matching the existing llama.cpp policy; on Windows the same job object limits are applied. Previously these backends ran completely unsandboxed, meaning attacker-controlled model files could execute arbitrary code on the host.
Hey - I've left some high-level feedback:
- The Darwin sandbox `ConfigurationPython` allows binding to any localhost TCP port via `(local tcp "localhost:*")`; if possible, consider tightening this to the specific ports or a narrower range that the backends actually use to reduce the attack surface.
- The new `ConfigurationPython` definitions for Darwin/Windows largely mirror the existing llama.cpp profiles; consider factoring shared parts into common helpers/constants so that future hardening or policy changes only need to be updated in one place.
Code Review
This pull request implements SHA256 digest verification for blobs before finalization and introduces a specialized sandbox profile for Python-based inference backends. While these changes enhance security and isolation, critical fixes are required: the verification logic must be expanded to include SHA512 to prevent security bypasses, and the SandboxPath in the diffusers, MLX, and vLLM-metal backends must be correctly set to the bin or binary directory to ensure libraries load properly on Darwin. Additionally, missing imports for crypto/sha512 and path/filepath should be added to support these fixes.
| "crypto/sha256" | ||
| "encoding/hex" |
There was a problem hiding this comment.
The store allows both sha256 and sha512 algorithms (see line 25), but the verification logic only handles sha256. To prevent a security bypass where a malicious registry could serve unverified sha512 blobs, we should support verification for all allowed algorithms. This requires importing crypto/sha512 and hash.
| "crypto/sha256" | |
| "encoding/hex" | |
| "crypto/sha256" | |
| "crypto/sha512" | |
| "encoding/hex" | |
| "errors" | |
| "fmt" | |
| "hash" | |
| "io" |
```go
if diffID.Algorithm == "sha256" {
	completedFile, openErr := os.Open(incompletePath)
	if openErr != nil {
		_ = os.Remove(incompletePath)
		return fmt.Errorf("open completed blob file for verification: %w", openErr)
	}

	hasher := sha256.New()
	if _, copyErr := io.Copy(hasher, completedFile); copyErr != nil {
		completedFile.Close()
		_ = os.Remove(incompletePath)
		return fmt.Errorf("hash completed blob file: %w", copyErr)
	}
	completedFile.Close()

	computed := hex.EncodeToString(hasher.Sum(nil))
	if computed != diffID.Hex {
		_ = os.Remove(incompletePath)
		return fmt.Errorf("blob digest mismatch for %q: expected sha256:%s, got sha256:%s",
			diffID.String(), diffID.Hex, computed)
	}
}
```
The verification logic is hardcoded to sha256, which means sha512 blobs (which are allowed by the store) will skip verification entirely. This creates a security vulnerability. The logic should be updated to use the appropriate hasher based on the diffID.Algorithm.
```go
if diffID.Algorithm == "sha256" || diffID.Algorithm == "sha512" {
	completedFile, openErr := os.Open(incompletePath)
	if openErr != nil {
		_ = os.Remove(incompletePath)
		return fmt.Errorf("open completed blob file for verification: %w", openErr)
	}
	var hasher hash.Hash
	if diffID.Algorithm == "sha512" {
		hasher = sha512.New()
	} else {
		hasher = sha256.New()
	}
	if _, copyErr := io.Copy(hasher, completedFile); copyErr != nil {
		completedFile.Close()
		_ = os.Remove(incompletePath)
		return fmt.Errorf("hash completed blob file: %w", copyErr)
	}
	completedFile.Close()
	computed := hex.EncodeToString(hasher.Sum(nil))
	if computed != diffID.Hex {
		_ = os.Remove(incompletePath)
		return fmt.Errorf("blob digest mismatch for %q: expected %s:%s, got %s:%s",
			diffID.String(), diffID.Algorithm, diffID.Hex, diffID.Algorithm, computed)
	}
}
```

```go
	BinaryPath:    d.pythonPath,
	SandboxPath:   "",
	SandboxConfig: sandbox.ConfigurationPython,
```
The SandboxPath is currently empty. On Darwin, the sandbox profile uses [UPDATEDBINPATH] and [UPDATEDLIBPATH] to grant access to the Python environment. [UPDATEDLIBPATH] is calculated as filepath.Join(filepath.Dir(updatedBinPath), "lib"). If SandboxPath is empty, the process will likely fail to load its bundled libraries. It should be set to the bin directory of the installation.
Suggested change:

```go
	SandboxConfig: sandbox.ConfigurationPython,
	SandboxPath:   filepath.Join(d.installDir, "bin"),
```
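To see why the bin directory matters here, the derivation the comment describes can be sketched directly: the lib path is computed as `filepath.Join(filepath.Dir(sandboxPath), "lib")`, so only a `bin`-level SandboxPath yields the installation's own `lib` directory. The paths below are illustrative.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// libPathFor mirrors how the Darwin sandbox derives [UPDATEDLIBPATH]
// from the configured SandboxPath: the sibling "lib" directory of the
// SandboxPath's parent.
func libPathFor(sandboxPath string) string {
	return filepath.Join(filepath.Dir(sandboxPath), "lib")
}

func main() {
	// With SandboxPath set to the bin directory, the profile grants
	// access to the lib directory of the same installation.
	fmt.Println(libPathFor("/opt/model-runner/diffusers/bin")) // /opt/model-runner/diffusers/lib

	// With SandboxPath set to the install root, the profile instead
	// points at a lib directory *outside* the environment.
	fmt.Println(libPathFor("/opt/model-runner/diffusers")) // /opt/model-runner/lib
}
```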
| "github.com/docker/model-runner/pkg/logging" | ||
| "github.com/docker/model-runner/pkg/sandbox" |
To correctly support sandboxing for custom Python paths on Darwin, we need to determine the directory of the Python binary to set the SandboxPath. This requires the path/filepath package.
| "github.com/docker/model-runner/pkg/logging" | |
| "github.com/docker/model-runner/pkg/sandbox" | |
| "github.com/docker/model-runner/pkg/logging" | |
| "github.com/docker/model-runner/pkg/sandbox" | |
| "path/filepath" | |
| ) |
```go
	BinaryPath:    m.pythonPath,
	SandboxPath:   "",
	SandboxConfig: sandbox.ConfigurationPython,
```
The SandboxPath should be set to the directory containing the Python binary. This ensures that the Darwin sandbox correctly calculates [UPDATEDLIBPATH] (as the sibling lib directory), allowing the Python process to load its dependencies. This is especially important when a custom `pythonPath` is used.
Suggested change:

```go
	SandboxConfig: sandbox.ConfigurationPython,
	SandboxPath:   filepath.Dir(m.pythonPath),
```
```go
	BinaryPath:    v.pythonPath,
	SandboxPath:   v.installDir,
	SandboxConfig: sandbox.ConfigurationPython,
```
The SandboxPath is set to v.installDir, which is the root of the Python environment. However, the Darwin sandbox profile (see pkg/sandbox/sandbox_darwin.go:224) calculates [UPDATEDLIBPATH] by taking the parent of the provided path and appending lib. If SandboxPath is the root, UPDATEDLIBPATH will point to the parent directory of the environment instead of the environment's own lib directory, causing the backend to fail to load its libraries. It should be set to the bin directory.
Suggested change:

```go
	SandboxPath: filepath.Join(v.installDir, "bin"),
```