fix(security): replace shell-templated skills-init with Go binary (#1842)#1928
Open
EItanya wants to merge 10 commits into
Open
fix(security): replace shell-templated skills-init with Go binary (#1842)#1928EItanya wants to merge 10 commits into
EItanya wants to merge 10 commits into
Conversation
…E GHSA #1842) The previous skills-init container rendered a Bash script from a Go template, interpolating user-controlled Agent CRD fields (Git URL/Ref/ Path/Name, OCI image, secret name) into `cat <<'ENDVAL' ... ENDVAL` heredocs. A low-privileged user with create/update on Agent could embed `ENDVAL` in any of those fields to escape the heredoc and execute arbitrary commands inside the init container, which could reach the node's IMDS or other in-cluster services. This commit eliminates the shell entirely: * New `skills-init` Go binary (`go/core/cmd/skills-init`) consumes a structured JSON config and invokes `git`/SSH via `exec.Command` with argv vectors. User strings never reach a shell. * OCI fetch moves to in-process `go-containerregistry` (no more krane binary or jq for docker-config merging). Tar extraction rejects absolute paths, `..` traversal, and symlinks whose targets escape the destination. * Controller emits a per-Agent ConfigMap with the JSON config; the pod template carries a `kagent.dev/skills-init-hash` annotation so config changes trigger pod rollout. * skills-init Dockerfile rewritten as a multi-stage build of the Go binary; krane and jq are dropped from the runtime image. Tested end-to-end on a kind cluster: git skills clone via argv and the agent starts; OCI pull/extract works for benign content; the tar extractor correctly rejects path-escape symlinks. Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR mitigates the command-injection risk in the skills-init init container by replacing the rendered shell script approach with a dedicated Go binary driven by a structured JSON ConfigMap, and by moving OCI image fetching/extraction in-process via go-containerregistry.
Changes:
- Add a new
skills-initGo binary andskillsinitinternal package to perform git clones and OCI exports without shell templating. - Update the Agent translator to emit a per-Agent ConfigMap containing
skills-initJSON config, mount it into the init container, and hash-annotate the pod template for rollouts. - Rework the
skills-initDockerfile into a multi-stage Go build and update tests/golden outputs accordingly.
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| Makefile | Builds skills-init image using ./go as build context. |
| go/go.mod | Adds go-containerregistry and related indirect dependencies. |
| go/go.sum | Updates module checksums for new dependencies. |
| go/core/internal/skillsinit/config.go | Defines the JSON “wire format” and mount constants for the binary. |
| go/core/internal/skillsinit/runner.go | Orchestrates docker auth merge, git auth setup, git clones, and OCI pulls. |
| go/core/internal/skillsinit/docker.go | Merges multiple dockerconfigjson secrets into a single DOCKER_CONFIG. |
| go/core/internal/skillsinit/ssh.go | Sets up SSH key / known_hosts seeding and token-based git credential helper. |
| go/core/internal/skillsinit/git.go | Implements git cloning and subPath materialization without a shell. |
| go/core/internal/skillsinit/copytree.go | Implements symlink-dereferencing tree copy for subPath handling. |
| go/core/internal/skillsinit/oci.go | Pulls/exports OCI images and extracts tar streams with path safety checks. |
| go/core/cmd/skills-init/main.go | Entrypoint for the init container binary. |
| go/core/internal/controller/translator/agent/adk_api_translator.go | Replaces script templating with JSON config + ConfigMap + init container command update. |
| go/core/internal/controller/translator/agent/manifest_builder.go | Adds ConfigMap to manifest outputs and hashes config into pod template annotations. |
| go/core/internal/controller/translator/agent/skills-init.sh.tmpl | Removes the old shell-template implementation. |
| go/core/internal/controller/translator/agent/skills_unit_test.go | Updates unit tests to use skillsinit types and renamed helpers. |
| go/core/internal/controller/translator/agent/git_skills_test.go | Updates translator tests to assert structured ConfigMap JSON content. |
| go/core/internal/controller/translator/agent/testdata/outputs/agent_with_skills.json | Updates golden output for ConfigMap, volume/mount, and command changes. |
| go/core/internal/controller/translator/agent/testdata/outputs/agent_with_git_skills.json | Updates golden output for ConfigMap, volume/mount, and command changes. |
| docker/skills-init/Dockerfile | Multi-stage build for the Go binary; removes krane/jq, adds runtime deps. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+58
to
+77
| func walkFollow(src, rel string, fn func(rel string, info fs.FileInfo, openFile func() (io.ReadCloser, error)) error) error { | ||
| full := filepath.Join(src, rel) | ||
| info, err := os.Stat(full) // Stat follows symlinks | ||
| if err != nil { | ||
| return err | ||
| } | ||
| if rel != "" { | ||
| if err := fn(rel, info, func() (io.ReadCloser, error) { return os.Open(full) }); err != nil { | ||
| return err | ||
| } | ||
| } | ||
| if !info.IsDir() { | ||
| return nil | ||
| } | ||
| entries, err := os.ReadDir(full) | ||
| if err != nil { | ||
| return err | ||
| } | ||
| for _, e := range entries { | ||
| if err := walkFollow(src, filepath.Join(rel, e.Name()), fn); err != nil { |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>
The Go-native copyTree/walkFollow added in the previous commit went further than necessary. The src + dst paths are both controlled by us (mktemp dir + a cleaned/validated subPath under a freshly-cloned repo), and they are passed to cp as argv entries — no shell, no metacharacter risk. cp -rL is the same primitive the old script used, just invoked the safe way. What stays native: - tar extraction in oci.go (path-escape and symlink-escape rejection is real security logic that "tar xf" does not provide) - docker config merging in docker.go (pure JSON manipulation, simpler than shelling to jq) - git, ssh-keyscan invocations: argv-vector exec.Command Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>
Audit of where user-controlled CRD fields end up in subprocess argv turned up two defense-in-depth gaps: * `git checkout <ref>` had no `--` separator. If a malicious Agent set the Ref to a string starting with `-`, git would parse it as a flag. isCommitSHA already restricts the commit branch to 40 hex chars, but the separator costs nothing. * The skill directory name (Skills.GitRefs[].Name, or derived from URL / Path / OCI ref) was used verbatim as the leaf of /skills/<name>. `Name: "../etc"` would have escaped the volume. Now restricted to `[A-Za-z0-9._-]+` and ".", ".." rejected explicitly. Also remove the unused context.Context parameter that the new Run entrypoint took but never consulted. Audit notes (no changes needed): * OCI image string is parsed by go-containerregistry, not a shell * ssh-keyscan host/port is already regex-filtered to [A-Za-z0-9.-] * dockerconfigjson secret names are K8s DNS-labels by API contract * subPath is already validated (no abs, no ..) plus defense-in-depth in applySubPath at runtime Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>
- docker.go: use maps.Copy instead of manual loop (modernize lint) - git.go: use slices.Contains in hasDotDot (modernize lint) - oci.go: drop deprecated tar.TypeRegA alias (SA1019) - e2e: assert skills-init via ConfigMap JSON instead of legacy script body Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>
peterj
reviewed
May 27, 2026
| return fmt.Errorf("skill name is empty") | ||
| } | ||
| if name == "." || name == ".." { | ||
| return fmt.Errorf("skill name %q is reserved", name) |
Collaborator
There was a problem hiding this comment.
nit: are . and .. a name? :)
peterj
previously approved these changes
May 27, 2026
Three new test surfaces that pin the data-only contract introduced by the shell-to-Go rewrite: - validateSkillName: rejection battery covering shell metas, traversal, newlines, null bytes, globs, brace expansion, unicode lookalikes. - prepareSkillsInitConfig: explicit-Name injection, OCI-derived name injection, subPath traversal, and a positive test confirming that malicious URL/Ref strings flow through verbatim (argv-only, never a shell). - skillsinit package: tar safeJoin/extractTar reject path traversal, absolute symlinks, and escaping relative symlinks; applySubPath rejects traversal and non-dir targets; hasDotDot covered directly. Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #1842 — arbitrary command execution in the
skills-initcontainer via Agent CRD fields.The previous implementation rendered a Bash script from a Go
text/template, interpolating user-controlled fields (Git URL/Ref/Path/Name, OCI image, secret names) intocat <<'ENDVAL' ... ENDVALheredocs. A low-privileged user withcreate/updateon Agent could embedENDVALto escape the heredoc and execute arbitrary commands inside the init container — reaching node IMDS, other in-cluster services, or the shared/skillsvolume.This PR eliminates the shell entirely:
skills-initGo binary (go/core/cmd/skills-init+go/core/internal/skillsinit/) consumes a structured JSON config and invokesgit/ssh-keyscanviaexec.Commandwith argv vectors. User strings never reach a shell.go-containerregistry(no morekranesubprocess, no morejqfor docker-config merging). Tar extraction rejects absolute paths,..traversal, and symlinks whose targets escape the destination.kagent.dev/skills-init-hashannotation so config changes still trigger pod rollout.kraneandjqare dropped from the runtime image (smaller surface area, fewer CVE sources).Test plan
UPDATE_GOLDEN=true)go vet ./...cleangitRefs[].path→ repo cloned via argv, subPath applied, agent container starts with files at/skills/<name>/refs→ pull + TLS + token auth succeed; tar extraction works for benign content🤖 Comment / PR created by Claude on behalf of @EItanya