Skip to content

fix(security): replace shell-templated skills-init with Go binary (#1842)#1928

Open
EItanya wants to merge 10 commits into
mainfrom
fix/cve-1842-skills-init-shell-injection
Open

fix(security): replace shell-templated skills-init with Go binary (#1842)#1928
EItanya wants to merge 10 commits into
mainfrom
fix/cve-1842-skills-init-shell-injection

Conversation

@EItanya
Copy link
Copy Markdown
Contributor

@EItanya EItanya commented May 26, 2026

Summary

Fixes #1842 — arbitrary command execution in the skills-init container via Agent CRD fields.

The previous implementation rendered a Bash script from a Go text/template, interpolating user-controlled fields (Git URL/Ref/Path/Name, OCI image, secret names) into cat <<'ENDVAL' ... ENDVAL heredocs. A low-privileged user with create/update on Agent could embed ENDVAL to escape the heredoc and execute arbitrary commands inside the init container — reaching node IMDS, other in-cluster services, or the shared /skills volume.

This PR eliminates the shell entirely:

  • New skills-init Go binary (go/core/cmd/skills-init + go/core/internal/skillsinit/) consumes a structured JSON config and invokes git / ssh-keyscan via exec.Command with argv vectors. User strings never reach a shell.
  • OCI fetch moves to in-process go-containerregistry (no more krane subprocess, no more jq for docker-config merging). Tar extraction rejects absolute paths, .. traversal, and symlinks whose targets escape the destination.
  • Controller emits a per-Agent ConfigMap with the JSON config; the pod template carries a kagent.dev/skills-init-hash annotation so config changes still trigger pod rollout.
  • Dockerfile rewritten as a multi-stage build of the Go binary; krane and jq are dropped from the runtime image (smaller surface area, fewer CVE sources).

Test plan

  • Unit tests updated to assert ConfigMap JSON content instead of script text — all green
  • Golden translator outputs regenerated (UPDATE_GOLDEN=true)
  • go vet ./... clean
  • End-to-end on a kind cluster:
    • Agent with gitRefs[].path → repo cloned via argv, subPath applied, agent container starts with files at /skills/<name>/
    • Agent with OCI refs → pull + TLS + token auth succeed; tar extraction works for benign content
    • Path-escape symlink in a tar entry → correctly rejected with a clear error
  • CI green
  • Maintainer review

🤖 Comment / PR created by Claude on behalf of @EItanya

…E GHSA #1842)

The previous skills-init container rendered a Bash script from a Go
template, interpolating user-controlled Agent CRD fields (Git URL/Ref/
Path/Name, OCI image, secret name) into `cat <<'ENDVAL' ... ENDVAL`
heredocs. A low-privileged user with create/update on Agent could embed
`ENDVAL` in any of those fields to escape the heredoc and execute
arbitrary commands inside the init container, which could reach the
node's IMDS or other in-cluster services.

This commit eliminates the shell entirely:

* New `skills-init` Go binary (`go/core/cmd/skills-init`) consumes a
  structured JSON config and invokes `git`/SSH via `exec.Command` with
  argv vectors. User strings never reach a shell.
* OCI fetch moves to in-process `go-containerregistry` (no more krane
  binary or jq for docker-config merging). Tar extraction rejects
  absolute paths, `..` traversal, and symlinks whose targets escape
  the destination.
* Controller emits a per-Agent ConfigMap with the JSON config; the pod
  template carries a `kagent.dev/skills-init-hash` annotation so config
  changes trigger pod rollout.
* skills-init Dockerfile rewritten as a multi-stage build of the Go
  binary; krane and jq are dropped from the runtime image.

Tested end-to-end on a kind cluster: git skills clone via argv and the
agent starts; OCI pull/extract works for benign content; the tar
extractor correctly rejects path-escape symlinks.

Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>
Copilot AI review requested due to automatic review settings May 26, 2026 19:05
@github-actions github-actions Bot added the bug Something isn't working label May 26, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR mitigates the command-injection risk in the skills-init init container by replacing the rendered shell script approach with a dedicated Go binary driven by a structured JSON ConfigMap, and by moving OCI image fetching/extraction in-process via go-containerregistry.

Changes:

  • Add a new skills-init Go binary and skillsinit internal package to perform git clones and OCI exports without shell templating.
  • Update the Agent translator to emit a per-Agent ConfigMap containing skills-init JSON config, mount it into the init container, and hash-annotate the pod template for rollouts.
  • Rework the skills-init Dockerfile into a multi-stage Go build and update tests/golden outputs accordingly.

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
Makefile Builds skills-init image using ./go as build context.
go/go.mod Adds go-containerregistry and related indirect dependencies.
go/go.sum Updates module checksums for new dependencies.
go/core/internal/skillsinit/config.go Defines the JSON “wire format” and mount constants for the binary.
go/core/internal/skillsinit/runner.go Orchestrates docker auth merge, git auth setup, git clones, and OCI pulls.
go/core/internal/skillsinit/docker.go Merges multiple dockerconfigjson secrets into a single DOCKER_CONFIG.
go/core/internal/skillsinit/ssh.go Sets up SSH key / known_hosts seeding and token-based git credential helper.
go/core/internal/skillsinit/git.go Implements git cloning and subPath materialization without a shell.
go/core/internal/skillsinit/copytree.go Implements symlink-dereferencing tree copy for subPath handling.
go/core/internal/skillsinit/oci.go Pulls/exports OCI images and extracts tar streams with path safety checks.
go/core/cmd/skills-init/main.go Entrypoint for the init container binary.
go/core/internal/controller/translator/agent/adk_api_translator.go Replaces script templating with JSON config + ConfigMap + init container command update.
go/core/internal/controller/translator/agent/manifest_builder.go Adds ConfigMap to manifest outputs and hashes config into pod template annotations.
go/core/internal/controller/translator/agent/skills-init.sh.tmpl Removes the old shell-template implementation.
go/core/internal/controller/translator/agent/skills_unit_test.go Updates unit tests to use skillsinit types and renamed helpers.
go/core/internal/controller/translator/agent/git_skills_test.go Updates translator tests to assert structured ConfigMap JSON content.
go/core/internal/controller/translator/agent/testdata/outputs/agent_with_skills.json Updates golden output for ConfigMap, volume/mount, and command changes.
go/core/internal/controller/translator/agent/testdata/outputs/agent_with_git_skills.json Updates golden output for ConfigMap, volume/mount, and command changes.
docker/skills-init/Dockerfile Multi-stage build for the Go binary; removes krane/jq, adds runtime deps.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread go/core/internal/skillsinit/oci.go
Comment thread go/core/internal/skillsinit/oci.go
Comment thread go/core/internal/skillsinit/copytree.go Outdated
Comment on lines +58 to +77
func walkFollow(src, rel string, fn func(rel string, info fs.FileInfo, openFile func() (io.ReadCloser, error)) error) error {
full := filepath.Join(src, rel)
info, err := os.Stat(full) // Stat follows symlinks
if err != nil {
return err
}
if rel != "" {
if err := fn(rel, info, func() (io.ReadCloser, error) { return os.Open(full) }); err != nil {
return err
}
}
if !info.IsDir() {
return nil
}
entries, err := os.ReadDir(full)
if err != nil {
return err
}
for _, e := range entries {
if err := walkFollow(src, filepath.Join(rel, e.Name()), fn); err != nil {
EItanya and others added 7 commits May 26, 2026 15:26
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>
The Go-native copyTree/walkFollow added in the previous commit went
further than necessary. The src + dst paths are both controlled by us
(mktemp dir + a cleaned/validated subPath under a freshly-cloned repo),
and they are passed to cp as argv entries — no shell, no metacharacter
risk. cp -rL is the same primitive the old script used, just invoked
the safe way.

What stays native:
- tar extraction in oci.go (path-escape and symlink-escape rejection
  is real security logic that "tar xf" does not provide)
- docker config merging in docker.go (pure JSON manipulation, simpler
  than shelling to jq)
- git, ssh-keyscan invocations: argv-vector exec.Command

Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>
Audit of where user-controlled CRD fields end up in subprocess argv
turned up two defense-in-depth gaps:

* `git checkout <ref>` had no `--` separator. If a malicious Agent set
  the Ref to a string starting with `-`, git would parse it as a flag.
  isCommitSHA already restricts the commit branch to 40 hex chars, but
  the separator costs nothing.
* The skill directory name (Skills.GitRefs[].Name, or derived from URL
  / Path / OCI ref) was used verbatim as the leaf of /skills/<name>.
  `Name: "../etc"` would have escaped the volume. Now restricted to
  `[A-Za-z0-9._-]+` and ".", ".." rejected explicitly.

Also remove the unused context.Context parameter that the new Run
entrypoint took but never consulted.

Audit notes (no changes needed):
* OCI image string is parsed by go-containerregistry, not a shell
* ssh-keyscan host/port is already regex-filtered to [A-Za-z0-9.-]
* dockerconfigjson secret names are K8s DNS-labels by API contract
* subPath is already validated (no abs, no ..) plus defense-in-depth
  in applySubPath at runtime

Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>
- docker.go: use maps.Copy instead of manual loop (modernize lint)
- git.go: use slices.Contains in hasDotDot (modernize lint)
- oci.go: drop deprecated tar.TypeRegA alias (SA1019)
- e2e: assert skills-init via ConfigMap JSON instead of legacy script body

Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>
return fmt.Errorf("skill name is empty")
}
if name == "." || name == ".." {
return fmt.Errorf("skill name %q is reserved", name)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: are . and .. a name? :)

peterj
peterj previously approved these changes May 27, 2026
Three new test surfaces that pin the data-only contract introduced by the
shell-to-Go rewrite:

- validateSkillName: rejection battery covering shell metas, traversal,
  newlines, null bytes, globs, brace expansion, unicode lookalikes.
- prepareSkillsInitConfig: explicit-Name injection, OCI-derived name
  injection, subPath traversal, and a positive test confirming that
  malicious URL/Ref strings flow through verbatim (argv-only, never a
  shell).
- skillsinit package: tar safeJoin/extractTar reject path traversal,
  absolute symlinks, and escaping relative symlinks; applySubPath
  rejects traversal and non-dir targets; hasDotDot covered directly.

Signed-off-by: Eitan Yarmush <eitan.yarmush@solo.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Transition skills-init container to golang

3 participants