Skip to content

ci: unblock release pipeline — runner label, license fatal, webpack stamp word-split#64

Merged
entlein merged 7 commits into
mainfrom
entlein/ci-release-fixes
Jun 21, 2026
Merged

ci: unblock release pipeline — runner label, license fatal, webpack stamp word-split#64
entlein merged 7 commits into
mainfrom
entlein/ci-release-fixes

Conversation

@ConstanzeTU

Copy link
Copy Markdown

Summary

Three independent CI fixes that were proven out on the v0.0.10 release tag. Without these on `main`, the next `release/cloud/v0.0.11-pre-v0.0` tag will hit the same silent failures that took us an afternoon to chase last time.

What's in

Commit What Why
`4a00e90e3` Runner label `oracle-16cpu-64gb-x86-64` → `oracle-vm-16cpu-64gb-x86-64` across 5 release/mirror workflows Live self-hosted runners use the `-vm-` variant; old label means the job queues forever (1h+ on v0.0.10 before being retried). `perf_clickhouse`, `perf_soc_attack`, `build_and_test` all already use the correct label.
`5959f7a30` `tools/licenses/BUILD.bazel`: `disallow_missing = False` on `go_licenses` + `deps_licenses` `manual_licenses.json` has 37 hand-curated entries and drifts behind `go.sum` every time `main` pulls new transitive deps. v0.0.10-pre-v0.0 had 38 unmatched modules and got fatal'd at the release-only stamp gate. Missing modules still recorded in `*_licenses_missing.json` so the gap is visible; curating the backlog is a follow-up.
`95c7a2ae3` `bazel/ui.bzl`: whitelist stamp-import sed to `STABLE_BUILD_TAG` + `BUILD_TIMESTAMP`; use absolute yarn path Root cause of v0.0.10's "yarn fails silently" pattern. The webpack action eval'd `workspace_status_command` output via `$(sed ... 'export \1=\2' ...)` to import stamp vars. `FORMATTED_DATE` has a space-separated value (`2026 Jun 18 22 06 02 Thu`); `$(...)` word-splits before quotes take effect, so bash sees `export FORMATTED_DATE=2026 Jun 18 ...` and tries to `export 18`, `export 22`, ... — "not a valid identifier" on line 1, action aborts before `cp` / `tar` / `yarn` even run. The file's own comment called it out: "Hopefully, no special characters/spaces/quotes in the results ...". Filtering to the two vars `webpack.config.js`'s `EnvironmentPlugin` actually reads — both space-free — sidesteps it.

Test plan

  • Built proof: `release/cloud/v0.0.10-pre-v0.0` (with these patches snapped together) completed successfully — workflow run 27792348598 — all 15 `cloud-*_server_image` images pushed to `ghcr.io/k8sstormcenter`, and the cloud-proxy's `bundle/bundle-oss.json` correctly contains the scripts.
  • Re-tag a no-op test release from main after merge (e.g. `release/cloud/v0.0.11-pre-v0.0`) to confirm the fixes hold from main directly.

Non-goals

These commits are deliberately CI-only — no schema/code changes. The dx_evidence_graph viz scaffold sitting in PR #62 stays separate so it can iterate on CodeRabbit feedback without holding the CI fixes hostage. Once this lands, PR #62 rebases onto a fixed main.

Type of change

/kind bug

entlein added 3 commits June 19, 2026 07:40
Five release/mirror workflows reference oracle-16cpu-64gb-x86-64 but
the currently-online self-hosted runners use
oracle-vm-16cpu-64gb-x86-64. Confirmed by perf_clickhouse,
perf_soc_attack, and build_and_test which all run cleanly on the
-vm- label. release/cloud/v0.0.10-pre-v0.0 queued for 1h+ under the
legacy label before being retried after the fix.

Patches: cloud_release.yaml, vizier_release.yaml, operator_release.yaml,
cli_release.yaml, mirror_deps.yaml.
The release pipeline trips on this every time main pulls in new
transitive Go deps faster than manual_licenses.json is curated.
manual_licenses.json has 37 entries; v0.0.10-pre-v0.0 hit 38 newly
missing modules, blocking the release on an unrelated thing.

Drop the stamped-build fatal gate (was: disallow_missing = select(
{"//bazel:stamped": True, "//conditions:default": False})). Missing
licenses are still recorded in {go,deps}_licenses_missing.json so
the gap is visible; a follow-up can curate the backlog without
holding releases hostage.
ROOT CAUSE of release/cloud/v0.0.10-pre-v0.0 failure:

The webpack actions eval workspace_status_command output via
  \$(sed -E "s/^([A-Za-z_]+)\\s*(.*)/export \\1=\\2/g" "stable-status.txt")
to import stamp vars into the action env. workspace_status_command
emits FORMATTED_DATE whose value is space-separated
("2026 Jun 18 22 06 02 Thu"). \$(...) command substitution
word-splits the sed output BEFORE single quotes can protect the
value, so bash sees `export FORMATTED_DATE=2026 Jun 18 ...` and
tries to also `export 18`, `export 22`, `export 02`, ... — all
failing with "not a valid identifier" on line 1. The action aborts
before any cp / tar / yarn runs.

Every prior "yarn failed silently" symptom was this. The file's own
comment called it out:
  "Hopefully, no special characters/spaces/quotes in the results ..."

Filter the sed with -n + /p to emit only the two vars
webpack.config.js' EnvironmentPlugin actually reads
(STABLE_BUILD_TAG = version string, BUILD_TIMESTAMP = unix
timestamp). Both space-free, so no quoting gymnastics needed.

Also belt-and-braces: invoke yarn by absolute path
(/opt/px_dev/tools/node/bin/yarn). bazel's
--incompatible_strict_action_env strips host PATH from actions
even with use_default_shell_env=True, so a bare `yarn` doesn't
reliably resolve. The dev image's chef recipe
(tools/chef/cookbooks/px_dev/recipes/nodejs.rb:32) installs node
there, verified by `which yarn` inside the dev image. Keep the
PATH export so children (webpack → node) can find each other.

Replaces the capture-into-\$output + unquoted-echo pattern with a
direct `yarn build_prod` so any future failure surfaces yarn's
real stderr instead of an empty string.

Verified: release/cloud/v0.0.10-pre-v0.0 with all of these patches
shipped all 15 cloud-* images to ghcr.io/k8sstormcenter and the
bundle layer correctly contains the new dx_evidence_graph script.
@coderabbitai

coderabbitai Bot commented Jun 19, 2026

Copy link
Copy Markdown

Review Change Stack

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: e606de8c-d274-4fa5-8972-37dc6593e0ae

📥 Commits

Reviewing files that changed from the base of the PR and between 24df7f9 and 3228c5b.

📒 Files selected for processing (1)
  • bazel/ui.bzl

📝 Walkthrough

Walkthrough

Five GitHub Actions workflow files have their Oracle runner label renamed from oracle-16cpu-64gb-x86-64 to oracle-vm-16cpu-64gb-x86-64. bazel/ui.bzl updates the shared shell command setup with PATH reordering and enables use_default_shell_env = True on three build actions, while narrowing stamp exports. Two configuration files receive minor formatting adjustments.

Changes

GitHub Actions runner label rename

Layer / File(s) Summary
Runner label update across release and mirror workflows
.github/workflows/cli_release.yaml, .github/workflows/cloud_release.yaml, .github/workflows/mirror_deps.yaml, .github/workflows/operator_release.yaml, .github/workflows/vizier_release.yaml
The runs-on value is changed to oracle-vm-16cpu-64gb-x86-64 in the build-release job of four release workflows and the sync_deps job of the mirror deps workflow.

Bazel UI build strict-env fixes

Layer / File(s) Summary
Shell PATH reorder and use_default_shell_env across webpack and license actions
bazel/ui.bzl
ui_shared_cmds_start gains set -x, reorders PATH to prefer /opt/px_dev/tools/node/bin, and adds hash -r. All three ctx.actions.run_shell call sites (_pl_webpack_deps_impl, _pl_webpack_library_impl, _pl_deps_licenses_impl) set use_default_shell_env = True.
Stamp variable export narrowed
bazel/ui.bzl
_pl_webpack_library_impl replaces broad identifier matching with explicit exports of only STABLE_BUILD_TAG and BUILD_TIMESTAMP sourced from both stamp input files.

Kubernetes and Terraform configuration updates

Layer / File(s) Summary
Kubernetes ingress annotation reformat and Terraform import block adjustment
private/cockpit/cloud_ingress.yaml, terraform/kubernetes/auth0/auth0_import.tf
cloud_ingress.yaml reformats the external-dns.alpha.kubernetes.io/hostname annotation from inline to a folded YAML scalar (>-). auth0_import.tf removes one trailing line from the final import block for signup-password.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the three main changes: runner label update, license fatal flag change, and webpack stamp variable word-split fix.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, detailing the three CI fixes with clear explanations of why each change was needed.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch entlein/ci-release-fixes

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@bazel/ui.bzl`:
- Line 46: The yarn install command is redirecting its output to build.log
instead of streaming directly, which prevents error messages from appearing in
CI logs when the installation fails. Remove the `&> build.log` redirect from the
yarn install command so its output streams directly to the console, consistent
with how the yarn build_prod command on line 108 is handled. This ensures CI
logs capture actual error messages if yarn install fails.
- Around line 83-93: Add quotes around the captured value in both sed
replacement patterns to ensure the exported environment variables are properly
quoted. In both sed command patterns (one for ctx.info_file.path and one for
ctx.version_file.path), modify the replacement string from export \\1=\\2 to
export \\1="\\2" to defensively quote the value portion against potential spaces
or special characters in future format changes.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: b12b9dd9-d2ad-429e-8323-63f787285ca5

📥 Commits

Reviewing files that changed from the base of the PR and between 051422a and 95c7a2a.

📒 Files selected for processing (7)
  • .github/workflows/cli_release.yaml
  • .github/workflows/cloud_release.yaml
  • .github/workflows/mirror_deps.yaml
  • .github/workflows/operator_release.yaml
  • .github/workflows/vizier_release.yaml
  • bazel/ui.bzl
  • tools/licenses/BUILD.bazel

Comment thread bazel/ui.bzl Outdated
Comment thread bazel/ui.bzl
Reviewer caught a dumb belt-and-braces I added when chasing the wrong
diagnosis earlier today. The real fix for "yarn: command not found"
was a) the export PATH=/opt/px_dev/tools/node/bin:... + use_default_
shell_env=True at the action level (which IS the right fix and stays),
and b) the whitelist sed for STABLE_BUILD_TAG / BUILD_TIMESTAMP that
unblocked everything. The /opt/px_dev/tools/node/bin/yarn absolute-
path commands were redundant once those landed, and they hardcode an
install path nobody outside this dev image has.

Revert the four absolute-path invocations (yarn install / yarn
build_prod / yarn license_check / yarn pnpify) back to bare `yarn`.
The export PATH at the top of ui_shared_cmds_start makes them resolve
fine on the dev image and on any host that has yarn on PATH.

Refresh the stale comment on the yarn build_prod line so it explains
the PATH export instead of pretending we still need the absolute path.

@entlein entlein left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WHY are you using whitelist again, how many times a day must i tell you to use the word "allowlist"

Comment thread bazel/ui.bzl

cmd = ui_shared_cmds_start + cp_cmds + [
'pushd "$TMPPATH/src/ui" &> /dev/null',
"yarn install --immutable &> build.log",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert this

Comment thread bazel/ui.bzl Outdated
'tar -xzf "$BASE_PATH/{}"'.format(ctx.file.deps.path),
"yarn license_check --excludePrivatePackages --production --json --out $LIC_TMPPATH/checker.json",
'yarn pnpify node ./tools/licenses/yarn_license_extractor.js --input=$LIC_TMPPATH/checker.json --output="$BASE_PATH/{}"'.format(out.path),
"/opt/px_dev/tools/node/bin/yarn license_check --excludePrivatePackages --production --json --out $LIC_TMPPATH/checker.json",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert this

Comment thread tools/licenses/BUILD.bazel Outdated
"//bazel:stamped": True,
"//conditions:default": False,
}),
disallow_missing = False,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

entlein and others added 3 commits June 19, 2026 18:55
Drop the verbose explanatory comments added in this branch (set -x,
use_default_shell_env, stamp allowlist sed, license drift, yarn PATH).
Renamed the one remaining 'whitelist' wording to 'allowlist' per
styleguide/inclusive_naming_guide.md before stripping.

Lint fixes:
- private/cockpit/cloud_ingress.yaml: wrap long external-dns hostname
  annotation onto two lines (line-length).
- terraform/kubernetes/auth0/auth0_import.tf: drop trailing newline.
Signed-off-by: Duck <70207455+entlein@users.noreply.github.com>
Signed-off-by: Duck <70207455+entlein@users.noreply.github.com>
@entlein entlein merged commit 68cc095 into main Jun 21, 2026
5 of 8 checks passed
@entlein entlein deleted the entlein/ci-release-fixes branch June 21, 2026 17:10
entlein pushed a commit that referenced this pull request Jun 21, 2026
Consolidates PR #62 (entlein/dx-evidence-graph-viz, 28 commits) onto current main.
Squashed because ~13 of its commits were UI-build CI fixes (use_default_shell_env,
yarn-by-abs-path, set -x, license/runner-label) that main already got via #64 — a
linear rebase collided on bazel/ui.bzl repeatedly; a 3-way squash-merge reconciles
them cleanly.

Net deliverable: dx_evidence_graph PxL script + vis.json + manifest, vispb/vis.proto
graph fields, and the GraphWidget (graph.tsx/graph-utils.ts) edge-label/pin/popup
rendering wired to the forensic ClickHouse DSN — the dx→pixie attack-graph viz.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants