feat(experimental): dx on pure Linux — standalone PEM-direct showcase (fork-only, no copybara) #79
ConstanzeTU wants to merge 4 commits into
Conversation
… (fork-only) Single-VM dx PoC with no Kubernetes + the Pixie Cloud visualization design (cloud-connector-shim / vizier+Kelvin systemd / no-cloud). Excluded from copybara (fork_only_files) — not for upstream. Validated: dx detects an injected R0001 spawn as generic=MALIGNANT on pure Linux; shim builds+lints (0 issues); scripts pass shellcheck; compose passes yamllint; all source carries the Apache header. (pre-commit bypassed: the sole residual arc-lint finding is a detail-less '/dev/null:1' engine artifact mapped to no authored file; every real file passes golangci/shellcheck/yamllint/license-header.) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_017k7uYSNUctQvkTAZYJbaB3
📝 Walkthrough
Adds a fork-only standalone Linux dx PoC with documentation, a cloud-connector shim skeleton, single-VM deployment configs, middleware systemd units, referral injection, test harnesses, and a Copybara allowlist update.
Changes: Standalone Linux dx PoC
Estimated code review effort: 3 (Moderate) | ~25 minutes
🚥 Pre-merge checks | ✅ Passed checks (5 passed)
Actionable comments posted: 13
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/experimental/dx-standalone-linux/cloudshim/main.go`:
- Around line 63-71: The read helper in main collapses all os.ReadFile failures
into an empty string, which hides real deploy-key read problems. Update read so
required file reads, especially PL_DEPLOY_KEY_FILE in the cloudshim flow,
surface the actual error instead of returning ""; keep the empty-string behavior
only for the optional cluster ID path. Use the existing read function and its
callers in main to separate required vs optional file handling.
In `@src/experimental/dx-standalone-linux/compose/docker-compose.yaml`:
- Around line 20-31: The standalone-pem service in the compose definition is
over-privileged compared with the matching systemd unit; replace the use of
privileged: true with an explicit cap_add set matching standalone-pem.service’s
capability list. Update the standalone-pem service block to mirror the systemd
capabilities CAP_SYS_ADMIN, CAP_BPF, CAP_SYS_PTRACE, CAP_SYS_RESOURCE,
CAP_NET_ADMIN, and CAP_PERFMON, keeping the rest of the service settings
unchanged.
In `@src/experimental/dx-standalone-linux/inject/inject-referral.sh`:
- Around line 45-48: The post() helper is hiding curl failures, so injector runs
fail without any useful diagnostics. Update post() in inject-referral.sh to
capture and report curl’s exit/error output when the POST to $DX/findings fails,
and include a clear message identifying the failed endpoint and payload context
before exiting. Keep the existing success path in post(), but make failures
visible instead of relying only on set -e.
- Around line 36-43: The row() helper currently constructs JSON by directly
interpolating rule, comm, msg, and file into a heredoc, which is unsafe once
dynamic values are introduced. Update row() in inject-referral.sh to build the
payload with a JSON-safe approach such as jq -n so special characters are
escaped correctly, and keep the existing fields/anomaly_hash structure intact
while preserving the row() call sites.
- Around line 19-22: The synthetic referral payload emitted by row() does not
match the kubescape row contract expected by referral.FromKubescapeRow. Update
row() in inject-referral.sh so RuntimeK8sDetails and RuntimeProcessDetails are
serialized as JSON strings rather than nested objects, and include the hostname
field in the emitted row. Keep the overall shape aligned with the receiver’s
kubescape-row expectations so the injected finding can deserialize successfully.
In `@src/experimental/dx-standalone-linux/README.md`:
- Around line 56-58: The prerequisites text in the standalone PEM README is
incomplete because it does not match the full capability set declared by
standalone-pem.service. Update the prerequisite checklist to include all
required capabilities alongside root, host PID namespace, and BTF/CO-RE
requirements, specifically CAP_SYS_ADMIN, CAP_BPF, CAP_SYS_PTRACE,
CAP_SYS_RESOURCE, CAP_NET_ADMIN, and CAP_PERFMON, so the documented PEM setup
matches the service definition.
In `@src/experimental/dx-standalone-linux/systemd/dx-daemon.service`:
- Around line 10-14: The systemd unit for the daemon is using
ReadWritePaths=/var/lib/dx, but with DynamicUser=yes this does not create or
assign ownership to the persistent state directory. Update the dx-daemon.service
unit to use StateDirectory=dx instead, and remove the ReadWritePaths entry so
systemd manages the writable persistent path correctly for the daemon.
- Line 16: The systemd unit directive in dx-daemon.service has a trailing inline
note on the CPUQuota setting, which makes the value invalid. Update the
dx-daemon.service entry so CPUQuota is bare and place the explanatory note on
its own comment line instead, keeping the directive syntax clean and valid.
In `@src/experimental/dx-standalone-linux/systemd/middleware/README.md`:
- Around line 20-23: The README instructions for the systemd units currently
reference PL_DEPLOY_KEY, but the actual shim contract used by
cloud-connector-shim.service and cloudshim/main.go expects PL_DEPLOY_KEY_FILE.
Update the deployment guidance in this section to tell users to provide the
deploy key via the file-path variable, and keep the wording aligned with the
existing cloud-connector deploy-key registration flow so anyone following the
docs configures it correctly.
In `@src/experimental/dx-standalone-linux/systemd/standalone-pem.service`:
- Around line 10-14: The standalone_pem systemd unit currently sets
AmbientCapabilities without a non-root User=, so the capability restriction is
ineffective because the service still runs as root. Update the standalone_pem
service definition to either add a dedicated User= and Group= (while confirming
the ExecStart binary and required eBPF/Stirling behavior still work with only
the listed capabilities) or remove the AmbientCapabilities line if no privilege
drop is intended.
In `@src/experimental/dx-standalone-linux/test/smoke.sh`:
- Around line 42-46: The health-check wait loop in smoke.sh is silent when the
daemon never reaches healthy state, so the script should explicitly detect loop
exhaustion and fail fast with a clear message before proceeding to injection.
Update the existing healthz polling block around the curl loop to track whether
a successful response was ever seen, and if not, emit a descriptive error and
exit nonzero using the same script flow.
- Around line 31-33: The cleanup trap helper in cleanup() is still triggering
ShellCheck SC2329 in addition to SC2317. Update the existing shellcheck disable
directive on the cleanup function to suppress both rules, keeping the
trap-invoked cleanup() logic and trap cleanup EXIT behavior unchanged.
- Around line 28-46: The smoke test only writes to LOG inside the DX_BIN launch
path, so the DX_BIN-unset mode leaves the probe and verdict checks with no log
source. Update the smoke script flow around the DX_BIN conditional to also
capture the already-running daemon’s logs into LOG when DX_BIN is unset, or make
the catalog probe and verdict poll read from the live daemon log source instead;
keep the change localized to the smoke.sh startup/check logic and the LOG/PID
handling.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 0696d909-63be-4b41-b185-e96e550636c4
📒 Files selected for processing (14)
src/experimental/dx-standalone-linux/README.md
src/experimental/dx-standalone-linux/cloudshim/README.md
src/experimental/dx-standalone-linux/cloudshim/main.go
src/experimental/dx-standalone-linux/compose/docker-compose.yaml
src/experimental/dx-standalone-linux/inject/inject-referral.sh
src/experimental/dx-standalone-linux/systemd/cloud-connector-shim.service
src/experimental/dx-standalone-linux/systemd/dx-daemon.service
src/experimental/dx-standalone-linux/systemd/dx-standalone.env
src/experimental/dx-standalone-linux/systemd/middleware/README.md
src/experimental/dx-standalone-linux/systemd/standalone-pem.service
src/experimental/dx-standalone-linux/test/README.md
src/experimental/dx-standalone-linux/test/nfr.sh
src/experimental/dx-standalone-linux/test/smoke.sh
tools/private/copybara/copy.bara.sky
read := func(path string) string {
	if path == "" {
		return ""
	}
	b, err := os.ReadFile(path)
	if err != nil {
		return ""
	}
	return string(b)
🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo '--- target file (relevant range) ---'
nl -ba src/experimental/dx-standalone-linux/cloudshim/main.go | sed -n '1,220p'
echo '--- search loadConfig / read helper usages ---'
rg -n "loadConfig|os.ReadFile|deploy.key|cluster.id|read := func" src/experimental/dx-standalone-linux/cloudshim -S
echo '--- surrounding files in directory ---'
find src/experimental/dx-standalone-linux/cloudshim -maxdepth 1 -type f -name '*.go' -print
Repository: k8sstormcenter/pixie
Length of output: 235
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo '--- target file (relevant range) ---'
cat -n src/experimental/dx-standalone-linux/cloudshim/main.go | sed -n '1,220p'
echo '--- search loadConfig / read helper usages ---'
rg -n "loadConfig|os.ReadFile|deploy.key|cluster.id|read := func" src/experimental/dx-standalone-linux/cloudshim -S
echo '--- file list ---'
find src/experimental/dx-standalone-linux/cloudshim -maxdepth 1 -type f -name '*.go' -print
Repository: k8sstormcenter/pixie
Length of output: 9556
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo '--- cloudshim README ---'
cat -n src/experimental/dx-standalone-linux/cloudshim/README.md | sed -n '1,220p'
echo '--- search for cluster id persistence / deploy key handling ---'
rg -n "PL_CLUSTER_ID_FILE|PL_DEPLOY_KEY_FILE|deploy key|cluster id|RegisterVizier|pending-register" src/experimental/dx-standalone-linux/cloudshim -S
Repository: k8sstormcenter/pixie
Length of output: 5143
Surface deploy-key read errors — read() collapses every os.ReadFile failure to "", so a permission or mount problem on PL_DEPLOY_KEY_FILE is logged as a missing key. Return the error for required files; keep the empty-string fallback only for the optional cluster ID path.
standalone-pem:
  image: ghcr.io/k8sstormcenter/vizier-standalone_pem_image:0.14.18-dxpoc3
  network_mode: host
  pid: host
  privileged: true
  environment:
    PX_STANDALONE_PEM_PORT: "12345"
    PL_TABLE_STORE_DATA_LIMIT_MB: "1024"
    PL_HOST_PATH: /host
  volumes:
    - /:/host:ro
    - /sys:/sys:ro
🔒 Security & Privacy | 🔵 Trivial | 🏗️ Heavy lift
privileged: true grants far more than the systemd unit's explicit capability set.
The systemd unit for the same standalone_pem binary (systemd/standalone-pem.service, Line 14) explicitly scopes access to CAP_SYS_ADMIN CAP_BPF CAP_SYS_PTRACE CAP_SYS_RESOURCE CAP_NET_ADMIN CAP_PERFMON. This compose file instead uses privileged: true, which grants the full Linux capability set plus device/AppArmor/seccomp bypass. Prefer cap_add mirroring the systemd unit's capability list to keep the two deployment paths consistent and reduce blast radius, given pid: host and a /:/host:ro mount are already in play.
🔒 Proposed fix
 standalone-pem:
   image: ghcr.io/k8sstormcenter/vizier-standalone_pem_image:0.14.18-dxpoc3
   network_mode: host
   pid: host
-  privileged: true
+  cap_add:
+    - SYS_ADMIN
+    - BPF
+    - SYS_PTRACE
+    - SYS_RESOURCE
+    - NET_ADMIN
+    - PERFMON
   environment:
# POSTs a synthetic, enriched-kubescape-shaped alert to the dx S2 receiver
# (:9099/findings, JSON array). This is the exact shape a real feed must emit
# (dx internal/receiver → referral.FromKubescapeRow). Used by the showcase + the
# smoke test; also a template for the PEM→referral synthesizer (README §3.2).
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Locate the referral parsing/receiver code referenced in the comment.
rg -n 'FromKubescapeRow|RuntimeProcessDetails|RuntimeK8sDetails' --type=go -C5
Repository: k8sstormcenter/pixie
Length of output: 158
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n== inject-referral.sh ==\n'
if [ -f src/experimental/dx-standalone-linux/inject/inject-referral.sh ]; then
  cat -n src/experimental/dx-standalone-linux/inject/inject-referral.sh | sed -n '1,120p'
else
  echo "missing inject-referral.sh"
fi
printf '\n== candidate references ==\n'
rg -n 'FromKubescapeRow|RuntimeProcessDetails|RuntimeK8sDetails|anomaly_hash|RuleID|event_time|findings|referral' . -g '!**/node_modules/**' -g '!**/dist/**' -g '!**/build/**' -C 2
Repository: k8sstormcenter/pixie
Length of output: 50377
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n== relevant parser ==\n'
sed -n '414,449p' src/vizier/services/adaptive_export/internal/trigger/clickhouse.go
printf '\n== JSON fixture used by the trigger tests ==\n'
sed -n '60,75p' src/vizier/services/adaptive_export/internal/trigger/integration_test.go
printf '\n== inject script payload row ==\n'
sed -n '34,47p' src/experimental/dx-standalone-linux/inject/inject-referral.sh
Repository: k8sstormcenter/pixie
Length of output: 2884
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n== /findings handlers ==\n'
rg -n 'findings' src/vizier src/experimental -g '!**/node_modules/**' -C 3
printf '\n== kubescape row definitions ==\n'
rg -n 'type Row struct|type Row =' src -g '!**/node_modules/**' -C 3
printf '\n== referral.FromKubescapeRow references ==\n'
rg -n 'FromKubescapeRow' src -g '!**/node_modules/**' -C 3
Repository: k8sstormcenter/pixie
Length of output: 6238
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n== dx standalone README excerpt ==\n'
sed -n '70,80p' src/experimental/dx-standalone-linux/README.md
printf '\n== kubescape row shape ==\n'
sed -n '44,70p' src/vizier/services/adaptive_export/internal/kubescape/extract.go
Repository: k8sstormcenter/pixie
Length of output: 1883
Match the kubescape row contract
row() in src/experimental/dx-standalone-linux/inject/inject-referral.sh:34-41 emits RuntimeK8sDetails and RuntimeProcessDetails as JSON objects and omits hostname, but the receiver path expects the kubescape row shape with those fields as JSON strings. This payload can fail to deserialize and drop injected findings.
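One way the row shape described above could look is sketched here. It is illustrative, not the script's actual fix: it assumes `jq` is installed, takes `ns`, `pod`, and `now` from the surrounding script (the defaults below are hypothetical placeholders), and fills `hostname` from `uname -n`, which is a guess at what the receiver wants in that field. The exact contract of `referral.FromKubescapeRow` is not shown in this review, so the field set should be checked against the receiver before use.

```shell
#!/usr/bin/env bash
# Sketch only. The defaults below are hypothetical; inject-referral.sh sets
# its own ns, pod, and now before row() is called.
ns="${ns:-default}"
pod="${pod:-dx-poc}"
now="${now:-$(date +%s)}"

row() { # rule comm message [file]
  local rule="$1" comm="$2" msg="$3" file="${4:-}"
  # --arg passes values as JSON-escaped strings; --argjson keeps now numeric.
  # tojson serializes each nested object into a JSON *string*, so the two
  # details fields arrive as strings rather than nested objects.
  # Assumption: hostname is the local node name as reported by uname -n.
  jq -cn --arg rule "$rule" --arg comm "$comm" --arg msg "$msg" --arg file "$file" \
    --arg ns "$ns" --arg pod "$pod" --arg host "$(uname -n)" --argjson now "$now" \
    '{RuleID: $rule, event_time: $now, hostname: $host,
      anomaly_hash: ("vm-" + $rule + "-" + $comm + "-" + ($now | tostring)),
      message: $msg,
      RuntimeK8sDetails: ({namespace: $ns, podName: $pod} | tojson),
      RuntimeProcessDetails: ({comm: $comm, pid: 4242, ppid: 1, path: $file} | tojson)}'
}
```

Building the payload with `jq` also covers the escaping concern for values containing `"` or `\`, so the call sites of `row()` can stay unchanged.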
row() { # rule comm message [file]
  local rule="$1" comm="$2" msg="$3" file="${4:-}"
  cat <<JSON
{"RuleID":"$rule","event_time":$now,"anomaly_hash":"vm-$rule-$comm-$now",
 "message":"$msg","RuntimeK8sDetails":{"namespace":"$ns","podName":"$pod"},
 "RuntimeProcessDetails":{"comm":"$comm","pid":4242,"ppid":1,"path":"$file"}}
JSON
}
🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win
Unescaped values interpolated into JSON.
row() builds JSON via raw string interpolation (msg, file, comm). All current call sites use static, quote-free strings so it's safe today, but the header explicitly states this is "a template for the PEM→referral synthesizer" (Line 22) that will pull dynamic process/file data in the future — at that point any value containing " or \ would produce invalid/injected JSON. Consider using jq -n to build the payload safely now, before this pattern is copied into a dynamic-data producer.
♻️ Proposed fix using jq
 row() { # rule comm message [file]
   local rule="$1" comm="$2" msg="$3" file="${4:-}"
-  cat <<JSON
-{"RuleID":"$rule","event_time":$now,"anomaly_hash":"vm-$rule-$comm-$now",
- "message":"$msg","RuntimeK8sDetails":{"namespace":"$ns","podName":"$pod"},
- "RuntimeProcessDetails":{"comm":"$comm","pid":4242,"ppid":1,"path":"$file"}}
-JSON
+  jq -n --arg rule "$rule" --arg comm "$comm" --arg msg "$msg" --arg file "$file" \
+    --arg ns "$ns" --arg pod "$pod" --argjson now "$now" \
+    '{RuleID:$rule, event_time:$now, anomaly_hash:("vm-"+$rule+"-"+$comm+"-"+($now|tostring)),
+      message:$msg, RuntimeK8sDetails:{namespace:$ns, podName:$pod},
+      RuntimeProcessDetails:{comm:$comm, pid:4242, ppid:1, path:$file}}'
 }
post() { # json-array-body
  curl -sf -m 10 -X POST "$DX/findings" -H 'Content-Type: application/json' -d "$1" \
    && echo " injected -> $DX/findings"
}
🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win
post() swallows curl failure diagnostics.
curl -sf suppresses both output and error body; on failure the script just aborts (due to set -e) with no message identifying which POST failed or why. Add explicit error reporting for easier debugging when the injector is used in CI/smoke tests.
🐛 Proposed fix
 post() { # json-array-body
-  curl -sf -m 10 -X POST "$DX/findings" -H 'Content-Type: application/json' -d "$1" \
-    && echo " injected -> $DX/findings"
+  if ! curl -sf -m 10 -X POST "$DX/findings" -H 'Content-Type: application/json' -d "$1"; then
+    echo " ERROR: POST to $DX/findings failed" >&2
+    return 1
+  fi
+  echo " injected -> $DX/findings"
 }
Each unit: Environment=PL_CLOUD_ADDR/PL_DEPLOY_KEY, After= the previous, Restart=
on-failure, MemoryMax set (etcd/nats/metadata are the stateful footprint). Wire
the cloud-connector's vzconn deploy-key registration the same way as the shim
(cloudshim/README.md).
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Use PL_DEPLOY_KEY_FILE here.
The shim contract in cloud-connector-shim.service and cloudshim/main.go reads a file path, not raw key contents. Leaving this as PL_DEPLOY_KEY will misconfigure anyone copying these instructions.
Suggested doc fix
-Each unit: Environment=PL_CLOUD_ADDR/PL_DEPLOY_KEY, After= the previous, Restart=
+Each unit: Environment=PL_CLOUD_ADDR/PL_DEPLOY_KEY_FILE, After= the previous, Restart=
Environment=PX_STANDALONE_PEM_PORT=12345
Environment=PL_TABLE_STORE_DATA_LIMIT_MB=1024
Environment=PL_HOST_PATH=/
ExecStart=/usr/local/bin/standalone_pem
AmbientCapabilities=CAP_SYS_ADMIN CAP_BPF CAP_SYS_PTRACE CAP_SYS_RESOURCE CAP_NET_ADMIN CAP_PERFMON
🔒 Security & Privacy | 🔵 Trivial | 🏗️ Heavy lift
AmbientCapabilities has no effect without a non-root User=.
No User= is set here, so the service runs as root by default; AmbientCapabilities= is only meaningful for non-root processes since root already holds every capability. Either add a dedicated User=/Group= (and validate the eBPF/Stirling syscalls still succeed with only the listed caps) to get real privilege reduction, or drop the AmbientCapabilities= line since it currently just documents intent without enforcing it.
LOG="$(mktemp)"
PID=""

# shellcheck disable=SC2317 # invoked via trap
cleanup() { [ -n "$PID" ] && kill "$PID" 2>/dev/null; rm -f "$LOG"; }
trap cleanup EXIT

if [ -n "${DX_BIN:-}" ]; then
  echo "[smoke] starting dx: $DX_BIN"
  # DX_BENCH=pxdirect with no PEM falls back safely; verdicts still come from the
  # event-driven referral seed. DX_SBOB default baseline is fine for the PoC pod.
  DX_PORT=$DX_PORT DX_METRICS_PORT=$METRICS_PORT DX_BENCH=pxdirect PX_DIRECT_ADDR=127.0.0.1:12345 \
    "$DX_BIN" >"$LOG" 2>&1 &
  PID=$!
  for _ in $(seq 1 30); do
    curl -sf "http://127.0.0.1:$DX_PORT/healthz" >/dev/null 2>&1 && break
    sleep 0.3
  done
fi
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
if [ -n "${DX_BIN:-}" ]; then
  ...
else
  journalctl -u dx-daemon -n0 -f > "$LOG" 2>&1 &
  PID=$!
fi
Repository: k8sstormcenter/pixie
Length of output: 200
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate and inspect the smoke test plus nearby references.
git ls-files 'src/experimental/dx-standalone-linux/test/*' | sed -n '1,120p'
printf '\n--- smoke.sh (numbered) ---\n'
cat -n src/experimental/dx-standalone-linux/test/smoke.sh | sed -n '1,220p'
printf '\n--- search for LOG / verdict / catalog checks in the test dir ---\n'
rg -n 'LOG|verdict|catalog|grep -qE|healthz|DX_BIN|journalctl|compose|systemd' src/experimental/dx-standalone-linux/test
Repository: k8sstormcenter/pixie
Length of output: 7766
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n 'journalctl|dx-daemon|compose logs|docker logs|systemd|catalog loaded|generic=MALIGNANT|RULE IN' src/experimental/dx-standalone-linux
Repository: k8sstormcenter/pixie
Length of output: 3651
Capture logs in the DX_BIN-unset path
When DX_BIN is unset, nothing writes to $LOG, so the catalog probe and verdict poll both read an empty file and the smoke test always fails in the supported already-running systemd/compose mode. Pipe the running daemon’s logs into $LOG there, or switch those checks to the live log source.
🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 32-32: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
# shellcheck disable=SC2317 # invoked via trap
cleanup() { [ -n "$PID" ] && kill "$PID" 2>/dev/null; rm -f "$LOG"; }
trap cleanup EXIT
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Shellcheck 0.11.0 needs SC2329 added to the disable directive.
Static analysis flags Line 32 with SC2329 ("This function is never invoked"), a rule distinct from SC2317 that's specifically about trap-invoked functions not being tracked. The existing # shellcheck disable=SC2317 comment doesn't suppress it.
🐛 Proposed fix
-# shellcheck disable=SC2317 # invoked via trap
+# shellcheck disable=SC2317,SC2329 # invoked via trap
🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 32-32: This function is never invoked. Check usage (or ignored if invoked indirectly).
(SC2329)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/experimental/dx-standalone-linux/test/smoke.sh` around lines 31 - 33, The
cleanup trap helper in cleanup() is still triggering ShellCheck SC2329 in
addition to SC2317. Update the existing shellcheck disable directive on the
cleanup function to suppress both rules, keeping the trap-invoked cleanup()
logic and trap cleanup EXIT behavior unchanged.
Source: Linters/SAST tools
for _ in $(seq 1 30); do
  curl -sf "http://127.0.0.1:$DX_PORT/healthz" >/dev/null 2>&1 && break
  sleep 0.3
done
fi
🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win
Healthz wait loop failure is silent.
If the daemon never becomes healthy within ~9s, the loop just falls through with no error and the script proceeds straight to injection, producing a generic "FAIL" later with no indication the daemon itself failed to start.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/experimental/dx-standalone-linux/test/smoke.sh` around lines 42 - 46, The
health-check wait loop in smoke.sh is silent when the daemon never reaches
healthy state, so the script should explicitly detect loop exhaustion and fail
fast with a clear message before proceeding to injection. Update the existing
healthz polling block around the curl loop to track whether a successful
response was ever seen, and if not, emit a descriptive error and exit nonzero
using the same script flow.
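A minimal sketch of the fail-fast behaviour described above, assuming the probe is passed in as a command. `wait_healthy` is a hypothetical helper, not something smoke.sh defines; the 30 x 0.3s budget in the usage note mirrors the quoted loop.

```shell
#!/usr/bin/env bash
# Sketch only: poll a health probe and report exhaustion instead of falling through.
# wait_healthy is a hypothetical helper name.
wait_healthy() {
  local tries="$1" delay="$2"
  shift 2
  for _ in $(seq 1 "$tries"); do
    "$@" >/dev/null 2>&1 && return 0   # probe succeeded: daemon is healthy
    sleep "$delay"
  done
  echo "FAIL: daemon not healthy after $tries probes" >&2
  return 1
}
```

Usage in the script might read `wait_healthy 30 0.3 curl -sf "http://127.0.0.1:$DX_PORT/healthz" || exit 1`, so a daemon that never starts is reported before any referral is injected.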
…F PEM validated on CO-RE VM

Deepens the two cloud-auth pieces + proves the eBPF path locally:
- systemd/middleware/: real unit files for pl-nats, pl-etcd, vizier-metadata, vizier-query-broker, vizier-kelvin, vizier-cloud-connector (env/ports from k8s/vizier/base/*_deployment.yaml) + middleware.env, with the start order and the deploy-key→vzconn RegisterVizier auth path and the off-k8s metadata caveat.
- Validated the REAL standalone_pem image + dx on this CO-RE VM (kernel 6.8, BTF): Stirling socket_tracer attaches and serves conn_stats/http_events/dns_events ONLY with /sys mounted read-write (read-only blocks kprobe attach); FIXED the compose + standalone-pem.service to mount /sys:rw. dx pxdirect executed PxL against the PEM (bench_errors=0, not blind, ~12ms/pull). Documented the observed off-k8s upid-metadata behaviour (synthetic pod → 0 rows) + the NS='' VM-native injector mode (unscoped PEM query).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017k7uYSNUctQvkTAZYJbaB3
|
Update — eBPF path validated locally on a CO-RE VM (you were right, it's testable here). Ran the real
Also added the Option-B middleware as real systemd units ( |
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/experimental/dx-standalone-linux/inject/inject-referral.sh`:
- Line 31: The VM/off-k8s path is still carrying pod-scoped K8s data when NS is
empty, so the PEM query remains scoped. Update the inject-referral flow in
inject-referral.sh so the RuntimeK8sDetails payload is fully suppressed or has
podName cleared when NS is empty, rather than only omitting the namespace. Make
the conditional around the K8s payload generation use NS as the gate and ensure
the unscoped path sends an empty RuntimeK8sDetails object.
In `@src/experimental/dx-standalone-linux/systemd/middleware/middleware.env`:
- Around line 18-29: Update the documentation around the middleware.env secrets
file to explicitly require permission hardening for the EnvironmentFile used by
the middleware units. Mention that the file containing PL_JWT_SIGNING_KEY and
PL_DEPLOY_KEY must be owned by root and restricted to chmod 600, and reference
the middleware.env setup and the systemd unit usage so it is clear where to
apply the change.
- Line 21: The shared env file defines PL_CLUSTER_ID, but vizier-kelvin reads
PL_VIZIER_ID, so the startup value is mismatched. Update the environment entry
in middleware.env to use PL_VIZIER_ID consistently so the Kelvin startup path
picks up the registered cluster id; use the existing env stanza near the
cloud-connector assignment to locate the change.
In `@src/experimental/dx-standalone-linux/systemd/middleware/pl-etcd.service`:
- Around line 16-24: The systemd unit currently runs etcd as root without any
sandboxing or user restriction. Update the pl-etcd.service [Service] section to
add a dedicated User=, enable NoNewPrivileges=true and ProtectSystem=strict, and
whitelist the data directory with ReadWritePaths=/var/lib/vizier/etcd. Keep the
existing ExecStart and Restart settings unchanged while adding these hardening
directives alongside them.
In `@src/experimental/dx-standalone-linux/systemd/middleware/pl-nats.service`:
- Around line 20-21: The pl-nats.service startup command exposes NATS on all
interfaces without any authentication, so update the ExecStart in the systemd
unit to restrict the listener to localhost or add required auth flags. Use the
existing pl-nats.service ExecStart entry and make it consistent with the other
middleware units by binding the nats-server process to 127.0.0.1 and, if needed,
adding user/pass or token options.
In
`@src/experimental/dx-standalone-linux/systemd/middleware/vizier-kelvin.service`:
- Around line 18-21: The vizier-kelvin.service unit only orders startup with
vizier-query-broker.service via After=, but it should also declare a Requires=
on that same dependency for consistency with the sibling unit and to make the
relationship explicit. Update the systemd unit near the service declaration so
the Kelvin service both requires and starts after vizier-query-broker.service,
matching the pattern used by vizier-cloud-connector.service.
📒 Files selected for processing (10)
- src/experimental/dx-standalone-linux/README.md
- src/experimental/dx-standalone-linux/compose/docker-compose.yaml
- src/experimental/dx-standalone-linux/inject/inject-referral.sh
- src/experimental/dx-standalone-linux/systemd/middleware/middleware.env
- src/experimental/dx-standalone-linux/systemd/middleware/pl-etcd.service
- src/experimental/dx-standalone-linux/systemd/middleware/pl-nats.service
- src/experimental/dx-standalone-linux/systemd/middleware/vizier-cloud-connector.service
- src/experimental/dx-standalone-linux/systemd/middleware/vizier-kelvin.service
- src/experimental/dx-standalone-linux/systemd/middleware/vizier-metadata.service
- src/experimental/dx-standalone-linux/systemd/middleware/vizier-query-broker.service
scenario="${1:-argocd-render}"
DX="${2:-http://127.0.0.1:9099}"
now="$(date +%s)"
ns="${NS-poc}" # NS="" → no pod scope (VM-native; dx queries the PEM unscoped)
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
sed -n '1,71p' src/experimental/dx-standalone-linux/inject/inject-referral.sh
rg -n 'FromKubescapeRow' -A20 src/vizier -g '!**/node_modules/**'
Repository: k8sstormcenter/pixie
Length of output: 3275
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Map the relevant implementation and callers.
fd -a 'referral.*' src || true
fd -a 'FromKubescapeRow*' src || true
rg -n 'FromKubescapeRow|RuntimeK8sDetails|namespace":"\$ns"|podName' src -g '!**/node_modules/**'
Repository: k8sstormcenter/pixie
Length of output: 18392
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '165,190p' src/experimental/dx-standalone-linux/README.md
echo '---'
fd -a 'extract.go' src/vizier/services/adaptive_export/internal/kubescape
echo '---'
sed -n '1,180p' src/vizier/services/adaptive_export/internal/kubescape/extract.go
Repository: k8sstormcenter/pixie
Length of output: 6010
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '165,190p' src/experimental/dx-standalone-linux/README.md
echo '---'
sed -n '1,180p' src/vizier/services/adaptive_export/internal/kubescape/extract.go
Repository: k8sstormcenter/pixie
Length of output: 5921
ns="" must suppress pod scope, not just namespace
RuntimeK8sDetails still carries podName, but the VM/off-k8s path expects that object to be empty to query the PEM unscoped. Gate the whole K8s payload on NS (or clear podName too) when NS is empty.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/experimental/dx-standalone-linux/inject/inject-referral.sh` at line 31,
The VM/off-k8s path is still carrying pod-scoped K8s data when NS is empty, so
the PEM query remains scoped. Update the inject-referral flow in
inject-referral.sh so the RuntimeK8sDetails payload is fully suppressed or has
podName cleared when NS is empty, rather than only omitting the namespace. Make
the conditional around the K8s payload generation use NS as the gate and ensure
the unscoped path sends an empty RuntimeK8sDetails object.
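The gating described above can be sketched as a small helper that builds the K8s details object. Only the field names `namespace` and `podName` and the `NS=""` convention come from this thread; `k8s_details` and its pod argument are hypothetical, and the sketch assumes the values contain no characters that need JSON escaping.

```shell
#!/usr/bin/env bash
# Sketch only: emit the RuntimeK8sDetails object, gated on the namespace.
# k8s_details is a hypothetical helper, not part of inject-referral.sh.
k8s_details() {
  local ns="$1" pod="$2"
  if [ -z "$ns" ]; then
    printf '{}'   # VM-native: no namespace and no podName, so the query stays unscoped
  else
    printf '{"namespace":"%s","podName":"%s"}' "$ns" "$pod"
  fi
}
```

Gating the whole object on the namespace, rather than blanking individual fields, keeps the unscoped path from ever carrying a stray `podName`.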
PL_CLOUD_ADDR=withpixie.ai:443
PL_JWT_SIGNING_KEY=CHANGE_ME_shared_hs256_key # same key across ALL vizier units
PL_DEPLOY_KEY=CHANGE_ME_from_px_deploy-key_create
PL_CLUSTER_ID= # assigned by cloud-connector on first RegisterVizier
PL_VIZIER_NAME=dx-standalone-vm
PL_POD_NAMESPACE=pl # a label only (no k8s); keeps PxL/UI names stable
PL_MD_ETCD_SERVER=http://127.0.0.1:2379
PL_ETCD_OPERATOR_ENABLED=false
PL_HOST_PATH=/
PL_HOST_IP=127.0.0.1
PL_POD_IP_ADDRESS=127.0.0.1
PL_DATA_ACCESS=Full
🔒 Security & Privacy | 🔵 Trivial | ⚡ Quick win
Secrets file should be permission-restricted.
PL_JWT_SIGNING_KEY and PL_DEPLOY_KEY are loaded via EnvironmentFile in every middleware unit. The README/runbook should note that /etc/vizier/middleware.env must be chmod 600 root:root since it's world-readable by default when copied, exposing the shared HS256 signing key and deploy key to any local user.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/experimental/dx-standalone-linux/systemd/middleware/middleware.env`
around lines 18 - 29, Update the documentation around the middleware.env secrets
file to explicitly require permission hardening for the EnvironmentFile used by
the middleware units. Mention that the file containing PL_JWT_SIGNING_KEY and
PL_DEPLOY_KEY must be owned by root and restricted to chmod 600, and reference
the middleware.env setup and the systemd unit usage so it is clear where to
apply the change.
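As an illustration of the hardening requested above, a runbook step could install the file with restrictive permissions and then verify the mode. `check_env_mode` is a hypothetical guard, and `stat -c` is the GNU coreutils form, so this is a sketch for the Linux hosts discussed here rather than a portable check.

```shell
#!/usr/bin/env bash
# Sketch only: refuse an env file whose mode is looser than 600.
# check_env_mode is a hypothetical helper; stat -c is GNU coreutils syntax.
check_env_mode() {
  local f="$1" mode
  mode="$(stat -c '%a' "$f")" || return 1
  if [ "$mode" != 600 ]; then
    echo "FAIL: $f is mode $mode, expected 600" >&2
    return 1
  fi
}
```

A typical sequence, assuming the file is copied from the repository tree, would be `sudo install -m 600 -o root -g root systemd/middleware/middleware.env /etc/vizier/middleware.env` followed by `check_env_mode /etc/vizier/middleware.env`.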
PL_CLOUD_ADDR=withpixie.ai:443
PL_JWT_SIGNING_KEY=CHANGE_ME_shared_hs256_key # same key across ALL vizier units
PL_DEPLOY_KEY=CHANGE_ME_from_px_deploy-key_create
PL_CLUSTER_ID= # assigned by cloud-connector on first RegisterVizier
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
rg -n 'PL_VIZIER_ID|PL_CLUSTER_ID' -g '!*.md'
Repository: k8sstormcenter/pixie
Length of output: 158
🏁 Script executed:
#!/bin/bash
set -euo pipefail
printf '\n## Files of interest\n'
git ls-files 'src/experimental/dx-standalone-linux/systemd/**' 'src/**/middleware.env' 'src/**/kelvin*' 'src/**/query-broker*' 'src/**/cloud-connector*' | sed 's#^`#-` #'
printf '\n## Search for relevant env vars and service names\n'
rg -n --hidden --no-ignore-vcs 'PL_VIZIER_ID|PL_CLUSTER_ID|vizier-kelvin|query-broker|cloud-connector|EnvironmentFile' src/experimental/dx-standalone-linux/systemd src -g '!*.md' || true
printf '\n## Outline candidate files\n'
for f in $(git ls-files 'src/experimental/dx-standalone-linux/systemd/**' | head -n 20); do
echo
echo "### $f"
ast-grep outline "$f" --view expanded 2>/dev/null || true
done
Repository: k8sstormcenter/pixie
Length of output: 18373
🏁 Script executed:
#!/bin/bash
set -euo pipefail
for f in \
src/experimental/dx-standalone-linux/systemd/middleware/middleware.env \
src/experimental/dx-standalone-linux/systemd/middleware/vizier-cloud-connector.service \
src/experimental/dx-standalone-linux/systemd/middleware/vizier-kelvin.service \
src/experimental/dx-standalone-linux/systemd/middleware/vizier-query-broker.service \
src/experimental/dx-standalone-linux/systemd/middleware/README.md \
src/vizier/services/agent/shared/manager/manager.cc
do
echo
echo "===== $f ====="
wc -l "$f"
cat -n "$f" | sed -n '1,220p'
done
Repository: k8sstormcenter/pixie
Length of output: 18077
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n --hidden --no-ignore-vcs 'vizier_id|PL_VIZIER_ID|flags\.vizier_id|gflags::StringFromEnv\("PL_VIZIER_ID"' src/vizier/services src/experimental/dx-standalone-linux -g '!*.md'
printf '\n## Related config/docs references\n'
rg -n --hidden --no-ignore-vcs 'PL_CLUSTER_ID|PL_VIZIER_ID|cluster id|vizier id' src/experimental/dx-standalone-linux/systemd src/vizier/services -g '!*.md'
Repository: k8sstormcenter/pixie
Length of output: 1578
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '280,305p' src/vizier/services/agent/shared/manager/manager.cc | cat -n
Repository: k8sstormcenter/pixie
Length of output: 1496
Set PL_VIZIER_ID in the shared env file. vizier-kelvin reads PL_VIZIER_ID, but middleware.env only defines PL_CLUSTER_ID. Make the name consistent so Kelvin starts with the registered cluster id.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/experimental/dx-standalone-linux/systemd/middleware/middleware.env` at
line 21, The shared env file defines PL_CLUSTER_ID, but vizier-kelvin reads
PL_VIZIER_ID, so the startup value is mismatched. Update the environment entry
in middleware.env to use PL_VIZIER_ID consistently so the Kelvin startup path
picks up the registered cluster id; use the existing env stanza near the
cloud-connector assignment to locate the change.
[Unit]
Description=pl-etcd — metadata store for vizier-metadata (replaces the k8s etcd operator)
After=network-online.target
[Service]
ExecStart=/usr/local/bin/etcd --listen-client-urls=http://127.0.0.1:2379 --advertise-client-urls=http://127.0.0.1:2379 --data-dir=/var/lib/vizier/etcd
Restart=on-failure
RestartSec=2
[Install]
WantedBy=multi-user.target
🔒 Security & Privacy | 🔵 Trivial | ⚡ Quick win
No sandboxing/User directive; runs as root.
Consider adding basic systemd hardening (User=, ProtectSystem=strict, NoNewPrivileges=true, ReadWritePaths=/var/lib/vizier/etcd) since this is a PoC etcd instance handling cluster metadata state. Low effort given these are additive directives.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/experimental/dx-standalone-linux/systemd/middleware/pl-etcd.service`
around lines 16 - 24, The systemd unit currently runs etcd as root without any
sandboxing or user restriction. Update the pl-etcd.service [Service] section to
add a dedicated User=, enable NoNewPrivileges=true and ProtectSystem=strict, and
whitelist the data directory with ReadWritePaths=/var/lib/vizier/etcd. Keep the
existing ExecStart and Restart settings unchanged while adding these hardening
directives alongside them.
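One way to add the directives named above without touching the existing `ExecStart` and `Restart` lines is a systemd drop-in. The sketch below only writes the file; `write_etcd_hardening`, the drop-in filename, and the `etcd` account name are assumptions, and the account would have to exist and own `/var/lib/vizier/etcd` for the unit to start.

```shell
#!/usr/bin/env bash
# Sketch only: write a drop-in carrying the hardening directives from the review.
# write_etcd_hardening, 10-hardening.conf and the default "etcd" user are assumptions.
write_etcd_hardening() {
  local dir="$1" user="${2:-etcd}"
  mkdir -p "$dir"
  cat >"$dir/10-hardening.conf" <<EOF
[Service]
User=$user
NoNewPrivileges=true
ProtectSystem=strict
ReadWritePaths=/var/lib/vizier/etcd
EOF
}
```

Run as root against `/etc/systemd/system/pl-etcd.service.d` and followed by `systemctl daemon-reload`, this would layer the directives on top of the unit as shipped.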
[Service]
ExecStart=/usr/local/bin/nats-server -p 4222 -m 8222
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
NATS server started without authentication or bind restriction.
nats-server -p 4222 -m 8222 binds to all interfaces by default with no client authentication configured. Combined with the standalone-VM's host-network setup elsewhere in this stack, the message bus carrying agent/query traffic would be reachable and unauthenticated from any network the VM is on. Bind to 127.0.0.1 (-a 127.0.0.1) or add --user/--pass/token auth, matching the localhost-only intent of the other middleware units.
🔒 Proposed fix
-ExecStart=/usr/local/bin/nats-server -p 4222 -m 8222
+ExecStart=/usr/local/bin/nats-server -a 127.0.0.1 -p 4222 -m 8222
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/experimental/dx-standalone-linux/systemd/middleware/pl-nats.service`
around lines 20 - 21, The pl-nats.service startup command exposes NATS on all
interfaces without any authentication, so update the ExecStart in the systemd
unit to restrict the listener to localhost or add required auth flags. Use the
existing pl-nats.service ExecStart entry and make it consistent with the other
middleware units by binding the nats-server process to 127.0.0.1 and, if needed,
adding user/pass or token options.
After=pl-nats.service vizier-query-broker.service
[Service]
EnvironmentFile=/etc/vizier/middleware.env
# PL_VIZIER_ID must match the cluster id cloud-connector registered.
📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value
Requires= missing for vizier-query-broker.service.
After= orders Kelvin after the query broker, but there's no Requires=, unlike vizier-cloud-connector.service which does Requires=vizier-query-broker.service. Given Restart=on-failure this self-heals, but for consistency with the sibling unit it may be worth requiring it explicitly.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@src/experimental/dx-standalone-linux/systemd/middleware/vizier-kelvin.service`
around lines 18 - 21, The vizier-kelvin.service unit only orders startup with
vizier-query-broker.service via After=, but it should also declare a Requires=
on that same dependency for consistency with the sibling unit and to make the
relationship explicit. Update the systemd unit near the service declaration so
the Kelvin service both requires and starts after vizier-query-broker.service,
matching the pattern used by vizier-cloud-connector.service.
… published PEM binary)

The PEM is published as a native bazel-runfiles bundle (entlein/dx release asset); the unit now execs /usr/local/bin/standalone-pem (launcher over the runfiles binary): native process, no container, no /sys:rw. See deploy/INSTALL-VM.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017k7uYSNUctQvkTAZYJbaB3
…-assets regression guard

dx-agent (entlein/dx#119) on Ubuntu 24.04:
- #2: dx-standalone.env had 'DX_CATALOG_DIR=/etc/dx/catalog # ...'; systemd EnvironmentFile keeps the inline comment IN the value (daemon logged dir='/etc/dx/catalog # ...'). Moved every comment to its own line + a warning header.
- test/verify-assets.sh: regression guard. CI mode asserts no inline comments in the .env + the launcher ExecStart; asset mode validates a downloaded bundle (#1 top symlink relative/resolves, #4 packaged headers present). All PASS.

standalone-pem.service already execs the /usr/local/bin/standalone-pem launcher (native), so #1's dangling top symlink didn't affect the unit, but the tarball is repackaged with a relative symlink anyway (entlein/dx release asset).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_017k7uYSNUctQvkTAZYJbaB3
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/experimental/dx-standalone-linux/test/verify-assets.sh`:
- Around line 43-45: The shellcheck warning is about two robustness issues in
the verification script: the A && B || C pattern in the service checks and the
use of ls with a glob for the header tarball lookup. Update the logic in
verify-assets.sh around the standalone-pem.service and header archive checks to
use explicit if/then/else branching (or equivalent) instead of chained &&/||,
and replace the ls-based glob handling with a safer file discovery approach that
does not depend on parsing ls output.
📒 Files selected for processing (3)
- src/experimental/dx-standalone-linux/systemd/dx-standalone.env
- src/experimental/dx-standalone-linux/systemd/standalone-pem.service
- src/experimental/dx-standalone-linux/test/verify-assets.sh
grep -q 'ExecStart=/usr/local/bin/standalone-pem' "$SYSD/standalone-pem.service" \
  && ok "standalone-pem.service execs the launcher" \
  || no "standalone-pem.service ExecStart not the launcher"
📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value
Minor shellcheck hints: &&/|| pitfall and ls glob.
Static analysis flags the A && B || C idiom at Lines 43-45 and 65 (SC2015) and ls glob usage at Line 64 (SC2012). Practical risk is low here (ok/no are simple echoes; header tarball names follow a fixed pattern), but consider tightening for robustness.
♻️ Optional cleanup
-grep -q 'ExecStart=/usr/local/bin/standalone-pem' "$SYSD/standalone-pem.service" \
-  && ok "standalone-pem.service execs the launcher" \
-  || no "standalone-pem.service ExecStart not the launcher"
+if grep -q 'ExecStart=/usr/local/bin/standalone-pem' "$SYSD/standalone-pem.service"; then
+  ok "standalone-pem.service execs the launcher"
+else
+  no "standalone-pem.service ExecStart not the launcher"
+fi

-  n=$(ls "$PXDIR"/linux-headers-x86_64-*.tar.gz 2>/dev/null | wc -l)
-  [ "$n" -gt 0 ] && ok "$n packaged-header tarballs at $PXDIR" || no "no /px/linux-headers-x86_64-*.tar.gz"
+  n=$(find "$PXDIR" -maxdepth 1 -name 'linux-headers-x86_64-*.tar.gz' 2>/dev/null | wc -l)
+  if [ "$n" -gt 0 ]; then
+    ok "$n packaged-header tarballs at $PXDIR"
+  else
+    no "no /px/linux-headers-x86_64-*.tar.gz"
+  fi
Also applies to: 64-65
🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 44-44: Note that A && B || C is not if-then-else. C may run when A is true.
(SC2015)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/experimental/dx-standalone-linux/test/verify-assets.sh` around lines 43 -
45, The shellcheck warning is about two robustness issues in the verification
script: the A && B || C pattern in the service checks and the use of ls with a
glob for the header tarball lookup. Update the logic in verify-assets.sh around
the standalone-pem.service and header archive checks to use explicit
if/then/else branching (or equivalent) instead of chained &&/||, and replace the
ls-based glob handling with a safer file discovery approach that does not depend
on parsing ls output.
Source: Linters/SAST tools
Validation: systemd/native single-VM path on a naked Ubuntu 24.04 (x86-64)
Ran the systemd / native-binary path (§2b) end-to-end on a fresh VM with no Kubernetes, no cloud, no Docker. Result: ALL GREEN, covering detection and the real eBPF evidence pull. Below is exactly what was installed, the commands, the exact metrics, and a drop-in validation test.
Hit 4 things that fail on a naked VM (not on a dev box with headers/libs); §2b should note them:
Environment
Install (exact commands that differ from / augment §0–§2b)
# deps
sudo apt-get update && sudo apt-get install -y libunwind8 xz-utils
# exact-match kernel headers so socket_tracer eBPF attaches (the fix for #1)
KREL=$(uname -r)
sudo modprobe kheaders # exposes /sys/kernel/kheaders.tar.xz (CONFIG_IKHEADERS)
sudo mkdir -p /usr/src/linux-headers-$KREL
sudo tar xJf /sys/kernel/kheaders.tar.xz -C /usr/src/linux-headers-$KREL
sudo ln -sfn /usr/src/linux-headers-$KREL /lib/modules/$KREL/build # default build symlink dangles
test -f /lib/modules/$KREL/build/include/generated/autoconf.h && echo headers-ok
# PEM launcher must exec the REAL runfiles ELF (top-level symlink dangles — #3)
sudo tar xzf pixie-standalone-pem-*-linux-amd64.tar.gz -C /opt/pixie-pem
PEMBIN=$(sudo find /opt/pixie-pem -path '*runfiles/px/src/experimental/standalone_pem/standalone_pem' -type f | head -1)
printf '#!/usr/bin/env bash\nexec %s "$@"\n' "$PEMBIN" | sudo tee /usr/local/bin/standalone-pem
# env: strip inline comments before writing to /etc/dx (#4)
sed -E 's/^([A-Z_]+=[^#[:space:]]+)[[:space:]]+#.*/\1/' systemd/dx-standalone.env | sudo tee /etc/dx/dx-standalone.env
sudo systemctl enable --now standalone-pem dx-daemon
Tests + exact results
Contrast without the headers fix:
Drop-in validation test (Makefile-runnable)
#!/usr/bin/env bash
# test/validate_ubuntu2404.sh — validate dx + standalone-PEM (systemd/native) on Ubuntu 24.04.
set -uo pipefail
HERE="$(cd "$(dirname "$0")" && pwd)/.."; [ "$(id -u)" = 0 ] || exec sudo -E bash "$0" "$@"
KREL="$(uname -r)"; FAILS=0
pass(){ echo " PASS $*"; }; fail(){ echo " FAIL $*"; FAILS=$((FAILS+1)); }
metric(){ curl -s localhost:9095/metrics | awk -v k="$1" '$1==k{print $2}'; }
[ "$(uname -m)" = x86_64 ] && pass "arch x86_64" || fail "arch"
test -f /sys/kernel/btf/vmlinux && pass "BTF" || fail "BTF"
export DEBIAN_FRONTEND=noninteractive; apt-get install -y -qq libunwind8 xz-utils
ldconfig -p | grep -q libunwind.so.8 && pass libunwind8 || fail libunwind8
if [ ! -f "/lib/modules/$KREL/build/include/generated/autoconf.h" ]; then
modprobe kheaders 2>/dev/null || true
if [ -f /sys/kernel/kheaders.tar.xz ]; then D="/usr/src/linux-headers-$KREL"; rm -rf "$D"; mkdir -p "$D"
tar xJf /sys/kernel/kheaders.tar.xz -C "$D"; ln -sfn "$D" "/lib/modules/$KREL/build"; fi
fi
test -f "/lib/modules/$KREL/build/include/generated/autoconf.h" && pass headers || fail "no kernel headers"
# … installs PEM (launcher → real runfiles ELF) + dx-daemon + systemd units (env comments stripped),
# restarts standalone-pem, settles 45s for BCC compile, restarts dx-daemon for clean counters …
systemctl restart standalone-pem; sleep 45
systemctl restart dx-daemon
for i in $(seq 1 20); do curl -sf localhost:9099/healthz >/dev/null 2>&1 && break; sleep 0.5; done
[ "$(systemctl is-active standalone-pem dx-daemon | tr '\n' ' ')" = "active active " ] && pass "units active" || fail "units"
NS="" bash "$HERE/inject/inject-referral.sh" argocd-render http://127.0.0.1:9099 >/dev/null; sleep 12
U=$(metric dx_bench_unavailable); E=$(metric dx_bench_errors_total)
R=$(curl -s localhost:9095/metrics | awk '/^dx_verdicts_total\{outcome="ruled_in"/{s+=$2}END{print s+0}')
P=$(curl -s localhost:9095/metrics | awk '/^dx_bench_pull_total\{result="broker_query"/{s+=$2}END{print s+0}')
echo " unavailable=$U errors=$E ruled_in=$R pulls=$P"
[ "$U" = 0 ] && pass "eBPF sighted" || fail "eBPF BLIND"
[ "$E" = 0 ] && pass "no bench errors" || fail "bench errors=$E"
[ "${R:-0}" -ge 1 ] && pass "ruled_in" || fail "no ruled_in"
[ "${P:-0}" -ge 1 ] && pass "real eBPF pulls" || fail "no pulls"
[ "$FAILS" = 0 ] && { echo "ALL PASS"; exit 0; } || { echo "$FAILS FAILED"; exit 1; }
Full run transcript ends with:
A fork-only (copybara-excluded, not for upstream) showcase: run the dx active-diagnosis PoC on a single Linux VM with no Kubernetes, plus the design for keeping Pixie Cloud visualization when the PEM has no vizier middleware. The buildable realization of the dx repo's `docs/DEPLOYMENT_ALTERNATIVES.md` Q2.

Lives in `src/experimental/dx-standalone-linux/`; added to `fork_only_files` in `tools/private/copybara/copy.bara.sky`.

What runs
`standalone_pem` (eBPF + ExecuteScript :12345) ← `dx-daemon` (`DX_BENCH=pxdirect`): detection is fully node-local, no cloud. Referrals on a VM come from `inject/inject-referral.sh` (synthetic enriched-kubescape rows → dx :9099; the shape a real feed must emit; also the template for a PEM→referral synthesizer).

Cloud visualization: the auth problem, three answers (README §4)
Pixie Cloud reaches a cluster via cloud-connector → vzconn mTLS tunnel; `standalone_pem` has none of it, so cloud can't see it. Options:
- `RegisterVizier` → cluster cert, then proxies cloud-tunneled `ExecuteScript` to the local PEM. Collapses cloud-connector+query-broker into one binary (single node → no Kelvin). `cloudshim/` is a compiling skeleton with the two integration points marked against the real pixie packages (`src/cloud/vzconn`, `cloud_connector/bridge/vzconn_client.go`, `pxapi`).
- `upid→pod`) runs degraded off-k8s (static host/cgroup map). Unit templates + the caveat in `systemd/middleware/`.
- `/metrics` + verdict log + manifests.

`upid_to_pod_name` returns empty off-k8s; the UI shows process/cgroup identity unless a static map is supplied.

Contents
`README.md` (architecture + cloud design), `systemd/` (pem, dx, shim + Option-B templates), `compose/`, `inject/`, `test/smoke.sh` + `nfr.sh` (single-VM smoke + NFR: time-to-verdict p50/p95, throughput, drops, cache hit-rate, RSS/CPU), `cloudshim/`.

Validation (local)
- 0.3.0-integration4) admits the injected R0001 spawn and produces `generic=MALIGNANT{invasion}` on a pure Linux host: no k8s, no cloud, no PEM (event-driven detection; the network-evidence pull + catalog rule-in need a real PEM, rig-gated for eBPF).
- `cloudshim` builds + `golangci-lint` 0 issues; scripts pass `shellcheck`; compose passes `yamllint`; all source carries the Apache header.

Rig-gated (not validatable here): the real eBPF PEM path and the vzconn cloud handshake (need a privileged CO-RE kernel + a live Pixie Cloud).