Add cloud-init log collection to CI artifacts#383
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds Landing Zone cloud-init diagnostics (SSH) and a non-SSH fallback that collects local libvirt/QEMU console and VM state; provisioning now persists the VM serial console to /var/log/libvirt/qemu/${LZ_VM_NAME}-console.log. Also introduces a VM-name helper and switches wget downloads to curl; adds jq to prerequisites. ChangesLanding Zone diagnostics and infra updates
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/verification/collect_ci_artifacts.sh`:
- Around line 827-851: Derive the landing zone VM name from the environment
metadata first instead of always using the hardcoded pattern: replace the
lz_vm_name assignment (currently
lz_vm_name="${ENCLAVE_CLUSTER_NAME:-enclave-test}_landingzone_0") with logic
that first attempts to read the landing zone VM name from your environment
metadata (e.g., the .vms.landing_zone.name key via the existing metadata source
or a helper/JSON parse), then fall back to
"${ENCLAVE_CLUSTER_NAME}_landingzone_0" and finally to
"enclave-test_landingzone_0"; update uses of lz_vm_name (the serial console,
domain_log and virsh calls) to rely on this derived value so fallback collection
hits the actual libvirt VM when the metadata specifies a different name.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 719d78c8-5730-4541-8b5f-cfc4e48f1c20
📒 Files selected for processing (2)
scripts/infrastructure/provision_landing_zone.shscripts/verification/collect_ci_artifacts.sh
1c6b5d0 to
5c572be
Compare
|
artifacts are properly and cloud-init logs can be seen in console logs. Take a look via |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/infrastructure/provision_landing_zone.sh`:
- Around line 117-118: The current pipeline uses "subscription-manager repos
--list | awk '/Repo ID:/{print $3}' | xargs -r subscription-manager repos
--enable" which can pass multiple REPO_IDs after a single "--enable"; change it
so each repo ID is enabled with its own "--enable" flag (for example replace
xargs invocation with "xargs -r -I{} subscription-manager repos --enable {}" or
an equivalent loop) and stop masking failures with "|| true" so the script fails
if enabling AppStream fails before running "dnf install -y git". Ensure you
modify the command that produces repo IDs (the awk pipeline) and the xargs/loop
that calls "subscription-manager repos --enable" and remove the "|| true" that
hides errors.
In `@scripts/verification/collect_ci_artifacts.sh`:
- Around line 419-423: The current command substitution for env_file can fail
under set -euo pipefail when no environment*.json exists; change the lookup so
it never causes a non-zero exit (e.g., replace the fragile ls substitution with
a safe finder that returns empty instead of failing). Specifically update the
env_file assignment (the env_file variable lookup that searches
environment*.json under WORKING_DIR) to use a non-failing method such as using
find (find "$WORKING_DIR" -maxdepth 1 -name 'environment*.json' | head -1) or
append a "|| true" to the ls pipeline so env_file is empty rather than causing
the script to abort, keeping the subsequent if [ -f "$env_file" ] logic intact.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a0734e56-a25b-4d4f-ae7d-e0ceb90ffc2d
📒 Files selected for processing (4)
scripts/infrastructure/provision_landing_zone.shscripts/runners/install_runner_ci_requirements.shscripts/verification/collect_ci_artifacts.shsetup_env.sh
bc05052 to
38b0e91
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
scripts/infrastructure/provision_landing_zone.sh (1)
91-91: ⚡ Quick winAdd retries to cloud image download for CI resilience.
Line 91 currently fails on transient network blips; adding bounded retries makes provisioning significantly more stable.
Proposed change
- curl -fL --progress-bar -o "${LZ_WORKING_DIR}/${CLOUD_IMAGE_NAME}" "$CLOUD_IMAGE_URL" + curl -fL --progress-bar \ + --retry 5 --retry-delay 2 --retry-all-errors \ + -o "${LZ_WORKING_DIR}/${CLOUD_IMAGE_NAME}" "$CLOUD_IMAGE_URL"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/infrastructure/provision_landing_zone.sh` at line 91, The curl download of the cloud image is brittle on transient network errors; update the curl invocation in scripts/infrastructure/provision_landing_zone.sh (the line using curl -fL -o "${LZ_WORKING_DIR}/${CLOUD_IMAGE_NAME}" "$CLOUD_IMAGE_URL") to perform bounded retries and backoff: add curl retry flags (for example --retry <n> --retry-delay <s> --retry-connrefused and optionally --retry-max-time) or wrap the curl call in a small retry loop that attempts the download up to a fixed number of times with increasing sleep intervals before failing, and ensure it still fails non-0 if all retries exhaust to preserve existing error behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/runners/install_runner_ci_requirements.sh`:
- Line 109: The current pipeline streams the ShellCheck tarball directly into
tar without integrity checks; change the flow so the asset is first downloaded
to a file named "shellcheck-${SHELLCHECK_VERSION}.linux.x86_64.tar.xz", then
verify its SHA-256 against a pinned EXPECTED_SHA256 (from the matching GitHub
release tag) using sha256sum -c - (or an equivalent check), fail the script on
mismatch, and only after successful verification run tar -xJf on that file;
update any references to the temporary filename and ensure the script exits
non-zero on verification failure.
In `@setup_env.sh`:
- Line 51: The installer invocation in setup_env.sh currently runs
"${AWSCLI_TMP}/aws/install" which is not idempotent across reruns; change the
invocation to call the installer with the --update flag (e.g.,
"${AWSCLI_TMP}/aws/install --update") so repeated runs handle already-installed
or differing AWS CLI v2 versions cleanly and make the script re-runnable.
---
Nitpick comments:
In `@scripts/infrastructure/provision_landing_zone.sh`:
- Line 91: The curl download of the cloud image is brittle on transient network
errors; update the curl invocation in
scripts/infrastructure/provision_landing_zone.sh (the line using curl -fL -o
"${LZ_WORKING_DIR}/${CLOUD_IMAGE_NAME}" "$CLOUD_IMAGE_URL") to perform bounded
retries and backoff: add curl retry flags (for example --retry <n> --retry-delay
<s> --retry-connrefused and optionally --retry-max-time) or wrap the curl call
in a small retry loop that attempts the download up to a fixed number of times
with increasing sleep intervals before failing, and ensure it still fails non-0
if all retries exhaust to preserve existing error behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 9582661d-0086-44b9-bbe3-88b445b0e0fe
📒 Files selected for processing (3)
scripts/infrastructure/provision_landing_zone.shscripts/runners/install_runner_ci_requirements.shsetup_env.sh
2b462f5 to
34cfa12
Compare
Configure Landing Zone VM to log serial console output to a file during provisioning. When SSH fails, collect this console log which contains boot messages, cloud-init logs, and error output. The serial console log captures: - Boot sequence and kernel messages - cloud-init phases and their output - SSH daemon startup - Any errors that prevent SSH from working This enables debugging provisioning failures without requiring SSH access or libguestfs-tools. Signed-off-by: Rafa Porres Molina <rporresm@redhat.com> Assisted-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Collect cloud-init logs from Landing Zone VM to aid debugging VM provisioning issues (network config, user setup, package installation, SSH key injection).
Captures journal entries, status output, and result JSON showing success/failure of cloud-init phases.
Summary by CodeRabbit