Skip to content

Add cloud-init log collection to CI artifacts#383

Merged
rporres merged 1 commit into
mainfrom
cloud-init-logs
May 27, 2026
Merged

Add cloud-init log collection to CI artifacts#383
rporres merged 1 commit into
mainfrom
cloud-init-logs

Conversation

@rporres
Copy link
Copy Markdown
Contributor

@rporres rporres commented May 21, 2026

Collect cloud-init logs from Landing Zone VM to aid debugging VM provisioning issues (network config, user setup, package installation, SSH key injection).

Captures journal entries, status output, and result JSON showing success/failure of cloud-init phases.

Summary by CodeRabbit

  • Chores
    • Capture cloud-init status, unit logs, and result output from the landing zone for diagnostics.
    • Persist VM serial console to per-VM log files for easier post‑mortems.
    • Installer/provisioner favors curl over wget, enables AppStream/repos when needed, installs git via package steps, adds jq/unzip, and installs AWS CLI v2 via the official installer. Adjusted filesystem ownership handling and package list updates.
  • Bug Fixes / Reliability
    • Added non‑SSH fallback to collect VM console logs and VM state locally when IP discovery or SSH fail.

Review Change Stack

@github-actions github-actions Bot added the validation Validation and testing label May 21, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 21, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds Landing Zone cloud-init diagnostics (SSH) and a non-SSH fallback that collects local libvirt/QEMU console and VM state; provisioning now persists the VM serial console to /var/log/libvirt/qemu/${LZ_VM_NAME}-console.log. Also introduces a VM-name helper and switches wget downloads to curl; adds jq to prerequisites.

Changes

Landing Zone diagnostics and infra updates

Layer / File(s) Summary
Provisioning: quoting, curl download, cloud-init runcmd, serial logging
scripts/infrastructure/provision_landing_zone.sh
Quotes chown args, switches cloud image download to curl -fL, replaces packages: with runcmd that enables AppStream (via subscription-manager) and installs git, and adds --serial file,path=/var/log/libvirt/qemu/${LZ_VM_NAME}-console.log to virt-install.
VM name helper and IP discovery
scripts/verification/collect_ci_artifacts.sh
Adds get_landing_zone_vm_name() to derive the Landing Zone VM name from the latest environment*.json (with a constructed fallback) and uses it in get_landing_zone_ip before calling virsh domifaddr.
Landing Zone cloud-init diagnostics collection
scripts/verification/collect_ci_artifacts.sh
Adds collect_lz_cloud_init(lz_ip) which SSHes to the Landing Zone to record cloud-init status, selected journalctl logs for cloud-init units, and /run/cloud-init/result.json (or a missing-file note) into landing-zone/cloud-init.txt. Integrated into collect_deployment after DNS collection.
Non-SSH fallback and deployment integration
scripts/verification/collect_ci_artifacts.sh
Adds collect_lz_fallback_logs() and a fallback branch in collect_deployment that, when Landing Zone IP discovery or SSH fails, collects local libvirt QEMU console and domain logs plus VM state into the landing-zone/ output directory.
Runner installer and environment prerequisites
scripts/runners/install_runner_ci_requirements.sh, setup_env.sh
Removes wget from the dnf package list, replaces wget download pipelines with `curl -fsSL ...

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly reflects the primary change in the changeset: adding cloud-init log collection functionality to the CI artifact collection script.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch cloud-init-logs

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@rporres rporres force-pushed the cloud-init-logs branch from ea73dea to c02f96b Compare May 21, 2026 10:00
@rporres rporres changed the title Add cloud-init log collection to CI artifacts [WIP] Add cloud-init log collection to CI artifacts May 21, 2026
@rporres rporres force-pushed the cloud-init-logs branch from c02f96b to 745f43b Compare May 21, 2026 10:38
@github-actions github-actions Bot added the infrastructure Infrastructure setup (VMs, networks) label May 21, 2026
@rporres rporres force-pushed the cloud-init-logs branch from 745f43b to c7b5c4a Compare May 21, 2026 10:54
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/verification/collect_ci_artifacts.sh`:
- Around line 827-851: Derive the landing zone VM name from the environment
metadata first instead of always using the hardcoded pattern: replace the
lz_vm_name assignment (currently
lz_vm_name="${ENCLAVE_CLUSTER_NAME:-enclave-test}_landingzone_0") with logic
that first attempts to read the landing zone VM name from your environment
metadata (e.g., the .vms.landing_zone.name key via the existing metadata source
or a helper/JSON parse), then fall back to
"${ENCLAVE_CLUSTER_NAME}_landingzone_0" and finally to
"enclave-test_landingzone_0"; update uses of lz_vm_name (the serial console,
domain_log and virsh calls) to rely on this derived value so fallback collection
hits the actual libvirt VM when the metadata specifies a different name.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 719d78c8-5730-4541-8b5f-cfc4e48f1c20

📥 Commits

Reviewing files that changed from the base of the PR and between ea73dea and c7b5c4a.

📒 Files selected for processing (2)
  • scripts/infrastructure/provision_landing_zone.sh
  • scripts/verification/collect_ci_artifacts.sh

Comment thread scripts/verification/collect_ci_artifacts.sh Outdated
@rporres rporres force-pushed the cloud-init-logs branch 2 times, most recently from 1c6b5d0 to 5c572be Compare May 21, 2026 11:36
@rporres
Copy link
Copy Markdown
Contributor Author

rporres commented May 21, 2026

artifacts are properly and cloud-init logs can be seen in console logs. Take a look via gh run download 26223515872 inside an enclave repo

@rporres rporres changed the title [WIP] Add cloud-init log collection to CI artifacts Add cloud-init log collection to CI artifacts May 21, 2026
@rporres rporres force-pushed the cloud-init-logs branch from 5c572be to 31c6d75 Compare May 25, 2026 13:10
@github-actions github-actions Bot added the ci-cd CI/CD infrastructure label May 25, 2026
@rporres rporres force-pushed the cloud-init-logs branch from b79ba76 to 4d3e898 Compare May 25, 2026 14:02
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/infrastructure/provision_landing_zone.sh`:
- Around line 117-118: The current pipeline uses "subscription-manager repos
--list | awk '/Repo ID:/{print $3}' | xargs -r subscription-manager repos
--enable" which can pass multiple REPO_IDs after a single "--enable"; change it
so each repo ID is enabled with its own "--enable" flag (for example replace
xargs invocation with "xargs -r -I{} subscription-manager repos --enable {}" or
an equivalent loop) and stop masking failures with "|| true" so the script fails
if enabling AppStream fails before running "dnf install -y git". Ensure you
modify the command that produces repo IDs (the awk pipeline) and the xargs/loop
that calls "subscription-manager repos --enable" and remove the "|| true" that
hides errors.

In `@scripts/verification/collect_ci_artifacts.sh`:
- Around line 419-423: The current command substitution for env_file can fail
under set -euo pipefail when no environment*.json exists; change the lookup so
it never causes a non-zero exit (e.g., replace the fragile ls substitution with
a safe finder that returns empty instead of failing). Specifically update the
env_file assignment (the env_file variable lookup that searches
environment*.json under WORKING_DIR) to use a non-failing method such as using
find (find "$WORKING_DIR" -maxdepth 1 -name 'environment*.json' | head -1) or
append a "|| true" to the ls pipeline so env_file is empty rather than causing
the script to abort, keeping the subsequent if [ -f "$env_file" ] logic intact.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a0734e56-a25b-4d4f-ae7d-e0ceb90ffc2d

📥 Commits

Reviewing files that changed from the base of the PR and between c7b5c4a and 4d3e898.

📒 Files selected for processing (4)
  • scripts/infrastructure/provision_landing_zone.sh
  • scripts/runners/install_runner_ci_requirements.sh
  • scripts/verification/collect_ci_artifacts.sh
  • setup_env.sh

Comment thread scripts/infrastructure/provision_landing_zone.sh Outdated
Comment thread scripts/verification/collect_ci_artifacts.sh
@rporres rporres force-pushed the cloud-init-logs branch 2 times, most recently from bc05052 to 38b0e91 Compare May 25, 2026 15:39
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
scripts/infrastructure/provision_landing_zone.sh (1)

91-91: ⚡ Quick win

Add retries to cloud image download for CI resilience.

Line 91 currently fails on transient network blips; adding bounded retries makes provisioning significantly more stable.

Proposed change
-    curl -fL --progress-bar -o "${LZ_WORKING_DIR}/${CLOUD_IMAGE_NAME}" "$CLOUD_IMAGE_URL"
+    curl -fL --progress-bar \
+      --retry 5 --retry-delay 2 --retry-all-errors \
+      -o "${LZ_WORKING_DIR}/${CLOUD_IMAGE_NAME}" "$CLOUD_IMAGE_URL"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/infrastructure/provision_landing_zone.sh` at line 91, The curl
download of the cloud image is brittle on transient network errors; update the
curl invocation in scripts/infrastructure/provision_landing_zone.sh (the line
using curl -fL -o "${LZ_WORKING_DIR}/${CLOUD_IMAGE_NAME}" "$CLOUD_IMAGE_URL") to
perform bounded retries and backoff: add curl retry flags (for example --retry
<n> --retry-delay <s> --retry-connrefused and optionally --retry-max-time) or
wrap the curl call in a small retry loop that attempts the download up to a
fixed number of times with increasing sleep intervals before failing, and ensure
it still fails non-0 if all retries exhaust to preserve existing error behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/runners/install_runner_ci_requirements.sh`:
- Line 109: The current pipeline streams the ShellCheck tarball directly into
tar without integrity checks; change the flow so the asset is first downloaded
to a file named "shellcheck-${SHELLCHECK_VERSION}.linux.x86_64.tar.xz", then
verify its SHA-256 against a pinned EXPECTED_SHA256 (from the matching GitHub
release tag) using sha256sum -c - (or an equivalent check), fail the script on
mismatch, and only after successful verification run tar -xJf on that file;
update any references to the temporary filename and ensure the script exits
non-zero on verification failure.

In `@setup_env.sh`:
- Line 51: The installer invocation in setup_env.sh currently runs
"${AWSCLI_TMP}/aws/install" which is not idempotent across reruns; change the
invocation to call the installer with the --update flag (e.g.,
"${AWSCLI_TMP}/aws/install --update") so repeated runs handle already-installed
or differing AWS CLI v2 versions cleanly and make the script re-runnable.

---

Nitpick comments:
In `@scripts/infrastructure/provision_landing_zone.sh`:
- Line 91: The curl download of the cloud image is brittle on transient network
errors; update the curl invocation in
scripts/infrastructure/provision_landing_zone.sh (the line using curl -fL -o
"${LZ_WORKING_DIR}/${CLOUD_IMAGE_NAME}" "$CLOUD_IMAGE_URL") to perform bounded
retries and backoff: add curl retry flags (for example --retry <n> --retry-delay
<s> --retry-connrefused and optionally --retry-max-time) or wrap the curl call
in a small retry loop that attempts the download up to a fixed number of times
with increasing sleep intervals before failing, and ensure it still fails non-0
if all retries exhaust to preserve existing error behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9582661d-0086-44b9-bbe3-88b445b0e0fe

📥 Commits

Reviewing files that changed from the base of the PR and between 5552816 and 882340d.

📒 Files selected for processing (3)
  • scripts/infrastructure/provision_landing_zone.sh
  • scripts/runners/install_runner_ci_requirements.sh
  • setup_env.sh

Comment thread scripts/runners/install_runner_ci_requirements.sh
Comment thread setup_env.sh
@rporres rporres force-pushed the cloud-init-logs branch from c2edd4f to d01422d Compare May 26, 2026 07:43
@github-actions github-actions Bot added ci-cd CI/CD infrastructure and removed ci-cd CI/CD infrastructure labels May 26, 2026
@rporres rporres force-pushed the cloud-init-logs branch 6 times, most recently from 2b462f5 to 34cfa12 Compare May 26, 2026 10:10
Configure Landing Zone VM to log serial console output to a file
during provisioning. When SSH fails, collect this console log which
contains boot messages, cloud-init logs, and error output.

The serial console log captures:
- Boot sequence and kernel messages
- cloud-init phases and their output
- SSH daemon startup
- Any errors that prevent SSH from working

This enables debugging provisioning failures without requiring SSH
access or libguestfs-tools.

Signed-off-by: Rafa Porres Molina <rporresm@redhat.com>
Assisted-by: Claude Sonnet 4.5 <noreply@anthropic.com>
@rporres rporres force-pushed the cloud-init-logs branch from 34cfa12 to ff26309 Compare May 26, 2026 15:54
@github-actions github-actions Bot removed the ci-cd CI/CD infrastructure label May 26, 2026
@rporres rporres merged commit 99618ad into main May 27, 2026
22 checks passed
@rporres rporres deleted the cloud-init-logs branch May 27, 2026 07:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

infrastructure Infrastructure setup (VMs, networks) validation Validation and testing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants