chore(e2e-next): Fix custom linters for fork PRs#3784
Merged
rlmcpherson
approved these changes
Apr 3, 2026
Piotr1215
approved these changes
Apr 7, 2026
rlmcpherson
added a commit
to rlmcpherson/vcluster
that referenced
this pull request
Apr 7, 2026
…tion * upstream/main: chore(e2e-next): Migrate e2e_cli tests (loft-sh#3797) fix(cli): respect admin override for requireTemplate in vcluster platform create (loft-sh#3725) chore(e2e-next): Fix custom linters for fork PRs (loft-sh#3784) chore(e2e-next): Test refactor ENGPLAT-399 Add --secure flag for TLS verification (loft-sh#3781)
tamalsaha
added a commit
to kluster-manager/vcluster
that referenced
this pull request
May 15, 2026
* chore(deps): bump tj-actions/changed-files from 47.0.0 to 47.0.1 (#3404)
Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 47.0.0 to 47.0.1.
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](https://github.com/tj-actions/changed-files/compare/v47.0.0...v47.0.1)
---
updated-dependencies:
- dependency-name: tj-actions/changed-files
dependency-version: 47.0.1
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* refactor: Added NetworkPolicyIngressRule and NetworkPolicyEgressRule config types (#3396)
* [create-pull-request] automated change (#3421)
* fix: inconsistent etcd snapshots (#3423)
Signed-off-by: Marcin Franczyk <marcin0franczyk@gmail.com>
* Deprecate ingress nginx (#3395)
* upgrade platform version to 4.6.0-alpha.8 (#3425)
* fix: fall back to login flow if config access key is invalid (#3422)
* Cross vcluster apis (#3418)
* ENG-10378 | Cross vCluster APIs (#3389)
* Add vcluster.yaml resource proxy configuration
* Start proxy
* Move proxy config to experimental
* Rename start func
* Vendor in dev platform changes
* Bump admin-apis
* Update config schema (#3406)
* Cleanup vendor modules (#3415)
* Cleanup duplicate config entry, revert admin apis bump (#3416)
* Eng 10546/cleanup duplicate config (#3417)
* Cleanup duplicate config entry, revert admin apis bump
* Update vendor, cleanup not used config struct
* ENG-10546 | Inject custom error responder (#3428)
* Extract StartAPIServer from StartAPIServiceProxy
* Add HandlerWithErrorResponder to allow injection of custom errorResponder
* Log helm command in debug instead of info (#3429)
* Add accessResources to resource proxy config
* chore(deps): bump anchore/sbom-action from 0.20.11 to 0.21.0 (#3435)
Bumps [anchore/sbom-action](https://github.com/anchore/sbom-action) from 0.20.11 to 0.21.0.
- [Release notes](https://github.com/anchore/sbom-action/releases)
- [Changelog](https://github.com/anchore/sbom-action/blob/main/RELEASE.md)
- [Commits](https://github.com/anchore/sbom-action/compare/v0.20.11...v0.21.0)
---
updated-dependencies:
- dependency-name: anchore/sbom-action
dependency-version: 0.21.0
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* ENG-10923 | Add config validation for experimental custom resource proxy (#3436)
* Add config validation for experimental custom resource proxy
* upgrade platform version to 4.6.0-alpha.10 (#3442)
* Revamp README: use-case based structure with new features and architectures
- Restructure README to be use-case based, aligned with vcluster.com homepage
- Add new architectures: Standalone, Auto Nodes, Private Nodes with expandable sections
- Add What's New section highlighting v0.30 (VPN & Netris) and v0.26 (Hybrid Scheduling)
- Update use cases table with correct solution links
- Reimagine Key Features section with better organization
- Add architecture comparison table and expandable architecture details
- Add architecture diagrams (PNG) for all architecture types
- Update social badges: Slack (4.2K+), X/Twitter (3.5K+), LinkedIn (14K+)
- Add Killercoda section for browser-based testing
- Update Trusted By section with correct case study links
- Expandable Conference Talks and Community Voice sections (latest to oldest)
- Simplify contributing section
- Reduce logo size and spacing
- Fix all broken links and update to correct vcluster.com URLs
* Update jsonschema regex for target vc and make it required (#3443)
* Add optional project to targetVirtualCluster ref (#3447)
* chore(backport): create prs with conflict markers for visibility (#3437)
Backport failures previously aborted silently, requiring log diving
to understand which files conflicted. With commitConflicts enabled,
PRs are created with conflict markers visible in the diff, letting
reviewers gauge resolution effort at a glance.
Addresses OPS-461
* fix(ci): duplicate comments prevented via issue id deduplication (#3449)
Linear sync created duplicate comments on issues when same ID appeared
in both PR body and branch name. Example: ENG-8061 got two identical
"Now available in stable release v0.30.4" comments 1 second apart.
Root cause: IssueIDs() extracted from both PR body AND branch name,
returning duplicates when both contained the same issue reference.
This was exposed by commit e48040042 which added stable release comments
for already-released issues - before that, duplicates were silently
skipped because issue was already in "Released" state.
Resolves OPS-460
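The deduplication described above can be sketched roughly as follows. This is a minimal illustration, not the actual linear-sync code: `uniqueIDs` is a hypothetical helper name, and the inputs stand in for IDs extracted from a PR body and a branch name.

```go
package main

import "fmt"

// uniqueIDs is a hypothetical helper: it drops repeated issue IDs while
// keeping the order in which they were first seen.
func uniqueIDs(ids []string) []string {
	seen := make(map[string]struct{}, len(ids))
	out := make([]string, 0, len(ids))
	for _, id := range ids {
		if _, dup := seen[id]; dup {
			continue
		}
		seen[id] = struct{}{}
		out = append(out, id)
	}
	return out
}

func main() {
	// The same issue referenced in both the PR body and the branch name.
	fromBody := []string{"ENG-8061"}
	fromBranch := []string{"ENG-8061"}
	all := append(fromBody, fromBranch...)

	// Only one entry survives, so only one comment would be posted.
	fmt.Println(uniqueIDs(all))
}
```

Keeping first-seen order means the result is deterministic for a given PR, which makes the behavior easy to cover in unit tests.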
* feat: Ensure that spec.resources for a pod is supported on the host cluster before syncing it (#3440)
* feat: add vCluster docker driver (#3460)
* fix: add missing slash after port in registry proxy URL replacement (#3434)
* fix: add missing slash after port in registry proxy URL replacement
* feat: Refactor helper function and add tests
* feat: add docker registry proxy (#3465)
* fix: remove dead code in podsyncer (#3462)
* fix(assets scripts): support multiarch images (#3454)
Fixes an issue when a single architecture image was pushed to the private registry
* feat: Add hostAliases to the controlPlane statefulSet pod configuration (#3432)
* ci(lint): verify go mod tidy and vendor are in sync (#3444)
PRs occasionally merged with out-of-sync vendor directories, causing
build failures after merge. Root cause: go mod tidy/vendor not run
before committing changes to go.mod or Go files.
This check runs go mod tidy && go mod vendor, then uses git status
--porcelain to detect any uncommitted changes. Fails fast with clear
instructions if developer forgot to sync.
Follows existing pattern from "Verify schema changes" step in same
workflow.
Related: OPS-368
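As a rough sketch of the detection step only: the check boils down to "is the `git status --porcelain` output empty after tidying?". The helper below is hypothetical and works on a sample string; in a real check the string would presumably come from running `git status --porcelain` after `go mod tidy && go mod vendor`.

```go
package main

import (
	"fmt"
	"strings"
)

// changedPaths is a hypothetical helper. It extracts file paths from
// `git status --porcelain` (v1) output, where each line consists of two
// status characters, a space, and then the path.
func changedPaths(porcelain string) []string {
	var paths []string
	for _, line := range strings.Split(porcelain, "\n") {
		if len(line) > 3 {
			paths = append(paths, line[3:])
		}
	}
	return paths
}

func main() {
	// Sample output (an assumption for illustration): a modified go.mod
	// and an untracked vendored file.
	sample := " M go.mod\n?? vendor/example.com/pkg/file.go\n"

	if paths := changedPaths(sample); len(paths) > 0 {
		fmt.Println("go.mod or vendor/ out of sync:", paths)
	}
}
```

An empty result means the working tree is clean and the check would pass; any path in the result means the developer forgot to sync.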
* chore: update security contact to vcluster.com (#3464)
Resolves OPS-468
* fix(linear-sync): support variable-length team keys in issue regex (#3469)
DEVOPS-471
Hardcoded \w{3}-\d{4} regex only matched 3-letter team keys like
ENG or OPS. Linear renamed OPS to DEVOPS (6 chars), breaking issue
detection in PR bodies and branch names.
New regex \w{2,10}-\d{1,5} supports:
- Team keys from 2-10 characters (QA, ENG, DEVOPS, etc.)
- Issue numbers from 1-5 digits (realistic for any team)
Added test cases for DEVOPS, QA, mixed team keys, and edge cases.
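To illustrate the difference between the two patterns quoted above, here is a small sketch using Go's `regexp` package. The sample text is made up, and the variable names are assumptions; only the two regular expressions come from the commit message.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Old pattern: exactly 3 word characters, then exactly 4 digits.
	oldPattern = regexp.MustCompile(`\w{3}-\d{4}`)
	// New pattern: 2-10 word characters, then 1-5 digits.
	newPattern = regexp.MustCompile(`\w{2,10}-\d{1,5}`)
)

func main() {
	// Hypothetical PR body mentioning three issues.
	text := "Fixes DEVOPS-471, QA-7 and ENG-8061"

	fmt.Println("old:", oldPattern.FindAllString(text, -1))
	fmt.Println("new:", newPattern.FindAllString(text, -1))
}
```

With this input the old pattern finds only `ENG-8061`, while the new one finds all three IDs. Note that the wider pattern also matches more non-issue strings, which is presumably why a later change filters IDs by known team keys.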
* fix: docker dns & better logging (#3478)
* Check against using the same resource for both sync and proxy (#3471)
* ci: add conflict marker detection to prevent accidental merges (#3466)
backport PRs with commitConflicts enabled can have unresolved conflict
markers committed to the branch. while visible in the diff, nothing
prevents accidentally merging these PRs.
adds a ci check that:
- scans for conflict markers (<<<<<<, ======, >>>>>>)
- posts a pr comment listing files and line numbers
- fails the check to block merging
refs: OPS-461
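A minimal sketch of the scanning step, assuming Git's standard seven-character markers at the start of a line. The function and type names are hypothetical, and the real CI check may well be implemented differently (for example as a shell step).

```go
package main

import (
	"fmt"
	"strings"
)

// markerHit records where a conflict marker was found (hypothetical type).
type markerHit struct {
	Line int
	Text string
}

// findConflictMarkers reports lines that look like unresolved Git conflict
// markers: "<<<<<<< ", "=======", or ">>>>>>> " at the start of a line.
func findConflictMarkers(content string) []markerHit {
	var hits []markerHit
	for i, line := range strings.Split(content, "\n") {
		if strings.HasPrefix(line, "<<<<<<< ") ||
			line == "=======" ||
			strings.HasPrefix(line, ">>>>>>> ") {
			hits = append(hits, markerHit{Line: i + 1, Text: line})
		}
	}
	return hits
}

func main() {
	// Sample file content with an unresolved conflict.
	sample := "package x\n<<<<<<< HEAD\nvar a = 1\n=======\nvar a = 2\n>>>>>>> backport\n"

	for _, h := range findConflictMarkers(sample) {
		fmt.Printf("line %d: %s\n", h.Line, h.Text)
	}
}
```

A check built on this would list the hits per file in a PR comment and exit non-zero when any are found.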
* ci(lint): show diff when go mod tidy check fails (#3481)
Debugging CI failures where go.mod changes in CI but not locally.
Adding git diff output to see exactly what go mod tidy modifies.
* chore(deps): bump anchore/sbom-action from 0.21.0 to 0.21.1 (#3463)
Bumps [anchore/sbom-action](https://github.com/anchore/sbom-action) from 0.21.0 to 0.21.1.
- [Release notes](https://github.com/anchore/sbom-action/releases)
- [Changelog](https://github.com/anchore/sbom-action/blob/main/RELEASE.md)
- [Commits](https://github.com/anchore/sbom-action/compare/v0.21.0...v0.21.1)
---
updated-dependencies:
- dependency-name: anchore/sbom-action
dependency-version: 0.21.1
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* ci(lint): ignore indirect comment changes in go mod check (#3482)
The direct/indirect classification is cosmetic and doesn't affect
builds. Different local environments can produce different results
for the same go mod tidy command, causing false positives.
Only fail on actual dependency version changes, not on whether a
dependency is marked as direct or indirect.
* Validate against api group being used in both sync and proxy (#3483)
Previous logic didn't work, as proxy overshadows the whole APIGroup when running
* ci(lint): revert indirect ignore, add gitignore tip (#3485)
Reverts the indirect comment ignore logic since the root cause was
found: gitignored directories (e.g. licenses/) that exist locally
but not in CI can affect dependency resolution.
Adds a tip in the error message to help developers debug this issue
by checking for ignored files with 'git status --ignored'.
* [ENG-9177] User-readable license error messages (#3468)
* [ENG-9177] User-readable license error messages
Addresses ENG-9177. Improves license error messages to include the display name for a feature as well as the feature name in error messages. Adds unit tests for the new functionality.
* fix lint error
* update imports to remove stripe
* update to use featurename if lookup fails
* update callers to use valid featureName
* fix standalone error
* feat: add load balancer support & refactor (#3486)
* fix: pro feature enabled check (#3488)
* [ENG-9177] License error fixes (#3490)
- Update NewFeatureError to take licenseapi.FeatureName instead of
string
- Update the error message to be agnostic to vcluster or vcluster-pro
usage
* fix: add --docker to vcluster platform destroy (#3492)
* fix(linear-sync): look up team per issue instead of using global default (#3495)
"vCluster / Platform" team no longer exists after company reorg. Different
teams have different workflow state IDs for "Released" state, so we must
look up the released state ID per-team based on each issue's actual team.
Key changes:
- Remove hardcoded -linear-team-name flag
- Add GetIssueDetails() to fetch issue state and team in single API call
- Cache released state IDs by team name to avoid redundant lookups
- Pass pre-fetched IssueDetails to MoveIssueToState() to eliminate double API call
- Add debug info (available teams/states) when workflow lookup fails
- Add deduplicateIssueIDs() to handle same issue in multiple PRs
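The per-team caching mentioned in the key changes could look roughly like this. It is a sketch under stated assumptions: `releasedStateCache` and its `fetch` callback are hypothetical stand-ins for the real Linear API client, and the team names and state IDs are invented.

```go
package main

import "fmt"

// releasedStateCache memoizes the "Released" state ID per team so each
// team is looked up at most once (hypothetical type).
type releasedStateCache struct {
	fetch func(team string) (string, error) // stand-in for the API call
	ids   map[string]string
}

func newReleasedStateCache(fetch func(string) (string, error)) *releasedStateCache {
	return &releasedStateCache{fetch: fetch, ids: map[string]string{}}
}

// get returns the cached ID for team, calling fetch only on a cache miss.
func (c *releasedStateCache) get(team string) (string, error) {
	if id, ok := c.ids[team]; ok {
		return id, nil
	}
	id, err := c.fetch(team)
	if err != nil {
		return "", err
	}
	c.ids[team] = id
	return id, nil
}

func main() {
	calls := 0
	cache := newReleasedStateCache(func(team string) (string, error) {
		calls++
		return "state-for-" + team, nil
	})

	for _, team := range []string{"Engineering", "DevOps", "Engineering"} {
		id, err := cache.get(team)
		if err != nil {
			fmt.Println("lookup failed:", err)
			return
		}
		fmt.Println(team, "->", id)
	}
	fmt.Println("API lookups:", calls)
}
```

Errors are deliberately not cached here, so a transient lookup failure for one team does not poison later issues from the same team.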
* fix: get containerd socket path (#3521)
* fix: skip error event on resourceversion conflict in pod syncer (#3527)
When a resourceversion conflict occurs during pod update, controller
runtime will automatically retry and eventually maintain data consistency.
However, recording a warning event in this case gives users a false
impression of failure. This change skips the error event recording when
a conflict error is detected, allowing controller runtime to handle the
retry transparently.
* chore(deps): bump anchore/sbom-action from 0.21.1 to 0.22.0 (#3526)
Bumps [anchore/sbom-action](https://github.com/anchore/sbom-action) from 0.21.1 to 0.22.0.
- [Release notes](https://github.com/anchore/sbom-action/releases)
- [Changelog](https://github.com/anchore/sbom-action/blob/main/RELEASE.md)
- [Commits](https://github.com/anchore/sbom-action/compare/v0.21.1...v0.22.0)
---
updated-dependencies:
- dependency-name: anchore/sbom-action
dependency-version: 0.22.0
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* chore(deps): bump actions/checkout from 4 to 6 (#3520)
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4...v6)
---
updated-dependencies:
- dependency-name: actions/checkout
dependency-version: '6'
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* fix: cilium, vpn & alpine issue (#3528)
* fix: platform version check for admin email prompt (#3523) (#3524)
* Fix platform version check
* Remove internal release check
* Fix unit tests
(cherry picked from commit 3ebb8e2dedcdca818c9c98bfca91a48c13afb8cd)
Co-authored-by: Nikola Prokopić <5638639+nprokopic@users.noreply.github.com>
* feat: add Claude Code GitHub Workflow (#3533)
* "Claude PR Assistant workflow"
* "Claude Code Review workflow"
* fix(ci): use pull_request_target for fork pr support (#3534)
Fork PRs can't access org secrets with pull_request trigger.
Using pull_request_target runs workflow in base repo context,
allowing access to ANTHROPIC_API_KEY for Claude reviews.
* Update platform version to v4.6.0-rc.12 (#3532)
(cherry picked from commit ba03f4013999049e2131bcb05c59c6f2fbe63580)
Co-authored-by: Nikola Prokopić <nikola.prokopic@loft.sh>
* fix(ci): use github token instead of oidc for claude review (#3539)
the oidc token exchange with claude github app was failing with
'invalid oidc token'. using explicit github_token bypasses the
oidc flow entirely.
also changed pull-requests permission from read to write so claude
can post review comments.
* feat: use LICENSE_TOKEN from env (#3541)
* vcluster platform config restructure (#3433)
* Bump deps
* Integrate new config and emit warnings when upgrading
* Adjust schema gen script
* Adjust chart templates and values to the new format
* Remove legacy convert config command
* Adjust GH workflows to the new flow without vcluster-config
* Get rid of unnecessary comments
* fix: check platform fields in IsProFeatureEnabled
* feat: sync containers resources inplace resize on host cluster (#3494)
Increase k8s version in devspace.yaml for k8s distro to v1.35.0
* Update minimum platform version to 4.6.0 (#3540) (#3543)
(cherry picked from commit 3d3005dace9f8aa418a8f042eec530ec47df45af)
Co-authored-by: Nikola Prokopić <5638639+nprokopic@users.noreply.github.com>
* fix: fix pods/resize verbs in chart role templates (#3544)
* fix(ci): checkout pr code instead of base branch in claude review (#3545)
pull_request_target defaults to checking out the base branch HEAD,
not the PR's actual code. Claude was reviewing main branch content
and flagging "missing" changes that the PR itself introduced.
Using ref: head.sha ensures we checkout the PR commit. fetch-depth: 0
provides full history for diff/blame operations. Fork handling step
configures origin correctly for external contributor PRs.
Pattern adopted from vcluster-docs/.github/workflows/claude-review.yml
Closes DEVOPS-501
* refactor: only give nodes/proxy permissions if proxy kubelets (#3546)
* chore(deps): bump actions/checkout from 4 to 6 (#3553)
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4...v6)
---
updated-dependencies:
- dependency-name: actions/checkout
dependency-version: '6'
dependency-type: direct:production
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* chore(deps): bump anchore/sbom-action from 0.22.0 to 0.22.1 (#3552)
Bumps [anchore/sbom-action](https://github.com/anchore/sbom-action) from 0.22.0 to 0.22.1.
- [Release notes](https://github.com/anchore/sbom-action/releases)
- [Changelog](https://github.com/anchore/sbom-action/blob/main/RELEASE.md)
- [Commits](https://github.com/anchore/sbom-action/compare/v0.22.0...v0.22.1)
---
updated-dependencies:
- dependency-name: anchore/sbom-action
dependency-version: 0.22.1
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* fix: check if deletion.auto is configured (#3554)
* [create-pull-request] automated change (#3550)
* refactor: replace generic plugin with comprehensive pr review (#3555)
Generic code-review plugin missed RBAC issues in PR #3551 that Codex
caught. Plugin uses 80% confidence threshold and diff-only focus which
filtered out cross-file correlation issues.
Switching to comprehensive review pattern from claude-code-action
examples with:
- Custom prompt with explicit focus areas
- Progress tracking enabled
- Fork handling for external contributors
- Direct tool access instead of plugin abstraction
Closes DEVOPS-525
* feat(linear-sync): filter issue ids by known linear team keys (#3537)
* feat(linear-sync): filter issue ids by known linear team keys
linear-sync regex matches patterns like XX-123 which catches false
positives (pr-3354, snap-1, build-99). these generate unnecessary api
calls and log noise when the tool tries to look up non-existent issues.
solution fetches valid team keys (eng, doc, devops, etc) at startup via
ListTeams() and filters extracted issue ids before linear api lookup.
only ids with known team prefixes proceed to GetIssueDetails().
Closes DEVOPS-475
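The filtering step described above might be sketched as follows. `filterByTeamKeys` is a hypothetical name, the key set would in practice come from the team list fetched at startup, and comparing prefixes case-insensitively is an assumption made for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// filterByTeamKeys keeps only IDs whose prefix (the part before the first
// "-") is a known team key. Keys are expected in upper case; the prefix is
// upper-cased before the lookup (an assumption of this sketch).
func filterByTeamKeys(ids []string, teamKeys map[string]bool) []string {
	var out []string
	for _, id := range ids {
		prefix, _, ok := strings.Cut(id, "-")
		if ok && teamKeys[strings.ToUpper(prefix)] {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	// Hypothetical set of valid team keys.
	keys := map[string]bool{"ENG": true, "DOC": true, "DEVOPS": true}

	// Candidates as a broad regex might extract them, including the
	// false positives named in the commit message.
	candidates := []string{"pr-3354", "ENG-8061", "snap-1", "DEVOPS-475", "build-99"}

	fmt.Println(filterByTeamKeys(candidates, keys))
}
```

Only `ENG-8061` and `DEVOPS-475` pass the filter here, so no API calls would be spent on `pr-3354`, `snap-1`, or `build-99`.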
* chore: trigger ci
* fix(ci): add github_token to claude-code-action (#3557)
Without github_token, the action defaults to OIDC authentication
which requires id-token: write permission. PR #3555 accidentally
removed this line while refactoring the prompt.
Fixes claude-review check failures on all new PRs.
* test: Cleanup restored PVC in the volume restore e2e test (#3556)
* Cleanup restored PVC in the volume restore e2e test
* Increase snapshot test timeout as snapshot creation can take longer sometimes
* feat: Add configuration for resourceClaim and deviceClasses (#3551)
* Reconcile resourceclaim and resourceclaimtemplates in pod. (#3561)
* feat: Add configurations for resourceclaimtemplates sync to host
* feat: Add resourceclaims and resourceclaimtemplates name translation in pod syncer
* fix: add resourceclaim status role and fix issue in resourceClaim translation
* feat: Disable resourceclaim controller in vcluster when resourceClaimTemplate reconciliation is enabled
* feat:
- Add and register mappers for resourceclaims, resourceclaimtemplates and device classes.
- Reconcile resourceclaims in pod only when appropriate configs are enabled
* Docker: run modprobe only when overlay/bridge/br_netfilter are not loaded (#3562)
* chore: update dependencies for k8s v1.35.0 (#3496)
* chore: update dependencies for k8s v1.35.0
* fix: add terminalSizeQueueWrapper to fix lint error
bump coreDNS default image to coredns/coredns:1.14.1
* chore: Update controller-runtime to v0.23.0,
Add Apply method to blockingcacheclient
Add Apply method to pluginhookclient
Add util helper for ApplyConfiguration interface
Add missing methods to fakeManager
* chore: update deprecated record.EventRecorder to events.EventRecorder
* fix: specify apiGroup for events in chart templates
fix functions in util/testing/manager.go
* fix: make status.QOS immutable during pod sync
* fix: update fluent-bit chart version in test and fix fluent-bit repository
* chore: update etcd version in go.mod
* fix: fix linter issues on events.GetEventRecorder
* fix: add patch verbs for events permission
* chore: update kine version in Dockerfile
* chore: update ingress-nginx chart version for e2e tests to 4.14.2
* fix: avoid removing finalizers on configmap in snapshot processing when configmap is already deleted
Make qosClass immutable in podSyncer
* fix: improve logic in syncer and snapshot reconciliation to handle e2e test failures
* Improve logging and assertion in snapshot e2e test
* fix: ensure qosClass assignment is immutable to stabilize CI e2e tests
* fix: ensure qosClass immutability in pod syncing (fix for e2e tests too)
* fix: Remove changes to snapshothandler.go volumeSnapshot cleanup logic
* fix: fix review comments by codex:
* Fix issues with cache implementation in blockingcacheclient and pluginhookclient
* Make volumeSnapshotUpdateResult logic in snapshothandler more clear.
* Fix: Code review fixes on management of Conflict error during reconciliation
* fix: use same verbs for events.k8s.io as legacy events in role
* fix: code review fixes
* fix: missing netstat (#3567)
* fix: missing netstat
* refactor: make sure docker driver is mentioned
* Remove ingress-nginx (#3566)
Removed option for ingress nginx installation from CLI. Removed unused
ingress nginx config options from vcluster config. The ingress nginx
annotations that are used by the control plane ingress will remain
deprecated but will not be removed until, ideally, their functionality
is mostly duplicated with an alternative.
* Update CODEOWNERS (#3571)
* feature(debug-shell): introduce debug shell command (#3572)
Signed-off-by: Marcin Franczyk <marcin0franczyk@gmail.com>
* Revert removal of ingress-nginx (#3575)
We are not ready to remove support for ingress-nginx because we
do not have a suitable replacement that can support all important
functionality.
* fix: (#3573)
- fix controller setting for privatesNodes and virtualscheduler
- translate resourceclaimStatuses in pod
* feat: ensure apiresource exist when dra features sync are enabled (#3576)
* fix: default config to pass pro check (#3577)
* fix: default config to pass pro check
* fix: helm chart rbac templates
* chore: update CODEOWNERS (#3579)
* chore(deps): bump anchore/sbom-action from 0.22.1 to 0.22.2 (#3565)
Bumps [anchore/sbom-action](https://github.com/anchore/sbom-action) from 0.22.1 to 0.22.2.
- [Release notes](https://github.com/anchore/sbom-action/releases)
- [Changelog](https://github.com/anchore/sbom-action/blob/main/RELEASE.md)
- [Commits](https://github.com/anchore/sbom-action/compare/v0.22.1...v0.22.2)
---
updated-dependencies:
- dependency-name: anchore/sbom-action
dependency-version: 0.22.2
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* chore(deps): bump tj-actions/changed-files from 47.0.1 to 47.0.2 (#3584)
Bumps [tj-actions/changed-files](https://github.com/tj-actions/changed-files) from 47.0.1 to 47.0.2.
- [Release notes](https://github.com/tj-actions/changed-files/releases)
- [Changelog](https://github.com/tj-actions/changed-files/blob/main/HISTORY.md)
- [Commits](https://github.com/tj-actions/changed-files/compare/v47.0.1...v47.0.2)
---
updated-dependencies:
- dependency-name: tj-actions/changed-files
dependency-version: 47.0.2
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* chore: bump MinimumVersionTag to v4.7.0
Bumps MinimumVersionTag to the new Platform release v4.7.0.
Ref: ENGPROV-236
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
(cherry picked from commit b78126d731ef7d5d01ee9bd85897d0520d281f69)
* chore: disable dependabot in favor of renovate (#3589)
Renovate now handles dependency updates for this repo.
Removing dependabot config to avoid duplicate PRs.
* feat(ci): publish head chart (#3591)
Publishes the head version of the Helm chart on every commit to main.
Triggers only if the Helm chart was changed.
* fix(chart): quote .Release.Name in label values to prevent YAML int coercion (#3592)
When the Helm release name is a purely numeric string (e.g. "1"), YAML
parsers interpret unquoted label values as integers rather than strings.
This causes yaml.UnmarshalStrict to fail with a type-mismatch error in
the vCluster pod at startup.
Add the `quote` pipeline function to all `release:` and
`vcluster.loft.sh/managed-by:` label values that use .Release.Name
directly as a standalone value in:
- chart/templates/statefulset.yaml
- chart/templates/service.yaml
- chart/templates/networkpolicy.yaml
- chart/templates/etcd-statefulset.yaml
- chart/templates/etcd-service.yaml
- chart/templates/pod-disruption-budget.yaml
Resolves ENG-8736
Co-authored-by: pascal.breuninger (🤖) <noreply@loft.sh>
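The effect of quoting can be illustrated with Go's `text/template`, which Helm templates are built on. This is only a sketch: Helm's real `quote` function comes from the Sprig library, so `strconv.Quote` is used here as a stand-in, and `render` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"text/template"
)

// render executes a one-line template against a fake release name.
// strconv.Quote stands in for Helm's Sprig-provided `quote` function.
func render(tmpl, releaseName string) (string, error) {
	t, err := template.New("label").
		Funcs(template.FuncMap{"quote": strconv.Quote}).
		Parse(tmpl)
	if err != nil {
		return "", err
	}
	data := map[string]any{"Release": map[string]string{"Name": releaseName}}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	for _, tmpl := range []string{
		"release: {{ .Release.Name }}",         // renders a bare 1
		"release: {{ .Release.Name | quote }}", // renders a quoted "1"
	} {
		out, err := render(tmpl, "1")
		if err != nil {
			fmt.Println("render failed:", err)
			return
		}
		fmt.Println(out)
	}
}
```

The unquoted form produces `release: 1`, which a YAML parser reads as an integer; the quoted form produces `release: "1"`, which stays a string.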
* chore(ci): remove slack release notification from vcluster (#3594)
tagging order changed: vcluster is now tagged first, then vcluster-pro.
the notification should fire on the last tag so all assets are ready.
moved to vcluster-pro repo instead.
References DEVOPS-571
* refactor: remove k3s (#3595)
* chore(deps): bump github.com/cloudflare/circl from 1.6.1 to 1.6.3 (#3606)
Bumps [github.com/cloudflare/circl](https://github.com/cloudflare/circl) from 1.6.1 to 1.6.3.
- [Release notes](https://github.com/cloudflare/circl/releases)
- [Commits](https://github.com/cloudflare/circl/compare/v1.6.1...v1.6.3)
---
updated-dependencies:
- dependency-name: github.com/cloudflare/circl
dependency-version: 1.6.3
dependency-type: indirect
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* refactor: make pulling binaries more resilient (#3608)
* fix(docker): wait for systemd readiness before installing vcluster (#3610)
* fix(docker): wait for systemd readiness before installing vcluster
the install-standalone.sh script calls systemctl daemon-reload and
systemctl enable --now vcluster.service. when docker exec runs
immediately after docker run, systemd may not have started dbus yet
inside the container, causing "Failed to connect to bus: No such file
or directory".
this is a race condition that affects CI environments (GitHub Actions
runners) more frequently due to nested containerization overhead and
cgroup v2 systemd driver. kind avoids this by waiting for systemd
readiness before running kubeadm.
add waitForSystemd() that polls `systemctl is-system-running` for up
to 60s before calling installVClusterStandalone. this eliminates the
race without adding unnecessary delay — the poll interval is 500ms
and exits as soon as systemd reports "running" or "degraded".
Closes DEVOPS-591
* fix(docker): move systemd readiness check into install command
the standalone waitForSystemd() Go function issued a separate docker
exec to poll systemd before the install script ran. per review feedback,
consolidate the check into the same bash -c invocation that runs
install-standalone.sh. this eliminates an extra docker exec round-trip
and keeps the wait logic co-located with the command that needs it.
* fix: redact secret Data in logs when secret is updated, change Update logs to debug (#3611)
* chore(ci): check MinimumVersionTag in the release pipeline (#3593)
Checks if MinimumVersionTag contains a stable release.
* feat(certs): regenerate leaf certificates when expiring soon (#3626)
* feat(certs): regenerate leaf certificates when expiring soon
* chore: remove extra cert files handling
* ci(justfile): migrate local dev targets from kind to vind (#3616)
vind (vCluster Docker driver) provides native Docker image access,
eliminating the need for explicit `kind load docker-image` steps.
It also uses the cluster name directly as the node hostname instead
of the `<name>-control-plane` convention, simplifying configuration.
Changes:
- Rename create-kind/delete-kind targets to create-vind/delete-vind
- Replace kind CLI commands with vcluster docker driver commands
- Remove kind load docker-image step (vind shares images natively)
- Update sed node hostname replacement from vcluster-control-plane to vcluster
- Replace KIND_NAME env var with HOST_NODE_NAME
References DEVOPS-579
* feat(standalone): add put with CAS and initialize etcd client for private nodes (#3631)
Signed-off-by: Marcin Franczyk <marcin0franczyk@gmail.com>
* Use workload sleep if the VirtualClusterInstance has been annotated for it (#3602)
* Use workload sleep if the VirtualClusterInstance has been appropriately annotated for it
Signed-off-by: Ryan Swanson <ryan.swanson@loft.sh>
* Fix cluster lookup and add force duration
* Add workload only wake
Signed-off-by: Ryan Swanson <ryan.swanson@loft.sh>
---------
Signed-off-by: Ryan Swanson <ryan.swanson@loft.sh>
* refactor: wait for cp node join & better logging (#3634)
* chore(ci): remove unused claude issue_comment workflow (#3628)
workflow was never successfully running (all runs show action_required/skipped).
also had no author_association gate — any github user could trigger @claude
on public issues, consuming API credits without authorization.
part of pull_request_target security audit.
Closes DEVOPS-634
* [create-pull-request] automated change (#3632)
* fix(ci): prevent version check from breaking alpha release pipeline (#3636)
* fix(ci): prevent version check from breaking alpha release pipeline
GitHub Actions' implicit success() evaluates the entire transitive
needs chain. When check_minimum_version_tag was skipped via job-level
if (for alpha/pre-release tags), all downstream jobs also evaluated
as not-successful, breaking the release pipeline for alpha releases.
Move the alpha/pre-release condition to step-level if statements so
the job always runs (satisfying the needs chain) but skips its steps
when the tag contains '-'. Remove the workaround condition on the
publish job.
* fix(ci): restore version-tag guard on publish job
the previous commit removed the entire job-level if from publish,
which also dropped the startsWith(github.ref, 'refs/tags/v') guard.
non-version tags could have triggered the full publish pipeline.
restore the v-tag guard at job level — only the alpha-skip condition
needed to move to step level on check_minimum_version_tag.
* feat: Support storing snapshots in Azure Blob (#3630)
* Add Azure SDK for go (blob storage module)
* Add azidentity go module
* Implement Azure blob support
* Add "azure" prefix to Azure-specific flags
* Separate SAS token from blob URL
* Add unit test for parsing Azure blob URL
* Fix snapshot get
* Propagate CLI context when making Azure calls
* Use default Azure credentials instead of Azure CLI credentials
* Use func to create RestoreClient
* Move CreateStore
* Add link to Azure SDK go docs
* Add snapshot option property to mark snapshot delegation
* Create SAS only when specified by options.DelegateFromCLIToCluster
* Extract getting blob info
* Remove BlobClientInfo struct
* Use default credentials when SAS is not set
* Change AccountURL to AccountName
* Create blob client with storage key
* Set DelegateFromCLIToCluster options flag when restoring
* Move Azure client funcs to client.go
* Don't export client funcs
* Use GetBlobInfo instead of getBlobInfo
* split blob url path more efficiently and clearly into 2 parts
* delete unused func
* Rename GetBlobInfo to getBlobInfo
* Add unit test for parsing blob url
* Add unit test for Azure options
* Update comment
* Fix SAS token expiry
* go mod tidy
* Fix linter errors
* Fix getting snapshots from a container
* Refactor blob client creation
* Fix vendor/modules.txt
* fix: ensure that toleration update are propagated to host cluster (#3627)
* fix: ensure that toleration update are propagated to host cluster. add unit and e2e tests
* fix: ensure that host pod tolerations are kept during syncing,
ensure that enforcedTolerations are not duplicated when already defined on virtual pod
* fix: fix gate variable that decides when to sync tolerations so it includes the case where the host cluster is missing virtual or enforced tolerations
* fix: fix duplicate tolerations with different tolerationSeconds bug while reconciling pod
* fix: clean up e2e_enforce_tolerations, simplify sync tolerations logic
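The toleration fixes above describe merging enforced tolerations into a pod without producing duplicates. A minimal Go sketch of that idea follows. The Toleration struct is a trimmed-down, hypothetical stand-in for the Kubernetes type, mergeTolerations is an illustrative name, and ignoring TolerationSeconds when comparing entries is an assumption, not confirmed behaviour of the syncer.

```go
package main

import "fmt"

// Toleration is a hypothetical, trimmed-down stand-in for corev1.Toleration.
type Toleration struct {
	Key, Operator, Value, Effect string
	TolerationSeconds            *int64
}

// tolerationID is the identity used for de-duplication.
// Assumption: TolerationSeconds is not part of the identity.
type tolerationID struct{ key, operator, value, effect string }

// mergeTolerations keeps every toleration already on the pod and appends
// only those enforced tolerations that are not present yet.
func mergeTolerations(existing, enforced []Toleration) []Toleration {
	seen := make(map[tolerationID]bool, len(existing)+len(enforced))
	merged := make([]Toleration, 0, len(existing)+len(enforced))
	all := append(append([]Toleration{}, existing...), enforced...)
	for _, t := range all {
		id := tolerationID{t.Key, t.Operator, t.Value, t.Effect}
		if seen[id] {
			continue // already defined, do not duplicate
		}
		seen[id] = true
		merged = append(merged, t)
	}
	return merged
}

func main() {
	secs := int64(300)
	pod := []Toleration{{Key: "dedicated", Operator: "Equal", Value: "gpu", Effect: "NoSchedule"}}
	enforced := []Toleration{
		{Key: "dedicated", Operator: "Equal", Value: "gpu", Effect: "NoSchedule", TolerationSeconds: &secs},
		{Key: "spot", Operator: "Exists", Effect: "NoExecute"},
	}
	fmt.Println(len(mergeTolerations(pod, enforced)))
}
```

In this sketch the first occurrence wins, so a toleration already set on the pod is never replaced by an enforced one that differs only in TolerationSeconds.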
* feat: Add platform add standalone command (#3605)
* feat: Add standalone subcommand
* Fix checkServiceIsRunning detection and Use CommandContext for all exec commands
* Added comments for the standalone helper functions
* Set insecure default value to false
* Fixed standalone install example
* Log control-plane readiness checks unexpected status code errors
* Install vcluster binaries via atomic replace
* Limit configPartialUnmarshal scope to relevant config section
* Install vcluster-cli link to /usr/local/bin/vcluster
* Fixed typo in function name
* Fix systemctl enable and start service
* Revert token create --control-plane output changes
* Fix typo in the standalone install command description
* Check for min tmp space only for the install command
* Allow extra-env value with = sign
* Use os.Lstat to check the status of the cli link
* Fix download control plane bundle logic to ignore generatable keys
* Fix cli download and fips flag
* Moved standalone install golang implementation to a tmp branch
Moved standalone install golang to https://github.com/cbalan/vcluster/tree/standalone-install-golang
* Reverted change not relevant to platform add command
* Use loft logger for platform add standalone output
* fix(ci): skip version bump PRs for older platform releases (#3645)
* fix(ci): skip version bump PRs for older platform releases
the update-platform-minimum-version workflow blindly replaced
MinimumVersionTag with any incoming semver tag, even if it was
older than the current value. this caused downgrade PRs like #3643
(v4.4.3 replacing v4.7.0).
add a version comparison step using sort -V that reads the current
MinimumVersionTag from pkg/platform/version.go, compares it with the
incoming tag, and skips the sed replacement and PR creation when the
incoming tag is not newer.
Closes DEVOPS-651
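The skip rule described above (only proceed when the incoming tag is strictly newer than the current MinimumVersionTag) can be sketched in Go. The workflow itself uses sort -V in shell; this is a swapped-in illustration, parseVersion and isNewer are hypothetical names, and pre-release suffixes are not handled.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits a tag such as "v4.7.0" into its numeric parts.
// Pre-release suffixes are out of scope for this sketch.
func parseVersion(tag string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(tag, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("expected MAJOR.MINOR.PATCH, got %q", tag)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid component %q in %q: %w", p, tag, err)
		}
		out[i] = n
	}
	return out, nil
}

// isNewer reports whether incoming is strictly newer than current.
// Equal and older versions both return false, so the bump is skipped.
func isNewer(current, incoming string) (bool, error) {
	cur, err := parseVersion(current)
	if err != nil {
		return false, err
	}
	in, err := parseVersion(incoming)
	if err != nil {
		return false, err
	}
	for i := range cur {
		if in[i] != cur[i] {
			return in[i] > cur[i], nil
		}
	}
	return false, nil
}

func main() {
	current := "v4.7.0"
	for _, incoming := range []string{"v4.4.3", "v4.7.0", "v4.8.0"} {
		newer, err := isNewer(current, incoming)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s newer than %s: %t\n", incoming, current, newer)
	}
}
```

Comparing components numerically rather than as strings matters for cases such as v4.10.0 versus v4.7.0.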
* fix(ci): pin third-party actions to SHA and pass inputs via env
actions/checkout and peter-evans/create-pull-request were referenced
by mutable tag which is a supply chain risk. pin both to full commit
SHAs per loft-sh github actions conventions.
also move client_payload.tag interpolation from run: blocks to env:
mappings to prevent script injection via crafted tag values.
* test(ci): add version comparison test for platform version bump workflow
validates that the sort -V based comparison correctly skips older and
equal versions while allowing newer versions to proceed. covers all
three dimensions (major, minor, patch) for both skip and proceed cases.
* Validate endpoint URLs during token creation
Signed-off-by: Ryan Swanson <ryan.swanson@loft.sh>
* Use kubeadm's parsing
Signed-off-by: Ryan Swanson <ryan.swanson@loft.sh>
* E2e framework next (#3387)
* chore(tests): e2e-next framework init
* chore(tests): use cluster instead of envfuncs
* chore(tests): vcluster setup upgrade to use ctx
* chore(test): upgrade test_k8sdefaultendpoint
* chore(test): upgrade test_helm_charts
* chore(test): rewrite test_init_manifests to new framework
* chore(test): lint fixes
* chore(tests): fix after test execution
* chore(test): cleanup
* chore(tests): Add e2e-labels command to Justfile
* chore(test): e2e-next rename host clusterName
* chore: skip e2e-next linter errors and from unit tests
* chore(tests): fix after CR, update framework version
* chore(test): lint fixes
* chore(tests): Added WithGeneratedName
* chore(test): framework upgrade
* chore(test): framework upgrade
* chore(test): removed generated name
* chore(tests): rename deploy to install
* chore(test): cleanup
* chore(tests): move vcluster conf to files
* chore(tests): cleanup
* chore(tests): fixes after CR
* chore(tests): Fix AfterAll removing vcluster check
* chore(tests): fixes after CR
* feat(ci): add ginkgo e2e pipeline for PRs (#3350)
Adds a new pipeline to run Ginkgo E2E tests
* chore(test): fixes and workflow updates
* chore(test): change vcluster provider to internal
* chore(test): add debug
* chore(test): debug
* Update e2e-ginkgo.yaml
* chore(test): debug
* chore(tests): debug
* chore(test): debug
* chore(test): add vcluster cli os.Getenv
* chore(tests): Changed withversion to withpath
* chore(test): try static vcluster path
* test: refactor suite to use suite dependencies and concurrent setup / teardown
* test: run tests concurrently
* test: run CI tests concurrently
* test: set CI ginkgo e2e permissions
* test: update framework
* chore(lint): fix e2e-framework fetch in lint (#3525)
* chore: remove unused code
* chore: minor cleanup tasks
* chore: e2e-framework version update
* chore: extend timeout for k8sdefaultendpoint test
* chore(e2e-next): updates after PR
* chore(e2e-next): Framework version upgrade
---------
Co-authored-by: Adrian <adrian.kabala@loft.sh>
Co-authored-by: Dmytro Sydorov <dmytro.sydorov@loft.sh>
* chore(test): remove ordered from test (#3653)
* fix validation for endpoint scheme and ipv6
Signed-off-by: Ryan Swanson <ryan.swanson@loft.sh>
* fix: change log update level to debug (#3618)
* docs: add vind as local development option alongside kind (#3615)
* docs: add vind as local development option alongside kind
vind (vCluster Docker driver) provides a simpler local dev experience —
no explicit image loading needed, consistent with CI environment.
References DEVOPS-579
* docs: remove nonexistent just delete-vind e2e step
The vind e2e section referenced `just delete-vind` which has no
matching recipe in the Justfile, and `just e2e` is wired to
create-kind/delete-kind. Replace with a note clarifying that e2e
currently requires Kind.
* Add CODEOWNERS entry for e2e-next directory (#3654)
* fix(ci): skip homebrew upload for non-latest stable releases (#3613)
* fix(ci): skip homebrew upload for non-latest stable releases
Maintenance releases (e.g. v0.31.1) were overwriting newer stable
releases (v0.32.0) in the homebrew tap. GoReleaser's skip_upload: auto
only guards against pre-releases, not older stable versions released
after a newer one.
Add a semver comparison step that detects when the current tag is not
the highest stable tag and passes --skip=brew to GoReleaser, preventing
the tap overwrite while still building and publishing all other assets.
Closes DEVOPS-630
* fix(ci): compare against tap version instead of git tags
The previous approach compared the release tag against the latest git
tag, which blocked ALL older-branch patches from reaching homebrew —
even when the tap hadn't been updated yet.
Now fetches the actual version from the homebrew-tap formula and only
skips if the release would downgrade it. This means:
- v0.31.1 after v0.32.0 is in the tap → skipped (prevents downgrade)
- v0.31.1 when tap still has v0.31.0 → allowed (upgrades the tap)
- Tap unreachable → falls back to allowing upload
* fix(ci): use correct goreleaser skip flag --skip=homebrew
GoReleaser v2 CLI accepts --skip=homebrew, not --skip=brew.
* fix(ci): fail release when brew tap version check is unreachable
the previous code fell through to a warning when curl failed to fetch
the tap version, silently allowing the upload. if an old patch release
was being published and the curl happened to fail, it would overwrite
the latest version in the homebrew tap with no visible signal (pipeline
stays green).
add retry with exponential backoff (3 attempts, 2/4/8s delays) and
exit 1 when all retries are exhausted to make the pipeline go red.
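The retry behaviour described above can be sketched in Go. The pipeline implements it in shell around curl; retry and errUnreachable are hypothetical names, and this sketch sleeps only between attempts, which may differ from the workflow's exact delays.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errUnreachable is a placeholder error for a failed tap version fetch.
var errUnreachable = errors.New("tap unreachable")

// retry calls fn up to attempts times. After each failure except the last
// it sleeps, doubling the delay each time (base, 2*base, 4*base, ...).
// When every attempt fails it returns an error so the caller can exit non-zero.
func retry(attempts int, base time.Duration, fn func() error) error {
	var err error
	delay := base
	for i := 1; i <= attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i < attempts {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("all %d attempts failed: %w", attempts, err)
}

func main() {
	calls := 0
	// Short base delay for the demo; the commit message mentions seconds.
	err := retry(3, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errUnreachable
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

Returning an error instead of falling through to a warning is what turns a silent failure into a red pipeline.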
* chore(e2e-next): Migrate e2e/servicesync (#3657)
* chore(e2e-next): Migrate e2e/servicesync
ENGQA-184 subs: ENGQA-185 and ENGQA-186
* chore(e2e-next): lint fix
* chore(e2e-next): remove servicesync tests
* chore(e2e-next): fixes after CR
* chore(e2e-next): revert Ordered for servicesync
* fix(snapshot): add timeout and socket cleanup to restore kine startup (#3642)
* fix(snapshot): add timeout and socket cleanup to restore kine startup
the setLatestRevisionSQLite function had an infinite loop waiting for
kine to create the sqlite database file. if kine failed to start (e.g.
due to a stale socket from a previous run), the loop would spin forever,
causing the entire snapshot restore e2e test suite to hang until the
40-minute go test timeout killed it.
two fixes:
- remove stale /data/kine.sock before starting kine to prevent
"address already in use" errors
- replace infinite os.Stat polling loop with a select that watches
doneChan (kine exit), a 30s timeout, and the file creation ticker
Closes DEVOPS-641
* fix(snapshot): remove goto and drain doneChan on kine timeout
the goto in the kine startup wait loop was poor go practice. replaced
with a boolean flag to break the select loop cleanly.
also drains the unbuffered doneChan on the timeout path to prevent a
goroutine leak — StartKineWithDone always sends on doneChan when
RunCommand exits, and an unbuffered channel blocks the sender forever
if nobody receives.
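The wait loop described in the two snapshot fixes above can be sketched as follows. This is not the code from setLatestRevisionSQLite: waitForReady, errExited, and the ready callback are hypothetical, and the sketch returns from inside the loop instead of using a boolean flag. It assumes, as the commit message states for StartKineWithDone, that the process always sends exactly once on the done channel.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errExited is a placeholder for the error a finished process would report.
var errExited = errors.New("process exited")

// waitForReady polls ready() on every tick and stops when it returns true,
// when the process reports its exit on done, or when timeout elapses.
func waitForReady(ready func() bool, done <-chan error, timeout, interval time.Duration) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	deadline := time.After(timeout)

	for {
		select {
		case err := <-done:
			return fmt.Errorf("process exited before becoming ready: %v", err)
		case <-deadline:
			// Drain the unbuffered channel so the sender is not blocked
			// forever once it eventually reports its exit.
			go func() { <-done }()
			return errors.New("timed out waiting for readiness")
		case <-ticker.C:
			if ready() {
				return nil
			}
		}
	}
}

func main() {
	start := time.Now()
	ready := func() bool { return time.Since(start) > 30*time.Millisecond }
	done := make(chan error) // nothing sends here: the process keeps running
	err := waitForReady(ready, done, time.Second, 10*time.Millisecond)
	fmt.Println("ready, err =", err)
}
```

In the real fix the ready check would be an os.Stat on the sqlite file and the timeout would be 30s.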
* resolve cve caused by zlib dependency (#3672)
* fix(ci): allow lint workflow to pass for fork PRs (#3671)
The `if: github.repository_owner == 'loft-sh'` condition was intended to
skip forks but never worked — for pull_request events the workflow always
runs on the base repo, so repository_owner is always 'loft-sh'.
Fork PRs fail because GH_ACCESS_TOKEN is unavailable and `go mod tidy`
cannot resolve the private loft-sh/e2e-framework dependency.
Fix: set GOFLAGS=-mod=vendor at job level so all Go commands use the
committed vendor directory. Skip the git auth and mod-tidy verification
steps for fork PRs (detected via head.repo.full_name != repository).
Internal PRs still get the full mod-tidy check with GOFLAGS reset.
Closes DEVOPS-652
* Enhance PR template with E2E-next testing guidelines
Added instructions for E2E tests and custom test suites.
* chore(e2e-next): migrate syncer/fromhost/configmaps test (#3683)
* chore(e2e-next): migrate syncer/fromhost/configmaps test
Closes ENGQA-176
* chore(e2e-next): lint fix
* chore(e2e-next): fixes after CR
* fix: Check Azure flags in snapshot CLI commands (#3678) (#3685)
* Print correct `vcluster snapshot get` command for Azure
* Check if Azure CLI flags are set when needed
* Fix args validation in `vcluster snapshot get` command
(cherry picked from commit d309271b29808ac4605a34c8001786aa56eeeb9e)
Co-authored-by: Nikola Prokopić <5638639+nprokopic@users.noreply.github.com>
* Fix Palatform typo
* chore: bump default platform to v4.8.0 (#3691)
(cherry picked from commit 6a551fb74a801b05f67933154a1727b5cf462b90)
Co-authored-by: Ryan Swanson <ryan.swanson@loft.sh>
* chore(e2e-next): Added e2e ginkgo nightly jobs (#3686)
* chore(e2e-next): Added e2e ginkgo nightly jobs
* chore(e2e-next): fixes after comments
* chore(e2e-next): fix after code review
* chore(e2e-next): Fix nightly build tag name (#3705)
* Add Claude Code agent setup configuration
Add hooks, skills, references, rules, and scripts for e2e test
migration workflow along with project-level CLAUDE.md and Justfile.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: adopt e2e-tdd-workflow plugin from ai-skills marketplace
Move shared e2e TDD infrastructure (hooks, skills, scripts) to the
e2e-tdd-workflow plugin in loft-sh/ai-skills. This repo now consumes
the plugin via marketplace registration in settings.json.
Removed (now provided by plugin):
- 6 hook scripts (.claude/hooks/)
- 2 skills (implement-e2e-test, migrate-e2e-test)
- 2 scripts (scan_builders.sh, scan_labels.sh)
Kept repo-local:
- e2e-tdd-workflow.md (repo-specific build cycle)
- All rules/ (plugin system doesn't support rules/)
- All references/ (repo-specific addendums)
- e2e-migration-validator skill (repo-specific)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: remove e2e-framework-conditional-deps reference (now in plugin)
Identical to the plugin's references/e2e-framework-conditional-deps.md.
The vCluster-specific cluster mapping table at the end was already
documented in clusters/clusters.go.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(e2e-next): re-render YAML templates after flag parsing
* fix: Add Azure flags to restore command (#3709)
* Add missing Azure flags to restore command
* feat: add support for tlsroute (#3711)
* chore(e2e-next): Migrate fromhost secret test (#3704)
* chore(e2e-next): Migrate fromhost secret test
* fix(e2e-next): use HaveKeyWithValue for better failure diagnostics
* ci: add scheduled check for homebrew tap version drift (#3712)
* ci: add scheduled check for homebrew tap version drift
the release pipeline (PR #3613) guards against tap downgrades during a
release, but drift can also happen from manual tap edits, goreleaser
bugs, or other causes outside the pipeline. this daily workflow
compares the tap formula version against the latest stable release
and alerts #eng-releases on Slack when they diverge.
checks both vcluster and vcluster-experimental formulas.
Closes DEVOPS-654
* fix(ci): avoid false drift alert when experimental formula fetch fails
when fetch_formula_version returns empty for vcluster-experimental,
the v prefix alone ("v") passed the -n check and triggered a spurious
slack alert. only emit the output key when a version was actually
retrieved.
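The false-alert bug above comes from prefixing an empty string with "v" and then treating the result as non-empty. A minimal Go sketch of the corrected check follows. The workflow itself is shell; versionOutput and the output key name are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// versionOutput builds a "key=vX.Y.Z" output line. It returns false when no
// version was retrieved, so the caller emits nothing instead of a bare "v".
func versionOutput(key, fetched string) (string, bool) {
	version := strings.TrimPrefix(strings.TrimSpace(fetched), "v")
	if version == "" {
		return "", false
	}
	return key + "=v" + version, true
}

func main() {
	for _, fetched := range []string{"0.32.0", "v0.32.0", ""} {
		line, ok := versionOutput("experimental_version", fetched)
		fmt.Printf("%q -> %q (emit: %t)\n", fetched, line, ok)
	}
}
```

Checking emptiness before adding the prefix is the whole fix: the prefix must never be the only content of the value.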
* refactor(ci): use shared action for brew tap drift check
extracts fetch/compare/notify logic into a reusable composite action in
loft-sh/github-actions. the vcluster workflow becomes a thin wrapper
that passes repo, formulas, and webhook url. reduces duplication and
makes the action available to other repos with homebrew taps.
Closes DEVOPS-654
* Revert "refactor(ci): use shared action for brew tap drift check"
This reverts commit 8082036031bb2fc16f52e0de980f1fbd93353bc7.
* chore: trigger ci
* fix(ci): address pr review feedback on brew tap drift check
- pin slack action to commit hash (zizmor unpinned-uses)
- add slack-notifications environment for secret access (zizmor secrets-outside-env)
- use gh release list native --exclude-pre-releases/--exclude-drafts flags
- normalize v prefix before version comparison to avoid false drift
- add pr-triggered test job to run drift tests on pipeline changes
- add test cases for mixed v-prefix scenarios (10 tests, all pass)
* refactor(ci): extract brew tap drift logic into sourceable script
the test was reimplementing the comparison logic rather than testing
the actual code the workflow runs. extracting to a shared script means
tests exercise the real functions (normalize_version, compare_versions,
fetch_formula_version) and the workflow calls the same code. 17 tests
now cover normalization, single-formula drift, and experimental formula
drift paths.
* chore(e2e-next): Print syncer logs if there are failures (#3715)
* chore: scope devops codeowners to production release pipelines (#3716)
* chore: scope devops codeowners to production release pipelines
the blanket /.github/workflows/ rule on line 9 came AFTER specific
e2e/release rules, overriding them due to last-match-wins semantics.
this meant devops approval was required for QA e2e workflow changes,
blocking the QA team unnecessarily.
fix: reorder so the blanket devops rule comes first, then specific
overrides for e2e (QA) and release (CTO office) come after. matches
the pattern already used in loft-enterprise.
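The ordering problem above follows from last-match-wins resolution. A simplified Go sketch of that rule is shown here. Real CODEOWNERS patterns are gitignore-style globs, not plain prefixes, and the rule type, the QA team handle, and the file path are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// rule is a simplified CODEOWNERS entry that matches by path prefix only.
type rule struct {
	prefix string
	owners []string
}

// ownersFor returns the owners of the last rule that matches path.
// Earlier matches are overwritten, which is why rule order matters.
func ownersFor(rules []rule, path string) []string {
	var owners []string
	for _, r := range rules {
		if strings.HasPrefix(path, r.prefix) {
			owners = r.owners
		}
	}
	return owners
}

func main() {
	// Blanket rule first, specific override after it.
	rules := []rule{
		{prefix: ".github/workflows/", owners: []string{"@loft-sh/devops-team"}},
		{prefix: ".github/workflows/e2e", owners: []string{"@example/qa-team"}},
	}
	fmt.Println(ownersFor(rules, ".github/workflows/e2e-ginkgo.yaml"))
	fmt.Println(ownersFor(rules, ".github/workflows/release.yaml"))
}
```

With the two rules swapped, the blanket rule would match last and own every workflow file, which is the situation the reorder fixes.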
* fix: keep devops as owner of release workflow
the release.yaml override removed devops from the production release
pipeline because of last-match-wins semantics. add devops back alongside
cto-office so both teams review release changes.
* fix: use team handle for backport workflow ownership
replace personal handle @Piotr1215 with @loft-sh/devops-team for
backport.yaml ownership. codeowners should use team handles, not
individual accounts.
* chore(e2e-next): Migrated priorityclasses to e2e-next (#3717)
* chore(e2e-next): Migrated priorityclasses to e2e-next
* chore(e2e-next): fixes after CR
* chore(e2e-next): Fixes after CR
* chore(e2e-next): fixes after CR
* chore(e2e-next): remove ordered from test_servicesync (#3718)
* feat: add nodeMonitoring feature (#3714)
* fix(vind): Added values to the platform virtual cluster instance (#3724)
* feat: add runtimeClassName to vCluster (#3728)
* Migrate all test/e2e tests to e2e-next (#3726)
* chore(e2e-next): Migrate all e2e tests
* chore(e2e-next): Create test suites
* chore(e2e-next): Updates
* chore(e2e-next): fixes
* chore(e2e): fix ginkgo execution
* chore(e2e-next): Fix snapshot test
* chore(e2e-next): fix after CR
* chore(e2e-next): fix snapshot test
* chore(e2e-next): fix snapshot test
* chore(e2e-next): fix snapshot test
* chore(e2e-next): fixes after cr and disable old test suites
* chore(e2e-next): fixes
* chore(e2e-next): Fixes after e2e/ migrations (#3730)
* chore(e2e-next): Fixes after e2e/ migrations
* chore(e2e-next): fixes after CR
PR labels added
* chore(e2e-next): lint fix
* chore(e2e-next): Fixes after CR
* chore(e2e-next): fixes after cr
* test(e2e-next): add vind test spec (#3706)
* test(e2e-next): add vind test spec
* fix: do not modify the default kubeconfig file
* fix: rebase conflicts
* chore(e2e-next): Migrate metricsproxy tests (#3732)
* chore(e2e-next): Migrate metricsproxy tests
* chore(e2e-next): fix issues
* chore(e2e-next): exclude old metrics_proxy tests
* fix(platform): clear platform config section instead of deleting entire config.json (#3722)
`vcluster platform destroy` previously removed the entire ~/.vcluster/config.json,
wiping unrelated settings like driver type. now only the platform section is reset
to its default, preserving all other config.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: ensure that results are scoped to vcluster context when calling /api/v1/nodes/{node}/proxy/... paths (#3723)
* fix: ensure that results are scoped to vcluster context when calling /api/v1/nodes/{node}/proxy/... paths
* fix: add e2e tests for kubelet proxy
* feat: Support runningpods path for nodes/proxy in shared mode and add e2e tests
* fix(e2e-next): use exported function pattern for kubelet proxy test (#3748)
* fix(e2e-next): use exported function pattern for kubelet proxy test
Convert test_kubelet_proxy.go from auto-registration (var _ = Describe)
to the exported function pattern (DescribeKubeletProxy(vcluster)) that
all other tests in the test_core/sync package use.
The auto-registration pattern hardcodes clusters.KubeletProxyVCluster
as a package-level dependency. When vcluster-pro imports any function
from the test_core/sync package (e.g. DescribePodSync), Go compiles the
entire package - including test_kubelet_proxy.go. Its var _ = Describe
runs at init time and registers a test against kubelet-proxy-vcluster,
which is not provisioned in the pro test environment. This causes
"cluster not found in context" failures for all 5 kubelet proxy specs.
The fix follows the same pattern as every other test in the package:
- Export DescribeKubeletProxy(vcluster suite.Dependency) bool
- Accept the vcluster as a parameter instead of hardcoding it
- Register it explicitly in suite_e2e_test.go
This way each repo (OSS, pro, enterprise) decides which tests to run
against which clusters via their suite files.
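The difference between the two registration patterns can be sketched without Ginkgo. The registry slice and register function below are hypothetical stand-ins for the test framework, and DescribeKubeletProxy takes a plain string here instead of the suite.Dependency parameter named in the commit message.

```go
package main

import "fmt"

// registry stands in for the test framework's list of registered specs.
var registry []string

// register records a spec against a cluster and returns true, mirroring
// the bool return that makes "var _ = ..." registration possible.
func register(name, cluster string) bool {
	registry = append(registry, name+" on "+cluster)
	return true
}

// Auto-registration would look like the line below. It runs at package
// init time for every importer, whether or not that cluster exists:
//
//	var _ = register("kubelet proxy", "kubelet-proxy-vcluster")

// DescribeKubeletProxy is the exported-function form. Nothing is registered
// until a suite calls it with a cluster that the suite provisions.
func DescribeKubeletProxy(cluster string) bool {
	return register("kubelet proxy", cluster)
}

func main() {
	fmt.Println("registered after import:", len(registry))
	_ = DescribeKubeletProxy("kubelet-proxy-vcluster")
	fmt.Println("registered after explicit call:", registry)
}
```

A suite file that never calls DescribeKubeletProxy simply does not get the specs, which is the behaviour the pro test environment needs.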
* chore(e2e-next): lint fix
* chore(e2e-next): fixes after review
* fix(ci): add bridge netfilter and parallel procs to nightly e2e (#3736)
* fix(ci): add bridge netfilter and parallel procs to nightly e2e
the nightly workflow was missing two things that the PR workflow
(run-ginkgo-e2e action) already had:
1. bridge netfilter modules - without bridge-nf-call-iptables,
NetworkPolicy enforcement on Kind does not work correctly.
isolation mode tests use networkPolicy: enabled which requires
this kernel module for pod-to-pod and pod-to-apiserver traffic.
2. --procs=8 - nightly was running with the default procs=1
(sequential). this matches the PR action which already uses
--procs=8.
* fix(ci): use large-8_32 runner for nightly e2e
ubuntu-latest (2 CPU, 7 GB) cannot run 9+ vclusters concurrently -
vcluster pods fail to start due to resource exhaustion on the Kind
node. switch nightly to large-8_32 (8 CPU, 32 GB) self-hosted runner.
also add actionlint config for the self-hosted runner label.
* fix: vCluster version message (#3750)
* fix: vcluster version command (#3752)
Signed-off-by: Marcin Franczyk <marcin0franczyk@gmail.com>
* doc(e2e): add convention when to use gstruct
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore(e2e-next): Update framework dependency (#3764)
* chore: replace @loft-sh/cto-office with @loft-sh/eng-tech-leads in codeowners (#3760)
the eng-tech-leads team replaces the cto-office team for code review
ownership across the repository.
Closes DEVOPS-716
* Add osimages and sshkeys to platform destroy resources
Signed-off-by: Ryan Swanson <ryan.swanson@loft.sh>
* fix: allow external api ingress and skip unused network policies for private nodes (#3765)
When network policies are enabled, the control plane NetworkPolicy blocks
all external access to the vCluster API (kubectl, ingress controllers,
Gateway API, Konnectivity agents), requiring users to manually add an
ingress rule for port 8443. Add a default open ingress rule for port
8443/TCP so the API is reachable out of the box.
Skip the workload (vc-work-*) and DNS (vc-kube-dns-*) NetworkPolicies
when privateNodes is enabled, since no workload or CoreDNS pods run on
the host in that mode — the policies target zero pods and are no-ops.
Closes ENGNODE-267
* ci(release): use large runner for publish job (#3772)
The publish job runs GoReleaser for multi-arch Go compilation and Docker
image builds, consuming ~20 min on standard 2-vCPU runners. Switching to
large-8_32 (8 vCPU, 32GB RAM) — proven pattern from e2e workflows — to
cut release time roughly in half.
Closes DEVOPS-715
* feat(ci): add slack alert on release pipeline failure (#3763)
* feat(ci): add slack alert on release pipeline failure
Closes DEVOPS-680
* fix(ci): harden release failure notification
use always() for skipped-dep resilience, add permissions: {}, pin action SHA
* chore(e2e-next): Add custom linter to E2E-next tests (#3759)
* chore(e2e-next): Add custom linter to E2E-next tests
and extend logging for setup func
ENGQA-521
* chore(e2e-next): fix lint builds
* chore(e2e-next): lint fix
* chore(e2e-next): add missing describefunc in readme
* chore(e2e-next): Fixes after CR
* fix(linear-sync): add stable release comment dedup (#3758)
* fix(linear-sync): add stable release comment dedup
Without this check, the linear-sync tool would add duplicate "Now
available in stable release" comments on issues that were already
released in a pre-release. This backports the dedup logic from
vcluster-pro and loft-enterprise copies.
Closes DEVOPS-713
* fix(linear-sync): scope stable release comment dedup to specific tag
the prefix-only check ("Now available in stable release") was too broad
and would skip posting comments for later stable releases if any prior
stable release comment existed. now checks for the specific release tag
so cherry-picks released in a later version still get their own comment.
addresses codex review feedback on #3758.
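The tag-scoped dedup check described above can be sketched in Go. Only the comment prefix is taken from the commit message. The exact comment format, the alreadyAnnounced name, and the token-based tag match are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

const stablePrefix = "Now available in stable release"

// alreadyAnnounced reports whether a stable-release comment for this exact
// tag exists. A prefix-only check would also return true for comments about
// other releases and wrongly suppress the new comment.
func alreadyAnnounced(comments []string, tag string) bool {
	for _, c := range comments {
		if !strings.HasPrefix(c, stablePrefix) {
			continue
		}
		// Compare whole tokens so "v0.31.1" does not match "v0.31.10".
		for _, field := range strings.Fields(c) {
			if strings.Trim(field, ".,;:()") == tag {
				return true
			}
		}
	}
	return false
}

func main() {
	comments := []string{"Now available in stable release v0.31.0"}
	fmt.Println(alreadyAnnounced(comments, "v0.31.0"))
	fmt.Println(alreadyAnnounced(comments, "v0.32.0"))
}
```

With this check, an issue released in v0.31.0 and cherry-picked into v0.32.0 still receives a comment for the later release.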
* chore(e2e-next): Migrate e2e_certs tests (#3776)
* chore(e2e-next): Migrate e2e_certs tests
* chore(e2e-next): lint & ci fix
* chore(e2e-next): fixes
* chore(e2e-next): lint fixes
* chore(e2e-next): Runner change to large-8-32 (#3785)
* chore(e2e-next): Runner change to large-8-32
* chore(e2e-next): fix runs-on
* chore(e2e-next): next refactor describe functions to accept labels (#3786)
* chore(e2e-next): next refactor describe functions to accept labels
* chore(e2e-next): update runs-on
* chore(e2e-next): add non default suite
* ENGPLAT-399 Add --secure flag for TLS verification (#3781)
* ENGPLAT-399 respect platform insecure config
fix: respect platform insecure config instead of hardcoding InsecureSkipVerify
Thread config.Platform.Insecure through all call sites so
TLS verification is on by default and only skipped when the user
explicitly opts in via --insecure.
Closes ENGPLAT-399
* fix: preserve insecure TLS fallback during platform bootstrap
The start command is the bootstrap flow — the platform always serves a
self-signed certificate at this stage. Rather than reading
config.Platform.Insecure (which defaults to false for fresh installs and
has no --insecure flag on `vcluster platform start`), bootstrap probes
and login should handle self-signed certs directly:
- Readiness probes (port-forward, router, reachability) always skip TLS
verification since they are unauthenticated health checks against a
just-installed instance
- Login tries secure first, falls back to insecure on TLS error, and
persists the decision to config for future CLI operations
The Transport(insecure), IsLoftReachable(insecure), and clihelper/http
test additions from the original PR are preserved — post-bootstrap
commands (login, connect) still respect config.Platform.Insecure.
* Revert " fix: preserve insecure TLS fallback during platform bootstrap"
This reverts commit d5181a10edbdd414a042b7221dab229630f1053b.
* feat: add --insecure flag to vcluster platform start
Allows users to skip TLS certificate verification during bootstrap when
the platform serves a self-signed certificate. The flag sets
config.Platform.Insecure in memory so all downstream health checks and
login calls respect it, and LoginWithAccessKey persists the value for
future CLI operations.
* Switch flag from insecure to secure
Defaults to current insecure behavior to preserve current bootstrapping
functionality.
* chore(e2e-next): Test refactor
* chore(e2e-next): Fix custom linters for fork PRs (#3784)
* fix(cli): respect admin override for requireTemplate in vcluster platform create (#3725)
* fix(cli): respect admin override for requireTemplate in platform create vcluster
when a project has requireTemplate enabled, non-admin users are now given
a clear early error instead of the misleading "vcluster is using a template"
message from flag validation.
admin users (those with create permission on virtualclusterinstances/restricted)
bypass the template requirement via a SelfSubjectAccessReview check, matching
the existing behaviour in the UI and the backend validator.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(cli): move namespace conversion into canCreateVClusterWithoutTemplate
Move projectutil.ProjectNamespace() call inside canCreateVClusterWithoutTemplate
so callers pass the project name directly, per code review feedback.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* chore(e2e-next): Migrate e2e_cli tests (#3797)
* chore(e2e-next): Migrate e2e_cli tests
* chore(e2e-next): move cli to separate vcluster
* chore(e2e-next): Add cli vcluster
* chore(e2e-next): timeout fix
* chore(e2e-next): fixes
* chore(e2e-next): add background-proxy=false
* chore(e2e-next): fixes
* chore(e2e-next): fix tests
* chore(e2e-next): fixes
* Add SecurityContext config options to AutoUpgrade (#3796)
* feat: add snapshot and restore support for Docker driver (vind) (#3790)
Add the ability to snapshot and restore Docker-based vClusters (vind).
This exports all Docker volumes (etcd state, kubelet data, PVC data)
and the vCluster config directory into a single portable .tar.gz file.
Supports local files and remote OCI/S3 registries:
vcluster snapshot create my-cluster ./snap.tar.gz --driver docker
vcluster snapshot create my-cluster oci://ghcr.io/org/repo:tag --driver docker
vcluster create my-cluster --restore ./snap.tar.gz --driver docker
vcluster create my-cluster --restore oci://ghcr.io/org/repo:tag --driver docker
Restore streams tar entries directly to Docker and disk without buffering
the entire archive in memory. Config file paths are validated against
path traversal. Multi-node clusters are supported -- worker nodes with
pre-existing volumes skip the kubeadm join step.
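The path traversal validation mentioned above can be illustrated with a short Go sketch. This is a generic approach, not the code in pkg/cli/restore_docker.go. safeJoin and the directory names are hypothetical, and the example paths assume a Unix-style filesystem.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// safeJoin joins an archive entry name onto baseDir and rejects any result
// that would land outside baseDir, for example via "../" segments.
func safeJoin(baseDir, name string) (string, error) {
	base := filepath.Clean(baseDir)
	target := filepath.Join(base, name) // Join also cleans the result
	rel, err := filepath.Rel(base, target)
	if err != nil {
		return "", err
	}
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("entry %q escapes %q", name, baseDir)
	}
	return target, nil
}

func main() {
	for _, name := range []string{"config/vcluster.yaml", "../../etc/passwd"} {
		p, err := safeJoin("/var/lib/restore", name)
		fmt.Printf("%q -> %q, err: %v\n", name, p, err)
	}
}
```

Checking the relative path rather than a raw string prefix avoids accepting sibling directories such as /var/lib/restore-other.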
New files:
- pkg/cli/snapshot_docker.go
- pkg/cli/restore_docker.go
- pkg/cli/snapshot_docker_test.go
Modified files:
- cmd/vclusterctl/cmd/create.go (--restore flag for Docker driver)
- cmd/vclusterctl/cmd/snapshot/create.go (--driver flag, Docker dispatch)
- cmd/vclusterctl/cmd/restore.go (--driver flag, Docker dispatch)
- pkg/cli/create_docker.go (skip join for restored worker volumes)
- pkg/cli/delete_docker.go (dockerVolumeExists helper)
* fix: version detection for snapshots (#3809)
* chore(e2e-next): Migrate pause resume tests (#3806)
* chore(e2e-next): Migrate pause resume tests
ENGQA-189
* chore(e2e-next): Remove unused text
* chore(e2e-next): Fixes after CR
* chore(e2e-next): fix kubecontext
* chore(e2e-next): Add ordered to lifecycle test
* fix: snapshot issues (#3811)
* ci: migrate clean-github-cache to loft-sh/github-actions (#3815)
Replace local workflow with reusable workflow from loft-sh/github-actions.
SHA-pinned to clean-github-cache/v1 (36bd60e).
* ci: migrate cleanup-backport-branches to loft-sh/github-actions (#3816)
Replace local workflow with reusable workflow from loft-sh/github-actions.
SHA-pinned to cleanup-backport-branches/v1 (36bd60e).
* ci: migrate backport to loft-sh/github-actions (#3817)
Replace local workflow with reusable workflow from loft-sh/github-actions.
SHA-pinned to backport/v1 (36bd60e). Forwards gh-access-token secret
for backport PR creation.
* ci: migrate actionlint to loft-sh/github-actions (#3818)
Replace local workflow with reusable workflow from loft-sh/github-actions.
SHA-pinned to actionlint/v1 (36bd60e). Preserves github-pr-check reporter.
Job-level permissions ensure called workflow receives correct token scope.
* chore(e2e-next): fix snapshot test (#3814)
* chore(e2e-next): fix snapshot test
* chore(e2e-next): fix after CR
* chore(e2e-next): remove NetworkPolicyEnforcementSpec test (#3813)
* chore(e2e-next): remove NetworkPolicyEnforcementSpec test
Remove the NetworkPolicy enforcement test and all related infrastructure
(NonDefault label, nondefault suite, CI references) because this test has
never been executed in CI since it was written in 2021.
Evidence:
- Original commit (c06d24d6f, 2021-12-06): Calico CNI was commented out
with "Disabling the use of Calico plugin due to unstability"
- CI workflow (d3633b9fc, 2022-06-09): Added --ginkgo.skip='.*NetworkPolicy.*'
with comment "Skips NetworkPolicy tests because they require network plugin
with support (e.g. Calico)"
- ENG-9904 (PR #3374, 2025-11): Enabled policies.networkPolicy in vcluster
config but explicitly noted "disabled in HA tests due to known DNS/kindnet
issues" - no Calico was added to CI
- Nightly workflow: excluded via !non-default label filter
- Slack (#release-engineering, 2025-07-15): Team confirmed --ginkgo.skip
for NetworkPolicy is standard practice
The test requires Calico CNI to enforce NetworkPolicies but CI always used
kindnet which does not enforce them. A proper test with Calico infrastructure
should be created as a new ticket.
* chore(e2e-next): lint fix
* Updates security policy (#3792)
* chore(e2e-ginkgo): fix test run cancel when editing a PR description (#3789)
* Fix CLI commands for tenant clusters with zero pods (#3757)
* Fix finding clusters with zero pods
* Fix pause/resume for already scaled-down vClusters
When a vCluster StatefulSet/Deployment is already at 0 replicas but not
paused, the scaleDown functions now set the paused annotation instead of
skipping the workload. This ensures resume can find and scale it back up.
The configured HA replica count is read from the vCluster config secret
so that resume restores the correct number of replicas.
* Return clear error when connecting to a scaled-down vCluster
Instead of hanging for 30 seconds waiting for pods that don't exist,
fail immediately with a message telling the user to pause and resume first.
* Refactor scaleDown functions to remove code duplication
Extract common annotation/patch logic into prepareScaleDown with
getReplicas/setReplicas helpers. Add logging when a workload was
already scaled down before pausing.
* Return clear error when resuming a scaled-down but not paused vCluster
Guide the user to pause the tenant cluster before resuming,
so that the pause annotation and replica count are properly recorded.
* Skip virtual cluster collection in debug collect when scaled down
Instead of failing with a timeout, log a warning and still collect
host-side info (release, logs, host resources).
* Reject adding a scaled-down vCluster to the platform
A scaled-down tenant cluster would show up as Starting in the
platform, which is misleading. Return an error asking the user
to scale it up first.
* Detect platform-paused vClusters via sleep mode label
The platform's sleep mode controller sets loft.sh/sleep-mode label
instead of loft.sh/paused annotation. Check both in getVCluster so
the status is correctly reported as Paused.
Also guard connect's auto-resume against platform-paused clusters,
returning a clear error directing the user to the platform driver.
* Gate ScaledDown status on workload spec.replicas, not pod…
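The pause handling described in the entries above (marking an already scaled-down workload as paused, recording the replica count to restore, and treating either the loft.sh/paused annotation or the loft.sh/sleep-mode label as "paused") can be sketched roughly as follows. This is a minimal illustration, not the actual vcluster code: the objectMeta type, the replicasAnnotation key, the "true" values, and the function signatures are assumptions made for the example. Only the names prepareScaleDown, getReplicas, setReplicas and the two loft.sh keys come from the commit messages.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	// Keys named in the commit message. Comparing against "true" is an
	// assumption of this sketch.
	pausedAnnotation = "loft.sh/paused"
	sleepModeLabel   = "loft.sh/sleep-mode"
	// Hypothetical key used here to remember how many replicas to restore.
	replicasAnnotation = "example.invalid/previous-replicas"
)

// objectMeta is a hypothetical stand-in for the metadata of a
// StatefulSet or Deployment.
type objectMeta struct {
	Labels      map[string]string
	Annotations map[string]string
}

// isPaused checks both markers: the annotation set by the CLI and the
// label set by the platform's sleep mode controller.
func isPaused(m *objectMeta) bool {
	return m.Annotations[pausedAnnotation] == "true" ||
		m.Labels[sleepModeLabel] == "true"
}

// prepareScaleDown holds the logic shared by both workload kinds. The
// getReplicas/setReplicas helpers hide whether the caller is working with
// a StatefulSet or a Deployment. configured is the replica count taken
// from the vCluster config; it is used when the workload is already at 0.
func prepareScaleDown(
	m *objectMeta,
	getReplicas func() int32,
	setReplicas func(int32),
	configured int32,
) (changed, alreadyDown bool) {
	if isPaused(m) {
		return false, false // already paused, nothing to record
	}

	current := getReplicas()
	alreadyDown = current == 0

	// An already scaled-down workload is not skipped: it still gets the
	// paused marker, and resume falls back to the configured count.
	restore := current
	if alreadyDown {
		restore = configured
	}

	if m.Annotations == nil {
		m.Annotations = map[string]string{}
	}
	m.Annotations[pausedAnnotation] = "true"
	m.Annotations[replicasAnnotation] = strconv.Itoa(int(restore))
	setReplicas(0)
	return true, alreadyDown
}

func main() {
	replicas := int32(0) // workload was scaled down by hand, never paused
	meta := &objectMeta{}

	changed, alreadyDown := prepareScaleDown(
		meta,
		func() int32 { return replicas },
		func(n int32) { replicas = n },
		3, // configured HA replica count (illustrative value)
	)
	fmt.Println(changed, alreadyDown, isPaused(meta), meta.Annotations[replicasAnnotation])
}
```

Passing the replica accessors as closures keeps the annotation logic in one place, which is the kind of de-duplication the refactor entry describes. A real implementation would patch the Kubernetes object rather than mutate an in-memory struct.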