vtorc: pass --cell flag on Vitess v24+#787
Conversation
`--cell` was introduced as a VTOrc flag in Vitess v24 (vitessio/vitess#19047) and becomes a hard requirement in v25 (vitessio/vitess#20048) where vtorc returns FAILED_PRECONDITION if the flag is empty. Add a `MajorVersionFromImage` helper under `pkg/operator/vitess` that parses the major version from the image tag, and have vtorc's `flags()` emit `--cell=<cell>` only when the parsed major is >= 24. Rolling tags (e.g. `vitess/lite:mysql80`), `latest`, digest-only references, and unparseable tags do not get the flag. Pre-v24 users are unaffected; users on rolling tags can still force `cell` via `ExtraFlags` if needed. Resolves #786 Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Replace the custom `^v(\d+)\.` regex with `semver.ParseTolerant` from `github.com/blang/semver/v4`, which was already an indirect dependency. ParseTolerant strips the optional `v` prefix and correctly treats suffixes like `-rc1-mysql80` as SemVer prerelease identifiers. The image-tag extraction (drop digest, split on last `:`) stays manual since the tag itself is what we want to feed semver. Test matrix is unchanged for our real-world tags; the previously rejected `vitess/lite:24.0.0` (no `v` prefix) now parses to v24, which is the more correct behaviour. Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
The initial cluster YAML (`101_initial_cluster.yaml`) pins `vitess/lite:v24.0.0-rc1-mysql80` for vtorc, which is on the v24+ side of the operator's new --cell version gate. Add a `checkPodSpecBySelectorWithTimeout` assertion that --cell=zone1 shows up in the rendered vtorc pod spec. The assertion is intentionally placed before `upgradeToLatest`, which swaps to the rolling `mysql80` tag where the operator can't determine the major version and therefore (correctly) omits the flag. Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
There was a problem hiding this comment.
Pull request overview
This PR updates the VTOrc Deployment generation to pass the --cell flag when running Vitess versions that support (and in v25+, require) it, avoiding startup failures on upgrade while keeping compatibility with pre-v24 images that reject the flag.
Changes:
- Add
vitess.MajorVersionFromImage(image)to extract the Vitess major version from versioned image tags (and return “unknown” for rolling/digest-only refs). - Gate VTOrc’s
--cellflag emission onmajor >= 24, while still allowingExtraFlagsto override. - Add unit tests for the version parsing helper and the VTOrc flag gating, plus an e2e assertion that v24-based VTOrc pods include
--cell.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
pkg/operator/vitess/version.go |
Introduces image-tag major version parsing used for version-gated behavior. |
pkg/operator/vitess/version_test.go |
Adds test coverage for version parsing across tags/RCs/rolling/digest-only refs. |
pkg/operator/vtorc/deployment.go |
Emits --cell only when the configured VTOrc image is Vitess v24+. |
pkg/operator/vtorc/deployment_test.go |
Verifies base flags and --cell gating behavior for VTOrc. |
test/endtoend/upgrade_test.sh |
Ensures the v24 initial cluster’s VTOrc pod spec includes --cell=zone1. |
go.mod |
Adds a direct dependency on github.com/blang/semver/v4 for version parsing. |
docs/release-notes/2_18_0_summary.md |
Documents the new --cell auto-application behavior and its gating. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
mhamza15
left a comment
There was a problem hiding this comment.
I feel slightly iffy on parsing the version from the image tag. I understand there isn't any existing pattern for gating on version, but maybe we should introduce a stronger pattern, maybe adding a new field in the CRD(s)? It's probably worth asking some of our fellow Kubernetes operator experts on how this is usually implemented, maybe they have some good ideas.
What's this?
Resolves #786 — VTOrc's
--cellflag is required in Vitess v25+ and the operator doesn't currently set it. After upgrading to v25 every VTOrc managed by this operator wouldFAILED_PRECONDITIONat startup 😱Why version-gating instead of always passing it?
--cellwas introduced in Vitess v24 (vitessio/vitess#19047) and became required in v25 (vitessio/vitess#20048). On≤v23the flag doesn't exist — and I verified againstupstream/release-22.0thatvtorcuses a plain cobraMainwith noFParseErrWhitelistorDisableFlagParsing(neither ingo/cmd/vtorc/cli/cli.gonorgo/vt/servenv/), so passing--cellto ≤v23 vtorc would CrashLoopBackOff withunknown flag: --cell. We need to only emit the flag when we know the binary supports it.How it works
vitess.MajorVersionFromImage(image)parses the major version from the image tag (handlesvitess/lite:v24.0.0-mysql80, RC suffixes, digest-only refs, rolling tags)vtorc.Spec.flags()calls the helper againstspec.Imageand adds"cell": spec.Cellonly whenok && major >= 24ExtraFlagscontinues to override base flags (the existing loop atdeployment.go:125-130runs after the base map), so users on rolling tags or who need a custom value can still force itspec.Cellis populated fromtabletPool.Cellatpkg/controller/vitessshard/reconcile_vtorc.go:145— there's already one vtorc Deployment per(cluster, keyspace, shard, cell)tuple, so the value is always the cell that this particular vtorc is responsible forWhat's covered
vitess/lite:v25.0.0-mysql80--cellvitess/lite:v24.0.0-rc1-mysql80--cellvitess/lite:v23.0.5-mysql80--cellvitess/lite:mysql80(rolling)--cellvitess/lite:latest--cellvitess/lite@sha256:...(digest)--cell--cellRelated Issue(s)
Resolves: #786
Test plan
go test ./pkg/operator/vitess/... ./pkg/operator/vtorc/... -count=1(12 cases for the version helper, 9 for the vtorc flag gate)go build ./...cleangofumpt -d+goimports -lcleanvitess/lite:mysql80(rolling) which by design won't get--cell. When the e2e tag pins to a v25+ release it will start emitting automaticallyRelease notes
Added a callout in
docs/release-notes/2_18_0_summary.mdClaude Code assisted with the implementation and prepared this PR summary — I provided direction and reviewed