Skip to content

vtorc: pass --cell flag on Vitess v24+#787

Open
timvaillancourt wants to merge 4 commits into
mainfrom
vtorc-cell-flag
Open

vtorc: pass --cell flag on Vitess v24+#787
timvaillancourt wants to merge 4 commits into
mainfrom
vtorc-cell-flag

Conversation

@timvaillancourt
Copy link
Copy Markdown
Member

What's this?

Resolves #786 — VTOrc's --cell flag is required in Vitess v25+ and the operator doesn't currently set it. After upgrading to v25 every VTOrc managed by this operator would FAILED_PRECONDITION at startup 😱

Why version-gating instead of always passing it?

--cell was introduced in Vitess v24 (vitessio/vitess#19047) and became required in v25 (vitessio/vitess#20048). On ≤v23 the flag doesn't exist — and I verified against upstream/release-22.0 that vtorc uses a plain cobra Main with no FParseErrWhitelist or DisableFlagParsing (neither in go/cmd/vtorc/cli/cli.go nor go/vt/servenv/), so passing --cell to ≤v23 vtorc would CrashLoopBackOff with unknown flag: --cell. We need to only emit the flag when we know the binary supports it.

How it works

  1. New helper vitess.MajorVersionFromImage(image) parses the major version from the image tag (handles vitess/lite:v24.0.0-mysql80, RC suffixes, digest-only refs, rolling tags)
  2. vtorc.Spec.flags() calls the helper against spec.Image and adds "cell": spec.Cell only when ok && major >= 24
  3. ExtraFlags continues to override base flags (the existing loop at deployment.go:125-130 runs after the base map), so users on rolling tags or who need a custom value can still force it

spec.Cell is populated from tabletPool.Cell at pkg/controller/vitessshard/reconcile_vtorc.go:145 — there's already one vtorc Deployment per (cluster, keyspace, shard, cell) tuple, so the value is always the cell that this particular vtorc is responsible for

What's covered

Image tag Result
vitess/lite:v25.0.0-mysql80 emits --cell
vitess/lite:v24.0.0-rc1-mysql80 emits --cell
vitess/lite:v23.0.5-mysql80 no --cell
vitess/lite:mysql80 (rolling) no --cell
vitess/lite:latest no --cell
vitess/lite@sha256:... (digest) no --cell
Empty / unparseable no --cell

Related Issue(s)

Resolves: #786

Test plan

  • go test ./pkg/operator/vitess/... ./pkg/operator/vtorc/... -count=1 (12 cases for the version helper, 9 for the vtorc flag gate)
  • go build ./... clean
  • gofumpt -d + goimports -l clean
  • e2e suite isn't updated — it uses vitess/lite:mysql80 (rolling) which by design won't get --cell. When the e2e tag pins to a v25+ release it will start emitting automatically

Release notes

Added a callout in docs/release-notes/2_18_0_summary.md


Claude Code assisted with the implementation and prepared this PR summary — I provided direction and reviewed

`--cell` was introduced as a VTOrc flag in Vitess v24
(vitessio/vitess#19047) and becomes a hard requirement in v25
(vitessio/vitess#20048) where vtorc returns FAILED_PRECONDITION if
the flag is empty.

Add a `MajorVersionFromImage` helper under `pkg/operator/vitess` that
parses the major version from the image tag, and have vtorc's
`flags()` emit `--cell=<cell>` only when the parsed major is >= 24.
Rolling tags (e.g. `vitess/lite:mysql80`), `latest`, digest-only
references, and unparseable tags do not get the flag. Pre-v24 users
are unaffected; users on rolling tags can still force `cell` via
`ExtraFlags` if needed.

Resolves #786

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Replace the custom `^v(\d+)\.` regex with `semver.ParseTolerant`
from `github.com/blang/semver/v4`, which was already an indirect
dependency. ParseTolerant strips the optional `v` prefix and
correctly treats suffixes like `-rc1-mysql80` as SemVer prerelease
identifiers.

The image-tag extraction (drop digest, split on last `:`) stays
manual since the tag itself is what we want to feed semver.

Test matrix is unchanged for our real-world tags; the previously
rejected `vitess/lite:24.0.0` (no `v` prefix) now parses to v24,
which is the more correct behaviour.

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
The initial cluster YAML (`101_initial_cluster.yaml`) pins
`vitess/lite:v24.0.0-rc1-mysql80` for vtorc, which is on the v24+
side of the operator's new --cell version gate. Add a
`checkPodSpecBySelectorWithTimeout` assertion that --cell=zone1
shows up in the rendered vtorc pod spec.

The assertion is intentionally placed before `upgradeToLatest`,
which swaps to the rolling `mysql80` tag where the operator can't
determine the major version and therefore (correctly) omits the
flag.

Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
@timvaillancourt timvaillancourt marked this pull request as ready for review May 12, 2026 15:36
Signed-off-by: Tim Vaillancourt <tim@timvaillancourt.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the VTOrc Deployment generation to pass the --cell flag when running Vitess versions that support (and in v25+, require) it, avoiding startup failures on upgrade while keeping compatibility with pre-v24 images that reject the flag.

Changes:

  • Add vitess.MajorVersionFromImage(image) to extract the Vitess major version from versioned image tags (and return “unknown” for rolling/digest-only refs).
  • Gate VTOrc’s --cell flag emission on major >= 24, while still allowing ExtraFlags to override.
  • Add unit tests for the version parsing helper and the VTOrc flag gating, plus an e2e assertion that v24-based VTOrc pods include --cell.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

Show a summary per file
File Description
pkg/operator/vitess/version.go Introduces image-tag major version parsing used for version-gated behavior.
pkg/operator/vitess/version_test.go Adds test coverage for version parsing across tags/RCs/rolling/digest-only refs.
pkg/operator/vtorc/deployment.go Emits --cell only when the configured VTOrc image is Vitess v24+.
pkg/operator/vtorc/deployment_test.go Verifies base flags and --cell gating behavior for VTOrc.
test/endtoend/upgrade_test.sh Ensures the v24 initial cluster’s VTOrc pod spec includes --cell=zone1.
go.mod Adds a direct dependency on github.com/blang/semver/v4 for version parsing.
docs/release-notes/2_18_0_summary.md Documents the new --cell auto-application behavior and its gating.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

@mhamza15 mhamza15 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel slightly iffy on parsing the version from the image tag. I understand there isn't any existing pattern for gating on version, but maybe we should introduce a stronger pattern, maybe adding a new field in the CRD(s)? It's probably worth asking some of our fellow Kubernetes operator experts on how this is usually implemented, maybe they have some good ideas.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature Request: pass --cell flag to VTOrc deployments (required in Vitess v25+)

3 participants