ci: Update bump-version workflow to regenerate snapshot tests#552
Conversation
The bump-version workflow now includes: 1. Verification that pyproject.toml has exactly one line changed 2. A step to update snapshot tests when bumping the version 3. Verification that each snapshot SVG file has exactly one line changed This ensures that version bumps are clean and only affect the version string in pyproject.toml and the splash screen version in snapshot files. Changes: - Add verification that pyproject.toml diff has exactly 1 line change - Add 'Update snapshot tests' step that runs pytest with --snapshot-update - Add 'Verify snapshot diffs' step to check each SVG has exactly 1 line diff - Stage tests/snapshots/ files in the commit step - Update PR description to show diff verification status Co-authored-by: openhands <openhands@all-hands.dev>
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🟡 Acceptable with critical fixes needed
Verdict: ❌ Needs rework - Error hiding must be fixed before merge
Key Insight: Solves a real problem pragmatically, but the || true hides failures and could commit broken snapshots.
Removing the error suppression ensures the workflow fails if snapshot regeneration encounters any issues, preventing broken state from being committed.
The variable was captured and exported to env but never used - the PR body uses a hardcoded template instead. Removing dead code to reduce complexity.
The variable name now accurately reflects what it measures - the numstat total (added + deleted lines), not logical lines changed. Also improved the log message to clarify the expectation.
For consistency with the pyproject.toml check, renamed the variable to accurately reflect what it measures - the numstat total (added + deleted). Also improved log messages to clarify the diff lines terminology.
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🟡 Acceptable with critical concerns
Verdict: ❌ Needs rework - Brittle validations will cause false failures in legitimate scenarios
Key Insight: You're validating diff line counts instead of outcomes. Parse and verify what matters (version changed correctly), don't count implementation details (how many lines changed).
Instead of checking that exactly 1 line changed (which fails in legitimate scenarios like updating dependencies during release), verify that the version field has the expected value. This checks what matters, not how many lines changed.
The per-file diff line check was over-specified and would break when: - pytest-textual-snapshot changes SVG generation formatting - Version string appears in multiple SVG elements - Whitespace normalization occurs The workflow already runs snapshot tests - if they're broken, the tests will fail. Trust the actual test run to catch broken snapshots instead of over-specifying validation based on implementation details.
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🟢 Good taste
Verdict: ✅ Ship it
Key Insight: Clean, pragmatic solution that addresses all previous feedback. Validates outcomes (version value) instead of process (diff line counts), trusts tools (pytest) to do their job, and fails fast if something breaks. No over-engineering, no special cases. This is exactly what the previous reviews pushed for - engineering fundamentals over theoretical perfection.
|
OpenHands bot, if you are reviewing this PR and believe that its worth merging please approve the PR as well |
all-hands-bot
left a comment
There was a problem hiding this comment.
Taste Rating: 🟢 Good taste
Verdict: ✅ Ship it
Assessment: Clean, pragmatic solution. All previous feedback addressed correctly:
- Validates outcomes (version value) instead of process (diff line counts)
- Trusts pytest to regenerate snapshots correctly
- Fails fast if snapshot update breaks (no
|| true) - No over-engineering or brittle special cases
The workflow solves a real problem (version appears in 25 splash screen snapshots) with a straightforward approach: run the snapshot update tool and commit the results. If something breaks, CI tests catch it.
Key Insight: This PR demonstrates good engineering taste - it does the necessary work without trying to validate every detail of how it happened. The previous iterations correctly removed brittle validation logic in favor of trusting tools and testing outcomes.
Summary
This PR updates the
bump-version.ymlworkflow to automatically regenerate snapshot tests when bumping the version, and adds verification that each file has exactly 1 line changed.Problem
The CLI displays its version in the splash screen (
OpenHands CLI v1.12.2), which is captured in 25 snapshot SVG files used for visual regression testing. Previously, when bumping the version, these snapshots would not be updated, causing snapshot tests to fail.Changes
1. pyproject.toml Verification (lines 71-82)
Added a check that verifies
pyproject.tomlhas exactly one line changed (the version line). If more lines are changed, the workflow fails with an error showing the diff.2. Snapshot Tests Update (lines 103-112)
Added a step that runs
pytest tests/snapshots/ --snapshot-updateto regenerate all snapshot SVG files with the new version.3. Snapshot Diff Verification (lines 114-138)
Added verification that each updated snapshot SVG file has exactly 1 line changed:
tests/snapshots/4. Enhanced PR Description (line 205)
The draft PR body now shows:
### pyproject.toml (exactly 1 line changed ✅)(each with exactly 1 line diff ✅)Example PR Description
When the workflow runs, it creates a PR with this description:
Other changes
uv.lockfile