
test: kill mutants #78

Merged
deviantintegral merged 12 commits into main from kill-mutants
Mar 9, 2026

@deviantintegral
Owner

No description provided.

deviantintegral and others added 5 commits March 8, 2026 19:39
Add tests for kebab_name() covering lowercase conversion,
underscore-to-hyphen replacement, and all enum members.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…surviving mutants

Add direct tests for _decode_temperature and _check_length helpers.
Kill survived mutants for encode_temperature modulus, decode_temp_unit
unit=None, decode_flame_effect pulsating & vs |, and
encode_heat_settings boost floor max(0,...) vs max(1,...).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ecycle, _request, and turn_on/off

Add tests covering _parse_fire_features (all 24 fields), client
__init__/__aenter__/__aexit__ lifecycle, _request auth/error handling,
get_fires/get_fire_overview field-level verification, write_parameters,
and turn_on/turn_off with mode/temperature preservation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tighten assertions in widget format and CLI display tests to use exact
string equality instead of substring checks. This kills mutants that
prepend/append "XX" to string literals (e.g., label names, separator
characters, color names). Also adds boundary tests for duration=1 to
kill `> 0` vs `> 1` mutations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Verify exact subprocess.run arguments in _resolve_version tests to kill
string and kwarg mutations. Add _expand_flame unit tests for gap
distribution, style application, and boundary conditions. Add heat
indicator wave character tests for _build_fire_art.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@deviantintegral deviantintegral changed the title from "Kill mutants" to "test: kill mutants" Mar 9, 2026
deviantintegral and others added 6 commits March 8, 2026 20:16
… mutants

- Use exact separator width checks (== instead of in) for all display functions
- Add error byte isolation tests to kill bitwise OR→AND mutants
- Verify timer off-at time is in the future (kills +→− and format mutants)
- Add exact Yes/No and Active Faults line checks
- Verify main() passes args correctly to async_main and asyncio.run
- Add _expand_flame boundary tests for zero-weight and remaining_weight edge cases
- Use exact rounding assertion for _convert_to_celsius
- Verify _log_response preserves body preview with += (kills +=→= mutant)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use duration=120 to kill // 60 → // 61 mutant (120//60=2 vs 120//61=1)
- Use exact log record message comparison to kill format string XX mutant

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ometry

- Verify exact heat row wave character content (kills ≈→XX≈XX, ~→XX~XX)
- Check heat row bright_red style (kills style mutations)
- Verify heat rows reduce flame budget while maintaining total height
- Assert dim style on all structural frame elements (top edge, borders, hearth)
- Test flame row centering, min width, and trailing pad calculations
- Test exact boundary: flame_rows_effective == num_defs (kills >= vs >)
- Verify default parameter values produce expected styling

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ions

- Check ALL frame chars (│┌┐└┘─▁) have exactly 'dim' style (not just first)
- Verify flame row inner width equals iw exactly (kills lead/trail mutations)
- Assert leading/trailing padding is spaces only (kills ' '→'XX XX')
- Verify centering: lead <= trail + 1 (kills +body_w and //3 mutations)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…etry

- Use exact style equality for heat row bright_red check
- Add narrow width test (w=20) to trigger min_w binding constraint
- Add wide width test (w=80) to kill //2 → //3 centering mutant

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…exts

- Verify prompt output, star echoing, terminal restore, and raw mode setup
- Assert stdin.read called with exactly 1 character
- Check backspace sequence (\b \b) written to stdout
- Verify parser description contains expected strings
- Verify --verbose flag has descriptive help text

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@deviantintegral
Owner Author

deviantintegral commented Mar 9, 2026

Generated by Claude Code; it has been reviewed.

PR Comparison: Copilot #76 vs Our #78

Scope

| Aspect | Copilot #76 | Our #78 |
| --- | --- | --- |
| Lines added | 3,114 | ~2,500 |
| Files changed | 7 | 9 |
| New test files | 0 | 0 |

Files unique to Copilot: test_auth.py (278 lines of TokenAuth, MsalAuth._build_app, MsalAuth._save_cache tests)

Files unique to Ours: test_b2c_login.py, test_fireplace_visual.py, test_tui_screens.py

Overlapping files (both modify): test_cli_commands.py, test_client.py, test_models.py, test_protocol.py, test_tui_actions.py, test_widgets_format.py

Key Differences by Area

1. Test Assertion Quality

  • Ours consistently converts substring (in) checks to exact == comparisons (e.g., lines[1] == " " + "─" * 40, result[0][0] == "[bold]Mode:[/bold] ")
  • Copilot uses a mix: many tests still use substring (in) checks (e.g., "─" * 40 in out, "[322] Flame Effect" in out), which won't kill XX-prefix/suffix mutants
  • Ours modifies existing tests to strengthen them; Copilot mostly adds new test classes alongside existing ones, leaving the weak originals in place

2. Separator Width Mutants (* 40 → * 41)

  • Ours uses exact lines[1] == " " + "─" * 40 across all 11 display functions - directly kills the mutant
  • Copilot still uses "─" * 40 in out which matches both * 40 and * 41 (substring is still present) - does NOT kill these mutants
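To make the difference between the two assertion styles concrete, here is a minimal sketch. The render_separator helper is hypothetical (it is not taken from either PR); it only imitates a display function that emits one leading space followed by 40 dashes, and the two "mutant" values imitate the kind of changes mutmut makes.

```python
# Hypothetical stand-in for a display function; not code from either PR.
def render_separator(width: int = 40) -> str:
    """Return one leading space followed by `width` box-drawing dashes."""
    return " " + "─" * width


original = render_separator(40)
width_mutant = render_separator(41)   # imitates the `* 40` -> `* 41` mutant
xx_mutant = "XX XX" + "─" * 40        # imitates the `" "` -> `"XX XX"` mutant

expected = " " + "─" * 40

# A substring check passes for the original AND for both mutants,
# so neither mutant is killed by it.
substring_results = [("─" * 40) in s for s in (original, width_mutant, xx_mutant)]

# An exact comparison passes only for the original, so both mutants are killed.
equality_results = [s == expected for s in (original, width_mutant, xx_mutant)]

print(substring_results)  # [True, True, True]
print(equality_results)   # [True, False, False]
```

The same reasoning applies to any literal that a test only checks with a substring match: extra characters before or after the expected text go unnoticed.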

3. Timer Arithmetic Mutants

  • Ours uses duration=120 (120//60=2 vs 120//61=1) to kill // 60 → // 61, plus verifies off-at is in the future to kill + → −
  • Copilot adds multiple duration tests (45, 61, 90, 120) but doesn't verify the actual off-at time matches now + duration — it just checks format
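The arithmetic behind the choice of test input can be checked with a short sketch. Everything here is illustrative: hours_part and off_at are hypothetical helpers, and treating the duration as minutes is an assumption (the unit is not stated above). The sketch only shows which inputs can tell // 60 from // 61, and why a future-time check distinguishes + from −.

```python
from datetime import datetime, timedelta


def hours_part(duration: int, divisor: int = 60) -> int:
    """Hypothetical helper: whole hours in `duration` (assumed to be minutes)."""
    return duration // divisor


# Which candidate durations give different results for // 60 and // 61?
candidates = (45, 61, 90, 120)
distinguishing = [d for d in candidates if hours_part(d, 60) != hours_part(d, 61)]
print(distinguishing)  # [120]


def off_at(now: datetime, duration: int, sign: int = 1) -> datetime:
    """Hypothetical helper: switch-off time; sign=-1 imitates the + -> - mutant."""
    return now + sign * timedelta(minutes=duration)


now = datetime(2026, 3, 9, 12, 0)      # fixed clock keeps the example deterministic
original_off_at = off_at(now, 120)
mutant_off_at = off_at(now, 120, sign=-1)

# The "is in the future" check passes for the original and fails for the mutant.
print(original_off_at > now, mutant_off_at > now)  # True False
```

A format-only assertion on the off-at string would accept both values, since a time in the past is formatted just as validly as one in the future.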

4. Error Byte OR→AND Mutants

  • Ours adds test_error_only_byte3 and test_error_only_byte4 that specifically isolate bytes to kill | → & mutants
  • Copilot adds test_error_only_byte2_set and test_error_only_byte3_set — similar coverage but misses byte4
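Why isolating a single byte matters can be shown with a two-operand sketch. The functions below are hypothetical and much simpler than the real decoder (whose exact expression is not shown here); they only illustrate that an input with one operand set and the other zero gives different results for | and &.

```python
def has_error(byte3: int, byte4: int) -> bool:
    """Hypothetical original: an error is reported if either byte is non-zero."""
    return bool(byte3 | byte4)


def has_error_mutant(byte3: int, byte4: int) -> bool:
    """Imitates the | -> & mutant of the function above."""
    return bool(byte3 & byte4)


# Both bytes carry the same bit: original and mutant agree, the mutant survives.
both_set = (has_error(0x01, 0x01), has_error_mutant(0x01, 0x01))

# Only one byte is set: original and mutant disagree, the mutant is killed.
only_byte3 = (has_error(0x01, 0x00), has_error_mutant(0x01, 0x00))
only_byte4 = (has_error(0x00, 0x01), has_error_mutant(0x00, 0x01))

print(both_set)    # (True, True)
print(only_byte3)  # (True, False)
print(only_byte4)  # (True, False)
```

Which isolated byte kills which mutant in the real code depends on how the operands are chained, so this should be read as an illustration of the idea rather than a statement about either PR's results.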

5. _format_connection_state Style Mutants

  • Ours uses exact string equality: result == "[green]Connected[/green]", which kills "green" → "XXgreenXX"
  • Copilot uses the same exact approach here — both are equivalent

6. Auth Tests (Copilot Only)

Copilot adds significant auth testing:

  • TokenAuth callable vs string token tests
  • MsalAuth._build_app constructor arg verification (authority, validate_authority, token_cache)
  • MsalAuth._save_cache mkdir/write/log tests

These target unmapped mutants (mutmut can't map auth.py functions to tests), so they likely won't count in mutmut's results anyway. However, they're valuable for general test coverage.

7. Client Tests

Both add similar _parse_fire_features and client lifecycle tests. Our version is slightly more verbose (explicit all-24-fields check) while Copilot uses @pytest.mark.parametrize with a _FEATURES_MAP list — Copilot's approach is more DRY and thorough here (each key verified in isolation with a parametrized test).
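A rough sketch of the "each key verified in isolation" pattern follows. The key names, the FEATURES_MAP contents, and parse_features are all hypothetical stand-ins; the real _FEATURES_MAP and _parse_fire_features are not reproduced here. A plain loop is used so the example runs without pytest; with pytest, the same check function could be decorated with @pytest.mark.parametrize("raw_key, attr", FEATURES_MAP) instead.

```python
# Hypothetical (raw API key, parsed attribute) pairs; not the real map.
FEATURES_MAP = [
    ("FeatureA", "feature_a"),
    ("FeatureB", "feature_b"),
    ("FeatureC", "feature_c"),
]


def parse_features(raw: dict) -> dict:
    """Hypothetical parser: copy each known raw key to its attribute name."""
    return {attr: bool(raw.get(key, False)) for key, attr in FEATURES_MAP}


def check_feature_in_isolation(raw_key: str, attr: str) -> None:
    """Set exactly one raw key and verify that only its attribute is enabled."""
    parsed = parse_features({raw_key: True})
    assert parsed[attr] is True
    enabled_others = [name for name, value in parsed.items() if name != attr and value]
    assert enabled_others == []


# Stand-in for parametrization: run the same check once per map entry.
for raw_key, attr in FEATURES_MAP:
    check_feature_in_isolation(raw_key, attr)

checked = len(FEATURES_MAP)
print(checked)  # 3
```

Checking one key at a time means a mutant that swaps or misspells a single key fails exactly one case, which makes the failure easier to attribute than a single all-fields assertion.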

8. Fireplace Visual Tests (Ours Only)

Our PR adds extensive _build_fire_art and _expand_flame tests:

  • Exact heat wave content verification
  • bright_red style checking on heat chars
  • Frame char dim style verification
  • Flame geometry/centering tests
  • _expand_flame boundary cases (zero weights, gap distribution)

9. Protocol Tests

Both test _decode_temperature, _check_length, and the pulsating bit-shift. Nearly identical test logic. Our version imports via _protocol_module while Copilot imports functions directly.

10. _masked_input Tests

Both restructure the helper to return mock objects and verify prompt, asterisks, backspace, termios, and tty calls. Very similar coverage.

11. _resolve_version Tests

Both add subprocess argument verification and dirty/clean tests. Copilot's version is slightly more comprehensive (13 test methods vs our 8 additions), but covers the same mutant scenarios.

Mutant Kill Effectiveness

| Approach | Kills String XX Mutants | Kills Separator Width | Kills Timer Arithmetic | Kills Bitwise OR→AND |
| --- | --- | --- | --- | --- |
| Ours | Yes (exact ==) | Yes | Yes | Yes |
| Copilot | Partially (in checks remain) | No | Partially | Partially |

Recommendation

Our PR #78 is the better version for the stated goal of killing mutants, because:

  1. It modifies existing tests to strengthen assertions, rather than just adding new tests alongside weak ones. This is the single biggest difference — Copilot leaves the original weak in-check tests in place.

  2. It kills separator width mutants (* 40 → * 41) which Copilot does not, since "─" * 40 in " " + "─" * 41 is still True.

  3. It verifies timer off-at values against now + duration to kill the + → − mutant.

  4. It covers more visual/widget areas (fireplace art, heat rows, flame geometry) that Copilot skips entirely.

Copilot #76 has some advantages:

  • Auth tests (test_auth.py) that we don't have — but these target unmapped functions
  • More DRY parametrized _parse_fire_features test
  • Slightly more _resolve_version edge cases

Best option: Cherry-pick Copilot's test_auth.py additions and the parametrized _parse_fire_features test into our branch, since those are complementary additions we don't duplicate. The rest of Copilot's changes would be weaker replacements for what we already have.

@deviantintegral
Owner Author

> but these target unmapped functions

I think Claude is wrong here. If I revert those test additions in test_auth.py, the number of mutants escaping goes up.

Cherry-pick auth tests from Copilot PR #76 that complement our existing
mutation testing coverage with direct unit tests for auth internals.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@deviantintegral deviantintegral merged commit eeeab66 into main Mar 9, 2026
5 checks passed
@deviantintegral deviantintegral deleted the kill-mutants branch March 9, 2026 16:03