This document describes the pre-release quality gates for the GithubRepoAuditor project. Gates must pass before tagging a release.
python3 -m pytest -q -p no:cacheprovider # full suite must be green
python3 -m ruff check src/ tests/ # no lint errors

Mutation testing is scoped to the two files that guard automated actions:
| File | Why gated |
|---|---|
| `src/auto_apply.py` | Trust-bar gating: controls which repos receive automated writes |
| `src/scorer.py` | Scoring/tier logic: drives completeness tiers and portfolio grades |

Kill rate ≥ 85% per combined run (killed ÷ (killed + survived), timeouts excluded).
make release-gate

Or manually:
rm -rf .mutmut-cache mutants/
python3.13 -m mutmut run

Query results directly (the mutmut results command crashes on Python 3.13):
python3.13 -c "
import sqlite3
conn = sqlite3.connect('.mutmut-cache')
rows = conn.execute('SELECT status, count(*) FROM Mutant GROUP BY status').fetchall()
for r in rows: print(r)
killed = next((r[1] for r in rows if r[0] == 'ok_killed'), 0)
survived = next((r[1] for r in rows if r[0] == 'bad_survived'), 0)
total = killed + survived
print(f'Kill rate: {killed / total:.1%}' if total else 'Kill rate: n/a')
"

mutmut 2.x is incompatible with Python 3.14 (pony ORM deepcopy crash). Use Python 3.13:
python3.13 -m pip install 'mutmut>=2.0,<3.0'
python3.13 -m pip install -e ".[dev,config]"

mutmut 3.x is incompatible with this project's `src.` layout (rejects module names starting with `src.`). The locked version constraint in pyproject.toml (mutmut>=2.5 under [tool.mutmut]) documents this.
[tool.mutmut] in pyproject.toml:
[tool.mutmut]
paths_to_mutate = "src/auto_apply.py,src/scorer.py"
runner = "python3.13 -m pytest -q -p no:cacheprovider -x tests/test_auto_apply.py tests/test_scorer.py"
tests_dir = "tests/"

The following survivors are confirmed equivalent mutants that behavioral tests cannot distinguish:
src/auto_apply.py
| ID | Line | Pattern | Why equivalent |
|---|---|---|---|
| 27, 28 | 48 | Second `or "elevated"` in `risk_tier` | `str()` always returns a string; the outer `or` fallback is unreachable |
| 43, 44 | 64 | Default `""` in `display_name` | Guarded immediately by `if not repo_name: continue` |
| 58 | 71 | `or "XXelevatedXX"` in `summarize_trust_bar` | Same unreachable-fallback pattern |
| 75, 80 | 92–93 | `or "XXXX"` in `get_approved_manual_campaigns` | Mutated default never equals the string being compared |
| 106 | 132 | `or "XXXX"` in `filter_trusted_repo_actions` | Same pattern |

src/scorer.py
| ID | Line | Pattern | Why equivalent |
|---|---|---|---|
| 168 | 66 | `security_offline: bool = False` | Parameter default; never mutated at call sites under test |
| 173 | 75 | `+ FORK_ACTIVITY_WEIGHT` vs `-` | With uniform scores, redistribution direction doesn't change `overall_score` materially |
| 175 | 76 | `weights["XXactivityXX"]` | The `activity` weight is read back in the weighted sum; the `XXactivityXX` key is ignored |
| 178 | 77 | `k != "XXactivityXX"` | `activity` is always in `weights`; excluding a nonexistent key is a no-op |
| 183, 184 | 80 | `* (w/other_total)` vs `/ (w/other_total)` | With uniform scores, proportional redistribution gives the same weighted average |
| 225, 226 | 112 | `tier = "XXabandonedXX"` / `None` | Loop always overwrites (`COMPLETENESS_TIERS` ends with threshold 0.0) |
| 227 | 114 | `>= threshold` → `> threshold` | Floating-point prevents exact equality at tier boundaries in practice |
| 230, 231 | 119 | `interest_tier = "XXmundaneXX"` / `None` | Loop always overwrites (`INTEREST_TIERS` ends with threshold 0.0) |
| 236, 241, 246 | 126–130 | Default 2.0 vs 1.0 for missing dims | `== 0.0` check: neither 1.0 nor 2.0 equals 0.0 |
| 251 | 136 | `>= 0.5` vs `> 0.5` | A score of exactly 0.5 yields the "functional" tier anyway (not "shipped"), so the cap doesn't fire |
| 304 | 213 | `>= 0.3` vs `> 0.3` for mid-tier boundary | An exact 0.3 `shipped_ratio` is rare in test scenarios |

| File | Mutants | Killed | Survived | Kill Rate |
|---|---|---|---|---|
| src/auto_apply.py | ~155 | ~146 | ~9 | ~94% |
| src/scorer.py | ~200 | ~182 | ~16 | ~92% |
| Combined | 354 | 328 | 25 | 92.9% |
(1 timeout excluded from denominator; 1 suspicious counted as killed)
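
The gate arithmetic can be sketched in a few lines of Python. This is an illustration, not project code: the helper name `kill_rate` is hypothetical, the status keys are assumed to match the mutmut 2.x cache values queried earlier, and the 327/1 split between killed and suspicious mutants is an assumption chosen to match the combined totals above.

```python
# Minimal sketch of the kill-rate gate. Status names are assumed to match
# the mutmut 2.x cache values ('ok_killed', 'bad_survived', ...).
THRESHOLD = 0.85  # gate: kill rate must be at least 85%


def kill_rate(counts: dict[str, int]) -> float:
    """Return killed / (killed + survived); timeouts are left out entirely."""
    # Suspicious mutants are counted as killed, as in the summary note.
    killed = counts.get("ok_killed", 0) + counts.get("ok_suspicious", 0)
    survived = counts.get("bad_survived", 0)
    denominator = killed + survived
    if denominator == 0:
        raise ValueError("no killed or survived mutants to score")
    return killed / denominator


# Illustrative counts consistent with the combined row (354 mutants in total).
counts = {"ok_killed": 327, "ok_suspicious": 1, "bad_survived": 25, "bad_timeout": 1}
rate = kill_rate(counts)
print(f"Kill rate: {rate:.1%}, gate {'passed' if rate >= THRESHOLD else 'failed'}")
# prints "Kill rate: 92.9%, gate passed"
```

Because the timeout never enters the denominator, the rate is 328 ÷ 353 rather than 328 ÷ 354.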
Run before any public release tag. Requires the [build] extra:
pip install -e '.[build]' # installs shiv, build, twine

make build # python -m build → dist/*.whl + dist/*.tar.gz
make dist-check # python -m twine check dist/* (must be clean)
make shiv # builds dist/audit.pyz via shiv

Verify the shiv binary boots:

./dist/audit.pyz --help

Expected: help text printed, exit 0. Any import error or missing-extra warning is a blocking failure.
All three must pass before tagging:
- `make build` exits 0 with a `.whl` and `.tar.gz` present in `dist/`.
- `python -m twine check dist/*` reports no errors or warnings.
- `./dist/audit.pyz --help` exits 0 and prints the CLI help text.

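
The boot check can also be scripted. The sketch below assumes only that a passing run exits 0 and prints something to stdout; the helper name `boots_cleanly` and the 60-second timeout are hypothetical and not part of the project.

```python
import subprocess


def boots_cleanly(cmd: list[str], timeout: float = 60.0) -> bool:
    """Run a command and report whether it exited 0 and printed to stdout."""
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    # A non-zero exit code (for example from an import error) fails the check,
    # as does an empty stdout, since help text is expected.
    return result.returncode == 0 and bool(result.stdout.strip())


# Example call against the artifact path used in this document:
#   boots_cleanly(["./dist/audit.pyz", "--help"])
```

This only covers exit status and the presence of output; missing-extra warnings on stderr would still need a manual look or an additional check.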
- The GitHub Actions `release.yml` workflow runs these same steps on every PEP 440-compatible `v*` tag and uploads all three artifacts to the GitHub Release.
- Use tags like `v0.1.0` or `v0.1.1`. Avoid suffix tags such as `v0.1.0-public-baseline`; package version derivation comes from `setuptools-scm` and non-PEP 440 tag suffixes can break the release build.
- Public hardening releases should use patch versions (`v0.1.x`). Feature releases should move the minor version (`v0.2.0`, `v0.3.0`, and so on).
- PyPI publishing is active through the manual `Publish to PyPI` workflow. Keep GitHub Releases and PyPI on the same tag for each public release.
- `scripts/release.sh` builds and checks artifacts by default. `scripts/release.sh --publish-pypi` remains an explicit local fallback only; prefer Trusted Publishing over token-based local uploads.
- The manual `Publish to PyPI` workflow builds the release tag in one job and publishes from a separate protected `pypi` environment job with `id-token: write`.
- The `[serve]` extra is not bundled in the shiv binary by default. Users who need the web UI should install `github-repo-auditor[serve]` from PyPI or use a local editable clone.

See distribution.md for the public distribution policy and release checklist.
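
A tag name can be sanity-checked before pushing. The pattern below is a deliberately simplified approximation of PEP 440 (release segment plus optional pre, post, and dev parts), not the full grammar, and `is_release_tag` is a hypothetical helper rather than something the project ships.

```python
import re

# Simplified pattern: "v" + N(.N)* with optional aN/bN/rcN, .postN, .devN.
# This is stricter than the full PEP 440 normalization rules.
TAG_RE = re.compile(r"v\d+(\.\d+)*((a|b|rc)\d+)?(\.post\d+)?(\.dev\d+)?")


def is_release_tag(tag: str) -> bool:
    """Return True if the tag looks like a PEP 440-style version with a v prefix."""
    return TAG_RE.fullmatch(tag) is not None


for tag in ("v0.1.0", "v0.2.0rc1", "v0.1.0-public-baseline"):
    print(tag, is_release_tag(tag))
```

The suffix tag from the notes above is rejected because `-public-baseline` matches none of the optional segments.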
Run when any change touches src/serve/ or tests/test_serve.py.
python3 -m pytest tests/test_serve.py -q -p no:cacheprovider

The test file covers:
- Route smoke tests: all 5 routes (`/`, `/repos/{name}`, `/runs`, `/approvals`, `/runs/new`) return 200 or the expected status code.
- 404 for an unknown repo name at `/repos/{name}`.
- 422 / rejection for disallowed flags in `POST /runs/new`.
- Shell-metacharacter injection strings rejected by `validate_flags`.
- SSE happy-path: `/runs/new/stream/{run_id}` yields output lines.
- Runner unit tests: `spawn_run` and `validate_flags` behave correctly.
- CLI flag wiring: `audit serve --port` and `--host` propagate to `run_serve`.

All tests in tests/test_serve.py must pass. Any injection-rejection test failure is a
blocking issue — do not ship a serve release with a failing injection test.
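
For orientation, the kind of check the injection-rejection tests exercise might look like the sketch below. It is not the project's `validate_flags`: the function name `validate_flags_sketch`, the `ALLOWED_FLAGS` set, and the exact metacharacter list are all assumptions made for illustration.

```python
# Hypothetical allowlist; the real set of permitted flags lives in the project.
ALLOWED_FLAGS = {"--dry-run", "--verbose"}

# Characters a shell could interpret; assumed list, not the project's.
SHELL_META = set(";&|`$<>(){}\\\n'\"")


def validate_flags_sketch(flags: list[str]) -> list[str]:
    """Return the flags unchanged, or raise ValueError on anything suspicious."""
    for flag in flags:
        # Reject metacharacters first so injection strings never reach the allowlist.
        if any(ch in SHELL_META for ch in flag):
            raise ValueError(f"shell metacharacter in flag: {flag!r}")
        # Compare only the flag name, so "--flag=value" forms are handled.
        name = flag.split("=", 1)[0]
        if name not in ALLOWED_FLAGS:
            raise ValueError(f"flag not allowed: {name!r}")
    return flags
```

An allowlist combined with metacharacter rejection fails closed: an unknown or malformed flag is refused rather than passed to the runner.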