Skip to content

Commit 0692b15

Browse files
raullenchaiclaude
andcommitted
test(pr_validate): pin _extract_findings → _split_findings_by_tier pipeline
DeepSeek round 3 surfaced a real integration gap: the tier splitter and the numbered-list extractor are tested in isolation, but the contract between them (extractor strips leading "1. ", splitter sees "[BLOCKING]" at start) was never asserted end-to-end. If someone later refactors `_extract_findings` to preserve the leading list number, every finding would default to BLOCKING and tiering would silently break. Added a single test that feeds a realistic 4-finding model review (mixed 2 BLOCKING + 2 NIT) through both helpers and asserts the split is 2/2, not 4/0. Test fails loudly with a diagnostic if the contract regresses. This was DeepSeek's only legitimate concern across both round-3 findings (the other was the same point about test coverage — addressed by this same test). Convergence achieved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent fa4f3ce commit 0692b15

1 file changed

Lines changed: 35 additions & 1 deletion

File tree

tests/test_pr_validate_google_review.py

Lines changed: 35 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -27,7 +27,10 @@
2727
from scripts.pr_validate.steps.cl_description_quality import (
2828
CLDescriptionQualityStep,
2929
)
30-
from scripts.pr_validate.steps.deepseek_review import _split_findings_by_tier
30+
from scripts.pr_validate.steps.deepseek_review import (
31+
_extract_findings,
32+
_split_findings_by_tier,
33+
)
3134

3235
# ---------------------------------------------------------------------------
3336
# _split_findings_by_tier
@@ -117,6 +120,37 @@ def test_prefix_with_extra_whitespace(self):
117120
assert blocking[0].startswith("a.py")
118121
assert nits[0].startswith("b.py")
119122

123+
def test_pipeline_extract_then_split_realistic_model_output(self):
124+
"""End-to-end pipeline test: raw model review text →
125+
``_extract_findings`` → ``_split_findings_by_tier``. The two
126+
helpers must compose correctly — if ``_extract_findings`` ever
127+
starts preserving the leading list number, the tier-prefix
128+
regex would silently fail to match and EVERY finding would
129+
default to BLOCKING (defeating tiering). This test pins the
130+
contract between the two helpers."""
131+
review = (
132+
"1. [BLOCKING] vllm_mlx/routes/chat.py:918 — `assert isinstance(_msg, dict)`\n"
133+
" is stripped under `python -O`, leaving the guard inert in production.\n"
134+
"2. [NIT] tests/test_x.py:42 — assertion is loose; use `result.status_code`.\n"
135+
"3. [BLOCKING] vllm_mlx/engine.py:55 — race condition on shared dict.\n"
136+
"4. [NIT] tests/test_y.py:9 — comment could be clearer.\n"
137+
)
138+
findings = _extract_findings(review)
139+
assert len(findings) == 4, (
140+
f"_extract_findings produced {len(findings)} findings, expected 4"
141+
)
142+
143+
blocking, nits = _split_findings_by_tier(findings)
144+
# The contract: 2 blocking + 2 nits — NOT 4 blocking (which
145+
# would mean leading numbers leaked through and broke tiering).
146+
assert len(blocking) == 2, (
147+
f"got {len(blocking)} blocking — leading list number likely "
148+
f"leaked: {blocking!r}"
149+
)
150+
assert len(nits) == 2, f"got {len(nits)} nits, expected 2: {nits!r}"
151+
assert "chat.py" in blocking[0]
152+
assert "engine.py" in blocking[1]
153+
120154

121155
# ---------------------------------------------------------------------------
122156
# CLDescriptionQualityStep

0 commit comments

Comments
 (0)