fix(ci): run full non-e2e suite, drop -m v2 blind spot (#669)#697
Conversation
The backend CI gate ran `pytest -m v2`, but the root conftest never auto-marked v2, so ~514 legit tests (40% of files) were silently deselected and never gated a PR. The v1->v2 refactor that the filter existed for is done. Categorized the 514 deselected tests: 497 pass, 16 skip, 1 is a real-LLM `lifecycle` test (correctly excluded). No dead tests remain (the v1 /api/projects pair was removed in #667). Changes: - CI: `-m v2` -> `-m "not lifecycle"` so every non-e2e test gates PRs; only real-LLM lifecycle tests (run locally via scripts/lifecycle) are excluded. Timeout 15->20m for the ~500 extra tests under coverage. - pytest.ini: document the new two-tier gate; v2 marker kept for back-compat but no longer gates. - tests/auth/test_query_param_token.py: sign JWTs with the LIVE manager.SECRET instead of the import-time binding. Un-hiding the suite surfaced this order-dependent flake: any test that starts the app via TestClient refreshes SECRET from .env (lifespan), which made import-bound tokens unverifiable. Full non-e2e run: 3830 passed, 11 skipped, 6 deselected (lifecycle), 0 failed.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
WalkthroughThe CI ChangesCI Test Marker Switch and Auth Token Fix
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Code Review — fix(ci): run full non-e2e suite, drop -m v2 blind spotOverviewThis PR fixes a significant CI blind spot: the What is good
Minor observations1. Sentinel default in _make_token The type annotation says 2. 126 files still carry the v2 marker They will continue to run (they were always gated). But now that the marker is no longer a CI gate, 3. Watch the first few CI runs post-merge The PR reports 497 pass for the newly un-hidden tests. Worth watching whether any have latent fixture or env-var sensitivities in CI that do not show locally. The PR description implies this was verified, but the first live CI run will confirm. VerdictThis is a correctness fix to the CI gate with clear pre-merge verification. The code changes are minimal and tightly scoped. The type annotation nit is optional cleanup. Approve with minor notes. Nothing here should block merge. |
Follow-up — CLAUDE.md documentation updatePicking up from my prior review (which approved with minor notes). The latest commit updates
This is exactly the right kind of update to make alongside a CI gate change. The developer guide now matches what CI actually runs, so contributors won't be confused about what to verify before opening a PR. My prior approval stands. Nothing in this update changes the analysis — the CLAUDE.md delta is correct and consistent with the rest of the PR. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@CLAUDE.md`:
- Line 156: Update line 156 in CLAUDE.md to align the documented CI gate command
with the configuration in pyproject.toml. Change the command from `uv run pytest
--ignore=tests/e2e -m "not lifecycle"` to `uv run pytest tests/
--ignore=tests/e2e -m "not lifecycle"` by adding the explicit `tests/` directory
argument to match the exact command specified in pyproject.toml for consistency
and clarity.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
Follow-up Review — No new concernsConsistent with my prior two reviews on this PR. The diff now shows all four files changed:
The one minor nit from my first review still stands: Nothing new to flag. This is ready to merge. |
Closes #669.
Problem
The backend CI gate ran
pytest tests/ --ignore=tests/e2e -m v2, but the roottests/conftest.pynever auto-marksv2. So-m v2silently deselected ~514tests (~40% of files) — they never gated a PR. The
v1→v2refactor the filterexisted for is finished.
What I did
Categorized the 514 deselected tests (
-m "not v2"): 497 pass, 16 skip, 1fails — and that 1 is a real-LLM
lifecycletest, excluded from CI by design.=> No dead tests remain (the v1
/api/projectspair was already removed in#667). The blind spot was ~500 legit tests that simply never ran in CI.
Changes
.github/workflows/test.yml):-m v2→-m "not lifecycle". Everynon-e2e test now gates PRs; only real-LLM
lifecycletests are excluded (theyneed a key + cost money — run locally via
scripts/lifecyclebefore the PR).timeout-minutes15 → 20 for the ~500 extra tests under coverage.pytest.ini: document the two-tier gate. Thev2marker is kept forback-compat but no longer gates — anything non-e2e/non-lifecycle runs by default,
so the suite can't silently shrink again.
tests/auth/test_query_param_token.py: sign JWTs with the livemanager.SECRETinstead of the import-time binding.Two-tier test strategy (explicit going forward)
-m "not lifecycle"— fast, no API key, environment-correctness.scripts/lifecycle --mode cli|api|web|all— the real-LLMend-to-end tests. Not skipped, just a separate tier.
Why the test change was needed
Un-hiding the suite surfaced a real order-dependent flake.
test_query_param_tokenbound
SECRETat import and signed tokens with it, butget_current_userverifiesagainst the live
manager.SECRET. Any test that starts the app viaTestClientrefreshes the secret from
.env(lifespan →refresh_secret), making import-boundtokens unverifiable → 3 happy-path tests 401. CI's backend job has no
.env, so itwas a local-only landmine, but it's now fixed and deterministic.
Verification
-m "not lifecycle"): 3830 passed, 11 skipped, 6 deselected(lifecycle only), 0 failed in 8m06s (was 3333 gated / 514 blind).
adapters + health + query_param): 3 failed → 44 passed after the fix.ruff checkclean.Acceptance criteria
not lifecycle).Known limitations
lifecycletests still don't run in CI by design (real LLM). They're covered bythe local
scripts/lifecycletier, not this gate.Summary by CodeRabbit
lifecycletests.lifecycleCI selection behavior.