WIP feat(coverage): Add coverage.py code coverage collection #87
sohil-kshirsagar wants to merge 13 commits into main
When TUSK_COVERAGE_PORT env var is set, the SDK starts a tiny HTTP server that manages coverage.py. On each /snapshot request:
- Stop coverage, get data, erase (reset), restart
- Returns per-file line counts with clean per-test data
- No diffing needed (coverage.py supports stop/erase/start cycle)
Works with Flask, FastAPI, Django, gunicorn, uvicorn - any framework, because the SDK runs inside the app process.
Requires: pip install coverage (or pip install tusk-drift[coverage])
When /snapshot?baseline=true is called, uses coverage.analysis2() to get ALL coverable statements (including uncovered) for the denominator. Regular /snapshot calls only return executed lines (for per-test data).
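To show how the baseline and per-test snapshots fit together, here is a minimal sketch. The dict shapes and the name `line_coverage_percent` are assumptions for illustration, not the SDK's actual response format.

```python
def line_coverage_percent(baseline: dict[str, list[int]],
                          executed: dict[str, list[int]]) -> float:
    """Percent of coverable lines (from the baseline) that were executed."""
    total = sum(len(lines) for lines in baseline.values())
    if total == 0:
        return 0.0
    hit = sum(
        len(set(lines) & set(executed.get(path, [])))
        for path, lines in baseline.items()
    )
    return 100.0 * hit / total


# Hypothetical data: the baseline lists every coverable statement,
# while the per-test snapshot lists only the lines that actually ran.
baseline = {"app.py": [1, 2, 3, 4], "util.py": [1, 2]}
per_test = {"app.py": [1, 2, 3]}
percent = line_coverage_percent(baseline, per_test)
```

Without the baseline, files that no test touches (util.py here) would drop out of the denominator and inflate the percentage.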
- Threading lock protects stop/get_data/erase/start sequence
- stop_coverage_server() for clean shutdown, integrated into SDK shutdown()
- Module-level server reference for proper cleanup
- Enable branch=True in coverage.py initialization
- Extract branch data via cov._analyze(filename) API:
  - numbers.n_branches, n_missing_branches for totals
  - missing_branch_arcs() for per-line branch detail
- Return branch data in /snapshot response alongside line coverage
- Python shows accurate branch coverage (93.3% for demo app)
betterproto treats messages with all default values as falsy. CoverageSnapshotRequest(baseline=False) was falsy, causing per-test snapshots to be skipped. Changed 'if not request' to 'if request is None'. Also separated coverage initialization from HTTP server so coverage.py starts via start_coverage_collection() for the protobuf channel. Extracted take_coverage_snapshot() as reusable function.
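The falsy-message pitfall described above can be reproduced without betterproto. In this sketch `FakeSnapshotRequest` is a hypothetical stand-in whose `__bool__` mimics the described behavior. It is not the generated CoverageSnapshotRequest class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeSnapshotRequest:
    """Stand-in for a betterproto message; not the real generated class."""
    baseline: bool = False

    def __bool__(self) -> bool:
        # Mimics the behavior described above: all-default messages are falsy.
        return self.baseline


def handled_buggy(request: Optional[FakeSnapshotRequest]) -> bool:
    if not request:  # also skips a present-but-default message
        return False
    return True


def handled_fixed(request: Optional[FakeSnapshotRequest]) -> bool:
    if request is None:  # only skips a genuinely absent message
        return False
    return True


per_test = FakeSnapshotRequest(baseline=False)
results = (handled_buggy(per_test), handled_fixed(per_test), handled_fixed(None))
```

The buggy check drops the per-test request even though it was sent. The identity check handles it and still rejects a missing message.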
- Remove HTTP server code (CoverageSnapshotHandler, start_coverage_server, _coverage_server global, HTTPServer import)
- Replace with clean module-level state (_cov_instance, _source_root, _lock)
- Extract _is_user_file() helper
- stop_coverage_server() -> stop_coverage_collection()
- Update module docstring to reflect protobuf-only architecture
Add _lock protection to stop_coverage_collection() to prevent race condition where shutdown sets _cov_instance=None while a snapshot is in progress on the background reader thread.
Add docs/coverage.md explaining coverage.py integration, branch coverage via arc tracking, thread safety, and limitations. Update environment-variables.md with coverage env vars section.
- Use getattr() for betterproto oneof field access (prevents AttributeError)
- Fix _is_user_file path prefix collision (/app matching /application)
- Add os.path.realpath() for symlink-safe path comparison
- Add thread lock to start_coverage_collection()
- Add double-init guard (stop existing instance before creating new)
- Narrow */test* omit pattern to */tests/* and */test_*.py
- Log failed file analysis at debug level instead of silent swallow
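The path prefix collision and its fix can be sketched as follows. The function name `is_user_file` and its signature are assumptions, and the real `_is_user_file` helper may differ.

```python
import os


def is_user_file(filename: str, source_root: str) -> bool:
    """True if filename lives under source_root, after resolving symlinks."""
    root = os.path.realpath(source_root)
    path = os.path.realpath(filename)
    # The trailing separator makes "/app/" a prefix of "/app/x.py"
    # but not of "/application/x.py".
    return path.startswith(root.rstrip(os.sep) + os.sep)


# A plain prefix check treats a sibling directory as part of the source root.
naive = "/application/main.py".startswith("/app")

inside = is_user_file("/app/main.py", "/app")
sibling = is_user_file("/application/main.py", "/app")
```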
    # Start coverage collection early (before any SDK mode checks that might return early).
    # Coverage data is accessed via protobuf channel (communicator handles requests).
    from .coverage_server import start_coverage_collection
    start_coverage_collection()
Coverage silently restarts and loses data on re-initialization
Medium Severity
start_coverage_collection() is called at line 166, before the cls._initialized early-return guard at line 187. When initialize() is called a second time, coverage.py is stopped and a fresh instance is created (the double-init guard restarts rather than skipping), silently erasing all accumulated coverage data. Then the _initialized check returns early, making the restart entirely unnecessary. Moving the call after the _initialized check, or changing the double-init guard to return early when already running, would prevent the data loss.
Reviewed by Cursor Bugbot for commit 2a383b0.
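One of the two options raised in the review, returning early when collection is already running, might look roughly like this. `FakeCoverage` and `start_collection` are hypothetical stand-ins, and `lines_seen` only simulates accumulated data.

```python
import threading
from typing import Optional


class FakeCoverage:
    """Hypothetical stand-in for a coverage.Coverage-like object."""

    def __init__(self) -> None:
        self.lines_seen = 0

    def start(self) -> None:
        pass


_lock = threading.Lock()
_cov_instance: Optional[FakeCoverage] = None


def start_collection() -> FakeCoverage:
    """Idempotent start: a second call keeps the running instance and its data."""
    global _cov_instance
    with _lock:
        if _cov_instance is not None:
            return _cov_instance  # already running, so do not restart
        _cov_instance = FakeCoverage()
        _cov_instance.start()
        return _cov_instance


first = start_collection()
first.lines_seen = 42          # pretend some coverage has accumulated
second = start_collection()    # e.g. initialize() is called a second time
```

With this guard the second call is a no-op, so accumulated data survives a repeated initialize().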
- Move _cov_instance None check inside lock (TOCTOU race fix)
- Fix branch counting to only include actual branch points, not all arcs
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit 4d156c4.
    _cov_instance.erase()
    _cov_instance.start()

    return coverage
Missing try/finally leaves coverage permanently stopped on error
Medium Severity
In take_coverage_snapshot, _cov_instance.stop() is called at the top, but _cov_instance.erase() and _cov_instance.start() at the bottom are not protected by a try/finally. If any exception occurs during processing (e.g., get_data(), measured_files(), or data.lines() in the non-baseline path which has no per-file try/except), coverage is left permanently stopped. All subsequent snapshot requests will operate on a stopped instance that never restarts, producing empty or broken coverage data for the remainder of the session.
Reviewed by Cursor Bugbot for commit 4d156c4.
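A minimal sketch of the try/finally shape the review asks for. `FakeCoverage` is a hypothetical stand-in: its `get_data()` returns a plain dict, whereas coverage.py returns a data object. `take_snapshot` is not the SDK's actual `take_coverage_snapshot`.

```python
class FakeCoverage:
    """Hypothetical stand-in with the stop/get_data/erase/start surface."""

    def __init__(self, fail: bool = False) -> None:
        self.running = True
        self.fail = fail

    def stop(self) -> None:
        self.running = False

    def start(self) -> None:
        self.running = True

    def erase(self) -> None:
        pass

    def get_data(self) -> dict:
        if self.fail:
            raise RuntimeError("simulated failure while reading data")
        return {"app.py": [1, 2, 3]}


def take_snapshot(cov) -> dict:
    cov.stop()
    try:
        return dict(cov.get_data())
    finally:
        # Runs on success and on error, so collection always resumes.
        cov.erase()
        cov.start()


ok = FakeCoverage()
snapshot = take_snapshot(ok)

broken = FakeCoverage(fail=True)
try:
    take_snapshot(broken)
except RuntimeError:
    pass
restarted_after_error = broken.running
```

The exception still reaches the caller, but the instance is running again afterwards, so later snapshots are not taken against a stopped collector.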


Add code coverage collection using coverage.py to the Python SDK. When the CLI enables coverage, the SDK manages coverage.py programmatically and returns per-file line/branch coverage via protobuf.
How it works
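- Coverage is enabled via the TUSK_COVERAGE=true env var
- coverage.py is started with branch=True during initialization
- A CoverageSnapshotRequest arrives via the protobuf channel
- The SDK runs a stop() → get_data()/analysis2() → erase() → start() cycle
- The SDK returns a CoverageSnapshotResponse with per-file data

Key design decisions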
- _analyze(): Uses coverage.py's private _analyze() API for arc-based branch tracking. The public API only provides aggregate branch counts. Documented as a known limitation.
- threading.Lock(): Guards the snapshot and start/stop sequences. Safe for concurrent protobuf handler access.

Files changed
- drift/core/coverage_server.py: Coverage lifecycle management (new file)
- drift/core/communication/communicator.py: Coverage snapshot handler + betterproto fixes
- drift/core/communication/types.py: Import new proto types
- drift/drift_sdk.py: Hook coverage start/stop into SDK lifecycle
- docs/coverage.md: SDK coverage internals documentation
- docs/environment-variables.md: Coverage env vars section

Also includes (non-coverage)
- getattr() fix: SetTimeTravelRequest handler used 'if not request:' which fails for messages with all-default values (betterproto falsy bug). Fixed with getattr(cli_message, "field", None). Applied to both time travel and coverage handlers.

TODOs before merge
- tusk-drift-schemas dependency once schemas are published
- _analyze() API stability

Edge cases / gotchas
- _is_user_file uses os.sep trailing separator to prevent /app matching /application
- os.path.realpath() resolves symlinks for consistent path comparison
- Calling start_coverage_collection() twice stops the existing instance first
- */test* omit was too broad (matched testimony.py), narrowed to */tests/* and */test_*.py
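The omit-pattern gotcha can be illustrated with the standard library's fnmatch as a stand-in. This is an approximation only: coverage.py uses its own pattern matcher, whose rules may differ from fnmatch, and the paths below are hypothetical.

```python
from fnmatch import fnmatchcase

paths = ["/app/testimony.py", "/app/tests/test_api.py", "/app/test_utils.py"]

# Old pattern: anything whose final component starts with "test" is omitted,
# including ordinary application modules such as testimony.py.
old = {p: fnmatchcase(p, "*/test*") for p in paths}

# Narrowed patterns: only files under tests/ or named test_*.py are omitted.
narrowed = ("*/tests/*", "*/test_*.py")
new = {p: any(fnmatchcase(p, pat) for pat in narrowed) for p in paths}
```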