Add adaptive TP/FP/FN validation mosaic export #2271
Codecov Report
❌ Patch coverage is …
❌ Your project check has failed because the head coverage (82%) is below the target coverage (95%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

Coverage Diff

| | develop | #2271 | +/- |
|---|---|---|---|
| Coverage | 82% | 82% | |
| Files | 68 | 68 | |
| Lines | 9369 | 9507 | +138 |
| Hits | 7677 | 7811 | +134 |
| Misses | 1692 | 1696 | +4 |
Pull request overview
This PR adds an optional qualitative visualization export to ConfusionMatrix.benchmark(...), writing per-image 2x2 mosaics (GT / TP / FP / FN) to disk to aid error analysis during benchmarking.
Changes:
- Added `save_result_images` and `save_directory_path` options to `ConfusionMatrix.benchmark(...)` to export validation mosaics under `result/`.
- Implemented per-image panel rendering with class-consistent box coloring, labels, and grid styling.
- Added a regression test for image export and updated benchmarking documentation to mention the feature.
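
The GT / TP / FP / FN split that the panels rely on can be illustrated with a minimal sketch. This is not the PR's implementation: `box_iou` and `split_by_outcome` are hypothetical names, boxes are assumed to be axis-aligned `xyxy`, predictions are assumed to be pre-sorted by descending confidence, and matching is a simple greedy same-class pass at a fixed IoU threshold.

```python
import numpy as np


def box_iou(boxes_a: np.ndarray, boxes_b: np.ndarray) -> np.ndarray:
    """Pairwise IoU between (N, 4) and (M, 4) arrays of xyxy boxes."""
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    top_left = np.maximum(boxes_a[:, None, :2], boxes_b[None, :, :2])
    bottom_right = np.minimum(boxes_a[:, None, 2:], boxes_b[None, :, 2:])
    wh = np.clip(bottom_right - top_left, 0, None)
    inter = wh[..., 0] * wh[..., 1]
    union = area_a[:, None] + area_b[None, :] - inter
    # Guard against zero-area unions to avoid division by zero.
    return np.where(union > 0, inter / np.maximum(union, 1e-12), 0.0)


def split_by_outcome(pred_boxes, pred_classes, gt_boxes, gt_classes,
                     iou_threshold=0.5):
    """Return (tp_pred_idx, fp_pred_idx, fn_gt_idx) via greedy matching.

    Hypothetical helper: predictions are visited in the order given, and
    each ground-truth box can be matched at most once.
    """
    pred_boxes = np.asarray(pred_boxes, dtype=float).reshape(-1, 4)
    gt_boxes = np.asarray(gt_boxes, dtype=float).reshape(-1, 4)
    pred_classes = np.asarray(pred_classes).reshape(-1)
    gt_classes = np.asarray(gt_classes).reshape(-1)

    iou = box_iou(pred_boxes, gt_boxes)
    # Cross-class pairs can never match, so zero them out.
    iou = np.where(pred_classes[:, None] == gt_classes[None, :], iou, 0.0)

    matched_gt = set()
    tp, fp = [], []
    for p in range(len(pred_boxes)):
        best_gt, best_iou = -1, iou_threshold
        for g in range(len(gt_boxes)):
            if g not in matched_gt and iou[p, g] >= best_iou:
                best_gt, best_iou = g, iou[p, g]
        if best_gt >= 0:
            matched_gt.add(best_gt)
            tp.append(p)
        else:
            fp.append(p)
    # Ground-truth boxes that no prediction claimed are false negatives.
    fn = [g for g in range(len(gt_boxes)) if g not in matched_gt]
    return tp, fp, fn
```

Each returned index list would then select the boxes drawn in the corresponding panel. Oriented boxes would need a different IoU function than the axis-aligned one assumed here.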
Assessment (n/5):
- Code quality: 3/5
- Tests: 2/5
- Docs: 4/5
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| `src/supervision/metrics/detection.py` | Adds the visualization export pipeline and new `benchmark(...)` parameters to save per-image GT/TP/FP/FN mosaics. |
| `tests/metrics/test_detection.py` | Adds a regression test that exercises `save_result_images=True` and checks a saved mosaic is created/readable. |
| `docs/how_to/benchmark_a_model.md` | Documents the new `save_result_images` option and what gets written to `result/`. |
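
For readers unfamiliar with the 2x2 mosaic layout, here is a minimal sketch of how four equally sized panels could be tiled into one image with NumPy. It is illustrative only: `make_mosaic` is a hypothetical name, the panel order (GT top-left, TP top-right, FP bottom-left, FN bottom-right) and the white gap between panels are assumptions, and the PR's actual rendering also draws boxes, labels, and titles.

```python
import numpy as np


def make_mosaic(gt: np.ndarray, tp: np.ndarray, fp: np.ndarray,
                fn: np.ndarray, gap: int = 4, fill: int = 255) -> np.ndarray:
    """Tile four (H, W, C) panels into a 2x2 grid separated by `gap` pixels."""
    panels = [gt, tp, fp, fn]
    height, width, channels = panels[0].shape
    if any(panel.shape != (height, width, channels) for panel in panels):
        raise ValueError("all panels must share the same shape")

    # Canvas pre-filled with the gap colour; panels are pasted on top.
    canvas = np.full(
        (2 * height + gap, 2 * width + gap, channels),
        fill,
        dtype=panels[0].dtype,
    )
    for index, panel in enumerate(panels):
        row, col = divmod(index, 2)  # 0 -> (0, 0), 1 -> (0, 1), 2 -> (1, 0), 3 -> (1, 1)
        y = row * (height + gap)
        x = col * (width + gap)
        canvas[y:y + height, x:x + width] = panel
    return canvas
```

The resulting array can be written to disk with any image library, for example `cv2.imwrite(path, mosaic)`.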
Hi @Borda, I have resolved all the issues raised by Copilot in the latest commit. Could you please review it? I'm happy to address any feedback or make further improvements if needed.

Addressed the Copilot review comments and updated the implementation/tests accordingly. CI is now passing. Please let me know if there is anything else that should be adjusted.
- remove top-level cv2/annotator imports; lazy-load inside rendering functions
- remove `save_result_images` bool; `save_directory_path` is now keyword-only after `metric_target`
- drop hardcoded `result/` subdirectory from benchmark output path
- propagate `metric_target` into `_split_detections_by_outcome` for correct OBB IoU dispatch
- add filename collision `UserWarning` in benchmark loop
- remove dead/unreachable combined None-check in `_split_detections_by_outcome`
- add Google-style docstrings to all 5 new private visualization functions
- add `TestSplitDetectionsByOutcome` covering 7 edge cases (empty inputs, cross-class, confidence-None)
- fix FP/FN pixel assertions to check interior box pixels rather than border/title regions
- fix `benchmark_a_model.md`: full panel names, add Visual Benchmarking section, update API examples

---
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
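
Two of the commit's changes, the keyword-only `save_directory_path` and the filename collision `UserWarning`, can be sketched in isolation. This is a hedged illustration, not the code in `detection.py`: `plan_mosaic_paths` is a hypothetical helper, and naming each output after the image stem with a `.png` suffix is an assumption made for the example.

```python
import warnings
from pathlib import Path
from typing import Dict, Iterable, Optional


def plan_mosaic_paths(
    image_names: Iterable[str],
    *,  # everything after this marker must be passed by keyword
    save_directory_path: Optional[str] = None,
) -> Dict[str, Path]:
    """Map each image name to an output path, warning on collisions.

    Returns an empty mapping when no directory is given, so export stays
    opt-in and existing behavior is untouched.
    """
    if save_directory_path is None:
        return {}

    output_directory = Path(save_directory_path)
    planned: Dict[str, Path] = {}
    seen = set()
    for name in image_names:
        # Assumed naming scheme: image stem plus ".png".
        target = output_directory / f"{Path(name).stem}.png"
        if target in seen:
            warnings.warn(
                f"Output file {target} would be written more than once; "
                "a later mosaic overwrites an earlier one.",
                UserWarning,
                stacklevel=2,
            )
        seen.add(target)
        planned[name] = target
    return planned
```

Making the parameter keyword-only means a call such as `plan_mosaic_paths(names, "out")` raises `TypeError`, which keeps existing positional call sites from silently changing meaning.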

Before submitting
Description
Adds an optional validation visualization flow for exporting per-image GT/TP/FP/FN mosaics during confusion-matrix benchmarking.
Generated mosaics improve qualitative error analysis by providing a side-by-side comparison of ground truth (GT), true positives (TP), false positives (FP), and false negatives (FN).
Type of Change
Motivation and Context
Current benchmarking utilities provide aggregate metrics but limited per-image visual inspection support.
This feature adds optional qualitative visualization exports to simplify per-image visual inspection without affecting existing benchmark behavior.
Changes Made
- `save_result_images` and `save_directory_path` support in `ConfusionMatrix.benchmark(...)`
- `result/` directory
- `test_detection.py`

Testing
Additional Notes
Existing benchmark behavior remains unchanged unless `save_result_images=True` is enabled.