Add adaptive TP/FP/FN validation mosaic export by K-saif · Pull Request #2271 · roboflow/supervision

K-saif · 2026-05-25T10:11:12Z

Before submitting

Self-reviewed the code
Updated documentation
Added/updated tests
All tests pass locally

Description

Adds an optional validation visualization flow for exporting per-image GT/TP/FP/FN mosaics during confusion-matrix benchmarking.

Generated mosaics improve qualitative error analysis by providing a side-by-side comparison of:

Ground Truth
True Positives
False Positives
False Negatives
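Below is a rough sketch of how one image's detections could be split into the TP/FP/FN groups shown in the panels. It assumes a simple greedy, class-aware IoU matching at a fixed `iou_threshold`.

The helper names `box_iou` and `split_by_outcome` are hypothetical. The PR's own helper (`_split_detections_by_outcome`) is not shown in this thread. A later commit aligns it with the `ConfusionMatrix` matching, which may differ from this greedy scheme. For example, this sketch counts a cross-class overlap as one FP plus one FN.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def split_by_outcome(
    targets: Sequence[Tuple[Box, int]],
    predictions: Sequence[Tuple[Box, int, float]],
    iou_threshold: float = 0.5,
) -> Tuple[List[int], List[int], List[int]]:
    """Return (tp_prediction_idx, fp_prediction_idx, fn_target_idx).

    Predictions are visited in descending confidence. Each one is matched to
    the still-unmatched, same-class target with the highest IoU that is at
    least `iou_threshold`; unmatched predictions are FP, unmatched targets FN.
    """
    order = sorted(range(len(predictions)), key=lambda i: -predictions[i][2])
    matched_targets = set()
    tp: List[int] = []
    fp: List[int] = []
    for p in order:
        p_box, p_cls, _ = predictions[p]
        best_t, best_iou = -1, iou_threshold
        for t, (t_box, t_cls) in enumerate(targets):
            if t in matched_targets or t_cls != p_cls:
                continue  # each target matches at most once, same class only
            iou = box_iou(p_box, t_box)
            if iou >= best_iou:
                best_t, best_iou = t, iou
        if best_t >= 0:
            matched_targets.add(best_t)
            tp.append(p)
        else:
            fp.append(p)
    fn = [t for t in range(len(targets)) if t not in matched_targets]
    return sorted(tp), sorted(fp), fn


targets = [((0, 0, 10, 10), 0), ((20, 20, 30, 30), 0)]
predictions = [((1, 1, 10, 10), 0, 0.9), ((50, 50, 60, 60), 0, 0.8)]
print(split_by_outcome(targets, predictions))
```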

Type of Change

✨ New feature
📝 Documentation update
🧪 Test update

Motivation and Context

Current benchmarking utilities provide aggregate metrics but offer limited support for per-image visual inspection.

This feature adds optional qualitative visualization exports to simplify:

model debugging
localization error inspection
false positive analysis
false negative analysis

without affecting existing benchmark behavior.

Changes Made

added optional save_result_images and save_directory_path support in ConfusionMatrix.benchmark(...)
added per-image GT/TP/FP/FN mosaic export under a result/ directory
added class-consistent bounding-box coloring across panels
added adaptive label and bounding-box scaling based on image resolution
added white outer borders and center dividers for readability
updated benchmark documentation
added regression coverage in test_detection.py
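Class-consistent coloring and resolution-adaptive styling can be sketched as below. The palette, the 1000 px reference resolution, and the scaling constants are illustrative assumptions, not values taken from the PR.

supervision also ships helpers in this spirit (`ColorPalette.by_idx`, `calculate_optimal_line_thickness`, `calculate_optimal_text_scale`). This thread does not say whether the PR uses them.

```python
from typing import Dict, Tuple

# Hypothetical fixed palette (BGR tuples); any list of distinct colors works.
PALETTE = [(255, 64, 64), (64, 255, 64), (64, 64, 255), (255, 200, 0)]


def class_color(class_id: int) -> Tuple[int, int, int]:
    """Same class id -> same color in every panel of the mosaic."""
    return PALETTE[class_id % len(PALETTE)]


def adaptive_style(width: int, height: int, reference: int = 1000) -> Dict[str, float]:
    """Scale box thickness, label size and padding with the shorter image side.

    `reference` is an assumed base resolution: at 1000 px the scale is 1.0.
    Lower bounds keep annotations legible on small images.
    """
    scale = min(width, height) / reference
    return {
        "line_thickness": max(1, round(2 * scale)),
        "text_scale": max(0.3, round(0.5 * scale, 2)),
        "text_padding": max(2, round(8 * scale)),
    }


print(class_color(5), adaptive_style(1920, 1080))
```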

Testing

I have tested this code locally
I have added unit tests that prove the feature works
All new and existing tests pass

Additional Notes

Existing benchmark behavior remains unchanged unless save_result_images=True is enabled.

CLAassistant · 2026-05-25T10:11:22Z

All committers have signed the CLA.

codecov · 2026-05-26T16:42:25Z

Codecov Report

❌ Patch coverage is 92.80576% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 82%. Comparing base (2aa43bc) to head (a53ad80).

❌ Your project check has failed because the head coverage (82%) is below the target coverage (95%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@           Coverage Diff            @@
##           develop   #2271    +/-   ##
========================================
  Coverage       82%     82%            
========================================
  Files           68      68            
  Lines         9369    9507   +138     
========================================
+ Hits          7677    7811   +134     
- Misses        1692    1696     +4
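The Codecov figures are mutually consistent, as the short check below shows. The patch size of 139 lines is inferred from the reported 92.80576% and the 10 missing lines. Codecov does not state it.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage as a percentage."""
    return 100.0 * hits / lines


# Patch coverage: 10 missing lines at 92.80576% implies the patch size.
patch_missing = 10
patch_lines = round(patch_missing / (1 - 92.80576 / 100))  # inferred
patch_hits = patch_lines - patch_missing

# Project coverage from the diff table (base 2aa43bc vs. head a53ad80).
base_pct = coverage_pct(7677, 9369)
head_pct = coverage_pct(7811, 9507)

print(f"patch: {patch_hits}/{patch_lines} = {coverage_pct(patch_hits, patch_lines):.5f}%")
print(f"base: {base_pct:.2f}%  head: {head_pct:.2f}%  target: 95%")
```

Both project percentages (about 81.94% and 82.16%) round to 82%. That is why the diff table reports no change despite the 138 added lines.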


Copilot

Pull request overview

This PR adds an optional qualitative visualization export to ConfusionMatrix.benchmark(...), writing per-image 2x2 mosaics (GT / TP / FP / FN) to disk to aid error analysis during benchmarking.
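The 2x2 layout with a white outer border and center dividers could be assembled with NumPy roughly as follows. `build_mosaic` is a hypothetical helper, and the fixed `border`/`divider` widths are assumptions; the PR scales its styling with image resolution. The panels are assumed to be pre-rendered and equally sized.

```python
import numpy as np


def build_mosaic(panels, border: int = 10, divider: int = 4) -> np.ndarray:
    """Tile four equally sized HxWx3 uint8 panels into a 2x2 grid.

    Order: [GT, TP, FP, FN] -> top-left, top-right, bottom-left, bottom-right.
    Pixels not covered by a panel stay white, forming the outer border and
    the horizontal/vertical center dividers.
    """
    if len(panels) != 4:
        raise ValueError("expected exactly four panels")
    h, w = panels[0].shape[:2]
    if any(p.shape != panels[0].shape for p in panels):
        raise ValueError("all panels must share the same shape")
    out_h = 2 * h + divider + 2 * border
    out_w = 2 * w + divider + 2 * border
    mosaic = np.full((out_h, out_w, 3), 255, dtype=np.uint8)
    for idx, panel in enumerate(panels):
        row, col = divmod(idx, 2)
        y = border + row * (h + divider)
        x = border + col * (w + divider)
        mosaic[y:y + h, x:x + w] = panel
    return mosaic


panels = [np.full((20, 30, 3), v, dtype=np.uint8) for v in (0, 50, 100, 150)]
print(build_mosaic(panels).shape)
```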

Changes:

Added save_result_images and save_directory_path options to ConfusionMatrix.benchmark(...) to export validation mosaics under result/.
Implemented per-image panel rendering with class-consistent box coloring, labels, and grid styling.
Added a regression test for image export and updated benchmarking documentation to mention the feature.

Assessment (n/5):

Code quality: 3/5
Tests: 2/5
Docs: 4/5

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

File	Description
`src/supervision/metrics/detection.py`	Adds the visualization export pipeline and new `benchmark(...)` parameters to save per-image GT/TP/FP/FN mosaics.
`tests/metrics/test_detection.py`	Adds a regression test that exercises `save_result_images=True` and checks a saved mosaic is created/readable.
`docs/how_to/benchmark_a_model.md`	Documents the new `save_result_images` option and what gets written to `result/`.

K-saif · 2026-05-31T05:41:32Z

Hi @Borda, I have resolved all the issues raised by Copilot in the latest commit. Could you please review it? I'm happy to address any feedback or make improvements if needed.

K-saif · 2026-06-17T07:36:43Z

Hi @Borda @SkalskiP, just checking in on this PR. I'm happy to address any feedback or make improvements if needed.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

K-saif · 2026-06-18T08:33:06Z

Addressed the Copilot review comments and updated the implementation/tests accordingly. CI is now passing. Please let me know if there is anything else that should be adjusted.

- remove top-level cv2/annotator imports; lazy-load inside rendering functions
- remove save_result_images bool; save_directory_path is now keyword-only after metric_target
- drop hardcoded result/ subdirectory from benchmark output path
- propagate metric_target into _split_detections_by_outcome for correct OBB IoU dispatch
- add filename collision UserWarning in benchmark loop
- remove dead/unreachable combined None-check in _split_detections_by_outcome
- add Google-style docstrings to all 5 new private visualization functions
- add TestSplitDetectionsByOutcome covering 7 edge cases (empty inputs, cross-class, confidence-None)
- fix FP/FN pixel assertions to check interior box pixels rather than border/title regions
- fix benchmark_a_model.md: full panel names, add Visual Benchmarking section, update API examples

---

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
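Two of the follow-up changes can be illustrated with the hypothetical helper below:

- a keyword-only `save_directory_path` that doubles as the on/off switch;
- a `UserWarning` on filename collisions.

The helper assumes mosaics are named after the source image's basename; Borda's script below expects the mosaic at `OUTPUT_DIR / SOURCE_IMAGE_PATH.name`. Under that assumption, two images with the same name in different folders would map to the same output file.

```python
import warnings
from pathlib import Path
from typing import Iterable, List, Optional


def plan_mosaic_paths(
    image_paths: Iterable[str],
    *,
    save_directory_path: Optional[str] = None,
) -> List[Path]:
    """Map each source image to an output mosaic path (hypothetical helper).

    `save_directory_path` is keyword-only; when it is None, nothing is exported.
    A UserWarning is emitted when two sources share a basename, because the
    later mosaic would overwrite the earlier one.
    """
    if save_directory_path is None:
        return []
    out_dir = Path(save_directory_path)
    seen = set()
    planned = []
    for image_path in image_paths:
        name = Path(image_path).name
        if name in seen:
            warnings.warn(
                f"Filename collision: '{name}' appears more than once; "
                "its mosaic will be overwritten.",
                UserWarning,
                stacklevel=2,
            )
        seen.add(name)
        planned.append(out_dir / name)
    return planned
```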

Borda · 2026-06-26T19:31:44Z

```python
"""Render a GT/TP/FP/FN validation mosaic for a crop of the people-walking asset.

The ground truth lists four people. The prediction callback deliberately drops
one of them (a false negative) and injects a box over empty floor (a false positive).
"""

from pathlib import Path

import cv2
import numpy as np

import supervision as sv
from supervision.assets import ImageAssets, download_assets

ROOT = Path(__file__).resolve().parents[2]
OUTPUT_DIR = ROOT / "output" / "validation_visualization"
SOURCE_IMAGE_PATH = OUTPUT_DIR / "input" / "people_walking_crop.jpg"
MODEL_PATH = ROOT / "examples" / "count_people_in_zone" / "yolo11x.pt"
# (x_min, y_min, x_max, y_max) of the crop within the full asset image.
PEOPLE_WALKING_CROP = (1120, 260, 1680, 860)
PERSON_CLASS_ID = 0
# Box over empty floor, appended to the predictions to force a false positive.
EMPTY_FLOOR_FALSE_POSITIVE = np.array([[116, 116, 198, 256]], dtype=np.float32)


def _load_people_walking_crop() -> np.ndarray:
    image_path = Path(download_assets(ImageAssets.PEOPLE_WALKING))
    image = cv2.imread(str(image_path))
    if image is None:
        raise RuntimeError(f"Could not read asset image from {image_path}")

    x_min, y_min, x_max, y_max = PEOPLE_WALKING_CROP
    return image[y_min:y_max, x_min:x_max]


def _build_dataset(image_path: Path) -> sv.DetectionDataset:
    targets = sv.Detections(
        xyxy=np.array(
            [
                [19, 247, 82, 397],
                [326, 60, 395, 225],
                [343, 214, 422, 385],
                [326, 444, 402, 560],
            ],
            dtype=np.float32,
        ),
        class_id=np.array([0, 0, 0, 0]),
    )
    return sv.DetectionDataset(
        classes=["person"],
        images=[str(image_path)],
        annotations={str(image_path): targets},
    )


def _predict(image: np.ndarray) -> sv.Detections:
    if not MODEL_PATH.exists():
        raise FileNotFoundError(
            f"Expected local YOLO weights at {MODEL_PATH}. "
            "Place the model file there before running this example."
        )

    from ultralytics import YOLO

    model = YOLO(str(MODEL_PATH))
    result = model(image, conf=0.5, verbose=False)[0]
    detections = sv.Detections.from_ultralytics(result)

    if detections.class_id is None or detections.confidence is None:
        return sv.Detections.empty()

    is_person = detections.class_id == PERSON_CLASS_ID
    centers = (detections.xyxy[:, :2] + detections.xyxy[:, 2:]) / 2
    # Drop the runner whose box center falls in this region to force a false negative.
    is_intentionally_missed_runner = (
        (centers[:, 0] > 330) & (centers[:, 1] > 170) & (centers[:, 1] < 370)
    )
    keep_mask = is_person & ~is_intentionally_missed_runner

    xyxy = np.concatenate(
        [detections.xyxy[keep_mask], EMPTY_FLOOR_FALSE_POSITIVE],
        axis=0,
    )
    confidence = np.concatenate(
        [detections.confidence[keep_mask], np.array([0.99], dtype=np.float32)],
        axis=0,
    )

    return sv.Detections(
        xyxy=xyxy,
        confidence=confidence,
        class_id=np.zeros(len(xyxy), dtype=int),
    )


def main() -> None:
    SOURCE_IMAGE_PATH.parent.mkdir(parents=True, exist_ok=True)
    validation_mosaic_path = OUTPUT_DIR / SOURCE_IMAGE_PATH.name
    confusion_matrix_path = OUTPUT_DIR / "confusion_matrix.png"
    # Remove outputs left over from previous runs.
    validation_mosaic_path.unlink(missing_ok=True)
    confusion_matrix_path.unlink(missing_ok=True)
    (OUTPUT_DIR / "synthetic_validation.jpg").unlink(missing_ok=True)
    (OUTPUT_DIR / "input" / "synthetic_validation.jpg").unlink(missing_ok=True)

    image = _load_people_walking_crop()
    if not cv2.imwrite(str(SOURCE_IMAGE_PATH), image):
        raise RuntimeError(f"Could not write source image to {SOURCE_IMAGE_PATH}")

    dataset = _build_dataset(SOURCE_IMAGE_PATH)

    # Passing save_directory_path enables the mosaic export into OUTPUT_DIR.
    confusion_matrix = sv.ConfusionMatrix.benchmark(
        dataset=dataset,
        callback=_predict,
        conf_threshold=0.5,
        iou_threshold=0.5,
        save_directory_path=OUTPUT_DIR,
    )
    confusion_matrix.plot(
        save_path=str(confusion_matrix_path),
        normalize=False,
    )

    print(f"Validation mosaic: {validation_mosaic_path}")
    print(f"Confusion matrix: {confusion_matrix_path}")


if __name__ == "__main__":
    main()
```
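The script's fixtures can be checked with a little arithmetic, assuming the model's box for each person lies close to its ground-truth box. Three things follow:

- The injected floor box has zero IoU with every target, so it can only be a false positive.
- Exactly one target (index 2) has its center inside the suppression region, so only that runner should become a false negative.
- If the other three people are detected at IoU ≥ 0.5, the mosaic would show 3 TP, 1 FP and 1 FN.

The suppression test is applied to the target boxes below as a stand-in for the model's predicted boxes.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

# Values copied from the script above.
TARGETS: List[Box] = [
    (19, 247, 82, 397),
    (326, 60, 395, 225),
    (343, 214, 422, 385),
    (326, 444, 402, 560),
]
INJECTED_FP: Box = (116, 116, 198, 256)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter = max(0.0, min(a[2], b[2]) - max(a[0], b[0])) * max(
        0.0, min(a[3], b[3]) - max(a[1], b[1])
    )
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def in_suppression_region(box: Box) -> bool:
    """Same center test as is_intentionally_missed_runner in _predict."""
    cx = (box[0] + box[2]) / 2
    cy = (box[1] + box[3]) / 2
    return cx > 330 and 170 < cy < 370


fp_ious = [iou(INJECTED_FP, t) for t in TARGETS]
suppressed = [i for i, t in enumerate(TARGETS) if in_suppression_region(t)]
print(fp_ious, suppressed)
```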

feat(metrics): add TP/FP/FN validation mosaic export

18e5e94

K-saif requested a review from SkalskiP as a code owner May 25, 2026 10:11

pre-commit-ci Bot and others added 4 commits May 25, 2026 10:13

fix(pre_commit): 🎨 auto format pre-commit hooks

89de32e

Fix lint and typing issues

e492af0

fix(pre_commit): 🎨 auto format pre-commit hooks

f1bbd84

fix mypy issue

4285062

Borda requested a review from Copilot May 26, 2026 16:40

Copilot started reviewing on behalf of Borda May 26, 2026 16:40

Copilot AI reviewed May 26, 2026

K-saif and others added 3 commits May 27, 2026 13:28

fix all issues and suggestions

08e5b60

fix(pre_commit): 🎨 auto format pre-commit hooks

6695d7f

Align validation visualization matching with confusion matrix

5b31583

github-actions Bot added the has conflicts label Jun 6, 2026

Borda self-assigned this Jun 17, 2026

Borda requested a review from Copilot June 17, 2026 09:28

Copilot started reviewing on behalf of Borda June 17, 2026 09:29

Copilot AI reviewed Jun 17, 2026

Comment thread tests/metrics/test_detection.py

Comment thread docs/how_to/benchmark_a_model.md Outdated

K-saif added 2 commits June 18, 2026 13:14

fix: address copilot review comments

3e848b2

Merge branch 'develop' into feat-validation-visualization

2f22996

github-actions Bot removed the has conflicts label Jun 18, 2026

fix(pre_commit): 🎨 auto format pre-commit hooks

1793205

Borda and others added 2 commits June 26, 2026 20:26

Merge branch 'develop' into feat-validation-visualization

a53ad80

Borda added the enhancement label (New feature or request) Jun 26, 2026

Borda approved these changes Jun 26, 2026

Borda merged commit 57bb5e7 into roboflow:develop Jun 26, 2026
26 checks passed

4 participants
