
Add adaptive TP/FP/FN validation mosaic export #2271

Merged
Borda merged 13 commits into roboflow:develop from K-saif:feat-validation-visualization
Jun 26, 2026

Conversation

@K-saif K-saif commented May 25, 2026

Contributor
Before submitting
  • Self-reviewed the code
  • Updated documentation
  • Added/updated tests
  • All tests pass locally

Description

Adds an optional validation visualization flow for exporting per-image GT/TP/FP/FN mosaics during confusion-matrix benchmarking.

Generated mosaics improve qualitative error analysis by providing a side-by-side comparison of:

  • Ground Truth
  • True Positives
  • False Positives
  • False Negatives
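
A minimal sketch of how predictions could be sorted into the panels listed above, assuming greedy same-class IoU matching at a fixed threshold. The names `box_iou` and `split_by_outcome` are hypothetical and are not the helpers added by this PR; the matching rule is an assumption as well.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def split_by_outcome(
    targets: Sequence[Tuple[Box, int]],
    predictions: Sequence[Tuple[Box, int, float]],
    iou_threshold: float = 0.5,
) -> Tuple[List[int], List[int], List[int]]:
    """Return (tp_prediction_idx, fp_prediction_idx, fn_target_idx)."""
    matched_targets = set()
    tp, fp = [], []
    # Visit predictions from most to least confident (greedy matching).
    order = sorted(range(len(predictions)), key=lambda i: -predictions[i][2])
    for p_idx in order:
        p_box, p_cls, _ = predictions[p_idx]
        best_iou, best_t = 0.0, None
        for t_idx, (t_box, t_cls) in enumerate(targets):
            # Skip targets already claimed or of a different class.
            if t_idx in matched_targets or t_cls != p_cls:
                continue
            iou = box_iou(p_box, t_box)
            if iou > best_iou:
                best_iou, best_t = iou, t_idx
        if best_t is not None and best_iou >= iou_threshold:
            matched_targets.add(best_t)
            tp.append(p_idx)
        else:
            fp.append(p_idx)
    # Any target left unmatched is a false negative.
    fn = [i for i in range(len(targets)) if i not in matched_targets]
    return tp, fp, fn


targets = [((0, 0, 10, 10), 0), ((20, 20, 30, 30), 0)]
predictions = [((1, 1, 10, 10), 0, 0.9), ((50, 50, 60, 60), 0, 0.8)]
tp, fp, fn = split_by_outcome(targets, predictions)
```

Here the first prediction overlaps the first target with IoU 0.81 and counts as a true positive, the second prediction matches nothing and is a false positive, and the second target is left as a false negative.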

Type of Change

  • ✨ New feature
  • 📝 Documentation update
  • 🧪 Test update

Motivation and Context

Current benchmarking utilities provide aggregate metrics but limited per-image visual inspection support.

This feature adds optional qualitative visualization exports to simplify:

  • model debugging
  • localization error inspection
  • false positive analysis
  • false negative analysis

without affecting existing benchmark behavior.

Changes Made

  • added optional save_result_images and save_directory_path support in ConfusionMatrix.benchmark(...)
  • added per-image GT/TP/FP/FN mosaic export under a result/ directory
  • added class-consistent bounding-box coloring across panels
  • added adaptive label and bounding-box scaling based on image resolution
  • added white outer borders and center dividers for readability
  • updated benchmark documentation
  • added regression coverage in test_detection.py
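
As a rough illustration of the "adaptive scaling" and "class-consistent coloring" items above, the sketch below derives a line thickness and text scale from the shorter image side and maps each class id to a fixed color. The ratios, the palette, and the names `adaptive_style` and `class_color` are all assumptions made for illustration; the PR's actual constants are not shown in this conversation.

```python
from typing import Tuple

# Hypothetical palette; any fixed list of BGR colors would work.
PALETTE = [(255, 64, 64), (64, 255, 64), (64, 64, 255), (255, 192, 0)]


def adaptive_style(resolution_wh: Tuple[int, int]) -> Tuple[int, float]:
    """Scale line thickness and label text with the shorter image side."""
    shorter_side = min(resolution_wh)
    thickness = max(1, round(shorter_side / 400))  # assumed ratio
    text_scale = max(0.4, shorter_side / 1000)  # assumed ratio
    return thickness, text_scale


def class_color(class_id: int) -> Tuple[int, int, int]:
    """The same class id always maps to the same color, in every panel."""
    return PALETTE[class_id % len(PALETTE)]


small = adaptive_style((640, 480))
large = adaptive_style((3840, 2160))
```

With these assumed ratios a 640x480 image gets a 1 px line, while a 3840x2160 image gets a 5 px line, so boxes and labels stay legible at both resolutions.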

Testing

  • I have tested this code locally
  • I have added unit tests that prove the feature works
  • All new and existing tests pass

Additional Notes

Existing benchmark behavior remains unchanged unless save_result_images=True is enabled.

@K-saif K-saif requested a review from SkalskiP as a code owner May 25, 2026 10:11

CLAassistant commented May 25, 2026

CLA assistant check
All committers have signed the CLA.

codecov Bot commented May 26, 2026

Codecov Report

❌ Patch coverage is 92.80576% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 82%. Comparing base (2aa43bc) to head (a53ad80).

❌ Your project check has failed because the head coverage (82%) is below the target coverage (95%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop   #2271    +/-   ##
========================================
  Coverage       82%     82%            
========================================
  Files           68      68            
  Lines         9369    9507   +138     
========================================
+ Hits          7677    7811   +134     
- Misses        1692    1696     +4     
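
The figures in the report can be checked with a few lines of arithmetic. This is only a sketch that recomputes the percentages from the numbers in the table above; the total patch size is inferred from the reported patch coverage and missing-line count, not stated by Codecov.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Line coverage as a percentage of tracked lines."""
    return 100.0 * hits / lines


base = coverage_percent(7677, 9369)  # develop, about 81.94
head = coverage_percent(7811, 9507)  # this PR, about 82.16

# Net change in tracked lines, hits, and misses, per the table.
delta_lines = 9507 - 9369
delta_hits = 7811 - 7677
delta_misses = 1696 - 1692

# Inferred: 10 missing lines at 92.80576% patch coverage implies this many
# patch lines in total.
patch_lines = round(10 / (1 - 0.9280576))
```

Both base and head round to the 82% shown in the report, which is why the coverage row shows no visible change even though hits grew by 134 lines.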

Copilot AI left a comment

Contributor

Pull request overview

This PR adds an optional qualitative visualization export to ConfusionMatrix.benchmark(...), writing per-image 2x2 mosaics (GT / TP / FP / FN) to disk to aid error analysis during benchmarking.

Changes:

  • Added save_result_images and save_directory_path options to ConfusionMatrix.benchmark(...) to export validation mosaics under result/.
  • Implemented per-image panel rendering with class-consistent box coloring, labels, and grid styling.
  • Added a regression test for image export and updated benchmarking documentation to mention the feature.
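
To make the 2x2 grid concrete, here is a small sketch that computes the canvas size and the top-left offset of each panel, given a white outer border and center dividers. The panel order, the border and divider widths, and the name `mosaic_layout` are assumptions for illustration, not the PR's rendering code.

```python
from typing import Dict, Tuple

# Assumed panel order: top row GT / TP, bottom row FP / FN.
PANELS = ("Ground Truth", "True Positives", "False Positives", "False Negatives")


def mosaic_layout(
    panel_wh: Tuple[int, int], border: int = 8, divider: int = 4
) -> Tuple[Tuple[int, int], Dict[str, Tuple[int, int]]]:
    """Return canvas (width, height) and the top-left (x, y) of each panel."""
    w, h = panel_wh
    canvas_wh = (2 * w + 2 * border + divider, 2 * h + 2 * border + divider)
    offsets = {}
    for index, name in enumerate(PANELS):
        row, col = divmod(index, 2)
        offsets[name] = (border + col * (w + divider), border + row * (h + divider))
    return canvas_wh, offsets


canvas_wh, offsets = mosaic_layout((560, 600))
```

Each annotated copy of the image would then be pasted at its offset on a white canvas of size `canvas_wh`, so the border and dividers appear without any extra drawing.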

Assessment (n/5):

  • Code quality: 3/5
  • Tests: 2/5
  • Docs: 4/5

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

File | Description
--- | ---
src/supervision/metrics/detection.py | Adds the visualization export pipeline and new benchmark(...) parameters to save per-image GT/TP/FP/FN mosaics.
tests/metrics/test_detection.py | Adds a regression test that exercises save_result_images=True and checks a saved mosaic is created/readable.
docs/how_to/benchmark_a_model.md | Documents the new save_result_images option and what gets written to result/.

Comment thread src/supervision/metrics/detection.py
Comment thread src/supervision/metrics/detection.py Outdated
Comment thread src/supervision/metrics/detection.py Outdated
Comment thread src/supervision/metrics/detection.py Outdated
Comment thread src/supervision/metrics/detection.py Outdated
Comment thread src/supervision/metrics/detection.py Outdated
Comment thread src/supervision/metrics/detection.py
Comment thread tests/metrics/test_detection.py Outdated

K-saif commented May 31, 2026

Contributor Author

Hi @Borda, I have resolved all the issues raised by Copilot in the latest commit. Could you please review it? Happy to address any feedback or make further improvements if needed.

K-saif commented Jun 17, 2026

Contributor Author

Hi @Borda @SkalskiP, just checking in on this PR. Happy to address any feedback or make improvements if needed.

@Borda Borda self-assigned this Jun 17, 2026
@Borda Borda requested a review from Copilot June 17, 2026 09:28

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comment thread tests/metrics/test_detection.py
Comment thread docs/how_to/benchmark_a_model.md Outdated

K-saif commented Jun 18, 2026

Contributor Author

Addressed the Copilot review comments and updated the implementation/tests accordingly. CI is now passing. Please let me know if there is anything else that should be adjusted.

Borda and others added 2 commits June 26, 2026 20:26
- remove top-level cv2/annotator imports; lazy-load inside rendering functions
- remove save_result_images bool; save_directory_path is now keyword-only after metric_target
- drop hardcoded result/ subdirectory from benchmark output path
- propagate metric_target into _split_detections_by_outcome for correct OBB IoU dispatch
- add filename collision UserWarning in benchmark loop
- remove dead/unreachable combined None-check in _split_detections_by_outcome
- add Google-style docstrings to all 5 new private visualization functions
- add TestSplitDetectionsByOutcome covering 7 edge cases (empty inputs, cross-class, confidence-None)
- fix FP/FN pixel assertions to check interior box pixels rather than border/title regions
- fix benchmark_a_model.md: full panel names, add Visual Benchmarking section, update API examples

---
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
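
The commit notes mention a filename collision UserWarning in the benchmark loop. A minimal sketch of that idea, assuming each mosaic is named after the basename of its source image, could look like the following; `plan_output_paths` is a hypothetical helper and the warning text is illustrative, not the message used in the library.

```python
import warnings
from pathlib import Path
from typing import Iterable, List


def plan_output_paths(image_paths: Iterable[str], save_directory: str) -> List[Path]:
    """Map each source image to an output path, warning on name clashes."""
    seen = set()
    outputs = []
    for image_path in image_paths:
        # Images from different folders can share a basename.
        target = Path(save_directory) / Path(image_path).name
        if target.name in seen:
            warnings.warn(
                f"{target.name} already planned; a later image would overwrite it.",
                UserWarning,
                stacklevel=2,
            )
        seen.add(target.name)
        outputs.append(target)
    return outputs
```

Calling `plan_output_paths(["a/img.jpg", "b/img.jpg"], "out")` would emit one warning, because both sources map to `out/img.jpg`.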

Borda commented Jun 26, 2026

Member
from pathlib import Path

import cv2
import numpy as np

import supervision as sv
from supervision.assets import ImageAssets, download_assets

ROOT = Path(__file__).resolve().parents[2]
OUTPUT_DIR = ROOT / "output" / "validation_visualization"
SOURCE_IMAGE_PATH = OUTPUT_DIR / "input" / "people_walking_crop.jpg"
MODEL_PATH = ROOT / "examples" / "count_people_in_zone" / "yolo11x.pt"
PEOPLE_WALKING_CROP = (1120, 260, 1680, 860)
PERSON_CLASS_ID = 0
EMPTY_FLOOR_FALSE_POSITIVE = np.array([[116, 116, 198, 256]], dtype=np.float32)


def _load_people_walking_crop() -> np.ndarray:
    image_path = Path(download_assets(ImageAssets.PEOPLE_WALKING))
    image = cv2.imread(str(image_path))
    if image is None:
        raise RuntimeError(f"Could not read asset image from {image_path}")

    x_min, y_min, x_max, y_max = PEOPLE_WALKING_CROP
    return image[y_min:y_max, x_min:x_max]


def _build_dataset(image_path: Path) -> sv.DetectionDataset:
    targets = sv.Detections(
        xyxy=np.array(
            [
                [19, 247, 82, 397],
                [326, 60, 395, 225],
                [343, 214, 422, 385],
                [326, 444, 402, 560],
            ],
            dtype=np.float32,
        ),
        class_id=np.array([0, 0, 0, 0]),
    )
    return sv.DetectionDataset(
        classes=["person"],
        images=[str(image_path)],
        annotations={str(image_path): targets},
    )


def _predict(image: np.ndarray) -> sv.Detections:
    if not MODEL_PATH.exists():
        raise FileNotFoundError(
            f"Expected local YOLO weights at {MODEL_PATH}. "
            "Place the model file there before running this example."
        )

    from ultralytics import YOLO

    model = YOLO(str(MODEL_PATH))
    result = model(image, conf=0.5, verbose=False)[0]
    detections = sv.Detections.from_ultralytics(result)

    if detections.class_id is None or detections.confidence is None:
        return sv.Detections.empty()

    is_person = detections.class_id == PERSON_CLASS_ID
    centers = (detections.xyxy[:, :2] + detections.xyxy[:, 2:]) / 2
    # Drop detections centered in this region on purpose so the mosaic
    # contains a false negative.
    is_intentionally_missed_runner = (
        (centers[:, 0] > 330) & (centers[:, 1] > 170) & (centers[:, 1] < 370)
    )
    keep_mask = is_person & ~is_intentionally_missed_runner

    # Append a box on empty floor so the mosaic contains a false positive.
    xyxy = np.concatenate(
        [detections.xyxy[keep_mask], EMPTY_FLOOR_FALSE_POSITIVE],
        axis=0,
    )
    confidence = np.concatenate(
        [detections.confidence[keep_mask], np.array([0.99], dtype=np.float32)],
        axis=0,
    )

    return sv.Detections(
        xyxy=xyxy,
        confidence=confidence,
        class_id=np.zeros(len(xyxy), dtype=int),
    )


def main() -> None:
    SOURCE_IMAGE_PATH.parent.mkdir(parents=True, exist_ok=True)
    validation_mosaic_path = OUTPUT_DIR / SOURCE_IMAGE_PATH.name
    confusion_matrix_path = OUTPUT_DIR / "confusion_matrix.png"
    validation_mosaic_path.unlink(missing_ok=True)
    confusion_matrix_path.unlink(missing_ok=True)
    (OUTPUT_DIR / "synthetic_validation.jpg").unlink(missing_ok=True)
    (OUTPUT_DIR / "input" / "synthetic_validation.jpg").unlink(missing_ok=True)

    image = _load_people_walking_crop()
    if not cv2.imwrite(str(SOURCE_IMAGE_PATH), image):
        raise RuntimeError(f"Could not write source image to {SOURCE_IMAGE_PATH}")

    dataset = _build_dataset(SOURCE_IMAGE_PATH)

    confusion_matrix = sv.ConfusionMatrix.benchmark(
        dataset=dataset,
        callback=_predict,
        conf_threshold=0.5,
        iou_threshold=0.5,
        save_directory_path=OUTPUT_DIR,
    )
    confusion_matrix.plot(
        save_path=str(confusion_matrix_path),
        normalize=False,
    )

    print(f"Validation mosaic: {validation_mosaic_path}")
    print(f"Confusion matrix: {confusion_matrix_path}")


if __name__ == "__main__":
    main()

[Image: people_walking_crop]

@Borda Borda added the enhancement New feature or request label Jun 26, 2026
@Borda Borda merged commit 57bb5e7 into roboflow:develop Jun 26, 2026
26 checks passed

Labels

enhancement New feature or request

4 participants