
Add extensive pycocotools comparison tests with large synthetic datasets#71

Merged

MiXaiLL76 merged 4 commits into main from copilot/add-tests-for-pycocotools-equality on Feb 11, 2026

Conversation

Contributor

Copilot AI commented Feb 4, 2026

Motivation

Existing tests validate equality with pycocotools using only 1-2 small examples (3-9 annotations). This insufficient coverage across dataset scales and task types blocks production adoption.

Modification

New test module: test_extensive_pycocotools_comparison.py

  • 12 parameterized tests validating bit-for-bit equality (tolerance 1e-10) across:
    • Object detection (bbox): 10/50/100 images, 50-1500 annotations
    • Instance segmentation (segm): Same scales with RLE masks
    • Keypoint detection: Same scales with 17 keypoints/instance
    • Edge cases: perfect predictions, low confidence, mixed object sizes
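The equality check described above can be sketched with a small stdlib-only helper that compares two COCO-style stats vectors element-wise against the stated 1e-10 tolerance. The helper name and structure here are illustrative, not the PR's actual test code:

```python
import math

def assert_stats_equal(stats_a, stats_b, tol=1e-10):
    """Compare two COCO-style stats vectors (e.g. the 12 AP/AR numbers)
    element-wise within an absolute tolerance.

    Illustrative sketch only; the real tests compare COCOeval.stats from
    pycocotools against the faster_coco_eval equivalent.
    """
    assert len(stats_a) == len(stats_b), "stats vectors differ in length"
    for idx, (a, b) in enumerate(zip(stats_a, stats_b)):
        # abs_tol alone, since the PR describes a fixed 1e-10 tolerance.
        assert math.isclose(a, b, rel_tol=0.0, abs_tol=tol), (
            f"stat {idx} differs: {a!r} vs {b!r}"
        )

# Identical vectors pass without raising.
assert_stats_equal([0.5, 0.7, 0.71], [0.5, 0.7, 0.71])
```

With this shape, each parameterized case only has to run both evaluators on the same dataset and hand the two stats vectors to the helper.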

Synthetic data generation

  • Realistic COCO-formatted datasets with proper size distributions (small/medium/large)
  • Variable image dimensions, prediction noise, false positives
  • Keypoint visibility/occlusion modeling
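A generator along the lines described above can be sketched in stdlib-only Python. The function name and field choices are illustrative assumptions, not the PR's actual helpers; it varies image sizes and box areas so the small/medium/large area ranges used by COCO evaluation are all populated:

```python
import random

def make_synthetic_coco(num_images=10, anns_per_image=5, seed=0):
    """Build a minimal COCO-format ground-truth dict with random boxes.

    Illustrative sketch: a seeded RNG keeps the dataset reproducible
    across test runs.
    """
    rng = random.Random(seed)
    images, annotations = [], []
    ann_id = 1
    for img_id in range(1, num_images + 1):
        # Variable image dimensions, as the PR description mentions.
        w, h = rng.randint(320, 640), rng.randint(240, 480)
        images.append({"id": img_id, "width": w, "height": h})
        for _ in range(anns_per_image):
            # Box sizes span small to large objects.
            bw, bh = rng.uniform(8, w / 2), rng.uniform(8, h / 2)
            x, y = rng.uniform(0, w - bw), rng.uniform(0, h - bh)
            annotations.append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": 1,
                "bbox": [x, y, bw, bh],
                "area": bw * bh,
                "iscrowd": 0,
            })
            ann_id += 1
    return {
        "images": images,
        "annotations": annotations,
        "categories": [{"id": 1, "name": "object"}],
    }

dataset = make_synthetic_coco(num_images=3)
```

Prediction JSONs can then be derived from the ground truth by jittering the boxes and attaching scores, which is one way to model the "prediction noise" the PR mentions.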

Documentation

  • tests/README.md: Test suite organization and execution guide
  • README.md: Added "Testing & Reliability" section

BC-breaking (Optional)

None. Tests only.

Use cases (Optional)

Validates drop-in replacement guarantee at scale. All existing 91 tests pass.

Checklist

  1. ✅ Pre-commit/linting applied
  2. ✅ 12 new tests added, all pass (91 existing + 12 new)
  3. N/A - Test-only changes
  4. ✅ Documentation updated (README.md, tests/README.md)
Original prompt

This section details the original issue to resolve.

<issue_title>Add more extensive tests checking equality with pycocotools</issue_title>
<issue_description>Hi, I help maintain the rf-detr repo. You have an open PR in this repo. Looking at the tests in this repo, we see that you check equality of results with pycocotools with only one example. We would like to see a larger test suite to ensure equality across a wider range of possible inputs. Would you be willing to provide either a test script, or additions to your test suite here, comparing the two methods using a full suite of model predictions on COCO for all task types?

Obviously, we would prefer that rf-detr and rf-detr-seg are used to generate the prediction JSONs for objdet and instseg respectively, whereas we don't have a keypoints model ready to use, so a yolo-pose model could generate said predictions JSON.

We understand the value of faster evaluation and would like to use this repo, but the above test is a bit of a blocker for us. Otherwise we will do the above test when our team finds time to prioritize it.

Thanks for your PR and your work maintaining this repo!

cc: @Borda @isaacrob</issue_description>

<agent_instructions>Adding tests aligned with actual code base and in particular validation on larger batch</agent_instructions>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 3 commits February 4, 2026 01:01
Co-authored-by: Borda <6035284+Borda@users.noreply.github.com>
Co-authored-by: Borda <6035284+Borda@users.noreply.github.com>
…d maxDiff

Co-authored-by: Borda <6035284+Borda@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add extensive tests for equality with pycocotools" to "Add extensive pycocotools comparison tests with large synthetic datasets" on Feb 4, 2026
Copilot AI requested a review from Borda February 4, 2026 01:05
@Borda Borda requested review from MiXaiLL76 and Copilot February 6, 2026 11:58
@Borda Borda marked this pull request as ready for review February 6, 2026 11:58
Contributor

Copilot AI left a comment


Pull request overview

This PR adds extensive validation tests to ensure faster_coco_eval produces identical results to pycocotools across large-scale synthetic datasets. The addition addresses concerns about insufficient test coverage by creating comprehensive parameterized tests spanning object detection, instance segmentation, and keypoint detection tasks with realistic COCO-formatted data.

Changes:

  • Added 12 parameterized tests validating exact equality (tolerance 1e-10) across varying dataset scales (10/50/100 images, 50-1500 annotations)
  • Implemented synthetic COCO dataset generation with realistic size distributions, RLE masks, and keypoint visibility modeling
  • Created comprehensive documentation explaining test organization, execution, and validation criteria

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Changed files

  • tests/test_extensive_pycocotools_comparison.py — New test module with synthetic data generation and 12 parameterized tests validating equality across bbox, segmentation, and keypoint tasks
  • tests/README.md — New documentation detailing test suite organization, execution commands, and validation criteria
  • README.md — Added "Testing & Reliability" section highlighting comprehensive test coverage and pycocotools comparison


keypoints = []
num_keypoints = 17
num_visible = 0
for i in range(num_keypoints):

Copilot AI Feb 6, 2026


The variable i is declared in the loop but never used within the loop body. Consider using _ instead to indicate it's intentionally unused: for _ in range(num_keypoints):

Suggested change
for i in range(num_keypoints):
for _ in range(num_keypoints):

if iou_type == "keypoints":
# Create dummy keypoints
keypoints = []
for i in range(17):

Copilot AI Feb 6, 2026


The loop variable i is unused in the loop body. Replace with _ to indicate intentional discard: for _ in range(17):

Suggested change
for i in range(17):
for _ in range(17):

Comment thread: README.md

### Comprehensive Test Suite

- **90+ automated tests** covering all functionality

Copilot AI Feb 6, 2026


The count '90+ automated tests' may become outdated as tests are added or removed. Consider using a more maintainable phrasing like 'Comprehensive automated test suite' or implement dynamic test counting if precision is important.

Suggested change
- **90+ automated tests** covering all functionality
- **Comprehensive automated test suite** covering all functionality

Contributor

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@MiXaiLL76 MiXaiLL76 merged commit 768455a into main Feb 11, 2026
18 checks passed
@MiXaiLL76 MiXaiLL76 mentioned this pull request Feb 21, 2026
7 tasks
@Borda Borda deleted the copilot/add-tests-for-pycocotools-equality branch February 22, 2026 15:36


Development

Successfully merging this pull request may close these issues.

Add more extensive tests checking equality with pycocotools

4 participants