|
| 1 | +# Test Suite Documentation |
| 2 | + |
| 3 | +This directory contains the test suite for `faster_coco_eval`. The suite validates that the library, while significantly faster, produces results identical to those of `pycocotools`. |
| 4 | + |
| 5 | +## Test Organization |
| 6 | + |
| 7 | +### Core Functionality Tests |
| 8 | +- **test_basic.py** - Basic COCO evaluation functionality |
| 9 | +- **test_coco_metric.py** - COCO metrics with pycocotools comparison (small examples) |
| 10 | +- **test_keypoints.py** - Keypoint evaluation |
| 11 | +- **test_cocoapi_fake_data.py** - Tests with synthetic data |
| 12 | + |
| 13 | +### Extensive Comparison Tests |
| 14 | +- **test_extensive_pycocotools_comparison.py** - **NEW**: Comprehensive validation against pycocotools with large synthetic datasets |
| 15 | + |
| 16 | +### Dataset-Specific Tests |
| 17 | +- **test_lvis_metric.py** - LVIS dataset support |
| 18 | +- **test_crowdpose.py** - CrowdPose keypoints dataset |
| 19 | + |
| 20 | +### API and Integration Tests |
| 21 | +- **test_init_pycocotools.py** - Drop-in replacement compatibility |
| 22 | +- **test_torchmetrics.py** - PyTorch integration (if available) |
| 23 | +- **test_mask_api.py** - Mask utilities |
| 24 | +- **test_boundary.py** - Boundary evaluation |
| 25 | + |
| 26 | +### Visualization and Utilities |
| 27 | +- **test_extra_draw.py**, **test_extra_utils.py**, **test_simple_extra.py** - Visualization features |
| 28 | +- **test_ranges.py**, **test_dataset.py** - Utility functions |
| 29 | + |
| 30 | +## Extensive PyCocoTools Comparison Tests |
| 31 | + |
| 32 | +The `test_extensive_pycocotools_comparison.py` module provides comprehensive validation that `faster_coco_eval` produces **identical results** to `pycocotools` across a wide range of scenarios. |
| 33 | + |
| 34 | +### Test Coverage |
| 35 | + |
| 36 | +#### Object Detection (BBox) Tests |
| 37 | +Tests bounding box detection with datasets of varying sizes: |
| 38 | +- **Small dataset**: 10 images, 5 categories, ~50 annotations |
| 39 | +- **Medium dataset**: 50 images, 10 categories, ~500 annotations |
| 40 | +- **Large dataset**: 100 images, 20 categories, ~1500 annotations |
| 41 | + |
| 42 | +Each test validates that both libraries produce identical mAP, mAP@50, mAP@75, and size-specific metrics (small/medium/large objects). |
| 43 | + |
| 44 | +#### Instance Segmentation Tests |
| 45 | +Tests segmentation masks with the same dataset size variations as the bbox tests. Validates that pixel-level mask IoU calculations match exactly between implementations. |
| 46 | + |
| 47 | +#### Keypoint Detection Tests |
| 48 | +Tests keypoint pose estimation with datasets containing: |
| 49 | +- **Small dataset**: 10 images with 17 keypoints per person |
| 50 | +- **Medium dataset**: 50 images with multiple people per image |
| 51 | +- **Large dataset**: 100 images with varied keypoint visibility |
| 52 | + |
| 53 | +Validates that OKS (Object Keypoint Similarity) calculations are identical. |
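For reference, a minimal sketch of the OKS computation for one ground-truth/prediction pair, following the formula used by the standard COCO keypoint evaluation. The function name `oks` and its argument layout are illustrative assumptions, not part of either library's API, and the sketch only covers the case where at least one ground-truth keypoint is labeled.

```python
import math


def oks(gt_kpts, dt_kpts, sigmas, area):
    """Object Keypoint Similarity for one instance pair.

    gt_kpts / dt_kpts are flat lists [x1, y1, v1, x2, y2, v2, ...].
    sigmas holds one per-keypoint constant; area is the ground-truth area.
    (Hypothetical helper for illustration only.)
    """
    eps = 2.220446049250313e-16  # avoids division by zero for a zero area
    # Only keypoints labeled in the ground truth (v > 0) contribute.
    labeled = [i for i in range(len(sigmas)) if gt_kpts[3 * i + 2] > 0]
    if not labeled:
        raise ValueError("sketch assumes at least one labeled keypoint")

    total = 0.0
    for i in labeled:
        dx = dt_kpts[3 * i] - gt_kpts[3 * i]
        dy = dt_kpts[3 * i + 1] - gt_kpts[3 * i + 1]
        variance = (2 * sigmas[i]) ** 2
        # Squared distance normalized by keypoint variance and object scale.
        e = (dx * dx + dy * dy) / variance / (area + eps) / 2
        total += math.exp(-e)
    return total / len(labeled)
```

A prediction that coincides with the ground truth scores 1.0, and the score decays toward 0 as keypoints drift relative to the object's scale.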
| 54 | + |
| 55 | +#### Edge Cases |
| 56 | +- **Perfect predictions**: All predictions match ground truth exactly (IoU=1.0) |
| 57 | +- **Low confidence predictions**: Tests with very low-scoring detections |
| 58 | +- **Mixed object sizes**: Validates correct assignment to small/medium/large categories |
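The size assignment checked by the mixed-size case can be sketched as below. The thresholds are the COCO area boundaries (32² and 96² pixels); the function name `size_category` and the handling of areas that fall exactly on a boundary are assumptions made for this illustration.

```python
def size_category(area):
    """Bucket an object by its area in pixels (hypothetical helper)."""
    small_max = 32 ** 2   # 1024 px
    medium_max = 96 ** 2  # 9216 px
    if area < small_max:
        return "small"
    # Assumption: boundary values are assigned to the medium bucket.
    if area <= medium_max:
        return "medium"
    return "large"
```

For example, a 20x20 box lands in `small`, a 50x50 box in `medium`, and a 100x100 box in `large`.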
| 59 | + |
| 60 | +### Test Data Generation |
| 61 | + |
| 62 | +The tests use **synthetic but realistic** COCO-formatted datasets that mimic actual model predictions: |
| 63 | + |
| 64 | +- **Varied image sizes**: Random dimensions between 400x400 and 800x800 pixels |
| 65 | +- **Realistic bounding boxes**: Objects categorized by area as small (<32² px), medium (32²-96² px), or large (>96² px) |
| 66 | +- **Segmentation masks**: RLE-encoded binary masks matching bbox regions |
| 67 | +- **Keypoint annotations**: 17 keypoints per instance with realistic visibility flags |
| 68 | +- **Prediction noise**: Simulated detection errors with bbox jitter and confidence scores |
| 69 | +- **False positives**: Includes spurious detections to test precision/recall |
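The properties listed above can be illustrated with a small generator. This is a simplified sketch using only the standard library, not the generator used in `test_extensive_pycocotools_comparison.py`; the function name `make_synthetic_coco`, the box size range, the jitter magnitude, and the score ranges are all assumptions chosen for illustration.

```python
import random


def make_synthetic_coco(num_images=10, num_categories=5, anns_per_image=5, seed=0):
    """Build a COCO-style ground-truth dict and a list of bbox predictions."""
    rng = random.Random(seed)  # fixed seed keeps the data reproducible
    images, annotations, predictions = [], [], []
    ann_id = 1
    for image_id in range(1, num_images + 1):
        width, height = rng.randint(400, 800), rng.randint(400, 800)
        images.append({"id": image_id, "width": width, "height": height})
        for _ in range(anns_per_image):
            w, h = rng.uniform(10, 200), rng.uniform(10, 200)
            x, y = rng.uniform(0, width - w), rng.uniform(0, height - h)
            cat = rng.randint(1, num_categories)
            annotations.append({
                "id": ann_id, "image_id": image_id, "category_id": cat,
                "bbox": [x, y, w, h], "area": w * h, "iscrowd": 0,
            })
            ann_id += 1
            # Matching prediction: same box with small positional jitter.
            predictions.append({
                "image_id": image_id, "category_id": cat,
                "bbox": [x + rng.gauss(0, 2), y + rng.gauss(0, 2), w, h],
                "score": rng.uniform(0.5, 1.0),
            })
        # One low-confidence false positive per image.
        predictions.append({
            "image_id": image_id, "category_id": rng.randint(1, num_categories),
            "bbox": [rng.uniform(0, 200), rng.uniform(0, 200), 30.0, 30.0],
            "score": rng.uniform(0.05, 0.3),
        })
    categories = [{"id": c, "name": f"class_{c}"} for c in range(1, num_categories + 1)]
    return {"images": images, "annotations": annotations, "categories": categories}, predictions
```

With the defaults this yields 10 images, 50 annotations, and 60 predictions (50 jittered matches plus 10 false positives), and calling it twice with the same seed returns identical data.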
| 70 | + |
| 71 | +### Running the Tests |
| 72 | + |
| 73 | +Run all extensive comparison tests: |
| 74 | +```bash |
| 75 | +cd tests/ |
| 76 | +pytest test_extensive_pycocotools_comparison.py -v |
| 77 | +``` |
| 78 | + |
| 79 | +Run specific test categories: |
| 80 | +```bash |
| 81 | +# Only bbox tests |
| 82 | +pytest test_extensive_pycocotools_comparison.py -k "bbox" -v |
| 83 | + |
| 84 | +# Only segmentation tests |
| 85 | +pytest test_extensive_pycocotools_comparison.py -k "segmentation" -v |
| 86 | + |
| 87 | +# Only keypoint tests |
| 88 | +pytest test_extensive_pycocotools_comparison.py -k "keypoints" -v |
| 89 | + |
| 90 | +# Only large dataset tests |
| 91 | +pytest test_extensive_pycocotools_comparison.py -k "large" -v |
| 92 | +``` |
| 93 | + |
| 94 | +### Test Success Criteria |
| 95 | + |
| 96 | +Tests pass if and only if: |
| 97 | +1. All metrics (mAP, mAP@50, mAP@75, mAP_small, mAP_medium, mAP_large, etc.) are **numerically identical** between `faster_coco_eval` and `pycocotools` |
| 98 | +2. "Numerically identical" means equal within a floating-point tolerance of `1e-10` (essentially exact) |
| 99 | +3. All intermediate calculations (IoU, OKS) produce identical results |
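A comparison at this tolerance can be sketched as follows. The helper name `metrics_match` is hypothetical, and treating `1e-10` as an absolute (rather than relative) tolerance is an assumption; the actual tests may compare differently.

```python
import math


def metrics_match(reference, candidate, tol=1e-10):
    """Return True if two metric sequences agree element-wise within tol."""
    if len(reference) != len(candidate):
        return False
    return all(
        # Assumption: absolute tolerance, since metrics lie in [0, 1].
        math.isclose(r, c, rel_tol=0.0, abs_tol=tol)
        for r, c in zip(reference, candidate)
    )
```

Here `reference` would hold the summary statistics from `pycocotools` and `candidate` those from `faster_coco_eval`, in the same order.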
| 100 | + |
| 101 | +### Why These Tests Matter |
| 102 | + |
| 103 | +These extensive tests address the requirement for **confidence in correctness** when using `faster_coco_eval` as a drop-in replacement for `pycocotools`: |
| 104 | + |
| 105 | +- **Broader coverage**: Tests hundreds to thousands of annotations vs. single-digit examples in original tests |
| 106 | +- **Real-world scenarios**: Synthetic data mimics actual model predictions with realistic error patterns |
| 107 | +- **All task types**: Validates bbox, segmentation, and keypoints independently |
| 108 | +- **Edge cases**: Ensures correct behavior in corner cases that might not appear in small datasets |
| 109 | +- **Continuous validation**: Runs in CI/CD to catch any regression in numerical accuracy |
| 110 | + |
| 111 | +## Running All Tests |
| 112 | + |
| 113 | +Run the complete test suite: |
| 114 | +```bash |
| 115 | +cd tests/ |
| 116 | +pytest --cov=faster_coco_eval . |
| 117 | +``` |
| 118 | + |
| 119 | +Run the suite with verbose output under the currently active Python interpreter (CI/CD runs Python 3.9-3.13): |
| 120 | +```bash |
| 121 | +pytest --cov=faster_coco_eval . -v |
| 122 | +``` |
| 123 | + |
| 124 | +## Test Requirements |
| 125 | + |
| 126 | +Install test dependencies: |
| 127 | +```bash |
| 128 | +pip install "faster-coco-eval[tests]" |
| 129 | +``` |
| 130 | + |
| 131 | +Or from source: |
| 132 | +```bash |
| 133 | +cd /path/to/faster_coco_eval |
| 134 | +pip install -e ".[tests]" |
| 135 | +``` |
| 136 | + |
| 137 | +Required packages: |
| 138 | +- `pytest` - Test framework |
| 139 | +- `pytest-cov` - Coverage reporting |
| 140 | +- `parameterized` - Parameterized test cases |
| 141 | +- `pycocotools` - Original COCO API for comparison tests |
| 142 | +- `numpy` - Numerical operations |
| 143 | + |
| 144 | +## Contributing Tests |
| 145 | + |
| 146 | +When adding new features to `faster_coco_eval`, please: |
| 147 | + |
| 148 | +1. Add corresponding tests that validate **exact equality** with `pycocotools` behavior |
| 149 | +2. Use parameterized tests to cover multiple scenarios efficiently |
| 150 | +3. Generate synthetic test data programmatically for reproducibility |
| 151 | +4. Call `np.random.seed()` with a fixed seed value for deterministic test data |
| 152 | +5. Document what each test validates and why it's important |